There are a lot of specific cases in your request.
1. Files actually outside a git-managed directory.
   - Your TheFile fits this case.
2. Files inside a directory managed by Git, with some .git marker. .git is not always a directory; it can also be a file containing the path to the real GIT_DIR. We can further break these files down as follows:
   2.1. Known files, those present in the Git index.
   2.2. Ignored files, those matching a pattern per gitignore(5):
        - .gitignore
        - $HOME/.config/git/ignore
        - $GIT_DIR/info/exclude
   2.3. Files under an actual $GIT_DIR directory, but NOT part of the repo.
        - .git/hooks are the most likely
        - Could also be malware
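
Since .git can be either a directory or a file pointing elsewhere, here is a minimal sketch of how the two cases can be told apart by hand. The function name resolve_git_dir is hypothetical (it is not a git command), and the sketch assumes the usual gitfile format of a single "gitdir: <path>" line, with relative paths taken relative to the directory holding the .git file. Inside a real work tree, git rev-parse --git-dir gives the authoritative answer.

```shell
# resolve_git_dir: hypothetical helper, not part of git.
# Prints the real git directory for a given .git marker path.
resolve_git_dir() {
    marker=$1
    if [ -d "$marker" ]; then
        # Ordinary repository: the marker is the git directory itself.
        printf '%s\n' "$marker"
    elif [ -f "$marker" ]; then
        # gitfile (submodules, linked worktrees): one "gitdir: <path>" line.
        target=$(sed -n 's/^gitdir: //p' "$marker" | head -n 1)
        [ -n "$target" ] || return 1
        case $target in
            /*) printf '%s\n' "$target" ;;                          # absolute path
            *)  printf '%s/%s\n' "$(dirname "$marker")" "$target" ;; # relative path
        esac
    else
        # Neither a directory nor a regular file: not a usable marker.
        return 1
    fi
}
```

Something like resolve_git_dir "$D/.git" would then print the directory that actually holds the index, hooks and info/exclude.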
So the most reliable approach is to generate two lists relative to your base directory $D (every file under $D, and every file Git accounts for) and compare them. Be sure to sort them and remove duplicates beforehand.
I can't think of a reliable way to generate the sub-list for 2.3 above, so I leave that as an open problem (I'd love to know about it, because I've lost hooks before).
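
For the hooks part only, a partial heuristic is possible. The sketch below is not a solution to the open problem: it assumes that anything in the hooks directory not ending in .sample was put there by someone, and it says nothing about other stray files under the git directory. The function name list_custom_hooks is hypothetical, and its argument is assumed to be an already-resolved git directory.

```shell
# list_custom_hooks: hypothetical helper, not part of git.
# Heuristic: prints files under <gitdir>/hooks that are not shipped *.sample files.
list_custom_hooks() {
    gitdir=$1
    # No hooks directory means nothing to report.
    [ -d "$gitdir/hooks" ] || return 0
    find "$gitdir/hooks" -type f ! -name '*.sample' | sort
}
```

For a plain repository at $D, list_custom_hooks "$D/.git" would print any hooks that deserve a backup before the directory is cleaned up.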
Shell script to list known files per 2.1 above:
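# Visit every .git marker (directory or file) under $D.
find "$D" -name .git | while IFS= read -r g ; do
  echo "$g"   # also list the marker itself
  # p is the work tree; g2 is the marker with symlinks resolved.
  p=${g%/.git} g2=$(readlink -f "$g")
  ( cd "$p" && GIT_DIR=$g2 \
    git ls-files --exclude-standard --full-name ) \
  | sed "s,^,${p}/," ;   # prefix each path with its work tree
done > list-2.1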
Shell script to list ignored files per 2.2 above:
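# Same walk as above, but list untracked files that match an ignore pattern.
find "$D" -name .git | while IFS= read -r g ; do
  p=${g%/.git} g2=$(readlink -f "$g")
  ( cd "$p" && GIT_DIR=$g2 \
    git ls-files \
    --others -i --exclude-standard ) \
  | sed "s,^,${p}/," ;   # prefix each path with its work tree
done > list-2.2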
Shell script to list files per 2.3 above:
: > list-2.3   # TODO: empty placeholder until there is a reliable way to build this list
Shell script to process the lists and find the files under $D that appear in none of them:
comm -23 <(find "$D" ! -type d | sort) <(sort -u list-2.1 list-2.2 list-2.3)
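
To make the comparison step concrete, here is a small self-contained run of the same comm -23 idea on made-up file names, with no repository involved. The paths and the temporary directory are purely illustrative. Note that comm expects both inputs sorted in the same collation order, which is why the locale is pinned here.

```shell
# Pin the locale so sort and comm agree on ordering.
export LC_ALL=C

work=$(mktemp -d)

# Side A: every file found under the base directory (made-up names).
printf '%s\n' ./a ./b ./c ./hooks/pre-commit | sort > "$work/all"

# Side B: files accounted for by the lists, deduplicated with sort -u.
printf '%s\n' ./b ./a ./a | sort -u > "$work/accounted"

# -2 drops lines only in B, -3 drops lines in both: what remains is only in A.
unknown=$(comm -23 "$work/all" "$work/accounted")
printf '%s\n' "$unknown"

rm -r "$work"
```

This prints ./c and ./hooks/pre-commit, the two names that side B does not account for.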