I have a directory tree containing many text files. I would like to index the full text of all these files (ignoring files with certain file extensions), so that I can quickly search through all of them.

I do not want to index my whole Home directory, or the whole system. I just want to index this particular directory.

The index should update continuously, automatically detecting changes in the files inside.

What tool can I use for this?

a06e

1 Answer

Except for the requirement "automatically detecting changes in the files", this can be done with GNU id-utils. It provides a tool called mkid, which builds a binary database file called ID; that file is then used by the lid query tool, among others.
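To make the build-then-query cycle concrete, here is a minimal sketch that runs it in a throwaway directory. The directory, the file names and the token "needle" are made up for illustration, and the script assumes that mkid with no arguments indexes the current directory into ./ID. Whether .txt files get picked up depends on the language map in effect.

```shell
#!/bin/sh
# Sketch of the mkid/lid cycle; all names below are illustrative.
work_dir=$(mktemp -d)
printf 'alpha needle beta\n' > "$work_dir/one.txt"
printf 'gamma delta\nneedle again\n' > "$work_dir/two.txt"

if command -v mkid >/dev/null 2>&1; then
    (
        cd "$work_dir" || exit 1
        mkid                        # build ./ID for this directory tree
        lid --result=grep needle    # print matches as file:line:text
    )
else
    echo "mkid not found; install GNU id-utils to try this" >&2
fi
```

Since nothing updates ID automatically, mkid has to be re-run after the files change.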

Id-utils is geared toward programming; it recognizes file types by suffix, according to a configurable id-lang.map file. Each supported type has its own scanner, so that the tokens in a file are handled properly. There is also a scanner for plain text, which mkid uses, I think, as the fallback for unrecognized file types. I think that if you point mkid at a blank id-lang.map file, it will use the text scanner for everything.
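Rather than relying on a blank map, one could spell the intent out in a small map of one's own. The sketch below is based on my reading of the comments in the stock id-lang.map, so treat the details as assumptions to check against your installed copy: each line is a pattern followed by a language, IGNORE skips matching files, and the special pattern ** names the default language. The map path and the extensions are hypothetical.

```shell
#!/bin/sh
# Sketch: a custom language map that skips some extensions and
# treats everything else as plain text. Path and patterns are examples.
map_file="${TMPDIR:-/tmp}/id-lang-text.map"

cat > "$map_file" <<'EOF'
# <pattern> <language>
*.log   IGNORE
*.bak   IGNORE
# default language for names that match nothing above
**      text
EOF

# Print the command instead of running it, so nothing is indexed by accident.
echo "cd /path/to/tree && mkid --lang-map=$map_file"
```

This would also cover the "ignoring files with certain file extensions" part of the question, provided the IGNORE entries behave as described.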

mkid indexes trees reasonably quickly, and the queries are lightning fast.

I have integrated it with Vim for source code browsing; it's more convenient than using a separate tool like cscope. With these two settings:

:set grepprg=lid\ --regex\ --result=grep\ '$*'\ \\\|\ sort\ -u\ -t\ :\ -k\ 1,1\ -k\ 2,2n
:set grepformat=%f:%l:%m

I can use the ID database as the basis for Vim's :grep command. E.g. :grep foo brings up a navigable list of all the locations where foo is found.

The sort step in the above grepprg definition is required because lid doesn't put out matches in file and line order. It's doing hashing internally or something.
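To see what that sort invocation does on its own, here it is applied to a few made-up lines in file:line:text form (the sample matches are invented, not real lid output). It sorts by file name first, then numerically by line number, and -u drops entries whose file and line are the same.

```shell
#!/bin/sh
# Same sort options as in the grepprg setting, wrapped in a function.
sort_matches() {
    sort -u -t : -k 1,1 -k 2,2n
}

# Invented, deliberately unordered sample with one duplicate.
sorted=$(printf '%s\n' \
    'src/b.c:10:foo();' \
    'src/a.c:20:int foo;' \
    'src/a.c:3:void foo(void);' \
    'src/b.c:10:foo();' | sort_matches)

printf '%s\n' "$sorted"
# Prints:
#   src/a.c:3:void foo(void);
#   src/a.c:20:int foo;
#   src/b.c:10:foo();
```

Without the n on the second key, line 20 would sort before line 3, because the comparison would be textual.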

Kaz