Is it beneficial to artificially prime the buffer cache when dealing with larger files?
Here's a scenario: A large file needs to be processed, line by line. Conceptually, it is easy to parallelize the task to saturate a multi-core machine. However, since the lines need to be read first (before distributing them to workers round-robin), the overall process becomes I/O-bound and therefore slower.
Is it reasonable to read all or parts of the file into the buffer cache in advance, so that reads are faster when the actual processing occurs?
Update: I wrote a small front-end to the readahead system call. I'll try to add some benchmarks later...
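To make the idea concrete, here is a minimal sketch of what I mean by priming the cache. It is not the front-end mentioned above; it assumes Python 3 on a Unix-like system and uses `os.posix_fadvise` with `POSIX_FADV_WILLNEED` as a stand-in for `readahead`. The function name `prime_cache` and the 64 MiB chunk size are made up for illustration.

```python
import os

def prime_cache(path, chunk_size=64 * 1024 * 1024):
    """Hint the kernel to pull `path` into the page cache.

    Returns the number of bytes covered by the hints (0 if the
    platform has no posix_fadvise). The chunk size is an arbitrary
    illustrative choice, not a tuned value.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        if not hasattr(os, "posix_fadvise"):
            return 0  # nothing we can do on this platform
        offset = 0
        while offset < size:
            length = min(chunk_size, size - offset)
            # Advisory only: asks the kernel to start reading this range
            # in the background; the kernel is free to ignore the hint.
            os.posix_fadvise(fd, offset, length, os.POSIX_FADV_WILLNEED)
            offset += length
        return size
    finally:
        os.close(fd)
```

The intended use would be to call `prime_cache("some_large_file")` shortly before the workers start reading. Since the hint is advisory, whether it actually helps is exactly what the benchmark would have to show.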