How to time grep commands accurately?

Question

I want to compare the speed of these two commands:

grep pattern1 files* 
grep pattern2 files*

Unfortunately, the first grep reads much of files* into memory buffers, so the second grep runs very quickly, but for the wrong reason.

How do I tell Linux (Fedora 11), "Please stop caching disk reads because I'm testing something"?

There's probably a smarter answer... but you could duplicate the directory structure, so you won't be dealing with the same file and you won't have caching problems! — nico, Mar 01 '11 at 15:06
As an aside: Fedora 11 reached end-of-life in June 2010. It's time to upgrade. The upcoming Fedora 15 release looks really nice. Or, if you need something more stable over a longer lifespan (and it sounds like you might since you're still on 11), there's RHEL6 or any-day-now CentOS 6. — mattdm, Mar 01 '11 at 15:17
It took me forever to upgrade from RH 7.3 to that! Upgrades break things and scare me. — , Mar 01 '11 at 15:48
By turning off caching you'll benchmark not the speed of pattern matching, but the speed of your drive. As others are suggesting, just run the first command twice: once to prime the cache, then again to benchmark. — alex, Mar 01 '11 at 15:53
I'll try it, but my main problem is the disk speed... the hard drive goes nuts when I run the grep. Hmmm, ok, so that may mean that optimizing the grep may not help at all... I need to optimize the amount of data I'm pulling. — , Mar 01 '11 at 16:01

score 10 · Accepted Answer · answered Mar 01 '11 at 15:15

I don't think you can easily tell it "temporarily stop caching". What you can do is tell the system to drop the cache before each run:

As root:

sync; echo 3 > /proc/sys/vm/drop_caches

(This is documented in the kernel docs at Documentation/sysctl/vm.txt, which is handy if, like some of us, you can't always remember offhand what the values 1, 2, and 3 do: 1 frees the page cache, 2 frees dentries and inodes, and 3 frees both.)

Or alternately, of course, prime the cache and compare the cached performance. (I think both are useful numbers.)
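The drop-before-each-run approach could be wrapped up roughly like this. It is only a sketch: `cold_time` and the `DROP_CACHES` override are names made up for illustration (the override just lets you dry-run the function against a scratch file), it assumes bash for the `time` keyword, and it has to run as root to write to the real `/proc/sys/vm/drop_caches`.

```shell
#!/bin/bash
# Hypothetical override so the function can be dry-run without root;
# by default it points at the real kernel interface.
DROP_CACHES="${DROP_CACHES:-/proc/sys/vm/drop_caches}"

# cold_time CMD [ARGS...]
# Flush and drop the caches, then time CMD against a cold cache.
cold_time() {
    # Write dirty pages to disk first so the kernel is able to free them.
    sync
    # 3 = drop the page cache plus dentries and inodes.
    echo 3 > "$DROP_CACHES" || return 1
    # bash's time keyword reports real, user, and sys on stderr.
    time "$@"
}
```

Usage would then be `cold_time grep pattern1 files*` followed by `cold_time grep pattern2 files*`, so that both commands start from the same uncached state.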

answered Mar 01 '11 at 15:15 by mattdm

`echo 1` will only drop the page cache, not any disk caches. – jsbillings Mar 01 '11 at 15:17
@jsbillings — er, yes. Fixed. – mattdm Mar 01 '11 at 15:18
Unbelievably minor nitpicking: I had to do ">>", not ">" – Mar 01 '11 at 15:51
@barrycarter: really? huh! – mattdm Mar 01 '11 at 16:08
@barrycarter: you probably have set -o noclobber in your shell, which makes it so it won't let you use > to overwrite an existing file. – jsbillings Mar 01 '11 at 17:18
@jsbillings You're right. I guess tcsh doesn't realize that /proc/* isn't a real file, which makes sense. – Mar 02 '11 at 04:31
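The noclobber behavior described in these comments can be reproduced safely with a scratch file instead of `/proc/sys/vm/drop_caches`. This is a small bash sketch, not tcsh; the variable names are illustrative only.

```shell
#!/bin/bash
# An existing regular file standing in for the redirection target.
scratch=$(mktemp)
set -o noclobber

# With noclobber set, plain > refuses to overwrite an existing file.
if { echo 3 > "$scratch"; } 2>/dev/null; then
    plain_result="overwrote"
else
    plain_result="refused"
fi

# >| overrides noclobber for this one redirection.
echo 3 >| "$scratch"
forced_result=$(cat "$scratch")

set +o noclobber
rm -f "$scratch"
echo "plain >: $plain_result, >|: wrote $forced_result"
# prints "plain >: refused, >|: wrote 3"
```

In tcsh the equivalent override is `>!`. Appending with `>>` to a file that already exists is not blocked by noclobber, which would explain why `>>` worked.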

score 0 · Answer 2 · answered Mar 01 '11 at 15:24

When timing things like this, I usually run the command once to prime the cache, then run it again under time. In testing something like this you should be more concerned about CPU and elapsed times, and less concerned about I/O time.
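A minimal sketch of that prime-then-time routine, assuming bash (for the `time` keyword); `warm_time` is a made-up helper name. Output is piped through `cat` rather than sent straight to `/dev/null`, because GNU grep, as far as I know, can stop at the first match when its stdout is `/dev/null`, which would skew both the priming run and the timing.

```shell
#!/bin/bash
# warm_time CMD [ARGS...]
# Run CMD once to pull its input into the cache, then time a second run.
warm_time() {
    # Priming run: results are discarded, but via a pipe so that the
    # command still does all of its work.
    "$@" | cat > /dev/null
    # Timed run: time covers the whole pipeline and reports
    # real (elapsed), user, and sys (CPU) on stderr.
    time "$@" | cat > /dev/null
}
```

For example, `warm_time grep pattern1 files*` and then `warm_time grep pattern2 files*`; the user and sys figures are the ones to compare when judging the pattern matching itself.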

In any case, it is difficult to get fully accurate timings. If the input files exceed the size of memory available for buffers, then you will likely end up cycling all the files through the buffer cache. Otherwise, you may just access all the data from the buffer cache. In real life, there is often a mix of buffered data and data read from disk.

answered Mar 01 '11 at 15:24 by BillThor

IRL, I run this command only occasionally, so the files* contents are never cached. I'm trying to optimize the grep to run fast in that situation. When the files* contents are already in the cache, it runs in under a second (no point in optimizing that, since the output is intended for an end user). – Mar 01 '11 at 15:49
@barrycarter. If the files aren't cached, and it runs in under a second when they are, then I don't think you will find much opportunity for optimization. Moving the files to faster storage would be the likely optimization. – BillThor Mar 01 '11 at 16:08
