10

What is the simplest way to extract from a file a line given by its number? E.g., I want the 666th line of somefile. How would you do this in your terminal, or in a shell script?

I can see solutions like head -n 666 somefile | tail -n 1, or even the half-incorrect cat -n somefile | grep -F 666, but there must be something nicer, faster, and more robust. Maybe using a more obscure unix command/utility?

Anthon
phs
  • Related question: http://stackoverflow.com/q/12182910/1331399 – Thor Sep 09 '15 at 12:59
  • 1
    It really doesn't get much faster or more robust than the head/tail approach. My Perl solution is as fast or slightly faster in some cases (but the inverse will probably be true in others). The only "nicer" one will be `awk 'NR==666'` but that, while shorter, is significantly slower. – terdon Sep 09 '15 at 14:37
  • 1
    @phs Please post your 'PS' as comments to the individual answerers. It has nothing to do with the question content and should not be part of one. Apart from that it triggered a spurious reopen review cycle. – Anthon Sep 10 '15 at 11:25

5 Answers

23

sed (stream editor) is the right tool for this kind of job:

sed -n '666p' somefile

Edit: @tachomi's solution sed '666q;d' somefile is better when operating on a huge text file, because it makes sed quit after printing the requested line instead of reading the rest of the file. For smaller files the difference is negligible.
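A quick sanity check of both forms, sketched against a throwaway test file (`somefile` here is generated with `seq`, not the asker's actual file):

```shell
# Sample data: the numbers 1..1000, one per line
seq 1 1000 > somefile

# Read the whole file; -n suppresses default output, p prints line 666
sed -n '666p' somefile          # prints: 666

# Quit at line 666: d deletes every earlier line, q quits (and prints)
# as soon as the address matches, never reading the rest of the file
sed '666q;d' somefile           # prints: 666
```

The second form only wins when the file is large relative to the target line number, since both read the same number of lines up to the match.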

dr_
  • @dr01: Thanks! I only use `sed` for replacements, e.g. `sed -e "s/../../g"`, sometimes with regular expressions, and always felt that `sed`'s manual and its full list of commands was too painful for me. So here `-n` is don't echo, `666` is an address, and `p` is `print`? – phs Sep 09 '15 at 12:52
  • 1
    Yes. `sed` is a powerful tool, it is worth learning all its options. – dr_ Sep 09 '15 at 12:55
  • `sed -n '666{p;q}' somefile` unless your `sed` dialect won't accept that and requires `sed -n -e '10{p' -e 'q}' somefile` This allows you to quit early without the conceptual dissonance of "deleting" the lines that you don't want printed. It's merely a stylistic alternative. – Dennis Williamson Sep 09 '15 at 22:40
18

You can use sed

sed -n '666p' somefile

Or

sed '666!d' somefile

Or in large files

sed '666q;d' somefile 

In a bash script:

#!/usr/bin/env bash
line=666
sed "$line"'q;d' somefile
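The script above generalizes into a small helper function; a minimal sketch, where `nthline` is a made-up name (not a standard utility) and `somefile` is sample data built with `seq`:

```shell
# Hypothetical helper: print line $1 of file $2, quitting at the match
nthline() {
    sed "${1}q;d" "$2"
}

seq 1 1000 > somefile   # sample data: numbers 1..1000, one per line
nthline 666 somefile    # prints: 666
nthline 1 somefile      # prints: 1
```

Quoting `"${1}"` and `"$2"` keeps the function safe for filenames containing spaces.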
tachomi
7

POSIXly (and maybe the fastest with a huge file):

tail -n +666 somefile | head -n 1
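Why this is fast: `tail -n +666` starts output at line 666, and `head -n 1` exits after one line, so the pipeline never processes much beyond the target. A sketch against `seq`-generated sample data:

```shell
seq 1 1000 > somefile

# tail skips lines 1..665 cheaply; head closes the pipe after one line,
# which terminates tail early via SIGPIPE
tail -n +666 somefile | head -n 1   # prints: 666
```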
cuonglm
  • Why the downvote? – cuonglm Sep 09 '15 at 12:48
  • This solution has already been suggested by the OP and correctly discarded as it uses two processes. – dr_ Sep 09 '15 at 12:48
  • 1
    @dr01: No, the OP use `head` then `tail`, which is very different from `tail` then `head`. – cuonglm Sep 09 '15 at 12:49
  • Didn't downvote, but isn't the `sed` command POSIX compliant too? – Eric Renouf Sep 09 '15 at 12:50
  • It doesn't matter; there's no need to use two commands when you can use one. – dr_ Sep 09 '15 at 12:50
  • 3
    @dr01: Run your `sed` and mine against a huge file, and you can see the difference. Yours is even worse than tachomi's, since you read the rest of the file instead of quitting after hitting the line. Read [this](http://unix.stackexchange.com/questions/47407/cat-line-x-to-line-y-on-a-huge-file) for more details. – cuonglm Sep 09 '15 at 12:51
  • @EricRenouf: The `sed` is compliant. – cuonglm Sep 09 '15 at 12:53
  • 1
    @cuonglm Good point. +1 for speed. – dr_ Sep 09 '15 at 13:01
  • I tested this on a ~6M and a ~848M file and my Perl approach was faster on both. Also, the OP's approach was exactly as fast as yours, why do you say that tail/head will be better than head/tail? – terdon Sep 09 '15 at 14:35
  • 1
    @terdon: You can see the same issue and benchmark [here](http://unix.stackexchange.com/a/47423/38906). – cuonglm Sep 09 '15 at 15:51
  • Hmm. I don't see the same result here. My guess is that it will depend on the size of the file and how close to the end the desired line is. Sometimes head/tail will be faster and other times tail/head will. – terdon Sep 09 '15 at 16:02
6

Try

awk 'NR == 666 { print ; exit ; }' somefile

Or

awk -v line="$LINE" 'NR == line { print ; exit ; }' somefile
awk 'NR == '"$LINE"' { print ; exit ; }' somefile

if you want to provide the line number via a shell variable (`$LINE`).

E[dx]it: as per terdon's suggestion.
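Of the two variable-passing forms, `-v` is the safer choice, since it avoids splicing shell text into the awk program. A sketch, assuming `somefile` is sample data built with `seq`:

```shell
seq 1 1000 > somefile
LINE=666

# -v binds the shell variable to an awk variable; exit stops awk
# as soon as the target line has been printed
awk -v line="$LINE" 'NR == line { print; exit }' somefile   # prints: 666
```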

Archemar
  • 1
    You don't need that. Remember that awk treats any statement evaluating to true as a call to `print`. That's why `1;` will print the line. All you need is `awk 'NR==666'`. – terdon Sep 09 '15 at 14:20
    That wasn't my suggestion, it was yours! My suggestion was `awk 'NR==666'` which is as slow as your original but shorter. The `exit;` makes all the difference! – terdon Sep 09 '15 at 14:49
    The `exit` in Perl made me think of an `exit` in awk. – Archemar Sep 09 '15 at 14:51
2

A Perl way:

perl -ne 'print && exit if $.==666' file

I tested by creating a file with the numbers from 1 to 999999. On this file, the Perl solution above and awk with exit are the fastest of those mentioned so far:

$ perl -le 'print for 1..999999' > file

$ time perl -ne 'print && exit if $.==666' file
666

real    0m0.004s
user    0m0.000s
sys     0m0.000s

$ time awk 'NR==666 { print ; exit ; } ' file
666

real    0m0.004s
user    0m0.000s
sys     0m0.000s

$ time tail -n +666 file | head -n1
666

real    0m0.021s
user    0m0.004s
sys     0m0.000s

$ time sed -n '666p' file
666

real    0m0.125s
user    0m0.112s
sys     0m0.012s

$ time awk 'NR==666' file
666

real    0m0.161s
user    0m0.156s
sys     0m0.000s

That said, your original solution of head -n666 file | tail -n1 is also blindingly fast, very robust and completely portable. Why do you think it's not?

terdon
  • can you time `awk 'NR==666 { print ; exit ; } ' `? I guess it would be as fast as perl. – Archemar Sep 09 '15 at 14:46
  • @Archemar of course! You're quite right, that one is as fast as Perl. Add it to your answer. – terdon Sep 09 '15 at 14:47
  • @terdon: Yes `head -n666` is fast on your huge file because `head` stops reading after 666 lines and because 666 is much smaller than 999999. Still, that solution will have `head` output a lot of garbage for `tail` to read and dismiss. – phs Sep 09 '15 at 15:22
  • @phs good point about 666 being smaller. My Perl approach is significantly slower when told to print line 999999. However, the `head/tail` or `tail/head` is still fast as anything, portable and efficient. I understand the aesthetic considerations and it would be nicer not to print needless lines but this all happens in the background and it is still blindingly fast. I'd stick with it. – terdon Sep 09 '15 at 15:27
  • 1
    Run `{ head -n 665 >/dev/null; head -n 1; } – don_crissti Sep 09 '15 at 18:03
  • @don_crissti: That's great!! So many neat tricks popping up... – phs Sep 09 '15 at 19:30