I am getting output from a program that first produces one line that is a bunch of column headers, and then a bunch of lines of data. I want to cut various columns of this output and view it sorted according to various columns. Without the headers, the cutting and sorting is easily accomplished via the -k option to sort along with cut or awk to view a subset of the columns. However, this method of sorting mixes the column headers in with the rest of the lines of output. Is there an easy way to keep the headers at the top?
I came across the following [link](http://gadelkareem.com/2008/03/07/sort-unix-processes-on-ps-by-highest-memory-usage/). However, I can't get this technique of `{ head -1; sort; }` to work. It always deletes a bunch of the text after the first line. Does anyone know why this happens? – jonderry Apr 23 '11 at 01:02
I suspect it's because `head` is reading more than one line into a buffer and throwing most of it away. My `sed` idea had the same problem. – Andy Apr 23 '11 at 01:09
@jonderry - that technique only works with `lseek`able input, so it won't work when reading from a pipe. It will work if you redirect to a file `>outfile` and then run `{ head -n 1; sort; } <outfile`. – don_crissti Sep 26 '15 at 13:40
@jonderry I wonder whether your particular tool expects a specific line ending. Some "Windows" command-line tools are still coded for text processing of Linux line endings. – Sun Feb 04 '20 at 03:25
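To illustrate don_crissti's point, here is a minimal sketch (the function name `sort_via_tmpfile` is hypothetical) that spools stdin into a temporary regular file so that `{ head -n 1; sort; }` reads seekable input. It relies on `head` leaving the shared file offset just past the first line when its input is seekable, which POSIX asks of standard utilities and GNU `head` does; it cannot help while the input is still a pipe.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy stdin to a temporary regular file, then run
# { head -n 1; sort "$@"; } on that file. Because the file is seekable,
# head can reposition the shared offset just past line 1 before sort runs.
sort_via_tmpfile() {
  local tmp status
  tmp=$(mktemp) || return 1
  cat > "$tmp"
  { head -n 1; sort "$@"; } < "$tmp"
  status=$?          # exit status of sort
  rm -f "$tmp"
  return "$status"
}

# Usage: the header stays on top, the body is sorted numerically on column 2.
printf 'NAME SIZE\ncarol 30\nalice 10\nbob 20\n' | sort_via_tmpfile -k2,2n
```

The temporary file costs an extra copy of the data, which is the price of turning a pipe into seekable input.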
16 Answers
Stealing Andy's idea and making it a function so it's easier to use:
# print the header (the first line of input)
# and then run the specified command on the body (the rest of the input)
# use it in a pipeline, e.g. ps | body grep somepattern
body() {
    IFS= read -r header
    printf '%s\n' "$header"
    "$@"
}
Now I can do:
$ ps -o pid,comm | body sort -k2
PID COMMAND
24759 bash
31276 bash
31032 less
31177 less
31020 man
31167 man
...
$ ps -o pid,comm | body grep less
PID COMMAND
31032 less
31177 less
`ps -C COMMAND` may be more appropriate than `grep COMMAND`, but it's just an example. Also, you can't use `-C` if you also used another selection option such as `-U`. – Mikel Apr 23 '11 at 00:51
Or maybe it should be called `body`? As in `body sort` or `body grep`. Thoughts? – Mikel Apr 23 '11 at 00:57
I tried `read` in this form first, but noticed that it was eating leading whitespace. Making this a function is a good idea. +1 – Andy Apr 23 '11 at 01:01
Renamed from `header` to `body`, because you're doing the action on the body. Hopefully that makes more sense. – Mikel Apr 23 '11 at 01:02
Remember to call `body` on all subsequent pipeline participants: `ps -o pid,comm | body grep less | body sort -k1nr` – bishop Nov 07 '16 at 20:02
Can you modify the function so that it can act not only on pipes but also on files, e.g. `body sort -k2 foo` and not just `cat foo|body sort -k2`? – Tim Sep 03 '17 at 09:53
Cool stuff! As others have mentioned, each subsequent command in the pipe should be "body"ed: `. . . | body cmd1 | body cmd2`. It can also be used in the rare cases when the header spans more than one line (for example, mysql output in table format): `mysql -t -e "..." | body body body ...` – jsxt May 06 '21 at 13:51
I've just realized that needing a separate `body` for each command can be avoided with `eval`. For example, `ps | body "grep firefox | sort"` is a bit simpler than `ps | body grep firefox | body sort` and still works. Just replace `"$@"` with `eval "$@"` in the function suggested by @Mikel. – jsxt May 07 '21 at 15:27
Slight side note: I know this is a generic solution, but I just wanted to point out that the `ps` command has the ability to sort (at least in some versions). You can do `ps -o pid,comm --sort comm` and it'll sort by that column. Also `--sort -comm` will sort in reverse order. – HerbCSO Aug 23 '22 at 22:13
This is like [keep-header from tsv-utils](https://github.com/eBay/tsv-utils#keep-header) but portable. – Beni Cherniavsky-Paskin Jun 20 '23 at 09:28
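A sketch of jsxt's `eval` variant, under the hypothetical name `body_eval` so it can sit alongside Mikel's `body`: the whole downstream pipeline is passed as one quoted string and `eval` runs it on the input left after the header. Since the string is evaluated as shell code, only pass trusted text.

```shell
#!/usr/bin/env bash
# Hypothetical variant of Mikel's body(): print the first line untouched,
# then evaluate the argument string as a pipeline over the remaining input.
body_eval() {
  local header
  IFS= read -r header          # read consumes exactly one line from the pipe
  printf '%s\n' "$header"
  eval "$@"                    # e.g. 'grep x | sort' inherits the rest of stdin
}

# One call instead of "body grep x | body sort".
printf 'HDR\nb x\nc\na x\n' | body_eval 'grep x | sort'
```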
You can keep the header at the top like this with bash:
command | (read -r; printf "%s\n" "$REPLY"; sort)
Or do it with perl:
command | perl -e 'print scalar (<>); print sort { ... } <>'
(read;...) seems to lose the spacing between the fields of the header for me. Any suggestions? – jonderry Apr 23 '11 at 01:17
@Mikel: OK, changing to `IFS=` didn't fix this problem. However, changing to `printf '%s\n' "$REPLY"` fixed it for this approach. I haven't noticed an effect from setting `IFS`. What is this fixing? – jonderry Apr 23 '11 at 01:33
@jonderry: Any spaces at the start of the line. Without `IFS`, leading spaces are stripped out. With `IFS=`, the line is printed verbatim. – Mikel Apr 23 '11 at 01:49
`IFS=` disables word splitting when reading the input. I don't think it's necessary when reading to `$REPLY`. `echo` will expand backslash escapes if `xpg_echo` is set (not the default); `printf` is safer in that case. `echo $REPLY` without quotes will condense whitespace; I think `echo "$REPLY"` should be okay. `read -r` is needed if the input may contain backslash escapes. Some of this might depend on bash version. – Andy Apr 23 '11 at 01:50
@Andy: Wow, you're right, different rules for `read REPLY; echo $REPLY` (strips leading spaces) and `read; echo $REPLY` (doesn't). – Mikel Apr 23 '11 at 02:44
@Andy: IIRC, the default value of `xpg_echo` depends on your system, e.g. on Solaris I think it defaults to true. This is why Gilles likes `printf` so much: it's the only thing with predictable behavior. – Mikel Apr 23 '11 at 02:47
Great solution; in POSIX-features-only shells, use `IFS= read -r l; printf '%s\n' "$l"`, since `read` always requires a variable argument there. – mklement0 May 02 '14 at 14:06
@MartinThoma It just means the command whose output you want to sort; it isn't literally `command`, but any command that produces output you want to sort, e.g. `ps -o pid,comm`. – Elijah Lynn Sep 29 '17 at 19:11
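Putting the comments above together, a minimal POSIX `sh` sketch of this answer (the name `sort_keep_header` is hypothetical): `read` is given an explicit variable because POSIX `read` requires one, `IFS=` keeps leading and trailing blanks, `-r` keeps backslashes, and `printf '%s\n'` prints the header verbatim.

```shell
#!/bin/sh
# Hypothetical POSIX sh version of: command | (read -r; printf "%s\n" "$REPLY"; sort)
sort_keep_header() {
  # IFS= stops read from trimming blanks; -r stops it from eating backslashes.
  IFS= read -r header_line
  printf '%s\n' "$header_line"
  # Everything after the first line is still on stdin for sort.
  sort "$@"
}

# The header's leading spaces and backslash survive intact.
printf '  PID\\CMD\nb\na\n' | sort_keep_header
```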
I found an awk version that works nicely in scripts:
awk 'NR == 1; NR > 1 {print $0 | "sort -n"}'
I like this, but it requires a bit of explanation - the pipe is inside the awk script. How does that work? Is it calling the `sort` command externally? Does anyone know of at least a link to a page explaining pipe use within awk? – Wildcard Nov 07 '15 at 01:24
@Wildcard you can check the official manual page or [this primer](https://en.wikibooks.org/wiki/An_Awk_Primer/Output_Redirection_and_Pipes). – lapo Nov 02 '16 at 19:52
This code fails when I use these arguments to `sort`: `sort -n -k 2b,2 -t $'\t'`. The problem is nesting `'\t'` inside `'NR...{print...}'`. The explanation of how to escape the `'`s is [here](https://stackoverflow.com/questions/9899001/how-to-escape-a-single-quote-inside-awk) – Josh Mar 28 '20 at 17:30
For fixed-width output, use the `-b` option, as it will make `sort` ignore leading blanks in the sort key. The default field separator is non-blank-to-blank transitions, so fields will start with leading blanks. For example, this command lists installed Python packages first by location, then by package name: `pip list -v | awk 'NR <= 2; NR > 2 { print $0 | "sort -b -k 3,3 -k 1,1" };'` – aparkerlue May 13 '21 at 16:38
Note that a pipe inside `awk` may need to be followed by `close()` with the exact same command string (e.g. `close("sort -n")`) so that buffering doesn't make the sorted output appear after later prints. – Excalibur Dec 29 '21 at 18:31
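Combining the answer with the notes above, a sketch (the wrapper name `awk_sort_body` is hypothetical) that passes the sort command to awk with `-v`, flushes the header before the pipe is opened, and calls `close()` with the identical command string so awk waits for `sort` to finish. `fflush()` is supported by common awks (gawk, mawk, BusyBox awk), but may be missing from very old implementations.

```shell
#!/bin/sh
# Hypothetical wrapper: first line printed as-is, remaining lines piped to
# one external sort process whose command string is given as $1.
awk_sort_body() {
  awk -v cmd="${1:-sort}" '
    NR == 1 { print; fflush() }   # header: push it out before sort can write
    NR > 1  { print | cmd }       # body: every later line goes to the same pipe
    END     { close(cmd) }        # close the pipe and wait for sort to finish
  '
}

printf 'N\n10\n9\n100\n' | awk_sort_body 'sort -n'
```

Because `-v` assignments process awk escape sequences, a tab separator can be written as `awk_sort_body 'sort -n -k 2b,2 -t "\t"'`, which avoids nesting quotes inside the single-quoted awk program.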
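Hackish but effective: prepend 0 to all header lines (the first two lines here) and 1 to all other lines before sorting, then strip the prefix after sorting. Because the prefix becomes field 1, every original field number shifts up by one in the remaining sort keys.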
… |
awk '{print (NR <= 2 ? "0 " : "1 ") $0}' |
sort -k 1,1 -k… |
cut -b 3-
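The pipeline above leaves the input command and the remaining sort keys elided. As one concrete, hypothetical instance (one header line, body sorted numerically on the original second column), note that `-k 1,1` restricts the first key to the prefix alone, so the later key actually takes effect:

```shell
#!/bin/sh
# Decorate-sort-undecorate with one header line (hypothetical example):
#   1. prefix the header with "0 " and every body line with "1 "
#   2. sort on the prefix, then numerically on original column 2 (now field 3)
#   3. strip the two-byte prefix again
prefix_sort_example() {
  awk '{ print (NR <= 1 ? "0 " : "1 ") $0 }' |
    sort -k 1,1 -k 3,3n |
    cut -b 3-
}

printf 'name n\nb 10\na 9\nc 100\n' | prefix_sort_example
```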
The pee command from moreutils is designed for tasks like this.
Example:
To keep one header line, and sort the second (numeric) column in stdin:
<your command> | pee 'head -n 1' 'tail -n +2 | sort -k 2,2 -n'
Explanation:
pee : pipe stdin to one or more commands and concatenate the results.
head -n 1 : Print the first line of stdin.
tail -n +2 : Print the second and following lines from stdin.
sort -k 2,2 -n : Numerically sort by the second column.
Test:
printf "header\na 1\nc 3\nb 2\n" | pee 'head -n 1' 'tail -n +2 | sort -k 2,2 -n'
gives
header
a 1
b 2
c 3
This is a great solution because it's easily memorizable: I just have to remember `pee` and then use regular commands I already know like `head` or `sort`. That also makes it easily adaptable to other use cases. Thanks a lot! – Jens Bannmann Jun 03 '23 at 08:03
Here's some magic perl line noise that you can pipe your output through to sort everything but keep the first line at the top:
perl -e 'print scalar <>, sort <>;'
I think this is easiest.
ps -ef | ( head -n 1 ; sort )
or this, which is possibly faster as it does not create a subshell
ps -ef | { head -n 1 ; sort ; }
Other cool uses
shuffle lines after header row
cat file.txt | ( head -n 1 ; shuf )
reverse lines after header row
cat file.txt | ( head -n 1 ; tac )
See http://unix.stackexchange.com/questions/11856/sort-but-keep-header-line-at-the-top#comment15824_11856. This is not actually a good solution. – Wildcard Nov 06 '15 at 21:43
Not working: `cat file | { head -n 1 ; sort ; } > file2` only shows the header. – Peter Krauss Jul 06 '18 at 19:19
I tried the command | { head -1; sort; } solution and can confirm that it really screws things up: head reads in multiple lines from the pipe, then outputs just the first one. So the rest of the output that head did not read is passed to sort, NOT the rest of the output starting from line 2!
The result is that you are missing lines (and one partial line!) from the beginning of your command output (except that you still have the first line). That is easy to confirm by adding a pipe to wc at the end of the above pipeline, but extraordinarily difficult to track down if you don't know about it! I spent at least 20 minutes trying to work out why I had a partial line (first 100 bytes or so cut off) in my output before solving it.
What I ended up doing, which worked beautifully and didn't require running the command twice, was:
myfile=$(mktemp)
whatever command you want to run > "$myfile"
head -n 1 "$myfile"
sed 1d "$myfile" | sort
rm "$myfile"
If you need to put the output into a file, you can modify this to:
myfile=$(mktemp)
whatever command you want to run > "$myfile"
head -n 1 "$myfile" > outputfile
sed 1d "$myfile" | sort >> outputfile
rm "$myfile"
You can use ksh93's `head` builtin, the `line` utility (on systems that still have one), `gnu-sed -u q`, or `IFS= read -r line; printf '%s\n' "$line"`, which read their input one byte at a time to avoid that. – Stéphane Chazelas Jan 11 '18 at 21:58
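The `wc` check described in the answer above can be reproduced with a short script, assuming `seq` is available. On typical implementations such as GNU `head`, `head -n 1` reads a whole buffer from the pipe, so fewer than 99999 of the remaining lines reach the second command; `IFS= read -r`, as the comment notes, reads one byte at a time on a pipe and leaves exactly 99999. The size of the shortfall for `head` depends on the implementation and its buffer size.

```shell
#!/bin/sh
# Count how many lines remain for a second command after "taking one line"
# from a pipe in two different ways.

# head -n 1 typically reads a whole buffer and discards what follows line 1.
lines_after_head() {
  seq 100000 | { head -n 1 >/dev/null; wc -l; }
}

# The shell's read stops right after the first newline, leaving the rest.
lines_after_read() {
  seq 100000 | { IFS= read -r first; wc -l; }
}

echo "after head: $(lines_after_head)"   # usually fewer than 99999
echo "after read: $(lines_after_read)"   # 99999
```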
Simple and straightforward!
<command> | head -n 1; <command> | sed 1d | sort <....>
- sed Nd: N specifies the line number, and d stands for delete.
Just as jofel commented a year and a half ago on Sarva's answer, this starts `command` twice. So not really suitable for use in a pipeline. – Wildcard Nov 06 '15 at 02:36
I came here looking for a solution for the w command. This command shows details of who is logged in and what they are doing.
To show the results sorted, but with the headers kept at the top (there are 2 lines of headers), I settled on:
w | head -n 2; w | tail -n +3 | sort
Obviously this runs the command w twice and therefore may not be suitable for all situations. However, to its advantage it is substantially easier to remember.
Note that the tail -n +3 means 'show all lines from the 3rd onwards' (see man tail for details).
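If running `w` twice is a problem, a single-run sketch is possible by reading the header lines with the shell's `read`, which does not consume anything past each newline. The function name `keep_headers` and its argument order (line count first, then the command) are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: copy the first N lines through unchanged, then run the
# given command (default: sort) on the rest of stdin.
keep_headers() {
  n=${1:-1}
  [ "$#" -gt 0 ] && shift
  i=0
  while [ "$i" -lt "$n" ] && IFS= read -r hdr; do
    printf '%s\n' "$hdr"
    i=$((i + 1))
  done
  if [ "$#" -eq 0 ]; then
    sort
  else
    "$@"
  fi
}

# w prints two header lines, so the real use would be: w | keep_headers 2 sort
printf 'h1\nh2\nb\na\n' | keep_headers 2 sort
```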
Using Raku (formerly known as Perl_6)
~$ raku -e '.put for "\x0061".."\x07A";' | raku -e 'put get; .put for lines.sort.reverse.head(10);'
#OR
~$ raku -e '.put for "\x0061".."\x07A";' | raku -e 'put lines[0]; .put for lines[1..*].sort.reverse.head(10);'
Sample Input: English alphabet, one letter per line
Sample Output (header, then the body truncated to its first 10 lines via .head(10)):
a
z
y
x
w
v
u
t
s
r
q
Answering this to complement Perl answers already posted. The put get call 1. 'gets' a single line and out-'puts' it, then 2. advances the read cursor so the first line isn't read again (e.g. by lines). If you need to read a 2-line header (for example), use (put get) xx 2.
When sorting a file, sometimes you want to filter a little first--an example is removing blank lines. That's easy with Raku, simply interpose a call to .map({$_ if .chars}) after the call to lines (and before the call to sort).
A nice advantage of Raku is built-in, high-level support for Unicode. A Cyrillic alphabet equivalent of the Raku code at top is as follows:
~$ raku -e '.put for "\x0430".."\x044F";' | raku -e 'put get; .put for lines.sort.reverse.head(10);'
OR, taking input off the command line:
~$ raku -e '.put for "\x0430".."\x044F";' > Cyrillic.txt
~$ raku -e 'put lines[0]; .put for lines[1..*].sort.reverse.head(10);' Cyrillic.txt
Sample Output (either Cyrillic example above):
а
я
ю
э
ь
ы
ъ
щ
ш
ч
ц
See URLs below for further discussion on the Raku/Perl6 mailing list regarding how to translate Perl(5) file-input idioms into Raku.
https://www.nntp.perl.org/group/perl.perl6.users/2018/11/msg6295.html
https://www.nntp.perl.org/group/perl.perl6.users/2019/07/msg6825.html
Expanding on @Mikel's answer, here is a version of the body() function that adds a few features:
- It detects if there is input coming in on a pipe, and if not prints out usage information to STDERR.
- If no command is given, it uses sort as the default.
- If the first parameter is a number, it uses that number as the number of header lines (default 1).
- In testing, it works on Linux bash and macOS zsh.
I made a gist at github: https://gist.github.com/alanhoyle/7ec6bd445a790b62567d8b1ff6941c66
Thus:
body() {
    local HEADER_LINES=1
    local COMMAND="sort"
    local re='^[0-9]+$'
    local header i
    if [ -t 0 ]; then
        >&2 echo "ERROR: body requires piped input!"
        >&2 echo "body: prints the header from a STDIN and sends the 'body' to another command for"
        >&2 echo " additional processing. Useful for sort/grep when you want to keep headers"
        >&2 echo "USAGE: COMMAND | body [ N ] [ COMMAND_TO_PROCESS_OUTPUT ]"
        >&2 echo " if the first parameter N is a whole number, it prints that number of lines"
        >&2 echo " before proceeding [ default: skip $HEADER_LINES ]"
        >&2 echo " if the [ COMMAND_TO PROCESS_OUTPUT ] is omitted, '$COMMAND' is used"
        return 1
    fi
    # optional leading number: how many header lines to pass through
    if [[ $1 =~ $re ]]; then
        HEADER_LINES=$1
        shift
        >&2 echo "body: skipping $HEADER_LINES"
    fi
    if [ $# -eq 0 ]; then
        >&2 echo "body: running $COMMAND by default"
    fi
    # use read (not head) so nothing past each header line is consumed
    for ((i = 0; i < HEADER_LINES; i++)); do
        IFS= read -r header
        printf '%s\n' "$header"
    done
    if [ $# -eq 0 ]; then
        "$COMMAND"
    else
        "$@"
    fi
}
Example:
$ body
ERROR: body requires piped input!
body: prints the header from a STDIN and sends the 'body' to another command for
additional processing. Useful for sort/grep when you want to keep headers
USAGE: COMMAND | body [ N ] [ COMMAND_TO_PROCESS_OUTPUT ]
if the first parameter N is a whole number, it prints that number of lines
before proceeding [ default: skip 1 ]
if the [ COMMAND_TO PROCESS_OUTPUT ] is omitted, 'sort' is used
$ echo -e "header\n30\n33\n20"
header
30
33
20
$ echo -e "header\n30\n33\n20" | body
body: running sort by default
header
20
30
33
$ echo -e "header\n30\n33\n20" | body grep 0
header
30
20
$ echo -e "header\n30\n33\n20" | body 2
body: skipping 2
body: running sort by default
header
30
20
33
Basically, you need something that reads one line and only one line from the input, outputs it, and then leaves the rest of the input to sort.
There are quite a few utilities that can read one line and print it:
- head -n 1
- sed q
- awk '{print; exit}'
But most implementations of those read their input in chunks, and will generally end up reading more than one line. On seekable input, they're able to rewind upon exit to just after the first line, but they can't do that on pipes or other non-seekable input.
You need a utility that guarantees it won't read past the end of the first line. The options are:
- line: that used to be a standard utility but was obsoleted by POSIX on the grounds that it was redundant with the read builtin of sh. It read lines one byte at a time and output them. It always output a line, even when there was none or only a non-delimited one on input.
- sed -u q: some sed implementations support a -u option for unbuffered operation, and some of those that support it also read their input one byte at a time with it. You also need a sed implementation that doesn't read one line in advance when the $ address is not used, which probably doesn't leave many implementations besides GNU sed. GNU sed also outputs a full line if the input only had a non-delimited line.
- IFS= read -r line: that reads up to one line and is guaranteed not to read past the end of the line. Except for zsh's read builtin, it can't cope with NUL bytes. It doesn't print the line it has read, but you can use printf for that. With zsh, read -re reads the line and echoes it; it adds a newline character if missing on input.
So your best bet in sh-like shells would be:
sort_body() (
    if IFS= read -r line; then
        printf '%s\n' "$line" &&
            exec sort "$@"
    else # no input or only a non-delimited header line
        printf %s "$line"
        # no point in running sort as there's no input left
    fi
)
Then:
cmd | sort_body -nk1,1 ..
<file sort_body -u
(not sort_body -u file: the thing to sort has to be passed on sort_body's stdin).
If those are CSVs or TSVs (or other formats; see the manual), that sounds like a job for mlr (Miller).
For example, with a file looking like:
$ cat /usr/share/distro-info/debian.csv
version,codename,series,created,release,eol,eol-lts,eol-elts
1.1,Buzz,buzz,1993-08-16,1996-06-17,1997-06-05
1.2,Rex,rex,1996-06-17,1996-12-12,1998-06-05
1.3,Bo,bo,1996-12-12,1997-06-05,1999-03-09
2.0,Hamm,hamm,1997-06-05,1998-07-24,2000-03-09
2.1,Slink,slink,1998-07-24,1999-03-09,2000-10-30
2.2,Potato,potato,1999-03-09,2000-08-15,2003-07-30
[...]
$ mlr --ragged --csv cut -f codename,created then sort -f codename /usr/share/distro-info/debian.csv
codename,created
Bo,1996-12-12
Bookworm,2021-08-14
Bullseye,2019-07-06
Buster,2017-06-17
Buzz,1993-08-16
Etch,2005-06-06
[...]
That is, the header is not only preserved, but the field names in it can also be used in the cut or sort specifications.
command | head -1; command | tail -n +2 | sort
This starts `command` two times. Therefore it is limited to some specific commands. However, for the requested `ps` command in the example, it would work. – jofel May 20 '14 at 12:00
Try doing:
wc -l file_name | tail -n $(awk '{print $1-1}') file_name | sort