If I have really long output from a command (single line) but I know I only want the first [x] (let's say 8) characters of the output, what's the easiest way to get that? There aren't any delimiters.
7 Answers
One way is to use cut:
command | cut -c1-8
This will give you the first 8 characters of each line of output. Since cut is part of POSIX, it is likely to be on most Unices.
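A minimal sketch of the above, assuming a `printf` call as a stand-in for the real command (the input string is made up for illustration):

```shell
# printf stands in for the real command; the sample text is hypothetical.
first8=$(printf 'abcdefghijklmnop\n' | cut -c1-8)
printf '%s\n' "$first8"
```

Capturing the result with `$(...)` also strips the trailing newline that cut prints, which helps if you want to use the value in a script.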
Note that `cut -c` selects characters; `cut -b` or `head -c` selects bytes. This makes a difference in some locales (in practice, when using UTF-8). – Gilles 'SO- stop being evil' Oct 24 '10 at 22:07
You also don't have to specify the start index in this case. Saying `cut -c-8` will select from character 1 to 8. – Sparhawk May 09 '14 at 05:08
@Steven, `cut`'s equivalent on Windows is? – Pacerier Aug 25 '15 at 13:06
Also `command | dd bs=8 count=1 2>/dev/null`. Not saying it's shorter or superior. Just another alternative. – dubiousjim Sep 24 '15 at 03:50
@Gilles, but note that with current versions of GNU `cut`, `cut -c` works like `cut -b` (that is, it doesn't work properly for multi-byte characters). – Stéphane Chazelas Aug 09 '16 at 13:49
These are some other ways to get only the first 8 characters:
command | head -c8
command | awk '{print substr($0,1,8);exit}'
command | sed 's/^\(........\).*/\1/;q'
And if you have bash:
var=$(command)
echo "${var:0:8}"
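As a quick sanity check, here is a sketch that runs the three pipelines on the same made-up input and compares the results. The variable names and the sample string are only for illustration, and `head -c` is assumed to be available (it is in GNU and BSD userlands, but it is not required by POSIX):

```shell
# Hypothetical sample input; replace the printf calls with your command.
input='abcdefghijklmnop'
by_head=$(printf '%s\n' "$input" | head -c8)
by_awk=$(printf '%s\n' "$input" | awk '{print substr($0,1,8);exit}')
by_sed=$(printf '%s\n' "$input" | sed 's/^\(........\).*/\1/;q')
printf 'head=%s awk=%s sed=%s\n' "$by_head" "$by_awk" "$by_sed"
```

For plain ASCII input all three agree; they can differ once multi-byte characters are involved.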
I think the following sed formulation is a bit easier to read: `command | sed 's/\(.\{8\}\).*/\1/'` or if your sed supports it: `command | sed -r 's/(.{8}).*/\1/'`; Otherwise, +1 – Steven D Oct 24 '10 at 04:48
Good stuff, but note that `head -c` counts _bytes_, not characters. Similarly, among the major Awk implementations, only _GNU_ awk handles multi-byte characters correctly - FreeBSD Awk and Mawk do not. – mklement0 Jul 05 '15 at 17:30
Another one-liner solution uses parameter expansion:
echo ${word:0:x}
Example 1: word="Hello world"
echo ${word:0:3} or echo ${word::3}
Output: Hel
Example 2: word="Hello world"
echo ${word:1:3}
Output: ell
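A small sketch of the same expansion with the length held in a variable. This assumes Bash (the `${var:offset:length}` form is not POSIX), and the names `n`, `prefix` and `middle` are just for illustration:

```shell
#!/usr/bin/env bash
# Assumes Bash; substring expansion is not available in plain POSIX sh.
word="Hello world"
n=8
prefix=${word:0:n}   # offset 0, length n (n is evaluated arithmetically)
middle=${word:6:5}   # offset 6, length 5
printf '%s\n%s\n' "$prefix" "$middle"
```

Offsets are zero-based, so offset 6 starts at the "w" of "world".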
You can also use a variable holding the length, e.g.: `x=8; echo ${word:0:$x}` instead of hard-coding the integer. – Cometsong Apr 25 '19 at 14:58
@Cometsong Testing with the Bash shell that came with "Git for Windows", it looks like you don't need to prefix x with the $ sign in this case: `x=8; echo ${word:0:x}` will work the same. – AJM Mar 26 '21 at 11:10
If you have a sufficiently advanced shell (for example, the following will work in Bash, not sure about dash), you can do:
read -n8 -d$'\0' -r < <(command)
After executing read ... < <(command), your characters will be in the shell variable REPLY. Type help read to learn about other options.
Explanation: the -n8 argument to read says that we want up to 8 characters. The -d$'\0' says to read until a null rather than a newline. This way the read continues for 8 characters even if one of the earlier characters is a newline (but not if it's a null). An alternative to -n8 -d$'\0' is -N8, which reads exactly 8 characters or until stdin reaches EOF; no delimiter is honored. That probably fits your needs better, but I don't know offhand how many shells have a read that honors -N as opposed to honoring -n and -d. Continuing with the explanation: -r says to ignore \-escapes, so that, for example, \\ is treated as two characters rather than as a single \.
Finally, we do read ... < <(command) rather than command | read ... because in the second form the read is executed in a subshell, which then exits immediately, losing the information you just read. (Note the space between the two < characters: the first is a redirection, the second starts the process substitution.)
Another option is to do all your processing inside the subshell. For example:
$ echo abcdefghijklm | { read -n8 -d$'\0' -r; printf "REPLY=<%s>\n" "$REPLY"; }
REPLY=<abcdefgh>
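To make the subshell point concrete, here is a sketch that tries both forms and keeps what each one leaves in REPLY. It assumes Bash with default settings (in particular, the `lastpipe` option is off); `after_pipe` and `after_procsub` are names made up for this example:

```shell
#!/usr/bin/env bash
# Assumes Bash with default options; printf stands in for the real command.
REPLY=
printf 'abcdefghijklm\n' | read -r -n8      # read runs in a subshell
after_pipe=$REPLY                           # the parent shell's REPLY is untouched

read -r -n8 < <(printf 'abcdefghijklm\n')   # read runs in the current shell
after_procsub=$REPLY
printf 'pipe=<%s> procsub=<%s>\n' "$after_pipe" "$after_procsub"
```

Only the process-substitution form leaves the eight characters in the calling shell.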
If you just want to output the 8 chars, and don't need to process them in the shell, then just use `cut`. – dubiousjim Sep 08 '12 at 14:04
Good to know about `read -n`; small caveat: Bash 3.x (still current on OS X) mistakenly interprets the `-n` count as a _byte_ count and thus fails with multi-byte characters; this has been fixed in Bash 4.x. – mklement0 Jul 06 '15 at 01:41
This is a great and useful answer. Much more general than the others. – not2qubit Oct 25 '19 at 10:08
On my git bash, I have the "-N" flag, which reads exactly N chars until EOF or timeout. Isn't that what you are trying to achieve with your "-d" flag? – Itération 122442 May 03 '22 at 08:16
@Itération122442 yes but as I wrote "I don't know offhand how many shells have a read that honors -N as opposed to honoring -n and -d." – dubiousjim May 04 '22 at 09:43
This is portable:
a="$(command)" # Get the output of the command.
b="????" # as many ? as characters are needed.
echo "${a%"${a#$b}"}" # select that many chars from $a
How to build a string with a variable number of characters has its own question here.
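One possible way to wrap this in a function, as a sketch only: `first_n` is a hypothetical helper name, and the pattern of question marks is built with a plain loop. It also guards against strings shorter than the requested length, because in that case the inner `#` expansion removes nothing and the outer `%` expansion would then strip the whole string:

```shell
# first_n STRING N: print the first N characters of STRING.
# Hypothetical helper; uses only POSIX parameter expansion.
first_n() {
    if [ "${#1}" -le "$2" ]; then
        printf '%s\n' "$1"      # already short enough, print as is
        return
    fi
    pattern=
    i=0
    while [ "$i" -lt "$2" ]; do
        pattern="$pattern?"     # one ? per wanted character
        i=$((i + 1))
    done
    rest=${1#$pattern}          # everything after the first N characters
    printf '%s\n' "${1%"$rest"}" # remove that remainder from the end
}

first_n "abcdefghijkl" 8
```

The variables `pattern`, `i` and `rest` are global here, since `local` is not part of POSIX sh.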
I had this problem when manually generating checksum files in a Maven repository.
Unfortunately cut -c always prints out a newline at the end of output.
To suppress that I use xxd:
command | xxd -l$BYTES | xxd -r
It outputs exactly $BYTES bytes, unless the command's output is shorter, then exactly that output.
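To see the extra newline, one can count the bytes that come out of the pipeline. This sketch uses `wc -c` for counting and, instead of xxd, `tr -d '\n'` to drop the newline, on a made-up input line:

```shell
# Hypothetical input; count the bytes produced with and without the newline.
with_newline=$(printf 'abcdefghij\n' | cut -c1-8 | wc -c)
without_newline=$(printf 'abcdefghij\n' | cut -c1-8 | tr -d '\n' | wc -c)
# Some wc implementations pad the number with spaces; normalize it.
with_newline=$((with_newline))
without_newline=$((without_newline))
printf 'with=%s without=%s\n' "$with_newline" "$without_newline"
```

Note that `tr -d '\n'` removes every newline, so it is only suitable when the output is a single line.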
Another method to take off `cut`'s trailing newline is to pipe it into: `| tr -d '\n'` – Cometsong Apr 25 '19 at 15:00
How to consider Unicode + UTF-8
Let's do a quick test for those interested in Unicode characters rather than just bytes. Each character of áéíóú (acute accented vowels) is made up of two bytes in UTF-8. With:
printf 'áéíóú' | LC_CTYPE=en_US.UTF-8 awk '{print substr($0,1,3);exit}'
printf 'áéíóú' | LC_CTYPE=C awk '{print substr($0,1,3);exit}'
printf 'áéíóú' | LC_CTYPE=en_US.UTF-8 head -c3
echo
printf 'áéíóú' | LC_CTYPE=C head -c3
we get:
áéí
á
á
á
so we see that only awk + LC_CTYPE=en_US.UTF-8 considered the UTF-8 characters. The other approaches took only three bytes. We can confirm that with:
printf 'áéíóú' | LC_CTYPE=C head -c3 | hd
which gives:
00000000 c3 a1 c3 |...|
00000003
and the c3 by itself is trash, and does not show up on the terminal, so we saw only á.
Note, however, that awk + LC_CTYPE=en_US.UTF-8 actually returns 6 bytes (three two-byte characters), plus the newline that print appends.
We could also have equivalently tested with:
printf '\xc3\xa1\xc3\xa9\xc3\xad\xc3\xb3\xc3\xba' | LC_CTYPE=en_US.UTF-8 awk '{print substr($0,1,3);exit}'
and if you want a general parameter:
n=3
printf 'áéíóú' | LC_CTYPE=en_US.UTF-8 awk "{print substr(\$0,1,$n);exit}"
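An alternative sketch for the parameter: passing the length with awk's `-v` option keeps the program in single quotes, so `$0` needs no escaping. ASCII input is used here so the result does not depend on the locale; for UTF-8 input the LC_CTYPE prefix would still be needed:

```shell
# n is passed to awk as a variable instead of being spliced into the program.
n=3
result=$(printf 'abcdefgh\n' | awk -v n="$n" '{print substr($0, 1, n); exit}')
printf '%s\n' "$result"
```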
A more specific question about Unicode + UTF-8: https://superuser.com/questions/450303/unix-tool-to-output-first-n-characters-in-an-utf-8-encoded-file
Tested on Ubuntu 21.04.