50

NOTE: This question is the complement of this Q&A: How to "grep" for line length *not* in a given range?


I need to get only the lines from a text file (a wordlist, one word per line) that are at least 3 characters long but no longer than 10 characters.

Example:

INPUT:

egyezményét
megkíván
ki
alma
kevesen
meghatározó

OUTPUT:

megkíván
alma
kevesen

Question: How can I do this in bash?

agc

5 Answers

74
grep -x '.\{3,10\}'

where

  • -x (also --line-regexp with GNU grep) matches the pattern against the whole line
  • . matches any single character
  • \{3,10\} repeats the preceding item from 3 to 10 times (here, any character)
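As a quick check of the pattern above, here is a minimal sketch that wraps it in a shell function. The name `len_between` and its two arguments are made up for this illustration, and the sample words are ASCII-only so the result does not depend on the locale.

```shell
# len_between MIN MAX: hypothetical helper, not part of grep.
# Prints the lines of standard input that are MIN to MAX characters long.
len_between() {
    grep -x ".\{$1,$2\}"
}

# "ki" (2 characters) and "abcdefghijk" (11) are filtered out;
# "alma" and "kevesen" are printed.
printf '%s\n' ki alma kevesen abcdefghijk | len_between 3 10
```

With a real wordlist you would redirect the file instead, for example `len_between 3 10 < file`.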
Stéphane Chazelas
Costas
16

Using grep -E:

grep -E '^.{3,10}$'

This matches lines consisting of between 3 and 10 characters.
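The following sketch compares this anchored ERE form with the `-x` BRE form from the other answer, to show that they select the same lines. The variable names and the ASCII-only sample input are assumptions made for the illustration.

```shell
# Sample input, ASCII-only so the locale does not matter.
words=$(printf '%s\n' ki alma kevesen abcdefghijk)

# BRE with -x versus ERE with explicit ^ and $ anchors.
bre=$(printf '%s\n' "$words" | grep -x '.\{3,10\}')
ere=$(printf '%s\n' "$words" | grep -E '^.{3,10}$')

if [ "$bre" = "$ere" ]; then
    echo "same result"
fi
```

Which form to use is mostly a matter of taste: `-x` saves the anchors, `-E` saves the backslashes.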

Kusalananda
repzero
7

Using awk (and assuming that it is an implementation that is locale-aware, such as GNU awk, so that lines with multi-byte characters that are shorter than three characters, like "Ők", are not matched):

LC_ALL=hu_HU.UTF-8 awk 'length >= 3 && length <= 10' file

The length function returns the length of $0 (the current record/line) by default, and the code uses it to test whether the line's length is within the given range. If a test like this has no corresponding action block, the default action is to print the record.
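To make those two defaults visible, here is a sketch of the same test written out in full, with `length($0)` and an explicit `{ print }` action. The function name `len_between_awk` and the awk variables `min` and `max` are hypothetical; the sample input is ASCII-only so it behaves the same in any locale.

```shell
# len_between_awk MIN MAX: hypothetical helper for this illustration.
len_between_awk() {
    awk -v min="$1" -v max="$2" '
        # length($0) is what a bare "length" evaluates to;
        # { print } is the action awk runs when none is given.
        length($0) >= min && length($0) <= max { print }
    '
}

# Prints "alma" and "kevesen".
printf '%s\n' ki alma kevesen abcdefghijk | len_between_awk 3 10
```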

Testing on the given data:

$ LC_ALL=hu_HU.UTF-8 awk 'length >= 3 && length <= 10' file
megkíván
alma
kevesen

Similarly with Perl:

$ LC_ALL=hu_HU.UTF-8 perl -C -lne '$l=length($_); print if ($l >= 3 && $l <= 10)' file
megkíván
alma
kevesen
Kusalananda
  • Excellent solution, much better to remember than the dark magic that is regex. – Hashim Aziz Nov 27 '20 at 02:45
  • FYI, a line with `Ők` would be matched if that `Ő` was expressed as `O\u030B`. See `grep -Px '\X{3,10}'` with greps with PCRE support to match on grapheme clusters (that's still different from matching based on display width) – Stéphane Chazelas Feb 07 '23 at 09:25
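The difference between bytes and characters that motivates the locale setting can be seen with a small sketch. It assumes UTF-8 encoding and writes "Ők" with octal escapes so the script itself stays ASCII; the variable names are made up for the example. The awk result is deliberately not stated, since it depends on the awk implementation and the current locale.

```shell
# "Ők" as UTF-8 bytes: \305\220 encodes Ő (U+0150), followed by k.
word=$(printf '\305\220k')

# wc -c always counts bytes, whatever the locale.
bytes=$(printf '%s' "$word" | wc -c)
echo "bytes: $((bytes))"

# What awk reports depends on the implementation and locale:
# a locale-aware awk in a UTF-8 locale counts characters,
# a byte-oriented one counts bytes.
printf '%s\n' "$word" | awk '{ print "awk length:", length($0) }'
```

If awk reports the same number as `wc -c` here, a two-character word like this one would wrongly pass the "at least 3" test.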
2

Using sed:

sed '/^.\{3,10\}$/!d'

Or, with GNU sed or compatible:

sed -r '/^.{3,10}$/!d'

Though nowadays you'd use -E instead of -r, as that is the option that is going to be specified by POSIX (and it is already supported by most sed implementations, including GNU sed).
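An equivalent way to write the same filter is to suppress sed's default output with `-n` and print only the matching lines. The sketch below compares the two forms on an ASCII-only sample; it assumes a sed that accepts `-E`, and the variable names are invented for the illustration.

```shell
words=$(printf '%s\n' ki alma kevesen abcdefghijk)

# Delete every line that does NOT match (the form used above).
del=$(printf '%s\n' "$words" | sed -E '/^.{3,10}$/!d')

# Suppress default output with -n and print only matching lines.
prt=$(printf '%s\n' "$words" | sed -n -E '/^.{3,10}$/p')

# Both forms select the same lines.
[ "$del" = "$prt" ] && printf '%s\n' "$prt"
```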

Stéphane Chazelas
agc
0

I think this will be useful to someone. By extension, if you want to match a specific string within a line that is no longer than say 255 characters, this would be a solution.

Usage: looking for a string but wanting to exclude long lines like minified JS files which you didn't write or don't need

grep -x '.\{1,255\}theStringIWant.\{1,255\}'

This is a bit of a hack, as you can't really limit the total length: each side may independently be anywhere from 1 to 255 characters long (1 and 255, 255 and 1, or 255 and 255). Still, it works in most cases to exclude long minified lines.

BASH NEWBIE TIP: the \ backslashes are escape characters for the {} braces.

Example/Proof:

echo "aaaalocalStoragebbbbccccdd" | grep -x '.\{3,10\}localStorage.\{3,10\}' # matches: 4 characters before, 10 after
echo "aaaalocalStoragebbbbccccdddd" | grep -x '.\{3,10\}localStorage.\{3,10\}' # no match: the extra "dd" makes the trailing part 12 characters
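For comparison, here is a sketch of a different approach that tests the length of the whole line and the presence of the string separately, so the string may also sit at the very start or end of a line. It uses awk rather than a single grep pattern; the function name `short_match` and the sample lines are made up for the illustration.

```shell
# short_match: hypothetical helper. Prints lines of standard input that
# are shorter than 256 characters and contain theStringIWant.
short_match() {
    awk 'length($0) < 256 && /theStringIWant/'
}

# A 300-character filler to simulate a minified line.
long=$(printf '%0300d' 0)

# Prints the first two lines only; the third is too long.
printf '%s\n' 'x = theStringIWant;' 'theStringIWant' "${long}theStringIWant" | short_match
```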
Oliver Williams
  • That matches lines that are up to 524 characters long and also mandates at least one character on either side of that string. It would make more sense to do `grep -Ev '.{256}' | grep theStringIWant` or `awk 'length < 256 && /theStringIWant/'` – Stéphane Chazelas Feb 07 '23 at 08:23
  • \ is not an escape for `{`. Rather, `\{x,y\}` is the interval operator in BRE, while `{` and `}` on their own match literally. In `grep -E '\{'`, the \ does escape the `{`: in ERE, `{` is a regexp operator, and \ prevents it from being treated as one. – Stéphane Chazelas Feb 07 '23 at 08:25
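The BRE/ERE difference described in that last comment can be tried out with a short sketch. The two sample lines and the variable names are invented for the illustration.

```shell
# Two sample lines: the literal text a{3} and the string aaa.
sample=$(printf '%s\n' 'a{3}' 'aaa')

# BRE (plain grep): a bare { is literal, \{3\} is the interval operator.
bre_literal=$(printf '%s\n' "$sample" | grep -x 'a{3}')
bre_interval=$(printf '%s\n' "$sample" | grep -x 'a\{3\}')

# ERE (grep -E): {3} is the interval operator, \{ is a literal brace.
ere_interval=$(printf '%s\n' "$sample" | grep -Ex 'a{3}')
ere_literal=$(printf '%s\n' "$sample" | grep -E 'a\{')

printf '%s\n' "BRE bare brace matches:    $bre_literal" \
              "BRE escaped brace matches: $bre_interval" \
              "ERE bare brace matches:    $ere_interval" \
              "ERE escaped brace matches: $ere_literal"
```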