I have a lot of .gz files in a folder:

/a/b/c1.gz
/a/b/c2.gz
/a/b/c3.gz

and so on.

The files are pipe-delimited, but the number of delimiters per line differs from file to file: some have one, some two, three, four and so on, like this:

xyz|abc
xyz|abc|wty
xyz|abc|wty|asd

and so on.

How do I find all the files that have two pipe delimiters per line, all those that have three, and so on?

Pierre.Vriens

3 Answers

Assuming that in any given file, the number of |-delimited columns is constant, then it's enough to inspect the first line of a file to determine the number of columns in it.

The following will do that for a file called name.gz:

gzip -dc name.gz | awk -F '|' -v name="name.gz" '{ print NF, name } { exit }'

So, with a simple loop, you would be able to output the number of columns and the filenames of, e.g., all files matching the pattern /a/b/c*.gz:

for name in /a/b/c*.gz; do
    gzip -dc "$name" |
    awk -F '|' -v name="$name" '{ print NF, name } { exit }'
done

To output only the names of the files with a certain number of columns (n=3, for example, which corresponds to two | delimiters), use

n=3
for name in /a/b/c*.gz; do
    gzip -dc "$name" |
    awk -F '|' -v n="$n" -v name="$name" 'NF == n { print name } { exit }'
done
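
The loops above rely on the assumption that every line in a file has the same number of delimiters. A minimal sketch for checking that assumption follows. The function name delim_counts and the sample files are made up for illustration and are not part of the answer. It counts the | characters on every line and prints the distinct counts, so a consistent file yields a single number.

```shell
# delim_counts FILE.gz (hypothetical helper):
# print the distinct per-line counts of '|' in one compressed file.
delim_counts() {
    # gsub() returns the number of replacements, i.e. the number of pipes.
    gzip -dc "$1" |
    awk '{ print gsub(/[|]/, "|") }' |
    sort -nu |
    paste -sd ' ' -
}

# Sample data for illustration only.
tmp=$(mktemp -d)
printf 'xyz|abc\nxyz|abc\n'     | gzip > "$tmp/c1.gz"
printf 'xyz|abc|wty\nxyz|abc\n' | gzip > "$tmp/c2.gz"

for name in "$tmp"/c*.gz; do
    printf '%s: %s\n' "${name##*/}" "$(delim_counts "$name")"
done
# c1.gz: 1
# c2.gz: 1 2
```

A file that reports more than one number, like c2.gz here, has mixed line layouts, so inspecting only its first line would be misleading.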
Kusalananda

You can pipe each file's decompressed contents to awk and count the |-s on each line. For example: echo 'A|B|C' | awk -F\| '{print NF-1}'
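
Applied to compressed files, that idea might look like the sketch below. It assumes that the first line is representative of the whole file. The function name first_line_delims and the sample directory are hypothetical.

```shell
# first_line_delims FILE.gz (hypothetical helper):
# print the number of '|' characters on the first line of a compressed file.
first_line_delims() {
    gzip -dc "$1" | awk -F '|' '{ print NF - 1; exit }'
}

# Sample data for illustration only.
dir=$(mktemp -d)
printf 'xyz|abc\n'         | gzip > "$dir/c1.gz"
printf 'xyz|abc|wty\n'     | gzip > "$dir/c2.gz"
printf 'xyz|abc|wty|asd\n' | gzip > "$dir/c3.gz"

# List "count filename" pairs, sorted by delimiter count.
for name in "$dir"/*.gz; do
    printf '%s %s\n' "$(first_line_delims "$name")" "${name##*/}"
done | sort -n
# 1 c1.gz
# 2 c2.gz
# 3 c3.gz
```

Sorting on the count puts files with the same number of delimiters next to each other, which makes the groups easy to read off.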

kosolapyj

Let's create three test files:

echo 'xyz|abc' > c1
echo 'xyz|abc|wty' > c2
echo 'xyz|abc|wty|asd' > c3
gzip c*

Files containing one pipe in a line:

$ zgrep '^[^|]*|[^|]*$' *.gz
c1.gz:xyz|abc

For any other number of pipes (including one), you can use an extended regular expression with a repetition count:

Two pipes in a line:

$ zgrep -E '^([^|]*\|){2}[^|]*$' *.gz
c2.gz:xyz|abc|wty

Three pipes in a line:

$ zgrep -E '^([^|]*\|){3}[^|]*$' *.gz
c3.gz:xyz|abc|wty|asd

Two or three pipes in a line:

$ zgrep -E '^([^|]*\|){2,3}[^|]*$' *.gz
c2.gz:xyz|abc|wty
c3.gz:xyz|abc|wty|asd

At most three pipes in a line (this also matches lines without any pipe):

$ zgrep -E '^([^|]*\|){0,3}[^|]*$' *.gz
c1.gz:xyz|abc
c2.gz:xyz|abc|wty
c3.gz:xyz|abc|wty|asd

If you only need the filename, add option -l, i.e. zgrep -lE ...
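
The count can also be passed in as a parameter. The sketch below is a variant that uses gzip -dc and grep -E instead of zgrep, which may help on systems where zgrep is missing or limited. The function name files_with_n_pipes and the sample files are hypothetical. Like zgrep -l, it lists a file as soon as one of its lines matches.

```shell
# files_with_n_pipes N FILE.gz... (hypothetical helper):
# print each file that contains at least one line with exactly N pipes.
files_with_n_pipes() {
    n=$1
    shift
    for f in "$@"; do
        # Same pattern as above, with the repetition count taken from $n.
        if gzip -dc "$f" | grep -qE "^([^|]*\|){$n}[^|]*\$"; then
            printf '%s\n' "$f"
        fi
    done
}

# Sample data for illustration only.
work=$(mktemp -d)
printf 'xyz|abc\n'         | gzip > "$work/c1.gz"
printf 'xyz|abc|wty\n'     | gzip > "$work/c2.gz"
printf 'xyz|abc|wty|asd\n' | gzip > "$work/c3.gz"

files_with_n_pipes 2 "$work"/*.gz   # prints the path of c2.gz only
```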


My zgrep version doesn't support the recursive -r option.

You could use find for a recursive search and run zgrep on the results:

$ find . -type f -name '*.gz' -exec zgrep -lE '^([^|]*\|){3}[^|]*$' {} \;
./c3.gz
Freddy