
I am trying to match a pattern in a particular column across thousands of gzipped files on a Linux machine, and I want to print the name of each file that contains a match. How can I do that? The commands below are not working for me. Any suggestions, please? Thanks.

zgrep 12345 *|  awk -F"^" '{if($8==12345) print}' 
find . -type f |xargs zcat |  awk -F"^" '{if($8==12345) print}' 
terdon
Forever Learner
    Doesn't the first one, `zgrep 12345 *| awk -F"^" '{if($8==12345) print}'` include the file name in the output? You should be seeing something like `file1.gz:^1^2^3^4^5^6^7^12345^9`. Do you not? Are you maybe testing it on just a single file? – terdon Sep 24 '20 at 16:07
  • terdon: ohh, you are right. my bad :( sorry. Just realized after trying on multiple files. – Forever Learner Sep 24 '20 at 16:17

1 Answer


Clearest/simplest IMHO is:

while IFS= read -r fname; do
    zcat "$fname" | awk -F'^' -v fname="$fname" '$8==12345{print fname, $0}'
done < <(find . -type f)

but there's also the option of having zgrep print the file name and then splitting it off in awk, which may be more efficient (but relies on the file names not containing any :s):

zgrep -H '12345' * |
awk -F'^' '{fname=$0; sub(/:.*/,"",fname); sub(/[^:]+:/,"")} $8==12345{print fname, $0}'

Both solutions assume you don't have newlines in your file names, and the first one also assumes no escape sequences like \t in your file names (awk expands escape sequences in values assigned with -v).
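
If you only need the names of the matching files, as the question asks, and want to drop the file name assumptions above, something along these lines may work. This is a rough sketch rather than a tested drop-in: it assumes bash and a find that supports -print0, the function name list_matching_files and the '*.gz' name filter are my own inventions for illustration, and gzip -dc is used in place of zcat. The file name is never passed to awk, so backslashes in names are left alone, and NUL-delimited reading copes with newlines.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name and arguments are illustrative, not from the answer):
# print each gzipped file under DIR whose 8th '^'-separated field equals VALUE.
list_matching_files() {
    local dir=$1 value=$2 fname
    # Assumption: the compressed files end in .gz. -print0 with read -d ''
    # keeps file names containing newlines or backslashes intact.
    find "$dir" -type f -name '*.gz' -print0 |
    while IFS= read -r -d '' fname; do
        # gzip -dc behaves like zcat. awk stops reading at the first matching
        # line and reports the result through its exit status.
        if gzip -dc -- "$fname" |
            awk -F'^' -v val="$value" '$8 == val { found = 1; exit } END { exit !found }'
        then
            printf '%s\n' "$fname"
        fi
    done
}
```

You would call it as list_matching_files . 12345. Each file name is printed once, however many lines in it match, and the files are still decompressed one at a time, so it is unlikely to be faster than the zgrep variant.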

Ed Morton