print all matches or replace all strings in a BIG file which is NOT line organised (no line separators)

Question

The 5 GB files I have are streams of data rows of this form:

    {datarow1...},{datarow2...},...,{datarowN...}

so you could actually say that there are "lines" `{...}`, and even line separators, but they come as the three-character sequence `},{`.

I want to do two things:

print the "lines" that contain the string "error":

    grep -oP '\{[^{}]*error[^{}]*\}' ES01.log > ES01.err.log

make the file more "friendly" by producing a copy with explicit newline separators between records:
```
<ES01.log sed -e 's/},{/}\n{/g' > ESnl01.log
```

While the above works for relatively small files (up to ~100 MB), my files are a lot bigger, and both commands run out of memory:

    grep: memory exhausted
    sed: couldn't re-allocate memory

because both grep and sed read and process their input line by line, and with no newline separators the whole file is a single "line" that has to be loaded into memory.

Any idea how to approach this with another smart one-liner?

Stéphane Chazelas · Answer 1 · score 2 · 2014-02-08T17:27:56.053

With gawk:

    gawk -v 'RS=},{' '{sub(",", "\n", RT); printf "%s", $0 RT}' < file

Perl equivalent:

    perl -pe 'BEGIN{$/="},{"} s/,\{$/\n{/' < file

Otherwise, POSIXly:

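    tr , '\n' < file | awk '
      # Between two pieces, print a newline if the previous piece ended
      # with "}" and this one starts with "{" (a "},{" boundary);
      # otherwise put back the comma that tr turned into a newline.
      NR > 1 { printf "%s", ((e && /^[{]/) ? "\n" : ",") }
      { printf "%s", $0; e = /[}]$/ }
      END { print "" }'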

Pipe the output of any of these to `grep error` to see the records with errors, and to `paste -sd, -` to restore the original format.

edited Feb 08 '14 at 17:27

answered Feb 08 '14 at 16:42

Stéphane Chazelas


terdon · Answer 2 · score 1 · 2014-02-08T18:42:18.857

You could also do this in Perl:

    perl -ne 'BEGIN{$/="},{"}
              chomp;                      # drop the trailing "},{" separator
              s/\n$//; s/^\{//; s/\}$//;  # drop stray newline and outer braces
              print "{$_}\n";' file

This is the same principle as the gawk solution Stéphane Chazelas suggested: in Perl, `$/` is the input record separator, so we set it to `},{` to read one record at a time, then restore the braces and print each record on its own line.

You could easily expand this to do both of the operations you ask for:

    perl -ne 'BEGIN{$/="},{"}
              chomp;
              s/\n$//; s/^\{//; s/\}$//; print "{$_}\n";
              print STDERR "{$_}\n" if /error/' file > ESnl01.log 2> ES01.err.log

edited Feb 08 '14 at 18:42

answered Feb 08 '14 at 17:13

terdon


@StephaneChazelas ah, yes, so it does. Sod it then, I'll just leave the perl one. – terdon Feb 08 '14 at 17:18
Note that it adds an extra `{` to the first record and `}\n` to the last one. – Stéphane Chazelas Feb 08 '14 at 17:21
@StephaneChazelas this one should work correctly but not really worth it anymore, too complex. Your awk is much better. – terdon Feb 08 '14 at 18:43

score 0 · Answer 3 · edited Apr 13 '17 at 12:36

If you are willing to try a program that is probably not yet installed on your system, try gsar, explained in this answer to the same problem.

gsar is a search-and-(optionally)-replace utility that operates on binary files. However, it cannot search with regular expressions.

This command:

    gsar '-s},{' '-r}:x0A{' ES01.log > ESnl01.log

replaces the comma in each `},{` with a newline character, reading from ES01.log and writing the output to ESnl01.log.

The search (`-s`) and replacement (`-r`) strings do not need to be the same length.

score 0 · Answer 4 · answered Oct 19 '14 at 11:53


You could do this simply in Perl with a regex:

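    perl -pe 'BEGIN{$/="},{"} s/(?<=\}),(?=\{)/\n/g' file

Without the `BEGIN{$/="},{"}` part, `perl -p` reads newline-terminated lines just like sed, so on a file with no newlines it would again load the whole file into memory.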


Avinash Raj

