The 5 GB files I have are streams of data rows formatted like this:
{datarow1...},{datarow2...},...,{datarowN...}
So in a sense there are "lines" (the {} rows) and even line separators, except the separator is the three-character sequence },{.
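Concretely, a miniature file in this format would look like the following (the row contents here are made up purely for illustration; only the {} wrapping and the },{ separator match my real files):

```shell
# Build a tiny file in the described format: rows wrapped in {},
# joined by the three-character separator "},{", and no newlines at all.
printf '{id:1,msg:ok},{id:2,msg:error timeout},{id:3,msg:ok}' > sample.log
```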
I want to do two things:

1. Print the "lines" that contain the string "error":

       grep -o -P '{[^{}]+?error.+?}' ES01.log > ES01.err.log

2. Make the file friendlier by producing a copy with explicit newline separators:

       sed -e 's/},{/}\n{/g' < ES01.log > ESnl01.log
While the above works for relatively small files (up to ~100 MB), my files are unfortunately a lot bigger, so I hit memory problems:

    grep: memory exhausted
    sed: couldn't re-allocate memory

Both grep and sed read and process input line by line, and since these files contain no newlines at all, each tool ends up trying to load the whole file into memory as a single line.
Any idea how to approach this with another smart one-liner?
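For reference, here is the kind of streaming behavior I'm after, sketched with awk's record separator. This assumes GNU awk (or mawk >= 1.3.4), where RS may be a multi-character string, so the file is consumed one row at a time and memory stays bounded per record; I have not verified it at the 5 GB scale:

```shell
# Sketch, assuming GNU awk or mawk >= 1.3.4 (multi-character RS):
# each "},{" -delimited row becomes one record, read in streaming fashion.
awk 'BEGIN { RS = "},{" }
     /error/ {
       gsub(/^\{|\}$/, "")   # only the first/last records still carry an outer brace
       print "{" $0 "}"      # re-wrap so every match prints as a complete {} row
     }' ES01.log > ES01.err.log
```

The newline-separated rewrite could presumably use the same RS trick, printing `"}\n{"` between records (guarded with `NR > 1`) instead of re-wrapping each match.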