Erasing a 2-line pattern with sed/grep/whatever

Question

I have a huge CVS log file which, once stripped of the useless information, reads something like this:

Working file: unmodifiedfile1.c
================
Working file: modifiedfile1.h
----------------------------------
revision 1.3
Fixed some bug
================
Working file: unmodifiedfile2.h
================
Working file: modifiedfile2.h
----------------------------------
revision 1.1
Added some feature
================
Working file: unmodifiedfile3.h

I would like to remove the lines related to unmodified files, so that the result is:

Working file: modifiedfile1.h
----------------------------------
revision 1.3
Fixed some bug
================
Working file: modifiedfile2.h
----------------------------------
revision 1.1
Added some feature
================

The pattern to match is

Working file: FILENAME
================

What I've been able to do so far is the following:

sed '/Working file:/ N ; s/\n/PLACEHOLDER/' changelog.txt |
grep -v 'PLACEHOLDER===' |
sed 's/PLACEHOLDER/\n/'

I'm sure, however, that there is a cleaner solution that my ignorance of sed keeps me from finding. (As a bonus, it would be nice to be able to erase the very last line if necessary.)

P.S.

An output ending with:

================
Working file: unmodifiedfile3.h

is also acceptable

Can you add the expected output for clarity? Add a separate one if it will differ for the bonus question — Sundeep, Sep 08 '16 at 13:34

Thor · Answer 1 · 2016-09-08T23:40:01.217

sed

This should come close to what you are after:

<cvslog sed -n '/Working file/ { N; /\n=\+$/b; :a; N; /\n=\+$/!ba; p; }'

Output:

Working file: modifiedfile1.h
----------------------------------
revision 1.3
Fixed some bug
================
Working file: modifiedfile2.h
----------------------------------
revision 1.1
Added some feature
================

Explanation

Here is the same sed script with comments:

/Working file/ {
  N                 # append next line to pattern space
  /\n=\+$/b         # is it a file separator -> next file
  :a
  N                 # append next line to pattern space
  /\n=\+$/!ba       # isn't it a file separator -> read next line
  p                 # otherwise print accumulated text
}
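
The one-liner relies on `\+` and on `;` after labels, which not every sed accepts. As a rough sketch of a more portable spelling (an assumption on my part, not something from the original answer), each command can get its own `-e` and `=\+` can be written as `==*`. The function name `drop_unmodified` and the sample data are made up for illustration:

```shell
# drop_unmodified is a hypothetical helper name used only for this sketch.
drop_unmodified() {
  # One -e per command; ==* means "one or more =" without the \+ extension.
  sed -n \
    -e '/Working file/{' \
    -e 'N' \
    -e '/\n==*$/b' \
    -e ':a' \
    -e 'N' \
    -e '/\n==*$/!ba' \
    -e 'p' \
    -e '}' "$@"
}

# Small sample in the same shape as the log in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Working file: unmodifiedfile1.c
================
Working file: modifiedfile1.h
----------------------------------
revision 1.3
Fixed some bug
================
Working file: unmodifiedfile3.h
EOF

drop_unmodified "$sample"
rm -f "$sample"
```

Because the script runs with `-n`, nothing is printed when `N` hits the end of input, so a trailing `Working file:` line with no separator is dropped.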

awk

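If you tell awk to use the file separator line as the record separator (RS) and a newline as the field separator (FS), each field is one line of a record, and it becomes fairly straightforward to define a sensible selection criterion (here, more than two lines):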

<cvslog awk 'NF>2' RS='\n=+\n' FS='\n' ORS='\n\n'

Output:

Working file: modifiedfile1.h
----------------------------------
revision 1.3
Fixed some bug

Working file: modifiedfile2.h
----------------------------------
revision 1.1
Added some feature
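
A regular-expression RS is not guaranteed by POSIX (gawk and mawk support it), and the command above replaces the separator lines with blank lines. As a rough alternative, here is a line-by-line sketch that buffers each block and counts its lines instead of using RS. This is a different technique from the one above, `keep_modified` is a hypothetical name, and the sample data is made up:

```shell
# keep_modified is a hypothetical helper name used only for this sketch.
keep_modified() {
  awk '
    # A "Working file:" line starts a new block.
    /^Working file:/ { buf = $0; n = 1; next }
    # Inside a block, keep appending lines and counting them.
    n > 0            { buf = buf "\n" $0; n++ }
    # A separator closes the block; print it only if it has more than two lines.
    n > 0 && /^=+$/  { if (n > 2) print buf; n = 0 }
  ' "$@"
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
Working file: unmodifiedfile1.c
================
Working file: modifiedfile1.h
----------------------------------
revision 1.3
Fixed some bug
================
Working file: unmodifiedfile3.h
EOF

keep_modified "$sample"
rm -f "$sample"
```

A block that never reaches a separator line, such as a lone `Working file:` line at the end of the log, is never printed.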

bash and coreutils

Just for fun:

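# split the log after every separator line into files xx00, xx01, ...
csplit cvslog '/=\{16\}/1' '{*}'
# with several files wc ends with a "total" line; head -n-1 (GNU) drops it
wc -l xx* |
head -n-1 |
while read -r n f; do
  # pieces with more than two lines belong to modified files
  if (( n > 2 )); then
    cat "$f"
  fi
done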

Output:

Working file: modifiedfile1.h
----------------------------------
revision 1.3
Fixed some bug
================
Working file: modifiedfile2.h
----------------------------------
revision 1.1
Added some feature
================

Very good answer, but for the time being I accepted the other one as to me the sed script is easier to understand — Davide, Sep 12 '16 at 10:58

Sundeep · Accepted Answer · 2016-09-12T11:35:05.103

sed '/Working file:/ N ; s/\n/PLACEHOLDER/' changelog.txt |
grep -v 'PLACEHOLDER===' |
sed 's/PLACEHOLDER/\n/'

can indeed be shortened to:

$ sed '/Working file:/{N;/===/d}' changelog.txt 
Working file: modifiedfile1.h
----------------------------------
revision 1.3
Fixed some bug
================
Working file: modifiedfile2.h
----------------------------------
revision 1.1
Added some feature
================
Working file: unmodifiedfile3.h
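
As a reading aid, the shortened command can be spelled out with comments. This is only a sketch: `strip_unmodified` is a hypothetical function name and the sample data is made up.

```shell
# strip_unmodified is a hypothetical helper name used only for this sketch.
strip_unmodified() {
  sed '
    /Working file:/ {
      # N appends the next input line to the pattern space.
      N
      # If the two-line pattern space contains ===, delete it
      # and start the next cycle without printing anything.
      /===/d
    }
  ' "$@"
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
Working file: unmodifiedfile1.c
================
Working file: modifiedfile1.h
----------------------------------
revision 1.3
Fixed some bug
================
EOF

strip_unmodified "$sample"
rm -f "$sample"
```

With GNU sed, `N` on the last input line prints the pattern space before exiting (unless POSIXLY_CORRECT is set), which is why the trailing `Working file:` line survives in the output above. Other sed implementations may drop it.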

To remove every line containing Working file: together with the following line when that line contains ===, and also the final line if it contains Working file:, pipe the result through a second sed, as in the sample run below.

Thanks to @ilkkachu for the suggestion. If the pattern needs to be matched at the beginning of a line, use ^Working file:

$ cat ip.txt 
Working file: 123
================
Working file: f1
----------------------------------
revision 1.3
Fixed some bug
================
Working file: abc
================
Working file: file
----------------------------------
revision 1.1
Added some feature
================
Working file: xyz

$ sed '/Working file:/{N;/===/d}' ip.txt | sed '${/Working file:/d}' 
Working file: f1
----------------------------------
revision 1.3
Fixed some bug
================
Working file: file
----------------------------------
revision 1.1
Added some feature
================
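
Following ilkkachu's suggestion, the second sed can probably be folded into the first. The following is a sketch rather than a tested drop-in replacement: `clean_log` is a hypothetical name, the sample data is made up, and the patterns are anchored with `^` and `\n` on the assumption that `Working file:` always starts a line.

```shell
# clean_log is a hypothetical helper name used only for this sketch.
clean_log() {
  # First -e: on the last line, delete it if it starts with "Working file:".
  # Second -e: join a "Working file:" line with the next line and delete
  # the pair when the second line starts with ===.
  sed -e '${/^Working file:/d;}' \
      -e '/^Working file:/{N;/\n===/d;}' "$@"
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
Working file: 123
================
Working file: f1
----------------------------------
revision 1.3
Fixed some bug
================
Working file: xyz
EOF

clean_log "$sample"
rm -f "$sample"
```

As the comment thread notes, this would still misfire if the final line of an actual commit message started with `Working file:`.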

Your initial sed solution was exactly what I was after. However, the tweaks using 'unmodified' are wrong: the filenames are not made like that. I probably wasn't clear enough. — Davide, Sep 12 '16 at 10:56
@Davide, I would then suggest un-accepting this answer and editing your question with a better example... that way it helps others trying to solve a similar problem — Sundeep, Sep 12 '16 at 11:00
I think you could use `${/^Working file/d};` at the beginning of the `sed` script to remove the final line if it starts with `Working file`. I'd put the beginning-of-line anchor on the other `/Working file/` pattern, too, though the sed would still match that if the final line of an actual commit message started with `Working file`... — ilkkachu, Sep 12 '16 at 11:12
