I have a huge CVS log file which, stripped of the useless info, reads something like this:

Working file: unmodifiedfile1.c
================
Working file: modifiedfile1.h
----------------------------------
revision 1.3
Fixed some bug
================
Working file: unmodifiedfile2.h
================
Working file: modifiedfile2.h
----------------------------------
revision 1.1
Added some feature
================
Working file: unmodifiedfile3.h

I would like to remove the lines related to unmodified files, so that the result is:

Working file: modifiedfile1.h
----------------------------------
revision 1.3
Fixed some bug
================
Working file: modifiedfile2.h
----------------------------------
revision 1.1
Added some feature
================

The pattern to match is

Working file: FILENAME
================

What I've been able to do so far is the following:

sed '/Working file:/ N ; s/\n/PLACEHOLDER/' changelog.txt |
grep -v 'PLACEHOLDER===' |
sed 's/PLACEHOLDER/\n/'

I'm sure, however, that there is a cleaner solution that my ignorance of sed keeps me from finding. (As a bonus, it would be nice to be able to remove the very last line as well, if necessary.)

P.S.

An output ending with:

================
Working file: unmodifiedfile3.h

is also acceptable

Davide
  • Can you add the expected output for clarity? Add a separate one if it will be different for the bonus question. – Sundeep Sep 08 '16 at 13:34

2 Answers

sed

This should come close to what you are after:

<cvslog sed -n '/Working file/ { N; /\n=\+$/b; :a; N; /\n=\+$/!ba; p; }'

Output:

Working file: modifiedfile1.h
----------------------------------
revision 1.3
Fixed some bug
================
Working file: modifiedfile2.h
----------------------------------
revision 1.1
Added some feature
================

Explanation

Here is the same sed script with comments:

/Working file/ {
  N                 # append next line to pattern space
  /\n=\+$/b         # is it a file separator -> next file
  :a
  N                 # append next line to pattern space
  /\n=\+$/!ba       # isn't it a file separator -> read next line
  p                 # otherwise print accumulated text
}
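
To try the script without touching the real log, something along these lines should do. It is only a sketch: the sample is recreated from the question in a temporary file, the variable names log and result are made up, and the script is written one command per line with ==* in place of the GNU-specific =\+ (and with the pattern anchored to the start of the line), which should let it run on non-GNU seds as well.

```shell
# Hypothetical sample file reproducing the log from the question.
log=$(mktemp)
cat > "$log" <<'EOF'
Working file: unmodifiedfile1.c
================
Working file: modifiedfile1.h
----------------------------------
revision 1.3
Fixed some bug
================
Working file: unmodifiedfile2.h
================
Working file: modifiedfile2.h
----------------------------------
revision 1.1
Added some feature
================
Working file: unmodifiedfile3.h
EOF

# Same logic as the commented script: skip a file header that is directly
# followed by a separator, otherwise collect lines up to the separator and print.
result=$(sed -n '
/^Working file:/ {
  N
  /\n==*$/b
  :a
  N
  /\n==*$/!ba
  p
}' "$log")

printf '%s\n' "$result"
rm -f "$log"
```

This prints the two modified-file blocks, each with its closing separator. The trailing Working file: line is dropped because N finds no further input and -n suppresses the automatic print.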

awk

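If you tell awk to use the file separator line as the record separator (RS), it becomes fairly straightforward to define a sensible selection criterion. With newline as the field separator (FS), every line of a record is a field, so NF>2 keeps only the records with more than two fields: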

<cvslog awk 'NF>2' RS='\n=+\n' FS='\n' ORS='\n\n'

Output:

Working file: modifiedfile1.h
----------------------------------
revision 1.3
Fixed some bug

Working file: modifiedfile2.h
----------------------------------
revision 1.1
Added some feature
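
Note that the record-separator approach drops the ===== lines (the output above has blank lines instead), and a regular-expression RS is an extension found in gawk and mawk rather than something POSIX guarantees. If either of those matters, a line-buffering variant is one option. This is only a sketch: buf, n, log and result are names I made up, and it assumes no commit message contains a line consisting solely of = characters.

```shell
# Hypothetical sample file reproducing the log from the question.
log=$(mktemp)
cat > "$log" <<'EOF'
Working file: unmodifiedfile1.c
================
Working file: modifiedfile1.h
----------------------------------
revision 1.3
Fixed some bug
================
Working file: unmodifiedfile2.h
================
Working file: modifiedfile2.h
----------------------------------
revision 1.1
Added some feature
================
Working file: unmodifiedfile3.h
EOF

result=$(awk '
/^=+$/ {                                  # a separator line closes the current block
  if (n > 1) printf "%s%s\n", buf, $0     # keep blocks holding more than the file name
  buf = ""; n = 0                         # start a fresh block
  next
}
{ buf = buf $0 "\n"; n++ }                # collect the lines of the current block
' "$log")

printf '%s\n' "$result"
rm -f "$log"
```

Since there is no END rule, a trailing block that is never closed by a separator (here the last Working file: line) is silently discarded.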

bash and coreutils

Just for fun:

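csplit -s cvslog '/=\{16\}/+1' '{*}'   # split after each separator line into xx00, xx01, ...
wc -l xx* |
head -n-1 |                            # drop the "total" line printed by wc
while read -r n f; do
  if (( n > 2 )); then                 # keep only the pieces with more than two lines
    cat "$f"
  fi
done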

Output:

Working file: modifiedfile1.h
----------------------------------
revision 1.3
Fixed some bug
================
Working file: modifiedfile2.h
----------------------------------
revision 1.1
Added some feature
================
Thor
The pipeline from the question,

sed '/Working file:/ N ; s/\n/PLACEHOLDER/' changelog.txt |
grep -v 'PLACEHOLDER===' |
sed 's/PLACEHOLDER/\n/'

can indeed be shortened to:

$ sed '/Working file:/{N;/===/d}' changelog.txt 
Working file: modifiedfile1.h
----------------------------------
revision 1.3
Fixed some bug
================
Working file: modifiedfile2.h
----------------------------------
revision 1.1
Added some feature
================
Working file: unmodifiedfile3.h


  • To remove each line containing Working file: together with the following line when that line contains ===, and also the final line if it contains Working file:, pipe the result through a second sed, as shown on the sample file ip.txt below.

Thanks @ilkkachu for the suggestion. If the pattern needs to be matched at the beginning of a line, use ^Working file: instead.

$ cat ip.txt 
Working file: 123
================
Working file: f1
----------------------------------
revision 1.3
Fixed some bug
================
Working file: abc
================
Working file: file
----------------------------------
revision 1.1
Added some feature
================
Working file: xyz

$ sed '/Working file:/{N;/===/d}' ip.txt | sed '${/Working file:/d}' 
Working file: f1
----------------------------------
revision 1.3
Fixed some bug
================
Working file: file
----------------------------------
revision 1.1
Added some feature
================
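
Following the suggestion in the comments, the two sed calls can probably be folded into one by testing for the last line first. The following is only a sketch under that assumption: the temporary file stands in for ip.txt, the variable names ip and result are made up, and both patterns are anchored with ^ and matched against a full separator line (\n==*$) rather than a bare ===.

```shell
# Hypothetical stand-in for ip.txt from above.
ip=$(mktemp)
cat > "$ip" <<'EOF'
Working file: 123
================
Working file: f1
----------------------------------
revision 1.3
Fixed some bug
================
Working file: abc
================
Working file: file
----------------------------------
revision 1.1
Added some feature
================
Working file: xyz
EOF

# First expression: drop the last line if it is a dangling file header.
# Second expression: drop a file header that is directly followed by a separator.
result=$(sed -e '${/^Working file:/d;}' \
             -e '/^Working file:/{N;/\n==*$/d;}' "$ip")

printf '%s\n' "$result"
rm -f "$ip"
```

On this sample it gives the same output as the two-stage pipeline. As noted in the comments, a commit message whose line starts with Working file: would still be matched.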
Sundeep
  • Your initial sed solution was exactly what I was after. However, the tweaks using 'unmodified' are wrong: the filenames are not actually named like that. I probably wasn't clear enough. – Davide Sep 12 '16 at 10:56
  • @Davide, I would then suggest un-accepting this answer and editing your question with a better example... that way it helps others trying to solve a similar problem – Sundeep Sep 12 '16 at 11:00
  • I think you could use `${/^Working file/d};` in the beginning of the `sed` to remove the final line if it starts with `Working file`. I'd put the beginning-of-line anchor to the other `/Working file/` pattern, too, though the sed would still match that if the final line of an actual commit message started with `Working file`... – ilkkachu Sep 12 '16 at 11:12