
I have an HTML page that contains two tables.

Both tables begin with the same tag, <table role="grid">, and I want to display the code of the second table only.

So far, I only know how to display the first one, with:

sed -n '/<table role=\"grid\">/,/<\/table>/p' page.html

How would you do it?

Dominique
Body
  • See http://unix.stackexchange.com/questions/6389/parse-html-on-linux and especially http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – cas Sep 27 '15 at 23:00

2 Answers


This works, but I'm sure there must be a simpler solution:

sed -n '/<table role="grid">/{
 x
 /^$/b
 x
:loop
 p
 /<\/table>/q
 n
 b loop
}' page.html

When a line matches the opening table tag, it is exchanged (x) with the hold space and the old hold contents are tested. The first time they are empty (/^$/), so we branch (b) to the end of the script and, because of -n, nothing is printed. The second time the hold space is no longer empty (it holds the first table's opening line), so we undo the exchange (x) and enter a loop: print the line (p), quit (q) if it matches the end of the table, otherwise fetch the next line (n) and branch back to the loop label.

It's simpler in awk:

awk '/<table role="grid">/,/<\/table>/ { if (n == 1) print }
     /<\/table>/ { n++ }' page.html
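
To try the awk version without a real page, a small test file helps. The sketch below is only an illustration: the sample HTML and the variable names `sample` and `second_table` are assumptions, not part of the question. It runs the awk program from this answer (written without the backslashes before the double quotes, which are not needed inside single quotes) and keeps the result in a variable.

```shell
# Hypothetical sample page with two tables that share the same opening tag.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<html><body>
<table role="grid">
<tr><td>first</td></tr>
</table>
<p>text between the tables</p>
<table role="grid">
<tr><td>second</td></tr>
</table>
</body></html>
EOF

# n counts the tables that have already ended, so lines inside a table
# are printed only while n is 1, i.e. inside the second table.
second_table=$(awk '/<table role="grid">/,/<\/table>/ { if (n == 1) print }
     /<\/table>/ { n++ }' "$sample")

printf '%s\n' "$second_table"
rm -f "$sample"
```

With this sample, only the three lines of the second table should be printed. Both rules run on the closing line of each table, and the increment comes second, so the closing tag of the second table is still printed before n moves on to 2.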
meuh

I'd use perl:

perl -ne 'if(/<table role="grid">/){$i++;$k=1} 
          if($i==2 && $k==1){print} 
          if(/<\/table>/){$k=0;}' file

Explanation

  • perl -ne : read the input file line by line and apply the script given by -e to each line.
  • if(/<table role="grid">/){$i++;$k=1} : if this line matches <table role="grid">, add 1 to the value of $i and set $k to 1.
  • if($i==2 && $k==1){print} : if the current value of $i is 2 and that of $k is 1 (so, if we are between a <table role="grid"> and a </table> and if this is the second time that <table role="grid"> has been seen), print the current line.
  • if(/<\/table>/){$k=0;} : set $k back to 0 if this line matches </table>.
terdon