
I have this file on a Linux machine:

<names>
<first_name>Mohammed Sani</first_name>
<last_name>ABACHA</last_name>
<aliases>
<alias>ABACHE,Mohammed Sani</alias>
<alias>SANI,Mohammed</alias>
</aliases>
<low_quality_aliases>
<alias xsi:nil="true"/>
</low_quality_aliases>
<alternative_spelling xsi:nil="true"/>
</names>

I am using the command below to print the names, but it only prints the first name:

sed -n 's:.*<first_name>\(.*\)</first_name>.*:\1:p' 'test.xml' > name.txt

How can I append the last name as well?

terdon

2 Answers

Assuming you want the first and last name on the same line, separated by a tab character:

Using xmlstarlet:

xmlstarlet sel -t -m '/names' \
    -v 'first_name' -nl \
    -v 'last_name' -nl file.xml 2>/dev/null |
paste - -

The xmlstarlet command parses out the values of the first_name and last_name nodes under the names node, and outputs these on one line each.

The two lines of output are then joined into a single line by paste, with a tab character as the delimiter. Use e.g. -d ',' with paste to get comma-delimited output.

I'm redirecting the standard error stream to /dev/null because there are some bogus namespace declarations later on in the document that xmlstarlet rightly complains about.
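The paste step in the pipeline above can be tried on its own, without xmlstarlet installed. The following is only a sketch: the two input lines are typed in by hand as a stand-in for what the xmlstarlet command would print, and the variable names are made up for the illustration.

```shell
# Stand-in for the two lines printed by the xmlstarlet command (assumed output).
names=$(printf '%s\n' 'Mohammed Sani' 'ABACHA')

# "paste - -" reads two lines at a time from standard input and joins
# them with a tab, which is the default delimiter.
tab_joined=$(printf '%s\n' "$names" | paste - -)

# The same, with a comma as the delimiter.
comma_joined=$(printf '%s\n' "$names" | paste -d ',' - -)

printf '%s\n' "$tab_joined" "$comma_joined"
```

Each "-" stands for one column read from standard input, so adding a third "-" would join three lines at a time.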


Using xq from https://kislyuk.github.io/yq/:

xq -r '.names | [ .first_name, .last_name ] | @tsv' file.xml

This uses the @tsv operator to create tab-delimited output. It outputs the same data as the xmlstarlet code above, but instead of an XPath expression, we're using a jq expression.

Change @tsv to @csv to get fully quoted CSV output instead.

Kusalananda

You can either add a second s command:

sed -n 's:.*<first_name>\(.*\)</first_name>.*:\1:p;s:.*<last_name>\(.*\)</last_name>.*:\1:p' 'test.xml' > name.txt

or use an extended regular expression:

sed -En 's:.*<(first|last)_name>(.*)</\1_name>.*:\2:p' 'test.xml' > name.txt
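As a quick check, both variants can be run against a trimmed copy of the sample from the question. This is only a sketch: the XML is held in a shell variable instead of test.xml, and the variable names are made up for the illustration. It assumes a sed that accepts -E and back-references in extended regular expressions, as GNU sed does.

```shell
# Trimmed copy of the sample from the question.
xml='<names>
<first_name>Mohammed Sani</first_name>
<last_name>ABACHA</last_name>
<aliases>
<alias>ABACHE,Mohammed Sani</alias>
</aliases>
</names>'

# Variant 1: two separate s commands, one per element.
two_cmds=$(printf '%s\n' "$xml" |
  sed -n 's:.*<first_name>\(.*\)</first_name>.*:\1:p;s:.*<last_name>\(.*\)</last_name>.*:\1:p')

# Variant 2: one extended regular expression; \1 in the pattern makes the
# closing tag match the opening one, \2 is the element content.
ere=$(printf '%s\n' "$xml" |
  sed -En 's:.*<(first|last)_name>(.*)</\1_name>.*:\2:p')

printf '%s\n' "$two_cmds"
```

Both variants print the first name and the last name on separate lines; the alias lines are skipped because they match neither element name.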

Update: Request to output both names on the same line

To get the output on a single line, you can pipe it through a second sed script that joins the lines with a space:

sed -En 's:.*<(first|last)_name>(.*)</\1_name>.*:\2:p' test.xml | sed 'H;1h;$!d;g;s/\n/ /g' > name.txt

The H;1h;$!d;g part collects all lines in the pattern space: H appends each line to the hold space, 1h overwrites the hold space on the first line to avoid a leading newline, $!d stops processing for all but the last line, and g copies the hold space contents to the pattern space. Then s/\n/ /g replaces all newlines with spaces; in your case you could drop the trailing g flag if you are sure there will always be only two lines.
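The effect of the trailing g flag is easiest to see with more than two lines. A small sketch, using three made-up input lines and hypothetical variable names:

```shell
# With the g flag, every embedded newline becomes a space.
joined_all=$(printf '%s\n' one two three | sed 'H;1h;$!d;g;s/\n/ /g')

# Without it, only the first newline is replaced, so the third line
# stays on a line of its own.
joined_first=$(printf '%s\n' one two three | sed 'H;1h;$!d;g;s/\n/ /')

printf '%s\n' "$joined_all" "$joined_first"
```

With exactly two input lines, as in the question, both forms give the same result.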

On Linux, you probably have GNU sed and could use sed -z 's/\n/ /g' for much the same result; note that this also turns the final newline into a trailing space.

More elegantly, and capable of dealing with multiple name pairs in one file, you could also do something like

sed -e '/.*<first_name>\(.*\)<\/first_name>.*/{s//\1/;h;}' -e '/.*<last_name>\(.*\)<\/last_name>.*/!d;s//\1/;H;g;s/\n/ /' 'test.xml' > name.txt
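To illustrate the multiple-pair case, here is a sketch that runs the same command on two records. The second record (Jane DOE) is invented for the example, and the XML is passed in from a shell variable rather than test.xml.

```shell
# Two <names> records; the second one is made up for this illustration.
xml='<names>
<first_name>Mohammed Sani</first_name>
<last_name>ABACHA</last_name>
</names>
<names>
<first_name>Jane</first_name>
<last_name>DOE</last_name>
</names>'

# First -e: on a first_name line, strip the tags and save the name in the
# hold space. Second -e: delete every line that is not a last_name line;
# on a last_name line, strip the tags, append it to the saved first name,
# fetch both and replace the newline between them with a space.
pairs=$(printf '%s\n' "$xml" |
  sed -e '/.*<first_name>\(.*\)<\/first_name>.*/{s//\1/;h;}' \
      -e '/.*<last_name>\(.*\)<\/last_name>.*/!d;s//\1/;H;g;s/\n/ /')

printf '%s\n' "$pairs"
```

The empty pattern in s//\1/ reuses the most recently matched regular expression, which is why the element pattern does not have to be repeated. This relies on each first_name line being followed by its last_name line before the next record starts.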
Philippos