How to print last_name as well in .txt file?

Question

I have this file on a Linux machine:

<names>
<first_name>Mohammed Sani</first_name>
<last_name>ABACHA</last_name>
<aliases>
<alias>ABACHE,Mohammed Sani</alias>
<alias>SANI,Mohammed</alias>
</aliases>
<low_quality_aliases>
<alias xsi:nil="true"/>
</low_quality_aliases>
<alternative_spelling xsi:nil="true"/>
</names>

I am using the command below to print the names, but it only prints the first name:

sed -n 's:.*<first_name>\(.*\)</first_name>.*:\1:p' 'test.xml' > name.txt

How can I append the last name as well?

Does this answer your question? [Extract an attribute value from XML](https://unix.stackexchange.com/questions/529670/extract-an-attribute-value-from-xml) — Panki, Aug 18 '21 at 10:52
Do you just want to _output_ the first and last names, or do you want to do something else with them? — Kusalananda, Aug 18 '21 at 11:08

Kusalananda · Answer 1 · 2021-08-19T05:46:50.557

Assuming you want the first and last name data on the same line, with a tab character in-between them:

Using xmlstarlet:

xmlstarlet sel -t -m '/names' \
    -v 'first_name' -nl \
    -v 'last_name' -nl file.xml 2>/dev/null |
paste - -

The xmlstarlet command parses out the values of the first_name and last_name nodes under the names node, and outputs these on one line each.

The two lines of output are then joined into a single line by paste, with a tab character as the delimiter. Use e.g. -d ',' with paste to get comma-delimited output instead.

I'm redirecting the standard error stream to /dev/null because there are some bogus namespace declarations later on in the document that xmlstarlet rightly complains about.
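The paste step can be tried on its own, without xmlstarlet installed. This is only a minimal sketch: the two printf lines stand in for what xmlstarlet would print for the sample file, and the variable names are made up for illustration.

```shell
# Stand-in for the xmlstarlet output: one value per line.
names=$(printf '%s\n' 'Mohammed Sani' 'ABACHA')

# "paste - -" reads two lines at a time from standard input and
# joins them with a tab (the default delimiter).
tab_joined=$(printf '%s\n' "$names" | paste - -)

# With -d ',' the two fields are joined with a comma instead.
comma_joined=$(printf '%s\n' "$names" | paste -d ',' - -)

printf '%s\n' "$tab_joined" "$comma_joined"
```

Note that a comma inside a value (as in the alias entries of the sample file) would not be quoted by paste, so the tab default is the safer choice for such data.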

Using xq from https://kislyuk.github.io/yq/:

xq -r '.names | [ .first_name, .last_name ] | @tsv' file.xml

This uses the @tsv operator to create tab-delimited output. It outputs the same data as the xmlstarlet code above, but instead of an XPath expression, we're using a jq expression.

Change @tsv to @csv to get fully quoted CSV output instead.

Philippos · Answer 2 · 2021-08-18T14:28:06.870

You can either add a second s command:

sed -n 's:.*<first_name>\(.*\)</first_name>.*:\1:p;s:.*<last_name>\(.*\)</last_name>.*:\1:p' 'test.xml' > name.txt

or use an extended regular expression:

sed -En 's:.*<(first|last)_name>(.*)</\1_name>.*:\2:p' 'test.xml' > name.txt
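To check that both variants give the same output, they can be run against a trimmed copy of the sample data. This is a sketch that assumes a sed accepting -E with a back-reference inside the pattern (GNU sed does); the variable names are only for illustration.

```shell
# Trimmed copy of the question's sample input.
xml='<names>
<first_name>Mohammed Sani</first_name>
<last_name>ABACHA</last_name>
</names>'

# Variant 1: two s commands, one per tag.
two_s=$(printf '%s\n' "$xml" |
    sed -n 's:.*<first_name>\(.*\)</first_name>.*:\1:p;s:.*<last_name>\(.*\)</last_name>.*:\1:p')

# Variant 2: one extended regular expression; \1 inside the pattern makes
# the closing tag match whichever of "first" or "last" opened it.
ere=$(printf '%s\n' "$xml" |
    sed -En 's:.*<(first|last)_name>(.*)</\1_name>.*:\2:p')

printf '%s\n' "$two_s" "$ere"
```

Both variables end up holding the first name and the last name on separate lines.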

Update: Request to output both names on the same line

To have the output on the same line, you can simply pipe it through a second sed script that joins the lines with a space:

sed -En 's:.*<(first|last)_name>(.*)</\1_name>.*:\2:p' test.xml | sed 'H;1h;$!d;g;s/\n/ /g' > name.txt

The H;1h;$!d;g sequence collects all lines in the pattern space: H appends each line to the hold space, 1h overwrites the hold space on the first line to avoid a leading newline, $!d stops processing for all but the last line, and g copies the hold space contents to the pattern space. Then s/\n/ /g replaces all newlines with spaces; in your case you could drop the g flag of the s command if you are sure there will always be only two lines.

On Linux, you probably have GNU sed and could do sed -z 's/\n/ /g' for nearly the same result (with -z the whole input is one record, so the trailing newline is turned into a space as well).
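The joining script can be tested in isolation. In this small sketch the two printf lines stand in for the output of the first sed command, and the variable name is hypothetical.

```shell
# Two input lines, as the first sed command would produce them.
joined=$(printf '%s\n' 'Mohammed Sani' 'ABACHA' |
    sed 'H;1h;$!d;g;s/\n/ /g')

# Line 1: H then 1h leave "Mohammed Sani" in the hold space; $!d ends the cycle.
# Line 2: H appends "\nABACHA"; g copies the hold space back; s joins with a space.
printf '%s\n' "$joined"
```

This prints the two names on one line, separated by a single space.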

More elegantly, and capable of dealing with multiple name pairs in one file, you could also do something like

sed -e '/.*<first_name>\(.*\)<\/first_name>.*/{s//\1/;h;}' -e '/.*<last_name>\(.*\)<\/last_name>.*/!d;s//\1/;H;g;s/\n/ /' 'test.xml' > name.txt
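A quick way to see the multiple-pair behaviour is to feed the command two records. In this sketch the wrapping list element and the second record ("Jane" / "DOE") are invented for illustration and are not part of the question's file.

```shell
# Two <names> records; the second one is hypothetical test data.
xml='<list>
<names>
<first_name>Mohammed Sani</first_name>
<last_name>ABACHA</last_name>
</names>
<names>
<first_name>Jane</first_name>
<last_name>DOE</last_name>
</names>
</list>'

# First -e: on a first_name line, strip the tags and store the value (h).
# Second -e: delete every line that is not a last_name line; on a last_name
# line, strip the tags, append to the stored first name (H), fetch both (g)
# and replace the joining newline with a space.
pairs=$(printf '%s\n' "$xml" |
    sed -e '/.*<first_name>\(.*\)<\/first_name>.*/{s//\1/;h;}' \
        -e '/.*<last_name>\(.*\)<\/last_name>.*/!d;s//\1/;H;g;s/\n/ /')

printf '%s\n' "$pairs"
```

Each record yields one output line, because h overwrites the hold space every time a new first_name is seen. This relies on first_name always preceding last_name within a record.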

Thanks, @Philippos, but it's printing the last name in the next line.Can we have in same line? — Shashi Shanker, Aug 18 '21 at 11:19
@ShashiShanker please edit your question and add your expected output so we don't need to guess what you need. — terdon, Aug 18 '21 at 13:32
@ShashiShanker I did update the answer to have the names in the same line. — Philippos, Aug 19 '21 at 07:49
