18

I am running the following two sed commands. The first adds newline characters where I want them. The second does too, but it also adds an extra newline at the end of the file, where there wasn't one before.

sed -e 's|\<LIST_G_STATEMENT>|&\
|g' ${XMLDIR}/statement_tmp_1.xml > ${XMLDIR}/statement_tmp_2.xml

sed -e 's|\</LIST_G_STATEMENT>|&\
|g' ${XMLDIR}/statement_tmp_2.xml > ${XMLDIR}/statement_tmp_3.xml

Using od -c on all 3 of the files gives the following output.

statement_tmp_1.xml (no \n at end of file)

1314700    T   A   T   E   M   E   N   T   >   <   /   L   I   S   T   _
1314720    G   _   S   T   A   T   E   M   E   N   T   >   <   /   G   _
1314740    S   E   T   U   P   >   <   /   L   I   S   T   _   G   _   S
1314760    E   T   U   P   >   <   /   A   R   X   S   G   P   O   >
1314777

statement_tmp_2.xml (no \n at end of file)

1314700    S   T   A   T   E   M   E   N   T   >   <   /   L   I   S   T
1314720    _   G   _   S   T   A   T   E   M   E   N   T   >   <   /   G
1314740    _   S   E   T   U   P   >   <   /   L   I   S   T   _   G   _
1314760    S   E   T   U   P   >   <   /   A   R   X   S   G   P   O   >
1315000

statement_tmp_3.xml (\n at end of file - where did it come from?)

1314700    S   T   A   T   E   M   E   N   T   >   <   /   L   I   S   T
1314720    _   G   _   S   T   A   T   E   M   E   N   T   >  \n   <   /
1314740    G   _   S   E   T   U   P   >   <   /   L   I   S   T   _   G
1314760    _   S   E   T   U   P   >   <   /   A   R   X   S   G   P   O
1315000    >  \n
1315002

I am running AIX 5.3

Basically, I either want it to stop adding the extra \n, or find a way of removing it.

Anthon
jonnohudski
  • Just a question: why are you using a literal newline in your substitution pattern when you could have used `s|...|&\n|` just as well? – Joseph R. Nov 04 '13 at 10:59
  • @JosephR. `\n` in the right hand side is not portable. – Stéphane Chazelas Nov 04 '13 at 11:03
  • @StephaneChazelas That's weird. Is it a CR vs CRLF thing? – Joseph R. Nov 04 '13 at 11:05
  • A file which doesn't end in a newline character is not a text file, so the behaviour with text utilities on them is _unspecified_. Use `perl` or other tool that can deal with binary data. – Stéphane Chazelas Nov 04 '13 at 11:06
  • @JosephR. No, `\` followed by a literal newline is the traditional and POSIX way to add a LF character. `\n` would typically substitute an `n` character in anything but GNU `sed`. – Stéphane Chazelas Nov 04 '13 at 11:07
  • @JosephR, it was taken from another forum. I think I tried the \n, which works on the Linux side, but doesn't appear to work in AIX – jonnohudski Nov 04 '13 at 11:15
  • I don't know how to make your AIX sed behave, but `head -c -1` can bite off the final newline. – wingedsubmariner Nov 04 '13 at 12:50
  • @wingedsubmariner, neither `-c` nor negative numbers are POSIX. [AIX head](http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.cmds/doc/aixcmds3/head.htm) supports `-c` as an extension, but not negative numbers. – Stéphane Chazelas Nov 04 '13 at 13:43
  • `sed` and other text utilities work on [text files](http://unix.stackexchange.com/questions/18743/whats-the-point-in-adding-a-new-line-to-the-end-of-a-file/18789#18789). A non-empty file that doesn't end in a newline is not a text file. Why are you trying to work with a non-text file? – Gilles 'SO- stop being evil' Nov 04 '13 at 21:52

3 Answers

13

You should consider yourself lucky that AIX sed added that missing newline character.

A non-empty file that doesn't end in a newline character is not a text file, at least per the POSIX definition: a text file contains lines, and a line is a (not-too-long) sequence of characters terminated by a newline character. The behaviour of text utilities like sed on such a file is therefore unspecified, and in practice it varies from implementation to implementation.
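As a small illustration of that definition (a sketch, not part of the original answer): `wc -l` counts newline characters, so an unterminated last line is simply not counted as a line. The sample strings are made up for the demonstration.

```shell
# wc -l counts newline characters, so an unterminated final line is not counted.
# tr -d ' ' strips the padding some wc implementations add around the number.
terminated=$(printf 'one\ntwo\n' | wc -l | tr -d ' ')
unterminated=$(printf 'one\ntwo' | wc -l | tr -d ' ')
echo "terminated: $terminated, unterminated: $unterminated"
```

The second input has two visible "lines" but only one newline, which is why text utilities are free to treat its tail differently.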

Some sed implementations would have discarded those spurious characters after the last line.

AFAIK, XML files are meant to be text files, so that means sed just fixed it for you.

If you do need that file not to end in a newline character, then you could use perl or other tools that can cope with non-text data.

perl -pe 's|<LIST_G_STATEMENT>|$&\n|g'
Stéphane Chazelas
  • The terminating newline _is_ helpful if you expect to pipe your `sed` output into any other standard Unix utility. Honestly, I didn't notice `sed` did this for _years_, since Bourne shell command substitutions like `$(sed 's/bas/replac/' <<<'basement')` furtively trim the final newline, if there is one. But there _are_ times when you definitely don't want it; _e.g._, manipulating X clipboard text with `sed`. FYI, GNU sed, if available, does not add a terminating newline if you use `p` with the `-n` option, as described in [this SE answer](https://unix.stackexchange.com/a/493477/278323). – TheDudeAbides Dec 13 '19 at 22:23
0

Here's a way to remove the final newline from a file using dd:

printf "" | dd  of='/path/to/file' seek=<filesize_in_bytes - 1> bs=1 count=1

To test whether a file ends with a newline you could use:

tail -c 1 /path/to/file | tr -dc '\n' | wc -c

And to get the file size in bytes use:

wc -c < /path/to/file
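The three commands above can be combined into one helper, sketched here under the assumption that `dd` follows the POSIX rule of truncating the output file at the `seek` offset when `conv=notrunc` is not given. The function name `strip_final_newline` is hypothetical.

```shell
# Hypothetical helper: remove exactly one trailing newline from FILE, if present.
strip_final_newline() {
  file=$1
  # 1 if the last byte is a newline, 0 otherwise (also 0 for an empty file).
  ends_with_nl=$(tail -c 1 "$file" | tr -dc '\n' | wc -c | tr -d ' ')
  if [ "$ends_with_nl" -eq 1 ]; then
    size=$(wc -c < "$file" | tr -d ' ')
    # The input is empty, so dd writes nothing; it only truncates the
    # output file at the seek offset, dropping the final byte.
    printf '' | dd of="$file" bs=1 seek=$((size - 1)) count=1 2>/dev/null
  fi
}
```

For the file in the question this would be called as `strip_final_newline "${XMLDIR}/statement_tmp_3.xml"`. Calling it on a file that does not end in a newline leaves the file unchanged.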
chan
0

According to this AIX manual, IBM's tail supports -r (reverse), which looks pretty cool. As long as your file is under 20KB, the following should work:

tail -r <file | dd bs=1 skip=1 | tail -r >file.new
mikeserv