Concatenate files placing an empty line between them

Question

I have a bunch of files with the same extension (let's say .txt) and I would like to concatenate them. I am using cat *.txt > concat.txt, but I would like to add an empty line between the files so that they can be told apart in concat.txt.

Is it possible to do it with a single bash command rather than an implementation such as this?

Thank you

terdon · Answer 1 · 2021-01-11T15:57:56.390

13

Not a single command, but a simple one-liner:

for f in *.txt; do cat -- "$f"; printf "\n"; done > newfile.txt

That will give this error, because the output file newfile.txt itself matches *.txt:

cat: newfile.txt: input file is output file

But you can ignore it, at least on GNU/Linux systems. Stéphane Chazelas pointed out in the comments that apparently, on other systems this could result in an infinite loop instead, so to avoid it, try:

for f in *.txt; do 
    [[ "$f" = newfile.txt ]] || { cat -- "$f"; printf "\n"; }
done > newfile.txt

Or just don't add a .txt extension to the output file (it isn't needed and doesn't make any difference at all, anyway) so that it won't be included in the loop:

for f in *.txt; do cat -- "$f"; printf "\n"; done > newfile
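
If the empty line should appear only between files, and not after the last one, a small variation of the loop might look like the sketch below. It assumes a POSIX shell; the sample files (a.txt, b.txt, c.txt), the scratch directory and the output name concat.out are made up for illustration.

```shell
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'one\n' > a.txt
printf 'two\n' > b.txt
printf 'three\n' > c.txt

first=1
for f in ./*.txt; do
    # Print the separator before every file except the first one.
    if [ "$first" -eq 1 ]; then first=0; else printf '\n'; fi
    cat -- "$f"
done > concat.out   # the .out name keeps the result out of the *.txt glob
```

Printing the separator before each file except the first avoids having to know in advance which file is the last one.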

edited Jan 11 '21 at 15:57

answered Jan 11 '21 at 12:32

terdon


Not all `cat` implementations will give you that *input file is output file*. Some others will happily run here potentially causing an infinite loop that fills up the filesystem. – Stéphane Chazelas Jan 11 '21 at 15:36
Note that `[[ "$f" = "newfile.txt" ]]` is a kshism. POSIXly, you'd use `[ "$f" = newfile.txt ]`. – Stéphane Chazelas Jan 11 '21 at 15:38
@StéphaneChazelas wait, what? That's a `cat` issue? I always thought it was the shell, not `cat`. Then why doesn't `cat file1 file2 > file1` complain? As for the quotes, thanks fixed. Having unquoted strings feels weird to me. – terdon Jan 11 '21 at 15:45
For `cat file > file`, I suppose your `cat` detects `file` is empty and does nothing instead of reporting an error. Solaris `cat` still reports an error there. Note how the error message starts with `cat:`. I can't see how the shell could detect the condition. – Stéphane Chazelas Jan 11 '21 at 15:50
@StéphaneChazelas looks like you're right, unsurprisingly enough. This will reproduce the error: `( echo foo> newfile.txt; cat newfile.txt; ) > newfile.txt` while this does not `( cat newfile.txt ) > newfile.txt`. So my `cat` (GNU coreutils, 8.32) seems to detect that the file is empty and doesn't complain in the second one. TIL, thanks! – terdon Jan 11 '21 at 16:01

Kusalananda · Accepted Answer · 2021-01-12T07:30:50.667

11

Using GNU sed:

sed -s -e $'$a\\\n' ./*.txt >concat.out

This concatenates all data to concat.out while at the same time appending an empty line to the end of each file processed.

The -s option to GNU sed makes the $ address match the last line in each file instead of, as usual, the last line of all data. The a command appends one or several lines at the given location, and the data added is a newline. The newline is encoded as $'\n', i.e. as a "C-string", which means we're using a shell that understands these (like bash or zsh). This would otherwise have to be added as a literal newline:

sed -s -e '$a\
' ./*.txt >concat.out

Actually, '$a\\' and '$a\ ' seem to work too, but I'm not entirely sure why.

This also works, if one thinks the a command is too bothersome to get right:

sed -s -e '${p;g;}' ./*.txt >concat.out

Any of these variations would insert an empty line at the end of the output of the last file too. If this final empty line is not wanted, delete it by passing the overall result through sed '$d' before redirecting to your output file:

sed -s -e '${p;g;}' ./*.txt | sed -e '$d' >concat.out
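
To see what -s changes, the sketch below runs the ${p;g;} variant on two throwaway files with and without it. It assumes GNU sed; the file names and the scratch directory are illustrative only.

```shell
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'one\n' > a.txt
printf 'two\n' > b.txt

# Without -s, $ matches only the last line of the whole input.
sed -e '${p;g;}' ./*.txt > no_s.out

# With -s (GNU sed), $ matches the last line of each file.
sed -s -e '${p;g;}' ./*.txt > with_s.out

# Dropping the final empty line leaves separators between files only.
sed -s -e '${p;g;}' ./*.txt | sed -e '$d' > between.out
```

Here no_s.out has a single empty line after "two", with_s.out has one after each file, and between.out has one only between "one" and "two".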

edited Jan 12 '21 at 07:30

answered Jan 11 '21 at 15:51

Kusalananda


@StéphaneChazelas You know, GNU software tries to be _so_ convenient that it's sometimes difficult to understand the magic that they implement... – Kusalananda Jan 11 '21 at 15:59
@StéphaneChazelas. `sed -s -e $'a\\\n'` adds an extra newline to every line of every file - not just the last line of each file. It is not equivalent to `sed -s -e '${p;g;}'` – fpmurphy Jan 12 '21 at 06:15
@Kusalananda. `sed -s -e $'$a\n' ./*.txt >concat.out` results in an extra newline at the end of `concat.out`. The OP wanted a newline between each file only. – fpmurphy Jan 12 '21 at 06:18
@fpmurphy, sorry, I meant `$'$a\\\n'`, the point being that `$'$a\n'` is `$a` followed by a newline, not `$a\` followed by a newline as in the variant not using `$'...'`. – Stéphane Chazelas Jan 12 '21 at 06:26
@fpmurphy I'm aware that they get an extra newline at the end, and I'm ignoring it as it's trivial to remove it. Hmmm... I might mention how to do that anyway... Stephane was referring to a previous edit of my text that did not have the `p;g;` variation. – Kusalananda Jan 12 '21 at 06:33
I think the simplest solutions were Kusalananda's and terdon's, but even all the others were interesting. Kusalananda's has the advantage of keeping the extension. Thank you all. – Gigiux Jan 12 '21 at 11:25

score 5 · Answer 3 · answered Jan 11 '21 at 16:28

zsh has a P glob qualifier to prefix each filename resulting from a glob with an arbitrary argument.

While it's typically used for things like cmd *.txt(P[-i]) to prefix each filename with a given option, you could use it here to insert any given file before each file. A temporary file containing an empty line can be created with =(print), so you could do:

() { cat file*.txt(P[$1]); } =(print)

On Linux or Cygwin, you could also do:

cat file*.txt(P[/dev/stdin]) <<< ''

αғsнιη · Answer 4 · 2021-01-11T15:48:14.113

Using GNU awk:

gawk -v RS='^$' -v ORS= '{
    print sep $0; sep="\n";
}' ./file*.txt >single.file

See "Slurp-mode in awk?".

The ./ prefix on the filenames avoids problems with files named like file=x.txt: awk treats an argument of that form as a variable assignment when it comes after the awk code.

Another GNU awk approach would be:

gawk 'BEGINFILE{if (ARGIND>1) print ""};1' ./file*.txt >single.txt

This is better because it adds an empty line even if the last line of a file doesn't end in a newline character, and it avoids loading whole files into memory.
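
Without gawk, a similar effect can be had with the standard FNR and NR variables, as in the sketch below. It is not part of the original answer, and the sample files are hypothetical. Unlike BEGINFILE, the FNR == 1 test never fires for an empty file, so empty inputs get no separator.

```shell
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'one\n' > file1.txt
printf 'two' > file2.txt        # deliberately no trailing newline
printf 'three\n' > file3.txt

# FNR restarts at 1 for each file while NR keeps counting, so the
# condition is true on the first line of every file except the first.
awk 'FNR == 1 && NR > 1 { print "" } 1' ./file*.txt > single.out
```

Because awk prints every record followed by a newline, file2.txt's unterminated last line is completed before the next separator is written.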

There is also a sed alternative, but to remove the very last newline you need to pipe its output (sed ... |) into another command that removes it:

sed -s '$s/$/\n/' file*.txt >single.file

score 4 · Answer 5 · answered Jan 11 '21 at 21:43

Perhaps not exactly what you were looking for, but like Quasímodo suggested in a comment, GNU's tail can add the empty line, in addition to a header with the filename:

$ echo 'this is foo' > foo.txt 
$ echo 'this is bar' > bar.txt   
$ tail -n+1 foo.txt bar.txt 
==> foo.txt <==
this is foo

==> bar.txt <==
this is bar

The -n+1 causes it to print the whole file; it means "print the tail starting from line 1."

If, for consistency, you want the header to be added even when there is only one file, you can use -v.

$ tail -n+1 foo.txt        
this is foo
$ tail -v -n+1 foo.txt 
==> foo.txt <==
this is foo

score 1 · Answer 6 · answered Jan 12 '21 at 10:19

This does not work in POSIX /bin/sh, but in bash:

cat file1 <(echo) file2 >concatenated

The <(echo) is replaced by the name of a pipe (a /dev/fd/N path or a temporary named pipe, depending on the system) that is connected to the output of the echo command, which generates a single newline.
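
The two-file form does not scale directly to many files. One way to generalise it, sketched below under the assumption that a small temporary file is acceptable in place of the process substitution, is to build the argument list in a loop. The names sep and concatenated, and the sample files, are illustrative only; the loop itself needs nothing beyond a POSIX shell.

```shell
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'one\n' > a.txt
printf 'two\n' > b.txt
printf 'three\n' > c.txt

# A regular file holding a single newline stands in for <(echo).
sep=$(mktemp) || exit 1
printf '\n' > "$sep"

# Build the argument list: file, separator, file, separator, file.
set --
for f in ./*.txt; do
    [ "$#" -eq 0 ] || set -- "$@" "$sep"
    set -- "$@" "$f"
done

cat -- "$@" > concatenated
rm -f -- "$sep"
```

A regular file is used for the separator because it can be read any number of times, whereas a pipe delivers its content only once.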

Simon Richter


... but it will only work easily for two files, and the OP seems to have "a bunch" of them. Maybe you can expand the answer to show how to make this into a shell script accepting "an arbitrary" number of input files? – AdminBee Jan 13 '21 at 11:07

score 0 · Answer 7 · answered Jan 12 '21 at 19:46

An example using Perl.

$ perl -e 'while(<>){print}continue{print"\n" if eof}' *.txt > concat.txt

which can be simplified to

$ perl -ne 'print; print "\n" if eof' [abc].txt > concat.txt

RedGrittyBrick
