41

Using a common command line tool like sed or awk, is it possible to join all lines that end with a given character, like a backslash?

For example, given the file:

foo bar \
bash \
baz
dude \
happy

I would like to get this output:

foo bar bash baz
dude happy
don_crissti
Cory Klein

9 Answers

32

a shorter and simpler sed solution:

sed  '
: again
/\\$/ {
    N
    s/\\\n//
    t again
}
' textfile

or one-liner if using GNU sed:

sed ':x; /\\$/ { N; s/\\\n//; tx }' textfile
Stéphane Chazelas
neurino
  • Good one... I initially looked at this and couldn't understand it (so it went into the too-hard basket), but after an in-depth look at Gilles' answer (which took quite a long time) I had another look at your answer and it looked remarkably understandable. I think I'm starting to understand `sed` :) You are appending each line directly to the pattern space, and when a "normally-ended" line comes along, the entire pattern space falls through and auto-prints (because there is no -n option). Neat! +1 – Peter.O May 24 '11 at 16:27
  • @fred: thanks, I think I'm starting to understand sed too. It offers nice tools for multiline editing, but how to combine them to get what you need is not straightforward, and readability is not its strong point... – neurino May 25 '11 at 07:21
  • Beware of DOS line endings, aka. carriage returns or \r ! – user77376 Nov 28 '16 at 10:01
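The carriage-return caveat above can be sketched as follows. This is a minimal illustration, not part of the original answer: `join_continued_crlf` is a hypothetical helper name, and it assumes it is acceptable to delete every `\r` in the input before joining. With CRLF input the backslash is followed by `\r`, so `/\\$/` never matches unless the carriage returns are removed first.

```shell
# Hypothetical helper: strip carriage returns, then join continued lines.
# Assumption: no \r in the data needs to be preserved.
join_continued_crlf() {
  tr -d '\r' |
    sed -e ':x' -e '/\\$/{' -e 'N' -e 's/\\\n//' -e 'tx' -e '}'
}

# Sample input with DOS line endings (backslash, CR, LF).
printf 'foo bar \\\r\nbash \\\r\nbaz\r\n' | join_continued_crlf
```

The sed script is split across several `-e` fragments so that the label and the closing brace each end at a fragment boundary, which should keep it usable with non-GNU sed as well.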
22

It is possibly easiest with perl (since perl is like sed and awk, I hope it is acceptable to you):

perl -p -e 's/\\\n//'
camh
21

Here's an awk solution. If a line ends with a \, strip the backslash and print the line with no terminating newline; otherwise print the line with a terminating newline.

awk '{if (sub(/\\$/,"")) printf "%s", $0; else print $0}'
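One edge case the description above does not cover is a file whose last line ends with a backslash: the output would then end without a newline. A possible variation, offered only as a sketch, tracks that state in a variable and closes the line in an `END` block. The function name `join_continued_awk` and the variable `pending` are made up for this illustration.

```shell
# Hypothetical wrapper around the awk approach; "pending" is an illustrative name.
join_continued_awk() {
  awk '
    {
      if (sub(/\\$/, "")) { printf "%s", $0; pending = 1 }  # continued line: no newline yet
      else                { print;           pending = 0 }  # normal line: newline as usual
    }
    END { if (pending) print "" }  # input ended on a continuation: finish the line
  ' "$@"
}

# The last input line ends with a backslash.
printf 'dude \\\nhappy \\\n' | join_continued_awk
```

Whether a dangling continuation should be closed this way or reported as an error depends on the data, so treat the `END` rule as one option rather than the expected behaviour.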

It's also not too bad in sed, though awk is obviously more readable.

Gilles 'SO- stop being evil'
3

This is not an answer as such. It is a side issue about sed.

Specifically, I needed to take Gilles' sed command apart piece by piece to understand it... I started writing some notes on it, and then thought they may be useful here to someone...

so here it is... Gilles' sed script in documented format:


#!/bin/bash
#######################################
sed_dat="$HOME/ztest.dat"
while IFS= read -r line ;do echo "$line" ;done <<'END_DAT' >"$sed_dat"
foo bar \
bash \
baz
dude \
happy
yabba dabba 
doo
END_DAT

#######################################
sedexec="$HOME/ztest.sed"
while IFS= read -r line ;do echo "$line" ;done <<'END-SED' >"$sedexec"; \
sed  -nf "$sedexec" "$sed_dat"

  s/\\$//        # If a line has trailing '\', remove the '\'
                 #    
  t'Hold-append' # branch: Branch conditionally to the label 'Hold-append'
                 #         The condition is that a replacement was made.
                 #         The current pattern-space had a trailing '\' which  
                 #         was replaced, so branch to 'Hold-append' and append
                 #         the now-truncated line to the hold-space
                 #
                 # This branching occurs for each (successive) such line. 
                 #
                 # PS. The 't' command may be so named because it means 'on true' 
                 #     (I'm not sure about this, but the shoe fits)  
                 #
                 # Note: Appending to the hold-space introduces a leading '\n'   
                 #       delimiter for each appended line
                 #  
                 #   eg. compare the hex dump of the following 4 example commands:
                 #       'x' swaps the hold and pattern spaces
                 #
                 #       echo -n "a" |sed -ne         'p' |xxd -p  ## 61 
                 #       echo -n "a" |sed -ne     'H;x;p' |xxd -p  ## 0a61
                 #       echo -n "a" |sed -ne   'H;H;x;p' |xxd -p  ## 0a610a61
                 #       echo -n "a" |sed -ne 'H;H;H;x;p' |xxd -p  ## 0a610a610a61

   # No replacement was made above, so the current pattern-space
   #   (input line) has a "normal" ending.

   x             # Swap the pattern-space (the just-read "normal" line)
                 #   with the hold-space. The hold-space holds the accumulation
                 #   of appended "stripped-of-backslash" lines

   G             # The pattern-space now holds zero to many "stripped-of-backslash" lines
                 #   each of which has a preceding '\n'
                 # The 'G' command Gets the Hold-space and appends it to 
                 #   the pattern-space. This append action introduces another
                 #   '\n' delimiter to the pattern space. 

   s/\n//g       # Remove all '\n' newlines from the pattern-space

   p             # Print the pattern-space

   s/.*//        # Now we need to remove all data from the pattern-space
                 # This is done as a means to remove data from the hold-space 
                 #  (there is no way to directly remove data from the hold-space)

   x             # Swap the no-data pattern space with the hold-space
                 # This leaves the hold-space re-initialized to empty...
                 # The current pattern-space will be overwritten by the next line-read

   b             # Everything is ready for the next line-read. It is time to make
                 # an unconditional branch to the end of processing for this line,
                 #  ie. skip any remaining logic, read the next line and start the process again.

  :'Hold-append' # The ':' (colon) indicates a label.. 
                 # A label is the target of the 2 branch commands, 'b' and 't'
                 # A label can be a single letter (it is often 'a')
                 # Note;  'b' can be used without a label as seen in the previous command 

    H            # Append the pattern to the hold buffer
                 # The pattern is prefixed with a '\n' before it is appended

END-SED
#######
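For quick experiments, the same sequence of commands can be condensed into a single invocation. This is only a sketch of the annotated script above with the comments removed: the function name `join_continued_hold` and the shorter label `hold` are choices made for this illustration.

```shell
# Hypothetical condensed form of the annotated hold-space script.
join_continued_hold() {
  sed -n \
    -e 's/\\$//' \
    -e 't hold' \
    -e 'x;G;s/\n//g;p' \
    -e 's/.*//;x;b' \
    -e ':hold' \
    -e 'H' "$@"
}

# Reads standard input when no file name is given.
printf 'foo bar \\\nbash \\\nbaz\ndude \\\nhappy\n' | join_continued_hold
```

As in the annotated version, nothing is printed until a line with a normal ending arrives, so input that ends on a continuation line leaves its accumulated text in the hold space, unprinted.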
Peter.O
  • Neurino's solution is pretty simple actually. Speaking of mildly complicated sed, [this may interest you](http://stackoverflow.com/questions/5930246/what-does-this-sed-expression-from-todo-sh-do). – Gilles 'SO- stop being evil' May 24 '11 at 16:41
2

Yet another common command line tool would be ed, which by default modifies files in-place and therefore leaves file permissions unmodified (for more information on ed see Editing files with the ed text editor from scripts)

str='
foo bar \
bash 1 \
bash 2 \
bash 3 \
bash 4 \
baz
dude \
happy
xxx
vvv 1 \
vvv 2 \
CCC
'

# We are using (1,$)g/re/command-list and (.,.+1)j to join lines ending with a '\'
# ?? repeats the last regex search, scanning backward from the current line.
# replace ',p' with 'wq' to edit files in-place
# (using Bash and FreeBSD ed on Mac OS X)
cat <<-'EOF' | ed -s <(printf '%s' "$str")
H
,g/\\$/s///\
.,.+1j\
??s///\
.,.+1j
,p
EOF
verdo
2

Using the fact that read in the shell will interpret backslashes when used without -r:

$ while IFS= read line; do printf '%s\n' "$line"; done <file
foo bar bash baz
dude happy

Note that this will also interpret any other backslash in the data.
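The caveat is easy to reproduce. In the sketch below, `join_with_read` is a hypothetical wrapper around the same loop, and the sample input is invented for the demonstration: the trailing backslash joins the two lines, but the backslash in `C:\temp` is consumed as well.

```shell
# Hypothetical wrapper around the read loop shown above.
join_with_read() {
  # Deliberately no -r: read then treats backslash-newline as a line
  # continuation, and also strips the backslash from any other \x pair.
  while IFS= read line; do
    printf '%s\n' "$line"
  done
}

# The first line contains an ordinary backslash and a trailing one.
printf '%s\n' 'C:\temp \' 'next' | join_with_read
```

The result is the single line `C:temp next`, so this approach is only suitable when line continuations are the only backslashes in the data.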

Kusalananda
1

The Mac version based on @Gilles' solution would look like this:

sed ':x
/\\$/{N; s|\\'$'\\n||; tx
}' textfile

The main difference is how newlines are represented; combining it any further into one line breaks it.

Andy
1

A simple(r) solution that loads the whole file in memory:

sed -z 's/\\\n//g' file                   # GNU sed 4.2.2+.

Or a still-short one that works one (output) line at a time (GNU syntax):

sed ':x;/\\$/{N;bx};s/\\\n//g' file

On one line (POSIX syntax):

sed -e :x -e '/\\$/{N;bx' -e '}' -e 's/\\\n//g' file

Or use awk (if the file is too big to fit in memory):

awk '{a=sub(/\\$/,"");printf("%s%s",$0,a?"":RS)}' file
-1

You can use cpp, but it produces some empty lines where it merged the output, and some introduction which I remove with sed - maybe it can be done with cpp-flags and options as well:

echo 'foo bar \
bash \
baz
dude \
happy' | cpp | sed 's/# 1 .*//;/^$/d'
foo bar bash baz
dude happy
user unknown
  • Are you sure `cpp` _is_ a solution? In your example the `echo` with string in double-quotes already outputs straightened text, so `cpp` is pointless. (This also applies to your `sed` code.) If you put the string in single-quotes, `cpp` just removes the backslashes but does not concatenate the lines. (The concatenation with `cpp` would work if there were no space before the backslashes, but then the separate words would be joined without separators.) – manatwork May 22 '12 at 09:18
  • @manatwork: Outsch! :) I was astonished, that the sed command worked, but of course, it wasn't the sed command, but the bash itself interprets backslash-linebreak as continuation of the previous line. – user unknown May 22 '12 at 11:41
  • Using `cpp` like that still does not concatenate the lines for me. And the use of `sed` is definitely unnecessary. Use `cpp -P`: “`-P` Inhibit generation of linemarkers in the output from the preprocessor.” – man cpp – manatwork May 22 '12 at 11:57
  • Your command doesn't work for me: `cpp: “-P: No such file or directory cpp: warning: '-x c' after last input file has no effect cpp: unrecognized option '-P:' cpp: no input files` A `cpp --version` reveals `cpp (Ubuntu 4.4.3-4ubuntu5.1) 4.4.3` - what? Ubuntu is patching cpp? Why? I would have expected to read GNU... – user unknown May 22 '12 at 12:14
  • Interesting. Ubuntu's `cpp` indeed concatenates the lines and leaves some blanks. Even more interesting, the same version 4.4.3-4ubuntu5.1 here accepts `-P`. However it only eliminates the linemarkers, the empty lines remain. – manatwork May 22 '12 at 12:38