A better paste command

Question

I have the following two files. (I padded the lines with dots so that every line in a file has the same width, and made file1 all caps to make things clearer.)

contents of file1:

ETIAM......
SED........
MAECENAS...
DONEC......
SUSPENDISSE

contents of file2:

Lorem....
Proin....
Nunc.....
Quisque..
Aenean...
Nam......
Vivamus..
Curabitur
Nullam...

Notice that file2 is longer than file1.

When I run this command:

paste file1 file2

I get this output:

ETIAM...... Lorem....
SED........ Proin....
MAECENAS... Nunc.....
DONEC...... Quisque..
SUSPENDISSE Aenean...
    Nam......
    Vivamus..
    Curabitur
    Nullam...

What can I do to get the following output?

ETIAM...... Lorem....
SED........ Proin....
MAECENAS... Nunc.....
DONEC...... Quisque..
SUSPENDISSE Aenean...
            Nam......
            Vivamus..
            Curabitur
            Nullam...

I tried

paste file1 file2 | column -t

but it does this:

ETIAM......  Lorem....
SED........  Proin....
MAECENAS...  Nunc.....
DONEC......  Quisque..
SUSPENDISSE  Aenean...
Nam......
Vivamus..
Curabitur
Nullam...

Not as ugly as the original output, but still wrong column-wise.

`paste` is using tabs in front of the lines from second file. You may have to use a postprocessor to align the columns appropriately. — unxnut, Nov 05 '13 at 14:17
`paste file[12] | column -s $'\t' -t -o ' '` or have I missed something? — , Feb 24 '21 at 17:03
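
To check the first comment's point, the tab that `paste` inserts can be made visible with the `l` command of `sed`, which prints a tab as `\t` and marks each line end with `$`. This is only a small sketch on shortened sample files; the variable names `dir` and `visible` are illustrative, not part of the question.

```shell
# Shortened versions of the question's files, in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'ETIAM......' > "$dir/file1"
printf '%s\n' 'Lorem....' 'Nam......' > "$dir/file2"

# sed -n l prints each line unambiguously: tabs become \t, line ends become $.
visible=$(paste "$dir/file1" "$dir/file2" | sed -n l)
printf '%s\n' "$visible"
```

The second output line begins with `\t`: the missing file1 field is an empty string followed by one tab, not padding to the column width.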

Mark Plotnick · Accepted Answer · 2018-12-07T16:07:32.330

20

Assuming you don't have any tab characters in your files,

paste file1 file2 | expand -t 13

with the arg to -t suitably chosen to cover the desired max line width in file1.

OP has added a more flexible solution:

I did this so it works without the magic number 13:

paste file1 file2 | expand -t $(( $(wc -L <file1) + 2 ))

It's not easy to type but can be used in a script.
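
Since `wc -L` is not part of POSIX, a possible variation is to compute the width with awk instead. The following is a sketch under that assumption, using the question's words without the dot padding; `dir`, `width` and `aligned` are illustrative names. Note that awk's `length` counts characters, which matches the display width only for simple text such as this.

```shell
# Sample data in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' ETIAM SED MAECENAS DONEC SUSPENDISSE > "$dir/file1"
printf '%s\n' Lorem Proin Nunc Quisque Aenean Nam Vivamus Curabitur Nullam > "$dir/file2"

# Longest line of file1, found with awk rather than wc -L.
width=$(awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }' "$dir/file1")

# A tab stop two columns past the longest line leaves a two-space gap.
aligned=$(paste "$dir/file1" "$dir/file2" | expand -t "$((width + 2))")
printf '%s\n' "$aligned"
```

Adding 1 instead of 2 gives a one-space gap. Adding nothing would not work, because a tab always advances at least one column and would jump to the following stop.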

edited Dec 07 '18 at 16:07

answered Nov 05 '13 at 15:02

nice! I didn't know about expand before I read your answer :) – TabeaKischka Dec 07 '18 at 14:30

score 4 · Answer 2 · edited May 23 '17 at 12:39

I thought awk might do it nicely, so I googled "awk reading input from two files" and found an article on stackoverflow to use as a starting point.

First is the condensed version, then a fully commented one below it. This took more than a few minutes to work out. I'd be glad of some refinements from smarter folks.

awk '{if(length($0)>max)max=length($0)}
FNR==NR{s1[FNR]=$0;next}{s2[FNR]=$0}
END { format = "%-" max "s\t%-" max "s\n";
  numlines=(NR-FNR)>FNR?NR-FNR:FNR;
  for (i=1; i<=numlines; i++) { printf format, s1[i], s2[i] }
}' file1 file2

And here is the fully documented version of the above.

# 2013-11-05 [email protected]
# Invoke thus:
#   awk -f this_file file1 file2
# The result is what you asked for and the columns will be
# determined by input file order.
#----------------------------------------------------------
# No matter which file we're reading,
# keep track of max line length for use
# in the printf format.
#
{ if ( length($0) > max ) max=length($0) }

# FNR is record number in current file
# NR is record number over all
# while they are equal, we're reading the first file
#   and we load the strings into array "s1"
#   and then go to the "next" line in the file we're reading.
FNR==NR { s1[FNR]=$0; next }

# and when they aren't, we're reading the
#   second file and we put the strings into
#   array s2
{s2[FNR]=$0}

# At the end, after all lines from both files have
# been read,
END {
  # use the max line length to create a printf format
  # with the right widths
  format = "%-" max "s\t%-" max "s\n"
  # and figure the number of array elements we need
  # to cycle through in a for loop.
  numlines=(NR-FNR)>FNR?NR-FNR:FNR;
  for (i=1; i<=numlines; i++) {
     # an array element that was never set prints as an empty string
     printf format, s1[i], s2[i]
  }
}
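
One caveat of the `FNR==NR` idiom used above: if the first file is empty, the condition is also true while the second file is being read. A hedged sketch of the difference, comparing it with a test on `FILENAME`; the file names `empty` and `two` and the counters `n1` and `n2` are made up for the illustration:

```shell
dir=$(mktemp -d)
: > "$dir/empty"                      # first file has no lines
printf '%s\n' a b > "$dir/two"        # second file has two lines

# FNR==NR: the lines of the second file are counted as if they came from the first.
by_nr=$(awk 'FNR==NR { n1++; next } { n2++ } END { print n1+0, n2+0 }' "$dir/empty" "$dir/two")

# FILENAME==ARGV[1]: the lines are attributed to the file they came from.
by_name=$(awk 'FILENAME==ARGV[1] { n1++; next } { n2++ } END { print n1+0, n2+0 }' "$dir/empty" "$dir/two")

echo "FNR==NR:  $by_nr"
echo "FILENAME: $by_name"
```

With a non-empty first file both tests behave the same, so this only matters if file1 may be empty.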

+1 this is the only answer that does work with arbitrary input (i.e. with lines that may contain tabs). I don't think this could be significantly refined/improved. — don_crissti, Feb 15 '17 at 21:21

score 3 · Answer 3 · answered Nov 06 '13 at 17:06

On Debian and derivatives, column has a -n nomerge option that allows column to do the right thing with empty fields. Internally, column uses the wcstok(wcs, delim, ptr) function, which splits a wide character string into tokens delimited by the wide characters in the delim argument.

wcstok starts by skipping wide characters in delim, before recognizing the token. The -n option uses an algorithm that doesn't skip initial wide characters in delim.
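
The effect of skipping leading delimiters can be illustrated by analogy with awk's field splitting. This is not column's code, only a sketch of the same behaviour: awk's default splitting ignores leading whitespace, while an explicit tab separator keeps the empty first field.

```shell
# Default splitting: the leading tab is skipped, so "Nam" becomes field 1.
merged=$(printf '\tNam\n' | awk '{ print NF ":" $1 }')

# Explicit tab separator: the empty first field is kept, "Nam" is field 2.
kept=$(printf '\tNam\n' | awk -F '\t' '{ print NF ":" $1 }')

echo "$merged"
echo "$kept"
```

The first command reports one field and the second reports two, with field 1 empty.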

Unfortunately, this isn't very portable: -n is Debian-specific, and column is not in POSIX; it's apparently a BSD thing.

Jeff Taylor · Answer 4 · 2017-02-15T20:54:18.120

2

Taking out the dots that you used for padding:

file1:

ETIAM
SED
MAECENAS
DONEC
SUSPENDISSE

file2:

Lorem
Proin
Nunc
Quisque
Aenean
Nam
Vivamus
Curabitur
Nullam

Try this:

$ ( echo ".TS"; echo "l l."; paste file1 file2; echo ".TE" ) | tbl | nroff | more

And you will get:

ETIAM         Lorem
SED           Proin
MAECENAS      Nunc
DONEC         Quisque
SUSPENDISSE   Aenean
              Nam
              Vivamus
              Curabitur
              Nullam
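
To see what the subshell hands to `tbl`, the generated input can be printed on its own. `.TS` and `.TE` delimit a table, the format line `l l.` declares two left-adjusted columns, and `tbl` separates columns at tabs by default, which is exactly what `paste` emits. A sketch on shortened files, with `dir` and `tbl_input` as illustrative names:

```shell
dir=$(mktemp -d)
printf '%s\n' ETIAM SED > "$dir/file1"
printf '%s\n' Lorem Proin Nunc > "$dir/file2"

# Same wrapper as in the answer, but captured instead of piped to tbl.
tbl_input=$( ( echo ".TS"; echo "l l."; paste "$dir/file1" "$dir/file2"; echo ".TE" ) )

# Show the tabs explicitly as \t.
printf '%s\n' "$tbl_input" | sed -n l
```

The row for `Nunc` starts with a tab, so `tbl` sees an empty first cell and still places `Nunc` in the second column.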

edited Feb 15 '17 at 20:54

answered Feb 15 '17 at 20:47

This, like the other solutions using `paste` will fail to print the proper output if there are any lines containing tabs. +1 for being different though – don_crissti Feb 15 '17 at 21:12
+1. Would you please explain how the solution works? – Tulains Córdova Feb 15 '17 at 22:45

score 2 · Answer 5 · answered Nov 05 '13 at 14:21

Not a very good solution but I was able to do it using

paste file1 file2 | sed 's/^TAB/&&/'

where TAB is replaced with the tab character.
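
A way to avoid typing a literal tab is to put one in a shell variable first. The sketch below assumes 8-column tab stops, imitated here with `expand -t 8`; the names `tab`, `doubled` and `shown` are illustrative. Under that assumption the trick lines up only when the file1 lines are 8 to 15 characters long, because one tab after such a line and two tabs from the start of a line both end at column 16.

```shell
dir=$(mktemp -d)
printf '%s\n' 'ETIAM......' > "$dir/file1"
printf '%s\n' 'Lorem....' 'Nam......' > "$dir/file2"

# A literal tab character, usable inside the sed expression.
tab=$(printf '\t')

# & stands for the matched text, so a leading tab is replaced by two tabs.
doubled=$(paste "$dir/file1" "$dir/file2" | sed "s/^$tab/&&/")

# Render the tabs the way a terminal with 8-column tab stops would.
shown=$(printf '%s\n' "$doubled" | expand -t 8)
printf '%s\n' "$shown"
```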

answered Nov 05 '13 at 14:21

unxnut

What is the role of `&&` in the sed command? – Vombat Nov 05 '13 at 14:57
A single `&` puts what is being searched for (a tab in this case). This command simply replaces the tab at the beginning with two tabs. – unxnut Nov 05 '13 at 15:59
I had to change `TAB` to `\t` to make this work in zsh on Ubuntu. And it only works if the lines in file1 are shorter than 15 chars – rubo77 Nov 30 '13 at 06:53

score 1 · Answer 6 · edited Dec 02 '13 at 18:30

An awk solution that should be fairly portable, and should work for an arbitrary number of input files:

# Invoke thus:
#   awk -F\\t -f this_file file1 file2

# every time we read a new file, FNR goes to 1

FNR==1 {
    curfile++                       # current file
}

# read all files and save all the info we'll need
{
    column[curfile,FNR]=$0          # save current line
    nlines[curfile]++               # number of lines in current file
    if (length > len[curfile])
            len[curfile] = length   # max line length in current file
}

# finally, show the lines from all files side by side, as a table
END {
    # iterate through lines until there are no more lines in any file
    for (line = 1; !end; line++) {
            $0 = _
            end = 1

            # iterate through all files, we cannot use
            #   for (file in nlines) because arrays are unordered
            for (file=1; file <= curfile; file++) {
                    # columnate corresponding line from each file
                    $0 = $0 sprintf("%*s" FS, len[file], column[file,line])
                    # at least some file had a corresponding line
                    if (nlines[file] >= line)
                            end = 0
            }

            # don't print a trailing empty line
            if (!end)
                    print
    }
}

How do you use this on file1 and file2? I called the script `paste-awk` and tried `paste file1 file2|paste-awk` and I tried `awk paste-awk file1 file2` but none worked. — rubo77, Nov 30 '13 at 07:04
I get `awk: Line:1: (FILENAME=file1 FNR=1) Fatal: Division by zero` — rubo77, Nov 30 '13 at 07:04
@rubo77: `awk -f paste-awk file1 file2` should work, at least for GNU awk and mawk. — ninjalj, Dec 02 '13 at 10:32
This works, although it is slightly different from `paste`: there is less space between the two columns. And if the lines in an input file do not all have the same length, that column comes out right-aligned. — rubo77, Dec 02 '13 at 14:14
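
The right alignment mentioned in the last comment comes from the printf-style conversion: without a `-` flag, a string is padded on the left. A minimal sketch of the difference, building the format by string concatenation rather than with `%*s`; the width `w` and the sample word are arbitrary choices for the illustration:

```shell
# No flag: the string is right-aligned within the field.
right=$(awk 'BEGIN { w = 8; printf "[%" w "s]\n", "Nam" }')

# With the - flag: the string is left-aligned within the field.
left=$(awk 'BEGIN { w = 8; printf "[%-" w "s]\n", "Nam" }')

echo "$right"
echo "$left"
```

Using a `-` flag in the script's `sprintf` call would presumably give left-aligned columns, though that change is a suggestion and has not been confirmed by the answer's author.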
