18

I have the following two files (I padded the lines with dots so every line in a file is the same width, and made file1 all caps to make it clearer).

contents of file1:

ETIAM......
SED........
MAECENAS...
DONEC......
SUSPENDISSE

contents of file2:

Lorem....
Proin....
Nunc.....
Quisque..
Aenean...
Nam......
Vivamus..
Curabitur
Nullam...

Notice that file2 is longer than file1.

When I run this command:

paste file1 file2

I get this output:

ETIAM...... Lorem....
SED........ Proin....
MAECENAS... Nunc.....
DONEC...... Quisque..
SUSPENDISSE Aenean...
    Nam......
    Vivamus..
    Curabitur
    Nullam...

What can I do to get the following output?

ETIAM...... Lorem....
SED........ Proin....
MAECENAS... Nunc.....
DONEC...... Quisque..
SUSPENDISSE Aenean...
            Nam......
            Vivamus..
            Curabitur
            Nullam...

I tried

paste file1 file2 | column -t

but it does this:

ETIAM......  Lorem....
SED........  Proin....
MAECENAS...  Nunc.....
DONEC......  Quisque..
SUSPENDISSE  Aenean...
Nam......
Vivamus..
Curabitur
Nullam...

Not as ugly as the original output, but still wrong column-wise.

  • paste is using tabs in front of the lines from second file. You may have to use a postprocessor to align the columns appropriately. Commented Nov 5, 2013 at 14:17
  • paste file1 file2 | column -tn ? Commented Nov 5, 2013 at 14:18
  • does file1 always have fixed size columns? Commented Nov 5, 2013 at 14:18
  • @RSFalcon7 Yes, it does. Commented Nov 6, 2013 at 0:42
  • paste file[12] | column -s $'\t' -t -o ' ' or have I missed something? Commented Feb 24, 2021 at 17:03

9 Answers

21

Assuming you don't have any tab characters in your files,

paste file1 file2 | expand -t 13

with the argument to -t chosen to be greater than the length of the longest line in file1 (13 leaves a two-space gap after the 11-character lines).

OP has added a more flexible solution:

I did this so it works without the magic number 13:

paste file1 file2 | expand -t $(( $(wc -L <file1) + 2 ))

It's not easy to type but can be used in a script.
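wc -L is not specified by POSIX, so the command above may fail on systems whose wc lacks it. A minimal sketch, assuming only POSIX awk, paste and expand, measures the longest line of file1 with awk instead. The function name paste_aligned is hypothetical, and awk's length counts characters (bytes in some implementations) rather than display columns, so wide characters can still misalign.

```shell
# paste_aligned FILE1 FILE2 (hypothetical helper name).
# Pads the first column to the longest line of FILE1 plus two spaces,
# using POSIX awk, paste and expand instead of the non-POSIX "wc -L".
# Assumes neither file contains tab characters.
paste_aligned() {
    # Length of the longest line in FILE1 ("+ 0" prints 0 for an empty file).
    width=$(awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }' "$1")
    # paste joins the lines with a tab; expand replaces each tab with
    # spaces, using tab stops every (width + 2) columns.
    paste "$1" "$2" | expand -t "$((width + 2))"
}
```

Usage: paste_aligned file1 file2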

  • nice! I didn't know about expand before I read your answer :) Commented Dec 7, 2018 at 14:30
4

I thought awk might do it nicely, so I googled "awk reading input from two files" and found an article on Stack Overflow to use as a starting point.

First is the condensed version, then a fully commented one below it. This took more than a few minutes to work out. I'd be glad of some refinements from smarter folks.

awk '{if(length($0)>max)max=length($0)}
FNR==NR{s1[FNR]=$0;next}{s2[FNR]=$0}
END { format = "%-" max "s\t%-" max "s\n";
  numlines=(NR-FNR)>FNR?NR-FNR:FNR;
  for (i=1; i<=numlines; i++) { printf format, s1[i], s2[i] }
}' file1 file2

And here is the fully documented version of the above.

# 2013-11-05
# Invoke thus:
#   awk -f this_file file1 file2
# The result is what you asked for and the columns will be
# determined by input file order.
#----------------------------------------------------------
# No matter which file we're reading,
# keep track of max line length for use
# in the printf format.
#
{ if ( length($0) > max ) max=length($0) }

# FNR is record number in current file
# NR is record number over all
# while they are equal, we're reading the first file
#   and we load the strings into array "s1"
#   and then go to the "next" line in the file we're reading.
FNR==NR { s1[FNR]=$0; next }

# and when they aren't, we're reading the
#   second file and we put the strings into
#   array s2
{s2[FNR]=$0}

# At the end, after all lines from both files have
# been read,
END {
  # use the max line length to create a printf format
  # with the right widths
  format = "%-" max "s\t%-" max "s\n"
  # and figure the number of array elements we need
  # to cycle through in a for loop.
  numlines=(NR-FNR)>FNR?NR-FNR:FNR;
  for (i=1; i<=numlines; i++) {
     # missing entries print as empty strings (a line such as "0"
     # must not be tested for truth, or it would vanish)
     printf format, s1[i], s2[i]
  }
}
  • +1 this is the only answer that does work with arbitrary input (i.e. with lines that may contain tabs). I don't think this could be significantly refined/improved. Commented Feb 15, 2017 at 21:21
3

On Debian and derivatives, column has a -n nomerge option that allows column to do the right thing with empty fields. Internally, column uses the wcstok(wcs, delim, ptr) function, which splits a wide character string into tokens delimited by the wide characters in the delim argument.

wcstok starts by skipping wide characters in delim before recognizing the token. The -n option uses an algorithm that doesn't skip initial wide characters in delim.

Unfortunately, this isn't very portable: -n is Debian-specific, and column is not in POSIX; it's apparently a BSD thing.

2

Not a very good solution, but I was able to do it using

paste file1 file2 | sed 's/^TAB/&&/'

where TAB stands for a literal tab character.
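Typing a literal tab can be awkward in an interactive shell. A sketch that lets printf produce the tab, so the same sed expression works with any POSIX shell and sed (GNU sed also accepts \t in the pattern, but POSIX does not require it). The helper name paste_sed is hypothetical. As with the original, this only lines up when every line of file1 is 8 to 15 characters wide, so that its single tab reaches the same (second) default tab stop as the doubled tab.

```shell
# paste_sed FILE1 FILE2 (hypothetical helper name).
# Doubles the leading tab on lines where FILE1 had no line left, so that
# with 8-column tab stops the FILE2 text reaches the second tab stop.
paste_sed() {
    tab=$(printf '\t')    # a literal tab character
    # In the replacement, & stands for the matched text (the tab),
    # so && writes it twice.
    paste "$1" "$2" | sed "s/^$tab/&&/"
}
```

Usage: paste_sed file1 file2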

  • What is the role of && in the sed command? Commented Nov 5, 2013 at 14:57
  • A single & puts what is being searched for (a tab in this case). This command simply replaces the tab at the beginning with two tabs. Commented Nov 5, 2013 at 15:59
  • I had to change TAB to \t to make this work in zsh on Ubuntu debian. And it does only work if file1 has less than 15 chars Commented Nov 30, 2013 at 6:53
2

Taking out the dots that you used for padding:

file1:

ETIAM
SED
MAECENAS
DONEC
SUSPENDISSE

file2:

Lorem
Proin
Nunc
Quisque
Aenean
Nam
Vivamus
Curabitur
Nullam

Try this:

$ ( echo ".TS"; echo "l l."; paste file1 file2; echo ".TE" ) | tbl | nroff | more

And you will get:

ETIAM         Lorem
SED           Proin
MAECENAS      Nunc
DONEC         Quisque
SUSPENDISSE   Aenean
              Nam
              Vivamus
              Curabitur
              Nullam
  • This, like the other solutions using paste will fail to print the proper output if there are any lines containing tabs. +1 for being different though Commented Feb 15, 2017 at 21:12
  • +1. Would you please explain how the solution works? Commented Feb 15, 2017 at 22:45
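As for how it works: the subshell wraps paste's output in a tbl table. .TS and .TE start and end the table, "l l." declares two left-aligned columns (the period ends the format section), and tbl splits each data line on tabs by default, which is exactly what paste emits. An empty first field simply becomes an empty cell, so file2's extra lines stay in the second column; tbl turns the table into troff requests and nroff formats them for the terminal. To cope with input that contains tabs, a sketch can choose another cell separator with tbl's tab(x) global option and paste -d. The helper name tbl_source and the choice of "|" as separator are assumptions; the separator must not appear in either file.

```shell
# tbl_source FILE1 FILE2 (hypothetical helper name).
# Emits tbl input for a two-column, left-aligned table, using "|" as the
# cell separator instead of tab (assumes "|" appears in neither file).
tbl_source() {
    echo ".TS"        # start of table
    echo "tab(|);"    # global option: cells are separated by "|"
    echo "l l."       # two left-aligned columns; "." ends the format
    paste -d '|' "$1" "$2"
    echo ".TE"        # end of table
}
```

Usage: tbl_source file1 file2 | tbl | nroff | more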
1

An awk solution that should be fairly portable, and should work for an arbitrary number of input files:

# Invoke thus:
#   awk -F\\t -f this_file file1 file2

# every time we read a new file, FNR goes to 1

FNR==1 {
    curfile++                       # current file
}

# read all files and save all the info we'll need
{
    column[curfile,FNR]=$0          # save current line
    nlines[curfile]++               # number of lines in current file
    if (length > len[curfile])
            len[curfile] = length   # max line length in current file
}

# finally, show the lines from all files side by side, as a table
END {
    # iterate through lines until there are no more lines in any file
    for (line = 1; !end; line++) {
            $0 = _
            end = 1

            # iterate through all files, we cannot use
            #   for (file in nlines) because arrays are unordered
            for (file=1; file <= curfile; file++) {
                    # columnate corresponding line from each file
                    $0 = $0 sprintf("%*s" FS, len[file], column[file,line])
                    # at least some file had a corresponding line
                    if (nlines[file] >= line)
                            end = 0
            }

            # don't print a trailing empty line
            if (!end)
                    print
    }
}
  • How do you use this on file1 and file2? I called the script paste-awk and tried paste file1 file2|paste-awk and I tried awk paste-awk file1 file2 but none worked. Commented Nov 30, 2013 at 7:04
  • I get awk: Line:1: (FILENAME=file1 FNR=1) Fatal: Division by zero Commented Nov 30, 2013 at 7:04
  • @rubo77: awk -f paste-awk file1 file2 should work, at least for GNU awk and mawk. Commented Dec 2, 2013 at 10:32
  • This works, although it is slightly different from paste there is less space between the two rows. And if the input file has not all rows same length, it will result in an align-right row Commented Dec 2, 2013 at 14:14
  • @rubo77: the field separator can be set with -F\\t Commented Dec 2, 2013 at 15:30
1

You can use the pr command instead:

$ pr -mtT file1 file2
ETIAM......                         Lorem....
SED........                         Proin....
MAECENAS...                         Nunc.....
DONEC......                         Quisque..
SUSPENDISSE                         Aenean...
                                    Nam......
                                    Vivamus..
                                    Curabitur
                                    Nullam...

-m merges the files side by side, one column per file; -t omits the page header and trailer, and -T additionally eliminates pagination by form feeds. Check the man page for pr to see all options.

  • Beware it truncates lines wider than 34 columns (with the default page width of 72 columns) Commented Nov 24 at 20:46
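Since the truncation comes from pr sharing the page width between the merged columns, one way around it is a wider page via -w, which POSIX defines as the line width for multi-column output. In this sketch the width of 200 is an arbitrary assumption (pick one comfortably more than twice the longest line), -T is left out because POSIX pr only has -t, and the helper name pr_side_by_side is hypothetical.

```shell
# pr_side_by_side FILE1 FILE2 (hypothetical helper name).
# -m: one column per file; -t: no page header or trailer;
# -w 200: line width for multi-column output (default 72), so each
# column gets roughly half of it before pr starts truncating.
pr_side_by_side() {
    pr -m -t -w 200 "$1" "$2"
}
```

Usage: pr_side_by_side file1 file2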
1
$ paste file1 file2 | column -s$'\t' -t
ETIAM......  Lorem....
SED........  Proin....
MAECENAS...  Nunc.....
DONEC......  Quisque..
SUSPENDISSE  Aenean...
             Nam......
             Vivamus..
             Curabitur
             Nullam...

Or, if your input might contain tabs, use any POSIX awk:

$ awk '
    NR==FNR { n=length(); wid=(n>wid?n:wid); vals[NR]=$0; next }
    { printf "%-*s %s\n", wid, vals[FNR], $0 }
' file1 file2
ETIAM...... Lorem....
SED........ Proin....
MAECENAS... Nunc.....
DONEC...... Quisque..
SUSPENDISSE Aenean...
            Nam......
            Vivamus..
            Curabitur
            Nullam...
  • Beware the awk one assumes all characters are single-width (and with some awk implementations single-byte). Commented Nov 24 at 20:23
  • busybox awk doesn't support %*s in the build of busybox that comes with Debian here. Commented Nov 24 at 20:25
  • @StéphaneChazelas thanks for the heads up. Commented Nov 24 at 20:28
  • Doing printf "%"wid"s %s\n", vals[FNR], $0 would make it work in busybox awk. Commented Nov 24 at 20:30
  • Beware it prints nothing if file1 is empty. I've personally given up on using that unreliable NR==FNR trick and use awk '!file1_processed {...; next}; ...' file1 file1_processed=1 file2. Commented Nov 24 at 20:33
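Pulling those comments together, here is a sketch (not the answerer's code) that left-aligns the first column with %- so file1 lines of different widths still line up, builds the width into the format string instead of using %*s (for busybox awk), uses the file1_processed=1 command-line assignment suggested above instead of NR==FNR so an empty file1 is handled, and prints the leftover file1 lines when file1 is the longer file. The helper name awk_side_by_side is hypothetical, and widths still assume single-width characters.

```shell
# awk_side_by_side FILE1 FILE2 (hypothetical helper name).
awk_side_by_side() {
    awk '
        # While reading FILE1: remember each line and the widest width.
        !file1_processed {
            n = length($0); if (n > wid) wid = n
            vals[FNR] = $0; n1 = FNR
            next
        }
        # While reading FILE2: left-align the FILE1 line, then the FILE2 line.
        { printf "%-" wid "s %s\n", vals[FNR], $0; n2 = FNR }
        # FILE1 was longer: print its remaining lines on their own.
        END { for (i = n2 + 1; i <= n1; i++) print vals[i] }
    ' "$1" file1_processed=1 "$2"
}
```

Usage: awk_side_by_side file1 file2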
0

zsh's print builtin has a -C option to print arguments formatted in Columns.

$ f1=(${(f)"$(<file1)"}) f2=(${(f)"$(<file2)"})
$ f1[$#f2]+= f2[$#f1]+=
$ print -rC2 -- "$f1[@]" "$f2[@]"
ETIAM......  Lorem....
SED........  Proin....
MAECENAS...  Nunc.....
DONEC......  Quisque..
SUSPENDISSE  Aenean...
             Nam......
             Vivamus..
             Curabitur
             Nullam...
  • $(<file) like in ksh expands to the contents of file without the trailing newline characters.
  • the f parameter expansion flag, short for ps[\n], splits expansions (here applied to the above) on linefeeds.
  • With f1[$#f2]+= f2[$#f1]+= we ensure the two arrays are the same size: appending an empty string to element n of each array, where n is the size of the other array, creates that element if needed and extends the array accordingly.
  • print -rC2 -- "$f1[@]" "$f2[@]" prints those raw on 2 Columns.
