Combining input from two files in awk

Question

I've read a bit about awk. It's proven to be extremely useful for single data files. Suppose I have two input files:

## inp1
x y
1 3
2 4
6 9
... 

## inp2
x z
1 5
2 19
6 9

I want to output something that 'combines' both files. Something like:

## output
x y z
1 3 5
2 4 19
6 9 9

I can think of ideas like interleaving these two files, like this: https://stackoverflow.com/questions/4011814/how-to-interleave-lines-from-two-text-files and doing something with awk.

Or maybe something using associative arrays? I'm not too sure however, which is the reason for this question ;).

I am using Linux.

There are some cases that should be explicitly stated, such as: what should happen if inp1 has a line for x=4 and inp2 does not, or vice versa; and if one file (or both) has two lines for x=7? — Paul_Pedant, Commented Jun 13, 2020 at 15:09
@Paul_Pedant I agree, and not only that. It was not explicitly stated if the files should be "combined" using the x column (or should it be the first column, regardless of the header?). I hope I am not being pedant, but I think otherwise the problem is not quite defined. — Quasímodo, Commented Jun 13, 2020 at 22:00

terdon · Accepted Answer · 2020-06-13 15:08:04Z

Sounds like you're simply looking for join to join the files on the first field:

$ join -j1 file1 file2 
x y z
1 3 5
2 4 19
6 9 9

Note that join expects its input to be sorted, so you might need to do:

$ join -j1 <(sort file1) <(sort file2 )
1 3 5
2 4 19
6 9 9
x y z

However, that will screw up your header (it gets sorted along with the data), so to avoid that you could do:

$ join -j1 <(head -n1 file1) <(head -n1 file2); join -j1 <(tail -n+2 file1 | sort) <(tail -n+2 file2 | sort)
x y z
1 3 5
2 4 19
6 9 9

And to save that to a new file:

(
    join -j1 <(head -n1 file1) <(head -n1 file2)
    join -j1 <(tail -n+2 file1 | sort) <(tail -n+2 file2 | sort)
) > newFile
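
As for the case raised in the comments, where a key exists in only one of the files: by default join silently drops such lines. Below is a minimal sketch of keeping them instead. The sample data (keys 4 and 7), the temporary file names, and the NA placeholder are all made up for illustration, and it uses temporary files rather than process substitution so that it also runs in a plain POSIX shell.

```shell
# Hypothetical sample data: key 4 exists only in file1, key 7 only in file2.
tmp=$(mktemp -d)
printf '%s\n' 'x y' '1 3' '2 4' '4 8' > "$tmp/file1"
printf '%s\n' 'x z' '1 5' '2 19' '7 2' > "$tmp/file2"

# Split each file into its header line and its sorted body,
# because join needs sorted input and the header must stay on top.
head -n 1 "$tmp/file1" > "$tmp/head1"
head -n 1 "$tmp/file2" > "$tmp/head2"
tail -n +2 "$tmp/file1" | sort > "$tmp/body1"
tail -n +2 "$tmp/file2" | sort > "$tmp/body2"

# -a 1 -a 2     also print unpaired lines from both files
# -e NA         fill missing fields with the placeholder NA
# -o 0,1.2,2.2  print the join field, then field 2 of each file
result=$(
    join "$tmp/head1" "$tmp/head2"
    join -a 1 -a 2 -e NA -o 0,1.2,2.2 "$tmp/body1" "$tmp/body2"
)
printf '%s\n' "$result"
rm -rf "$tmp"
```

With this data the unpaired keys should come out as `4 8 NA` and `7 NA 2`. The `-e` filler is only applied when an explicit `-o` list is given, which is why both options appear together.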

Alternatively, with awk:

$ awk 'NR==FNR{a[$1]=$2; next}{print $1,$2,a[$1]}' file2 file1 
x y z
1 3 5
2 4 19
6 9 9
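
In case the one-liner looks cryptic, here is the same idea spelled out with comments. It is only a sketch: the array name `z`, the temporary files, and the NA placeholder for keys that are missing from file2 are my own choices for illustration, not anything awk requires.

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'x y' '1 3' '2 4' '6 9' > "$tmp/file1"
printf '%s\n' 'x z' '1 5' '2 19' '6 9' > "$tmp/file2"

merged=$(awk '
    # While reading the first file named on the command line (file2),
    # the global record number NR equals the per-file number FNR.
    # Store column 2 in an associative array indexed by column 1.
    NR == FNR { z[$1] = $2; next }

    # Every later record comes from file1: print it, followed by the
    # stored value, or NA (an arbitrary placeholder) if the key is unknown.
    { print $1, $2, (($1 in z) ? z[$1] : "NA") }
' "$tmp/file2" "$tmp/file1")
printf '%s\n' "$merged"
rm -rf "$tmp"
```

Unlike join, this does not need sorted input, and the header needs no special handling because it is matched on its own key (`x`). One caveat: if the first file is empty, `NR == FNR` is also true for the second file, so the idiom assumes a non-empty first file.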

I'd prefer the awk answer... Thanks!! – Suraaj K S, Commented Jun 13, 2020 at 15:13

Ed Morton · Accepted Answer · 2020-06-14 12:44:38Z

With the sample input you provided all you need is this if the fields are tab-separated:

$ paste file1 <(cut -f2 file2)
x   y   z
1   3   5
2   4   19
6   9   9

or this if separated by a single blank:

$ paste -d' ' file1 <(cut -d' ' -f2 file2)
x y z
1 3 5
2 4 19
6 9 9
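
Note that paste simply glues lines together by position, so this relies on both files listing the same keys in the same order. If you are not sure that holds, a quick check along these lines might help. It is a sketch assuming single-blank separators; the temporary file names are made up.

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'x y' '1 3' '2 4' '6 9' > "$tmp/file1"
printf '%s\n' 'x z' '1 5' '2 19' '6 9' > "$tmp/file2"

# Extract the key column (field 1) of each file.
cut -d' ' -f1 "$tmp/file1" > "$tmp/keys1"
cut -d' ' -f1 "$tmp/file2" > "$tmp/keys2"

# cmp -s prints nothing and succeeds only if the two key columns are identical.
if cmp -s "$tmp/keys1" "$tmp/keys2"; then
    cut -d' ' -f2 "$tmp/file2" > "$tmp/col2"
    pasted=$(paste -d' ' "$tmp/file1" "$tmp/col2")
else
    echo 'key columns differ; use join or awk instead' >&2
    pasted=''
fi
printf '%s\n' "$pasted"
rm -rf "$tmp"
```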

AdminBee · Accepted Answer · 2020-06-16 11:36:57Z

Using awk:

Method 1

awk 'FNR==NR{a[FNR]=$2;next}{print $1,a[FNR],$2}' f1 f2

Method 2

awk 'NR==FNR{a[$1]=$2;next}($1 in a){print $1,a[$1],$2}' f1 f2

Output

answered Jun 16, 2020 at 7:36

Praveen Kumar BS
