I have two tab-separated files, each with two columns. I want to create a third file containing the rows whose column 1 value appears in both files. To do so, I first load file 1 into an array, then read file 2 and check each column 1 value against the array. However, the array keys never seem to match. The details are below.
The first 3 lines of the files look like this:
File 1:
90001 raw acceleration data
2634 Heavy DIY
1011 Light DIY
File 2:
2634 218263
25680 44313
25681 44313
To show that there are overlaps in column 1 of the two files:
user@cluster:~> grep 90001 file2
90001 103662
user@cluster:~> grep 2634 file2
2634 218263
To create file 3, I tried this first, which yielded an empty file.
awk 'BEGIN {FS = "\t"; OFS= "\t"}
NR==FNR {a[$1]=$2; next}
{ if($1 in a) print $1, a[$1]}' file1 file2 > file3
The following code suggests that the array index is not being recognized: with the else branch added, every line of file2 is printed into file3, so the `in` test never succeeds.
awk 'BEGIN {FS = "\t"; OFS= "\t"}
NR==FNR {a[$1]=$2; next}
{if($1 in a)
print $1, a[$1]
else
print $1, $2}' file1 file2 > file3
I am quite puzzled. I wonder what might have caused the issue and how I can fix it? Thanks in advance.
What is the output of LC_ALL=C sed -n l file1 file2? (l being lowercase L, not the digit 1)
90001\r\traw acceleration data$
I wonder what \r\t is doing there. Sorry, I am not familiar with sed. Thanks.
\r is part of the field.
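Going by that sed output, column 1 of file1 ends in a carriage return (\r), so the key stored in the array is "90001\r" rather than "90001" and never equals the clean key read from file2. One possible fix, sketched here, is to strip carriage returns from every record before the comparison. The two sample files are made up to mirror the question, and it is an assumption that every line of file1 carries the stray \r; only the awk command is the actual suggestion.

```shell
# Hypothetical sample inputs modelled on the question.
# file1 has a stray carriage return at the end of column 1.
printf '90001\r\traw acceleration data\n2634\r\tHeavy DIY\n1011\r\tLight DIY\n' > file1
printf '2634\t218263\n25680\t44313\n90001\t103662\n' > file2

awk 'BEGIN { FS = OFS = "\t" }
     { gsub(/\r/, "") }             # drop every CR; awk then re-splits the fields
     NR == FNR { a[$1] = $2; next } # file1: remember column 2 keyed by column 1
     $1 in a { print $1, a[$1] }    # file2: print keys that also occur in file1
' file1 file2 > file3
```

With the carriage returns gone, the keys from both files compare equal and file3 should contain the overlapping rows. If the carriage returns are not wanted anywhere, removing them once with tr -d '\r' < file1 > file1.clean (file1.clean being just an illustrative name) and running the original awk command on the cleaned file should work as well.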