I have two files as follows:
file1.csv
+------------+----------+--------+---------+
| Account_ID | Asset_ID | LOT_ID | FLAG_F1 |
+------------+----------+--------+---------+
| 10000 | 20000 | 30000 | Y |
| 10001 | 20001 | 30001 | N |
| 10002 | 20002 | 30002 | Y |
| 10003 | 20003 | 30003 | N |
| 10004 | 20004 | 30004 | Y |
| 10005 | 20005 | 30005 | N |
| 10006 | 20006 | 30006 | Y |
+------------+----------+--------+---------+
file2.csv
+------------+----------+--------+---------+-----+-----+
| Account_ID | Asset_ID | LOT_ID | FLAG_F2 | XYZ | ABC |
+------------+----------+--------+---------+-----+-----+
| 10000 | 20000 | 30000 | Y | XYZ | ABC |
| 10001 | 20001 | 30001 | Y | XYZ | ABC |
| 10002 | 20002 | 30002 | Y | XYZ | ABC |
| 10003 | 20003 | 30003 | Y | XYZ | ABC |
| 10004 | 20004 | 30004 | Y | XYZ | ABC |
| 10005 | 20005 | 30005 | Y | XYZ | ABC |
| 10006 | 20006 | 30006 | Y | XYZ | ABC |
| 10006 | 20006 | 30006 | Y | XYZ | ABC |
| 10006 | 20006 | 30006 | Y | XYZ | ABC |
+------------+----------+--------+---------+-----+-----+
I am trying to get the following output:
+------------+----------+--------+---------+-----+-----+---------+
| Account_ID | Asset_ID | LOT_ID | FLAG_F2 | XYZ | ABC | FLAG_F1 |
+------------+----------+--------+---------+-----+-----+---------+
| 10000 | 20000 | 30000 | Y | XYZ | ABC | Y |
| 10001 | 20001 | 30001 | Y | XYZ | ABC | N |
| 10002 | 20002 | 30002 | Y | XYZ | ABC | Y |
| 10003 | 20003 | 30003 | Y | XYZ | ABC | N |
| 10004 | 20004 | 30004 | Y | XYZ | ABC | Y |
| 10005 | 20005 | 30005 | Y | XYZ | ABC | N |
| 10006 | 20006 | 30006 | Y | XYZ | ABC | Y |
| 10006 | 20006 | 30006 | Y | XYZ | ABC | Y |
| 10007 | 20007 | 30006 | Y | XYZ | ABC | |
| 10006 | 20003 | 30006 | Y | XYZ | ABC | |
+------------+----------+--------+---------+-----+-----+---------+
In the above output, I am adding FLAG_F1 from file1.csv to file2.csv wherever the Account_ID, Asset_ID, and LOT_ID values are equal in both files. If the condition fails, FLAG_F1 can be left blank.
I have tried the following awk code, taken from the question "two .csv files compare using awk":
awk -F',' '
    FNR == NR {
        if (FNR == 1) {next}
        a[$1] = $2;
        b[$1] = $3;
        next;
    }
    {
        if (FNR == 1) {print;next}
        if (a[$1] == $2) {
            print $1,$2,$3,b[$1];
        }
        else {
            print $1,a[$1],b[$1],b[$1];
        }
    }
' OFS=',' file1.csv file2.csv
It would help if someone could explain the above code line by line.
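As a starting point for that explanation, here is the same script with a comment on each step. It is only a sketch: it assumes both files are plain comma-separated text with a single header line (the boxed tables above being just a rendering), and the two-row sample files and the output name walkthrough_out.csv are made up for illustration.

```shell
# Tiny sample inputs (hypothetical rows, assumed comma-separated).
cat > file1.csv <<'EOF'
Account_ID,Asset_ID,LOT_ID,FLAG_F1
10000,20000,30000,Y
10001,20001,30001,N
EOF
cat > file2.csv <<'EOF'
Account_ID,Asset_ID,LOT_ID,FLAG_F2,XYZ,ABC
10000,20000,30000,Y,XYZ,ABC
10001,29999,30001,Y,XYZ,ABC
EOF

awk -F',' '
# FNR is the line number within the current file, NR the line number overall.
# They are equal only while the first file (file1.csv) is being read.
FNR == NR {
    if (FNR == 1) { next }   # skip the header of file1.csv
    a[$1] = $2               # remember Asset_ID, keyed by Account_ID
    b[$1] = $3               # remember LOT_ID, keyed by Account_ID
    next                     # do not fall through to the second block
}
# This block is reached only for lines of the second file (file2.csv).
{
    if (FNR == 1) { print; next }   # copy the header of file2.csv unchanged
    if (a[$1] == $2) {
        # Asset_ID agrees with file1.csv: print three fields of file2.csv
        # followed by the LOT_ID stored from file1.csv
        print $1, $2, $3, b[$1]
    } else {
        # Asset_ID differs: print the values stored from file1.csv instead
        print $1, a[$1], b[$1], b[$1]
    }
}
' OFS=',' file1.csv file2.csv > walkthrough_out.csv

cat walkthrough_out.csv
```

With these sample rows the data lines come out as 10000,20000,30000,30000 and 10001,20001,30001,30001. The script never reads field 4 of file1.csv (FLAG_F1) and matches on Account_ID alone, so as written it cannot produce the desired output.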
Read an awk book first, then come back with specific, separate questions about the things you don't understand.
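For the join described in the question (append FLAG_F1 when Account_ID, Asset_ID, and LOT_ID all match, otherwise leave it blank), one possible approach is to key an array on all three fields. This is a minimal sketch, not a tested solution for the real data: it assumes comma-separated input with one header line, no quoted fields containing commas, and Unix line endings. The sample rows are a subset chosen to match the desired output above, and joined.csv is a hypothetical output name.

```shell
# Sample inputs (assumed comma-separated; rows chosen for illustration).
cat > file1.csv <<'EOF'
Account_ID,Asset_ID,LOT_ID,FLAG_F1
10000,20000,30000,Y
10001,20001,30001,N
10006,20006,30006,Y
EOF
cat > file2.csv <<'EOF'
Account_ID,Asset_ID,LOT_ID,FLAG_F2,XYZ,ABC
10000,20000,30000,Y,XYZ,ABC
10001,20001,30001,Y,XYZ,ABC
10006,20006,30006,Y,XYZ,ABC
10006,20006,30006,Y,XYZ,ABC
10007,20007,30006,Y,XYZ,ABC
10006,20003,30006,Y,XYZ,ABC
EOF

awk -F',' -v OFS=',' '
# First file (file1.csv): store FLAG_F1 under the three-part key.
FNR == NR {
    if (FNR > 1) flag[$1, $2, $3] = $4
    next
}
# Second file (file2.csv), header line: append the new column name.
FNR == 1 { print $0, "FLAG_F1"; next }
# Second file, data lines: append the stored flag, or an empty field
# when the three-part key was not seen in file1.csv.
{
    if (($1, $2, $3) in flag)
        print $0, flag[$1, $2, $3]
    else
        print $0, ""
}
' file1.csv file2.csv > joined.csv

cat joined.csv
```

Rows of file2.csv that repeat the same three IDs each receive the same flag, and unmatched rows end with an empty trailing field. Testing with `in` rather than reading `flag[...]` directly avoids creating empty array entries for keys that are missing from file1.csv.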