I am splitting a CSV file so that the first 3 columns are common to all the output files.
input file:
h1 h2 h3 o1 o2 ....
a b c d e ....
a1 b1 c1 d1 e1 ....
output files:
o1.csv:
h1 h2 h3 o1
a b c d
a1 b1 c1 d1
o2.csv:
h1 h2 h3 o2
a b c e
a1 b1 c1 e1
So if there are n columns in the input file, the code creates n-3 output files. However, my code is inefficient and quite slow: it takes 20 seconds for 50,000 rows.
old_IFS=$IFS
START_TIME=$(date)
DELIMITER=,
# reading and writing headers
headers_line=$(head -n 1 "$csv_file")
IFS=$DELIMITER read -r -a headers <<< "$headers_line"
common_headers=${headers[0]}$DELIMITER${headers[1]}$DELIMITER${headers[2]}
for header in "${headers[@]:3}"
do
    # writing headers to every file
    echo "$common_headers$DELIMITER$header" > "$header$START_TIME".csv
done
# reading csv file line by line
i=1
while IFS=$DELIMITER read -r -a row_data
do
    test $i -eq 1 && ((i++)) && continue # ignoring headers
    j=0
    common_data=${row_data[0]}$DELIMITER${row_data[1]}$DELIMITER${row_data[2]}
    for val in "${row_data[@]:3}"
    do
        # appending row to every new csv file (one file open per value)
        echo "$common_data$DELIMITER$val" >> "${headers[j+3]}$START_TIME".csv
        ((j++))
    done
done < "$csv_file"
IFS=${old_IFS}
Any suggestions are appreciated.
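For comparison, here is a minimal sketch of a single-pass alternative using awk. It is not a tested replacement for the script above. It assumes the fields never contain quoted or embedded commas and that every row has as many columns as the header. The function name `split_columns` and its optional `suffix` argument are hypothetical, introduced only for illustration. The idea is to keep each output file open for the whole run instead of reopening it for every value.

```shell
#!/usr/bin/env bash
# Sketch only: split a CSV into one file per column after the third,
# reading the input once. Assumes no quoted/embedded commas in fields.
# split_columns and its suffix argument are hypothetical names.
split_columns() {
    local csv_file=$1 suffix=${2-}
    awk -F, -v OFS=, -v suffix="$suffix" '
        NR == 1 {
            # Header row: remember the column count and one output
            # file name per column after the third.
            n = NF
            for (i = 4; i <= n; i++) out[i] = $i suffix ".csv"
        }
        {
            # Header and data rows are written the same way:
            # the three common columns followed by column i.
            for (i = 4; i <= n; i++) print $1, $2, $3, $i > (out[i])
        }
    ' "$csv_file"
}
```

Calling `split_columns "$csv_file" "$START_TIME"` would produce file names like those in the script above. Inside awk, `>` truncates a file only when it is first opened and then keeps it open, so each output file is opened once. With a very large number of columns, some awk implementations may reach their limit on simultaneously open files.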