Remove specific column if exists in CSV file

Question

I have a CSV file with about 25 columns. Some rows contain 26 columns, so I want to find the lines that contain the extra column and remove it, so that I can process the whole file with awk.

Fields are separated by a semicolon (;). The extra column has the format VARNAME="Text is here", where the quoted value is arbitrary text.

I managed to remove the VARNAME from all lines, but I can't come up with a pattern that matches the arbitrary value (the quoted text).

My goal is to find the lines with that extra column (VARNAME="Text is here") and remove it.

Example:

Current file:

ROW1: VAR1:"Value 1";VAR2="Value 2";VAR3="Value 3"
ROW2: VAR1:"Value 4";VAR2="Value 5";VAREXT="Different Values";VAR3="Value 6"

The target file should be:

ROW1: VAR1:"Value 1";VAR2="Value 2";VAR3="Value 3"
ROW2: VAR1:"Value 4";VAR2="Value 5";VAR3="Value 6"

You wrote: "to search for the lines that contain that extra column". Please post the exact value of the extra column you are searching for. – RomanPerekhrest, Commented Jun 1, 2017 at 14:26

George Vasiliou · Accepted Answer · 2017-06-01 14:46:11Z

2

You can use something like:

sed 's/;VAREXT.[^;]*//' file  #combine with -i for in-place editing
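For the in-place variant mentioned in the comment, a cautious sketch might keep a backup of the original file. The `-i` option is not part of POSIX sed; the `-i.bak` spelling is assumed here because both GNU and BSD sed accept it. The temporary directory and the file name `in.csv` are placeholders for illustration only.

```shell
# Work on a throwaway copy so the sketch is safe to run anywhere.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'ROW1: VAR1:"Value 1";VAR2="Value 2";VAR3="Value 3"' \
  'ROW2: VAR1:"Value 4";VAR2="Value 5";VAREXT="Different Values";VAR3="Value 6"' \
  > "$tmpdir/in.csv"

# -i.bak edits the file in place and keeps the original as in.csv.bak
# (assumption: a sed that supports -i with an attached suffix).
sed -i.bak 's/;VAREXT.[^;]*//' "$tmpdir/in.csv"

cat "$tmpdir/in.csv"
```

If the result looks wrong, the untouched data is still available in `in.csv.bak`.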

Testing:

a=$'"ROW2: VAR1:"Value 4";VAR2="Value 5";VAREXT="Different Values";VAR3="Value 6"'
b=$'"ROW2: VAR1:"Value 4";VAR2="Value 5";VAREXT="1234567";VAR3="Value 6"'
c=$'"ROW2: VAR1:"Value 4";VAR2="Value 5";VAREXT="VAREXT";VAR3="Value 6"'

echo "$a" |sed 's/;VAREXT.[^;]*//'
echo "$b" |sed 's/;VAREXT.[^;]*//'
echo "$c" |sed 's/;VAREXT.[^;]*//'

"ROW2: VAR1:"Value 4";VAR2="Value 5";VAR3="Value 6"
"ROW2: VAR1:"Value 4";VAR2="Value 5";VAR3="Value 6"
"ROW2: VAR1:"Value 4";VAR2="Value 5";VAR3="Value 6"

answered Jun 1, 2017 at 14:46


What if the VAREXT value had a literal ; inside the double quotes?
– user218374
Commented Jun 1, 2017 at 17:52
@RakeshSharma I suppose that will fail, as would every solution that uses ; as the delimiter.
– George Vasiliou
Commented Jun 1, 2017 at 18:26
The dot in your regular expression ;VAREXT.[^;]* is redundant here, since [^;]* already matches the = that follows VAREXT.
– Murmulodi
Commented Jun 1, 2017 at 21:13
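The semicolon-inside-quotes case raised in these comments can be sketched by matching up to the closing double quote rather than up to the next semicolon. This is only a sketch under two assumptions: the value never contains an embedded double quote, and VAREXT is never the first field on the line (the pattern expects a leading ;). The sample line is made up for illustration.

```shell
line='ROW2: VAR1:"Value 4";VAR2="Value 5";VAREXT="a;b";VAR3="Value 6"'

# [^;]* stops at the semicolon inside the quotes and leaves debris behind.
naive=$(printf '%s\n' "$line" | sed 's/;VAREXT.[^;]*//')

# [^"]* consumes everything up to the closing quote instead.
quoted=$(printf '%s\n' "$line" | sed 's/;VAREXT="[^"]*"//')

printf '%s\n' "$naive" "$quoted"
```

The first result still contains the fragment `b"` as a stray field, while the second matches the target format from the question.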


Murmulodi · Accepted Answer · 2017-06-01 15:01:10Z

1

Assuming your CSV has no header, there are no spaces after the semicolons, and there is at most one VAREXT... field per line, then for your sample, try:

sed 's/;VAREXT="[A-Za-z0-9 ]*"//' in.csv

This matches a VAREXT value made up of letters, digits, and spaces. The double quotes need no backslash inside a single-quoted sed expression.
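Since the goal in the question is a uniform column count for awk, one way to check the result is to tally the number of semicolon-separated fields per line before and after the substitution. This is a rough sketch that assumes no semicolons occur inside quoted values; the variable names are illustrative.

```shell
sample=$(printf '%s\n' \
  'ROW1: VAR1:"Value 1";VAR2="Value 2";VAR3="Value 3"' \
  'ROW2: VAR1:"Value 4";VAR2="Value 5";VAREXT="Different Values";VAR3="Value 6"')

# Print the field count of every line, then collapse to the distinct counts.
before=$(printf '%s\n' "$sample" \
  | awk -F';' '{print NF}' | sort -u | paste -s -d ' ' -)
after=$(printf '%s\n' "$sample" \
  | sed 's/;VAREXT="[A-Za-z0-9 ]*"//' \
  | awk -F';' '{print NF}' | sort -u | paste -s -d ' ' -)

echo "field counts before: $before"
echo "field counts after:  $after"
```

A single distinct count after the substitution suggests every line now has the same number of columns.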

edited Jun 1, 2017 at 15:01

answered Jun 1, 2017 at 14:23
