I have a CSV file with about 25 columns. Some rows contain 26 columns, so I want to find the lines that contain the extra column and remove it, so that I can use awk on the whole file.

Fields are separated by semicolons (;). The extra column has the form VARNAME="Text is here", where the quoted value is arbitrary text.

I managed to remove the VARNAME from all lines, but I can't work out a pattern that matches the arbitrary value (the quoted text).

My goal is to find the lines with that extra column (VARNAME="Text is here") and remove it.

Example:

Current file:

ROW1: VAR1:"Value 1";VAR2="Value 2";VAR3="Value 3"
ROW2: VAR1:"Value 4";VAR2="Value 5";VAREXT="Different Values";VAR3="Value 6"

Target File should be:

ROW1: VAR1:"Value 1";VAR2="Value 2";VAR3="Value 3"
ROW2: VAR1:"Value 4";VAR2="Value 5";VAR3="Value 6"
1
  • You wrote: "to search for the lines that contain that extra column". Please post the exact value of the extra column you are searching for. Commented Jun 1, 2017 at 14:26

2 Answers

2

You can use something like:

sed 's/;VAREXT.[^;]*//' file  #combine with -i for in-place editing

Testing:

a=$'"ROW2: VAR1:"Value 4";VAR2="Value 5";VAREXT="Different Values";VAR3="Value 6"'
b=$'"ROW2: VAR1:"Value 4";VAR2="Value 5";VAREXT="1234567";VAR3="Value 6"'
c=$'"ROW2: VAR1:"Value 4";VAR2="Value 5";VAREXT="VAREXT";VAR3="Value 6"'

echo "$a" |sed 's/;VAREXT.[^;]*//'
echo "$b" |sed 's/;VAREXT.[^;]*//'
echo "$c" |sed 's/;VAREXT.[^;]*//'

"ROW2: VAR1:"Value 4";VAR2="Value 5";VAR3="Value 6"
"ROW2: VAR1:"Value 4";VAR2="Value 5";VAR3="Value 6"
"ROW2: VAR1:"Value 4";VAR2="Value 5";VAR3="Value 6"
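If the quoted value may itself contain a semicolon, one possible variant is to match up to the closing double quote rather than up to the next semicolon. This is only a sketch: it assumes the value never contains an embedded double quote and that the field is always written as VAREXT="...". The sample line and the helper name strip_varext are made up for illustration.

```shell
# Hypothetical sample line: the VAREXT value contains a literal semicolon.
line='ROW2: VAR1:"Value 4";VAR2="Value 5";VAREXT="Different; Values";VAR3="Value 6"'

# Match ;VAREXT=" then everything up to the next double quote,
# then the closing quote itself.
# Assumption: the quoted value contains no embedded double quotes.
strip_varext() {
    sed 's/;VAREXT="[^"]*"//'
}

result=$(printf '%s\n' "$line" | strip_varext)
printf '%s\n' "$result"
```

Lines without a VAREXT field pass through unchanged, because the substitution simply does not match them.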
3
  • What if the VAREXT value had a literal ; inside the double quotes?
    – user218374
    Commented Jun 1, 2017 at 17:52
  • @RakeshSharma I suppose that will fail, as all solutions using ; as delimiter would fail too. Commented Jun 1, 2017 at 18:26
  • The dot in your regular expression ;VAREXT.[^;]* is redundant or meaningless in this pattern matching.
    – Murmulodi
    Commented Jun 1, 2017 at 21:13
1

Assuming your CSV has no header, no spaces after the semicolons, and at most one VAREXT... field per line, try the following, based on your sample:

sed 's/;VAREXT="[A-Za-z0-9 ]*"//' in.csv

Here the value of VAREXT may consist of any combination of letters, digits and spaces.
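To check that every line ends up with the same number of fields, it may help to count the semicolon-separated fields before and after the substitution. The following is a rough sketch using the two sample rows from the question; the helper name field_counts is hypothetical, and on the real file the expected count would be 25 rather than 3.

```shell
# Two-line sample mirroring the rows shown in the question.
sample='ROW1: VAR1:"Value 1";VAR2="Value 2";VAR3="Value 3"
ROW2: VAR1:"Value 4";VAR2="Value 5";VAREXT="Different Values";VAR3="Value 6"'

# Print the distinct per-line field counts as a comma-separated list,
# using ; as the field separator.
field_counts() {
    awk -F';' '{ print NF }' | sort -u | paste -sd, -
}

before=$(printf '%s\n' "$sample" | field_counts)
after=$(printf '%s\n' "$sample" | sed 's/;VAREXT="[A-Za-z0-9 ]*"//' | field_counts)

printf 'before: %s\n' "$before"
printf 'after:  %s\n' "$after"
```

A single number in the "after" line suggests that all rows now have the same number of columns, which is what awk needs for consistent field positions.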
