
I have a CSV file like this:

HEADER
"first, column"|"second "some random quotes" column"|"third ol' column"
FOOTER

and I am looking for a result like this:

HEADER
first, column|second "some random quotes" column|third ol' column

In other words, I want to remove the FOOTER line, the quote at the beginning and end of each line, and the quotes around each |.

So far this code works:

sed '/FOOTER/d' csv > csv1 &&   # remove FOOTER
sed 's/^"//' csv1 > csv2 &&     # remove quote at the beginning
sed 's/"$//' csv2 > csv3 &&     # remove quote at the end
sed 's/"|"/|/g' csv3 > csv4     # remove quotes around pipe

As you can see, the problem is that it creates four extra files.
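For comparison, a minimal sketch of the same four sed steps connected with pipes instead of intermediate files, so each step reads the previous step's output directly. The function name clean_csv is hypothetical; the sample input is the one from the question.

```shell
#!/bin/sh
# Hypothetical helper: the four sed steps from the question, chained with
# pipes so no csv1..csv3 files are needed.
clean_csv() {
    sed '/FOOTER/d' |     # remove the FOOTER line
    sed 's/^"//' |        # remove the quote at the beginning of each line
    sed 's/"$//' |        # remove the quote at the end of each line
    sed 's/"|"/|/g'       # remove the quotes around each pipe
}

# Demo on the sample data. The quoted 'EOF' stops the shell from
# interpreting anything in the data. For a real file: clean_csv < csv > csv4
clean_csv <<'EOF'
HEADER
"first, column"|"second "some random quotes" column"|"third ol' column"
FOOTER
EOF
```

Each sed still runs as its own process, but only the final output is written anywhere.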

Here is another attempt, which aims to do the same thing in a single script without creating extra files. It doesn't work very well:

#!/bin/ksh

sed '/begin/, /end/ { 
        /FOOTER/d
        s/^\"//
        s/\"$//
        s/\"|\"/|/g 
}' csv > csv4
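The likely reason this script doesn't work is the /begin/, /end/ address: it limits the commands in braces to ranges that start at a line matching begin and end at a line matching end, and the sample file contains no such lines, so nothing is changed. A minimal sketch with that address dropped, so the commands apply to every line (clean_csv is a hypothetical name):

```shell
#!/bin/sh
# Same commands as the ksh attempt, minus the /begin/, /end/ range,
# so sed applies them to every input line.
clean_csv() {
    sed '
/FOOTER/d
s/^"//
s/"$//
s/"|"/|/g
'
}

# For the file from the question: clean_csv < csv > csv4
clean_csv <<'EOF'
HEADER
"first, column"|"second "some random quotes" column"|"third ol' column"
FOOTER
EOF
```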
  • Since your fields are quoted, they can contain newlines, and your sed is not going to work with that; it only works with simplified CSV. Use a programming language with a library that can handle real CSV files (Python/Perl/Ruby). Commented Sep 12, 2015 at 12:53
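To illustrate the comment's point, a small sketch on a made-up record whose second field ends with a line break. Because sed works one line at a time, the record is processed as two unrelated lines and a stray quote survives. The function name strip_quotes is hypothetical.

```shell
#!/bin/sh
# Line-oriented quote stripping, as in the question.
strip_quotes() {
    sed 's/^"//; s/"$//; s/"|"/|/g'
}

# Made-up record with three fields: one, "two<newline>", three.
printf '%s\n' '"one"|"two' '"|"three"' | strip_quotes
# prints:
#   one|two
#   |"three
# On the second line, s/^"// removes the closing quote of field 2, so
# s/"|"/|/g no longer matches and the opening quote of three is kept.
```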

3 Answers


First of all, as Michael showed, you can just combine all of these into a single command:

sed '/^FOOTER/d; s/^"//; s/"$//; s/"|"/|/g' csv > csv1

I think some sed implementations can't cope with ;-separated commands and might need:

sed -e '/^FOOTER/d' -e 's/^"//' -e 's/"$//' -e 's/"|"/|/g' csv > csv1

That said, it looks like your fields are delimited by | and you just want to remove the " around each entire field, leaving any quotes within the field. In that case, you could do:

$ sed '/FOOTER/d; s/\(^\||\)"/\1/g; s/"\($\||\)/\1/g' csv 
HEADER
first, column|second "some random quotes" column|third ol' column

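Or, with GNU sed's -r option (extended regular expressions). The \| alternation in the previous command is not part of POSIX basic regular expressions either, so both versions are best run with GNU sed: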

sed -r '/FOOTER/d; s/(^|\|)"/\1/g; s/"($|\|)/\1/g' csv 
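A minimal sketch of the same extended-regex substitution using -E, which recent GNU sed accepts as a synonym for -r and which BSD/macOS sed also supports, wrapped in a hypothetical strip_field_quotes function. Because each pattern is anchored on a field boundary (^, $ or |), only the quotes at the edges of a field are removed.

```shell
#!/bin/sh
# Hypothetical helper: drop FOOTER lines, then remove a quote only when it
# touches a field boundary (start of line, end of line, or |).
strip_field_quotes() {
    sed -E '/FOOTER/d; s/(^|\|)"/\1/g; s/"($|\|)/\1/g'
}

strip_field_quotes <<'EOF'
HEADER
"first, column"|"second "some random quotes" column"|"third ol' column"
FOOTER
EOF
```

Quotes in the middle of a field, such as those around some random quotes, are preceded and followed by spaces rather than a boundary, so they are left alone.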

You could also use Perl:

$ perl -F'\|' -lane 'next if /FOOTER/; s/^"|"$//g for @F; print join "|", @F' csv
HEADER
first, column|second "some random quotes" column|third ol' column

This would also work:

sed 's/^"//; s/"|"/|/g; s/"$//'

Example:

$ echo '"this"|" and "ths""|" and "|" this 2"|" also "this", "thi", "and th""' |
sed 's/^"//; s/"|"/|/g; s/"$//'
this| and "ths"| and | this 2| also "this", "thi", "and th"

Formatted over several lines, with $d added to delete the last line (the footer):

sed '
s/^"//
s/"|"/|/g
s/"$//
$d
' csv
    This doesn't deal with the footer. Commented Sep 12, 2015 at 14:57
    But that will remove the last line no matter what its contents. If there is no FOOTER, it will remove wanted data. Commented Sep 12, 2015 at 15:46
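To address the second comment, a minimal sketch that deletes the last line only when it is exactly FOOTER, so input without a footer keeps its final record. It assumes the footer is the literal text FOOTER on a line of its own, strips the closing quote with s/"$//, and uses a hypothetical function name, keep_rows.

```shell
#!/bin/sh
# Hypothetical helper: the last line is deleted only if it is exactly
# FOOTER; every other line just has its field quotes stripped.
keep_rows() {
    sed '
${
/^FOOTER$/d
}
s/^"//
s/"|"/|/g
s/"$//
'
}

# With a footer: the FOOTER line is dropped.
printf '%s\n' HEADER '"a"|"b"' FOOTER | keep_rows
# Without a footer: the last data line is kept.
printf '%s\n' HEADER '"a"|"b"' | keep_rows
```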

The sed command that worked for me is:

sed 's/ALA/A/g;s/CYS/C/g;s/ASP/D/g;s/GLU/E/g;s/PHE/F/g;s/GLY/G/g;s/HIS/H/g;s/HID/H/g;s/HIE/H/g;s/ILE/I/g;s/LYS/K/g;s/LEU/L/g;s/MET/M/g;s/ASN/N/g;s/PRO/P/g;s/GLN/Q/g;s/ARG/R/g;s/SER/S/g;s/THR/T/g;s/VAL/V/g;s/TRP/W/g;s/TYR/Y/g;s/MSE/X/g;s/ //g'  < old.txt > new.fasta

Separate sed commands can be piped together, but there is no need to: all the substitutions can be given to a single sed command, separated by ;.
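A minimal sketch with a made-up input and three of the substitutions from the command above, showing that a pipeline of separate sed invocations gives the same result as one ;-separated sed command. The single command just avoids starting several processes.

```shell
#!/bin/sh
# Made-up sample using three of the residue names from the answer above.
input='ALA CYS GLY'

# One sed invocation with ;-separated commands.
one=$(printf '%s\n' "$input" | sed 's/ALA/A/g;s/CYS/C/g;s/GLY/G/g;s/ //g')

# The same substitutions as a pipeline of separate sed invocations.
piped=$(printf '%s\n' "$input" | sed 's/ALA/A/g' | sed 's/CYS/C/g' |
    sed 's/GLY/G/g' | sed 's/ //g')

echo "$one"      # prints ACG
echo "$piped"    # prints ACG
```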
