I'm using the following gawk script to read the values in the first column of the CSV file file.csv.
I use gawk because its FPAT variable keeps commas embedded in quoted fields from being treated as field separators.

col=`gawk ' 
BEGIN {
FPAT="([^,]*)|(\"[^\"]*\")+"
}
{print $1 }' file.csv`

However, I noticed that if the first-column value in the last row is an empty string (or whitespace), this method drops it.

For example, if the file.csv is the following:

col1,col2
"a,a","a,a1" 
"b","b1" 
,"c1"  

The result would be

col1
a,a
b 

instead of

col1
a,a
b 

What can I do to fix this issue?

Thank you!

Related post: Reading empty string from CSV file in BASH

  • The command substitution (backticks) strips trailing newlines. What is your end goal here? There may be better ways than trying to grab a multiline string into a shell variable. Commented Jul 29, 2021 at 20:53
  • I'm processing the values later on, so it's important for me to know the value in every row, even if it's an empty string. I want to read a column from a CSV file into a shell variable, which I then turn into a shell array such as arr=("a,a" "b" "")
    – lilek3
    Commented Jul 29, 2021 at 21:01
  • If your version of bash supports process substitution, I suggest you skip the scalar variable and read the lines straight into the array: arr=(); while IFS= read -r line; do arr+=("$line"); done < <(gawk ...), as we discussed in Putting string with newline separated words into array. Commented Jul 29, 2021 at 22:25
  • It's not without reason that CSV has never been embraced on *nix (to my knowledge). It is a terrible format for text processing. Are these static files? Could you convert them to another format first? Is there any option to get whatever generates these files to use another format? Even TSV would likely be a lot better. E.g. unix.stackexchange.com/q/359832/140633
    – ibuprofen
    Commented Jul 30, 2021 at 0:06
  • @EdMorton Indeed. This fails termbin.com/mxkx - but this is OK termbin.com/senr , I'll have to look more at that later. Thanks for the heads up.
    – ibuprofen
    Commented Jul 30, 2021 at 17:05

1 Answer

As mentioned in the comments under your previous question, this has nothing to do with CSVs or your awk script, it's all about how you're saving the output of a command.

$ printf 'a\nb\n\n'
a
b

$ col=$(printf 'a\nb\n\n')
$ printf '%s' "$col"
a
b$

$ col=$(printf 'a\nb\n\n'; printf x)
$ printf '%s' "$col"
a
b

x$
$ col="${col%x}"
$ printf '%s' "$col"
a
b

$

Note that with the above you're getting the whole output of the command saved in the variable, including the final newline that command substitution would have stripped off. If you want to remove a final newline too then do a subsequent:

$ col="${col%$'\n'}"
$ echo "$col"
a
b

$ printf '%s' "$col"
a
b
$

The reason to remove the x and the \n in 2 steps rather than doing a single col="${col%$'\n'x}" is that the single step would fail if the command had produced no output, or output that didn't end in a \n, because then \nx wouldn't exist in col:

Right:

$ col=$(printf 'a'; printf x)
$ col="${col%x}"
$ col="${col%$'\n'}"
$ printf '%s' "$col"
a$

Wrong:

$ col=$(printf 'a'; printf x)
$ col="${col%$'\n'x}"
$ printf '%s' "$col"
ax$
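
The two-step cleanup can also be wrapped in a small function. This is only a minimal sketch, assuming bash (it relies on local, printf -v and "${@:2}"); the name capture is hypothetical, not a standard utility, and the wrapped command's exit status is not preserved.

```shell
# Hypothetical helper: run a command and store its complete output,
# trailing newlines included, in the variable named by the first argument.
capture() {
    local __out
    # The sentinel "x" keeps command substitution from stripping newlines.
    __out=$("${@:2}"; printf x)
    # Drop the sentinel and assign the result to the caller's variable.
    printf -v "$1" '%s' "${__out%x}"
}

capture col printf 'a\nb\n\n'

# Optional second step: remove exactly one final newline, if present.
trimmed=${col%$'\n'}
```

Because the sentinel and the newline are removed separately, this also behaves sensibly when the command prints nothing or its output does not end in a newline.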

To learn more about the issue take a look at "Command Substitution" in:

  1. The POSIX standard's Shell Command Language section where it says:

The shell shall expand the command substitution by executing command in a subshell environment (see Shell Execution Environment) and replacing the command substitution (the text of command plus the enclosing "$()" or backquotes) with the standard output of the command, removing sequences of one or more <newline> characters at the end of the substitution.

  2. https://mywiki.wooledge.org/CommandSubstitution where it discusses the issue further and provides the workaround I used above.
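
If the end goal is a shell array, as stated in the comments on the question, the scalar variable can be skipped altogether. The following is a sketch rather than a definitive solution, assuming bash 4 or later for mapfile; first_column is a hypothetical stand-in for the gawk command from the question, used here only so the example runs without file.csv.

```shell
# Hypothetical stand-in for: gawk '...' file.csv
# It prints four lines, the last of which is empty.
first_column() { printf 'col1\na,a\nb\n\n'; }

# mapfile reads one line per array element; -t drops each line's newline.
# No command substitution is involved, so the empty last line survives.
mapfile -t arr < <(first_column)

# Show the resulting array, including the empty element.
declare -p arr
```

On a bash without mapfile, a while IFS= read -r line loop that appends each line to the array should give the same result.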