Using Raku (formerly known as Perl_6)
~$ raku -MText::CSV -e '
my @a = csv(in => $*IN, sep => "|", escape_char => "", allow_loose_quotes => 1);
my $index = @a>>.[0].pairs.sort(*.value).map: *.key;
@a = @a.[$index.cache]; csv(in => @a, out => $*OUT, sep => "|");' < file
Here the best approach is probably to use a real CSV parser. Raku is a programming language in the Perl family. Raku's Text::CSV module was developed by the same author/maintainer as Perl 5's Text::CSV_XS module (H. Merijn Brand).
The file is read in using the module's high-level csv() subroutine. Data is taken from $*IN (STDIN). The separator is set to match the file. The two other input parameters, escape_char => "" and allow_loose_quotes => 1, accept quoting as-is from the input file (this is a trick to force double-quoting of any unescaped double-quotes; see the first link at the bottom). The data is stored in @a, an array.
Once you have the data stored in an array, you don't need to substitute @@@ in place of |. The code above gives the sorted output shown below as Sample Output 1 (sorting here is assumed to be row-wise sorting based on Column 1). Note that for row-wise sorting you create an $index based on the @a>>.[0] values in the 1st column, then use @a.[$index.cache] to apply the index row-wise.
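A minimal sketch of the index-then-slice idea described above, using core Raku only (no Text::CSV). The two rows are made-up data standing in for parsed CSV, and the names @pairs, @index and @sorted are illustrative, not part of the one-liner:

```raku
# Two rows, already split into fields (hypothetical data, no CSV parsing here).
my @a = ["Z", "second"], ["A", "first"];

# First-column values paired with their row numbers: 0 => "Z", 1 => "A".
my @pairs = @a>>.[0].pairs;

# Sort the pairs by value and keep only the row numbers.
my @index = @pairs.sort(*.value).map(*.key);

# Slice the rows in that order.
my @sorted = @a[@index];

say @index;          # [1 0]
.say for @sorted;    # [A first], then [Z second]
```

The sketch keeps the index in an array (@index), which can be used as a slice directly; the one-liner stores it in a scalar ($index) and calls .cache on it before slicing.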
Sample Input:
Z|"Per Sara Porras.|, LLC"|column2_data|column3_data
A|column1_data|"column2|data"|"column3|data"
Sample Output 1 (from code at top):
A|column1_data|"column2|data"|"column3|data"
Z|"Per Sara Porras.|, LLC"|column2_data|column3_data
Maybe you want row-wise sorting while substituting @@@ in place of |. However, this has the effect of removing quotes from the output (for any fields without whitespace), as follows:
Sample Output 2:
~$ raku -MText::CSV -e '
my @a = csv(in => $*IN, sep => "|", escape_char => "", allow_loose_quotes => 1);
@a = @a>>.map(*.subst(:global, / \| /, "@@@"));
my $index = @a>>.[0].pairs.sort(*.value).map: *.key;
@a = @a.[$index.cache]; csv(in => @a, out => $*OUT, sep => "|");' < file
A|column1_data|column2@@@data|column3@@@data
Z|"Per Sara Porras.@@@, LLC"|column2_data|column3_data
You could get around this by always-quote-ing the output, i.e. add always-quote => True to the end of the final statement. Let's combine this approach with a method for taking the separator from the environment: just look it up in Raku's %*ENV hash:
Sample Output 3:
~$ env ifs="|" raku -MText::CSV -e '
my @a = csv(in => $*IN, sep => %*ENV<ifs>, escape_char => "", allow_loose_quotes => 1);
@a = @a>>.map(*.subst(:global, / \| /, "@@@"));
my $index = @a>>.[0].pairs.sort(*.value).map: *.key;
@a = @a.[$index.cache]; csv(in => @a, out => $*OUT, sep => "|", always-quote => True);' < file
"A"|"column1_data"|"column2@@@data"|"column3@@@data"
"Z"|"Per Sara Porras.@@@, LLC"|"column2_data"|"column3_data"
Note: You can get back the quoting you started with by replacing the embedded | bar character with a character sequence containing whitespace, such as @@ @@ (instead of @@@). Here the CSV parser will, by default, only quote columns that contain whitespace:
Sample Output 4:
~$ env ifs="|" raku -MText::CSV -e '
my @a = csv(in => $*IN, sep => %*ENV<ifs>, escape_char => "", allow_loose_quotes => 1);
@a = @a>>.map(*.subst(:global, / \| /, "@@ @@"));
my $index = @a>>.[0].pairs.sort(*.value).map: *.key;
@a = @a.[$index.cache]; csv(in => @a, out => $*OUT, sep => "|");' < file
A|column1_data|"column2@@ @@data"|"column3@@ @@data"
Z|"Per Sara Porras.@@ @@, LLC"|column2_data|column3_data
Finally, if you really want column-wise sorting, the only way that makes sense is to sort on an index row such as a header row. For ideas on column-wise sorting, see the second link below.
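As a rough illustration of sorting columns by a header row, here is a core-Raku sketch (no Text::CSV). It assumes the first row is the header; the data and the names @index and @sorted are hypothetical, and this is not necessarily the approach taken in the linked answer:

```raku
# Hypothetical table: the first row is the header used as the sort key.
my @a = ["c", "a", "b"], [3, 1, 2], [6, 4, 5];

# Column numbers ordered by header value.
my @index = @a[0].pairs.sort(*.value).map(*.key);

# Apply the same column order to every row, header included.
my @sorted = @a.map(-> @row { [ @row[@index] ] });

.say for @sorted;    # [a b c], then [1 2 3], then [4 5 6]
```

This mirrors the row-wise case: build an index from one row instead of one column, then slice each row with it.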
https://unix.stackexchange.com/a/775855/227738
https://unix.stackexchange.com/a/746864/227738
https://raku.org