Group by and sum in shell script without awk

Question

I have a file like:

$ cat input.csv
201,100
201,300
300,100
300,500
100,400

I want to sum the values in column 2 for rows that have the same value in column 1. The expected output is:

$ cat output.csv
201,400
300,600
100,400

I tried to do this with awk, but it does not work on Solaris. Please suggest an alternative.

On Solaris, use nawk or /usr/xpg4/bin/awk, or set PATH=`getconf PATH`:$PATH, since the awk in /bin is an ancient, non-standard one.
– Stéphane Chazelas, Commented Nov 21, 2014 at 11:11
The answers here focus on one-liners and custom scripts. For those looking for an existing utility, see this question: unix.stackexchange.com/q/85204/41737
– Nickolay, Commented Jun 17, 2015 at 12:51

Valentin Bajrami · Accepted Answer · 2014-11-21 13:59:49Z

6

I think this'll do:

awk 'BEGIN{FS=OFS=","}{a[$1]+=$2}END{ for (i in a) print i,a[i]}' input.csv

answered Nov 21, 2014 at 13:59


3
The title: "group by and sum in shell script without awk"
– jimmij, Commented Nov 21, 2014 at 14:25

3
The answer is great, AWK rocks!!!
– Kannan Mohan, Commented Nov 21, 2014 at 14:32

So use uniq or sort, but if the OP explicitly asks for a non-awk solution, I believe that should be respected.
– jimmij, Commented Nov 21, 2014 at 14:41

@jimmij I am curious about your sh answer. If you can achieve the above in sh only, I'll remove my answer!
– Valentin Bajrami, Commented Nov 21, 2014 at 14:43

1
@jimmij I'm late to the party, but the only reason the question says "without awk" seems to be that the user couldn't get their own code to do the right thing ("I tried to do this by awk command, but it is not working in Solaris"). Showing an awk command that does do the right thing would therefore be helpful, and even better if it would work with the default awk on Solaris...
– Kusalananda ♦, Commented Apr 23, 2025 at 6:33


jimmij · Accepted Answer · 2014-11-21 14:35:19Z

4

Pure bash, one-liner:

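unset x y sum; while IFS=, read -r x y; do ((sum[$x]+=y)); done < input.csv; for i in "${!sum[@]}"; do echo "$i,${sum[$i]}"; done

Or in more readable form:

unset x y sum
# sum is a sparse indexed array; the numeric first column is used as the index
while IFS=, read -r x y; do
    ((sum[$x]+=y))
done < input.csv
# the indices are listed in ascending numeric order
for i in "${!sum[@]}"; do
    echo "$i,${sum[$i]}"
done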

The result:

100,400
201,400
300,600

answered Nov 21, 2014 at 14:35


Kannan Mohan · Accepted Answer · 2014-11-21 14:31:19Z

0

This can also be done in Python. By default the program expects the file to be named 'file.txt'; change the name if needed.

#!/usr/bin/env python3

# Read the file, skip blank lines, and split the rows into two column lists.
with open('file.txt') as f:
    col1, col2 = [list(y) for y in zip(*[x.strip().split(',') for x in f if x.strip()])]

# For each key, fold duplicate rows into the last remaining occurrence.
for x in list(col1):
    while col1.count(x) > 1:
        # Remove the first occurrence and remember its value ...
        index = col1.index(x)
        col1.pop(index)
        value = int(col2.pop(index))

        # ... then add that value to the next occurrence of the same key.
        index = col1.index(x)
        col2[index] = int(col2[index]) + value

for x, y in zip(col1, col2):
    print(x, y, sep=',')

Output:

201,400
300,600
100,400

edited Nov 21, 2014 at 14:31

answered Nov 21, 2014 at 14:20


Just use sys.argv[1] for the filename, or read from sys.stdin if no filename is given.
– orion, Commented Feb 11, 2015 at 8:37
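
A minimal sketch of what orion's suggestion could look like, using a dictionary to accumulate the sums instead of the list manipulation in the answer. The names group_sum and main are illustrative and not taken from the answer. The sketch assumes well-formed lines with exactly two comma-separated fields and an integer in the second one. It also relies on dictionaries keeping insertion order (Python 3.7 and later), so keys are printed in the order they first appear in the input.

```python
#!/usr/bin/env python3
"""Group rows by the first CSV column and sum the second column."""
import sys


def group_sum(lines):
    """Return {key: total}, with keys in first-seen order."""
    totals = {}
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        key, value = line.split(",")
        totals[key] = totals.get(key, 0) + int(value)
    return totals


def main(argv):
    # Read the file named in argv[1] if given, otherwise standard input.
    if len(argv) > 1:
        with open(argv[1]) as handle:
            totals = group_sum(handle)
    else:
        totals = group_sum(sys.stdin)
    for key, total in totals.items():
        print(key, total, sep=",")
```

Adding a final line that calls main(sys.argv) would let the script take a filename argument, or read the data from standard input when no argument is given.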


Kusalananda · Accepted Answer · 2025-04-23 04:49:45Z

0

Use Miller (mlr) to calculate the sum of the second field, grouped by the first field. The input is read as a header-less CSV file:

$ mlr --csv -N stats1 -a sum -f 2 -g 1 file
201,400
300,600
100,400

Instead of --csv -N ("header-less CSV input and output"), you could use --nidx --fs comma ("comma-separated index-numbered (toolkit style) input and output").

edited Apr 23 at 4:49

answered Apr 13 at 11:06

