
I have a file like:

$ cat input.csv
201,100
201,300
300,100
300,500
100,400

I want to sum the values in column 2 for all rows that share the same value in column 1. The expected output is:

$ cat output.csv
201,400
300,600
100,400

I tried to do this with an awk command, but it does not work on Solaris. Please suggest an alternative.

  • Please show us your awk code. Commented Nov 21, 2014 at 11:05
  • On Solaris, use nawk or /usr/xpg4/bin/awk, or prepend the standard utilities path with PATH=`getconf PATH`:$PATH, because the awk in /bin is an ancient, non-standard one. Commented Nov 21, 2014 at 11:11
  • The answers here focus on one-liners and custom scripts. For those looking for an existing utility, see this question: unix.stackexchange.com/q/85204/41737 Commented Jun 17, 2015 at 12:51

4 Answers


I think this'll do:

awk 'BEGIN{FS=OFS=","}{a[$1]+=$2}END{for (i in a) print i,a[i]}' input.csv
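To make the awk logic explicit, here is a rough Python rendering of it, offered as a sketch rather than a drop-in replacement. The associative array a becomes a dict keyed on the column-1 string, and each line adds its column-2 value to that key; a missing key starts at 0, just as an unset awk element does in arithmetic. It assumes integer values in column 2, and group_sum is a made-up name. One behavioural difference: awk's for (i in a) loop visits keys in an unspecified order, while a Python 3.7+ dict keeps the order in which keys were first seen.

```python
def group_sum(lines):
    """Sum column 2 per distinct column-1 value, mirroring awk's a[$1]+=$2."""
    totals = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(",")
        key, value = fields[0], fields[1]  # awk's $1 and $2 with FS=","
        # A key seen for the first time starts at 0, like an unset awk element.
        totals[key] = totals.get(key, 0) + int(value)
    return totals


rows = ["201,100", "201,300", "300,100", "300,500", "100,400"]
# Equivalent of END{ for (i in a) print i,a[i] }, but in first-seen order.
for key, total in group_sum(rows).items():
    print(f"{key},{total}")
```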
  • The title: "group by and sum in shell script without awk" Commented Nov 21, 2014 at 14:25
  • The answer is great, AWK rocks!!! Commented Nov 21, 2014 at 14:32
  • So use uniq or sort, but if the OP explicitly asks for a non-awk solution, I believe that should be respected. Commented Nov 21, 2014 at 14:41
  • @jimmij I am curious about your sh answer. If you can achieve the above in sh alone, I'll remove my answer! Commented Nov 21, 2014 at 14:43
  • @jimmij I'm late to the party, but the only reason the question says "without awk" seems to be that the user couldn't get their own code to do the right thing ("I tried to do this by awk command, but it is not working in Solaris"). Showing an awk command that does do the right thing would therefore be helpful, and even better if it worked with the default awk on Solaris... Commented Apr 23 at 6:33

Pure bash, one-liner:

unset x y sum; while IFS=, read -r x y; do ((sum[$x]+=y)); done < input.csv; for i in "${!sum[@]}"; do echo "$i,${sum[$i]}"; done

Or in more readable form:

unset x y sum
while IFS=, read -r x y; do
    ((sum[$x]+=y))    # add column 2 to the running total for the column-1 key
done < input.csv
for i in "${!sum[@]}"; do
    echo "$i,${sum[$i]}"
done

The result:

100,400
201,400
300,600
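The bash result comes out sorted (100, 201, 300) because sum is an ordinary indexed array, and ${!sum[@]} lists indexed-array subscripts in ascending numeric order. The flip side is that this approach assumes column 1 holds non-negative integers; arbitrary string keys would need an associative array (declare -A sum), whose iteration order is not guaranteed to be sorted. As a hedged Python sketch, the same numeric ordering can be reproduced by converting the keys to int and sorting them (sorted_sums is a hypothetical name, and integer values are assumed):

```python
def sorted_sums(lines):
    """Group-sum column 2 by column 1, then order keys numerically like bash indices."""
    totals = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, value = line.split(",")[:2]
        totals[int(key)] = totals.get(int(key), 0) + int(value)
    # bash lists indexed-array subscripts in ascending order; sorted() reproduces that.
    return [(key, totals[key]) for key in sorted(totals)]


rows = ["201,100", "201,300", "300,100", "300,500", "100,400"]
for key, total in sorted_sums(rows):
    print(f"{key},{total}")
```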

This can also be done in Python. The program below expects the input file to be named file.txt; change that if needed.

#!/usr/bin/env python3

# Read the non-blank lines and split them into two parallel lists: column 1 and column 2.
with open('file.txt') as f:
    col1, col2 = [list(y) for y in zip(*[x.strip().split(',') for x in f if x.strip()])]

# For each key, fold an earlier duplicate into its next occurrence until only one remains.
for x in list(col1):
    while col1.count(x) > 1:
        index = col1.index(x)
        col1.pop(index)
        value = int(col2.pop(index))

        index = col1.index(x)
        col2[index] = int(col2[index]) + value

for x, y in zip(col1, col2):
    print(x, y, sep=',')

Output:

201,400
300,600
100,400
  • Just use sys.argv[1] as the filename, or read from sys.stdin if no filename is given. Commented Feb 11, 2015 at 8:37
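Following that comment, a sketch could read from the file named in sys.argv[1] when one is given and fall back to sys.stdin otherwise. Note that it also swaps the list-popping approach for a single pass over a dict, which is a different technique from the answer's own; the function names are hypothetical and integer values are assumed.

```python
import sys


def sum_by_first_column(stream):
    """Return {column1: sum of column2} for comma-separated lines from an iterable."""
    totals = {}
    for line in stream:
        fields = line.strip().split(",")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        totals[fields[0]] = totals.get(fields[0], 0) + int(fields[1])
    return totals


def main(argv):
    # Use the file named on the command line if there is one, otherwise standard input.
    if len(argv) > 1:
        with open(argv[1]) as stream:
            totals = sum_by_first_column(stream)
    else:
        totals = sum_by_first_column(sys.stdin)
    for key, total in totals.items():
        print(f"{key},{total}")
```

To use it as a script, one would add the usual if __name__ == "__main__": main(sys.argv) guard at the end and run, for example, python3 groupsum.py input.csv or python3 groupsum.py < input.csv (groupsum.py being a hypothetical file name).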

With Miller (mlr), you can sum the second field grouped by the first field. The input is read as a header-less CSV file:

$ mlr --csv -N stats1 -a sum -f 2 -g 1 file
201,400
300,600
100,400

Instead of --csv -N ("header-less CSV input and output"), you could use --nidx --fs comma ("comma-separated index-numbered (toolkit style) input and output").
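Miller parses its input as real CSV, so a quoted field such as "201" ends up in the same group as 201. If Python is at hand but Miller is not, the standard csv module offers a comparable header-less parse. The sketch below is a stand-in for illustration, not a reimplementation of mlr stats1. It assumes integer values, uses 0-based column positions (whereas -f 2 -g 1 above name the 1-based implicit fields), and keeps groups in first-seen order like the output shown above. csv_group_sum is a hypothetical name.

```python
import csv
import io


def csv_group_sum(text, key_col=0, value_col=1):
    """Sum value_col grouped by key_col over header-less CSV text."""
    totals = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # csv.reader yields an empty list for a blank line
        key = row[key_col]
        # Groups are created in first-seen order (dicts preserve insertion order).
        totals[key] = totals.get(key, 0) + int(row[value_col])
    return totals


# The first key is quoted on purpose: the csv module strips the quotes,
# so that row lands in the same group as the unquoted 201 row.
sample = '"201",100\n201,300\n300,100\n300,500\n100,400\n'
for key, total in csv_group_sum(sample).items():
    print(f"{key},{total}")
```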
