Differences in an array based on groups defined by another array

Question

I have two arrays of the same size. One, call it A, contains a series of repeated numbers; the other, B, contains random numbers.

import numpy as np

A = np.array([1,1,1,2,2,2,0,0,0,3,3])
B = np.array([1,2,3,6,5,4,7,8,9,10,11])

I need to find the differences in B between the two extremes (the first and last elements) of each group defined by A. More specifically, I need an output C such that

C = [2, -2, 2, 1]

where the terms are, respectively, 3 - 1, 4 - 6, 9 - 7, and 11 - 10, i.e., the last minus the first value of B within each group of repeated numbers in A.

I tried playing around with itertools.groupby to isolate the groups in the first array, but it is not clear to me how to use the indexing to compute the differences in the second.
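To make the target concrete, here is a minimal check that reads the four runs of A off by eye as hard-coded slices (the `runs` list is illustrative only, not a general solution) and takes the last minus the first value of B in each run:

```python
import numpy as np

A = np.array([1, 1, 1, 2, 2, 2, 0, 0, 0, 3, 3])
B = np.array([1, 2, 3, 6, 5, 4, 7, 8, 9, 10, 11])

# Index ranges of the consecutive runs in A, written out by hand for this data only
runs = [slice(0, 3), slice(3, 6), slice(6, 9), slice(9, 11)]

# Last element minus first element of B within each run
C = [int(B[s][-1] - B[s][0]) for s in runs]
print(C)  # [2, -2, 2, 1]
```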

Does each group have a unique value, or is something like [0, 0, 1, 1, 0, 0] possible? – Ch3steR, Commented Feb 5, 2022 at 13:10

Joe · Accepted Answer · 2022-02-05 12:40:39Z

1

Edit: C is now sorted the same way as in the question

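import numpy as np

C = []
# Index of the first occurrence of each unique value in A
_, idx = np.unique(A, return_index=True)
# Visit the unique values in the order they first appear in A
for i in A[np.sort(idx)]:
    bs = B[A == i]                  # values of B belonging to group i
    C.append(int(bs[-1] - bs[0]))   # last minus first

print(C)  # [2, -2, 2, 1]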

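With return_index=True, np.unique returns, alongside the sorted unique values of A, the index of the first occurrence of each one.

Sorting those indices and indexing A with them (A[np.sort(idx)]) yields the unique values in order of first appearance, so the loop visits the groups in the same order as they occur in A.

B[A==i] uses a boolean mask to extract the values of B at the positions where A equals i.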
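A short sketch that prints the intermediate values of this answer on the question's data, to show what each step produces (the names `values`, `order` and `mask` are illustrative and not part of the answer):

```python
import numpy as np

A = np.array([1, 1, 1, 2, 2, 2, 0, 0, 0, 3, 3])
B = np.array([1, 2, 3, 6, 5, 4, 7, 8, 9, 10, 11])

# Unique values come back sorted; idx holds the position of each one's first occurrence
values, idx = np.unique(A, return_index=True)
print(values, idx)   # [0 1 2 3] [6 0 3 9]

# Sorting the positions restores the order in which the groups appear in A
order = A[np.sort(idx)]
print(order)         # [1 2 0 3]

# The boolean mask for one group, and the matching values of B
mask = A == 2
print(B[mask])       # [6 5 4]
```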

answered Feb 5, 2022 at 12:40

Joe



mozway · Accepted Answer · 2022-02-05 12:47:12Z

0

This is easily achieved using pandas' groupby:

import numpy as np
import pandas as pd

A = np.array([1,1,1,2,2,2,0,0,0,3,3])
B = np.array([1,2,3,6,5,4,7,8,9,10,11])

# sort=False keeps the groups in order of first appearance in A
pd.Series(B).groupby(A, sort=False).agg(lambda g: g.iloc[-1]-g.iloc[0]).to_numpy()

output: array([ 2, -2, 2, 1])
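For comparison, a sketch of the same aggregation with and without sort=False (the helper `last_minus_first` is a hypothetical name standing in for the lambda). With the default sort=True the result is ordered by group key rather than by first appearance in A:

```python
import numpy as np
import pandas as pd

A = np.array([1, 1, 1, 2, 2, 2, 0, 0, 0, 3, 3])
B = np.array([1, 2, 3, 6, 5, 4, 7, 8, 9, 10, 11])

def last_minus_first(g):
    # g is the sub-Series of B for one group, in its original row order
    return g.iloc[-1] - g.iloc[0]

s = pd.Series(B)
by_key = s.groupby(A).agg(last_minus_first)                     # default sort=True
by_appearance = s.groupby(A, sort=False).agg(last_minus_first)  # order of first appearance

print(by_key.index.tolist(), by_key.tolist())                # [0, 1, 2, 3] [2, 2, -2, 1]
print(by_appearance.index.tolist(), by_appearance.tolist())  # [1, 2, 0, 3] [2, -2, 2, 1]
```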

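Using itertools.groupby (the := walrus operator requires Python 3.8+):

from itertools import groupby

# One (key, run) pair per consecutive run of equal values in A
[(x:=list(g))[-1][1]-x[0][1] for k, g in groupby(zip(A,B), lambda x: x[0])]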

output: [2, -2, 2, 1]
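An expanded version of the one-liner without the walrus operator, for readability. It uses plain Python lists and operator.itemgetter(0) in place of the lambda; these are substitutions, not part of the original answer:

```python
from itertools import groupby
from operator import itemgetter

A = [1, 1, 1, 2, 2, 2, 0, 0, 0, 3, 3]
B = [1, 2, 3, 6, 5, 4, 7, 8, 9, 10, 11]

C = []
# zip pairs each label in A with the value of B at the same position;
# groupby yields one (key, run) pair per consecutive run of equal keys
for key, run in groupby(zip(A, B), key=itemgetter(0)):
    pairs = list(run)  # materialise the run so it can be indexed
    first_b, last_b = pairs[0][1], pairs[-1][1]
    C.append(last_b - first_b)

print(C)  # [2, -2, 2, 1]
```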

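NB. The two solutions behave differently if the same value of A appears in non-consecutive groups: pandas' groupby merges all occurrences of a value into a single group, whereas itertools.groupby treats each consecutive run as a separate group.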
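The run-based behaviour of itertools.groupby can also be reproduced in vectorised NumPy by locating the positions where A changes value. This is a sketch not taken from either answer; `run_differences` is a hypothetical helper, and it assumes A is non-empty and has the same length as B:

```python
import numpy as np

def run_differences(A, B):
    """Last minus first element of B for every consecutive run of equal values in A.

    Assumes A is non-empty and len(A) == len(B).
    """
    A = np.asarray(A)
    B = np.asarray(B)
    # A new run starts wherever A differs from the previous element
    starts = np.concatenate(([0], np.flatnonzero(A[1:] != A[:-1]) + 1))
    # Each run ends just before the next one starts; the last run ends at the final element
    ends = np.concatenate((starts[1:] - 1, [len(A) - 1]))
    return B[ends] - B[starts]

print(run_differences([1, 1, 1, 2, 2, 2, 0, 0, 0, 3, 3],
                      [1, 2, 3, 6, 5, 4, 7, 8, 9, 10, 11]))  # [ 2 -2  2  1]

# A value that reappears later starts a new run, as with itertools.groupby
print(run_differences([0, 0, 1, 1, 0, 0], [1, 2, 3, 4, 5, 6]))  # [1 1 1]
```

On the [0, 0, 1, 1, 0, 0] example from the comments, grouping by value instead (pandas' groupby, or the boolean masks in the other answer) would merge the two runs of 0 and return [5, 1] for the same B.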

edited Feb 5, 2022 at 12:47

answered Feb 5, 2022 at 12:33

mozway


Thank you, this is a great starting point, but the output is not the one desired. I think groupby is sorting the groups, while I want them to remain in the order given by the vector A.
– FDP
Commented Feb 5, 2022 at 12:40
@FDP yes, I fixed that (pandas groupby sorts the groups by default)
– mozway
Commented Feb 5, 2022 at 12:41
Instead of the lambda you could use out = _.agg(['first', 'last']); out['last'] - out['first']. As far as I know, this should be faster than the lambda.
– Ch3steR
Commented Feb 5, 2022 at 13:26
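A sketch of Ch3steR's suggestion, with the groupby object written out in full in place of `_`. The built-in 'first' and 'last' aggregations avoid calling a Python lambda for every group (no timing is claimed here). One difference worth knowing: 'first' and 'last' skip missing values, so they can differ from g.iloc[0] and g.iloc[-1] when B contains NaN:

```python
import numpy as np
import pandas as pd

A = np.array([1, 1, 1, 2, 2, 2, 0, 0, 0, 3, 3])
B = np.array([1, 2, 3, 6, 5, 4, 7, 8, 9, 10, 11])

# Built-in 'first'/'last' aggregations produce a DataFrame with one column each
out = pd.Series(B).groupby(A, sort=False).agg(['first', 'last'])
C = (out['last'] - out['first']).to_numpy()
print(C)  # [ 2 -2  2  1]
```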

