
I have two arrays of the same size. One, call it A, contains a series of repeated numbers; the other, B, contains random numbers.

import numpy as np

A = np.array([1,1,1,2,2,2,0,0,0,3,3])
B = np.array([1,2,3,6,5,4,7,8,9,10,11])

I need to find the differences in B between the two extremes defined by the groups in A. More specifically, I need an output C such as

C = [2, -2, 2, 1]

where each term is the difference 3 - 1, 4 - 6, 9 - 7, and 11 - 10, i.e., the difference between the extremes in B identified by the groups of repeated numbers in A.

I tried to play around with itertools.groupby to isolate the groups in the first array, but it is not clear to me how to use the group indices to compute the differences in the second.

  • Does each group have a unique value, or is something like [0, 0, 1, 1, 0, 0] possible?
    – Ch3steR
    Commented Feb 5, 2022 at 13:10
  • Yes, something like your example would be possible.
    – FDP
    Commented Feb 5, 2022 at 15:56

2 Answers


Edit: C is now in the same order as in the question.

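C = []
# return_index=True also returns the index of the first occurrence of each unique value
_, idx = np.unique(A, return_index=True)
for i in A[np.sort(idx)]:              # unique values, in order of first appearance
    bs = B[A==i]                       # all B values where A equals i
    C.append(int(bs[-1] - bs[0]))      # last minus first; int() gives a plain Python int

print(C)  # [2, -2, 2, 1]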

With return_index=True, np.unique also returns, for each (sorted) unique value in A, the index of its first occurrence.

A[np.sort(idx)] lists the unique values in the order in which they first appear in A, so the loop visits the groups in that order.
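To make these two steps concrete, here are the intermediate values for the A from the question (a small illustration; nothing beyond np.unique and indexing is assumed):

```python
import numpy as np

A = np.array([1, 1, 1, 2, 2, 2, 0, 0, 0, 3, 3])

values, idx = np.unique(A, return_index=True)
print(values)   # [0 1 2 3]  sorted unique values
print(idx)      # [6 0 3 9]  first index of each unique value in A
order = A[np.sort(idx)]
print(order)    # [1 2 0 3]  unique values in order of first appearance
```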

B[A==i] uses a boolean mask to select the elements of B at the positions where A equals i.
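Because the loop selects with B[A==i], all occurrences of a value form one group, so for an A like [0, 0, 1, 1, 0, 0] (which the comments say is possible) the two runs of 0 are merged, and each mask scans the whole array. A minimal vectorized sketch, assuming instead that each consecutive run in A should be its own group: find where A changes value, then index B at the first and last position of every run. The helper name run_differences is hypothetical.

```python
import numpy as np

def run_differences(A, B):
    """Hypothetical helper: last minus first value of B for each consecutive run in A.

    Assumes A and B are 1-D and have the same length.
    """
    A = np.asarray(A)
    B = np.asarray(B)
    if A.size == 0:
        return B[:0]  # no runs, so no differences
    # A run starts at index 0 and wherever A differs from the previous element
    starts = np.flatnonzero(np.r_[True, A[1:] != A[:-1]])
    # Each run ends just before the next one starts; the last run ends at the end of A
    ends = np.r_[starts[1:] - 1, A.size - 1]
    return B[ends] - B[starts]

A = np.array([1, 1, 1, 2, 2, 2, 0, 0, 0, 3, 3])
B = np.array([1, 2, 3, 6, 5, 4, 7, 8, 9, 10, 11])
print(run_differences(A, B))                                      # [ 2 -2  2  1]
print(run_differences([0, 0, 1, 1, 0, 0], [1, 5, 2, 3, 10, 4]))   # [ 4  1 -6]
```

Unlike the loop above, the two runs of 0 in the second call produce two separate differences.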


This is easily achieved using pandas' groupby:

import numpy as np
import pandas as pd

A = np.array([1,1,1,2,2,2,0,0,0,3,3])
B = np.array([1,2,3,6,5,4,7,8,9,10,11])

# sort=False keeps the groups in order of first appearance in A
pd.Series(B).groupby(A, sort=False).agg(lambda g: g.iloc[-1]-g.iloc[0]).to_numpy()

output: array([ 2, -2, 2, 1])

Using itertools.groupby:

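from itertools import groupby

# zip pairs each A value with its B value; groupby merges consecutive pairs with the
# same A value (x[0]); for each run, take the last B minus the first B.
# The := (walrus) operator requires Python 3.8+.
[(x:=list(g))[-1][1]-x[0][1] for k, g in groupby(zip(A,B), lambda x: x[0])]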

output: [2, -2, 2, 1]
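The one-liner above relies on the walrus operator. An equivalent explicit loop, sketched here for readability (operator.itemgetter(0) plays the role of the lambda x: x[0] key):

```python
from itertools import groupby
from operator import itemgetter

import numpy as np

A = np.array([1, 1, 1, 2, 2, 2, 0, 0, 0, 3, 3])
B = np.array([1, 2, 3, 6, 5, 4, 7, 8, 9, 10, 11])

C = []
# groupby only merges *consecutive* (a, b) pairs that share the same a
for _, run in groupby(zip(A, B), key=itemgetter(0)):
    values = [b for _, b in run]           # B values of this run, in order
    C.append(int(values[-1] - values[0]))  # last minus first, as a plain int
print(C)  # [2, -2, 2, 1]
```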

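NB. The two solutions behave differently if the same value appears in several non-consecutive groups of A: pandas' groupby merges all occurrences of a value into one group, whereas itertools.groupby treats each consecutive run as a separate group.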
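If each consecutive run should be its own group in pandas too (as with itertools.groupby), one option is to group by a run label instead of by the values of A. A sketch, assuming runs are defined by consecutive equal values; run_id is a name introduced here for illustration:

```python
import numpy as np
import pandas as pd

A = np.array([0, 0, 1, 1, 0, 0])
B = np.array([1, 5, 2, 3, 10, 4])

# Label each consecutive run: the label goes up by 1 wherever A changes value
run_id = np.r_[0, np.cumsum(A[1:] != A[:-1])]   # [0 0 1 1 2 2]

grouped = pd.Series(B).groupby(run_id)
per_run = (grouped.last() - grouped.first()).to_numpy()
print(per_run)   # [ 4  1 -6]

# Grouping by the values of A instead merges the two runs of 0
by_value = pd.Series(B).groupby(A, sort=False).agg(lambda g: g.iloc[-1] - g.iloc[0]).to_numpy()
print(by_value)  # [3 1]
```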

  • Thank you, this is a great starting point, but, as you may notice, the output is not the one desired. I think groupby is sorting the groups, while I want them to remain in the order given by the vector A.
    – FDP
    Commented Feb 5, 2022 at 12:40
  • @FDP yes, I fixed that (pandas groupby sorts the groups by default)
    – mozway
    Commented Feb 5, 2022 at 12:41
  • Instead of a lambda, you could use out = _.agg(['first', 'last']); out['last'] - out['first']. As far as I know, this should be faster than the lambda.
    – Ch3steR
    Commented Feb 5, 2022 at 13:26
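Ch3steR's suggestion, written out as a self-contained sketch. Its speed relative to the lambda is not measured here; the usual reason to expect a gain is that named aggregations such as 'first' and 'last' run in pandas' compiled groupby code instead of calling a Python function once per group.

```python
import numpy as np
import pandas as pd

A = np.array([1, 1, 1, 2, 2, 2, 0, 0, 0, 3, 3])
B = np.array([1, 2, 3, 6, 5, 4, 7, 8, 9, 10, 11])

grouped = pd.Series(B).groupby(A, sort=False)   # groups in order of first appearance
out = grouped.agg(['first', 'last'])            # DataFrame with columns 'first' and 'last'
C = (out['last'] - out['first']).to_numpy()
print(C)  # [ 2 -2  2  1]
```

Like the lambda version, this groups by value, so non-consecutive runs of the same value are merged.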
