
I am working with two dataframes: one contains pairs of unique keys, and the other contains the values of those keys. There are almost 5.8 million pairs.

Dataframe 1 - pair_df

key1 key2
a b
a c
b c
e f

Dataframe 2 - key_value_df

key value
a 432
b 654
c 874
e 014
f 421

I want a dataframe in which the values for both keys of each pair in dataframe 1 are added as additional columns.

Required Dataframe

key1 key2 value1 value2
a b 432 654
a c 432 874
b c 654 874
e f 014 421

I tried with the following code:

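def find_vect(key_value_df, pair_df, key_name, vect_name):
    # key_name: pair_df column to look up, e.g. 'key1'
    # vect_name: new pair_df column that receives the matched values
    pair_df[vect_name] = ''  # object column, so each cell can hold a list
    count = 0
    for idx1, pic1 in enumerate(pair_df[key_name]):
        # scans every row of key_value_df for each row of pair_df
        for idx2, pic2 in enumerate(key_value_df['key']):
            if pic1 == pic2:
                # copy columns 0..48 of the matching row as a list
                vect = list(key_value_df.iloc[idx2, 0:49])
                # .at sets a single cell, so it can store the whole list
                pair_df.at[idx1, vect_name] = vect
                count += 1
                if count % 10000 == 0:
                    print(count)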

This is a simplified version of the actual code and might contain some errors. The logic works, but with this much data the process takes a very long time, because for every row of pair_df the inner loop scans the whole of key_value_df. The same code also has to be rerun for the other key column in dataframe 1, which makes the process even more time-consuming.

Is there any other efficient way to solve the problem?
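For reference, the cost of the loop above grows with the number of pairs times the number of keys, since the inner loop rescans key_value_df for every row. A minimal sketch that keeps the plain-Python style but swaps the inner scan for a dictionary built once (a different technique from the original code). It assumes the sample data above, with values stored as strings so "014" keeps its leading zero; the name `lookup` is hypothetical.

```python
import pandas as pd

# Sample data from the question (assumption: values are strings, so "014" stays "014")
pair_df = pd.DataFrame({"key1": ["a", "a", "b", "e"], "key2": ["b", "c", "c", "f"]})
key_value_df = pd.DataFrame({"key": ["a", "b", "c", "e", "f"],
                             "value": ["432", "654", "874", "014", "421"]})

# Hypothetical lookup table, built once from key_value_df: key -> value
lookup = dict(zip(key_value_df["key"], key_value_df["value"]))

# One pass per key column; each dict lookup is O(1) on average,
# so there is no inner scan over key_value_df
pair_df["value1"] = [lookup.get(k) for k in pair_df["key1"]]
pair_df["value2"] = [lookup.get(k) for k in pair_df["key2"]]

print(pair_df)
```

Keys missing from key_value_df come back as None with `dict.get`.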

  • There are ready-to-use operations like df.merge or df.join. Don't reinvent the wheel. Commented Jul 6, 2021 at 7:22
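The df.merge route suggested in the comment could look like the sketch below, assuming the sample data from the question (values kept as strings). It merges once per key column, renaming key_value_df's columns each time so the join column and the new value column line up.

```python
import pandas as pd

pair_df = pd.DataFrame({"key1": ["a", "a", "b", "e"], "key2": ["b", "c", "c", "f"]})
key_value_df = pd.DataFrame({"key": ["a", "b", "c", "e", "f"],
                             "value": ["432", "654", "874", "014", "421"]})

# A left merge keeps every pair (and their order) even if a key has no value
result = (
    pair_df
    .merge(key_value_df.rename(columns={"key": "key1", "value": "value1"}),
           on="key1", how="left")
    .merge(key_value_df.rename(columns={"key": "key2", "value": "value2"}),
           on="key2", how="left")
)
print(result)
```

Because merge joins whole rows, the same pattern also carries over several value columns at once, with suffixes or renames to keep the two sides apart.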

1 Answer


Use Series.map with a Series created from key_value_df by setting key as the index:

s = key_value_df.set_index('key')['value']

pair_df['value1'] = pair_df['key1'].map(s)
pair_df['value2'] = pair_df['key2'].map(s)
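A self-contained version of the answer's approach, assuming the sample data from the question with values stored as strings. The Series `s` acts as a lookup table indexed by key, and `map` resolves every row of a column in one vectorised call instead of a Python loop.

```python
import pandas as pd

pair_df = pd.DataFrame({"key1": ["a", "a", "b", "e"], "key2": ["b", "c", "c", "f"]})
key_value_df = pd.DataFrame({"key": ["a", "b", "c", "e", "f"],
                             "value": ["432", "654", "874", "014", "421"]})

# Lookup Series: index = key, values = value (keys should be unique, as in the question)
s = key_value_df.set_index("key")["value"]

# Each key column is mapped through the same lookup Series
pair_df["value1"] = pair_df["key1"].map(s)
pair_df["value2"] = pair_df["key2"].map(s)

print(pair_df)
```

Keys that are not present in key_value_df come back as NaN.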
  • Hello @Jezrael, at first it seemed that the solution worked, but it actually creates a list of lists for the values: nested lists of the same value. Commented Jul 6, 2021 at 18:53
  • @RajRajeshwariPrasad - So does the input data look different from the data in the question? – jezrael Commented Jul 7, 2021 at 5:11
  • Hello @Jezrael, I reworked the solution you provided and found that the problem was in the input: the input itself was a nested list. I fixed that, and now your solution works absolutely fine. Thanks a lot. Commented Jul 7, 2021 at 6:46
