Here is my dataframe:
import pandas as pd

df = pd.DataFrame(
    {"mat": ['A', 'A', 'A', 'A', 'B'],
     "ppl": ['P', 'P', 'P', '', 'P'],
     "ia1": ['', 'X', 'X', '', 'X'],
     "ia2": ['X', '', '', 'X', 'X']},
    index=[1, 2, 3, 4, 5])
I want to select the unique combinations of values in the first two columns. I do:
df2 = df.loc[:,['mat','ppl']].drop_duplicates(subset=['mat','ppl']).sort_values(by=['mat','ppl'])
I get, as expected:
  mat ppl
4   A
1   A   P
5   B   P
What I want now is for df3 to be:
mat ppl ia1 ia2
A           X
A   P   X   X
B   P   X   X
That is, in df3, for the row A+P, column ia1 contains an X because at least one of the rows of df with mat A and ppl P has an X in column ia1.
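To make the rule concrete, here is a minimal sketch of one way it could be written, assuming that "an X in at least one row" is the only condition and that blank cells are empty strings as in df above. The helper name any_x is hypothetical, introduced only for illustration, and this is not necessarily the best approach:

```python
import pandas as pd

df = pd.DataFrame(
    {"mat": ['A', 'A', 'A', 'A', 'B'],
     "ppl": ['P', 'P', 'P', '', 'P'],
     "ia1": ['', 'X', 'X', '', 'X'],
     "ia2": ['X', '', '', 'X', 'X']},
    index=[1, 2, 3, 4, 5])


def any_x(col):
    # Hypothetical helper: 'X' if any row of the group has an 'X', else blank.
    return 'X' if col.eq('X').any() else ''


# One row per unique (mat, ppl) pair; groupby sorts by the keys by default.
df3 = df.groupby(['mat', 'ppl'], as_index=False)[['ia1', 'ia2']].agg(any_x)
print(df3)
```

Note that with as_index=False the result gets a fresh 0-based index, so the original labels 4, 1, 5 from df2 are not preserved.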