pd.merge is adding extra rows, duplicates

Question

When merging two DataFrames, the merge seems to be adding duplicate rows somehow.

I need to keep exactly the number of rows of the left DataFrame.

data:

import pandas as pd

# main data
df = pd.DataFrame({"campaign_name": ["111", "222", "333"], "leads": [1, 2, 1]})

# reference table
dim_campaign = pd.DataFrame({"campaign_name": ["111", "222", "333"], "Type": ["a", "b", "c"]})

# count how many rows each campaign has
df.campaign_name.value_counts()
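As posted, each campaign_name appears exactly once in dim_campaign, so a left merge on this sample cannot add rows. A minimal sketch, assuming pandas is installed, that checks the lookup key is unique and compares row counts before and after the merge:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"campaign_name": ["111", "222", "333"], "leads": [1, 2, 1]})
dim_campaign = pd.DataFrame({"campaign_name": ["111", "222", "333"], "Type": ["a", "b", "c"]})

# Every key in the lookup table is unique, so each left row matches at most one right row
print(dim_campaign["campaign_name"].is_unique)  # True

merged = pd.merge(df, dim_campaign, on="campaign_name", how="left")
print(len(df), len(merged))  # 3 3
```

If `is_unique` is False on the real lookup table, that is the first place to look for the extra rows.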

The problem is that after merging, the number of rows has increased. I want to keep all the original rows of df and only add the columns from the rows that match.

My code:

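# sum leads per campaign; reset_index() turns campaign_name back into a regular column
df = df.groupby("campaign_name")["leads"].sum().reset_index()

df = pd.merge(df, dim_campaign[["campaign_name", "Type"]], on="campaign_name", how="left")

x = df.loc[df.campaign_name == "222"]
x.leads.sum()

# this gives a higher value than expected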
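A likely cause (an assumption, since the posted sample does not reproduce it) is that the real dim_campaign contains the same campaign_name more than once. A left merge emits one output row for every matching right row, so the leads for that campaign get counted several times. A minimal sketch with a hypothetical duplicated "222" row:

```python
import pandas as pd

df = pd.DataFrame({"campaign_name": ["111", "222", "333"], "leads": [1, 2, 1]})

# Hypothetical lookup table in which "222" appears twice
dim_campaign = pd.DataFrame({
    "campaign_name": ["111", "222", "222", "333"],
    "Type": ["a", "b", "b", "c"],
})

merged = pd.merge(df, dim_campaign, on="campaign_name", how="left")
print(len(merged))  # 4: the single "222" row on the left became two rows

# The leads for "222" are now counted twice
print(merged.loc[merged.campaign_name == "222", "leads"].sum())  # 4 instead of 2
```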

I'm trying to replicate your problem and I can't - I see 3 rows in your merged and unmerged dataframes. Can you elaborate? — Hayden Eastwood, Commented Jan 2, 2020 at 23:53
also, I'm wondering whether an "inner join" wouldn't be better suited? An inner join would ensure that only matches are merged. — Hayden Eastwood, Commented Jan 3, 2020 at 0:03
An inner join takes the intersection of both sets, so, come to think of it, I don't think it's what you're looking for. Dropping duplicates is therefore the best fit, I think. — Hayden Eastwood, Commented Jan 3, 2020 at 0:09
Your question as it stands doesn't match the answer, since obviously there will not be duplicates there. — cs95, Commented Jan 3, 2020 at 0:18
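On the inner-join question raised in the comments: `how="inner"` keeps only keys present in both tables, so it can drop left rows, and it still multiplies rows when a right-hand key is duplicated. A small sketch with a hypothetical lookup table that lacks "333" and repeats "222":

```python
import pandas as pd

df = pd.DataFrame({"campaign_name": ["111", "222", "333"], "leads": [1, 2, 1]})

# Hypothetical lookup table: "222" is duplicated and "333" is missing
dim_campaign = pd.DataFrame({
    "campaign_name": ["111", "222", "222"],
    "Type": ["a", "b", "b"],
})

left = pd.merge(df, dim_campaign, on="campaign_name", how="left")
inner = pd.merge(df, dim_campaign, on="campaign_name", how="inner")

# left: 111, 222, 222, 333 (Type is NaN for 333)
print(len(left))   # 4
# inner: 111, 222, 222 (333 is dropped, 222 is still doubled)
print(len(inner))  # 3
```

Neither join type removes the duplication on its own; the row count only stays equal to the left table when the right-hand key is unique.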

Hayden Eastwood · Accepted Answer · 2020-01-03 00:16:36Z

1

I would suggest:

df = df.drop_duplicates()

to remove the duplicate rows produced by a left or right join.
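Note that `drop_duplicates()` on the merged result only removes rows that are identical in every column; if the duplicated lookup rows differ somewhere (for example in Type), both copies survive. A sketch of an alternative, not part of the original answer: deduplicate the lookup table on its key before merging, and pass `validate="many_to_one"` so pandas raises `pandas.errors.MergeError` instead of silently adding rows. The lookup data and the choice of `keep="first"` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"campaign_name": ["111", "222", "333"], "leads": [1, 2, 1]})

# Hypothetical lookup table where "222" appears twice with different Types
dim_campaign = pd.DataFrame({
    "campaign_name": ["111", "222", "222", "333"],
    "Type": ["a", "b", "x", "c"],
})

# Dropping duplicates after the merge does not help here: the two "222" rows differ in Type
naive = pd.merge(df, dim_campaign, on="campaign_name", how="left").drop_duplicates()
print(len(naive))  # 4

# Keep one row per key (the first occurrence; which row is correct depends on the data)
lookup = dim_campaign.drop_duplicates(subset="campaign_name", keep="first")

# validate="many_to_one" checks that every right-hand key is unique
merged = pd.merge(df, lookup, on="campaign_name", how="left", validate="many_to_one")
print(len(merged))  # 3, same as df

# Without deduplication the same validation fails loudly
validation_failed = False
try:
    pd.merge(df, dim_campaign, on="campaign_name", how="left", validate="many_to_one")
except pd.errors.MergeError:
    validation_failed = True
print(validation_failed)  # True
```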

