When I merge two DataFrames, the result somehow seems to contain duplicate rows.

I need to keep exactly the number of rows of the left DataFrame.

data:

import pandas as pd

# main data
df = pd.DataFrame({"campaign_name": ["111", "222", "333"], "leads": [1, 2, 1]})

# reference table
dim_campaign = pd.DataFrame({"campaign_name": ["111", "222", "333"], "Type": ["a", "b", "c"]})

# count the rows per campaign
df.campaign_name.value_counts()

my code:

The problem is that after merging, the number of rows has increased. I want to keep all the original rows of "df" and only add the columns from the rows that match.

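# total leads per campaign; as_index=False keeps campaign_name as a column
df = df.groupby("campaign_name", as_index=False)["leads"].sum()

# left join: attach "Type" from the reference table
df = pd.merge(df, dim_campaign[["campaign_name", "Type"]], on="campaign_name", how="left")

# check the total leads for a single campaign
x = df.loc[df.campaign_name == "222"]
x.leads.sum()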

# it gives a higher value

10
  • I'm trying to replicate your problem and I can't - I see 3 rows in your merged and unmerged dataframes. Can you elaborate? Commented Jan 2, 2020 at 23:53
  • Have you tried doing df.drop_duplicates()? Commented Jan 2, 2020 at 23:59
  • Also, I'm wondering whether an "inner join" wouldn't be better suited? An inner join would ensure that only matches are merged. Commented Jan 3, 2020 at 0:03
  • An inner join returns the intersection of both sets, so come to think of it, I don't think it's what you're looking for. Dropping duplicates is therefore the best fit, I think. Commented Jan 3, 2020 at 0:09
  • Your question as it stands doesn't match the answer, since obviously there will not be duplicates there. – cs95 Commented Jan 3, 2020 at 0:18
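
To make the comments above concrete, here is a minimal sketch. It assumes the real reference table contains a repeated campaign_name; the `dim_dup` table with a second "222" row is hypothetical and does not appear in the question. With the data as posted the merge keeps three rows, while a repeated key on the right side multiplies rows for both a left and an inner join.

```python
import pandas as pd

df = pd.DataFrame({"campaign_name": ["111", "222", "333"], "leads": [1, 2, 1]})
dim_campaign = pd.DataFrame({"campaign_name": ["111", "222", "333"], "Type": ["a", "b", "c"]})

# Data as posted: keys are unique on both sides, so the row count is unchanged.
clean = pd.merge(df, dim_campaign, on="campaign_name", how="left")

# Hypothetical reference table in which "222" appears twice.
extra = pd.DataFrame({"campaign_name": ["222"], "Type": ["b"]})
dim_dup = pd.concat([dim_campaign, extra], ignore_index=True)

# Each left row is repeated once per matching right row, for either join type.
left_rows = len(pd.merge(df, dim_dup, on="campaign_name", how="left"))
inner_rows = len(pd.merge(df, dim_dup, on="campaign_name", how="inner"))

print(len(clean), left_rows, inner_rows)  # 3 4 4
```

So switching from a left to an inner join would not remove the extra rows in this scenario; the repeated key in the reference table is what causes them.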

1 Answer

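I would suggest:

df = df.drop_duplicates()

to remove duplicate rows after a left or right join. Note that drop_duplicates() returns a new DataFrame, so the result has to be assigned back.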
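
As a possible variation, the duplicates can be dropped from the reference table before merging, so that genuinely repeated rows in the left DataFrame are left alone. The sketch below assumes the reference table has a repeated key; `dim_dup` and `dim_unique` are hypothetical names, not from the question. It also uses the `validate` argument of `pd.merge`, which raises an error when the right-hand keys are not unique.

```python
import pandas as pd

df = pd.DataFrame({"campaign_name": ["111", "222", "333"], "leads": [1, 2, 1]})

# Hypothetical reference table with a repeated key ("222").
dim_dup = pd.DataFrame({
    "campaign_name": ["111", "222", "222", "333"],
    "Type": ["a", "b", "b", "c"],
})

# Keep one row per key in the reference table before merging.
dim_unique = dim_dup.drop_duplicates(subset="campaign_name")

# validate="many_to_one" checks that the right-hand keys are unique.
merged = pd.merge(df, dim_unique, on="campaign_name", how="left", validate="many_to_one")

# Without de-duplication the same check fails loudly instead of adding rows.
try:
    pd.merge(df, dim_dup, on="campaign_name", how="left", validate="many_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True

leads_222 = merged.loc[merged.campaign_name == "222", "leads"].sum()
print(len(merged), raised, leads_222)
```

If the repeated keys carry different values in "Type", drop_duplicates(subset=...) keeps only the first one, so it is worth checking which row should win before relying on this.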
