I have two dataframes of timestamped sensor readouts from different sources, and I want to combine them. The left dataframe (df1) can be quite large, as it will be a combination of multiple sources; the right dataframe (df2) will have at most 8 columns. Some columns of df2 may already be in df1, but df2 may have more or fewer timestamps with values for them. Timestamps may also be duplicated between the two dataframes. Some columns in df2 will be new to df1.
E.g.
import pandas as pd

df1 = pd.DataFrame(
    {
        "PT1": ["A0", "A1", "A2"],
        "PT2": ["B0", "B1", "B2"],
        "PT3": ["C0", "C1", "C2"],
    },
    index=pd.DatetimeIndex(["2025-05-01 10:00", "2025-05-01 10:01", "2025-05-01 10:02"]),
)
df2 = pd.DataFrame(
    {
        "PT1": ["A0", "A1", "A3"],
        "PT4": ["D0", "D1", "D3"],
    },
    index=pd.DatetimeIndex(["2025-05-01 10:00", "2025-05-01 10:01", "2025-05-01 10:03"]),
)
I tried concat & merge, but either the timestamps don't get combined or I lose the index. :-/
Expected output would be:
import numpy as np

df1updated = pd.DataFrame(
    {
        "PT1": ["A0", "A1", "A2", "A3"],
        "PT2": ["B0", "B1", "B2", np.nan],
        "PT3": ["C0", "C1", "C2", np.nan],
        "PT4": ["D0", "D1", np.nan, "D3"],
    },
    index=pd.DatetimeIndex(["2025-05-01 10:00", "2025-05-01 10:01", "2025-05-01 10:02", "2025-05-01 10:03"]),
)
Update after @ouroboros1's comment: Usually, a timestamp should only appear in both dataframes when the value is either the same or one of them is nan. Two different values could happen, but that can be solved on the data source side. If the two values differ, it is because the source filled df2 with data from an earlier timestamp for that sensor, so I need to detect that somehow. My plan was to do that on df2 before combining it with df1, e.g. by checking for duplicate values in df2 per column and replacing them with nan again.
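A rough sketch of what that clean-up step on df2 might look like. It assumes that a "filled" value is one that exactly repeats the value in the row directly above it in the same column; that rule is only an assumption and would also blank out a sensor that genuinely reports the same reading twice in a row. The sample df2 and the names repeated and df2_clean are made up for illustration.

```python
import pandas as pd

# Hypothetical df2 in which the source repeated the 10:00 value of PT1 at 10:01.
df2 = pd.DataFrame(
    {
        "PT1": ["A0", "A0", "A3"],
        "PT4": ["D0", "D1", "D3"],
    },
    index=pd.DatetimeIndex(["2025-05-01 10:00", "2025-05-01 10:01", "2025-05-01 10:03"]),
)

# True where a cell equals the cell directly above it in the same column.
# The first row is compared against NaN, so it is never flagged.
repeated = df2.eq(df2.shift())

# Replace flagged cells with NaN; the first occurrence of each value is kept.
df2_clean = df2.mask(repeated)
```

Only consecutive repeats are caught this way; a stale value that reappears after a different reading would need a different rule.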
df2 could also have: "PT1": ["A1000", "A1", "A3"]? And if so, what should happen? Always pick the value from df1 ("A0") and silently ignore the one from df2 ("A1000")? If so, or if this does not happen, all you need is df1.combine_first(df2). If not, it would be useful to update your example with edge cases and explain what should happen when they appear.
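For reference, a minimal sketch of the combine_first suggestion applied to the example frames from the question. It assumes each dataframe has a unique index on its own (no timestamp repeated within df1 or within df2). The conflict check at the end is an optional extra, not part of combine_first, and the names common_idx, common_cols, left, right and conflicts are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "PT1": ["A0", "A1", "A2"],
        "PT2": ["B0", "B1", "B2"],
        "PT3": ["C0", "C1", "C2"],
    },
    index=pd.DatetimeIndex(["2025-05-01 10:00", "2025-05-01 10:01", "2025-05-01 10:02"]),
)
df2 = pd.DataFrame(
    {
        "PT1": ["A0", "A1", "A3"],
        "PT4": ["D0", "D1", "D3"],
    },
    index=pd.DatetimeIndex(["2025-05-01 10:00", "2025-05-01 10:01", "2025-05-01 10:03"]),
)

# Result covers the union of both indexes and both column sets.
# Values from df1 win; cells missing in df1 are filled from df2.
df1updated = df1.combine_first(df2)

# Optional: flag cells where both frames hold a value and the values differ.
common_idx = df1.index.intersection(df2.index)
common_cols = df1.columns.intersection(df2.columns)
left = df1.loc[common_idx, common_cols]
right = df2.loc[common_idx, common_cols]
conflicts = left.notna() & right.notna() & (left != right)
```

If conflicts.any().any() is True, the two sources disagree somewhere and combine_first will have silently kept the df1 value for those cells.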