df1:
Date Cycle
0 2021-02-07 C01
1 2021-02-08 C01
2 2021-02-14 C02
3 2021-06-15 C02
4 2021-02-28 C03
df2:
From_Date To_Date
0 2021-02-07 2021-02-13
1 2021-02-14 2021-02-27
2 2021-02-28 2021-03-03
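First, make sure the date columns are of datetime type. The raw strings are in day.month.year form (see the full code at the end), hence the explicit format: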
df1['Date'] = pd.to_datetime(df1['Date'], format='%d.%m.%Y')
df2['From_Date'] = pd.to_datetime(df2['From_Date'], format='%d.%m.%Y')
df2['To_Date'] = pd.to_datetime(df2['To_Date'], format='%d.%m.%Y')
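The explicit format matters because day-first strings are easy to misread. As a small sketch (the two sample strings are only illustrative), the same format string gives very different dates when the first two fields are swapped:

```python
import pandas as pd

# '%d.%m.%Y' reads the first field as the day and the second as the month.
day_first = pd.to_datetime('07.02.2021', format='%d.%m.%Y')
# Swapping the first two fields gives a different date under the same format.
swapped = pd.to_datetime('02.07.2021', format='%d.%m.%Y')

print(day_first.date())  # 2021-02-07
print(swapped.date())    # 2021-07-02
```

If a date unexpectedly falls outside every range later on, a swapped day and month in the input is worth checking first.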
Construct an IntervalIndex for df2, closed on both sides so that From_Date and To_Date themselves count as part of the range:
>>> df2.index = pd.IntervalIndex.from_arrays(df2['From_Date'], df2['To_Date'], closed='both')
>>> df2
From_Date To_Date
[2021-02-07, 2021-02-13] 2021-02-07 2021-02-13
[2021-02-14, 2021-02-27] 2021-02-14 2021-02-27
[2021-02-28, 2021-03-03] 2021-02-28 2021-03-03
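To see what closed='both' changes, here is a minimal sketch using a single pd.Interval built from the first range (the variable names are only for illustration). With pandas' default of closed='right', the start date would not be considered part of the interval:

```python
import pandas as pd

start, end = pd.Timestamp('2021-02-07'), pd.Timestamp('2021-02-13')

both = pd.Interval(start, end, closed='both')    # includes both endpoints
right = pd.Interval(start, end, closed='right')  # pandas default: excludes the start

print(start in both)   # True
print(start in right)  # False
print(end in both)     # True
```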
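Define a function that maps each Date in df1 to the interval in df2 that contains it, and store the result in a new column of df1:
def get_date(d):
    try:
        # df2.loc[d] selects the row whose interval contains d; its name is that interval
        return df2.loc[d].name
    except KeyError:
        # d does not fall in any range
        return None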
df1['index'] = df1['Date'].apply(get_date)
output:
Date Cycle index
0 2021-02-07 C01 [2021-02-07, 2021-02-13]
1 2021-02-08 C01 [2021-02-07, 2021-02-13]
2 2021-02-14 C02 [2021-02-14, 2021-02-27]
3 2021-06-15 C02 NaN
4 2021-02-28 C03 [2021-02-28, 2021-03-03]
Merge the two dataframes on "index" and filter the columns:
df2.reset_index().merge(df1, on='index')[['From_Date', 'To_Date', 'Cycle']]
From_Date To_Date Cycle
0 2021-02-07 2021-02-13 C01
1 2021-02-07 2021-02-13 C01
2 2021-02-14 2021-02-27 C02
3 2021-02-28 2021-03-03 C03
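The mapping and merge steps above can also be done without a row-by-row apply. The following is a sketch of an alternative, not the approach used in the rest of this answer: it relies on IntervalIndex.get_indexer, which returns the position of the matching interval for each date and -1 where there is no match. It assumes the ranges in df2 do not overlap, since get_indexer raises an error on an overlapping IntervalIndex. The names intervals, positions, matched and result are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Date': pd.to_datetime(['2021-02-07', '2021-02-08', '2021-02-14',
                            '2021-06-15', '2021-02-28']),
    'Cycle': ['C01', 'C01', 'C02', 'C02', 'C03'],
})
df2 = pd.DataFrame({
    'From_Date': pd.to_datetime(['2021-02-07', '2021-02-14', '2021-02-28']),
    'To_Date': pd.to_datetime(['2021-02-13', '2021-02-27', '2021-03-03']),
})

# Assumption: the ranges do not overlap (get_indexer requires this).
intervals = pd.IntervalIndex.from_arrays(df2['From_Date'], df2['To_Date'], closed='both')

# Position of the containing range for each date, -1 when no range contains it.
positions = intervals.get_indexer(df1['Date'])
matched = positions >= 0

# Pick the matching df2 rows and attach the Cycle values of the matched df1 rows.
result = df2.iloc[positions[matched]].reset_index(drop=True)
result['Cycle'] = df1.loc[matched, 'Cycle'].to_numpy()
print(result)
```

This yields the same four rows as the merge, with the 2021-06-15 row dropped because it falls outside every range.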
If you only want the first df1 value for each range, group by the range columns and keep the first value. Assuming the merged result is stored as df3:
df3.groupby(['From_Date', 'To_Date'], as_index=False).first()
output:
From_Date To_Date Cycle
0 2021-02-07 2021-02-13 C01
1 2021-02-14 2021-02-27 C02
2 2021-02-28 2021-03-03 C03
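A possible alternative to the groupby is drop_duplicates, sketched here on a hand-built copy of the merged frame. The two are not strictly identical: groupby(...).first() takes the first non-null value of each column within a group, while drop_duplicates keeps the first row as a whole. With no missing values, as in this data, they give the same result.

```python
import pandas as pd

# Hand-built copy of the merged frame (df3) from the step above.
df3 = pd.DataFrame({
    'From_Date': pd.to_datetime(['2021-02-07', '2021-02-07', '2021-02-14', '2021-02-28']),
    'To_Date': pd.to_datetime(['2021-02-13', '2021-02-13', '2021-02-27', '2021-03-03']),
    'Cycle': ['C01', 'C01', 'C02', 'C03'],
})

# Keep only the first row for each (From_Date, To_Date) pair.
first_per_range = (
    df3.drop_duplicates(subset=['From_Date', 'To_Date'], keep='first')
       .reset_index(drop=True)
)
print(first_per_range)
```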
Full code:
import pandas as pd

df1 = pd.DataFrame({'Date': ['07.02.2021', '08.02.2021', '14.02.2021', '15.06.2021', '28.02.2021'],
                    'Cycle': ['C01', 'C01', 'C02', 'C02', 'C03']})
df2 = pd.DataFrame({'From_Date': ['07.02.2021', '14.02.2021', '28.02.2021'],
'To_Date': ['13.02.2021', '27.02.2021', '03.03.2021']})
df1['Date'] = pd.to_datetime(df1['Date'], format='%d.%m.%Y')
df2['From_Date'] = pd.to_datetime(df2['From_Date'], format='%d.%m.%Y')
df2['To_Date'] = pd.to_datetime(df2['To_Date'], format='%d.%m.%Y')
df2.index = pd.IntervalIndex.from_arrays(df2['From_Date'], df2['To_Date'], closed='both')
def get_date(d):
    try:
        # the row's name is the interval that contains d
        return df2.loc[d].name
    except KeyError:
        # d does not fall in any range
        return None
df1['index'] = df1['Date'].apply(get_date)
df3 = df2.reset_index().merge(df1, on='index')[['From_Date', 'To_Date', 'Cycle']]
df3.groupby(['From_Date', 'To_Date'], as_index=False).first()