
I have 2 dataframes:

df1

   date               event    group    failure
2018-04-19 02:07:00     1       E1         0
2018-04-19 02:07:00     2       E2         1

df2:

        start_time                   end_time           group      failure
2018-04-01 00:00:00+01:00   2018-04-01 23:59:59+01:00     E1         1
2018-04-27 19:00:00+01:00   2018-04-27 21:29:59+01:00     E1         1
2018-04-27 06:00:00+01:00   2018-04-27 12:59:59+01:00     E1         1
2018-04-26 19:00:00+01:00   2018-04-26 21:29:59+01:00     E1         1
2018-04-26 06:00:00+01:00   2018-04-26 12:59:59+01:00     E1         1
2018-04-25 19:00:00+01:00   2018-04-25 21:29:59+01:00     E1         1
2018-04-25 06:00:00+01:00   2018-04-25 12:59:59+01:00     E1         1
2018-04-24 19:00:00+01:00   2018-04-24 21:29:59+01:00     E1         1
2018-04-24 06:00:00+01:00   2018-04-24 12:59:59+01:00     E1         1
2018-04-23 19:00:00+01:00   2018-04-23 21:29:59+01:00     E1         1
2018-04-23 06:00:00+01:00   2018-04-23 12:59:59+01:00     E1         1
2018-04-16 00:00:00+01:00   2018-04-22 23:59:59+01:00     E1         1
2018-04-28 00:00:00+01:00   2018-04-29 23:59:59+01:00     E1         1
2018-04-07 00:00:00+01:00   2018-04-08 23:59:59+01:00     E1         1
2018-04-06 19:00:00+01:00   2018-04-06 21:29:59+01:00     E1         1
2018-04-06 06:00:00+01:00   2018-04-06 12:59:59+01:00     E1         1
2018-04-09 00:00:00+01:00   2018-04-15 23:59:59+01:00     E1         1
2018-04-05 19:00:00+01:00   2018-04-05 21:29:59+01:00     E1         1
2018-04-04 06:00:00+01:00   2018-04-04 12:59:59+01:00     E1         1
2018-04-03 06:00:00+01:00   2018-04-03 12:59:59+01:00     E1         1
2018-04-02 00:00:00+01:00   2018-04-02 23:59:59+01:00     E1         1
2018-04-04 19:00:00+01:00   2018-04-04 21:29:59+01:00     E1         1
2018-04-05 06:00:00+01:00   2018-04-05 12:59:59+01:00     E1         1
2018-04-03 19:00:00+01:00   2018-04-03 21:29:59+01:00     E1         1
2018-04-27 06:00:00+01:00   2018-04-27 12:59:59+01:00     E2         1
2018-04-02 00:00:00+01:00   2018-04-02 23:59:59+01:00     E2         1
2018-04-26 19:00:00+01:00   2018-04-26 21:29:59+01:00     E2         1
2018-04-25 06:00:00+01:00   2018-04-25 12:59:59+01:00     E2         1
2018-04-03 06:00:00+01:00   2018-04-03 12:59:59+01:00     E2         1
2018-04-26 06:00:00+01:00   2018-04-26 12:59:59+01:00     E2         1
2018-04-27 19:00:00+01:00   2018-04-27 21:29:59+01:00     E2         1
2018-04-01 00:00:00+01:00   2018-04-01 23:59:59+01:00     E2         1
2018-04-25 19:00:00+01:00   2018-04-25 21:29:59+01:00     E2         1
2018-04-03 19:00:00+01:00   2018-04-03 21:29:59+01:00     E2         1
2018-04-24 19:00:00+01:00   2018-04-24 21:29:59+01:00     E2         1
2018-04-04 06:00:00+01:00   2018-04-04 12:59:59+01:00     E2         1
2018-04-24 06:00:00+01:00   2018-04-24 12:59:59+01:00     E2         1
2018-04-23 19:00:00+01:00   2018-04-23 21:29:59+01:00     E2         1
2018-04-04 19:00:00+01:00   2018-04-04 21:29:59+01:00     E2         1
2018-04-23 06:00:00+01:00   2018-04-23 12:59:59+01:00     E2         1
2018-04-16 00:00:00+01:00   2018-04-22 23:59:59+01:00     E2         1
2018-04-05 06:00:00+01:00   2018-04-05 12:59:59+01:00     E2         1
2018-04-09 00:00:00+01:00   2018-04-15 23:59:59+01:00     E2         1
2018-04-07 00:00:00+01:00   2018-04-08 23:59:59+01:00     E2         1
2018-04-05 19:00:00+01:00   2018-04-05 21:29:59+01:00     E2         1
2018-04-06 19:00:00+01:00   2018-04-06 21:29:59+01:00     E2         1
2018-04-06 06:00:00+01:00   2018-04-06 12:59:59+01:00     E2         1
2018-04-28 00:00:00+01:00   2018-04-29 23:59:59+01:00     E2         1

I have to check if:

  • df1(date) is between df2(start_time) and df2(end_time)

  • df1(group)=df2(group)

then replace df2(failure) with df1(failure). The desired outcome looks like:

        start_time                   end_time           group      failure
2018-04-01 00:00:00+01:00   2018-04-01 23:59:59+01:00     E1         1
2018-04-27 19:00:00+01:00   2018-04-27 21:29:59+01:00     E1         1
2018-04-27 06:00:00+01:00   2018-04-27 12:59:59+01:00     E1         1
2018-04-26 19:00:00+01:00   2018-04-26 21:29:59+01:00     E1         1
2018-04-26 06:00:00+01:00   2018-04-26 12:59:59+01:00     E1         1
2018-04-25 19:00:00+01:00   2018-04-25 21:29:59+01:00     E1         1
2018-04-25 06:00:00+01:00   2018-04-25 12:59:59+01:00     E1         1
2018-04-24 19:00:00+01:00   2018-04-24 21:29:59+01:00     E1         1
2018-04-24 06:00:00+01:00   2018-04-24 12:59:59+01:00     E1         1
2018-04-23 19:00:00+01:00   2018-04-23 21:29:59+01:00     E1         1
2018-04-23 06:00:00+01:00   2018-04-23 12:59:59+01:00     E1         1
2018-04-16 00:00:00+01:00   2018-04-22 23:59:59+01:00     E1         0
2018-04-28 00:00:00+01:00   2018-04-29 23:59:59+01:00     E1         1
2018-04-07 00:00:00+01:00   2018-04-08 23:59:59+01:00     E1         1
2018-04-06 19:00:00+01:00   2018-04-06 21:29:59+01:00     E1         1
2018-04-06 06:00:00+01:00   2018-04-06 12:59:59+01:00     E1         1
2018-04-09 00:00:00+01:00   2018-04-15 23:59:59+01:00     E1         1
2018-04-05 19:00:00+01:00   2018-04-05 21:29:59+01:00     E1         1
2018-04-04 06:00:00+01:00   2018-04-04 12:59:59+01:00     E1         1
2018-04-03 06:00:00+01:00   2018-04-03 12:59:59+01:00     E1         1
2018-04-02 00:00:00+01:00   2018-04-02 23:59:59+01:00     E1         1
2018-04-04 19:00:00+01:00   2018-04-04 21:29:59+01:00     E1         1
2018-04-05 06:00:00+01:00   2018-04-05 12:59:59+01:00     E1         1
2018-04-03 19:00:00+01:00   2018-04-03 21:29:59+01:00     E1         1
2018-04-27 06:00:00+01:00   2018-04-27 12:59:59+01:00     E2         1
2018-04-02 00:00:00+01:00   2018-04-02 23:59:59+01:00     E2         1
2018-04-26 19:00:00+01:00   2018-04-26 21:29:59+01:00     E2         1
2018-04-25 06:00:00+01:00   2018-04-25 12:59:59+01:00     E2         1
2018-04-03 06:00:00+01:00   2018-04-03 12:59:59+01:00     E2         1
2018-04-26 06:00:00+01:00   2018-04-26 12:59:59+01:00     E2         1
2018-04-27 19:00:00+01:00   2018-04-27 21:29:59+01:00     E2         1
2018-04-01 00:00:00+01:00   2018-04-01 23:59:59+01:00     E2         1
2018-04-25 19:00:00+01:00   2018-04-25 21:29:59+01:00     E2         1
2018-04-03 19:00:00+01:00   2018-04-03 21:29:59+01:00     E2         1
2018-04-24 19:00:00+01:00   2018-04-24 21:29:59+01:00     E2         1
2018-04-04 06:00:00+01:00   2018-04-04 12:59:59+01:00     E2         1
2018-04-24 06:00:00+01:00   2018-04-24 12:59:59+01:00     E2         1
2018-04-23 19:00:00+01:00   2018-04-23 21:29:59+01:00     E2         1
2018-04-04 19:00:00+01:00   2018-04-04 21:29:59+01:00     E2         1
2018-04-23 06:00:00+01:00   2018-04-23 12:59:59+01:00     E2         1
2018-04-16 00:00:00+01:00   2018-04-22 23:59:59+01:00     E2         1
2018-04-05 06:00:00+01:00   2018-04-05 12:59:59+01:00     E2         1
2018-04-09 00:00:00+01:00   2018-04-15 23:59:59+01:00     E2         1
2018-04-07 00:00:00+01:00   2018-04-08 23:59:59+01:00     E2         1
2018-04-05 19:00:00+01:00   2018-04-05 21:29:59+01:00     E2         1
2018-04-06 19:00:00+01:00   2018-04-06 21:29:59+01:00     E2         1
2018-04-06 06:00:00+01:00   2018-04-06 12:59:59+01:00     E2         1
2018-04-28 00:00:00+01:00   2018-04-29 23:59:59+01:00     E2         1

I tried using if statements, but I get the error: Can only compare identically-labeled Series objects. Any suggestions? Thank you in advance!

  • Try df1[~df1.date.isin(df2.start_time.values)]. At least this gives you an idea of whether it matches the records you are looking for.
    – Karn Kumar
    Commented Oct 19, 2018 at 12:06
  • I don't get any error, but it doesn't change the value!
    – Luca91
    Commented Oct 19, 2018 at 12:09
  • Are both of your columns datetime objects? The error you receive might indicate that one of them is a string and the other one is a datetime.
    – Daneel R.
    Commented Oct 19, 2018 at 12:11
  • Yes, it does not change the value; it only checks whether the values in the date column match anything in df2's start_time column. The replace logic still has to be implemented.
    – Karn Kumar
    Commented Oct 19, 2018 at 12:11
  • @DanielR. he is not getting any error :-)
    – Karn Kumar
    Commented Oct 19, 2018 at 12:12

1 Answer

I was able to compare the dates after converting all three columns to timezone-aware datetimes:

import numpy as np
import pandas as pd

# e1 and e2 correspond to df1 and df2 in the question
e1['date'] = e1['date'].apply(lambda x: pd.to_datetime(x).tz_localize('US/Eastern'))
e2['start_time'] = e2['start_time'].apply(lambda x: pd.to_datetime(x).tz_localize('US/Eastern'))
e2['end_time'] = e2['end_time'].apply(lambda x: pd.to_datetime(x).tz_localize('US/Eastern'))
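
One caveat, offered as a sketch rather than a correction of the above: tz_localize only accepts naive timestamps, and the start_time/end_time strings in the question already carry a +01:00 offset, so they may need to be parsed as timezone-aware directly. The snippet below assumes the naive date column is also in +01:00 (my assumption; the question does not say) and normalizes everything to UTC so the columns can be compared. The one-row frames are trimmed from the question's data.

```python
from datetime import timedelta, timezone

import pandas as pd

e1 = pd.DataFrame({"date": ["2018-04-19 02:07:00"], "group": ["E1"], "failure": [0]})
e2 = pd.DataFrame({
    "start_time": ["2018-04-16 00:00:00+01:00"],
    "end_time": ["2018-04-22 23:59:59+01:00"],
    "group": ["E1"],
    "failure": [1],
})

# Strings that already include an offset: parse them straight to UTC.
e2["start_time"] = pd.to_datetime(e2["start_time"], utc=True)
e2["end_time"] = pd.to_datetime(e2["end_time"], utc=True)

# Naive strings: parse, attach the ASSUMED +01:00 offset, then convert to UTC.
assumed_tz = timezone(timedelta(hours=1))  # assumption, not stated in the question
e1["date"] = (
    pd.to_datetime(e1["date"])
    .dt.tz_localize(assumed_tz)
    .dt.tz_convert("UTC")
)
```

Working on whole columns also avoids the per-row apply.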

I then merged both tables on group and checked whether date falls between start_time and end_time; where it does, the failure value is replaced.

After the merge, failure_x comes from e2 and failure_y comes from e1:

df = e2.merge(e1,on='group',how='left')
df['failure_x'] = np.where((df['start_time'] <= df['date']) & (df['date'] <=  df['end_time']), df['failure_y'], df['failure_x'])
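
For reference, here is a minimal end-to-end sketch of the same merge-then-np.where idea on a three-row subset of the question's data. To keep it short it uses naive timestamps (assuming all values are in the same time zone) and a hypothetical "_new" suffix so that the failure column keeps its original name. Note that a left merge on group yields one row per e2 row only when e1 has a single row per group, as in the question; with several e1 rows per group the e2 rows would be duplicated.

```python
import numpy as np
import pandas as pd

e1 = pd.DataFrame({
    "date": pd.to_datetime(["2018-04-19 02:07:00", "2018-04-19 02:07:00"]),
    "event": [1, 2],
    "group": ["E1", "E2"],
    "failure": [0, 1],
})
e2 = pd.DataFrame({
    "start_time": pd.to_datetime(
        ["2018-04-09 00:00:00", "2018-04-16 00:00:00", "2018-04-16 00:00:00"]),
    "end_time": pd.to_datetime(
        ["2018-04-15 23:59:59", "2018-04-22 23:59:59", "2018-04-22 23:59:59"]),
    "group": ["E1", "E1", "E2"],
    "failure": [1, 1, 1],
})

# Attach each group's date and failure from e1 to every e2 row of that group.
merged = e2.merge(e1[["date", "group", "failure"]],
                  on="group", how="left", suffixes=("", "_new"))

# True where date lies inside [start_time, end_time] (both ends inclusive).
in_window = merged["date"].between(merged["start_time"], merged["end_time"])

# Take e1's failure inside the window, otherwise keep e2's original value.
merged["failure"] = np.where(in_window, merged["failure_new"], merged["failure"])

result = merged[["start_time", "end_time", "group", "failure"]]
print(result)
```

Only the E1 row covering 2018-04-16 to 2018-04-22 changes to 0, which matches the desired outcome in the question; the E2 row in the same window is overwritten with 1, so it looks unchanged.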
  • Haven't you mixed up the "df" names a bit? I mean, which one should be df, df1 or df2?
    – Luca91
    Commented Oct 19, 2018 at 13:07
