1

My lack of experience in working with python in the past year has made me rather rusty, and I'm getting back into coding again.

I have a dataframe of events that has a 'start_time' column. What I need to do is create an 'end_time' column that has a time value that is 1 second less than the next row's start_time. This is an ask for doing event time calculations.

The desired output:

start_time  end_time
0   00:00:00  07:59:59
1   08:00:00  08:20:04
2   08:20:05  08:29:19
3   08:29:20  08:29:20
4   08:29:21  08:35:14
5   08:35:15  08:55:21
6   08:55:22  08:57:20
7   08:57:21  09:02:23
8   09:02:24  09:14:07
9   09:14:08  09:15:03

I currently have code that will accomplish this, but from anything I've read here, and from what I remember, I really shouldn't be iterating through a dataframe in a for loop.

for ndx, row in df.iterrows():
    if ndx != df[atnp_df.columns[0]].count() - 1:
        df.iloc[ndx, 9] = pd.to_datetime(df.iloc[ndx+1, 8]) - timedelta(seconds=1)

(Hey, it works, but it's slow...)

How do I do this pythonically? I know the solution should be something like this:

df['end_time'] = pd.to_datetime(df['start_time']) - timedelta(seconds=1)

But, this subtracts 1 second from the start_time in the same row. I'm not quite sure how to access the next row's start time in this way.

Any and all help is greatly appreciated!

1 Answer 1

4

offsets

df.assign(end_time=pd.to_timedelta(df.start_time).shift(-1).sub(pd.offsets.Second(1)))

  start_time        end_time
0   00:00:00 0 days 07:59:59
1   08:00:00 0 days 08:20:04
2   08:20:05 0 days 08:29:19
3   08:29:20 0 days 08:29:20
4   08:29:21 0 days 08:35:14
5   08:35:15 0 days 08:55:21
6   08:55:22 0 days 08:57:20
7   08:57:21 0 days 09:02:23
8   09:02:24 0 days 09:14:07
9   09:14:08             NaT

A little cleaned up and returning formatted strings:

s = pd.to_timedelta(df.start_time).shift(-1).sub(pd.offsets.Second(1))

df.assign(end_time=s.add(pd.Timestamp('now').normalize()).dt.time.astype(str))

  start_time  end_time
0   00:00:00  07:59:59
1   08:00:00  08:20:04
2   08:20:05  08:29:19
3   08:29:20  08:29:20
4   08:29:21  08:35:14
5   08:35:15  08:55:21
6   08:55:22  08:57:20
7   08:57:21  09:02:23
8   09:02:24  09:14:07
9   09:14:08       NaT
​
Sign up to request clarification or add additional context in comments.

4 Comments

Wow, this is beyond what I thought would be the solution. Thanks for your help!
'pd.Timestamp('now')' has me a bit stumped. I can find plenty of documentation on pd.Timestamp(), but not anything showing passing 'now' as an argument to it. Is this behaving the same way as pd.to_datetime('now')?
Yes. And it’s just a trick. It could have been any time stamp. I use it to be able to peal of the time component.
Thanks. I appreciate your help on this and answering my question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.