Row-wise python operations faster than apply

Question

I'm working with OptionMetrics data, and I want to check my calculation of forward prices on the SPX index. The problem I'm having is that the rates information I've been given is discrete, so I have to provide an interpolation for each calculation. With apply, this is horrifically inefficient.

To be precise: I have a dataframe of forward prices, F. F contains columns date, expiration, and forwardprice. date and expiration are datetime.date types, and forwardprice has floats. My information on rates is contained in a separate dataframe, r. On a specific date, r contains a forward curve on that date, i.e.:

r[r['date'] == example_date] will yield something like: pd.DataFrame({'date': [dt.date(2022, 1, 3), dt.date(2022, 1, 3)], 'days': [10, 20], 'rate': [0.05, 0.06]})

On each date in F, I need to calculate the days to expiration (I can do this), and then find the appropriate rate from the forward curve by linear interpolation. (In the above example, this means that on Jan 3rd, 2022, the rate for a Jan 18 expiration (+15 days) should come out to 0.055.)
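
As a minimal sketch of that arithmetic, the snippet below runs np.interp on the two-point curve from the example above. The variable names (days, rates, rate_15) are only illustrative and do not come from the original data.

```python
import numpy as np

# Curve nodes for 2022-01-03, taken from the example in the question
days = np.array([10, 20])
rates = np.array([0.05, 0.06])

# 15 days to expiration lies halfway between the 10-day and 20-day nodes,
# so the interpolated rate is halfway between 0.05 and 0.06
rate_15 = np.interp(15, days, rates)
print(rate_15)
```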

The only way I could think about doing this was: F['rate'] = F.apply(lambda x: np.interp((x.expiration - x.date).days, r.loc[r['date'] == x.date, 'days'], r.loc[r['date'] == x.date, 'rate']), axis=1)

This approach is not practical for the size of the dataframes I'm working with, so I was curious to see if there's a better way. How can I make this more efficient? Do I need to approach the problem in a different way entirely? Thanks.

Reproducible example, please.

– Reinderien, commented 2025-06-22 22:31:17 +00:00

you can use parallel_apply from pandarallel if it helps

– micro5, commented 2025-06-23 06:46:43 +00:00

mozway · Accepted Answer · 2025-06-23 08:51:49Z

An efficient (but a bit verbose) approach would be to perform a double merge_asof to identify the lower/upper bound per date, then manually interpolate and merge back to the original:

# compute the days_diff for F
F['days_diff'] = F['expiration'].sub(F['date']).dt.days
# sort both inputs by days_diff
r2 = r.sort_values(by='days')
F2 = F.sort_values(by='days_diff')

# perform a backward and forward merge_asof
asof_b = pd.merge_asof(F2, r2, left_on='days_diff', right_on='days',
                       by='date', direction='backward')
asof_f = pd.merge_asof(F2, r2, left_on='days_diff', right_on='days',
                       by='date', direction='forward')


# compute the interpolation and update the asof_b
tmp = (asof_f-asof_b)
rate = asof_b['rate'].add(tmp['rate'].div(tmp['days'])
                          .mul(asof_f['days_diff'].sub(asof_b['days'])))
asof_b.update(rate.rename('rate'))

# merge to original
cols = ['date', 'days_diff']
out = F.merge(asof_b[cols+['rate']], on=cols, how='left')

Example output:

        date expiration  forwardprice  days_diff   rate
0 2022-01-03 2022-01-18         100.0         15  0.055
1 2022-01-03 2022-01-23         101.0         20  0.060
2 2022-01-04 2022-01-24         102.0         20  0.060

Used inputs (from the now-deleted answer from @Hermann12):

import pandas as pd

F = pd.DataFrame({
    'date': pd.to_datetime(['2022-01-03', '2022-01-03', '2022-01-04']),
    'expiration': pd.to_datetime(['2022-01-18', '2022-01-23', '2022-01-24']),
    'forwardprice': [100.0, 101.0, 102.0]
})

r = pd.DataFrame({
    'date': pd.to_datetime(['2022-01-03', '2022-01-03', '2022-01-04', '2022-01-04']),
    'days': [10, 20, 10, 30],
    'rate': [0.05, 0.06, 0.055, 0.065]
})
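
Note that the question describes date and expiration as datetime.date objects, while the inputs above use datetime64 columns. The .dt accessor needs datetime-like values, and the by='date' key should have the same dtype in both frames, so a conversion step may be needed first. A minimal sketch, assuming the raw columns hold datetime.date objects (F_raw, F_conv and the sample rows are hypothetical):

```python
import datetime as dt
import pandas as pd

# Hypothetical raw frame with datetime.date columns, as described in the question
F_raw = pd.DataFrame({
    'date': [dt.date(2022, 1, 3), dt.date(2022, 1, 4)],
    'expiration': [dt.date(2022, 1, 18), dt.date(2022, 1, 24)],
    'forwardprice': [100.0, 102.0],
})

# Convert both date columns to datetime64 so the .dt accessor is available
F_conv = F_raw.assign(
    date=pd.to_datetime(F_raw['date']),
    expiration=pd.to_datetime(F_raw['expiration']),
)

# Days to expiration, computed for the whole column at once
days_diff = F_conv['expiration'].sub(F_conv['date']).dt.days
print(days_diff.tolist())
```

The same conversion would apply to the date column of r.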

Variant to avoid the last merge by saving the original index instead:

# compute the days_diff for F
F['days_diff'] = F['expiration'].sub(F['date']).dt.days
# sort both inputs by days_diff, save the index for F2
r2 = r.sort_values(by='days')
F2 = F.reset_index().sort_values(by='days_diff')

# perform a backward and forward merge_asof
asof_b = pd.merge_asof(F2, r2, left_on='days_diff', right_on='days', by='date', direction='backward')
asof_f = pd.merge_asof(F2, r2, left_on='days_diff', right_on='days', by='date', direction='forward')


# compute the interpolation and update the asof_b
tmp = (asof_f-asof_b)
rate = asof_b['rate'].add(tmp['rate'].div(tmp['days']).mul(asof_f['days_diff'].sub(asof_b['days'])))
asof_b.update(rate.rename('rate'))

# assign rate to original DataFrame
F['rate'] = rate.fillna(asof_b['rate']).set_axis(F2['index'])
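
One way to sanity-check either variant on a sample is to compare it with np.interp called once per date rather than once per row. This is a different technique from the merge_asof approach, shown here only as a reference sketch. The helper name interp_by_date is hypothetical, and the inputs are the ones listed above. np.interp holds the end rates flat outside a curve's range, so results there may differ from the merge_asof approach, where a missing match produces NaN.

```python
import numpy as np
import pandas as pd

F = pd.DataFrame({
    'date': pd.to_datetime(['2022-01-03', '2022-01-03', '2022-01-04']),
    'expiration': pd.to_datetime(['2022-01-18', '2022-01-23', '2022-01-24']),
    'forwardprice': [100.0, 101.0, 102.0],
})
r = pd.DataFrame({
    'date': pd.to_datetime(['2022-01-03', '2022-01-03', '2022-01-04', '2022-01-04']),
    'days': [10, 20, 10, 30],
    'rate': [0.05, 0.06, 0.055, 0.065],
})


def interp_by_date(F, r):
    """Hypothetical helper: one np.interp call per date instead of per row."""
    days_diff = (F['expiration'] - F['date']).dt.days
    # Split the curves once: date -> (days sorted ascending, matching rates)
    curves = {
        date: (g['days'].to_numpy(), g['rate'].to_numpy())
        for date, g in r.sort_values('days').groupby('date')
    }
    out = pd.Series(np.nan, index=F.index, name='rate')
    # Interpolate all rows that share a date in a single vectorized call
    for date, idx in F.groupby('date').groups.items():
        if date in curves:  # dates without a curve stay NaN
            xp, fp = curves[date]
            out.loc[idx] = np.interp(days_diff.loc[idx].to_numpy(), xp, fp)
    return out


reference = interp_by_date(F, r)
print(reference.round(3).tolist())  # [0.055, 0.06, 0.06]
```

On these inputs the values agree with the example output above. The loop runs once per distinct date, so it is likely to be slower than merge_asof when there are many dates, which is why it is offered as a check rather than a replacement.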
