I am trying to understand whether there is any way to do when/then/otherwise in Polars and assign the result to multiple columns at once. I have an Elo dataset with millions of rows, and for a given id I want to assign the current Elo values to every row whose date is on or after a given date. In pandas, I would do:
elo_df.loc[pd.IndexSlice[id, date:], ["elo", "true_skill_mu", "true_skill_sigma"]] = elo, true_skill_mu, true_skill_sigma  # assumes a sorted (id, date) MultiIndex
The code below works but is very slow. I am hoping to make it at least 3x faster by evaluating the filter only once. If you have any other suggestions on how to make this faster, please let me know.
elo_df = elo_df.with_columns([pl.when((pl.col("id") == col) & (pl.col("date") >= date)).then(pl.lit(new_rating)).otherwise(pl.col("elo")).alias("elo"),
pl.when((pl.col("id") == col) & (pl.col("date") >= date)).then(pl.lit(new_mu)).otherwise(pl.col("true_skill_mu")).alias("true_skill_mu"),
pl.when((pl.col("id") == col) & (pl.col("date") >= date)).then(pl.lit(new_sigma)).otherwise(pl.col("true_skill_sigma")).alias("true_skill_sigma")])
In a with_columns context, the three when/then/otherwise expressions run in parallel (as long as your CPU has at least three cores), so in wall-clock terms you will not gain much by rewriting them to share a single filter. That said, are you updating large batches of ids at one time? If so, there is a speed-up available for that.