Revisions to How can I iterate over rows in a Pandas DataFrame?

added 809 characters in body

Source Link

edited Jan 20 at 22:32

56.9k
35
302
399

Alternatives

Try the Polars Python library. Written in Rust, it's supposed to be waaay faster than Pandas.

From their homepage: https://pola.rs/ (emphasis added):

Built by developers for developers to achieve up to 50x performance

Why use Polars

Polars was benchmarked in a derived version of the independent TPC-H benchmark against several other solutions. This benchmark aims to replicate data wrangling operations used in practice. Polars easily trumps other solutions due to its parallel execution engine, efficient algorithms and use of vectorization with SIMD (Single Instruction, Multiple Data). Compared to pandas, it can achieve more than 30x performance gains.

added 5 characters in body

Source Link

edited Jan 20 at 22:05

Gabriel Staples

56.9k
35
302
399

deleted 60 characters in body

Source Link

edited Dec 12, 2025 at 17:08

Gabriel Staples

56.9k
35
302
399

Technique 5: 5_itertuples_in_for_loop

# pre-allocate a `val` array of the appropriate size
val = [np.NAN]*len(df)
# Now iterate over all rows in the dataframe, and populate `val`
for row in df.itertuples():
    val[row.Index] = calculate_val(
        row.A_i_minus_2,
        row.A_i_minus_1,
        row.A,
        row.A_i_plus_1,
        row.B,
        row.C,
        row.D,
    )

df["val"] = val  # put this column back into the dataframe

Technique 6: 6_vectorization__with_apply_for_if_statement_corner_case

def calculate_new_column_b_value(b_value):
    # Python ternary operator
    b_value_new = (6 * b_value) if b_value > 0 else (60 * b_value)
    return b_value_new

# In this particular example, since we have an embedded `if-else` statement
# for the `B` column, pure vectorization is less intuitive. So, first we'll
# calculate a new `B` column using
# **`apply()`**, then we'll use vectorization for the rest.
df["B_new"] = df["B"].apply(calculate_new_column_b_value)
# OR (same thing, but with a lambda function instead)
# df["B_new"] = df["B"].apply(lambda x: (6 * x) if x > 0 else (60 * x))

# Now we can use vectorization for the rest. "Vectorization" in this case
# means to simply use the column series variables in equations directly,
# without manually iterating over them. Pandas DataFrames will handle the
# underlying iteration automatically for you. You just focus on the math.
df["val"] = (
    2 * df["A_i_minus_2"]
    + 3 * df["A_i_minus_1"]
    + 4 * df["A"]
    + 5 * df["A_i_plus_1"]
    + df["B_new"]
    + 7 * df["C"]
    - 8 * df["D"]
)

Technique 7: 7_vectorization__with_list_comprehension_for_if_statment_corner_case

# In this particular example, since we have an embedded `if-else` statement
# for the `B` column, pure vectorization is less intuitive. So, first we'll
# calculate a new `B` column using **list comprehension**, then we'll use
# vectorization for the rest.
df["B_new"] = [
    calculate_new_column_b_value(b_value) for b_value in df["B"]
]

# Now we can use vectorization for the rest. "Vectorization" in this case
# means to simply use the column series variables in equations directly,
# without manually iterating over them. Pandas DataFrames will handle the
# underlying iteration automatically for you. You just focus on the math.
df["val"] = (
    2 * df["A_i_minus_2"]
    + 3 * df["A_i_minus_1"]
    + 4 * df["A"]
    + 5 * df["A_i_plus_1"]
    + df["B_new"]
    + 7 * df["C"]
    - 8 * df["D"]
)

Technique 8: 8_pure_vectorization__with_df.loc[]_boolean_array_indexing_for_if_statment_corner_case

This uses boolean indexing, AKA: a boolean mask, to accomplish the equivalent of the if statement in the equation. In this way, pure vectorization can be used for the entire equation, thereby maximizing performance and speed.

# If statement to evaluate:
#
#     if B > 0:
#         B_new = 6 * B
#     else:
#         B_new = 60 * B
#
# In this particular example, since we have an embedded `if-else` statement
# for the `B` column, we can use some boolean array indexing through
# `df.loc[]` for some pure vectorization magic.
#
# Explanation:
#
# Long:
#
# The format is: `df.loc[rows, columns]`, except in this case, the rows are
# specified by a "boolean array" (AKA: a boolean expression, list of
# booleans, or "boolean mask"), specifying all rows where `B` is > 0. Then,
# only in that `B` column for those rows, set the value accordingly. After
# we do this for where `B` is > 0, we do the same thing for where `B`
# is <= 0, except with the other equation.
#
# Short:
#
# For all rows where the boolean expression applies, set the column value
# accordingly.
#
# GitHub CoPilot first showedOfficial medocumentation thison `.loc[]` technique.
# See also the official documentation:
# https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html
#
# ===========================
# 1st: handle the > 0 case
# ===========================
df["B_new"] = df.loc[df["B"] > 0, "B"] * 6
#
# ===========================
# 2nd: handle the <= 0 case, merging the results into the
# previously-created "B_new" column
# ===========================
# - NB: this does NOT work; it overwrites and replaces the whole "B_new"
#   column instead:
#
#       df["B_new"] = df.loc[df["B"] <= 0, "B"] * 60
#
# This works:
df.loc[df["B"] <= 0, "B_new"] = df.loc[df["B"] <= 0, "B"] * 60

# Now use normal vectorization for the rest.
df["val"] = (
    2 * df["A_i_minus_2"]
    + 3 * df["A_i_minus_1"]
    + 4 * df["A"]
    + 5 * df["A_i_plus_1"]
    + df["B_new"]
    + 7 * df["C"]
    - 8 * df["D"]
)

Technique 9: 9_apply_function_with_lambda

df["val"] = df.apply(
    lambda row: calculate_val(
        row["A_i_minus_2"],
        row["A_i_minus_1"],
        row["A"],
        row["A_i_plus_1"],
        row["B"],
        row["C"],
        row["D"]
    ),
    axis='columns' # same as `axis=1`: "apply function to each row",
                   # rather than to each column
)

Technique 10: 10_list_comprehension_w_zip_and_direct_variable_assignment_passed_to_func

df["val"] = [
    # Note: you *could* do the calculations directly here instead of using a
    # function call, so long as you don't have indented code blocks such as
    # sub-routines or multi-line if statements.
    #
    # I'm using a function call.
    calculate_val(
        A_i_minus_2,
        A_i_minus_1,
        A,
        A_i_plus_1,
        B,
        C,
        D
    ) for A_i_minus_2, A_i_minus_1, A, A_i_plus_1, B, C, D
    in zip(
        df["A_i_minus_2"],
        df["A_i_minus_1"],
        df["A"],
        df["A_i_plus_1"],
        df["B"],
        df["C"],
        df["D"]
    )
]

Technique 11: 11_list_comprehension_w_zip_and_direct_variable_assignment_calculated_in_place

df["val"] = [
    2 * A_i_minus_2
    + 3 * A_i_minus_1
    + 4 * A
    + 5 * A_i_plus_1
    # Python ternary operator; don't forget parentheses around the entire
    # ternary expression!
    + ((6 * B) if B > 0 else (60 * B))
    + 7 * C
    - 8 * D
    for A_i_minus_2, A_i_minus_1, A, A_i_plus_1, B, C, D
    in zip(
        df["A_i_minus_2"],
        df["A_i_minus_1"],
        df["A"],
        df["A_i_plus_1"],
        df["B"],
        df["C"],
        df["D"]
    )
]

Technique 12: 12_list_comprehension_w_zip_and_row_tuple_passed_to_func

df["val"] = [
    calculate_val(
        row[0],
        row[1],
        row[2],
        row[3],
        row[4],
        row[5],
        row[6],
    ) for row
    in zip(
        df["A_i_minus_2"],
        df["A_i_minus_1"],
        df["A"],
        df["A_i_plus_1"],
        df["B"],
        df["C"],
        df["D"]
    )
]

Technique 13: 13_list_comprehension_w__to_numpy__and_direct_variable_assignment_passed_to_func

df["val"] = [
    # Note: you *could* do the calculations directly here instead of using a
    # function call, so long as you don't have indented code blocks such as
    # sub-routines or multi-line if statements.
    #
    # I'm using a function call.
    calculate_val(
        A_i_minus_2,
        A_i_minus_1,
        A,
        A_i_plus_1,
        B,
        C,
        D
    ) for A_i_minus_2, A_i_minus_1, A, A_i_plus_1, B, C, D
        # Note: this `[[...]]` double-bracket indexing is used to select a
        # subset of columns from the dataframe. The inner `[]` brackets
        # create a list from the column names within them, and the outer
        # `[]` brackets accept this list to index into the dataframe and
        # select just this list of columns, in that order.
        # - See the official documentation on it here:
        #   https://pandas.pydata.org/docs/user_guide/indexing.html#basics
        #   - Search for the phrase "You can pass a list of columns to [] to
        #     select columns in that order."
        #   - I learned this from this comment here:
        #     https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas/55557758#comment136020567_55557758
        # - One of the **list comprehension** examples in this answer here
        #   uses `.to_numpy()` like this:
        #   https://stackoverflow.com/a/55557758/4561887
    in df[[
        "A_i_minus_2",
        "A_i_minus_1",
        "A",
        "A_i_plus_1",
        "B",
        "C",
        "D"
    ]].to_numpy()  # NB: `.values` works here too, but is deprecated. See:
                   # https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.values.html
]

Technique 5: 5_itertuples_in_for_loop

# pre-allocate a `val` array of the appropriate size
val = [np.NAN]*len(df)
# Now iterate over all rows in the dataframe, and populate `val`
for row in df.itertuples():
    val[row.Index] = calculate_val(
        row.A_i_minus_2,
        row.A_i_minus_1,
        row.A,
        row.A_i_plus_1,
        row.B,
        row.C,
        row.D,
    )

df["val"] = val  # put this column back into the dataframe

Technique 6: 6_vectorization__with_apply_for_if_statement_corner_case

def calculate_new_column_b_value(b_value):
    # Python ternary operator
    b_value_new = (6 * b_value) if b_value > 0 else (60 * b_value)
    return b_value_new

# In this particular example, since we have an embedded `if-else` statement
# for the `B` column, pure vectorization is less intuitive. So, first we'll
# calculate a new `B` column using
# **`apply()`**, then we'll use vectorization for the rest.
df["B_new"] = df["B"].apply(calculate_new_column_b_value)
# OR (same thing, but with a lambda function instead)
# df["B_new"] = df["B"].apply(lambda x: (6 * x) if x > 0 else (60 * x))

# Now we can use vectorization for the rest. "Vectorization" in this case
# means to simply use the column series variables in equations directly,
# without manually iterating over them. Pandas DataFrames will handle the
# underlying iteration automatically for you. You just focus on the math.
df["val"] = (
    2 * df["A_i_minus_2"]
    + 3 * df["A_i_minus_1"]
    + 4 * df["A"]
    + 5 * df["A_i_plus_1"]
    + df["B_new"]
    + 7 * df["C"]
    - 8 * df["D"]
)

Technique 7: 7_vectorization__with_list_comprehension_for_if_statment_corner_case

# In this particular example, since we have an embedded `if-else` statement
# for the `B` column, pure vectorization is less intuitive. So, first we'll
# calculate a new `B` column using **list comprehension**, then we'll use
# vectorization for the rest.
df["B_new"] = [
    calculate_new_column_b_value(b_value) for b_value in df["B"]
]

# Now we can use vectorization for the rest. "Vectorization" in this case
# means to simply use the column series variables in equations directly,
# without manually iterating over them. Pandas DataFrames will handle the
# underlying iteration automatically for you. You just focus on the math.
df["val"] = (
    2 * df["A_i_minus_2"]
    + 3 * df["A_i_minus_1"]
    + 4 * df["A"]
    + 5 * df["A_i_plus_1"]
    + df["B_new"]
    + 7 * df["C"]
    - 8 * df["D"]
)

Technique 8: 8_pure_vectorization__with_df.loc[]_boolean_array_indexing_for_if_statment_corner_case

This uses boolean indexing, AKA: a boolean mask, to accomplish the equivalent of the if statement in the equation. In this way, pure vectorization can be used for the entire equation, thereby maximizing performance and speed.

# If statement to evaluate:
#
#     if B > 0:
#         B_new = 6 * B
#     else:
#         B_new = 60 * B
#
# In this particular example, since we have an embedded `if-else` statement
# for the `B` column, we can use some boolean array indexing through
# `df.loc[]` for some pure vectorization magic.
#
# Explanation:
#
# Long:
#
# The format is: `df.loc[rows, columns]`, except in this case, the rows are
# specified by a "boolean array" (AKA: a boolean expression, list of
# booleans, or "boolean mask"), specifying all rows where `B` is > 0. Then,
# only in that `B` column for those rows, set the value accordingly. After
# we do this for where `B` is > 0, we do the same thing for where `B`
# is <= 0, except with the other equation.
#
# Short:
#
# For all rows where the boolean expression applies, set the column value
# accordingly.
#
# GitHub CoPilot first showed me this `.loc[]` technique.
# See also the official documentation:
# https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html
#
# ===========================
# 1st: handle the > 0 case
# ===========================
df["B_new"] = df.loc[df["B"] > 0, "B"] * 6
#
# ===========================
# 2nd: handle the <= 0 case, merging the results into the
# previously-created "B_new" column
# ===========================
# - NB: this does NOT work; it overwrites and replaces the whole "B_new"
#   column instead:
#
#       df["B_new"] = df.loc[df["B"] <= 0, "B"] * 60
#
# This works:
df.loc[df["B"] <= 0, "B_new"] = df.loc[df["B"] <= 0, "B"] * 60

# Now use normal vectorization for the rest.
df["val"] = (
    2 * df["A_i_minus_2"]
    + 3 * df["A_i_minus_1"]
    + 4 * df["A"]
    + 5 * df["A_i_plus_1"]
    + df["B_new"]
    + 7 * df["C"]
    - 8 * df["D"]
)

Technique 9: 9_apply_function_with_lambda

df["val"] = df.apply(
    lambda row: calculate_val(
        row["A_i_minus_2"],
        row["A_i_minus_1"],
        row["A"],
        row["A_i_plus_1"],
        row["B"],
        row["C"],
        row["D"]
    ),
    axis='columns' # same as `axis=1`: "apply function to each row",
                   # rather than to each column
)

Technique 10: 10_list_comprehension_w_zip_and_direct_variable_assignment_passed_to_func

df["val"] = [
    # Note: you *could* do the calculations directly here instead of using a
    # function call, so long as you don't have indented code blocks such as
    # sub-routines or multi-line if statements.
    #
    # I'm using a function call.
    calculate_val(
        A_i_minus_2,
        A_i_minus_1,
        A,
        A_i_plus_1,
        B,
        C,
        D
    ) for A_i_minus_2, A_i_minus_1, A, A_i_plus_1, B, C, D
    in zip(
        df["A_i_minus_2"],
        df["A_i_minus_1"],
        df["A"],
        df["A_i_plus_1"],
        df["B"],
        df["C"],
        df["D"]
    )
]

Technique 11: 11_list_comprehension_w_zip_and_direct_variable_assignment_calculated_in_place

df["val"] = [
    2 * A_i_minus_2
    + 3 * A_i_minus_1
    + 4 * A
    + 5 * A_i_plus_1
    # Python ternary operator; don't forget parentheses around the entire
    # ternary expression!
    + ((6 * B) if B > 0 else (60 * B))
    + 7 * C
    - 8 * D
    for A_i_minus_2, A_i_minus_1, A, A_i_plus_1, B, C, D
    in zip(
        df["A_i_minus_2"],
        df["A_i_minus_1"],
        df["A"],
        df["A_i_plus_1"],
        df["B"],
        df["C"],
        df["D"]
    )
]

Technique 12: 12_list_comprehension_w_zip_and_row_tuple_passed_to_func

df["val"] = [
    calculate_val(
        row[0],
        row[1],
        row[2],
        row[3],
        row[4],
        row[5],
        row[6],
    ) for row
    in zip(
        df["A_i_minus_2"],
        df["A_i_minus_1"],
        df["A"],
        df["A_i_plus_1"],
        df["B"],
        df["C"],
        df["D"]
    )
]

Technique 13: 13_list_comprehension_w__to_numpy__and_direct_variable_assignment_passed_to_func

df["val"] = [
    # Note: you *could* do the calculations directly here instead of using a
    # function call, so long as you don't have indented code blocks such as
    # sub-routines or multi-line if statements.
    #
    # I'm using a function call.
    calculate_val(
        A_i_minus_2,
        A_i_minus_1,
        A,
        A_i_plus_1,
        B,
        C,
        D
    ) for A_i_minus_2, A_i_minus_1, A, A_i_plus_1, B, C, D
        # Note: this `[[...]]` double-bracket indexing is used to select a
        # subset of columns from the dataframe. The inner `[]` brackets
        # create a list from the column names within them, and the outer
        # `[]` brackets accept this list to index into the dataframe and
        # select just this list of columns, in that order.
        # - See the official documentation on it here:
        #   https://pandas.pydata.org/docs/user_guide/indexing.html#basics
        #   - Search for the phrase "You can pass a list of columns to [] to
        #     select columns in that order."
        #   - I learned this from this comment here:
        #     https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas/55557758#comment136020567_55557758
        # - One of the **list comprehension** examples in this answer here
        #   uses `.to_numpy()` like this:
        #   https://stackoverflow.com/a/55557758/4561887
    in df[[
        "A_i_minus_2",
        "A_i_minus_1",
        "A",
        "A_i_plus_1",
        "B",
        "C",
        "D"
    ]].to_numpy()  # NB: `.values` works here too, but is deprecated. See:
                   # https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.values.html
]

Technique 5: 5_itertuples_in_for_loop

# pre-allocate a `val` array of the appropriate size
val = [np.NAN]*len(df)
# Now iterate over all rows in the dataframe, and populate `val`
for row in df.itertuples():
    val[row.Index] = calculate_val(
        row.A_i_minus_2,
        row.A_i_minus_1,
        row.A,
        row.A_i_plus_1,
        row.B,
        row.C,
        row.D,
    )

df["val"] = val  # put this column back into the dataframe

Technique 6: 6_vectorization__with_apply_for_if_statement_corner_case

def calculate_new_column_b_value(b_value):
    # Python ternary operator
    b_value_new = (6 * b_value) if b_value > 0 else (60 * b_value)
    return b_value_new

# In this particular example, since we have an embedded `if-else` statement
# for the `B` column, pure vectorization is less intuitive. So, first we'll
# calculate a new `B` column using
# **`apply()`**, then we'll use vectorization for the rest.
df["B_new"] = df["B"].apply(calculate_new_column_b_value)
# OR (same thing, but with a lambda function instead)
# df["B_new"] = df["B"].apply(lambda x: (6 * x) if x > 0 else (60 * x))

# Now we can use vectorization for the rest. "Vectorization" in this case
# means to simply use the column series variables in equations directly,
# without manually iterating over them. Pandas DataFrames will handle the
# underlying iteration automatically for you. You just focus on the math.
df["val"] = (
    2 * df["A_i_minus_2"]
    + 3 * df["A_i_minus_1"]
    + 4 * df["A"]
    + 5 * df["A_i_plus_1"]
    + df["B_new"]
    + 7 * df["C"]
    - 8 * df["D"]
)

Technique 7: 7_vectorization__with_list_comprehension_for_if_statment_corner_case

# In this particular example, since we have an embedded `if-else` statement
# for the `B` column, pure vectorization is less intuitive. So, first we'll
# calculate a new `B` column using **list comprehension**, then we'll use
# vectorization for the rest.
df["B_new"] = [
    calculate_new_column_b_value(b_value) for b_value in df["B"]
]

# Now we can use vectorization for the rest. "Vectorization" in this case
# means to simply use the column series variables in equations directly,
# without manually iterating over them. Pandas DataFrames will handle the
# underlying iteration automatically for you. You just focus on the math.
df["val"] = (
    2 * df["A_i_minus_2"]
    + 3 * df["A_i_minus_1"]
    + 4 * df["A"]
    + 5 * df["A_i_plus_1"]
    + df["B_new"]
    + 7 * df["C"]
    - 8 * df["D"]
)

Technique 8: 8_pure_vectorization__with_df.loc[]_boolean_array_indexing_for_if_statment_corner_case

This uses boolean indexing, AKA: a boolean mask, to accomplish the equivalent of the if statement in the equation. In this way, pure vectorization can be used for the entire equation, thereby maximizing performance and speed.

# If statement to evaluate:
#
#     if B > 0:
#         B_new = 6 * B
#     else:
#         B_new = 60 * B
#
# In this particular example, since we have an embedded `if-else` statement
# for the `B` column, we can use some boolean array indexing through
# `df.loc[]` for some pure vectorization magic.
#
# Explanation:
#
# Long:
#
# The format is: `df.loc[rows, columns]`, except in this case, the rows are
# specified by a "boolean array" (AKA: a boolean expression, list of
# booleans, or "boolean mask"), specifying all rows where `B` is > 0. Then,
# only in that `B` column for those rows, set the value accordingly. After
# we do this for where `B` is > 0, we do the same thing for where `B`
# is <= 0, except with the other equation.
#
# Short:
#
# For all rows where the boolean expression applies, set the column value
# accordingly.
#
# Official documentation on `.loc[]`:
# https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html
#
# ===========================
# 1st: handle the > 0 case
# ===========================
df["B_new"] = df.loc[df["B"] > 0, "B"] * 6
#
# ===========================
# 2nd: handle the <= 0 case, merging the results into the
# previously-created "B_new" column
# ===========================
# - NB: this does NOT work; it overwrites and replaces the whole "B_new"
#   column instead:
#
#       df["B_new"] = df.loc[df["B"] <= 0, "B"] * 60
#
# This works:
df.loc[df["B"] <= 0, "B_new"] = df.loc[df["B"] <= 0, "B"] * 60

# Now use normal vectorization for the rest.
df["val"] = (
    2 * df["A_i_minus_2"]
    + 3 * df["A_i_minus_1"]
    + 4 * df["A"]
    + 5 * df["A_i_plus_1"]
    + df["B_new"]
    + 7 * df["C"]
    - 8 * df["D"]
)

Technique 9: 9_apply_function_with_lambda

df["val"] = df.apply(
    lambda row: calculate_val(
        row["A_i_minus_2"],
        row["A_i_minus_1"],
        row["A"],
        row["A_i_plus_1"],
        row["B"],
        row["C"],
        row["D"]
    ),
    axis='columns' # same as `axis=1`: "apply function to each row",
                   # rather than to each column
)

Technique 10: 10_list_comprehension_w_zip_and_direct_variable_assignment_passed_to_func

df["val"] = [
    # Note: you *could* do the calculations directly here instead of using a
    # function call, so long as you don't have indented code blocks such as
    # sub-routines or multi-line if statements.
    #
    # I'm using a function call.
    calculate_val(
        A_i_minus_2,
        A_i_minus_1,
        A,
        A_i_plus_1,
        B,
        C,
        D
    ) for A_i_minus_2, A_i_minus_1, A, A_i_plus_1, B, C, D
    in zip(
        df["A_i_minus_2"],
        df["A_i_minus_1"],
        df["A"],
        df["A_i_plus_1"],
        df["B"],
        df["C"],
        df["D"]
    )
]

Technique 11: 11_list_comprehension_w_zip_and_direct_variable_assignment_calculated_in_place

df["val"] = [
    2 * A_i_minus_2
    + 3 * A_i_minus_1
    + 4 * A
    + 5 * A_i_plus_1
    # Python ternary operator; don't forget parentheses around the entire
    # ternary expression!
    + ((6 * B) if B > 0 else (60 * B))
    + 7 * C
    - 8 * D
    for A_i_minus_2, A_i_minus_1, A, A_i_plus_1, B, C, D
    in zip(
        df["A_i_minus_2"],
        df["A_i_minus_1"],
        df["A"],
        df["A_i_plus_1"],
        df["B"],
        df["C"],
        df["D"]
    )
]

Technique 12: 12_list_comprehension_w_zip_and_row_tuple_passed_to_func

df["val"] = [
    calculate_val(
        row[0],
        row[1],
        row[2],
        row[3],
        row[4],
        row[5],
        row[6],
    ) for row
    in zip(
        df["A_i_minus_2"],
        df["A_i_minus_1"],
        df["A"],
        df["A_i_plus_1"],
        df["B"],
        df["C"],
        df["D"]
    )
]

Technique 13: 13_list_comprehension_w__to_numpy__and_direct_variable_assignment_passed_to_func

df["val"] = [
    # Note: you *could* do the calculations directly here instead of using a
    # function call, so long as you don't have indented code blocks such as
    # sub-routines or multi-line if statements.
    #
    # I'm using a function call.
    calculate_val(
        A_i_minus_2,
        A_i_minus_1,
        A,
        A_i_plus_1,
        B,
        C,
        D
    ) for A_i_minus_2, A_i_minus_1, A, A_i_plus_1, B, C, D
        # Note: this `[[...]]` double-bracket indexing is used to select a
        # subset of columns from the dataframe. The inner `[]` brackets
        # create a list from the column names within them, and the outer
        # `[]` brackets accept this list to index into the dataframe and
        # select just this list of columns, in that order.
        # - See the official documentation on it here:
        #   https://pandas.pydata.org/docs/user_guide/indexing.html#basics
        #   - Search for the phrase "You can pass a list of columns to [] to
        #     select columns in that order."
        #   - I learned this from this comment here:
        #     https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas/55557758#comment136020567_55557758
        # - One of the **list comprehension** examples in this answer here
        #   uses `.to_numpy()` like this:
        #   https://stackoverflow.com/a/55557758/4561887
    in df[[
        "A_i_minus_2",
        "A_i_minus_1",
        "A",
        "A_i_plus_1",
        "B",
        "C",
        "D"
    ]].to_numpy()  # NB: `.values` works here too, but is deprecated. See:
                   # https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.values.html
]

deleted 787 characters in body

Source Link

edited Dec 12, 2025 at 12:46

Gabriel Staples

56.9k
35
302
399

Loading

Post Undeleted by blackgreen♦

occurred Dec 12, 2025 at 9:16

Post Deleted by Dalija Prasnikar♦

occurred Dec 6, 2025 at 10:04

added 127 characters in body

Source Link

edited Feb 9, 2025 at 15:20

Gabriel Staples

56.9k
35
302
399

Loading

added 143 characters in body

Source Link

edited Jan 18, 2025 at 19:47

Gabriel Staples

56.9k
35
302
399

Loading

Fixed the weird syntax highlighting (as a result, the diff looks more extensive than it really is - use view "Side-by-side Markdown" to compare).

Source Link

edited Mar 13, 2024 at 7:23

Peter Mortensen

31.2k
22
111
134

Loading

added 173 characters in body

Source Link

edited Dec 8, 2023 at 15:13

Gabriel Staples

56.9k
35
302
399

Loading

added 125 characters in body

Source Link

edited Nov 6, 2023 at 5:42

Gabriel Staples

56.9k
35
302
399

Loading

added 54 characters in body

Source Link

edited Nov 4, 2023 at 16:24

Gabriel Staples

56.9k
35
302
399

Loading

added 333 characters in body

Source Link

edited Nov 2, 2023 at 0:23

Gabriel Staples

56.9k
35
302
399

Loading

added 213 characters in body

Source Link

edited Nov 1, 2023 at 22:19

Gabriel Staples

56.9k
35
302
399

Loading

deleted 2249 characters in body

Source Link

edited Oct 12, 2023 at 19:19

Gabriel Staples

56.9k
35
302
399

Loading

added 73 characters in body

Source Link

edited Oct 12, 2023 at 0:39

Gabriel Staples

56.9k
35
302
399

Loading

added 914 characters in body

Source Link

edited Oct 11, 2023 at 20:50

Gabriel Staples

56.9k
35
302
399

Loading

deleted 5 characters in body

Source Link

edited Oct 11, 2023 at 6:48

Gabriel Staples

56.9k
35
302
399

Loading

deleted 1124 characters in body

Source Link

edited Oct 11, 2023 at 6:10

Gabriel Staples

56.9k
35
302
399

Loading

added 275 characters in body

Source Link

edited Oct 11, 2023 at 5:58

Gabriel Staples

56.9k
35
302
399

Loading

deleted 15 characters in body

Source Link

edited Oct 11, 2023 at 5:51

Gabriel Staples

56.9k
35
302
399

Loading

added 111 characters in body

Source Link

edited Oct 11, 2023 at 5:13

Gabriel Staples

56.9k
35
302
399

Loading

added 541 characters in body

Source Link

edited Oct 11, 2023 at 5:03

Gabriel Staples

56.9k
35
302
399

Loading

Source Link

answered Oct 11, 2023 at 4:54

Gabriel Staples

56.9k
35
302
399

Loading

Collectives™ on Stack Overflow

Return to Answer

Post Timeline

Alternatives

Built by developers for developers to achieve up to 50x performance

Why use Polars

Alternatives

Built by developers for developers to achieve up to 50x performance

Why use Polars