Skip to main content
Active reading [<https://en.wikipedia.org/wiki/NumPy>]. Removed historical information (that is what the revision history is for)—the answer should be as if it was written right now; see e.g. <https://meta.stackexchange.com/a/131011> (near "Changelogs").
Source Link
Peter Mortensen
  • 31.2k
  • 22
  • 111
  • 134

Update: cs95 has updated hiscs95's answer to includeincludes plain numpyNumPy vectorization. You can simply refer to his answer.


cs95 shows that Pandas vectorization far outperforms other Pandas methods for computing stuff with dataframes.

I wanted to add that if you first convert the dataframe to a NumPy array and then use vectorization, it's even faster than Pandas dataframe vectorization, (and that includes the time to turn it back into a dataframe series).

If you add the following functions to cs95's benchmark code, this becomes pretty evident:

def np_vectorization(df):
    np_arr = df.to_numpy()
    return pd.Series(np_arr[:,0] + np_arr[:,1], index=df.index)

def just_np_vectorization(df):
    np_arr = df.to_numpy()
    return np_arr[:,0] + np_arr[:,1]

Enter image description here

Update: cs95 has updated his answer to include plain numpy vectorization. You can simply refer to his answer.


cs95 shows that Pandas vectorization far outperforms other Pandas methods for computing stuff with dataframes.

I wanted to add that if you first convert the dataframe to a NumPy array and then use vectorization, it's even faster than Pandas dataframe vectorization, (and that includes the time to turn it back into a dataframe series).

If you add the following functions to cs95's benchmark code, this becomes pretty evident:

def np_vectorization(df):
    np_arr = df.to_numpy()
    return pd.Series(np_arr[:,0] + np_arr[:,1], index=df.index)

def just_np_vectorization(df):
    np_arr = df.to_numpy()
    return np_arr[:,0] + np_arr[:,1]

Enter image description here

cs95's answer includes plain NumPy vectorization. You can simply refer to his answer.


cs95 shows that Pandas vectorization far outperforms other Pandas methods for computing stuff with dataframes.

I wanted to add that if you first convert the dataframe to a NumPy array and then use vectorization, it's even faster than Pandas dataframe vectorization, (and that includes the time to turn it back into a dataframe series).

If you add the following functions to cs95's benchmark code, this becomes pretty evident:

def np_vectorization(df):
    np_arr = df.to_numpy()
    return pd.Series(np_arr[:,0] + np_arr[:,1], index=df.index)

def just_np_vectorization(df):
    np_arr = df.to_numpy()
    return np_arr[:,0] + np_arr[:,1]

Enter image description here

Updated with note regarding changes to cs95's answer
Source Link
bug_spray
  • 1.5k
  • 2
  • 12
  • 24

Update: cs95 has updated his answer to include plain numpy vectorization. You can simply refer to his answer.


cs95 shows that Pandas vectorization far outperforms other Pandas methods for computing stuff with dataframes.

I wanted to add that if you first convert the dataframe to a NumPy array and then use vectorization, it's even faster than Pandas dataframe vectorization, (and that includes the time to turn it back into a dataframe series).

If you add the following functions to cs95's benchmark code, this becomes pretty evident:

def np_vectorization(df):
    np_arr = df.to_numpy()
    return pd.Series(np_arr[:,0] + np_arr[:,1], index=df.index)

def just_np_vectorization(df):
    np_arr = df.to_numpy()
    return np_arr[:,0] + np_arr[:,1]

Enter image description here

cs95 shows that Pandas vectorization far outperforms other Pandas methods for computing stuff with dataframes.

I wanted to add that if you first convert the dataframe to a NumPy array and then use vectorization, it's even faster than Pandas dataframe vectorization, (and that includes the time to turn it back into a dataframe series).

If you add the following functions to cs95's benchmark code, this becomes pretty evident:

def np_vectorization(df):
    np_arr = df.to_numpy()
    return pd.Series(np_arr[:,0] + np_arr[:,1], index=df.index)

def just_np_vectorization(df):
    np_arr = df.to_numpy()
    return np_arr[:,0] + np_arr[:,1]

Enter image description here

Update: cs95 has updated his answer to include plain numpy vectorization. You can simply refer to his answer.


cs95 shows that Pandas vectorization far outperforms other Pandas methods for computing stuff with dataframes.

I wanted to add that if you first convert the dataframe to a NumPy array and then use vectorization, it's even faster than Pandas dataframe vectorization, (and that includes the time to turn it back into a dataframe series).

If you add the following functions to cs95's benchmark code, this becomes pretty evident:

def np_vectorization(df):
    np_arr = df.to_numpy()
    return pd.Series(np_arr[:,0] + np_arr[:,1], index=df.index)

def just_np_vectorization(df):
    np_arr = df.to_numpy()
    return np_arr[:,0] + np_arr[:,1]

Enter image description here

Fixed edit blunder: cs95 did actually answer.
Source Link
Peter Mortensen
  • 31.2k
  • 22
  • 111
  • 134

Wes McKinney showscs95 shows that Pandas vectorization far outperforms other Pandas methods for computing stuff with dataframes.

I wanted to add that if you first convert the dataframe to a NumPy array and then use vectorization, it's even faster than Pandas dataframe vectorization, (and that includes the time to turn it back into a dataframe series).

If you add the following functions to @cs95'scs95's benchmark code, this becomes pretty evident:

def np_vectorization(df):
    np_arr = df.to_numpy()
    return pd.Series(np_arr[:,0] + np_arr[:,1], index=df.index)

def just_np_vectorization(df):
    np_arr = df.to_numpy()
    return np_arr[:,0] + np_arr[:,1]

Enter image description here

Wes McKinney shows that Pandas vectorization far outperforms other Pandas methods for computing stuff with dataframes.

I wanted to add that if you first convert the dataframe to a NumPy array and then use vectorization, it's even faster than Pandas dataframe vectorization, (and that includes the time to turn it back into a dataframe series).

If you add the following functions to @cs95's benchmark code, this becomes pretty evident:

def np_vectorization(df):
    np_arr = df.to_numpy()
    return pd.Series(np_arr[:,0] + np_arr[:,1], index=df.index)

def just_np_vectorization(df):
    np_arr = df.to_numpy()
    return np_arr[:,0] + np_arr[:,1]

Enter image description here

cs95 shows that Pandas vectorization far outperforms other Pandas methods for computing stuff with dataframes.

I wanted to add that if you first convert the dataframe to a NumPy array and then use vectorization, it's even faster than Pandas dataframe vectorization, (and that includes the time to turn it back into a dataframe series).

If you add the following functions to cs95's benchmark code, this becomes pretty evident:

def np_vectorization(df):
    np_arr = df.to_numpy()
    return pd.Series(np_arr[:,0] + np_arr[:,1], index=df.index)

def just_np_vectorization(df):
    np_arr = df.to_numpy()
    return np_arr[:,0] + np_arr[:,1]

Enter image description here

Active reading. Removed meta information (this belongs in comments). User "cs95" did not answer, only edited (two answers of which one was the accepted one)
Source Link
Peter Mortensen
  • 31.2k
  • 22
  • 111
  • 134
Loading
Source Link
bug_spray
  • 1.5k
  • 2
  • 12
  • 24
Loading