
The question

Given a Series s and DataFrame df, how do I operate on each column of df with s?

df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=[0, 1],
    columns=['a', 'b', 'c']
)

s = pd.Series([3, 14], index=[0, 1])

When I attempt to add them, I get all np.nan

df + s

    a   b   c   0   1
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN

What I thought I should get is

    a   b   c
0   4   5   6
1  18  19  20

Objective and motivation

I've seen this kind of question several times over and have seen many other questions that involve some element of this. Most recently, I had to spend a bit of time explaining this concept in comments while looking for an appropriate canonical Q&A. I did not find one and so I thought I'd write one.

These questions usually arise with respect to a specific operation, but they apply equally to most arithmetic operations.

  • How do I subtract a Series from every column in a DataFrame?
  • How do I add a Series to every column in a DataFrame?
  • How do I multiply every column in a DataFrame by a Series?
  • How do I divide every column in a DataFrame by a Series?
4 Answers

It is helpful to create a mental model of what Series and DataFrame objects are.

Anatomy of a Series

A Series should be thought of as an enhanced dictionary. This isn't always a perfect analogy, but we'll start here. Also, there are other analogies that you can make, but I am targeting a dictionary in order to demonstrate the purpose of this post.

index

These are the keys that we can reference to get at the corresponding values. When the elements of the index are unique, the comparison to a dictionary becomes very close.

values

These are the corresponding values that are keyed by the index.
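To make the dictionary analogy concrete, here is a minimal sketch. The names prices and s are made up for illustration and are not part of the sample data used later.

```python
import pandas as pd

# A Series built from a plain dict: the dict keys become the index.
prices = {"a": 10, "b": 11, "c": 12}
s = pd.Series(prices)

# Label-based lookup behaves like a dictionary lookup.
assert s["b"] == prices["b"]

# The two parts described above are exposed as attributes.
index_labels = list(s.index)    # the keys
raw_values = s.values.tolist()  # the values, in the same order
print(index_labels, raw_values)
```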

Anatomy of a DataFrame

A DataFrame should be thought of as a dictionary of Series or a Series of Series. In this case the keys are the column names and the values are the columns themselves as Series objects. Each Series agrees to share the same index which is the index of the DataFrame.

columns

These are the keys that we can reference to get at the corresponding Series.

index

This is the index that all of the Series values agree to share.

Note: RE: columns and index objects

They are the same kind of thing. One DataFrame's index can be used as another DataFrame's columns. In fact, this happens when you do df.T to get a transpose.

values

This is a two-dimensional array that contains the data in a DataFrame. The reality is that values is not what is stored inside the DataFrame object. (Well, sometimes it is, but I'm not about to try to describe the block manager). The point is, it is better to think of this as access to a two-dimensional array of the data.
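A small sketch of the same ideas in code, using a throwaway frame (the name df here is only for illustration). It checks that a column lookup returns a Series, that all columns share the frame's index, and that a transpose swaps index and columns.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 4], "b": [2, 5], "c": [3, 6]}, index=[0, 1])

# Looking up a column key returns a Series...
col = df["a"]
assert isinstance(col, pd.Series)

# ...and every column shares the DataFrame's index.
assert all(df[name].index.equals(df.index) for name in df.columns)

# Transposing swaps the two Index objects.
transposed = df.T
assert transposed.columns.equals(df.index)
assert transposed.index.equals(df.columns)

# .values gives access to the data as a two-dimensional array.
shape = df.values.shape
print(shape)
```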


Define Sample Data

These are sample pandas.Index objects that can be used as the index of a Series or DataFrame or can be used as the columns of a DataFrame:

idx_lower = pd.Index([*'abcde'], name='lower')
idx_range = pd.RangeIndex(5, name='range')

These are sample pandas.Series objects that use the pandas.Index objects above:

s0 = pd.Series(range(10, 15), idx_lower)
s1 = pd.Series(range(30, 40, 2), idx_lower)
s2 = pd.Series(range(50, 10, -8), idx_range)

These are sample pandas.DataFrame objects that use the pandas.Index objects above:

df0 = pd.DataFrame(100, index=idx_range, columns=idx_lower)
df1 = pd.DataFrame(
    np.arange(np.prod(df0.shape)).reshape(df0.shape),
    index=idx_range, columns=idx_lower
)

Series on Series

When operating on two Series, the alignment is obvious. You align the index of one Series with the index of the other.

s1 + s0

lower
a    40
b    43
c    46
d    49
e    52
dtype: int64

The result is the same when I randomly shuffle one of them before I operate. The indices still align.

s1 + s0.sample(frac=1)

lower
a    40
b    43
c    46
d    49
e    52
dtype: int64

This is not the case when I operate with the values of the shuffled Series instead. Pandas then has no index to align with and therefore operates by position.

s1 + s0.sample(frac=1).values

lower
a    42
b    42
c    47
d    50
e    49
dtype: int64
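Because sample(frac=1) is random, the positional result above changes from run to run. A reproducible sketch of the same contrast, assuming we reverse s0 instead of shuffling it (s0_rev is a name introduced only for this example):

```python
import pandas as pd

idx = pd.Index(list("abcde"), name="lower")
s0 = pd.Series(range(10, 15), idx)
s1 = pd.Series(range(30, 40, 2), idx)

# Reverse s0 instead of shuffling it, so the outcome is deterministic.
s0_rev = s0.iloc[::-1]

aligned = s1 + s0_rev            # matched by label
positional = s1 + s0_rev.values  # matched by position, labels ignored

# Label alignment undoes the reordering; the positional sum does not.
assert aligned.equals(s1 + s0)
print(positional.tolist())
```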

Add a scalar

s1 + 1

lower
a    31
b    33
c    35
d    37
e    39
dtype: int64

DataFrame on DataFrame

The same is true when operating between two DataFrames. The alignment is obvious and does what we think it should do:

df0 + df1

lower    a    b    c    d    e
range
0      100  101  102  103  104
1      105  106  107  108  109
2      110  111  112  113  114
3      115  116  117  118  119
4      120  121  122  123  124

Shuffle the second DataFrame on both axes. The index and columns will still align and give us the same thing.

df0 + df1.sample(frac=1).sample(frac=1, axis=1)

lower    a    b    c    d    e
range
0      100  101  102  103  104
1      105  106  107  108  109
2      110  111  112  113  114
3      115  116  117  118  119
4      120  121  122  123  124

Do the same shuffling, but add the underlying array instead of the DataFrame. It is no longer aligned, so we get different results.

df0 + df1.sample(frac=1).sample(frac=1, axis=1).values

lower    a    b    c    d    e
range
0      123  124  121  122  120
1      118  119  116  117  115
2      108  109  106  107  105
3      103  104  101  102  100
4      113  114  111  112  110

Add a one-dimensional array. It will align with columns and broadcast across rows.

df0 + [*range(2, df0.shape[1] + 2)]

lower    a    b    c    d    e
range
0      102  103  104  105  106
1      102  103  104  105  106
2      102  103  104  105  106
3      102  103  104  105  106
4      102  103  104  105  106
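As a sketch of what "align with columns" means here: a one-dimensional array has no labels, so it is matched to the columns by position. Attaching the column labels to the same values should therefore give an identical result. The names row_like, by_position and by_label are made up for this example.

```python
import numpy as np
import pandas as pd

idx_lower = pd.Index(list("abcde"), name="lower")
idx_range = pd.RangeIndex(5, name="range")
df0 = pd.DataFrame(100, index=idx_range, columns=idx_lower)

row_like = np.arange(2, 7)  # one value per column: 2, 3, 4, 5, 6

# No labels available: values are matched to the columns by position.
by_position = df0 + row_like

# The same values with the column labels attached: matched by label.
by_label = df0 + pd.Series(row_like, index=idx_lower)

assert by_position.equals(by_label)
```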

Add a scalar. There isn't anything to align with, so it broadcasts to every cell:

df0 + 1

lower    a    b    c    d    e
range
0      101  101  101  101  101
1      101  101  101  101  101
2      101  101  101  101  101
3      101  101  101  101  101
4      101  101  101  101  101

DataFrame on Series

If DataFrames are to be thought of as dictionaries of Series and Series are to be thought of as dictionaries of values, then it is natural that when operating between a DataFrame and Series that they should be aligned by their "keys".

s0:
lower    a    b    c    d    e
        10   11   12   13   14

df0:
lower    a    b    c    d    e
range
0      100  100  100  100  100
1      100  100  100  100  100
2      100  100  100  100  100
3      100  100  100  100  100
4      100  100  100  100  100

And when we operate, the 10 in s0['a'] gets added to the entire column of df0['a']:

df0 + s0

lower    a    b    c    d    e
range
0      110  111  112  113  114
1      110  111  112  113  114
2      110  111  112  113  114
3      110  111  112  113  114
4      110  111  112  113  114

The heart of the issue and point of the post

What if I want to operate with s2 and df0?

s2:               df0:

             |    lower    a    b    c    d    e
range        |    range
0      50    |    0      100  100  100  100  100
1      42    |    1      100  100  100  100  100
2      34    |    2      100  100  100  100  100
3      26    |    3      100  100  100  100  100
4      18    |    4      100  100  100  100  100

When I operate, I get all np.nan, as cited in the question:

df0 + s2

        a   b   c   d   e   0   1   2   3   4
range
0     NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1     NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2     NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3     NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4     NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

This does not produce what we wanted, because Pandas is aligning the index of s2 with the columns of df0. The columns of the result are the union of the index of s2 and the columns of df0.
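A quick way to confirm the union for yourself is to inspect the shape and columns of the result. This is only a sketch that rebuilds the sample objects so it can run on its own:

```python
import pandas as pd

idx_lower = pd.Index(list("abcde"), name="lower")
idx_range = pd.RangeIndex(5, name="range")
df0 = pd.DataFrame(100, index=idx_range, columns=idx_lower)
s2 = pd.Series(range(50, 10, -8), idx_range)

result = df0 + s2

# Ten columns: the five letters plus the five integer labels from s2.
assert result.shape == (5, 10)
assert set(result.columns) == set(idx_lower) | set(idx_range)

# No label exists on both sides, so every cell is missing.
all_missing = bool(result.isna().all().all())
print(all_missing)
```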

We could fake it out with a tricky transposition:

(df0.T + s2).T

lower    a    b    c    d    e
range
0      150  150  150  150  150
1      142  142  142  142  142
2      134  134  134  134  134
3      126  126  126  126  126
4      118  118  118  118  118

But it turns out Pandas has a better solution. There are operation methods that allow us to pass an axis argument to specify the axis to align with.

Operator    Method
-           sub
+           add
*           mul
/           div
**          pow

And so the answer is simply:

df0.add(s2, axis='index')

lower    a    b    c    d    e
range
0      150  150  150  150  150
1      142  142  142  142  142
2      134  134  134  134  134
3      126  126  126  126  126
4      118  118  118  118  118

It turns out axis='index' is synonymous with axis=0, just as axis='columns' is synonymous with axis=1:

df0.add(s2, axis=0)

lower    a    b    c    d    e
range
0      150  150  150  150  150
1      142  142  142  142  142
2      134  134  134  134  134
3      126  126  126  126  126
4      118  118  118  118  118
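If you want to convince yourself of these equivalences, a short check along the following lines should do. It is a sketch that rebuilds the sample objects; the variable names on the left are introduced only for this example.

```python
import pandas as pd

idx_lower = pd.Index(list("abcde"), name="lower")
idx_range = pd.RangeIndex(5, name="range")
df0 = pd.DataFrame(100, index=idx_range, columns=idx_lower)
s0 = pd.Series(range(10, 15), idx_lower)
s2 = pd.Series(range(50, 10, -8), idx_range)

by_name = df0.add(s2, axis="index")
by_number = df0.add(s2, axis=0)
via_transpose = (df0.T + s2).T

# The string and integer spellings of the axis agree.
assert by_name.equals(by_number)

# The transposition trick gives the same numbers.
assert by_name.values.tolist() == via_transpose.values.tolist()

# With a Series, axis='columns' matches what the + operator does.
assert df0.add(s0, axis="columns").equals(df0 + s0)
```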

The rest of the operations

df0.sub(s2, axis=0)

lower   a   b   c   d   e
range
0      50  50  50  50  50
1      58  58  58  58  58
2      66  66  66  66  66
3      74  74  74  74  74
4      82  82  82  82  82

df0.mul(s2, axis=0)

lower     a     b     c     d     e
range
0      5000  5000  5000  5000  5000
1      4200  4200  4200  4200  4200
2      3400  3400  3400  3400  3400
3      2600  2600  2600  2600  2600
4      1800  1800  1800  1800  1800

df0.div(s2, axis=0)

lower         a         b         c         d         e
range
0      2.000000  2.000000  2.000000  2.000000  2.000000
1      2.380952  2.380952  2.380952  2.380952  2.380952
2      2.941176  2.941176  2.941176  2.941176  2.941176
3      3.846154  3.846154  3.846154  3.846154  3.846154
4      5.555556  5.555556  5.555556  5.555556  5.555556

df0.pow(1 / s2, axis=0)

lower         a         b         c         d         e
range
0      1.096478  1.096478  1.096478  1.096478  1.096478
1      1.115884  1.115884  1.115884  1.115884  1.115884
2      1.145048  1.145048  1.145048  1.145048  1.145048
3      1.193777  1.193777  1.193777  1.193777  1.193777
4      1.291550  1.291550  1.291550  1.291550  1.291550

It's important to address some higher level concepts first. Since my motivation is to share knowledge and teach, I wanted to make this as clear as possible.

2 Comments

Another good resource for me to use when marking future questions as duplicates. :-)
One more approach is via broadcasting: df[df.columns] = df.values + s.values[:, None]

I prefer the method mentioned by piSquared (i.e., df.add(s, axis=0)), but another method uses apply together with lambda to perform an action on each column in the dataframe:

>>> df.apply(lambda col: col + s)
    a   b   c
0   4   5   6
1  18  19  20

To apply the lambda function to the rows, use axis=1:

>>> df.T.apply(lambda row: row + s, axis=1)
   0   1
a  4  18
b  5  19
c  6  20

This method could be useful when the transformation is more complex, e.g.:

df.apply(lambda col: 0.5 * col ** 2 + 2 * s - 3)
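For comparison, here is a sketch of that same formula written once with apply and once with the axis-aware methods, using the df and s from the question. The vectorized spelling is my own rearrangement of the formula, not something from the answer above, so treat it as one possible equivalent.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=[0, 1], columns=["a", "b", "c"])
s = pd.Series([3, 14], index=[0, 1])

# Column-by-column: each col is a Series that shares s's index.
via_apply = df.apply(lambda col: 0.5 * col ** 2 + 2 * s - 3)

# Whole-frame version: build the frame term, then align s on the index.
vectorized = (0.5 * df ** 2).add(2 * s - 3, axis=0)

pd.testing.assert_frame_equal(via_apply, vectorized)
print(via_apply)
```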

1 Comment

If I'm not mistaken, you can simply add .T to the end of the first snippet rather than using axis=1.

Just to add an extra layer from my own experience. It extends what others have done here. This shows how to operate on a Series with a DataFrame that has extra columns that you want to keep the values for. Below is a short demonstration of the process.

import pandas as pd

d = [1.056323, 0.126681, 
     0.142588, 0.254143,
     0.15561, 0.139571,
     0.102893, 0.052411]
     
df = pd.Series(d, index = ['const', '426', '428', '424', '425', '423', '427', '636'])

print(df)
const    1.056323
426      0.126681
428      0.142588
424      0.254143
425      0.155610
423      0.139571
427      0.102893
636      0.052411

d2 = {
'loc': ['D', 'D', 'E', 'E', 'F', 'F', 'G', 'G', 'E', 'D'],
'426': [9, 2, 3, 2, 4, 0, 2, 7, 2, 8],
'428': [2, 4, 1, 0, 2, 1, 3, 0, 7, 8],
'424': [1, 10, 5, 8, 2, 7, 10, 0, 3, 5],
'425': [9, 2, 6, 8, 9, 1, 7, 3, 8, 6],
'423': [2, 7, 3, 10, 8, 1, 2, 9, 3, 9],
'427': [4, 10, 4, 0, 8, 3, 1, 5, 7, 7],
'636': [10, 5, 6, 4, 0, 5, 1, 1, 4, 8],
'seq': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
}

df2 = pd.DataFrame(d2)

print(df2)
  loc  426  428  424  425  423  427  636  seq
0   D    9    2    1    9    2    4   10    1
1   D    2    4   10    2    7   10    5    1
2   E    3    1    5    6    3    4    6    1
3   E    2    0    8    8   10    0    4    1
4   F    4    2    2    9    8    8    0    1
5   F    0    1    7    1    1    3    5    1
6   G    2    3   10    7    2    1    1    1
7   G    7    0    0    3    9    5    1    1
8   E    2    7    3    8    3    7    4    1
9   D    8    8    5    6    9    7    8    1

To multiply a DataFrame by a Series and keep dissimilar columns

  1. Create a list of the elements in the DataFrame and Series you want to operate on:
col = ['426', '428', '424', '425', '423', '427', '636']
  2. Perform your operation using the list and indicate the axis to use:
df2[col] = df2[col].mul(df[col], axis=1)

print(df2)
  loc       426       428       424      425       423       427       636  seq
0   D  1.140129  0.285176  0.254143  1.40049  0.279142  0.411572  0.524110    1
1   D  0.253362  0.570352  2.541430  0.31122  0.976997  1.028930  0.262055    1
2   E  0.380043  0.142588  1.270715  0.93366  0.418713  0.411572  0.314466    1
3   E  0.253362  0.000000  2.033144  1.24488  1.395710  0.000000  0.209644    1
4   F  0.506724  0.285176  0.508286  1.40049  1.116568  0.823144  0.000000    1
5   F  0.000000  0.142588  1.779001  0.15561  0.139571  0.308679  0.262055    1
6   G  0.253362  0.427764  2.541430  1.08927  0.279142  0.102893  0.052411    1
7   G  0.886767  0.000000  0.000000  0.46683  1.256139  0.514465  0.052411    1
8   E  0.253362  0.998116  0.762429  1.24488  0.418713  0.720251  0.209644    1
9   D  1.013448  1.140704  1.270715  0.93366  1.256139  0.720251  0.419288    1


Cause and solution

When a series is added to a frame, the series is treated as a row and its labels are treated as column labels. The values of the frame are then added to the values of the series that have matching column labels. This is called alignment on column labels. In your case there are no matching column labels, so every sum is NaN.

You must force the alignment to be done on index labels. The + operator calls DataFrame.add() which has a parameter to select the alignment axis. By default the axis is 'columns' which corresponds to the behavior described. You have to use the function instead of the operator to change the parameter:

df.add(s, axis='index')

Result:

    a   b   c
0   4   5   6
1  18  19  20

The same applies to the other operators: each has a corresponding function with an axis parameter. Details follow.


Your code

df = pd.DataFrame([[1, 2, 3], 
                   [4, 5, 6]], 
                   index=[0, 1],
                   columns=['a', 'b', 'c'])

s = pd.Series([3, 14],
              index=[0, 1])

It creates these objects:

   a  b  c
0  1  2  3
1  4  5  6

0     3
1    14
dtype: int64

How your request was processed

When you add the series to the frame, Pandas does this:

  • It sees the series as a single row with column labels 0 and 1.

  • It aligns both objects on columns, creating a resulting frame whose column labels are the union of the column labels in the frame and in the series: [a,b,c,0,1] (and whose row labels are the frame's row labels).

  • It "broadcasts" the series to the size of the frame, that is, it duplicates the single row to match the number of rows in the frame.

  • It fills the resulting frame according to index and columns labels (which are the union).

    • For each frame row it adds the corresponding values of the frame row and the broadcast series. In your case there are no common labels, so the result is always NaN.

The filled resulting frame:

    a   b   c   0   1
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN

As you can see, the problem is the assumption that the series labels will be aligned with the frame's index labels. Alignment is done on the column labels instead.
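The steps listed above can be imitated by hand. To be clear, this is not how Pandas implements the operation internally; it is only a sketch that reproduces the described steps with reindex and np.tile, and the names all_columns, frame_part, series_row and series_part are made up for the illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=[0, 1], columns=["a", "b", "c"])
s = pd.Series([3, 14], index=[0, 1])

# Union of the frame's column labels and the series' labels.
all_columns = df.columns.union(s.index, sort=False)

# Stretch both operands onto that union; labels an operand lacks become NaN.
frame_part = df.reindex(columns=all_columns)
series_row = s.reindex(all_columns)

# Repeat the single series row once per frame row.
series_part = pd.DataFrame(
    np.tile(series_row.to_numpy(), (len(df), 1)),
    index=df.index,
    columns=all_columns,
)

manual = frame_part + series_part
builtin = df + s

# Same column labels, and NaN everywhere because no label is shared.
assert set(manual.columns) == set(builtin.columns)
assert manual.isna().all().all() and builtin.isna().all().all()
```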

Naive solution

You need to force the correct alignment. This can be done by using a frame instead of a series:

df2 = pd.DataFrame([[3, 3, 3],
                    [14, 14, 14]],
                   index=[0, 1],
                   columns=['a', 'b', 'c'])

The frame:

    a   b   c
0   3   3   3
1  14  14  14

Column and row orders can be whatever you like. Positions are not used for alignment, only the labels.

Now df + df2 is what you wanted, because labels are matched during alignment:

    a   b   c
0   4   5   6
1  18  19  20

Of course, creating a frame with redundant columns is not optimal. Let's go to the regular solution, which uses the series instead.

Regular solution

Operator + internally calls DataFrame.add with default parameters. The parameter axis defaults to 'columns', but can be changed to 'index'. This parameter determines which axis is used for alignment:

Whether to compare by the index (0 or ‘index’) or columns. (1 or ‘columns’). For Series input, axis to match Series index on.

Just call the function directly:

df.add(s, axis='index')

The result is:

    a   b   c
0   4   5   6
1  18  19  20

Other operators

From the documentation of .add():

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

This tells you that you can use the other functions in the same way for a similar result.
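As a sketch, here are two of the other wrappers named in that list, floordiv and mod, applied to the df and s from the question with the same axis='index' argument:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=[0, 1], columns=["a", "b", "c"])
s = pd.Series([3, 14], index=[0, 1])

# Row 0 is combined with 3, row 1 with 14, in every column.
quotient = df.floordiv(s, axis="index")
remainder = df.mod(s, axis="index")

print(quotient)
print(remainder)
```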
