How to iterate over columns of a pandas dataframe

Question

I have this code using Pandas in Python:

all_data = {}
for ticker in ['FIUIX', 'FSAIX', 'FSAVX', 'FSTMX']:
    all_data[ticker] = web.get_data_yahoo(ticker, '1/1/2010', '1/1/2015')

prices = DataFrame({tic: data['Adj Close'] for tic, data in all_data.items()})
returns = prices.pct_change()

I know I can run a regression like this:

regs = sm.OLS(returns.FIUIX,returns.FSTMX).fit()

but how can I do this for each column in the dataframe? Specifically, how can I iterate over columns, in order to run the regression on each?

In particular, I want to regress each of the other ticker symbols (FIUIX, FSAIX and FSAVX) on FSTMX and store the residuals from each regression.

I've tried various versions of the following, but nothing I've tried gives the desired result:

resids = {}
for k in returns.keys():
    reg = sm.OLS(returns[k],returns.FSTMX).fit()
    resids[k] = reg.resid

Is there something wrong with the returns[k] part of the code? How can I use the k value to access a column? Or else is there a simpler approach?

The Unfun Cat · Accepted Answer · 2023-10-25 12:46:17Z

585

Old answer:

for column in df:
    print(df[column])

The loop above still works, but it was written around the time of pandas 0.16.0. Better options are now available.

Now you can do:

for series_name, series in df.items():
    print(series_name)
    print(series)
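
Applied to the question, the items() loop can collect one residual series per ticker. The following is only a sketch under stated assumptions: the data is synthetic, and a plain NumPy one-regressor least-squares fit without an intercept stands in for sm.OLS(...).fit(), so the block runs without statsmodels or a data download.

```python
import numpy as np
import pandas as pd

# Synthetic returns: the column names mirror the question, the numbers are made up.
rng = np.random.default_rng(0)
returns = pd.DataFrame(
    rng.normal(size=(50, 4)),
    columns=["FIUIX", "FSAIX", "FSAVX", "FSTMX"],
)

benchmark = returns["FSTMX"].to_numpy()
resids = {}
for name, series in returns.items():
    if name == "FSTMX":
        continue  # do not regress the benchmark on itself
    # Least squares with a single regressor and no intercept:
    # slope = (x . y) / (x . x). This is a stand-in for sm.OLS, not its API.
    slope = benchmark @ series.to_numpy() / (benchmark @ benchmark)
    resids[name] = series - slope * returns["FSTMX"]

# One column of residuals per regressed ticker.
resids = pd.DataFrame(resids)
```

With statsmodels installed, the slope and subtraction lines would presumably be replaced by the sm.OLS(series, returns["FSTMX"]).fit().resid pattern from the question.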

edited Oct 25, 2023 at 12:46

answered Sep 14, 2015 at 6:42

The Unfun Cat

2

I seem to only get back the column header when I use this method. So for example: print(df) shows me the data in the dataframe columns but for c in df: print(c) only prints the header not the data.
– Reddspark
Commented Mar 20, 2017 at 9:30
8

Ok ignore me -- I was doing print(column) not print (df[column])
– Reddspark
Commented Mar 20, 2017 at 11:26
32

Watch out for columns with the same name!
– freethebees
Commented Aug 29, 2017 at 13:53
8

It's nice and concise. I'd expect for x in df to iterate over rows, though. :-/
– Eric Duminil
Commented Jan 29, 2018 at 14:47
13

for idx, row in df.iterrows() iterates over rows. Since column-based operations are vectorized, it is natural that the main iteration is over columns :)
– The Unfun Cat
Commented Jan 30, 2018 at 10:52

Zim · Accepted Answer · 2025-01-08 15:58:36Z

128

You can use items():

for name, values in df.items():
    print('{name}: {value}'.format(name=name, value=values.iloc[0]))

For pandas < 2.0, you can also use iteritems(), which was deprecated in pandas 1.5 and removed in 2.0:

for name, values in df.iteritems():
    print('{name}: {value}'.format(name=name, value=values.iloc[0]))

edited Jan 8 at 15:58

Zim

answered Apr 2, 2016 at 11:31

mdh

5

Great answer. By the way, df.iteritems() can also be written as df.items(), giving the same result.
– Dr_Zaszuś
Commented Feb 1, 2022 at 18:20
8

In fact, pandas >= 2.0 only has .items() but no .iteritems().
– Gregor Sturm
Commented Aug 22, 2023 at 11:24

Abhinav Gupta · Accepted Answer · 2018-09-13 22:18:57Z

This answer shows how to iterate over selected columns, as well as over all columns, in a DF.

df.columns gives an Index containing the names of all columns in the DF. On its own that isn't very helpful if you want to iterate over all the columns, but it comes in handy when you want to iterate over only the columns of your choosing.

The Index supports the same slicing syntax as a Python list, so we can slice df.columns according to our needs. For example, to iterate over all columns but the first one, we can do:

for column in df.columns[1:]:
    print(df[column])

Similarly, to iterate over all the columns in reverse order, we can do:

for column in df.columns[::-1]:
    print(df[column])

This technique lets us iterate over the columns in many different ways. You can also get the position of every column using enumerate():

for ind, column in enumerate(df.columns):
    print(ind, column)
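
For the case in the question, where one column should be left out of the loop, selecting by name rather than by slice position may be clearer. A minimal sketch, assuming a toy DataFrame with made-up values; Index.drop returns a new Index without the given label and leaves df untouched.

```python
import pandas as pd

# Toy data: the column names follow the question, the values are invented.
df = pd.DataFrame({
    "FIUIX": [0.1, 0.2],
    "FSAIX": [0.3, 0.4],
    "FSTMX": [0.5, 0.6],
})

# Every column label except the benchmark column.
others = df.columns.drop("FSTMX")

# Iterate over the selected columns only.
means = {col: df[col].mean() for col in others}
```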

a1cd · Accepted Answer · 2025-01-07 04:23:45Z

23

You can index DataFrame columns by position using iloc. (The original version of this answer used ix, which was deprecated in pandas 0.20 and removed in 1.0.)

df1.iloc[:, 1]

This returns the column at position 1, i.e. the second column, since positions start at 0.

df1.iloc[0]

This returns the first row.

df1.iloc[0, 1]

This is the value at the intersection of row 0 and column 1.

And so on. So you can enumerate() returns.keys() and use the number to index the dataframe.
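
The enumerate() idea can be sketched as follows, using iloc because ix is not available in pandas 1.0 and later. The returns frame here is a made-up two-column stand-in for the one in the question.

```python
import pandas as pd

# Stand-in for the question's `returns` frame; the values are invented.
returns = pd.DataFrame({"FIUIX": [0.01, 0.02], "FSTMX": [0.03, 0.04]})

firsts = {}
for pos, name in enumerate(returns.keys()):
    column = returns.iloc[:, pos]   # select the column by position
    firsts[name] = column.iloc[0]   # first value of that column
```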

edited Jan 7 at 4:23

a1cd

answered Jan 29, 2015 at 15:51

JAB

17

ix is deprecated, use iloc
– Yohan Obadia
Commented Feb 8, 2018 at 10:47

kdauria · Accepted Answer · 2015-07-22 17:40:19Z

18

A workaround is to transpose the DataFrame and iterate over the rows.

for column_name, column in df.transpose().iterrows():
    print(column_name)
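
A caveat worth checking before transposing: when the columns have different dtypes, the transposed rows no longer carry the original column dtypes. A small sketch with an invented two-column frame, comparing the dtypes seen through items() and through transpose().iterrows():

```python
import pandas as pd

# One integer column and one string column (made-up data).
df = pd.DataFrame({"n": [1, 2], "s": ["a", "b"]})

# Dtype of each column when iterating directly.
via_items = {name: col.dtype for name, col in df.items()}

# Dtype of each "column" when iterating over the rows of the transpose.
via_transpose = {name: col.dtype for name, col in df.transpose().iterrows()}
```

Here via_items["n"] is an integer dtype, while via_transpose["n"] is object, because the transposed frame has to hold integers and strings in the same columns.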

answered Jul 22, 2015 at 17:40

kdauria

9

Transposition is rather expensive :)
– The Unfun Cat
Commented Sep 23, 2018 at 8:33
2

Might be expensive, but this is a great solution for relatively small dataframes. Thanks kdauria!
– elPastor
Commented Feb 11, 2020 at 21:12
2

I guess this suggestion is deprecated. With recent versions of pandas, better use DataFrame.items(). Also, transposition may lead to data type conversions if the DataFrame consists of different dtypes.
– normanius
Commented Apr 11, 2022 at 14:48

MEhsan · Accepted Answer · 2017-03-22 22:38:33Z

11

Using list comprehension, you can get all the columns names (header):

[column for column in df]

answered Mar 22, 2017 at 22:38

MEhsan

4

Shorter version: list(df.columns) or [c for c in df]
– The Unfun Cat
Commented Mar 26, 2018 at 8:07
The downside of this is that you only get the column names.
– Aaron C
Commented Nov 1, 2023 at 1:32
[func(df[column]) for column in df]
– Aaron C
Commented Nov 1, 2023 at 1:46
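
To make the comprehension in the last comment concrete, here is a hedged sketch. column_range is a hypothetical placeholder for func, and the data is invented. DataFrame.apply with its default axis also calls the function once per column and returns the results as a Series indexed by column name.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 5, 3], "b": [10, 40, 20]})

def column_range(series):
    """Hypothetical helper: spread between the largest and smallest value."""
    return series.max() - series.min()

# List comprehension over the column names, as in the comment.
as_list = [column_range(df[column]) for column in df]

# Same computation via apply: one call per column, results keyed by column name.
as_series = df.apply(column_range)
```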

Herpes Free Engineer · Accepted Answer · 2018-04-23 17:36:27Z

9

Based on the accepted answer, if you also want a positional index for each column:

for i, column in enumerate(df):
    print(i, df[column])

Each df[column] above is a Series, which can easily be converted into a NumPy ndarray:

for i, column in enumerate(df):
    print(i, np.asarray(df[column]))

answered Apr 23, 2018 at 17:36

Herpes Free Engineer

Gaurav · Accepted Answer · 2017-04-29 04:09:48Z

I'm a bit late but here's how I did this. The steps:

1. Create a list of all columns.
2. Use itertools to take combinations of x columns.
3. Append the R-squared value of each result to a result dataframe, along with the list of excluded columns.
4. Sort the result DF in descending order of R squared to see which is the best fit.

This is the code I used on a DataFrame called aft_tmt. Feel free to adapt it to your use case.

import itertools

import pandas as pd
import statsmodels.formula.api as smf

# setting options to print without truncating output
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

# This section gets the column names of the DF and removes some columns which I don't want to use as predictors.
itercols = aft_tmt.columns.tolist()
for unwanted in ("sc97", "sc", "grc", "grc97"):
    itercols.remove(unwanted)
print(itercols)
print(len(itercols))

# One result row per combination. DataFrame.append was removed in pandas 2.0,
# so the rows are gathered in a list and turned into a DataFrame once at the end.
rows = []

# change 9 to the number of columns you want to combine from N columns.
# Possibly run an outer loop from 0 to N/2?
for x in itertools.combinations(itercols, 9):
    lmstr = "+".join(x)
    f = smf.ols(formula="sc ~ " + lmstr, data=aft_tmt).fit()
    # columns that were left out of this particular combination
    excluded = "+".join(y for y in itercols if y not in x)
    rows.append([f.rsquared, lmstr, excluded])

# results DF
regression_res = pd.DataFrame(rows, columns=["Rsq", "predictors", "excluded"])

regression_res.sort_values(by="Rsq", ascending=False)
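
The core of the loop is the way itertools.combinations turns column names into formula strings. A minimal sketch of just that step, with hypothetical predictor names and no model fitting:

```python
import itertools

# Hypothetical predictor names; "sc" is the response used in the answer above.
predictors = ["age", "dose", "weight"]

# One formula string per 2-column combination of the predictors.
formulas = [
    "sc ~ " + "+".join(combo)
    for combo in itertools.combinations(predictors, 2)
]
```

The number of combinations grows quickly with the number of columns, so it may be worth checking len(formulas) before fitting a model for each one.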

Loc Quan · Accepted Answer · 2024-08-09 04:33:59Z

If you care about performance, I have benchmarked some ways to iterate over columns.

If you just want the column names, the fastest method is to iterate over df.columns.values -- 51% faster than df.columns, 86% faster than df and a whopping 2500% faster than df.items().

Details are below:

import numpy as np
import pandas as pd

# DataFrame with 1000 rows and 26 columns (from 'a' to 'z')
df = pd.DataFrame(
    np.random.randn(1000, 26),
    columns=list('abcdefghijklmnopqrstuvwxyz')
)

# Method 1
for col_name, col in df.items():
    ...
98.5 μs ± 1.17 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

# Method 2
for col in df:
    ...
6.9 μs ± 35.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

# Method 3
for col in df.columns:
    ...
5.6 μs ± 55.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

# Method 4 (fastest)
for col in df.columns.values:
    ...
3.7 μs ± 38.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
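
The timings above look like IPython %timeit output. To reproduce the comparison in a plain script, something along these lines should work with the standard timeit module. This is only a sketch: the loop counts are arbitrary choices, and absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.random.randn(1000, 26),
    columns=list("abcdefghijklmnopqrstuvwxyz"),
)

def loop_items():
    for _name, _col in df.items():
        pass

def loop_df():
    for _col in df:
        pass

def loop_columns():
    for _col in df.columns:
        pass

def loop_columns_values():
    for _col in df.columns.values:
        pass

candidates = {
    "df.items()": loop_items,
    "df": loop_df,
    "df.columns": loop_columns,
    "df.columns.values": loop_columns_values,
}

# Best of 3 repeats, 200 loops each (arbitrary), reported as seconds per loop.
loops = 200
timings = {
    label: min(timeit.repeat(fn, number=loops, repeat=3)) / loops
    for label, fn in candidates.items()
}

for label, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{label:>18}: {seconds * 1e6:8.2f} µs per loop")
```

Note that only df.items() builds a Series for each column; the other three loops yield just the labels, which accounts for much of the gap.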

Pierre D · Accepted Answer · 2021-02-04 13:38:41Z

1

I landed on this question as I was looking for a clean iterator of columns only (Series, no names).

Unless I am mistaken, there is no such thing, which, if true, is a bit annoying. In particular, one would sometimes like to assign a few individual columns (Series) to variables, e.g.:

x, y = df[['x', 'y']]  # does not work: x and y receive the column names, not the Series

There is df.items() that gets close, but it gives an iterator of tuples (column_name, column_series). Interestingly, there is a corresponding df.keys() which returns df.columns, i.e. the column names as an Index, so a, b = df[['x', 'y']].keys() assigns properly a='x' and b='y'. But there is no corresponding df.values(), and for good reason, as df.values is a property and returns the underlying numpy array.

One (inelegant) way is to do:

x, y = (v for _, v in df[['x', 'y']].items())

but it's less pythonic than I'd like.
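
For comparison, a small sketch with an invented frame, showing what plain unpacking yields and one more variant that indexes each column by name inside a generator expression. Whether it reads better than the items() version is a matter of taste.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": [3, 4], "z": [5, 6]})

# Plain unpacking iterates over the column labels, so a and b are strings.
a, b = df[["x", "y"]]

# Indexing each name separately yields the Series themselves.
x, y = (df[name] for name in ["x", "y"])
```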

answered Feb 4, 2021 at 13:38

Pierre D

Hey @Pierre D I came across your answer & was looking for something similar. I don't know if this link helps or not but it may be worth a look.
– JC23
Commented Jun 30, 2021 at 20:15
1

I had a similar question regarding the assignment. x, y = df[["x", "y"]].T.values works.
– normanius
Commented Apr 11, 2022 at 14:53

dsz · Accepted Answer · 2022-11-07 01:31:03Z

0

Most of these answers are going via the column name, rather than iterating the columns directly. They will also have issues if there are multiple columns with the same name. If you want to iterate the columns, I'd suggest:

for series in (df.iloc[:,i] for i in range(df.shape[1])):
    ...
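
The duplicate-name problem can be seen in a small sketch with a made-up frame that has two columns labelled "a". Label-based access returns a DataFrame for the duplicated label, whereas positional access returns exactly one Series per column.

```python
import pandas as pd

# Two columns share the label "a" (invented data).
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "a", "b"])

# Label-based access: a duplicated label selects every matching column.
by_label = [type(df[name]).__name__ for name in df]

# Position-based access: always a single Series per column.
by_position = [type(df.iloc[:, i]).__name__ for i in range(df.shape[1])]
```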

answered Nov 7, 2022 at 1:31

dsz

Good point about iterating over columns rather than names but you can do it using items as said in answers above: for _, col in data_df.items():
– Jérôme
Commented May 11, 2023 at 12:54

JeeyCi · Accepted Answer · 2021-12-12 09:20:42Z

-1

Assuming X holds the factors (features) and y holds a multi-column label:

columns = [c for c in _df.columns if c in ['col1', 'col2','col3']]  #or '..c not in..'
_df.set_index(columns, inplace=True)
print( _df.index)

X, y =  _df.iloc[:,:4].values, _df.index.values

answered Dec 12, 2021 at 9:20

JeeyCi
