20

I have a pandas DataFrame in which one column holds an array of strings in each cell.

Something like this:

  col1 col2
0 120  ['abc', 'def']
1 130  ['ghi', 'klm']

When I write this to CSV with to_csv, it seems fine, and reading it back with from_csv also appears to work. But when I inspect the value in each cell, iterating over it gives

'[', "'", 'a', 'b', 'c' and so on. So it is not being read as an array but as a single string, one character at a time. Can somebody suggest how I can convert this string back into an array?

That is, the array has been stored as a string:

'[\'abc\',\'def\']'

4 Answers

40

As mentioned in the other questions, you should use literal_eval here:

from ast import literal_eval
df['col2'] = df['col2'].apply(literal_eval)

In action:

In [11]: df = pd.DataFrame([[120, '[\'abc\',\'def\']'], [130, '[\'ghi\',\'klm\']']], columns=['A', 'B'])

In [12]: df
Out[12]:
     A              B
0  120  ['abc','def']
1  130  ['ghi','klm']

In [13]: df.loc[0, 'B']  # a string
Out[13]: "['abc','def']"

In [14]: df.B = df.B.apply(literal_eval)

In [15]: df.loc[0, 'B']  # now it's a list
Out[15]: ['abc', 'def']
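
The same conversion can also happen while the file is being read. The following is a minimal sketch, assuming the file is read with pd.read_csv and that the column is named col2 as in the question; the sample data and the in-memory buffer are stand-ins for a real file, not part of the original answer.

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical data mirroring the frame in the question.
df = pd.DataFrame({"col1": [120, 130], "col2": [["abc", "def"], ["ghi", "klm"]]})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

# converters maps a column name to a function applied to each raw cell string,
# so col2 is parsed back into lists during the read.
restored = pd.read_csv(buffer, converters={"col2": literal_eval})

first_cell = restored.loc[0, "col2"]
print(type(first_cell), first_cell)
```

This avoids a separate apply step, at the cost of having to know up front which columns hold lists.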
4 Comments

Can I get some explanation of how literal_eval works for this problem?
@HammadHassan it tries to parse a string into a Python object, similar to json.loads.
I'm pretty sure that this is slower than using pandas' own split function -- see the answer below.
Worth mentioning that this works if the data was stored as a list and not as a NumPy array (whose string representation separates items with spaces instead of commas), so df["col"] = list(data) or data.tolist() is helpful before saving.
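
To illustrate the points raised in these comments, here is a small sketch using only the standard library. The string assigned to numpy_style is an assumed example of how a NumPy string array is rendered as text; it is hardcoded here rather than produced by NumPy.

```python
import json
from ast import literal_eval

cell = "['abc', 'def']"  # the text written to the CSV for a Python list

# literal_eval parses Python literal syntax (lists, strings, numbers, ...).
parsed = literal_eval(cell)

# json.loads rejects the same text because JSON strings need double quotes.
try:
    json.loads(cell)
    json_ok = True
except json.JSONDecodeError:
    json_ok = False

# literal_eval only accepts literals, so a function call is refused.
try:
    literal_eval("__import__('os').getcwd()")
    call_ok = True
except ValueError:
    call_ok = False

# Assumed text form of a NumPy string array: no commas between items.
numpy_style = "['abc' 'def']"
# Adjacent string literals are concatenated by the parser, so this does not
# raise; it returns a list with one merged element instead of two.
merged = literal_eval(numpy_style)

print(parsed, json_ok, call_ok, merged)
```

The last case is the reason converting to a plain list before saving matters: the space-separated form may parse without an error and still give the wrong result.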
6

Never mind, got it.

All I had to do was

arr = s[1:-1].split(',')

This got rid of the square brackets and also split the string into an array like I wanted.
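
One caveat, shown in the sketch below: the pieces produced by the split still carry their quote characters and any space after the comma. The strip step is an addition for illustration, not part of the original one-liner, and it assumes no element contains a comma or a quote of its own.

```python
s = "['abc', 'def']"

# The one-liner from above: slice off the brackets, then split on commas.
arr = s[1:-1].split(',')
# Each piece still includes its quotes, and the second has a leading space.
print(arr)

# Stripping whitespace and then quote characters leaves plain strings.
cleaned = [item.strip().strip("'\"") for item in arr]
print(cleaned)
```

If elements may contain commas or quotes, literal_eval is the safer choice because it follows Python's own quoting rules.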

2

Without pandas, this is one way to do it using the ast module's literal_eval():

>>> data = "['abc', 'def']"
>>> import ast
>>> a_list = ast.literal_eval(data)
>>> type(a_list)
<class 'list'>
>>> a_list[0]
'abc'

3 Comments

with pandas, you should also use literal_eval!
@AndyHayden Ah okay! Never used pandas, wouldn't know :)
This is more what I wanted.
0

Maybe try using a different separator value? Like so:

df.to_csv(filepath, sep=';')

and then read with

pd.read_csv(filepath, sep=';', index_col=0)  # DataFrame.from_csv was removed in pandas 1.0
