Pandas: Read array column as column (array of strings)

Question

I have a dataset (csv format) which looks like this:

id,description_data

0, "['manage' 'musical' 'staffmanage' 'staff' 'music' 'coordinate' 'duties' 'musical' 'staff' 'manage' 'music' 'staff' 'direct' 'musical' 'staffAssign' 'manage' 'staff' 'tasks' 'areas' 'scoring' 'arranging' 'copying' 'music' 'vocal' 'coaching']"

When I apply pd.read_csv to a dataset that includes this column (an array of strings), the value that is returned looks like this:

"['manage' 'musical' 'staffmanage' 'staff' 'music' 'coordinate' 'duties'\n 'musical' 'staff' 'manage' 'music' 'staff' 'direct' 'musical'\n 'staffAssign' 'manage' 'staff' 'tasks' 'areas' 'scoring' 'arranging'\n 'copying' 'music' 'vocal' 'coaching']"

This is clearly a string. But I saved this value as an array of strings. How can I properly parse / read this from the csv? Is this possible through pandas, or do I have to write my own parser for this?
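A CSV file can only hold text. If a column of NumPy arrays is written with to_csv, each array is stored as its str() form: quoted items separated by spaces, wrapped with newlines when the array is long. read_csv then returns that text unchanged. The minimal round-trip sketch below assumes the original column held NumPy arrays, which the question does not actually say:

```python
import io

import numpy as np
import pandas as pd

# Assumption: the column originally held NumPy arrays of strings.
arr = np.array(["manage", "musical", "staff"])
col = np.empty(1, dtype=object)  # object column whose single cell is the array
col[0] = arr
df = pd.DataFrame({"id": [0], "description_data": col})

buf = io.StringIO()
df.to_csv(buf, index=False)  # the array is written as text
buf.seek(0)

back = pd.read_csv(buf)
cell = back.loc[0, "description_data"]
print(type(cell).__name__)  # the cell comes back as a plain str
print(cell)
```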

To be more specific, this is what I want:

['manage', 'musical', 'staffmanage', 'staff', 'music', 'coordinate', 'duties', 'musical', ...'arranging', 'copying', 'music', 'vocal', 'coaching']

Is there a simple pandas function to deliver this?

@DaveTheAI, did you try using pd.read_csv(file, sep=" ", header=None)? Can you post the code you tried, if any? (Karn Kumar, commented Jan 5, 2019 at 19:11)

BENY · Accepted Answer · 2019-01-05 19:15:51Z

3

Fixed your problem with:

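# strip the brackets, drop the quotes, split on any whitespace (including the embedded newlines)
df.description_data.str[1:-1].str.replace("'", "", regex=False).str.split()
0    [manage, musical, staffmanage, staff, music, c...
Name: description_data, dtype: object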
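The raw cell in the question contains embedded newlines, for example 'duties'\n 'musical'. That means the split should break on any whitespace, not on a single space only, and str.split() with no argument does this. The minimal sketch below uses a shortened copy of the question's value, an abbreviation rather than the full data, to show the difference:

```python
import pandas as pd

# Shortened sample in the same layout as the question's cell text (note the "\n ").
raw = "['manage' 'musical' 'staffmanage' 'staff' 'music' 'coordinate' 'duties'\n 'musical' 'staff']"
df = pd.DataFrame({"description_data": [raw]})

# Strip the brackets and remove the single quotes.
inner = df.description_data.str[1:-1].str.replace("'", "", regex=False)

only_space = inner.str.split(" ")  # "duties\n" survives as one token
any_space = inner.str.split()      # splits on spaces and newlines alike

print(only_space[0])
print(any_space[0])
```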


DaveTheAl · Accepted Answer · 2019-01-05 19:14:04Z

1

I just solved it using a simple parsing function:

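def parse(inp):  # hypothetical name; inp is one raw description_data string
    return inp[1:-1].replace("'", "").split()  # strip [ ], drop quotes, split on whitespace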

A bit ugly, but it works.
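To use a helper like this on the whole column, you can map it over the loaded Series with Series.apply. This is a minimal sketch; the function name parse_array and the one-row sample DataFrame are hypothetical:

```python
import pandas as pd


def parse_array(inp):
    # Hypothetical helper: drop the brackets, remove the quotes, split on any whitespace.
    return inp[1:-1].replace("'", "").split()


# Hypothetical one-row sample in the question's layout.
df = pd.DataFrame({"id": [0], "description_data": ["['manage' 'musical'\n 'staff']"]})
df["description_data"] = df["description_data"].apply(parse_array)
print(df.loc[0, "description_data"])  # ['manage', 'musical', 'staff']
```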


1 Comment

Karn Kumar Over a year ago

I believe you can handle this within pd.read_csv itself.
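One way to do this inside pd.read_csv is its converters parameter. It maps a column name to a function that is applied to each raw cell while the file is parsed. The minimal sketch below assumes a parsing function like the ones in the answers above (parse_array is a hypothetical name) and uses an in-memory CSV in the question's layout:

```python
import io

import pandas as pd


def parse_array(inp):
    # Hypothetical helper: strip "[ ]", drop the quotes, split on any whitespace.
    return inp[1:-1].replace("'", "").split()


# In-memory stand-in for the CSV file; the quoted field spans two lines.
csv_text = "id,description_data\n0,\"['manage' 'musical'\n 'staff']\"\n"

df = pd.read_csv(io.StringIO(csv_text), converters={"description_data": parse_array})
print(df.loc[0, "description_data"])  # ['manage', 'musical', 'staff']
```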
