
I have a dataset (csv format) which looks like this:

id,description_data

0, "['manage' 'musical' 'staffmanage' 'staff' 'music' 'coordinate' 'duties' 'musical' 'staff' 'manage' 'music' 'staff' 'direct' 'musical' 'staffAssign' 'manage' 'staff' 'tasks' 'areas' 'scoring' 'arranging' 'copying' 'music' 'vocal' 'coaching']"

When I read the dataset with pd.read_csv (it includes this column, which holds an array of strings), the value returned for this row looks like this:

"['manage' 'musical' 'staffmanage' 'staff' 'music' 'coordinate' 'duties'\n 'musical' 'staff' 'manage' 'music' 'staff' 'direct' 'musical'\n 'staffAssign' 'manage' 'staff' 'tasks' 'areas' 'scoring' 'arranging'\n 'copying' 'music' 'vocal' 'coaching']"

This is clearly a string, but I saved this value as an array of strings. How can I properly parse it when reading the CSV? Is this possible with pandas, or do I have to write my own parser?
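The format of that string (no commas between items, line breaks after roughly 75 characters) looks like NumPy's str() output for a string array, which suggests the column held NumPy arrays when the CSV was written and the cells were saved as plain text. The sketch below only reproduces that text form; it assumes NumPy arrays were the source, which the question does not state outright.

```python
import numpy as np

# Assumption: description_data held NumPy string arrays when to_csv ran.
words = np.array(["manage", "musical", "staffmanage", "staff", "music",
                  "coordinate", "duties", "musical", "staff", "manage",
                  "music", "staff", "direct", "musical", "staffAssign"])

# str() of a NumPy string array separates items with spaces, not commas,
# and wraps long arrays onto several lines joined by "\n ".
text = str(words)
short = str(np.array(["manage", "musical", "staff"]))

print(short)          # ['manage' 'musical' 'staff']
print("\n" in text)   # True: the long array was wrapped
```

Once written this way, the CSV only holds text, so pd.read_csv has no way to know it was an array.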

To be more specific, this is what I want:

['manage', 'musical', 'staffmanage', 'staff', 'music', 'coordinate', 'duties', 'musical', ...'arranging', 'copying', 'music', 'vocal', 'coaching']

Is there a simple pandas function to deliver this?
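One pandas-only option, not taken from the answers here, is Series.str.findall with a regex that captures the text between each pair of single quotes. The sample CSV is a shortened copy of the question's data and assumes the quoted field contains a real line break, as the "\n" in the returned value suggests. It would misparse items that themselves contain an apostrophe.

```python
import io
import pandas as pd

# Shortened sample of the question's file; the line break inside the
# quoted field is an assumption based on the "\n" seen after reading.
csv_text = (
    "id,description_data\n"
    "0,\"['manage' 'musical' 'staffmanage'\n"
    " 'staff' 'music' 'vocal' 'coaching']\"\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Each match consumes a quoted item, so the spaces and "\n" between items are skipped.
df["description_data"] = df["description_data"].str.findall(r"'([^']*)'")

print(df.loc[0, "description_data"])
# ['manage', 'musical', 'staffmanage', 'staff', 'music', 'vocal', 'coaching']
```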

  • @DaveTheAI, did you try using pd.read_csv(file, sep=" ", header=None)? Can you post the code you tried, if any? (Commented Jan 5, 2019 at 19:11)

2 Answers

I fixed your problem with:

df['description_data'].str[1:-1].str.replace("'", '', regex=False).str.split()
0    [manage, musical, staffmanage, staff, music, c...
Name: description_data, dtype: object
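A self-contained sketch of the same pipeline, using a shortened copy of the question's data (the line break inside the quoted field is assumed). It compares split(' ') with split() without an argument: splitting on a single space leaves the "\n" glued to a word, while split() splits on any whitespace.

```python
import io
import pandas as pd

csv_text = (
    "id,description_data\n"
    "0,\"['manage' 'musical' 'staffmanage'\n"
    " 'staff' 'music' 'vocal' 'coaching']\"\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Drop the surrounding brackets, then remove every single quote.
cleaned = df["description_data"].str[1:-1].str.replace("'", "", regex=False)

naive = cleaned.str.split(" ")   # splits on single spaces only
parsed = cleaned.str.split()     # splits on any run of whitespace

print(naive[0])   # ['manage', 'musical', 'staffmanage\n', 'staff', 'music', 'vocal', 'coaching']
print(parsed[0])  # ['manage', 'musical', 'staffmanage', 'staff', 'music', 'vocal', 'coaching']
```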

I just solved it using a simple parsing function:

def parse(inp):  # function name is illustrative
    return inp[1:-1].replace("'", "").split()

A bit ugly, but it works.


I believe you can do this while loading the data with pd.read_csv itself.
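One way to do the parsing inside pd.read_csv is its converters parameter, which maps a column name to a function applied to each raw cell while reading. In this sketch, parse is a hypothetical helper following the same bracket-and-quote stripping idea as the parsing function in this answer, and the sample CSV is a shortened, assumed copy of the question's data.

```python
import io
import pandas as pd

def parse(inp):
    # Hypothetical helper: drop the brackets and quotes, then split on
    # any whitespace (this also removes the "\n " line breaks).
    return inp[1:-1].replace("'", "").split()

csv_text = (
    "id,description_data\n"
    "0,\"['manage' 'musical' 'staffmanage'\n"
    " 'staff' 'music' 'vocal' 'coaching']\"\n"
)

# converters runs parse on every description_data cell during the read.
df = pd.read_csv(io.StringIO(csv_text), converters={"description_data": parse})

print(df.loc[0, "description_data"])
# ['manage', 'musical', 'staffmanage', 'staff', 'music', 'vocal', 'coaching']
```

This keeps the parsing next to the read, so the column is a list from the start instead of being fixed up afterwards.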
