My DataFrame db is built from a csv file, using read_csv. Values of column A look like this:

[1,2,5,6,48,125]

The "vector" can have a different length on every row, and each value is still a string. I can strip the [ and ] as follows:

db["A"] = db["A"].str.rstrip(']').str.lstrip('[')

The resulting values, such as 1,2,5,6,48,125, should be good input for np.fromstring. However, I am not able to apply this function to a pandas DataFrame column.

When I try db["A"] = np.fromstring(db["A"], sep=','), it fails with: a bytes-like object is required, not 'Series'. Using apply does not work for me either. Thanks for any tips.

3 Answers

One idea is to parse each value into a list with ast.literal_eval and then convert it to an np.array:

import ast
import numpy as np

db["A"] = db["A"].apply(lambda x: np.array(ast.literal_eval(x)))
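
As a minimal, self-contained sketch of this approach: the sample frame below is made up for illustration and stands in for the result of read_csv. Since ast.literal_eval parses the bracketed string directly, the brackets do not have to be stripped first.

```python
import ast

import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the CSV-derived column
db = pd.DataFrame({"A": ["[1,2,5,6,48,125]", "[3,4]", "[7]"]})

# literal_eval turns the string "[1,2,5]" into a Python list;
# np.array then converts that list into a NumPy array
db["A"] = db["A"].apply(lambda s: np.array(ast.literal_eval(s)))

first = db["A"].iloc[0]
print(first, first.dtype)
```

Each cell ends up holding its own array, so rows of different lengths are not a problem.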

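Another option is an explicit loop that parses each row and collects the results:

import numpy as np

parsed = []
for i in range(len(db)):
    # Strip the brackets, split on commas, and convert the pieces to integers
    parsed.append(np.array([int(v) for v in db["A"].iloc[i].strip("[]").split(",")]))
db["A"] = parsed  # assign once, after every row has been parsed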

np.fromstring() is built for this purpose, as you (OP) already pointed out. The problem is that it is being handed the whole Series rather than one string at a time.

Applying it to each value individually addresses the problem:

import pandas as pd
import numpy as np

dataframe = pd.DataFrame({'data': ["[1,2,4]", "[1,2,4,5]","[1,2,4,5,6]"]})
dataframe['data'] = dataframe['data'].apply(lambda x : np.fromstring(str(x).replace('[','').replace(']',''), sep=','))

Each cell will then hold a 1-D NumPy array.

Running dataframe.head() gives me this:

    data
0   [1.0, 2.0, 4.0]
1   [1.0, 2.0, 4.0, 5.0]
2   [1.0, 2.0, 4.0, 5.0, 6.0]
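
The values come back as floats because np.fromstring uses dtype=float by default. If integer arrays are preferred, one possible variation is to pass dtype=int. This is only a sketch, and the helper name parse_ints is made up for illustration:

```python
import numpy as np
import pandas as pd

dataframe = pd.DataFrame({"data": ["[1,2,4]", "[1,2,4,5]", "[1,2,4,5,6]"]})


def parse_ints(text):
    # Hypothetical helper: drop the surrounding brackets, then parse the
    # comma-separated text as integers instead of the default floats
    return np.fromstring(text.strip("[]"), dtype=int, sep=",")


dataframe["data"] = dataframe["data"].apply(parse_ints)
print(dataframe["data"].iloc[0])
```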
