3

I have a DataFrame in which duration is one of the columns. Its contents look like this:

            array(['487', '346', ...,  '227', '17'])

When I call df.info(), I get:

             Data columns (total 22 columns):

             duration        2999 non-null object
             campaign        2999 non-null object
             ...

Now I want to convert duration into int. Is there any solution?

3 Answers

5

Use astype:

df['duration'] = df['duration'].astype(int)
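As a small illustration of the conversion, here is a minimal sketch. It assumes a toy DataFrame built from the four values visible in the question; the real column has 2999 rows, and the variable name total is only illustrative.

```python
import pandas as pd

# Toy data (assumption): only the four values shown in the question.
df = pd.DataFrame({'duration': ['487', '346', '227', '17']})

# Before the conversion the column holds strings, not numbers.
print(df['duration'].dtype)

# Parse every string as an integer and store the result back in the column.
df['duration'] = df['duration'].astype(int)
print(df['duration'].dtype)

# Arithmetic now works on the column.
total = df['duration'].sum()
print(total)
```

Note that astype(int) raises a ValueError if any value cannot be parsed as an integer, so this assumes every entry is a clean numeric string.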

Timings

Using the following setup to produce a large sample dataset:

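import numpy as np
import pandas as pd
# Build 10**5 random integers below 10**4, stored as strings
n = 10**5
data = list(map(str, np.random.randint(10**4, size=n)))
df = pd.DataFrame({'duration': data})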

I get the following timings:

%timeit -n 100 df['duration'].astype(int)
100 loops, best of 3: 10.9 ms per loop

%timeit -n 100 df['duration'].apply(int)
100 loops, best of 3: 44.3 ms per loop

%timeit -n 100 df['duration'].apply(lambda x: int(x))
100 loops, best of 3: 60.1 ms per loop
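The %timeit lines above only run inside IPython or Jupyter. A rough equivalent using the standard-library timeit module might look like the sketch below. The names candidates and timings, the seeded generator, and the loop counts are all assumptions made for illustration, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Same shape of data as the setup above: 10**5 integers stored as strings.
n = 10**5
rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
data = [str(x) for x in rng.integers(10**4, size=n)]
df = pd.DataFrame({'duration': data})

# The three conversions being compared (labels are illustrative).
candidates = {
    'astype(int)': lambda: df['duration'].astype(int),
    'apply(int)': lambda: df['duration'].apply(int),
    'apply(lambda)': lambda: df['duration'].apply(lambda x: int(x)),
}

loops = 5
timings = {}
for label, func in candidates.items():
    # Best of 3 repeats, each running the conversion `loops` times.
    best = min(timeit.repeat(func, number=loops, repeat=3)) / loops
    timings[label] = best
    print(f"{label}: {best * 1e3:.2f} ms per loop")
```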

2 Comments

nice timings, though I suggest tweaking it to use the same number of loops for easier comparison
Edited to have the same number of loops.
3
df['duration'] = df['duration'].astype(int)


0

Apply int to each string value:

df['duration'] = df['duration'].apply(lambda x: int(x))  # df is your DataFrame with a 'duration' column

2 Comments

No need for the lambda, .apply(int) will work and give better performance.
In general, lambda *args, **kwargs: f(*args, **kwargs) is exactly equivalent to f
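To make the point in these comments concrete, here is a minimal sketch comparing the two forms on a toy column. The sample values and the names with_lambda, without_lambda, and same are assumptions for illustration.

```python
import pandas as pd

# Toy data (assumption): the four values visible in the question.
df = pd.DataFrame({'duration': ['487', '346', '227', '17']})

# The lambda only forwards its argument to int ...
with_lambda = df['duration'].apply(lambda x: int(x))

# ... so passing int directly produces the same Series.
without_lambda = df['duration'].apply(int)

same = with_lambda.equals(without_lambda)
print(same)
```

The direct form skips one extra Python function call per element, which is consistent with the gap between the apply(int) and apply(lambda x: int(x)) timings reported in the first answer.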
