Extract numerical values from strings in a DataFrame column

Question

I have a DataFrame column of dtype string (object). A typical row looks like this:

'\n\n              Dividend Indicated Gross Yield\n          \n\n              1.50%\n          \n'

I am trying to extract only the numerical data from the above string. For example, my desired output here is 1.50.

Keep in mind that the numbers in each row have different lengths, and some may include a negative sign.
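For reference, the per-row logic can be sketched with Python's standard re module. This is a minimal illustration, not code from the thread: the pattern (optional minus sign, digits, optional decimal part) and the helper name first_number are assumptions, and the extra sample strings are made up to cover a negative value, an integer, and a row with no number. Converting to float turns "1.50" into 1.5.

```python
import re

# Assumed pattern: optional leading minus, one or more digits,
# then an optional fractional part such as ".50".
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def first_number(text):
    """Hypothetical helper: return the first number in text as a float, or None."""
    match = NUMBER.search(text)
    return float(match.group()) if match else None


# The first string is the sample row from the question; the rest are made up.
row = '\n\n              Dividend Indicated Gross Yield\n          \n\n              1.50%\n          \n'
print(first_number(row))            # 1.5
print(first_number("  -0.25%  "))   # -0.25
print(first_number("12%"))          # 12.0
print(first_number("n/a"))          # None
```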

I have tried some suggestions involving .rstrip(), regex, and convert_objects, but they do not work as intended. Any help is appreciated.

Can you post some of the regex you have tried?
– Alexander McFarlane
Commented Jun 11, 2015 at 0:21

maxymoo · Accepted Answer · 2015-06-15 22:49:03Z


You probably want to do this:

import numpy as np
df.col.str.extract(r'(-?\d+\.\d+)', expand=False).astype(np.float64)
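A runnable sketch of this approach on a small DataFrame. The column name col matches the answer, but the data is an assumption: the first row is the string from the question and the other two rows are made up to show a negative value and a longer number. The raw string avoids Python's invalid-escape warnings for \d, and expand=False (a pandas keyword of str.extract) returns a Series rather than a one-column DataFrame.

```python
import numpy as np
import pandas as pd

# Made-up data: row 0 is the question's sample, rows 1-2 are hypothetical.
df = pd.DataFrame({"col": [
    '\n\n              Dividend Indicated Gross Yield\n          \n\n              1.50%\n          \n',
    '\n              Change\n              -0.75%\n',
    '\n              Yield\n              123.456%\n',
]})

# The parentheses define the capture group that str.extract returns;
# an optional minus sign is followed by digits, a decimal point, and digits.
values = df.col.str.extract(r'(-?\d+\.\d+)', expand=False).astype(np.float64)
print(values.tolist())  # [1.5, -0.75, 123.456]
```

Because this pattern requires a decimal point, a row such as "12%" comes back as NaN. If the data can contain integers, a pattern like r'(-?\d+(?:\.\d+)?)' would accept them as well.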

answered Jun 11, 2015 at 0:26

maxymoo


I tried the above solution, but it throws an error: "ValueError: This pattern contains no groups to capture."
– Siraj S.
Commented Jun 15, 2015 at 21:24
I made a mistake with the regex: it needs a set of parentheses to tell it which group to extract. It should work now.
– maxymoo
Commented Jun 15, 2015 at 22:49
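To illustrate the fix discussed in these comments: pandas' str.extract only returns what a capture group matches, so a pattern with no parentheses is rejected with a ValueError. A minimal sketch on made-up strings; the exact error text differs between pandas versions, so it is printed rather than asserted.

```python
import pandas as pd

s = pd.Series(["Yield 1.50%", "Change -0.75%"])  # made-up sample strings

# No parentheses means no capture group, so str.extract raises ValueError.
raised = False
try:
    s.str.extract(r'-?\d+\.\d+', expand=False)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Wrapping the pattern in parentheses defines group 1, which is returned as strings.
result = s.str.extract(r'(-?\d+\.\d+)', expand=False).tolist()
print(result)  # ['1.50', '-0.75']
```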

