Extract product code using a regular expression in Python and apply to a column [duplicate]

Question

I have a pd.DataFrame with multiple columns, and one column holds URLs extracted from the web, e.g.:

url = "http://www.currys.co.uk/gbuk/s/10153572/product_confirmation.html"

I used a regular expression to extract the product code, as below:

re.findall(r'\d+', url)

However, if I try to apply this to the entire dataset (which has multiple columns), I get an error:

regex = lambda x: x.re.findall('\d+')
df["new_column"] = df['url'].apply(regex)

AttributeError: 'str' object has no attribute 're'

In pandas, use df['url'].str.extractall(r'(\d+)') instead (the pattern needs a capture group). pandas.pydata.org/pandas-docs/stable/generated/…
– Frank, Commented Nov 19, 2018 at 20:18
Use pandas str methods: df['url'].str.extract(r'(\d+)', expand=False)
– Vaishali, Commented Nov 19, 2018 at 20:22

robertwest · Accepted Answer · 2018-11-19 20:22:25Z

0

Just use the same syntax in your lambda function that you used in your scalar example:

regex = lambda x: re.findall(r'\d+', x)

You probably want the zeroth element too, so you don't end up with a Series of lists:

regex = lambda x: re.findall(r'\d+', x)[0]
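Note that indexing with [0] raises an IndexError for any URL that contains no digits. A minimal sketch of a more defensive variant is below; the helper name first_number and the second sample URL are made up for illustration and are not part of the original question.

```python
import re

import pandas as pd

# Hypothetical sample data; the second URL deliberately contains no digits.
df = pd.DataFrame({
    "url": [
        "http://www.currys.co.uk/gbuk/s/10153572/product_confirmation.html",
        "http://www.currys.co.uk/gbuk/index.html",
    ]
})


def first_number(text):
    """Return the first run of digits in text, or None if there is none."""
    match = re.search(r"\d+", text)
    return match.group(0) if match else None


# Apply the helper to each URL in the column.
df["new_column"] = df["url"].apply(first_number)
print(df["new_column"])
```

Rows without a match come back as missing values instead of stopping the whole apply call with an exception.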


1 Comment

df['url'].str.extract(r'(\d+)', expand=False): this one does the trick
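For reference, the str.extract approach from the comments can be tried on a one-row frame as sketched here. The DataFrame is a stand-in built from the URL in the question; the column name new_column follows the question's code.

```python
import pandas as pd

# Stand-in data built from the URL given in the question.
df = pd.DataFrame({
    "url": ["http://www.currys.co.uk/gbuk/s/10153572/product_confirmation.html"]
})

# The parentheses form a capture group, which str.extract requires.
# expand=False returns a Series (one group) rather than a DataFrame.
df["new_column"] = df["url"].str.extract(r"(\d+)", expand=False)
print(df["new_column"].tolist())
```

str.extract keeps only the first match per row and yields a missing value where nothing matches, so no explicit error handling is needed.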