I have a pandas DataFrame with multiple columns, and one column contains URLs extracted from the web, e.g.:

url = "http://www.currys.co.uk/gbuk/s/10153572/product_confirmation.html"

I used a regular expression to extract the product code, as below:

re.findall(r'\d+', url)

However, if I try to apply the same thing to the entire dataset (which has multiple columns), I get an error:

regex = lambda x: x.re.findall('\d+')
df["new_column"] = df['url'].apply(regex)

AttributeError: 'str' object has no attribute 're'

2 Comments

  • In pandas, use df['url'].str.extractall(r'(\d+)') instead (the pattern needs a capture group). pandas.pydata.org/pandas-docs/stable/generated/… Commented Nov 19, 2018 at 20:18
  • Use the pandas str methods: df['url'].str.extract(r'(\d+)', expand=False) Commented Nov 19, 2018 at 20:22

1 Answer

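Just use the same syntax in your lambda function that you used in your scalar example. re is a module, not an attribute of the string, so the string has to be passed to re.findall as an argument: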

regex = lambda x: re.findall(r'\d+', x)

You probably want the zeroth element too, so you don't end up with a Series of lists:

regex = lambda x: re.findall(r'\d+', x)[0]
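
One caveat with indexing [0] is that it raises IndexError for any URL that contains no digits. A minimal sketch of a more defensive variant follows; the helper name first_number and the digit-free example URL are made up for illustration and are not part of the original question:

```python
import re

# Pre-compiled pattern: one or more consecutive digits.
_DIGITS = re.compile(r"\d+")


def first_number(url):
    """Return the first run of digits in `url`, or None if there is none."""
    match = _DIGITS.search(url)
    return match.group(0) if match else None


url = "http://www.currys.co.uk/gbuk/s/10153572/product_confirmation.html"
print(first_number(url))                    # 10153572
print(first_number("http://example.com/"))  # None (hypothetical URL with no digits)
```

Assuming every value in the column is a string, this could be applied with df["new_column"] = df["url"].apply(first_number).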

1 Comment

df['url'].str.extract(r'(\d+)', expand=False) does the trick.
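
For reference, here is a small sketch of the str.extract approach on a toy DataFrame. Only the first URL comes from the question; the second row and the "other" column are invented to show what happens when there is no match:

```python
import pandas as pd

# Toy data: the second URL and the "other" column are hypothetical.
df = pd.DataFrame({
    "url": [
        "http://www.currys.co.uk/gbuk/s/10153572/product_confirmation.html",
        "http://www.currys.co.uk/gbuk/index.html",  # no digits in this one
    ],
    "other": [1, 2],
})

# str.extract returns the first match of the capture group (\d+) per row;
# expand=False gives back a Series rather than a one-column DataFrame.
df["new_column"] = df["url"].str.extract(r"(\d+)", expand=False)

print(df["new_column"].tolist())
```

Rows without a match come back as NaN instead of raising, and the extracted values are strings, so a conversion step would still be needed if numeric product codes are wanted.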
