
I am extracting content with Scrapy into a list. Each element contains the unwanted characters ": ", which I would like to remove as efficiently as possible.

>>> import re
>>> v = response.xpath('//div[@id="tab"]/text()').extract()
>>> v
['Marke:', 'Modell:']
>>> for i in v: re.sub(r'[^\w]', '', i)
... 
'Marke'
'Modell'

That seems to work, but how do I keep the result? v itself is unchanged:

>>> v
['Marke:', 'Modell:']



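re.sub returns a new string instead of modifying i, and the interactive prompt only echoed those new strings without storing them. Collect the results with a list comprehension and assign them back to v: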

>>> v = response.xpath('//div[@id="tab"]/text()').extract()
>>>
>>> import re
>>> v = [re.sub(r'[^\w]', '', i) for i in v]
>>> v
['Marke', 'Modell']
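If other variables refer to the same list object, rebinding v as above leaves them pointing at the old, uncleaned list. A minimal sketch using slice assignment instead, which replaces the list's contents in place (the hard-coded list and the name alias are illustrative stand-ins for the Scrapy result and some other reference to it):

```python
import re

# Stand-in for the result of response.xpath(...).extract() (assumption).
v = ['Marke:', 'Modell:']
alias = v  # hypothetical second reference to the same list object

# re.sub never modifies its input; it returns a new string.
cleaned = re.sub(r'[^\w]', '', v[0])
print(v[0], cleaned)  # Marke: Marke

# Slice assignment overwrites the existing list's contents,
# so both v and alias see the cleaned values.
v[:] = [re.sub(r'[^\w]', '', s) for s in v]
print(alias)  # ['Marke', 'Modell']
```

With plain v = [...], alias would still hold ['Marke:', 'Modell:'].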

I think pulling in regex for this is a little overkill; the string replace method is enough:

v = ['Marke:', 'Modell:']
v = [s.replace(':', '') for s in v]  # s rather than str, to avoid shadowing the built-in
print(v)

Output:

['Marke', 'Modell']
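replace handles one substring per call. To drop both the colon and the space from the question in a single pass, str.translate with a deletion table built by str.maketrans is another option; if the characters only ever appear at the end, rstrip is enough. A sketch using the sample values from the question (whether translate is actually faster for your data is worth checking with timeit):

```python
# Deletion table: the third argument lists characters mapped to None,
# so translate() removes every ':' and ' ' wherever it occurs.
table = str.maketrans('', '', ': ')

v = ['Marke:', 'Modell: ']
v = [s.translate(table) for s in v]
print(v)  # ['Marke', 'Modell']

# If ': ' only ever trails the text, stripping the right end suffices.
# rstrip treats its argument as a set of characters, not a substring.
w = [s.rstrip(': ') for s in ['Marke: ', 'Modell:']]
print(w)  # ['Marke', 'Modell']
```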
