Newest 'python-2.7+regex+scrapy' Questions

1 vote

2 answers

3k views

How can I scrape a page that literally contains "\x2d", but save that character as "-" in my item?

I need to scrape some text from within a script on a page, and save that text within a scrapy item, presumably as a UTF-8 string. However the actual literal text I'm scraping from has special ...

Chris

301

asked Mar 29, 2019 at 20:04
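
For the question above, one possible approach is to replace each literal `\xNN` escape in the scraped text with the character it names. This is a minimal sketch, not the accepted answer (which this listing does not show); the helper name `unescape_hex` and the sample string are assumptions for illustration.

```python
import re

# Matches a literal backslash, the letter "x", and two hex digits,
# i.e. the four characters \x2d as they appear in the page source.
HEX_ESCAPE = re.compile(r'\\x([0-9a-fA-F]{2})')


def unescape_hex(text):
    """Replace each literal backslash-x escape with the character it encodes."""
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


# Hypothetical scraped text; the raw string keeps the backslash literal.
raw = r'foo\x2dbar'
print(unescape_hex(raw))
```

Text without any such escape passes through unchanged, so the helper can be applied to every scraped value before it is stored in the item.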

-3 votes

1 answer

5k views

Using Scrapy sitemap spider, show me how to crawl for article titles

I am trying to crawl the Washington Post sitemap for articles whose title contains the word "trump." I did my research here https://scrapy.readthedocs.io/en/latest/topics/spiders.html#sitemapspider, but I ...

Rahmi Pruitt

601

asked Jan 3, 2018 at 0:18

1 vote

2 answers

93 views

python regex to find name of fielder

I am trying to crawl a website and parse a cricket scoreboard using scrapy. I have been able to do most of it except for the fielder who caught the ball. There can be several ways in which the text can be ...

Neel

625

asked Jan 8, 2017 at 13:27

3 votes

1 answer

234 views

Working with Scrapy 'regex definition'

I have been trying to write a script to scrape data from the website https://services.aamc.org/msar/home#null. I wrote a Python 2.7 Scrapy script to get a piece of text from the website (I am ...

mg520

33

asked May 2, 2016 at 19:27

1 vote

2 answers

49 views

How can I identify a js array by its keys?

My spider returns javascript code as a string. From this code I need to retrieve an array which I can identify by its keys. That means, I already have the keys but how do I get the complete array? ...

steph

565

asked Jul 29, 2015 at 8:14

2 votes

2 answers

2k views

remove special character in scrapy python

I am trying to remove the special characters from the following text: sample_sample_sample_2.18.14 I tried the following patterns to remove those special characters: item['xxxx'] = item['aaaa'].replace('...

Karthick

55

asked Jun 8, 2015 at 13:17

0 votes

1 answer

373 views

scrapy regex cannot find long dash

I'm using scrapy xpath + re to extract data from web pages. Characters are Unicode (Russian) and all strings to be extracted contain long dashes (Python code '\u2014'). The problem is my regex cannot ...

thepolina

1,274

asked Jun 5, 2015 at 11:51

0 votes

2 answers

43 views

unable to get text between character?

I am trying to get the text "XXXXX" between the characters, like / XXXXX .doc, from the URL link. I am trying "item['xxxxx'] = re.search(r'/(.*?)/.doc', item['url']).group(1)". Here I am unable to get the text ...

Karthick

55

asked Jun 3, 2015 at 6:02
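
The pattern quoted in the question above requires a second "/" before ".doc" and leaves the dot unescaped, so it cannot match a URL that ends in "/XXXXX.doc". A minimal sketch of one alternative follows, assuming the wanted text is the last path segment before a trailing ".doc"; the function name `doc_name` and the example URL are hypothetical.

```python
import re

# Captures the last path segment that precedes a literal ".doc"
# at the very end of the URL.
DOC_NAME = re.compile(r'/([^/]+)\.doc$')


def doc_name(url):
    """Return the file name without its .doc extension, or None if absent."""
    match = DOC_NAME.search(url)
    return match.group(1) if match else None


# Hypothetical URL for illustration only.
print(doc_name('http://example.com/files/XXXXX.doc'))
```

Returning None instead of calling `.group(1)` directly avoids an AttributeError when a URL does not end in ".doc".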

0 votes

1 answer

2k views

Regular expression with Scrapy/Python

With Scrapy I want to extract some data from websites. This is my section for the parsing: item['title'] = sel.xpath('//div[@class="box"]/h3/text()').extract() item['date'] = sel.xpath('//div[@class="...

ChristopherB

1

asked Mar 1, 2015 at 18:52

0 votes

1 answer

43 views

get sgml allow regex for "example.com/page/200/"

I'm trying to get the regular expression for "example.com/page/200/". Here's what I've done so far: rules = (Rule (SgmlLinkExtractor( allow=("//page/\d+",), restrict_xpaths=('xxxxx',)), ...

Suresh

123

asked Feb 19, 2015 at 8:09
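
The `allow` value quoted in the question above starts with a double slash, which does not occur immediately before "page" in "example.com/page/200/". The sketch below checks a candidate pattern with plain `re`, on the assumption that the link extractor searches each absolute URL with the given regular expression; the pattern and the sample URLs are illustrative, not taken from an answer.

```python
import re

# Candidate pattern: "/page/", one or more digits, an optional trailing slash,
# then the end of the URL.
PAGE_LINK = re.compile(r'/page/\d+/?$')

# Hypothetical URLs for illustration only.
urls = [
    'http://example.com/page/200/',
    'http://example.com/page/7',
    'http://example.com/about/',
    'http://example.com/page/abc/',
]

matching = [u for u in urls if PAGE_LINK.search(u)]
print(matching)
```

If the check behaves as wanted, the same pattern string could be tried as the `allow` argument of the rule.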

9 votes

1 answer

12k views

Scrapy Extract number from page text with regex

I have been looking for a few hours on how to search all text on a page and if it matches a regex then extract it. I have my spider set up as follows: def parse(self, response): title = ...

Xaxum

3,675

asked Nov 3, 2014 at 21:18

3 votes

1 answer

54 views

Can't get additional items from url

I'm scraping a few items from this site, but it grabs items only from the first product and doesn't loop further. I know I'm making a simple, stupid mistake, but if you can just point out where I got this ...

user3404005

187

asked Apr 8, 2014 at 21:41

-1 votes

3 answers

90 views

Why this regular expression is not working

I am using Python 2.7 with Scrapy 0.20. I have this text: 0552121152, +97143321090. I want to get the value before the comma and the value after it. ...

Marco Dinatsoli

10.6k

asked Mar 24, 2014 at 20:49
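
For a string shaped like the one in the question above, a regular expression may not be needed at all. A minimal sketch, assuming the values are always separated by a single comma; the function name `split_numbers` is hypothetical.

```python
def split_numbers(text):
    """Split on commas and trim surrounding whitespace from each part."""
    return [part.strip() for part in text.split(',')]


before, after = split_numbers('0552121152, +97143321090')
print(before)
print(after)
```

Unpacking into two names raises ValueError when the text holds more or fewer than two values, which makes unexpected input easy to notice.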

-2 votes

2 answers

65 views

why this regular expression returns empty [closed]

I have these strings: "Phone: 3396222" and "Phone: +33333388". I want to extract the numbers. I tried this regular expression: Phone:\s*(\d+\.\d+) But I got an empty result. I am using scrapy so my code is ...

user2226785

489

asked Mar 15, 2014 at 20:06
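
The pattern in the question above contains `\.`, which demands a literal dot between two runs of digits; neither sample string has one, so nothing matches. A minimal sketch of a pattern that accepts an optional leading "+" instead follows; the function name `extract_phones` is hypothetical.

```python
import re

# "Phone:", optional whitespace, then an optional "+" followed by digits.
PHONE = re.compile(r'Phone:\s*(\+?\d+)')


def extract_phones(text):
    """Return every phone number that follows a "Phone:" label."""
    return PHONE.findall(text)


print(extract_phones('Phone: 3396222 Phone: +33333388'))
```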

1 vote

2 answers

180 views

regular expression to get string from text()

I have this html: <p class="marB0">Phone:+97143396222<br> Email:[email protected]</p> And I want to get the phone number I get the text like this: normalize-...

Marco Dinatsoli

10.6k

asked Mar 10, 2014 at 22:07
