All Questions
10 questions
1 vote, 1 answer, 114 views
Unsuccessful in using scrapy to load an already filtered RSS feed
For reference see my code below:
import scrapy
headers = \
{'Host': 'log.rlsbb.cc',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/110.0',
'Accept': 'text/html,...
1 vote, 0 answers, 213 views
How can I scrape only new data with Scrapy?
I am working on a custom RSS feed generator, which scrapes a site and stores the scraped info in a JSON file, and then builds an RSS feed from the JSON. This would be run daily from a background ...
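One possible way to approach the "only new data" part is to remember an identifier for every item already stored and skip anything seen on an earlier run. The sketch below is only an assumption about how that could look: the helper names (`load_seen`, `keep_new`, `save_seen`), the `link` key, and the idea of a separate JSON file of seen identifiers are hypothetical and not taken from the question. It uses only the standard library rather than a Scrapy pipeline.

```python
import json
from pathlib import Path


def load_seen(path):
    """Return the set of identifiers stored by earlier runs (empty if none)."""
    p = Path(path)
    if not p.exists():
        return set()
    return set(json.loads(p.read_text(encoding="utf-8")))


def keep_new(items, seen, key="link"):
    """Return only items whose identifier is not in `seen`, and record them."""
    fresh = []
    for item in items:
        ident = item[key]  # assumption: each item carries a unique "link"
        if ident not in seen:
            seen.add(ident)
            fresh.append(item)
    return fresh


def save_seen(path, seen):
    """Persist the identifiers so the next daily run can skip them."""
    Path(path).write_text(json.dumps(sorted(seen)), encoding="utf-8")
```

In a daily job, the scraped items would pass through `keep_new` before being appended to the JSON file that feeds the RSS builder, with `save_seen` called at the end of the run.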
1 vote, 0 answers, 401 views
Getting media:thumbnail url from rss feed
I have this RSS feed (sample supplied below) and I want to extract the url from the media:thumbnail object, but I haven't been successful in doing so (my code supplied below). All the examples I found ...
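A common stumbling block with `media:thumbnail` is that the `media:` prefix is an XML namespace, so the lookup has to be told which namespace URI the prefix maps to. The sketch below shows one way to do that with the standard library's `xml.etree.ElementTree` rather than Scrapy selectors. The sample feed is invented for illustration (the feed from the question is not shown here), and it assumes the feed declares the usual Media RSS namespace `http://search.yahoo.com/mrss/`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, made up for this example.
SAMPLE_FEED = """<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Example item</title>
      <media:thumbnail url="https://example.com/thumb.jpg" width="120" height="90"/>
    </item>
  </channel>
</rss>"""

# Map the prefix used in the path expression to the namespace URI.
MEDIA_NS = {"media": "http://search.yahoo.com/mrss/"}


def thumbnail_urls(feed_xml):
    """Return the url attribute of every media:thumbnail inside an item."""
    root = ET.fromstring(feed_xml)
    thumbs = root.iterfind(".//item/media:thumbnail", MEDIA_NS)
    return [t.get("url") for t in thumbs if t.get("url")]


print(thumbnail_urls(SAMPLE_FEED))
```

If the real feed uses a different namespace URI for `media`, the value in `MEDIA_NS` would need to match the `xmlns:media` declaration at the top of that feed.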
1 vote, 0 answers, 161 views
Scrapy Response Changes After Visiting Page in Web Browser
Below is a spider I wrote to crawl an RSS feed and extract the first link and image title, and save them to a text file:
import scrapy
class artSpider(scrapy.Spider):
name = "metart"
...
0 votes, 1 answer, 1k views
Crawl a news website from RSS with Scrapy
I want to read the RSS feeds of some news websites, for example the nytimes.com RSS feed:
<item>
<title>
White House Signals Acceptance of Russia Sanctions Bill
</title>
<link&...
1 vote, 1 answer, 952 views
Why is XMLFeedSpider failing to iterate through the designated nodes?
I'm trying to parse through PLoS's RSS feed to pick up new publications. The RSS feed is located here.
Below is my spider:
from scrapy.contrib.spiders import XMLFeedSpider
class PLoSSpider(...
1 vote, 1 answer, 956 views
Creating RSS using Scrapy
I added a pipeline, which I found in a Stack Overflow answer, to a sample project.
It is:
import csv
from craiglist_sample import settings
def write_to_csv(item):
writer = csv.writer(open(...
1 vote, 1 answer, 320 views
Unable to scrape an RSS feed with Scrapy
I want to scrape all title tags, along with the other tags inside the parent item tag, but I am unable to. I tried the Scrapy shell and it seems to work fine. Below is my whole code:
from scrapy.contrib.spiders ...
0 votes, 1 answer, 140 views
Get RSS links given a domain
I have a file which has a list of domains. I need to crawl each domain (i.e. the whole website) to get RSS links: recursively crawl each page of the website to get the RSS links from each page and write to ...
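For the per-page step, sites usually advertise their feeds with `<link rel="alternate" type="application/rss+xml" href="...">` in the page head, so one option is to collect those tags from each crawled page. The sketch below covers only that extraction step, using the standard library's `html.parser` instead of Scrapy selectors. The class and function names are hypothetical, and the recursive crawl itself is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

RSS_TYPE = "application/rss+xml"


class FeedLinkParser(HTMLParser):
    """Collect absolute URLs of RSS feeds advertised via <link> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").lower()
        href = attr.get("href")
        if "alternate" in rel and mime == RSS_TYPE and href:
            # Resolve relative hrefs against the URL of the page.
            self.feeds.append(urljoin(self.base_url, href))


def find_feed_links(html, base_url):
    """Return the RSS feed URLs advertised in one HTML page."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds
```

Inside a crawler, `find_feed_links(page_html, page_url)` would be called on each downloaded page and the results written out per domain. Pages that link to feeds only through ordinary `<a>` tags would not be caught by this approach.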
1 vote, 1 answer, 836 views
How to parse an RSS link (get the URL of the RSS feed) from a page in the Python framework Scrapy?
I want to parse Google search results and get the RSS link from each result.
I use Scrapy.
I tried this construction,
...
def parse_second(self, response):
hxs = HtmlXPathSelector(...