All Questions

1 vote
1 answer
114 views

Unsuccessful in using scrapy to load an already filtered RSS feed

For reference see my code below: import scrapy headers = \ {'Host': 'log.rlsbb.cc', 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/110.0', 'Accept': 'text/html,...
Ozooha
1 vote
0 answers
213 views

How can I scrape only new data with Scrapy?

I am working on a custom RSS feed generator, which scrapes a site and stores the scraped info in a JSON file, and then builds an RSS feed from the JSON. This would be run daily from a background ...
Shukai Ni
  • 465
1 vote
0 answers
401 views

Getting media:thumbnail URL from RSS feed

I have this RSS feed (sample supplied below) and I want to extract the URL from the media:thumbnail object, but I haven't been successful in doing so (my code supplied below). All the examples I found ...
george haddad
1 vote
0 answers
161 views

Scrapy Response Changes After Visiting Page in Web Browser

Below is a spider I wrote to crawl an RSS feed and extract the first link and image title, and save them to a text file: import scrapy class artSpider(scrapy.Spider): name = "metart" ...
Chaitanya Chowgule
0 votes
1 answer
1k views

Crawl news website from RSS with Scrapy

I want to read the RSS feeds of some news websites, for example the nytimes.com RSS: <item> <title> White House Signals Acceptance of Russia Sanctions Bill </title> <link&...
Nasim
  • 28
1 vote
1 answer
952 views

Why is XMLFeedSpider failing to iterate through the designated nodes?

I'm trying to parse through PLoS's RSS feed to pick up new publications. The RSS feed is located here. Below is my spider: from scrapy.contrib.spiders import XMLFeedSpider class PLoSSpider(...
Louis Thibault
1 vote
1 answer
956 views

Creating RSS using Scrapy

I added a pipeline, which I found in an answer on Stack Overflow, to a sample project. It is: import csv from craiglist_sample import settings def write_to_csv(item): writer = csv.writer(open(...
St3114
  • 55
1 vote
1 answer
320 views

Unable to scrape an RSS feed with Scrapy

I want to scrape all title tags along with the other tags within the parent item tag, but I am unable to. I tried the Scrapy shell and it seems to work fine. Below is my whole code: from scrapy.contrib.spiders ...
user3136348
0 votes
1 answer
140 views

Get RSS links given a domain

I have a file which has a list of domains. I need to crawl each domain (i.e. the whole website) to get RSS links: recursively crawl each page of the website to get the RSS links from each page and write to ...
blackmamba
  • 2,002
1 vote
1 answer
836 views

How to parse an RSS link (get the URL of the RSS feed) from a page in the Python framework Scrapy?

I want to parse Google search and get links to RSS from each item from the search results. I use Scrapy. I tried this construction, ... def parse_second(self, response): hxs = HtmlXPathSelector(...
Oleksii
  • 276