All Questions

0 votes
3 answers
777 views

UnicodeEncodeError: 'utf-8' codec can't encode character '\ud83c' in position 0: surrogates not allowed

I am trying to parse "https://tre.tbe.taleo.net/tre01/ats/servlet/Rss?org=arobpers2&cws=42" but I am getting the error "UnicodeEncodeError: 'utf-8' codec can't encode character '\...
Asher Ross
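A minimal sketch of one common cause of this error: text that holds a UTF-16 surrogate pair as two separate code points (for example an emoji that arrived as `\ud83c\udf89`). Whether that is what this particular feed contains is an assumption, since the excerpt is truncated. The helper name `merge_surrogates` and the sample string are hypothetical.

```python
def merge_surrogates(text: str) -> str:
    """Combine UTF-16 surrogate pairs into the single characters they encode."""
    # 'surrogatepass' lets the encoder write surrogates out as raw UTF-16 units;
    # decoding those bytes as UTF-16 then joins each high/low pair.
    return text.encode("utf-16", "surrogatepass").decode("utf-16")


broken = "\ud83c\udf89 party"    # sample string, not taken from the feed
fixed = merge_surrogates(broken)
encoded = fixed.encode("utf-8")  # encodes cleanly once the pair is merged
```

A surrogate with no partner would still fail at the `decode` step, so this only helps when the surrogates come in valid pairs.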
1 vote
1 answer
114 views

Unsuccessful in using scrapy to load an already filtered RSS feed

For reference see my code below: import scrapy headers = \ {'Host': 'log.rlsbb.cc', 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/110.0', 'Accept': 'text/html,...
Ozooha Ozooha
4 votes
3 answers
6k views

Reading RSS feed in Python

I am trying to obtain the first title.text from this RSS feed: https://www.mmafighting.com/rss/current. The feed is up to date and operational. However, when I use the following code, it appears the ...
Dalej400
0 votes
0 answers
88 views

How do I write HTML code in Django's syndication feed framework?

I'm using Django's syndication feed framework to generate RSS for my website, referring to this document. I have done the work according to the example it provides; the code is similar to the following,...
Yarin Zhang
0 votes
0 answers
42 views

I'm trying to store the pubDate tag of an XML feed in a database using Python. I'm using BeautifulSoup for a web crawler

<pubDate> <![CDATA[ Wed, 17 Aug 2022 14:32:47 +0530 ]]></pubDate> Above is the XML tag. How can I store this date in a DBMS? from bs4 import BeautifulSoup import requests ...
ASTHA JAIN
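A minimal sketch of one way to handle a date like the one above, assuming the tag text has already been extracted as a string. It uses the standard library's RFC 2822 date parser and an in-memory SQLite database as a stand-in for the DBMS. The table name `items` and the column name `pub_date` are hypothetical.

```python
import sqlite3
from email.utils import parsedate_to_datetime

raw = " Wed, 17 Aug 2022 14:32:47 +0530 "  # text inside the CDATA section

# strip() removes the padding spaces left by the CDATA wrapper
published = parsedate_to_datetime(raw.strip())

# Store the timestamp as ISO 8601 text, which keeps the UTC offset
iso_value = published.isoformat()

conn = sqlite3.connect(":memory:")  # stand-in database for the sketch
conn.execute("CREATE TABLE items (pub_date TEXT)")
conn.execute("INSERT INTO items (pub_date) VALUES (?)", (iso_value,))
stored = conn.execute("SELECT pub_date FROM items").fetchone()[0]
conn.close()
```

Passing the value as a query parameter rather than formatting it into the SQL string avoids quoting problems.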
0 votes
2 answers
82 views

How do I return the first link in a non-list output

I am attempting to return only the first url that pops up when scraping "https://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent&CIK=&type=8-k&company=&dateb=&owner=include&...
pilotso
  • 39
1 vote
2 answers
71 views

How to scrape keywords that change every time?

I am trying to scrape a keyword in an xml document with BeautifulSoup but am unsure how to do so. The xml document contains "Central Index Key," which changes each time for each document ...
pilotso
  • 39
1 vote
1 answer
158 views

Web scraper does not update/loop properly

I am trying to make a web scraper that refreshes infinitely every 5 seconds to update the output window with a new article with specific keywords when it is posted. However, this code only refreshes ...
pilotso
  • 39
0 votes
1 answer
73 views

Return for specific title keyword with BeautifulSoup

I'm trying to create a web scraper that returns articles only if there is a certain keyword in the title from an RSS feed (XML format). However, whenever I run the code it returns blank, even if the ...
pilotso
  • 39
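A minimal sketch of keyword filtering on item titles, using the standard library's `xml.etree.ElementTree` rather than BeautifulSoup. It compares in lower case, because a case mismatch is one common reason such a filter comes back empty. Whether that is the cause in this question is an assumption. The sample feed and the function name `titles_with_keyword` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up feed used only for illustration
SAMPLE_FEED = """<rss version="2.0"><channel>
  <item><title>Acme files 8-K report</title><link>https://example.com/a</link></item>
  <item><title>Weather update</title><link>https://example.com/b</link></item>
</channel></rss>"""


def titles_with_keyword(feed_xml: str, keyword: str) -> list[str]:
    """Return the titles of all <item> elements that contain the keyword."""
    root = ET.fromstring(feed_xml)
    needle = keyword.lower()
    matches = []
    for item in root.iter("item"):
        # findtext returns the default when an item has no <title>
        title = item.findtext("title", default="")
        if needle in title.lower():
            matches.append(title)
    return matches


found = titles_with_keyword(SAMPLE_FEED, "8-k")
```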
-1 votes
1 answer
115 views

Feedparser not returning values, only metadata

I'm using feedparser to get info from a public database (https://knesset.gov.il/Odata/ParliamentInfo.svc/KNS_Bill()). Each of my entries looks as follows When accessing specific properties: url = '...
Numy
  • 1
1 vote
0 answers
55 views

Retrieving title of most recent post in an RSS feed as quickly as possible

I am writing a Python script that depends on being able to poll an RSS feed for updates as quickly as possible. The relevant information for my purposes is contained in the title of the post. I would ...
amiller3513
0 votes
1 answer
57 views

Parse specific item in XML by id

I'm a beginner, so to improve myself I'm working on these kinds of things. I'm trying to get a specific RSS/XML item by its id. Live XML/RSS example is here I want to get specific blog post content ...
spancer
  • 15
1 vote
1 answer
61 views

I am scraping news from an RSS feed using Python 3.7, but I am not getting the exact information. Help me get the proper data

Here I am trying to get the news from the RSS feed and I am not getting the exact information. I am using requests and BeautifulSoup to achieve the goal. I have the following object. <item> ...
Mehul Dhariyaparmar
1 vote
0 answers
104 views

Removing relative links from an RSS feed in Python Django

When creating the news, relative links were added to the text of the news itself as [link_name] (downloads/generic/2020.04.1/). When I load this text through the standard RSS feed handler class ...
kanvull
  • 21
0 votes
0 answers
51 views

Inclusive RSS parsing in Python?

I'm parsing a set of RSS feeds dynamically. This is my code, which works for most sites. class ParseFeeds: @staticmethod def parse(source): logger = logging.getLogger(__name__) ...
Melissa Stewart
