All Questions
60 questions
0 votes, 3 answers, 777 views
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud83c' in position 0: surrogates not allowed
I am trying to parse "https://tre.tbe.taleo.net/tre01/ats/servlet/Rss?org=arobpers2&cws=42" but I am getting the error "UnicodeEncodeError: 'utf-8' codec can't encode character '\...
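A possible cause, stated as an assumption because the feed content is not shown here: '\ud83c' is the high half of a UTF-16 surrogate pair (emoji such as U+1F389 are encoded that way), and a Python str that holds the two halves as separate code points cannot be encoded as UTF-8. A minimal sketch of one way to rejoin such pairs follows; `repair_surrogates` is a hypothetical helper name, not part of any library, and the sample string is made up for illustration.

```python
def repair_surrogates(text: str) -> str:
    """Rejoin split UTF-16 surrogate pairs so the text can be encoded as UTF-8."""
    # 'surrogatepass' lets the surrogate code points through the encoder;
    # decoding the resulting UTF-16 bytes merges valid pairs into one character.
    # Unpaired surrogates are replaced instead of raising an error.
    raw = text.encode("utf-16", "surrogatepass")
    return raw.decode("utf-16", "replace")


# Hypothetical sample: an emoji stored as two separate surrogate code points.
broken = "\ud83c\udf89 New job posting"
fixed = repair_surrogates(broken)
print(fixed.encode("utf-8"))
```

If the surrogates come from somewhere else (for example, from decoding with the `surrogateescape` error handler), this sketch would not be the right fix, so checking how the string was produced is still worthwhile.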
1 vote, 1 answer, 114 views
Unsuccessful in using scrapy to load an already filtered RSS feed
For reference see my code below:
import scrapy
headers = \
{'Host': 'log.rlsbb.cc',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/110.0',
'Accept': 'text/html,...
4 votes, 3 answers, 6k views
Reading RSS feed in Python
I am trying to obtain the first title.text from this RSS feed: https://www.mmafighting.com/rss/current. The feed is up to date and operational. However, when I use the following code, it appears the ...
0 votes, 0 answers, 88 views
How do I write HTML code in Django's syndication feed framework?
I'm using Django's syndication feed framework to generate RSS for my website, following this document.
I did the work according to the example it provides; the code is similar to the following...
0 votes, 0 answers, 42 views
I'm trying to store the pubDate tag of an XML feed in a database using Python. I'm using BeautifulSoup for the web crawler
<pubDate> <![CDATA[ Wed, 17 Aug 2022 14:32:47 +0530 ]]></pubDate>
Above is the XML tag. How can I store this date in a DBMS?
from bs4 import BeautifulSoup
import requests ...
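One possible approach, as a minimal sketch: the date in the tag above is in the RFC 822 style that RSS uses, so the standard library's `email.utils.parsedate_to_datetime` can parse it once the surrounding whitespace is stripped. The SQLite in-memory database and the `articles` table below are assumptions made for illustration; the question does not say which DBMS is used.

```python
import sqlite3
from datetime import timezone
from email.utils import parsedate_to_datetime

# Text as it comes out of the CDATA section, including the padding spaces.
raw_pub_date = " Wed, 17 Aug 2022 14:32:47 +0530 "

# Parse the RFC 822 style date into a timezone-aware datetime.
published = parsedate_to_datetime(raw_pub_date.strip())

# Normalise to UTC and store as ISO 8601 text, which sorts chronologically.
published_utc = published.astimezone(timezone.utc).isoformat()

conn = sqlite3.connect(":memory:")  # hypothetical database for the example
conn.execute("CREATE TABLE articles (pub_date TEXT)")  # hypothetical table
conn.execute("INSERT INTO articles (pub_date) VALUES (?)", (published_utc,))
conn.commit()

stored = conn.execute("SELECT pub_date FROM articles").fetchone()[0]
print(stored)
```

Storing the value in UTC keeps rows from feeds in different time zones comparable; keeping the original offset instead is an equally valid choice if it matters for display.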
0 votes, 2 answers, 82 views
How do I return the first link in a non-list output
I am attempting to return only the first url that pops up when scraping "https://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent&CIK=&type=8-k&company=&dateb=&owner=include&...
1 vote, 2 answers, 71 views
How to scrape keywords that change every time?
I am trying to scrape a keyword in an XML document with BeautifulSoup but am unsure how to do so.
The xml document contains "Central Index Key," which changes each time for each document ...
1 vote, 1 answer, 158 views
Web scraper does not update/loop properly
I am trying to make a web scraper that refreshes infinitely every 5 seconds to update the output window with a new article with specific keywords when it is posted. However, this code only refreshes ...
0 votes, 1 answer, 73 views
Return for specific title keyword with BeautifulSoup
I'm trying to create a web scraper that returns articles only if there is a certain keyword in the title from an RSS feed (XML format). However, whenever I run the code it returns blank, even if the ...
-1 votes, 1 answer, 115 views
Feedparser not returning values, only metadata
I'm using feedparser to get info from a public database (https://knesset.gov.il/Odata/ParliamentInfo.svc/KNS_Bill()).
Each of my entries looks as follows
When accessing specific properties:
url = '...
1 vote, 0 answers, 55 views
Retrieving title of most recent post in an RSS feed as quickly as possible
I am writing a Python script that depends on being able to poll an RSS feed for updates as quickly as possible. The relevant information for my purposes is contained in the title of the post. I would ...
0 votes, 1 answer, 57 views
Parse specific item in XML by id
I'm a beginner, so to improve myself I'm working on these kinds of things.
I'm trying to get a specific RSS/XML item by its id.
A live XML/RSS example is here
I want to get specific blog post content ...
1 vote, 1 answer, 61 views
I am scraping news from an RSS feed using Python 3.7, but I am not getting the exact information. Help me get the proper data
Here I am trying to get the news from the RSS feed and I am not getting the exact information.
I am using requests and BeautifulSoup to achieve the goal.
I have the following object.
<item>
...
1 vote, 0 answers, 104 views
Removing relative links from an RSS feed in Python Django
When the news was created, relative links were added to the text of the news item itself as [link_name] (downloads/generic/2020.04.1/)
When I load this text through the standard RSS feed handler
class ...
0 votes, 0 answers, 51 views
Inclusive RSS parsing in Python?
I'm parsing a set of RSS feeds dynamically. This is my code, which works for most sites.
class ParseFeeds:
    @staticmethod
    def parse(source):
        logger = logging.getLogger(__name__)
        ...