In a previous question (Including Titles in HTML Requests), I learned how to customize an RSS/HTML request to search for news articles with the following keywords:

https://news.google.com/rss/search?q=allintitle:TRACKER%20DATA%20COVID++after:2020-01-08+before:2022-01-09&hl=en-US&gl=US&ceid=US:en

In the Google News RSS response, each item carries extra information, including a "source URL":

<item>
<title>Hong Kong may consider tracking function on ‘Leave Home Safe’ app: health chief - South China Morning Post</title>
<link>https://news.google.com/rss/articles/CBMid2h0dHBzOi8vd3d3LnNjbXAuY29tL25ld3MvaG9uZy1rb25nL2hlYWx0aC1lbnZpcm9ubWVudC9hcnRpY2xlLzMxNjI2NDcvY29yb25hdmlydXMtaG9uZy1rb25nLW1heS1yZXNvcnQtaG9tZS1xdWFyYW50aW5l0gF3aHR0cHM6Ly9hbXAuc2NtcC5jb20vbmV3cy9ob25nLWtvbmcvaGVhbHRoLWVudmlyb25tZW50L2FydGljbGUvMzE2MjY0Ny9jb3JvbmF2aXJ1cy1ob25nLWtvbmctbWF5LXJlc29ydC1ob21lLXF1YXJhbnRpbmU?oc=5</link>
<guid isPermaLink="false">CBMid2h0dHBzOi8vd3d3LnNjbXAuY29tL25ld3MvaG9uZy1rb25nL2hlYWx0aC1lbnZpcm9ubWVudC9hcnRpY2xlLzMxNjI2NDcvY29yb25hdmlydXMtaG9uZy1rb25nLW1heS1yZXNvcnQtaG9tZS1xdWFyYW50aW5l0gF3aHR0cHM6Ly9hbXAuc2NtcC5jb20vbmV3cy9ob25nLWtvbmcvaGVhbHRoLWVudmlyb25tZW50L2FydGljbGUvMzE2MjY0Ny9jb3JvbmF2aXJ1cy1ob25nLWtvbmctbWF5LXJlc29ydC1ob21lLXF1YXJhbnRpbmU</guid>
<pubDate>Sat, 08 Jan 2022 08:00:00 GMT</pubDate>
<description><a href="https://news.google.com/rss/articles/CBMid2h0dHBzOi8vd3d3LnNjbXAuY29tL25ld3MvaG9uZy1rb25nL2hlYWx0aC1lbnZpcm9ubWVudC9hcnRpY2xlLzMxNjI2NDcvY29yb25hdmlydXMtaG9uZy1rb25nLW1heS1yZXNvcnQtaG9tZS1xdWFyYW50aW5l0gF3aHR0cHM6Ly9hbXAuc2NtcC5jb20vbmV3cy9ob25nLWtvbmcvaGVhbHRoLWVudmlyb25tZW50L2FydGljbGUvMzE2MjY0Ny9jb3JvbmF2aXJ1cy1ob25nLWtvbmctbWF5LXJlc29ydC1ob21lLXF1YXJhbnRpbmU?oc=5" target="_blank">Hong Kong may consider tracking function on ‘Leave Home Safe’ app: health chief</a>&nbsp;&nbsp;<font color="#6f6f6f">South China Morning Post</font></description>
<source url="https://www.scmp.com">South China Morning Post</source>
</item>

Is it possible to filter on this source URL in the query itself?

For example, suppose I am interested in viewing older versions of this page: https://covid.cdc.gov/covid-data-tracker/#maps_new-admissions-rate-county. Could I then restrict the results to items whose source URL is "https://covid.cdc.gov"?

  • The URI is not encrypted, so most applications have changed their queries to put the query parameters into the HTTP headers or the HTTP body. When using HTTPS, the headers and body are encrypted. Older applications still put parameters after the question mark in the URI. The "item" above is XML data inside the HTTP body.
    – jdweng
    Commented Jun 30, 2023 at 21:30
  • @jdweng : thank you for your reply! so basically this means that what I am trying to do is not possible?
    – stats_noob
    Commented Jun 30, 2023 at 21:31
  • It is not recommended to have unencrypted data.
    – jdweng
    Commented Jul 1, 2023 at 7:37

1 Answer

It should be possible. You have to add the site: search operator to the query.

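So now it looks like this (the site: term is joined to the rest of q with +, because an & would end the q parameter and start a new one):

https://news.google.com/rss/search?q=site:covid.cdc.gov+allintitle:TRACKER%20DATA%20COVID++after:2020-01-08+before:2022-01-09&hl=en-US&gl=US&ceid=US:en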
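
If you build the request from code, it may be easier to assemble the q parameter from its parts than to edit the URL by hand. Below is a minimal Python sketch using only the standard library. The function names build_search_url and items_from_source are hypothetical, chosen for illustration. I have not confirmed that Google News honours site: in combination with allintitle:, so the second function is a fallback that filters the returned items on the <source url="..."> attribute shown in the question.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE_URL = "https://news.google.com/rss/search"


def build_search_url(title_words, site=None, after=None, before=None):
    """Assemble the search URL; all operators go into the single q parameter."""
    terms = []
    if site:
        terms.append(f"site:{site}")  # assumption: a bare domain, no scheme
    terms.append("allintitle:" + " ".join(title_words))
    if after:
        terms.append(f"after:{after}")
    if before:
        terms.append(f"before:{before}")
    params = {"q": " ".join(terms), "hl": "en-US", "gl": "US", "ceid": "US:en"}
    # safe=":" keeps the operator colons readable; spaces are encoded as "+".
    return BASE_URL + "?" + urlencode(params, safe=":")


def items_from_source(rss_text, source_prefix):
    """Return (title, link) pairs for items whose <source url> starts with source_prefix."""
    root = ET.fromstring(rss_text)
    matches = []
    for item in root.iter("item"):
        source = item.find("source")
        if source is not None and source.get("url", "").startswith(source_prefix):
            matches.append((item.findtext("title"), item.findtext("link")))
    return matches
```

For example, build_search_url(["TRACKER", "DATA", "COVID"], site="covid.cdc.gov", after="2020-01-08", before="2022-01-09") produces a URL of the same shape as the one above, and items_from_source(feed_text, "https://covid.cdc.gov") keeps only the matching items from a feed you have already downloaded.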
  • @Jack Fleeting: Thank you so much for all your help today! Just one question about all this: is it possible to view an old version of a website using RSS feeds, or is this not possible in general? Thank you so much!
    – stats_noob
    Commented Jul 1, 2023 at 3:03
