All Questions
15 questions
0 votes · 1 answer · 52 views
Web scraping but not scraping changes
I'm trying to monitor changes on this page: at5.nl/zoek/pijp ("pijp" is a query keyword here). It shows a list of articles with the latest on top:
When I scrape ...
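One site-independent way to notice that the list changed: hash the scraped fragment and compare it with the hash from the previous run. A minimal sketch — the two strings stand in for scraped article lists, and `cksum` (POSIX) is good enough for change detection:

```shell
# Minimal change detector: hash the scraped fragment and compare with
# the previous run's hash. The strings below stand in for scraped
# article lists; cksum is POSIX and sufficient for change detection.
hash_of() { printf '%s' "$1" | cksum | cut -d' ' -f1; }

prev=$(hash_of 'article A, article B')
curr=$(hash_of 'article C, article A, article B')
[ "$prev" != "$curr" ] && echo 'list changed - possibly a new article'
```

In a real monitor, `prev` would be read from a state file written on the last run.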
1 vote · 1 answer · 230 views
How to mimic "Save As" of a webpage to get its source code?
I'm trying to download the source code of a webpage.
When I try "View Page Source", the portion of the webpage that I'm interested in is not there. It seems to be "hidden" in
<...
0 votes · 1 answer · 26 views
Is it possible to detect uniquely-named files if you have the URL path?
Let's say someone has posted a resource at https://this-site.com/files/pdfs/some_file_name.pdf
Another resource is then posted at that URL, whose name we don't know. However, the pathname is ...
1 vote · 3 answers · 628 views
How to get the contents of this web page in CSV or TSV format with curl or wget
I have the following link:
http://fsop.caac.gov.cn/g145/CARS/WebSiteQueryServlet?method=loadAircraftConditionsResultPage&enterpriseName=%E6%AD%A6%E6%B1%89%E8%88%AA%E8%BE%BE%E8%88%AA%E7%A9%BA%E7%A7%...
1 vote · 1 answer · 443 views
How do I curl a page that needs first clicking a consent button?
I'd like to curl a URL like this. (The URL requires consent in EU countries when you have no cookies for the site.)
I cobbled up a puppeteer script that does this, but it looks quite heavyweight and ...
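A lighter-weight alternative than Puppeteer, assuming the consent wall only checks a cookie: click "Accept" once in a normal browser, copy the cookie from dev tools, and replay it with curl. The cookie name/value and URL below are placeholders, not the real ones for any particular site:

```shell
# Assumption: the consent interstitial only checks for a cookie that
# the "Accept" click sets. Find the real name/value in your browser's
# dev tools (Application/Storage > Cookies) after consenting once.
consent_cookie='CONSENT=YES+1'           # placeholder name/value
url='https://example.com/some/article'   # placeholder URL

cmd="curl -sL -H \"Cookie: $consent_cookie\" $url"
echo "$cmd"   # dry run; use eval "$cmd" to actually fetch
```

If the site issues the consent cookie via a POST to a consent endpoint, a cookie jar (`curl -c jar.txt` then `-b jar.txt`) can replay that flow instead.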
0 votes · 0 answers · 128 views
Download image from html element
I need to download the src file which contains the image. We can just copy the source and paste it into the next tab to download the image, but the thing is I need to download the image ...
0 votes · 0 answers · 154 views
CURL/WGET After a Delay
I understand that I can use CURL/WGET to get the contents of a page. But what if I want to get the page after it 'settles'? Or get the contents after a delay?
Example: a web page initially displays. ...
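curl and wget only ever see the initial HTML, so there is no flag for "after it settles"; a headless browser is needed for that. A sketch using headless Chromium, whose `--virtual-time-budget` flag grants the page a budget of simulated milliseconds for scripts to run before `--dump-dom` prints the result — binary name and URL are placeholders:

```shell
# curl/wget fetch only the initial HTML. To capture the DOM after
# scripts have run, use a headless browser: --virtual-time-budget
# gives the page N ms of simulated time before --dump-dom prints it.
browser='chromium'                       # or google-chrome, etc.
url='https://example.com/dynamic-page'   # placeholder
cmd="$browser --headless --dump-dom --virtual-time-budget=5000 $url"
echo "$cmd"   # dry run; run the command itself once placeholders are set
```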
2 votes · 1 answer · 341 views
Using cURL with two matching ranges
I am trying to download a batch of images using cURL.
The images are all saved in separate directories, named the same as the image filenames.
I tried using square brackets to set two ranges, ...
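curl expands two `[1-N]` globs independently (it tries every combination), so "matching" ranges need a loop in which one counter drives both the directory and the filename. A sketch under an assumed URL layout:

```shell
# curl's [1-3] globs iterate independently, so pair the ranges with a
# shell loop: one counter fills both the directory and the filename.
# The URL layout is an assumption for illustration.
cmds=$(for i in $(seq 1 3); do
  printf 'curl -o image_%s.jpg https://example.com/%s/%s.jpg\n' "$i" "$i" "$i"
done)
printf '%s\n' "$cmds"   # dry run; pipe to "sh" to actually download
```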
0 votes · 2 answers · 662 views
How to just get the HTML output structure of a site
I guess it shows my kookiness here, but how do I just get the HTML presentation of a website? For example, I am trying to retrieve from a Wix site the HTML structure (what is actually being viewed by a ...
2 votes · 1 answer · 2k views
How to get exact page content in wget if error code is 404
I have two URLs: one works, the other is a deleted page. The working URL is fine, but for the deleted page, instead of getting the exact page content, wget receives a 404.
Working URL:
import os
def ...
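Worth knowing here: wget discards the response body when the server answers 404, and its `--content-on-error` option saves it anyway (curl, by contrast, prints the error body unless `-f`/`--fail` is given). A sketch with a placeholder URL:

```shell
# wget normally throws away the body of an error response;
# --content-on-error keeps it, so a custom 404 page is saved
# like any other page.
url='https://example.com/deleted-page'   # placeholder
cmd="wget --content-on-error -O page.html $url"
echo "$cmd"   # dry run
```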
11 votes · 2 answers · 10k views
httrack wget curl scrape & fetch
There are a number of tools on the internet for downloading a static copy of a website, such as HTTrack. There are also many tools, some commercial, for “scraping” content from a website, such as ...
0 votes · 1 answer · 3k views
How to download image and save image name based on URL?
How do I download all images from a web page and prefix the image names with the web page's URL (all symbols replaced with underscores)?
For example, if I were to download all images from http://www....
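The prefix itself can be derived with `tr`, assuming "all symbols replaced with underscores" means every non-alphanumeric character:

```shell
# Turn a page URL into a filename prefix by replacing every
# non-alphanumeric character with an underscore.
url_to_prefix() {
  printf '%s' "$1" | tr -c '[:alnum:]' '_'
}
url_to_prefix 'http://www.example.com/gallery'; echo
# -> http___www_example_com_gallery
```

Each downloaded image could then be saved as `$(url_to_prefix "$page_url")_$imagename`.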
0 votes · 1 answer · 399 views
How to download all listed files from a webpage where the URLs do not have filenames defined
I would like to download all the datasets from this page: http://www.data.gov/catalog/geodata/category/0/agency/0/filter/sort/page/1/count/20
I have tried wget, but here is the challenge:
There is ...
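When the URL path carries no filename, the server often supplies one in a `Content-Disposition` header, and wget can honor it. A sketch — the input file of dataset URLs is an assumption:

```shell
# --content-disposition tells wget to name each download from the
# server's Content-Disposition header instead of the (bare) URL path.
# dataset_urls.txt would hold one dataset URL per line.
cmd='wget --content-disposition -i dataset_urls.txt'
echo "$cmd"   # dry run
```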
2 votes · 1 answer · 150 views
PHP - detecting changes in external database-driven site
For a homework project, I'm creating a PHP-driven website whose main function is aggregating news about various university courses.
The main problem is this: (almost) every course has its own website. ...
3 votes · 4 answers · 3k views
Retrieve partial web page
Is there any way of limiting the amount of data cURL will fetch? I'm screen-scraping data off a page that is 50 kB; however, the data I require is in the top 1/4 of the page, so I really only need to ...
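One approach, assuming the server honors HTTP Range requests: ask curl for only the first chunk with `-r`/`--range`. Servers that ignore ranges will still send the whole page, so this is best-effort:

```shell
# -r 0-16383 requests only the first 16 KB of the resource. The
# server must support HTTP Range requests; otherwise the full
# body comes back anyway. URL is a placeholder.
cmd='curl -r 0-16383 https://example.com/big-page.html'
echo "$cmd"   # dry run
```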