
All Questions

0 votes
1 answer
52 views

Web scraping but not scraping changes

Trying to monitor changes on this page: at5.nl/zoek/pijp ("pijp" is the query keyword here). It shows a list of articles with the latest on top. When I scrape ...
Robert Jan Vencken
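One low-tech way to approach this, sketched below in Python with the requests library (an assumption; the question doesn't name a tool): fetch the page periodically, hash it, and compare against the previous run. If the article list is rendered by JavaScript, the raw HTML won't contain it and a headless browser would be needed instead.

```python
# Sketch: poll the page and report when its content hash changes.
# Assumes the article list is present in the raw HTML.
import hashlib
import time
import requests

URL = "https://www.at5.nl/zoek/pijp"

def page_hash(url: str) -> str:
    html = requests.get(url, timeout=30).text
    return hashlib.sha256(html.encode("utf-8")).hexdigest()

last = page_hash(URL)
while True:
    time.sleep(300)  # poll every 5 minutes
    current = page_hash(URL)
    if current != last:
        print("Page changed")
        last = current
```

Hashing the full HTML can give false positives from rotating ads or tokens, so extracting and comparing just the headlines is usually more robust.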
1 vote
1 answer
230 views

How to mimic "Save As" of a webpage to get its source code?

I'm trying to download the source code of a webpage. When I use "View Page Source", the portion of the webpage I'm interested in is not there. It seems to be "hidden" in <...
Rayne • 15k
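"View Page Source" shows the HTML exactly as the server sent it, while "Save As" captures the DOM after JavaScript has run. A minimal sketch of grabbing that rendered DOM with Selenium (one possible tool among several; Playwright or Puppeteer work the same way, and the URL is a placeholder):

```python
# Sketch: fetch the DOM after JavaScript has run, similar to "Save As".
# Assumes Selenium and a Chrome/chromedriver installation are available.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)
try:
    driver.get("https://example.com/page-of-interest")  # hypothetical URL
    rendered_html = driver.page_source  # DOM after scripts have executed
    with open("page.html", "w", encoding="utf-8") as f:
        f.write(rendered_html)
finally:
    driver.quit()
```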
0 votes
1 answer
26 views

Is it possible to detect uniquely-named files if you have the URL path?

Let's say someone has posted a resource at https://this-site.com/files/pdfs/some_file_name.pdf. Another resource is then posted under that path, whose name we don't know. However, the pathname is ...
Harrison Cramer
1 vote
3 answers
628 views

How to get the contents of this web page in CSV or TSV format with curl or wget

I have the following link: http://fsop.caac.gov.cn/g145/CARS/WebSiteQueryServlet?method=loadAircraftConditionsResultPage&enterpriseName=%E6%AD%A6%E6%B1%89%E8%88%AA%E8%BE%BE%E8%88%AA%E7%A9%BA%E7%A7%...
moth • 2,429
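If the servlet returns a plain HTML table (an assumption; it may be built client-side), a short Python sketch with requests and pandas can do the conversion, since curl/wget alone only give you the raw HTML. The query string below is elided, as in the question:

```python
# Sketch: fetch the servlet page and convert its HTML table(s) to CSV/TSV.
# Assumes the results come back as a plain HTML <table>; requires
# requests, pandas and lxml.
import io
import requests
import pandas as pd

url = ("http://fsop.caac.gov.cn/g145/CARS/WebSiteQueryServlet"
       "?method=loadAircraftConditionsResultPage&enterpriseName=...")  # full query string elided

html = requests.get(url, timeout=30).text
tables = pd.read_html(io.StringIO(html))   # one DataFrame per <table> found
tables[0].to_csv("result.csv", index=False)
tables[0].to_csv("result.tsv", sep="\t", index=False)
```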
1 vote
1 answer
443 views

How do I curl a page that first requires clicking a consent button?

I'd like to curl a URL like this. (The URL requires consent in EU countries when you have no cookies for the site.) I cobbled together a Puppeteer script that does this, but it looks quite heavyweight and ...
HappyFace • 4,153
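Consent walls of this kind usually just look for a cookie, so one lightweight alternative to Puppeteer is to click the button once in a real browser, copy the resulting cookie from dev tools, and replay it. A sketch where the cookie name, value, and URLs are all placeholders:

```python
# Sketch: replay the consent cookie instead of clicking the button.
# "CONSENT" and its value are placeholders; copy the real cookie
# name/value from the browser's dev tools after accepting once.
import requests

session = requests.Session()
session.cookies.set("CONSENT", "YES+placeholder", domain=".example.com")  # hypothetical
response = session.get("https://example.com/the-article", timeout=30)     # hypothetical URL
print(response.text[:500])
```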
0 votes
0 answers
128 views

Download image from HTML element

I need to download the src file which contains the image. We can just copy the source URL and paste it in a new tab to download the image, but the thing is I need to download the image ...
Guru roxz
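A minimal sketch of one way to do this in Python with requests and BeautifulSoup (both assumptions; the question doesn't name a language): parse the page, read the img src, and save the bytes. The page URL is a placeholder.

```python
# Sketch: pull the <img> src out of a page and save the image bytes.
# Assumes the src is an absolute or page-relative URL (not a data: URI).
from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup

page_url = "https://example.com/gallery"          # hypothetical page
html = requests.get(page_url, timeout=30).text
img = BeautifulSoup(html, "html.parser").find("img")
img_url = urljoin(page_url, img["src"])           # resolve relative paths

with open("image.jpg", "wb") as f:
    f.write(requests.get(img_url, timeout=30).content)
```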
0 votes
0 answers
154 views

CURL/WGET After a Delay

I understand that I can use CURL/WGET to get the contents of a page. But what if I want to get the page after it 'settles'? Or get the contents after a delay? Example: a web page initially displays. ...
Rick Hellewell
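curl and wget only see the initial response, so waiting for a page to "settle" generally means driving a real browser. A sketch with Selenium, where the URL, selector, and timeout are placeholders for whatever marks the page as finished loading:

```python
# Sketch: load the page in a headless-capable browser, wait for it to
# "settle", then dump the resulting DOM.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
try:
    driver.get("https://example.com/slow-page")  # hypothetical URL
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "#content"))  # placeholder selector
    )
    print(driver.page_source)
finally:
    driver.quit()
```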
2 votes
1 answer
341 views

Using cURL with two matching ranges

I am trying to download a batch of images using cURL. The images are all saved in separate directories, named the same as the image filenames. I tried using square brackets to set two ranges, ...
paper100
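curl expands multiple [a-b] globs as a cross product, so two ranges in one URL produce every combination rather than moving in lockstep. A sketch that generates only the matching pairs instead (the base URL, range, and zero-padding are assumptions):

```python
# Sketch: download /0001/0001.jpg, /0002/0002.jpg, ... where the
# directory name and the filename must stay in lockstep.
import requests

BASE = "https://example.com/images"   # hypothetical base URL
for n in range(1, 101):
    name = f"{n:04d}"
    url = f"{BASE}/{name}/{name}.jpg"
    r = requests.get(url, timeout=30)
    if r.ok:
        with open(f"{name}.jpg", "wb") as f:
            f.write(r.content)
```

A shell for-loop around curl would achieve the same pairing.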
0 votes
2 answers
662 views

How to just get the HTML output structure of a site

I guess it shows my kookiness here, but how do I just get the HTML presentation of a website? For example, I am trying to retrieve from a Wix site the HTML structure (what is actually being viewed by a ...
Basixp • 115
2 votes
1 answer
2k views

How to get the exact page content with wget if the error code is 404

I have two URLs: one is a working URL and the other is a page-deleted URL. The working URL is fine, but for the page-deleted URL, instead of getting the exact page content, wget receives a 404. Working URL: import os def ...
Mounarajan • 1,437
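wget discards the body of a 404 response by default (its --content-on-error option keeps it). In Python, requests returns the error page's body regardless of status, as in this sketch with a placeholder URL:

```python
# Sketch: fetch a URL and keep the response body even when the server
# answers 404 (requests does not raise on HTTP errors by default).
import requests

url = "https://example.com/deleted-page"   # hypothetical URL
response = requests.get(url, timeout=30)
print(response.status_code)                # e.g. 404
html = response.text                       # body of the error page is still here
```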
11 votes
2 answers
10k views

httrack wget curl scrape & fetch

There are a number of tools on the internet for downloading a static copy of a website, such as HTTrack. There are also many tools, some commercial, for “scraping” content from a website, such as ...
Malik A. Rumi
0 votes
1 answer
3k views

How to download image and save image name based on URL?

How do I download all images from a web page and prefix the image names with the web page's URL (all symbols replaced with underscores)? For example, if I were to download all images from http://www....
thdoan • 19.2k
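A sketch of one way to do this with requests and BeautifulSoup (assumed libraries, placeholder URL): build the prefix by replacing every non-alphanumeric character in the page URL with an underscore, then save each image under that prefix.

```python
# Sketch: save every <img> on a page, prefixing each filename with the
# page URL, symbols replaced by underscores.
import os
import re
from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup

page_url = "http://www.example.com/some/page"   # hypothetical URL
prefix = re.sub(r"[^A-Za-z0-9]+", "_", page_url).strip("_")

html = requests.get(page_url, timeout=30).text
for img in BeautifulSoup(html, "html.parser").find_all("img"):
    src = img.get("src")
    if not src or src.startswith("data:"):
        continue
    img_url = urljoin(page_url, src)
    name = os.path.basename(img_url.split("?")[0]) or "image"
    with open(f"{prefix}_{name}", "wb") as f:
        f.write(requests.get(img_url, timeout=30).content)
```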
0 votes
1 answer
399 views

How to download all listed files from a webpage where the URLs do not have filenames defined

I would like to download all the datasets from this page: http://www.data.gov/catalog/geodata/category/0/agency/0/filter/sort/page/1/count/20 I have tried wget, but here is the challenge: There is ...
kefiren
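When the link URLs carry no filename, the server usually supplies one via a Content-Disposition header or a redirect. A sketch that follows each link and names the file from those; the link selector is a placeholder, and the catalog URL is the one from the question (it may no longer resolve):

```python
# Sketch: follow each dataset link and name the saved file from the
# Content-Disposition header, falling back to the final (redirected) URL.
import os
import re
from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup

catalog_url = "http://www.data.gov/catalog/geodata/category/0/agency/0/filter/sort/page/1/count/20"
html = requests.get(catalog_url, timeout=30).text

for a in BeautifulSoup(html, "html.parser").select("a.download"):  # placeholder selector
    file_url = urljoin(catalog_url, a["href"])
    r = requests.get(file_url, timeout=60, allow_redirects=True)
    cd = r.headers.get("Content-Disposition", "")
    m = re.search(r'filename="?([^";]+)"?', cd)
    name = m.group(1) if m else os.path.basename(r.url) or "download.bin"
    with open(name, "wb") as f:
        f.write(r.content)
```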
2 votes
1 answer
150 views

PHP - detecting changes in external database-driven site

For a homework project, I'm creating a PHP-driven website whose main function is aggregating news about various university courses. The main problem is this: (almost) every course has its own website. ...
Stan • 23
3 votes
4 answers
3k views

Retrieve partial web page

Is there any way of limiting the amount of data cURL will fetch? I'm screen-scraping data off a page that is 50 kB; however, the data I require is in the top quarter of the page, so I really only need to ...
James • 1,407
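Two common options: ask for a byte range (curl's -r/--range flag does the same, though servers may ignore it for dynamic pages), or stream the response and stop reading once enough has arrived. A sketch combining both, with a placeholder URL:

```python
# Sketch: fetch only the first part of a page. Try an HTTP Range request;
# if the server ignores it, stream the body and stop after enough bytes.
import requests

url = "https://example.com/big-page"   # hypothetical URL
wanted = 50 * 1024 // 4                # roughly the top quarter of a 50 kB page

resp = requests.get(url, headers={"Range": f"bytes=0-{wanted - 1}"},
                    stream=True, timeout=30)
data = b""
for chunk in resp.iter_content(chunk_size=4096):
    data += chunk
    if len(data) >= wanted:
        break
resp.close()
print(data[:200])
```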