
Questions tagged [web-scraping]

Web scraping is automated information extraction from web sites.

1 vote
1 answer
249 views

In general, to get Open Graph protocol (OGP) data for a given web page, one would need to retrieve the actual HTML, and then extract the meta tags from it. However, this has two problems: Instead of ...
Lamron's user avatar
  • 119
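
The baseline approach this question describes (fetch the HTML, then pull the meta tags out of it) can be sketched with the Python standard library alone. This is a minimal illustration, not the asker's code: the names `OpenGraphParser` and `extract_open_graph` are hypothetical, and the HTML is assumed to be already downloaded as a string.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value seen for each property.
            self.properties.setdefault(prop, content)


def extract_open_graph(html):
    """Return a dict of Open Graph properties found in an HTML string."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties
```

Calling `extract_open_graph(page_html)` returns a plain dictionary keyed by property name, such as `og:title`. Downloading the page is left out on purpose, since the question is about the cost of that step.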
1 vote
2 answers
804 views

I've written a web scraper and would like it to run as quickly as possible. The scraping isn't trivial; I scrape a few web-pages, gather links from them, scrape those, then gather links from those, ...
Pavlin's user avatar
  • 159
-4 votes
3 answers
246 views

Let's say I were to create a scraper. At some point I'll need to come up with an algorithm for identifying whether or not a piece of newly scraped text matches one that's already in the DB. How would ...
Nicholas E. Harding's user avatar
-1 votes
1 answer
101 views

I'm working on a project where I want to display a list of wedding venues within X miles of a user's location. My first thought is that I will use some type of web scraper to pull in a list of venues. ...
tdammon's user avatar
  • 121
-2 votes
3 answers
92 views

Is there a way through encryption/keys/jwt or anything else to ensure that the data being sent through a POST request is only data coming from another request I made on the client to a 3rd party ...
David's user avatar
  • 219
-3 votes
1 answer
100 views

I am trying to build a system that tells the user on which platforms (like Netflix, Prime, etc.) a movie or series is available. What is the best way to go about it? I have considered the following: ...
Jacob Antony's user avatar
1 vote
1 answer
110 views

Let me explain my thoughts about the architecture of the project I'm working on. The project code repository consists of: Scrapy component - of course it serves to scrape data, process it and calculate ...
Bob's user avatar
  • 13
3 votes
1 answer
11k views

I have a website which offers pages in the format of https://www.example.com/X where X is a sequential, unique number increasing by one every time a page is created by the users and never reused even ...
nicktheone's user avatar
0 votes
2 answers
271 views

I have a service that fetches data from a target source (not through an API but via scraping) which can change. I want to do pagination so that I return 35 items per page but the target source is 25 ...
Alexander Hunt's user avatar
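
The page-size mismatch in this question comes down to index arithmetic. The sketch below is one possible mapping, not the asker's design: `source_pages_for` is a hypothetical name, pages are assumed to be 1-based, and the source ordering is assumed to stay stable between requests (which the question says may not hold).

```python
def source_pages_for(page, page_size=35, source_page_size=25):
    """Map one outgoing page to the source pages that cover it.

    Returns (source_pages, offset), where offset is how many items to
    skip at the start of the first source page.
    """
    start = (page - 1) * page_size   # index of first item wanted (0-based)
    end = start + page_size          # exclusive end index
    first = start // source_page_size + 1
    last = (end - 1) // source_page_size + 1
    offset = start - (first - 1) * source_page_size
    return list(range(first, last + 1)), offset
```

For example, outgoing page 2 covers items 35 to 69, so it needs source pages 2 and 3 with the first 10 items of source page 2 skipped. Caching fetched source pages would avoid requesting page 2 twice when a caller walks through pages 1 and 2 in order.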
1 vote
1 answer
563 views

I am creating a DSL for a scraping library I am writing. I would like advice on how to design a DSL, and if the designs I have below are good ones. Apologies if this is an open-ended question, but it ...
andykais's user avatar
  • 111
0 votes
1 answer
917 views

I'm developing a scraping app to extract some information from a site. To get that information I have to be logged in to that site. So I use HTTP POST and pass the data needed for login using FormData ...
alexpfx's user avatar
  • 313
0 votes
1 answer
197 views

I have built a very basic web crawler that runs on my laptop, so it has limited memory and limited hard drive space. The way I have it now, I'm using MongoDB to store the links I find on pages. I make ...
Lokasa Mawati's user avatar
0 votes
1 answer
114 views

I'm looking to do some simple data mining that consists of going once per day to a single page and collecting the following information: list of movie theaters, movies showing today at each theater, session times ...
dR_'s user avatar
  • 11
2 votes
1 answer
255 views

Currently, my thoughts are that GET requests would be feasible by using the concept of screen scraping combined with a cron job that runs at a set interval to scrape data from the GUI and sync to my ...
J. Munson's user avatar
  • 137
0 votes
1 answer
189 views

At a job that I recently started, I inherited some of the projects from the guy who previously held this position. One of the projects was a program that used a Website Platform's public API to get ...
SH7890's user avatar
  • 277
