
Questions tagged [web-crawler]

2 votes
2 answers
208 views

I'm designing a "polite" web crawler using Airflow with the Celery Executor, PostgreSQL for metadata and actual content used by the crawler, and Redis as the Celery broker. My goal is to ...
asked by sebap123
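
For a setup like the one described above, a minimal sketch of an Airflow DAG with a per-domain politeness delay might look like the following (assuming Airflow 2.4+ with the CeleryExecutor already configured; the DAG id, domain list, and delay are illustrative):

```python
# Minimal sketch: one task per domain, each sleeping between fetches to stay
# "polite". Assumes Airflow 2.4+ with the CeleryExecutor set up in airflow.cfg.
import time
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def crawl_domain(domain, crawl_delay=5.0, **_):
    # Placeholder fetch loop: keep a fixed delay between requests to one host.
    for url in [f"https://{domain}/", f"https://{domain}/about"]:
        print(f"fetching {url}")
        time.sleep(crawl_delay)


with DAG(
    dag_id="polite_crawler",
    start_date=datetime(2024, 1, 1),
    schedule=timedelta(hours=6),
    catchup=False,
    max_active_tasks=4,  # cap concurrency across the Celery workers
) as dag:
    for domain in ["example.com", "example.org"]:  # illustrative domains
        PythonOperator(
            task_id=f"crawl_{domain.replace('.', '_')}",
            python_callable=crawl_domain,
            op_kwargs={"domain": domain},
        )
```
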
0 votes
1 answer
917 views

I'm developing a scraping app to extract some information from a site. To get that information I have to be logged in to that site, so I use an HTTP POST and pass the data needed for login using FormData ...
asked by alexpfx
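
One common pattern for this kind of login-then-scrape flow (shown here in Python with requests purely as an illustration; the URLs and form field names are placeholders for whatever the target site uses) is to POST the form fields inside a session so the auth cookie is reused:

```python
# Illustrative only: log in by POSTing form fields, then reuse the session
# cookie for the pages that need authentication. URLs and field names are
# placeholders.
import requests

LOGIN_URL = "https://example.com/login"      # placeholder
PROTECTED_URL = "https://example.com/data"   # placeholder

with requests.Session() as session:
    resp = session.post(
        LOGIN_URL,
        data={"username": "user", "password": "secret"},  # form-encoded body
        timeout=10,
    )
    resp.raise_for_status()

    # The session now carries the cookie set by the login response.
    page = session.get(PROTECTED_URL, timeout=10)
    print(page.status_code, len(page.text))
```
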
0 votes
1 answer
197 views

I have built a very basic web crawler that runs on my laptop, so it has limited memory and limited hard drive space. Right now I'm using MongoDB to store the links I find on pages. I make ...
asked by Lokasa Mawati
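
For a link frontier kept in MongoDB on a memory-constrained machine, one common trick is to let the database deduplicate URLs with a unique index instead of holding a seen-set in RAM. A sketch with pymongo (the database, collection, and field names are made up):

```python
# Sketch: store discovered links in MongoDB and let a unique index do the
# deduplication, so the crawler needs no in-memory "seen" set.
from pymongo import MongoClient, ASCENDING
from pymongo.errors import DuplicateKeyError

client = MongoClient("mongodb://localhost:27017")
links = client["crawler"]["links"]

# One-time setup: URLs must be unique; "crawled" lets us pull pending work.
links.create_index([("url", ASCENDING)], unique=True)
links.create_index([("crawled", ASCENDING)])

def enqueue(url):
    try:
        links.insert_one({"url": url, "crawled": False})
    except DuplicateKeyError:
        pass  # already known, nothing to do

def next_batch(n=100):
    # Pull a small batch so memory use stays bounded.
    return list(links.find({"crawled": False}, limit=n))
```
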
2 votes
1 answer
1k views

I am building a web application crawler that crawls for HTTP requests (GET, PUT, POST, ...). It is designed for one specific purpose: bug bounty hunting. It enables pentesters to insert exploit ...
asked by Tijme
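
One way to enumerate the requests a page can trigger is to collect its links and forms and record the method, URL, and parameter names for later payload insertion. A rough sketch with requests and BeautifulSoup (this is illustrative, not the asker's actual design):

```python
# Rough sketch: collect candidate HTTP requests (method, URL, parameter names)
# from a page's links and forms, as a starting point for payload insertion.
from urllib.parse import urljoin, urlsplit, parse_qs

import requests
from bs4 import BeautifulSoup

def extract_requests(page_url):
    html = requests.get(page_url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    found = []

    # Plain links become GET requests whose parameters sit in the query string.
    for a in soup.find_all("a", href=True):
        url = urljoin(page_url, a["href"])
        params = list(parse_qs(urlsplit(url).query))
        found.append({"method": "GET", "url": url, "params": params})

    # Forms carry their own method, action, and named fields.
    for form in soup.find_all("form"):
        action = urljoin(page_url, form.get("action") or page_url)
        method = (form.get("method") or "GET").upper()
        params = [i.get("name") for i in form.find_all(["input", "textarea", "select"])
                  if i.get("name")]
        found.append({"method": method, "url": action, "params": params})

    return found

if __name__ == "__main__":
    for req in extract_requests("https://example.com"):
        print(req)
```
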
1 vote
0 answers
1k views

The following is an example using https://github.com/GoogleChrome/puppeteer: 'use strict'; const puppeteer = require('puppeteer'); (async () => { // const browser = await puppeteer.launch(); // ...
asked by alex
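
An equivalent headless-browser fetch can be sketched in Python with pyppeteer, a port of Puppeteer (used here only to keep the examples in one language); the URL is a placeholder:

```python
# Sketch using pyppeteer (a Python port of Puppeteer): open a headless
# browser, load a page, and collect its links.
import asyncio

from pyppeteer import launch

async def crawl(url):
    browser = await launch(headless=True)
    page = await browser.newPage()
    await page.goto(url, {"waitUntil": "networkidle2"})
    links = await page.evaluate(
        "() => Array.from(document.querySelectorAll('a[href]'), a => a.href)"
    )
    await browser.close()
    return links

if __name__ == "__main__":
    print(asyncio.get_event_loop().run_until_complete(crawl("https://example.com")))
```
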
1 vote
1 answer
1k views

So, as part of my final year project, I'm writing a web crawler in Java to gather website data that I will then process. One of the attributes I need to gather is "number of popups". I know a pop-up ...
asked by Sophie Brown
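
A crude heuristic for "number of popups" is to count calls such as window.open() or alert() in the page's inline and linked scripts. The sketch below (in Python for brevity, not the asker's Java crawler) does that with a regular expression; it is only an approximation, since scripts can build such calls dynamically:

```python
# Crude heuristic: count window.open()/alert()-style calls in a page's inline
# and external scripts as a proxy for "number of popups".
import re
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

POPUP_CALL = re.compile(r"\b(?:window\.open|alert|confirm|prompt)\s*\(")

def count_popups(page_url):
    html = requests.get(page_url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    scripts = []

    for tag in soup.find_all("script"):
        if tag.get("src"):  # external script: fetch it too
            scripts.append(requests.get(urljoin(page_url, tag["src"]), timeout=10).text)
        else:               # inline script
            scripts.append(tag.get_text())

    return sum(len(POPUP_CALL.findall(s)) for s in scripts)

if __name__ == "__main__":
    print(count_popups("https://example.com"))
```
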
3 votes
1 answer
221 views

I am building a web spider to crawl through several different sites, but one of them uses JavaScript buttons instead of links for several functions. While I could learn to follow them, it adds an ...
asked by Devon M
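
Buttons wired up with JavaScript usually cannot be followed from the raw HTML, so a common workaround is to drive a real browser and click them, for example with Selenium. A small sketch (the URL and CSS selector are placeholders):

```python
# Sketch: use Selenium to click JavaScript-driven buttons and record where
# they lead. The URL and the CSS selector are placeholders.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()  # or webdriver.Chrome()
try:
    driver.get("https://example.com")
    count = len(driver.find_elements(By.CSS_SELECTOR, "button[onclick]"))

    for i in range(count):
        # Re-locate the buttons each time, since clicking may navigate away.
        button = driver.find_elements(By.CSS_SELECTOR, "button[onclick]")[i]
        button.click()
        print("button", i, "led to", driver.current_url)
        driver.back()
finally:
    driver.quit()
```
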
4 votes
1 answer
338 views

I am currently working on a pet project in Python with Scrapy that scrapes several eBay-like sites for real-estate offers in my area. The thing is that some of the sites do seem to provide more ...
asked by nikitautiu
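
When several similar sites expose the same kind of listing, one common Scrapy pattern is a single spider with a per-domain selector table. A minimal sketch (the domains and selectors are invented):

```python
# Minimal Scrapy sketch: one spider, a per-domain table of CSS selectors.
# Domains and selectors are invented placeholders.
from urllib.parse import urlsplit

import scrapy

SELECTORS = {
    "site-a.example": {"offer": "div.listing", "price": "span.price::text"},
    "site-b.example": {"offer": "li.offer", "price": ".amount::text"},
}

class RealEstateSpider(scrapy.Spider):
    name = "real_estate"
    start_urls = ["https://site-a.example/offers", "https://site-b.example/search"]

    def parse(self, response):
        # Pick the selector set that matches the responding host.
        rules = SELECTORS[urlsplit(response.url).hostname]
        for offer in response.css(rules["offer"]):
            yield {
                "url": response.url,
                "price": offer.css(rules["price"]).get(),
            }
```
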
5 votes
1 answer
752 views

I'm building a SPA (single-page application), so when a browser requests a page from my server, it only receives a small HTML file and a big JavaScript app that then fetches the appropriate data from the ...
asked by Pablo Fernandez
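
One way to keep such a SPA crawlable is dynamic rendering: serve prerendered HTML to known crawler user agents and the normal JS shell to everyone else. A toy Flask sketch of the idea (the bot list and file paths are placeholders, and the prerendered snapshot is assumed to exist):

```python
# Toy sketch of "dynamic rendering": crawlers get a prerendered snapshot,
# normal browsers get the SPA shell. Bot list and file names are placeholders.
from flask import Flask, request, send_file

app = Flask(__name__)
BOT_MARKERS = ("googlebot", "bingbot", "duckduckbot")

@app.route("/", defaults={"path": ""})
@app.route("/<path:path>")
def spa(path):
    ua = (request.headers.get("User-Agent") or "").lower()
    if any(bot in ua for bot in BOT_MARKERS):
        # Snapshot produced ahead of time (e.g. by a headless browser).
        return send_file("prerendered/index.html")
    return send_file("static/index.html")  # the small HTML + big JS bundle

if __name__ == "__main__":
    app.run()
```
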
-2 votes
1 answer
2k views

I'm currently developing a web crawler. The first version was developed in Node.js and runs pretty well. The issues I encountered with Node.js are, in no particular order: slow URL and query-...
asked by m_vdbeek
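
For comparison, the URL and query-string handling a crawler needs is cheap to express with Python's standard library; the small sketch below shows the usual pieces (splitting a URL, parsing its query, resolving relative links):

```python
# Standard-library URL handling a crawler typically needs.
from urllib.parse import urlsplit, parse_qs, urljoin

url = "https://example.com/search?q=web+crawler&page=2"

parts = urlsplit(url)
print(parts.scheme, parts.netloc, parts.path)   # https example.com /search
print(parse_qs(parts.query))                    # {'q': ['web crawler'], 'page': ['2']}
print(urljoin(url, "../about"))                 # https://example.com/about
```
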
4 votes
0 answers
119 views

Reposted from here, as I think it may be better suited to this exchange. I'm trying to implement DRUM (Disk Repository with Update Management) as described in the IRLBot paper (the relevant pages start at page 4), but as ...
asked by Isaac
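
The core of DRUM is batching keys in memory buckets and merging them against a sorted disk file to decide which are new, instead of doing one disk access per URL. A much-simplified, single-bucket sketch of that check-and-update merge (a toy, not the IRLBot implementation; a real DRUM streams the merge bucket by bucket rather than loading the file):

```python
# Much-simplified DRUM-style idea: buffer URL hashes in memory and merge them
# against a sorted on-disk file in batches, so duplicate checks become
# sequential I/O. Single bucket, whole file loaded: a toy only.
import hashlib
import os

SEEN_FILE = "seen_hashes.txt"   # one hex hash per line
BATCH_SIZE = 4                  # tiny, for illustration

buffer = []

def key(url):
    return hashlib.sha1(url.encode()).hexdigest()

def check_update(url):
    """Buffer the URL's key; when the batch is full, merge and return new keys."""
    buffer.append(key(url))
    if len(buffer) >= BATCH_SIZE:
        return merge()
    return []

def merge():
    """Merge the in-memory batch against the disk file; return unseen keys."""
    on_disk = set()
    if os.path.exists(SEEN_FILE):
        with open(SEEN_FILE) as f:
            on_disk = {line.strip() for line in f}

    batch = sorted(set(buffer))
    buffer.clear()
    new_keys = [k for k in batch if k not in on_disk]

    with open(SEEN_FILE, "w") as f:
        f.write("\n".join(sorted(on_disk | set(new_keys))) + "\n")
    return new_keys
```
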
2 votes
2 answers
2k views

I'm running a service that crawls many websites daily. The crawlers run as jobs processed by a bunch of independent background worker processes that pick up the jobs as they get enqueued. Now ...
asked by Niels Kristian
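
With many independent workers, per-host politeness is usually coordinated through shared state. One common sketch is a short-lived Redis lock per domain, so only one worker hits a host at a time (the key prefix and delay are illustrative):

```python
# Sketch: coordinate per-domain politeness across independent workers with a
# short-lived Redis lock. Key prefix and delay are illustrative.
import time
from urllib.parse import urlsplit

import redis

r = redis.Redis()

def fetch_politely(url, delay=5):
    host = urlsplit(url).hostname
    lock_key = f"crawl-lock:{host}"

    # SET NX EX: only one worker can hold the per-host lock at a time, and it
    # expires on its own so a crashed worker cannot block the host forever.
    while not r.set(lock_key, "1", nx=True, ex=delay):
        time.sleep(0.5)

    print(f"fetching {url}")  # placeholder for the real download
    # The lock is left to expire after `delay` seconds, which doubles as the
    # minimum gap between two requests to the same host.
```
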
1 vote
2 answers
384 views

I wasn't sure how to phrase the question, but it's basically a textbook scenario. I'm working on a site that's article-based, but the article information is stored in a database. Then the page is ...
asked by Sinaesthetic
-1 votes
1 answer
3k views

I just subscribed to a Facebook page that posts links to different open source projects or code archives. I'd like to save those links and descriptions to a local DB. How can I do that? I heard ...
asked by dole doug
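
Whatever is used to fetch the posts, the "save links and descriptions to a local DB" half is straightforward with SQLite. A small sketch (the table name and sample rows are made up):

```python
# Sketch of the storage half: keep each link and its description in SQLite.
import sqlite3

conn = sqlite3.connect("links.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS links (
           url TEXT PRIMARY KEY,
           description TEXT
       )"""
)

def save_link(url, description):
    # INSERT OR IGNORE: the primary key on url deduplicates reposted links.
    conn.execute("INSERT OR IGNORE INTO links (url, description) VALUES (?, ?)",
                 (url, description))
    conn.commit()

save_link("https://github.com/example/project", "An example open source project")
for row in conn.execute("SELECT url, description FROM links"):
    print(row)
```
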
4 votes
1 answer
13k views

I've been thinking about a side project that involves web data scraping. OK, I read the "Getting data from a webpage in a stable and efficient way" question and the discussion gave me some insights. ...
asked by salaniojr
