A next-generation crawler platform that defines crawler workflows graphically, so crawlers can be built without writing any code.
A scalable, mature and versatile web crawler based on Apache Storm
ACHE is a web crawler for domain-specific search.
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
News crawling with StormCrawler - stores content as WARC
A set of reusable Java components that implement functionality common to any web crawler
Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.
Continuous scalable web crawler built on top of Flink and crawler-commons
Common Crawl fork of Apache Nutch
Open Source Web Crawler for Java - A fork of yasserg/crawler4j
Java-based web crawler that uses pool-based multi-threading, a simple Swing UI, and jsoup for nested web crawling
Web crawler that supports full-page-render crawling using HtmlUnit
A library for crawling websites and harvesting the URLs of embedded links and images
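As an illustration only (not code from the repository above), here is a minimal sketch of a full-page-render fetch with HtmlUnit; it assumes HtmlUnit 2.x (the com.gargoylesoftware package) and a hypothetical example URL.

```java
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class RenderedFetch {
    public static void main(String[] args) throws Exception {
        // WebClient is AutoCloseable, so try-with-resources releases resources
        try (WebClient client = new WebClient()) {
            client.getOptions().setJavaScriptEnabled(true);            // execute page scripts
            client.getOptions().setCssEnabled(false);                  // skip CSS for speed
            client.getOptions().setThrowExceptionOnScriptError(false); // tolerate broken JS

            HtmlPage page = client.getPage("https://example.com/");    // hypothetical URL
            client.waitForBackgroundJavaScript(5_000);                 // let async scripts finish

            System.out.println(page.getTitleText());
            System.out.println(page.asXml());                          // rendered DOM, not raw HTML
        }
    }
}
```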
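For the same kind of link-and-image harvesting, a hedged sketch using jsoup (one common parser choice; the library above may work differently). The URL and user-agent string are placeholders.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class UrlHarvester {
    public static void main(String[] args) throws Exception {
        Document doc = Jsoup.connect("https://example.com/")           // hypothetical URL
                .userAgent("Mozilla/5.0 (compatible; demo-crawler)")
                .timeout(10_000)
                .get();

        // "abs:" resolves relative URLs against the page's base URI
        for (Element link : doc.select("a[href]")) {
            System.out.println("link:  " + link.attr("abs:href"));
        }
        for (Element img : doc.select("img[src]")) {
            System.out.println("image: " + img.attr("abs:src"));
        }
    }
}
```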
A Lianjia housing-listings crawler built on the Spring Boot framework. The project aims to provide a convenient and efficient way to collect listing data: by automatically crawling the Lianjia website, it retrieves up-to-date listing details for each city, including price, location, floor area, layout, and other key fields.
Java web crawler that collects all links from websites or downloads images, with Google or Bing search options.
A basic example of web-page crawling in Java; it is not a production-ready crawler and is intended for test purposes only.
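In the same test-only spirit, a minimal sketch of a breadth-first crawl loop in plain Java with jsoup; the seed URL and page cap are illustrative assumptions, not values from the repository above.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class BasicCrawler {
    public static void main(String[] args) {
        String seed = "https://example.com/";   // hypothetical start page
        int maxPages = 20;                      // hard cap so a test run stays small

        Set<String> visited = new HashSet<>();
        Queue<String> frontier = new ArrayDeque<>();
        frontier.add(seed);

        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            if (!visited.add(url)) {
                continue;                       // already fetched
            }
            try {
                Document doc = Jsoup.connect(url).timeout(10_000).get();
                System.out.println(visited.size() + ". " + doc.title() + " - " + url);
                for (Element link : doc.select("a[href]")) {
                    String next = link.attr("abs:href");
                    if (next.startsWith(seed) && !visited.contains(next)) {
                        frontier.add(next);     // stay on the seed's site
                    }
                }
            } catch (Exception e) {
                System.err.println("skipped " + url + ": " + e.getMessage());
            }
        }
    }
}
```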