Releases: firecrawl/firecrawl
v2.8.0
Firecrawl v2.8.0 is here!
Firecrawl v2.8.0 brings major improvements to agent workflows, developer tooling, and self-hosted deployments across the API and SDKs, including our new Skill.
- Parallel Agents for running thousands of `/agent` queries simultaneously, powered by our new Spark 1 Fast model.
- Firecrawl CLI with full support for scrape, search, crawl, and map commands.
- Firecrawl Skill for enabling AI agents (Claude Code, Codex, OpenCode) to use Firecrawl autonomously.
- Three new models powering /agent: Spark 1 Fast for instant retrieval (currently only available in Playground), Spark 1 Mini for complex research queries, and Spark 1 Pro for advanced extraction tasks.
- Agent enhancements including webhooks, model selection, and new MCP Server tools.
- Platform-wide performance improvements including faster search execution and optimized Redis calls.
- SDK improvements including Zod v4 compatibility.
And much more, check it out below!
New Features
- **Parallel Agents**
  Execute thousands of `/agent` queries in parallel with automatic failure handling and intelligent waterfall execution. Powered by Spark 1 Fast for instant retrieval, automatically upgrading to Spark 1 Mini for complex queries requiring full research.
- **Firecrawl CLI**
  New command-line interface for Firecrawl with full support for scrape, search, crawl, and map commands. Install with `npm install -g firecrawl-cli`.
- **Firecrawl Skill**
  Enables agents like Claude Code, Codex, and OpenCode to use Firecrawl for web scraping and data extraction, installable via `npx skills add firecrawl/cli`.
- **Spark Model Family**
  Three new models powering `/agent`: Spark 1 Fast for instant retrieval (currently available in the Playground), Spark 1 Mini (default) for everyday extraction tasks at 60% lower cost, and Spark 1 Pro for complex multi-domain research requiring maximum accuracy. Spark 1 Pro achieves ~50% recall and Spark 1 Mini ~40% recall, both significantly outperforming tools costing 4-7x more per task.
- **Firecrawl MCP Server Agent Tools**
  New `firecrawl_agent` and `firecrawl_agent_status` tools for autonomous web data gathering via MCP-enabled agents.
- **Agent Webhooks**
  The agent endpoint now supports webhooks for real-time notifications on job completion and progress.
- **Agent Model Selection**
  The agent endpoint now accepts a `model` parameter and includes model info in status responses.
- **Multi-Arch Docker Images**
  Self-hosted deployments now support the `linux/arm64` architecture in addition to `amd64`.
- **Sitemap-Only Crawl Mode**
  New crawl option to exclusively use sitemap URLs without following links.
- **`ignoreCacheMap` Parameter**
  New option to bypass cached results when mapping URLs.
- **Custom Headers for `/map`**
  The map endpoint now supports custom request headers.
- **Background Image Extraction**
  The scraper now extracts background images from CSS styles.
- **Improved Error Messages**
  All user-facing error messages now include detailed explanations to help diagnose issues.
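For a sense of how the new agent options compose, a request body selecting a model and registering a webhook might look like the sketch below. Only the `model` parameter and webhook support are stated in these notes; the model identifier string and the webhook field shape are assumptions.

```json
{
  "prompt": "Find the pricing pages of the top five CRM vendors",
  "model": "spark-1-mini",
  "webhook": { "url": "https://example.com/agent-events" }
}
```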
API Improvements
- Search without concurrency limits — scrapes in search now execute directly without queue overhead.
- Return `400` for unsupported actions, with clear errors when requested actions aren't supported by available engines.
- Job ID now included in search metadata for easier tracking.
- Metadata responses now include the detected timezone.
- Backfill metadata title from `og:title` or `twitter:title` when missing.
- Preserve the `gid` parameter when rewriting Google Sheets URLs.
- Fixed v2 path in batch scrape status pagination.
- Validate team ownership when appending to existing crawls.
- Screenshots with custom viewport or quality settings now bypass the cache.
- Optimized Redis calls across endpoints.
- Reduced excessive `robots.txt` fetching and parsing.
- Minimum request timeout parameter now configurable.
SDK Improvements
JavaScript SDK
- Zod v4 Compatibility — schema conversion now works with Zod v4 with improved error detection.
- Watcher Exports — `Watcher` and `WatcherOptions` now exported from the SDK entrypoint.
- Agent Webhook Support — new webhook options for agent calls.
- Error Retry Polling — SDK retries polling after transient errors.
- Job ID in Exceptions — error exceptions now include `jobId` for debugging.
Python SDK
- Manual pagination helpers for iterating through results.
- Agent webhook support added to agent client.
- Agent endpoint now accepts model selection parameter.
- Metadata now includes concurrency limit information.
- Fixed `max_pages` handling in crawl requests.
Dashboard Improvements
- Dark mode is now supported.
- On the usage page, you can now view credit usage broken down by day.
- On the activity logs page, you can now filter by the API key that was used.
- The "images" output format is now supported in the Playground.
- All admins can now manage their team's subscriptions.
Quality & Performance
- Skip markdown conversion checks for large HTML documents.
- Export Google Docs as HTML instead of PDF for improved performance.
- Improved branding format with better logo detection and error messages for PDFs and documents.
- Improved `lopdf` metadata loading performance.
- Updated the `html-to-markdown` module with multiple bug fixes.
- Increased the markdown service body limit and added request ID logging.
- Better Sentry filtering for cancelled jobs and engine errors.
- Fixed extract race conditions and RabbitMQ poison pill handling.
- Centralized Firecrawl configuration across the codebase.
- Multiple security vulnerability fixes, including CVE-2025-59466 and lodash prototype pollution.
Self-Hosted Improvements
- CLI custom API URL support via `firecrawl --api-url http://localhost:3002` for local instances.
- ARM64 Docker support via multi-arch images for Apple Silicon and ARM servers.
- Fixed docker-compose database credentials out of the box.
- Fixed Playwright service startup caused by Chromium path issues.
- Updated Node.js to major version 22 instead of a pinned minor.
- Added RabbitMQ health check endpoint.
- Fixed PostgreSQL port exposure in docker-compose.
Full Changelog: v2.7.0...v2.8.0
What's Changed
- refactor(api): centralize firecrawl config by @amplitudesxd in #2496
- fix(config): add .catch to NUQ worker port defaults for error handling by @amplitudesxd in #2505
- (sdk)fix/same timeout as api now by @rafaelsideguide in #2503
- (sdks)feat/added concurrency info to metadata by @rafaelsideguide in #2502
- fix: make srcset URLs absolute in HTML transformation by @Chadha93 in #2515
- feat(api/admin/crawl-monitor): add endpoint for monitoring crawl system by @mogery in #2518
- feat(api/logRequest): associate requests with API keys by @mogery in #2519
- Fix Config Load on Tests by @abimaelmartell in #2506
- feat(api): update model usage to gpt-4o-mini by @amplitudesxd in #2520
- feat(api/scrapeURL): engpicker integ by @mogery in #2523
- fix(playwright-service-ts): wasn't starting up due to the lack of chromium under /tmp/.cache by @dmlarionov in #2512
- added timezone to metadata response by @rafaelsideguide in #2526
- Increase Go Service Write Timeout by @abimaelmartell in #2489
- (python-sdk)fix/max_pages by @rafaelsideguide in #2527
- Use invoiced billing for certain expansion packs by @micahstairs in #2532
- Fix PostgreSQL port exposure in docker-compose by @abimaelmartell in #2530
- (feat/partners) Allow email to be optional for partners API by @nickscamara in #2533
- Advanced model for recursive schemas by @amplitudesxd in #2535
- feat(api): update gpt-4o usage to gpt-4.1 by @amplitudesxd in #2536
- fix(api): cost tracking by @amplitudesxd in #2537
- Update Sentry for ZDR compliance by @abimaelmartell in #2529
- sanitize null-byte strings and report robustInsert failures to Sentry by @abimaelmartell in #2538
- Dont log Feature Flog Errors to Sentry by @abimaelmartell in #2540
- Debug logs to Extract Updates by @abimaelmartell in #2539
- Update test site build by @abimaelmartell in #2543
- feat: increase precrawl limits by @delong3 in #2544
- fix(api): engines for robots and scrape + reduced sitemap limit by @delong3 in #2545
- Webhook dispatcher by @amplitudesxd in #2534
- (feat/partner-integrations) Rotate endpoint by @nickscamara in #2547
- fix: extract race condition by @amplitudesxd in https://github.com/fir...
v2.7.0
Firecrawl v2.7.0 is here!
- ZDR Search support for enterprise customers.
- Improved Branding Format with better detection.
- Partner Integrations API now in closed beta.
- Faster and more accurate screenshots.
- Self-hosted improvements
And a lot more enhancements, check it out below!
New Features
- **Improved Branding Extract**
  Better logo and color detection for more accurate brand extraction results.
- **NOQ Scrape System (Experimental)**
  New scrape pipeline with improved stability and integrated concurrency checks.
- **Enhanced Redirect Handling**
  URLs now resolve before mapping, with safer redirect-chain detection and new abort timeouts.
- **Enterprise Search Parameters**
  New enterprise-level options available for the `/search` endpoint.
- **Integration-Based User Creation**
  Users can now be created automatically when coming from referring integrations.
- **`minAge` Scrape Parameter**
  Allows requiring a minimum cached age before re-scraping.
- **Extract Billing Credits**
  Extract jobs now use the same credit billing system as other endpoints.
- **Self-Host: Configurable Crawl Concurrency**
  Self-hosted deployments can now set custom concurrency limits.
- **Sentry Enhancements**
  Added Vercel AI integration, configurable sampling rates, and improved exception filtering.
- **UUIDv7 IDs**
  All new resources use lexicographically sortable UUIDv7 identifiers.
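As an illustration of the new `minAge` scrape parameter, a request might look like the following sketch. The notes confirm only the parameter name, so the value's units (milliseconds here) and its placement in the body are assumptions:

```json
{
  "url": "https://example.com",
  "minAge": 3600000
}
```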
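The UUIDv7 item is easy to demonstrate: since the first 48 bits of a UUIDv7 carry a millisecond timestamp, plain string comparison of the IDs tracks creation order. Below is a minimal illustrative generator (a sketch of the UUIDv7 layout, not Firecrawl's implementation):

```python
import os
import time

def uuid7() -> str:
    """Generate a UUIDv7-style identifier: 48-bit ms timestamp,
    version and variant bits, and a random tail (illustrative only)."""
    ts = int(time.time() * 1000) & ((1 << 48) - 1)
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    # Layout: 48-bit timestamp | 4-bit version (7) | 12 random bits
    #         | 2-bit variant (0b10) | 62 random bits
    hi = (ts << 16) | (0x7 << 12) | ((rand >> 64) & 0x0FFF)
    lo = (0b10 << 62) | (rand & ((1 << 62) - 1))
    h = f"{(hi << 64) | lo:032x}"
    return f"{h[:8]}-{h[8:12]}-{h[12:16]}-{h[16:20]}-{h[20:]}"

a = uuid7()
time.sleep(0.002)  # guarantee a later millisecond timestamp
b = uuid7()
assert a < b  # lexicographic order follows creation order
```

Because the timestamp occupies the leading hex digits, IDs sort chronologically in any system that compares them as strings or bytes — which is what makes them convenient as database keys.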
API Improvements
- DNS Resolution Errors Now Return 200 for more consistent failure handling.
- Improved URL Mapping Logic including sitemap `maxAge` fixes, recursive sitemap support, Vue/Angular router normalization, and skipping subdomain logic for IP addresses.
- Partial Results for Multi-Source Search instead of failing all sources.
- Concurrency Metadata Added to scrape job responses.
- Enhanced Metrics including total wait time, LLM usage, and format details.
- Batch Scrape Upgrades
  - Added missing `/v2/batch/scrape/:jobId/errors` endpoint
  - Fixed pagination off-by-one bug
- More Robust Error Handling for PDF/document engines, pydantic parsing, Zod validation, URL validation, and billing edge cases.
SDK Improvements
JavaScript SDK
- Returns job ID from synchronous methods.
- Improved WebSocket `document` event handling.
- Fixed types, Deno WS, and added support for `ignoreQueryParameter`.
- Version bump with internal cleanup.
Python SDK
- Added extra metadata fields.
- Improved batch validation handling.
Quality & Performance
- Reduced log file size and improved tmp file cleanup.
- Updated Express version and patched vulnerable packages.
- Disabled markdown conversion for sitemap scrapes for improved performance.
- Better precrawl logging and formatting.
- Skip URL rewriting for published Google Docs.
- Prevent empty cookie headers during webhook callbacks.
Self-Hosted Improvements
- Disabled concurrency limit enforcement for self-hosted mode.
- PostgreSQL credentials now configurable via environment variables.
- Docker-compose build instructions fixed.
Full Changelog: v2.6.0...v2.7.0
What's Changed
- (feat/dns) DNS Resolution errors should be a 200 by @nickscamara in #2402
- Improve Logo and Color Detection on Branding Extract by @abimaelmartell in #2362
- (js-sdk) fix: ws 'document' event implementation by @rafaelsideguide in #2415
- (js-sdk): Return job ID from synchronous methods by @abimaelmartell in #2414
- (js-sdk): Fix types by @abimaelmartell in #2416
- (js-sdk): Bump Version by @abimaelmartell in #2417
- fix(api): disable gcs logging without db auth by @delong3 in #2418
- feat: noq scrape system by @delong3 in #2419
- Muv2 exp add more logs by @tomkosm in #2421
- feat(api): noq concurrency check integration by @delong3 in #2424
- Fix concurrency backfill bug by @micahstairs in #2425
- feat: redirect to docs when hitting main api endpoint by @amplitudesxd in #2426
- feat(api): total wait time in request metrics by @delong3 in #2428
- (fix/search) rm legacy external search apis by @nickscamara in #2420
- fix(api): various vulnerable packages (2025/11/21) by @mogery in #2431
- fix(api): pdf + document engines not respecting skipTlsVerification flag and error handling for uncidi by @delong3 in #2435
- fix(api): /map returning less urls with sitemap include by @delong3 in #2440
- feat: resolve redirects before mapping urls by @amplitudesxd in #2439
- fix(api): tally system rework by @mogery in #2430
- fix(api): update URL handling of resolved redirects by @amplitudesxd in #2442
- fix(api): opaque fire engine delete + poll interval by @delong3 in #2443
- feat(api): add abort timeout for resolveRedirects by @amplitudesxd in #2444
- (python-sdk) feat: added extra fields to metadata by @rafaelsideguide in #2441
- fix(api): handle case with no billed teams in tallyBilling function by @amplitudesxd in #2446
- fix: Add support for ignoreQueryParameter in map SDKs by @Chadha93 in #2429
- fix(api): vue + angular router url normalization by @delong3 in #2447
- feat(api): usedLlm + formats in request metrics by @delong3 in #2448
- feat(api): switch to uuidv7 by @mogery in #2449
- Add Sentry Settings by @abimaelmartell in #2451
- Filter Sentry Exceptions by @abimaelmartell in #2453
- Add minAge parameter to scrape (ENG-4073) by @amplitudesxd in #2452
- Fix typos by @omahs in #2457
- fix docker-compose service build instructions by @davidkhala in #2406
- Cleanup tmp files from downloadFile by @abimaelmartell in #2455
- Optimize Logs File Size by @abimaelmartell in #2456
- Update express version by @abimaelmartell in #2465
- Annotate test failures on CI by @abimaelmartell in #2462
- fix(api/precrawl): precrawl logging + format + skip index by @delong3 in #2466
- fix(go-html-to-md): request body max 60MB by @delong3 in #2467
- fix(api): dns + crawl denial errors by @delong3 in #2469
- Disable markdown conversion for sitemap scrapes by @abimaelmartell in #2461
- Add missing /v2/batch/scrape/:jobId/errors endpoint by @devin-ai-integration[bot] in #2471
- fix: improve pydantic parsing error handling | ENG-4070 by @Chadha93 in #2450
- fix: Make PostgreSQL credentials configurable via environment variables by @DraPraks in #2388
- feat(api): create users via referring integrations by @mogery in #2463
- feat: muv2 exp apikey env by @tomkosm in #2472
- (feat/search) Enterprise params by @nickscamara in #2412
- Validate UUID from URL in Requests by @abimaelmartell in #2392
- Disable Concurrency Limit on Self Hosted by @abimaelmartell in #2475
- (js sdk)fix/ws deno by @rafaelsideguide in #2476
- fix(api): sitemap max age for map requests by @delong3 in #2479
- fix(api): sitemap max age for recursive sitemaps by @delong3 in #2480
- feat: new app database shape by @mogery in #2445
- chore(api): disable x-powered-by by @amplitudesxd in #2483
- Skip subdomain logic for IP addresses by @abimaelmartell in #2477
- Attempt Fix Search Tests by @abimaelmartell in #2478
- fix(api): don't bill where stealth proxy was unsupported by @amplitudesxd in #2484
- feat(extract): port to billing credits by @mogery in #2482
- feat: [self-host] - add support to configure concurrency for crawl...
v2.6.0
Highlights
- Unified Billing Model - Credits and tokens merged into a single system. Extract now uses credits (15 tokens = 1 credit); existing tokens work everywhere.
- Full Release of Branding Format - Full support across Playground, MCP, JS and Python SDKs.
- Change Tracking - Faster and more reliable detection of web page content updates.
- Reliability and Speed Improvements - All endpoints significantly faster with improved reliability.
- Instant Credit Purchases - Buy credit packs directly from dashboard without waiting for auto-recharge.
- Improved Markdown Parsing - Enhanced markdown conversion and main content extraction accuracy.
- Core Stability Fixes - Fixed change-tracking issues, PDF timeouts, and improved error handling.
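The unified billing rate above (15 tokens = 1 credit) can be written as a small helper. This is a sketch of the stated conversion only; the round-up behavior for partial credits is an assumption, not documented billing logic:

```python
import math

TOKENS_PER_CREDIT = 15  # stated rate: 15 tokens = 1 credit

def tokens_to_credits(tokens: int) -> int:
    """Convert legacy extract tokens to credits at the 15:1 rate,
    rounding up so a partial credit is not undercounted (assumed)."""
    return math.ceil(tokens / TOKENS_PER_CREDIT)

assert tokens_to_credits(15) == 1
assert tokens_to_credits(30) == 2
assert tokens_to_credits(16) == 2  # partial credits round up (assumed)
```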
What's Changed
- fix(mu): Bug fix on v2 exp by @tomkosm in #2345
- Allow index use with waitFor (ENG-3481) by @amplitudesxd in #2346
- Fix autoCharge return, add top level guard by @abimaelmartell in #2341
- fix: import MAX_MAP_LIMIT from types.ts to resolve 1000 URL cap by @prashu0705 in #2333
- chore: improve llm extract logging by @amplitudesxd in #2348
- fix: error truncation by @amplitudesxd in #2349
- feat(go-html-to-md): enhance markdown conversion with robust PRE and … by @rafaelsideguide in #2321
- feat(billing): merge credits and tokens by @mogery in #2352
- chore: update geoip database by @amplitudesxd in #2354
- Implement Branding Format by @abimaelmartell in #2326
- Filter non-HTTP(S) protocols with separate error message by @devin-ai-integration[bot] in #2357
- Add branding format support to JS and Python SDKs by @devin-ai-integration[bot] in #2360
- Fix: Handle invalid favicon URLs gracefully in metadata extraction by @abimaelmartell in #2361
- feat: allow disabling webhook delivery by @amplitudesxd in #2367
- feat: add engine forcing by domain pattern by @devin-ai-integration[bot] in #2371
- revert nuq commits by @amplitudesxd in #2376
- CI: Remove npm audit from server tests by @abimaelmartell in #2385
- ci: Fix dependency audit by @abimaelmartell in #2386
- (fix/ctracking) Fix change tracking issues by @nickscamara in #2391
- update: Adds support for recursive schema for `python-sdk` with model selection by @Chadha93 in #2266
- fix: image search field mapping in Python SDK by @naaa760 in #2244
- fix(api/scrape): document + pdf scrape loop by @delong3 in #2396
New Contributors
- @prashu0705 made their first contribution in #2333
- @naaa760 made their first contribution in #2244
Full Changelog: v2.5.0...v2.6.0
v2.5.0 - The World's Best Web Data API
We now have the highest quality and most comprehensive web data API available, powered by our new semantic index and custom browser stack.
New Features
- Implemented scraping for `.xlsx` (Excel) files.
- Introduced new crawl architecture and NUQ concurrency tracking system.
- Per-owner/group concurrency limiting + dynamic concurrency calculation.
- Added group backlog handling and improved group operations.
- Added `/search` pricing update.
- Added team flag to skip country check.
- Always populate NUQ metrics for improved observability.
- New test-site app for improved CI testing.
- Extract metadata from document head for richer output.
Enhancements & Improvements
- Improved blocklist loading and unsupported site error messages.
- Updated x402-express version.
- Improved includePaths handling for subdomains.
- Updated self-hosted search to use DuckDuckGo.
- JS & Python SDKs no longer require API key for self-hosted deployments.
- Python SDK timeout handling improvements.
- Rust client now uses `tracing` instead of `print`.
- Reduced noise in auto-recharge Slack notifications.
Fixes
- Ensured crawl robots.txt warnings surface reliably.
- Resolved concurrency deadlocks and duplicate job handling.
- Fixed search country defaults and pricing logic bugs.
- Fixed port conflicts in harness environments.
- Fixed viewport dimension support and screenshot behavior in Playwright.
- Resolved CI test flakiness (playwright cache, prod tests).
Full diff: v2.4.0...v2.5.0
What's Changed
- More verbose blocklist loading errors by @amplitudesxd in #2277
- Update x402-express Version by @abimaelmartell in #2279
- Revise unsupported site error message by @micahstairs in #2286
- feat: index precrawl by @delong3 in #2289
- fix: ensure includePaths apply to subdomains when allowSubdomains is enabled by @abimaelmartell in #2278
- Fix search country parameter to default to undefined when location is set by @devin-ai-integration[bot] in #2283
- Fix Port Conflict in Harness by @abimaelmartell in #2285
- js-sdk: require API key only for cloud API (not self-hosted) by @abimaelmartell in #2237
- feat: Implement Scraping Excel xlsx files by @abimaelmartell in #2284
- feat(nuq): concurrency tracking by @mogery in #2291
- fix(crawl): surface robots.txt warning reliably by @ftonato in #2287
- feat(nuq): add source for max_concurrency by @mogery in #2293
- feat(nuq/concurrency-tracking): fix deadlock by @mogery in #2295
- Replace self-hosted Google with DDG search (ENG-3499) by @amplitudesxd in #2225
- python-sdk: Fix timeout handling across api calls by @abimaelmartell in #2288
- python-sdk: Don't require API Key when running Self Hosted by @abimaelmartell in #2290
- Add team flag to skip country check by @devin-ai-integration[bot] in #2300
- Update /search endpoint pricing to 2 credits per 10 search results by @devin-ai-integration[bot] in #2299
- Fix search pricing bug by @devin-ai-integration[bot] in #2301
- feat(nuq): per-owner-per-group concurrency limiting by @mogery in #2302
- update: handle circular refs as well in recursive schema by @Chadha93 in #2298
- feat(nuq): dynamically calculate current concurrency by @mogery in #2305
- feat(nuq): group_id, job backlogs, and group add operations by @mogery in #2309
- feat(ci): new test-site app + updated jest tests by @delong3 in #2312
- feat: new crawl architecture by @mogery in #2320
- Moved index for backlog query after the table creation by @c4nc in #2323
- fix(ci): playwright cache + prod tests by @delong3 in #2314
- Improve slack notifications for scale auto-recharges by @micahstairs in #2325
- Make auto-recharge notifications less noisy by @micahstairs in #2327
- fix: viewport dimension support for Playwright engine screenshots by @ftonato in #2329
- feat: always populate nuq metrics by @amplitudesxd in #2328
- fix: scrape viewport test by @amplitudesxd in #2330
- Revert "Merge pull request #2329 from firecrawl/devin/ENG-3639-175924… by @micahstairs in #2332
- fix(nuq): per-instance listen channel ID by @mogery in #2336
- fix(auto_charge): add a cooldown to the new recharge route by @mogery in #2338
- chore: update last scrape rpc by @amplitudesxd in #2339
- Rust client: use `tracing` instead of print by @codetheweb in #2324
- Extract metadata from document head (ENG-3822) by @amplitudesxd in #2342
- fix(nuq,concurrency-limit): handle if there are duplicate jobs in the concurrency queue by @mogery in #2343
New Contributors
- @delong3 made their first contribution in #2289
- @c4nc made their first contribution in #2323
- @codetheweb made their first contribution in #2324
Full Changelog: v2.4.0...v2.5.0
v2.4.0
New Features
- New PDF Search Category — You can now search only for PDFs via our v2 `/search` endpoint by specifying the `.pdf` category
- Gemini 2.5 Flash CLI Image Editor — Create and edit images directly in the CLI using Firecrawl + Gemini 2.5 Flash integration (#2172)
- x402 Search Endpoint (`/v2/x402`) — Added a next-gen search API with improved accuracy and speed (#2218)
- RabbitMQ Event System — Firecrawl jobs now support event-based communication and prefetching from Postgres (#2230, #2233)
- Improved Crawl Status API — More accurate and real-time crawl status reporting using the new `crawl_status_2` RPC (#2239)
- Low-Results & Robots.txt Warnings — Users now receive clear feedback when crawls are limited by robots.txt or yield few results (#2248)
- Enhanced Tracing (OpenTelemetry) — Much-improved distributed tracing for better observability across services (#2219)
- Metrics & Analytics — Added request-level metrics for both Scrape and Search endpoints (#2216)
- Self-Hosted Webhook Support — Webhooks can now be delivered to private IP addresses for self-hosted environments (#2232)
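A search request using the new PDF category might look like the sketch below. The `categories` field name and the exact category value are assumptions; the notes only say to specify the `.pdf` category on the v2 `/search` endpoint:

```json
{
  "query": "transformer architecture survey",
  "categories": ["pdf"]
}
```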
Improvements
- Reduced Docker Image Size — Playwright service image size reduced by 1 GB by only installing Chromium (#2210)
- Python SDK Enhancements — Added `"cancelled"` job status handling and poll interval fixes (#2240, #2265)
- Faster Node SDK Timeouts — Axios timeouts now propagate correctly, improving reliability under heavy loads (#2235)
- Improved Crawl Parameter Previews — Enhanced prompts and validation for crawl parameter previews (#2220)
- Zod Schema Validation — Stricter API parameter validation with rejection of extra fields (#2058)
- Better Redis Job Handling — Fixed edge cases in `getDoneJobsOrderedUntil` for more stable Redis retrieval (#2258)
- Markdown & YouTube Fixes — Fixed YouTube cache and empty markdown summary bugs (#2226, #2261)
- Updated Docs & Metadata — README updates and new metadata fields added to the JS SDK (#2250, #2254)
- Improved API Port Configuration — The API now respects environment-defined ports (#2209)
Fixes
- Fixed recursive `$ref` schema validation edge cases (#2238)
- Fixed enum arrays being incorrectly converted to objects (#2224)
- Fixed harness timeouts and self-hosted `docker-compose.yaml` issues (#2242, #2252)
🔗 Full Changelog: v2.3.0 → v2.4.0
What's Changed
- fix: add missing `poll_interval` param in watcher by @Chadha93 in #2155
- feat: Add Firecrawl + Gemini 2.5 Flash Image CLI Editor by @MAVRICK-1 in #2172
- Add environment variable to disable blocklist by @amplitudesxd in #2197
- Fix ARM builds by @amplitudesxd in #2198
- fix(v1/search): if f-e search is available, only use that by @mogery in #2199
- Upgrade html-to-markdown dependency (ENG-3563) by @amplitudesxd in #2195
- feat(map): add crawler and scrape options to job logging by @ftonato in #2203
- refactor: integrate facilitator in payment middleware by @ftonato in #2213
- (feat/metrics) Scrape and Search Request Metrics by @nickscamara in #2216
- (feat/big-query) Big Query by @nickscamara in #2217
- feat(api): add x402 search endpoint to /v2 by @ftonato in #2218
- feat(api/otel): much improved tracing by @mogery in #2219
- fix: Add Zod validation to reject additionalProperties in schema parameters by @devin-ai-integration[bot] in #2058
- Reduce playwright-service image size by 1 GB by installing only Chromium by @bernie43 in #2210
- fix: enum arrays being converted to objects by @Chadha93 in #2224
- feat(nuq): RabbitMQ support for job finish events and waiting by @mogery in #2230
- fix: Use port from env.PORT for API by @abimaelmartell in #2209
- feat(nuq/rabbitmq): add prefetching jobs from psql to rabbitmq by @mogery in #2233
- fix: skip summary generation when markdown is empty by @devin-ai-integration[bot] in #2226
- Propagate timeout to Axios in Node SDK (ENG-3474) by @amplitudesxd in #2235
- feat(api/crawl-status): use crawl_status_2 RPC by @mogery in #2239
- Allow self-hosted webhook delivery to private IP addresses by @abimaelmartell in #2232
- Update harness timeout by @amplitudesxd in #2242
- python-sdk: include "cancelled" in CrawlJob.status and exit wait loop on cancel (fixes #2190) by @Jeelislive in #2240
- feat(api/ci): test with RabbitMQ on prod by @mogery in #2241
- (fix/crawl-params) Enhance crawl param preview prompt further by @nickscamara in #2220
- build(deps): bump actions/checkout from 3 to 5 by @dependabot[bot] in #2115
- fix: harness by @amplitudesxd in #2249
- Fix a self-hosted docker-compose.yaml bug caused by a recent firecrawl change by @th3w1zard1 in #2252
- fix: handle `$ref` for recursive schema validation by @Chadha93 in #2238
- Add missing metadata fields to JS SDK (ENG-3439) by @amplitudesxd in #2250
- Update README.md by @nickscamara in #2254
- fix: handle edge case in getDoneJobsOrderedUntil function for Redis job retrieval by @ftonato in #2258
- Fix YouTube cache markdown bug by @devin-ai-integration[bot] in #2261
- feat(api): add warnings for low results and robots.txt restrictions in map and crawl controllers by @ftonato in #2248
- Test new mu alternative by @tomkosm in #2263
- chore(python-sdk): Bump version to 4.3.7 for poll_interval fix by @devin-ai-integration[bot] in #2265
- Feat/test new mu alt by @tomkosm in #2267
- (feat/search-index) Search Index by @nickscamara in #2268
- Feat/test new mu alt by @tomkosm in #2270
- (feat/search-index) Separate service by @nickscamara in #2271
- fix: additional `queue_scrape` for nuq schema by @Chadha93 in #2272
- (feat/search) Pdf search category by @nickscamara in #2276
New Contributors
- @Chadha93 made their first contribution in #2155
- @MAVRICK-1 made their first contribution in #2172
- @bernie43 made their first contribution in #2210
- @abimaelmartell made their first contribution in #2209
- @th3w1zard1 made their first contribution in #2252
Full Changelog: v2.3.0...v2.4.0
v2.3.0
New Features
- YouTube Support: You can now get YouTube transcripts
- Enterprise Auto-Recharge: Added enterprise support for auto-recharge
- .odt and .rtf: Now supports `.odt` and `.rtf` file parsing
- Docx Parsing: 50x faster docx parsing
- K8s Deployment: Added NuQ worker deployment example
- Self Host: Tons of improvements for our self host users
Improvements & Fixes
- Stability: Fixed timeout race condition, infinite scrape loop, and location query bug
- Tooling: Replaced ts-prune with knip, updated pnpm with minimumReleaseAge
- Docs: Added Rust to CONTRIBUTING and fixed typos
- Security: Fixed `pkgvuln` issue
What's Changed
- Update blocklist by @micahstairs in #2150
- docs: fix typo and punctuation in CONTRIBUTING.md by @jarrensj in #2149
- Fix timeout error message race condition for ENG-3372 by @devin-ai-integration[bot] in #2144
- Add exceptions to blocklist by @micahstairs in #2156
- fix: pkgvuln by @mogery in #2158
- Replace ts prune with knip (ENG-3540) by @amplitudesxd in #2148
- feat(auto-recharge): enterprise by @mogery in #2127
- feat(scrapeURL/index): index metrics by @mogery in #2160
- Update pnpm and add minimumReleaseAge (ENG-3560) by @amplitudesxd in #2162
- feat(api/scrapeURL): add special support for YouTube watch pages by @mogery in #2157
- fix(scrapeURL/index): locations array querying bug by @mogery in #2164
- Fix infinite loop when scraping a forbidden webpage (ENG-3339) by @amplitudesxd in #2147
- Add Rust to CONTRIBUTING by @oalsing in #2180
- feat(scrapeURL/summary): use gpt-5-mini by @mogery in #2174
- Custom Rust document parser (ENG-3489) by @amplitudesxd in #2159
- feat: add NuQ worker deployment to Kubernetes examples by @devin-ai-integration[bot] in #2163
- feat(api): move blocklist to DB by @mogery in #2186
Full Changelog: v2.2.0...v2.3.0
v2.2.0
Features
- MCP version 3 is live. Stable support for cloud MCP with HTTP Transport and SSE modes. Compatible with v2 and v1.
- Webhooks: Now we support signatures + extract support + event failures
- Map is now 15x faster + supports more URLs
- Search reliability improvements
- Usage is now tracked by API Key
- Support for additional locations (CA, CZ, IL, IN, IT, PL, and PT)
- Queue status endpoint
- Added `maxPages` parameter to v2 scrape API for PDF parsing
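The new `maxPages` option for PDF parsing might be passed like the sketch below. Only the parameter name comes from these notes; the surrounding `parsers` shape is an assumption about how the option nests in a v2 scrape request:

```json
{
  "url": "https://example.com/report.pdf",
  "parsers": [{ "type": "pdf", "maxPages": 10 }]
}
```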
Improvements
- API:
  - New `/team/queue-status` endpoint.
  - Added `nuq` feature.
  - Added `VIASOCKET` integration.
  - Historical credit/token usage endpoints with expanded data.
- Student Program: Support for more universities + students to get free credits through our student program
- Map: 15x faster and increased the limit to 100k
- Scrape API: Added `maxPages` parameter for PDF parser.
- Python SDK:
  - Added `get_queue_status` to aio + normalization of docs in search results.
- SDKs: Added next cursor pagination and integration param support.
- Infrastructure: Added static IP proxy pool + proxy location support.
- Webhooks: Implemented signatures, refactored sending, added scrape error events.
- Performance: Optimized map, converted Rust natives to single NAPI library.
- CI/CD: Revamped CI, added pre-commit hooks, cross-platform harness.
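With webhook signatures implemented, receivers can check payload authenticity before trusting an event. The sketch below shows a typical HMAC-SHA256 verification pattern; the header name, hex encoding, and exact signing scheme are assumptions, not taken from Firecrawl's documentation:

```python
import hashlib
import hmac

def verify_webhook(secret: str, payload: bytes, signature: str) -> bool:
    """Recompute the HMAC-SHA256 of the raw request body and compare it,
    in constant time, against the signature delivered with the webhook."""
    expected = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

# Simulated delivery: sign a body with a shared secret, then verify it.
body = b'{"type":"crawl.completed","id":"job-123"}'
sig = hmac.new(b"shh", body, hashlib.sha256).hexdigest()
assert verify_webhook("shh", body, sig)
assert not verify_webhook("wrong-secret", body, sig)
```

Verifying against the raw bytes (before any JSON parsing) and using `hmac.compare_digest` avoids both re-serialization mismatches and timing side channels.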
🐛 Fixes
- Corrected concurrency limit scaling.
- Fixed search result links/descriptions and retry mechanism for empty results.
- Re-signed expired screenshot URLs.
- Trimmed null chars from PDF titles + fixed encoding.
- Fixed sitemap parsing and added `.gz` sitemap support.
- Fixed js-sdk `zod-to-json-schema` import.
- Fixed webhook data format regression.
- Improved credit handling in account object.
🛠️ Chores & Other
- Removed unused dependencies, updated CONTRIBUTING.md.
- Added debug logging, ignored scripts during CI build.
- Various dependency bumps and build improvements.
🔗 Full Changelog: v2.1.0...v2.2.0
What's Changed
- feat(sdks): next cursor pagination by @rafaelsideguide in #2067
- feat: add maxPages parameter to PDF parser in v2 scrape API by @devin-ai-integration[bot] in #2047
- fix(concurrency-limit): scale! by @mogery in #2071
- feat(api): add /team/queue-status endpoint by @mogery in #2063
- build(deps): bump actions/checkout from 3 to 5 by @dependabot[bot] in #1998
- build(deps): bump actions/setup-python from 4 to 5 by @dependabot[bot] in #2028
- build(deps): bump docker/login-action from 1 to 3 by @dependabot[bot] in #1996
- build(deps): bump docker/build-push-action from 5 to 6 by @dependabot[bot] in #1995
- build(deps): bump actions/setup-node from 3 to 4 by @dependabot[bot] in #1997
- feat: historical credit/token usage endpoints + more data in existing usage endpoints by @mogery in #2077
- fix(api/tsconfig): remove baseUrl by @mogery in #2078
- fix(search): get links and descriptions correctly by @mogery in #2076
- fix(api/crawler/sitemap): bump sitemap limit by @mogery in #2079
- fix(api/scrapeURL/index): re-sign expired screenshot URLs by @mogery in #2080
- fix(python-sdk): added missing get_queue_status in aio and added to t… by @rafaelsideguide in #2081
- Fix go-html-to-md on Windows (ENG-3398) by @amplitudesxd in #2082
- fix(js-sdk): zod-to-json-schema import by @rafaelsideguide in #2083
- Replace custom address validation functions with ipaddr.js (ENG-3404) by @amplitudesxd in #2084
- fix(api/native/pdf-parser): trim null chars out of pdf titles by @mogery in #2086
- Fix sitemap parsing (ENG-3361) by @amplitudesxd in #2085
- Implement webhook signatures (ENG-3018) by @amplitudesxd in #2087
- Format api and add pre-commit hooks (ENG-3408) by @amplitudesxd in #2088
- Fix pre-commit hook by @amplitudesxd in #2089
- Ignore scripts during CI build by @amplitudesxd in #2090
- Add more debug logging to crawler by @amplitudesxd in #2091
- Add proxy location support to crawl and map endpoints (ENG-3361) by @amplitudesxd in #2092
- post-incident changes by @mogery in #2095
- feat(python-sdk): normalize docs in search results by @rafaelsideguide in #2098
- Feat(sdks): integration param by @rafaelsideguide in #2096
- Refactor webhook sending (ENG-3426) by @amplitudesxd in #2094
- Fix webhook data format regression by @amplitudesxd in #2106
- Update Type Annotations for v2 Async Search (SearchResponse → SearchData) by @devin-ai-integration[bot] in #2097
- Fire webhook events for scrape/batch scrape errors (ENG-3463) by @amplitudesxd in #2107
- Add static IP proxy pool (ENG-3420) by @amplitudesxd in #2103
- Convert all Rust natives to a single library using NAPI (ENG-3397) by @amplitudesxd in #2105
- Filter out invalidated index records (ENG-3396) by @amplitudesxd in #2102
- feat(sdk): added agent option by @rafaelsideguide in #2108
- feat(api): nuq by @mogery in #1984
- Make harness cross-platform compatible (ENG-3477) by @amplitudesxd in #2110
- Update error messages for self-hosted instances by @devin-ai-integration[bot] in #2119
- Revise /extract's error message when no content could be fetched from URLs by @micahstairs in #2109
- Fix graceful credit handling in account object (ENG-3495) by @amplitudesxd in #2120
- fix(scrapeURL/f-e/scrape): bad failed schema by @mogery in #2123
- feat(ci): revamp by @mogery in #2124
- fix(api/native/pdf): get title with proper encoding by @mogery in #2125
- chore: remove unused dependencies + various CI fixes by @mogery in #2128
- Perform watching inside of harness (ENG-3514) by @amplitudesxd in #2131
- Add gzipped sitemap support (ENG-3520) by @amplitudesxd in #2132
- Remove .xml.gz from `include` entries by @amplitudesxd in #2134
- fix(api): rearchitect crawl kickoff by @mogery in #2133
- Update CONTRIBUTING.md by @nickscamara in #2141
- Update CONTRIBUTING.md by @nickscamara in #2142
- (fix/search) Implement retry mechanisms for empty results by @nickscamara in #2140
- Optimize map (ENG-3526) by @amplitudesxd in #2138
- feat(api): add VIASOCKET integration by @ftonato in #2143
v2.1.0
Firecrawl v2.1.0 is here!
✨ New Features
- Search Categories: Filter search results by specific categories using the `categories` parameter:
  - `github`: Search within GitHub repositories, code, issues, and documentation
  - `research`: Search academic and research websites (arXiv, Nature, IEEE, PubMed, etc.)
  - More coming soon
- Image Extraction: Added image extraction support to the v2 scrape endpoint.
- Data Attribute Scraping: Now supports extraction of `data-*` attributes.
- Hash-Based Routing: Crawl endpoints now handle hash-based routes.
- Improved Google Drive Scraping: Added ability to scrape TXT, PDF, and Sheets from Google Drive.
- PDF Enhancements: Extracts PDF titles and shows them in metadata.
- API Enhancements:
  - Map endpoint supports up to 100k results.
- Helm Chart: Initial Helm chart added for Firecrawl deployment.
- Security: Improved protection against XFF spoofing.
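A minimal sketch of a v2 search request using the new `categories` filter; the query and `limit` value are illustrative, not from these notes, and the key is a placeholder:

```shell
curl -X POST https://api.firecrawl.dev/v2/search \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -d '{
    "query": "web scraping rate limiting",
    "categories": ["github", "research"],
    "limit": 5
  }'
```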
🛠 Fixes
- Fixed UTF-8 encoding in Google search scraper.
- Restored crawl status in preview mode.
- Fixed missing methods in Python SDK.
- Corrected JSON response handling for v2 search with `scrapeOptions.formats`.
- Fixed field population for `credits_billed` in v0 scrape.
- Improved document field overlay in v2 search.
👥 New Contributors
What's Changed
- fix: handle UTF-8 encoding properly in Google search scraper by @kelter-antunes in #1924
- feat(api): add image extraction support to v2 scrape endpoint by @vishkrish200 in #2008
- feat(api): support extraction of data-* attributes in scrape endpoints by @vishkrish200 in #2006
- feat: add initial Helm chart for Firecrawl deployment by @JakobStadlhuber in #1262
- feat(api/crawl): support hash-based routing by @mogery in #2031
- fix(python-sdk): missing methods in client by @rafaelsideguide in #2050
- feat(countryCheck): better protection against XFF spoofing by @mogery in #2051
- fix: include json in v2 /search response when using scrapeOptions.formats by @ieedan in #2052
- feat(scrapeURL/rewrite): scrape Google Drive TXT/PDF files and sheets by @mogery in #2053
- Update README.md by @nickscamara in #2060
- (fix/crawl) Re-enable crawl status in preview mode by @nickscamara in #2061
- feat(pdf-parser): get PDF title and show in metadata by @mogery in #2062
- fix(v2/search): overlay doc fields via spread operator by @mogery in #2054
- feat(api): propagate api_key_id towards billing function by @mogery in #2049
- feat(api/map): use new RPCs + set limit max to 100k by @mogery in #2065
- fix(api/v0/scrape): populate credits_billed field by @mogery in #2066
New Contributors
- @kelter-antunes made their first contribution in #1924
- @vishkrish200 made their first contribution in #2008
- @ieedan made their first contribution in #2052
Full Changelog: v2.0.1...v2.1.0
v2.0.1
This release fixes the "SSRF Vulnerability via malicious webhook" security advisory. It is recommended that people using the self-hosted version of Firecrawl update to v2.0.1 immediately. More info in the advisory: GHSA-p2wg-prhf-jx79
v2.0.0
Introducing v2.0.0
Key Improvements
- Faster by default: Requests are cached with `maxAge` defaulting to 2 days, and sensible defaults like `blockAds`, `skipTlsVerification`, and `removeBase64Images` are enabled.
- New summary format: You can now specify `"summary"` as a format to directly receive a concise summary of the page content.
- Updated JSON extraction: JSON extraction and change tracking now use an object format: `{ type: "json", prompt, schema }`. The old `"extract"` format has been renamed to `"json"`.
- Enhanced screenshot options: Use the object form: `{ type: "screenshot", fullPage, quality, viewport }`.
- New search sources: Search across `"news"` and `"images"` in addition to web results by setting the `sources` parameter.
- Smart crawling with prompts: Pass a natural-language `prompt` to crawl and the system derives paths/limits automatically. Use the new crawl-params-preview endpoint to inspect the derived options before starting a job.
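The new summary format, for example, is just another entry in `formats` on the v2 scrape endpoint (API key is a placeholder):

```shell
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -d '{
    "url": "https://docs.firecrawl.dev/",
    "formats": ["summary"]
  }'
```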
Quick migration checklist
- Replace v1 client usage with v2 clients:
  - JS: `const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' })`
  - Python: `firecrawl = Firecrawl(api_key='fc-YOUR-API-KEY')`
  - API: use the new `https://api.firecrawl.dev/v2/` endpoints.
- Update formats:
  - Use `"summary"` where needed
  - JSON mode: Use `{ type: "json", prompt, schema }` for JSON extraction
  - Screenshot and Screenshot@fullPage: Use the screenshot object format when specifying options
- Adopt standardized async flows in the SDKs:
  - Crawls: `startCrawl` + `getCrawlStatus` (or `crawl` waiter)
  - Batch: `startBatchScrape` + `getBatchScrapeStatus` (or `batchScrape` waiter)
  - Extract: `startExtract` + `getExtractStatus` (or `extract` waiter)
- Crawl options mapping (see below)
- Check crawl `prompt` with crawl-params-preview
SDK surface (v2)
JS/TS
Method name changes (v1 → v2)
Scrape, Search, and Map
| v1 (FirecrawlApp) | v2 (Firecrawl) |
|---|---|
| `scrapeUrl(url, ...)` | `scrape(url, options?)` |
| `search(query, ...)` | `search(query, options?)` |
| `mapUrl(url, ...)` | `map(url, options?)` |
Crawling
| v1 | v2 |
|---|---|
| `crawlUrl(url, ...)` | `crawl(url, options?)` (waiter) |
| `asyncCrawlUrl(url, ...)` | `startCrawl(url, options?)` |
| `checkCrawlStatus(id, ...)` | `getCrawlStatus(id)` |
| `cancelCrawl(id)` | `cancelCrawl(id)` |
| `checkCrawlErrors(id)` | `getCrawlErrors(id)` |
Batch Scraping
| v1 | v2 |
|---|---|
| `batchScrapeUrls(urls, ...)` | `batchScrape(urls, opts?)` (waiter) |
| `asyncBatchScrapeUrls(urls, ...)` | `startBatchScrape(urls, opts?)` |
| `checkBatchScrapeStatus(id, ...)` | `getBatchScrapeStatus(id)` |
| `checkBatchScrapeErrors(id)` | `getBatchScrapeErrors(id)` |
Extraction
| v1 | v2 |
|---|---|
| `extract(urls?, params?)` | `extract(args)` |
| `asyncExtract(urls, params?)` | `startExtract(args)` |
| `getExtractStatus(id)` | `getExtractStatus(id)` |
Other / Removed
| v1 | v2 |
|---|---|
| `generateLLMsText(...)` | (not in v2 SDK) |
| `checkGenerateLLMsTextStatus(id)` | (not in v2 SDK) |
| `crawlUrlAndWatch(...)` | `watcher(jobId, ...)` |
| `batchScrapeUrlsAndWatch(...)` | `watcher(jobId, ...)` |
Type name changes (v1 → v2)
Core Document Types
| v1 | v2 |
|---|---|
| `FirecrawlDocument` | `Document` |
| `FirecrawlDocumentMetadata` | `DocumentMetadata` |
Scrape, Search, and Map Types
| v1 | v2 |
|---|---|
| `ScrapeParams` | `ScrapeOptions` |
| `ScrapeResponse` | `Document` |
| `SearchParams` | `SearchRequest` |
| `SearchResponse` | `SearchData` |
| `MapParams` | `MapOptions` |
| `MapResponse` | `MapData` |
Crawl Types
| v1 | v2 |
|---|---|
| `CrawlParams` | `CrawlOptions` |
| `CrawlStatusResponse` | `CrawlJob` |
Batch Operations
| v1 | v2 |
|---|---|
| `BatchScrapeStatusResponse` | `BatchScrapeJob` |
Action Types
| v1 | v2 |
|---|---|
| `Action` | `ActionOption` |
Error Types
| v1 | v2 |
|---|---|
| `FirecrawlError` | `SdkError` |
| `ErrorResponse` | `ErrorDetails` |
Python (sync)
Method name changes (v1 → v2)
Scrape, Search, and Map
| v1 | v2 |
|---|---|
| `scrape_url(...)` | `scrape(...)` |
| `search(...)` | `search(...)` |
| `map_url(...)` | `map(...)` |
Crawling
| v1 | v2 |
|---|---|
| `crawl_url(...)` | `crawl(...)` (waiter) |
| `async_crawl_url(...)` | `start_crawl(...)` |
| `check_crawl_status(...)` | `get_crawl_status(...)` |
| `cancel_crawl(...)` | `cancel_crawl(...)` |
Batch Scraping
| v1 | v2 |
|---|---|
| `batch_scrape_urls(...)` | `batch_scrape(...)` (waiter) |
| `async_batch_scrape_urls(...)` | `start_batch_scrape(...)` |
| `get_batch_scrape_status(...)` | `get_batch_scrape_status(...)` |
| `get_batch_scrape_errors(...)` | `get_batch_scrape_errors(...)` |
Extraction
| v1 | v2 |
|---|---|
| `extract(...)` | `extract(...)` |
| `start_extract(...)` | `start_extract(...)` |
| `get_extract_status(...)` | `get_extract_status(...)` |
Other / Removed
| v1 | v2 |
|---|---|
| `generate_llms_text(...)` | (not in v2 SDK) |
| `get_generate_llms_text_status(...)` | (not in v2 SDK) |
| `watch_crawl(...)` | `watcher(job_id, ...)` |
Python (async)
`AsyncFirecrawl` mirrors the same methods (all awaitable).
Formats and scrape options
- Use string formats for basics: `"markdown"`, `"html"`, `"rawHtml"`, `"links"`, `"summary"`.
- Instead of `parsePDF`, use `parsers: [ { "type": "pdf" } | "pdf" ]`.
- Use object formats for JSON, change tracking, and screenshots:
JSON format
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"url": "https://docs.firecrawl.dev/",
"formats": [{
"type": "json",
"prompt": "Extract the company mission from the page."
}]
}'

Screenshot format
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"url": "https://docs.firecrawl.dev/",
"formats": [{
"type": "screenshot",
"fullPage": true,
"quality": 80,
"viewport": { "width": 1280, "height": 800 }
}]
}'

Crawl options mapping (v1 → v2)
| v1 | v2 |
|---|---|
| `allowBackwardCrawling` | (removed) use `crawlEntireDomain` |
| `maxDepth` | (removed) use `maxDiscoveryDepth` |
| `ignoreSitemap` (bool) | `sitemap` (e.g., `"only"`, `"skip"`, or `"include"`) |
| (none) | `prompt` |
Crawl prompt + params preview
See crawl params preview examples:
curl -X POST https://api.firecrawl.dev/v2/crawl-params-preview \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"url": "https://docs.firecrawl.dev",
"prompt": "Extract docs and blog"
}'

What's Changed
- Add a couple exceptions to our blocked list by @micahstairs in #1816
- fix(api/v1/types): depth check throws error if URL is invalid by @mogery in #1821
- (feat/rtxt) Improved robots control on scrape via flags by @nickscamara in #1820
- fix/actions dict attributeError by @rafaelsideguide in https://gi...