Releases: firecrawl/firecrawl
v2.8.0
Firecrawl v2.8.0 is here!
Firecrawl v2.8.0 brings major improvements to agent workflows, developer tooling, and self-hosted deployments across the API and SDKs, including our new Skill.
- Parallel Agents for running thousands of `/agent` queries simultaneously, powered by our new Spark 1 Fast model.
- Firecrawl CLI with full support for scrape, search, crawl, and map commands.
- Firecrawl Skill for enabling AI agents (Claude Code, Codex, OpenCode) to use Firecrawl autonomously.
- Three new models powering /agent: Spark 1 Fast for instant retrieval (currently only available in Playground), Spark 1 Mini for complex research queries, and Spark 1 Pro for advanced extraction tasks.
- Agent enhancements including webhooks, model selection, and new MCP Server tools.
- Platform-wide performance improvements including faster search execution and optimized Redis calls.
- SDK improvements including Zod v4 compatibility.
And much more, check it out below!
New Features
- **Parallel Agents**
  Execute thousands of `/agent` queries in parallel with automatic failure handling and intelligent waterfall execution. Powered by Spark 1 Fast for instant retrieval, automatically upgrading to Spark 1 Mini for complex queries requiring full research.
- **Firecrawl CLI**
  New command-line interface for Firecrawl with full support for scrape, search, crawl, and map commands. Install with `npm install -g firecrawl-cli`.
- **Firecrawl Skill**
  Enables agents like Claude Code, Codex, and OpenCode to use Firecrawl for web scraping and data extraction, installable via `npx skills add firecrawl/cli`.
- **Spark Model Family**
  Three new models powering `/agent`: Spark 1 Fast for instant retrieval (currently available in the Playground), Spark 1 Mini (default) for everyday extraction tasks at 60% lower cost, and Spark 1 Pro for complex multi-domain research requiring maximum accuracy. Spark 1 Pro achieves ~50% recall and Spark 1 Mini ~40% recall, both significantly outperforming tools costing 4-7x more per task.
- **Firecrawl MCP Server Agent Tools**
  New `firecrawl_agent` and `firecrawl_agent_status` tools for autonomous web data gathering via MCP-enabled agents.
- **Agent Webhooks**
  The agent endpoint now supports webhooks for real-time notifications on job completion and progress.
- **Agent Model Selection**
  The agent endpoint now accepts a `model` parameter and includes model info in status responses.
- **Multi-Arch Docker Images**
  Self-hosted deployments now support the `linux/arm64` architecture in addition to `amd64`.
- **Sitemap-Only Crawl Mode**
  New crawl option to exclusively use sitemap URLs without following links.
- **`ignoreCacheMap` Parameter**
  New option to bypass cached results when mapping URLs.
- **Custom Headers for `/map`**
  The map endpoint now supports custom request headers.
- **Background Image Extraction**
  The scraper now extracts background images from CSS styles.
- **Improved Error Messages**
  All user-facing error messages now include detailed explanations to help diagnose issues.
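For a sense of how the new agent options compose, a request body selecting a model and registering a webhook might look like the sketch below. Only the `model` parameter and webhook support are stated in these notes; the model identifier string and the webhook field shape are assumptions.

```json
{
  "prompt": "Find the pricing pages of the top five CRM vendors",
  "model": "spark-1-mini",
  "webhook": { "url": "https://example.com/agent-events" }
}
```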
API Improvements
- Search without concurrency limits — scrapes in search now execute directly without queue overhead.
- Return `400` for unsupported actions, with clear errors when requested actions aren't supported by available engines.
- Job ID now included in search metadata for easier tracking.
- Metadata responses now include the detected timezone.
- Backfill metadata title from `og:title` or `twitter:title` when missing.
- Preserve the `gid` parameter when rewriting Google Sheets URLs.
- Fixed v2 path in batch scrape status pagination.
- Validate team ownership when appending to existing crawls.
- Screenshots with custom viewport or quality settings now bypass the cache.
- Optimized Redis calls across endpoints.
- Reduced excessive `robots.txt` fetching and parsing.
- Minimum request timeout parameter now configurable.
SDK Improvements
JavaScript SDK
- Zod v4 Compatibility — schema conversion now works with Zod v4 with improved error detection.
- Watcher Exports — `Watcher` and `WatcherOptions` now exported from the SDK entrypoint.
- Agent Webhook Support — new webhook options for agent calls.
- Error Retry Polling — SDK retries polling after transient errors.
- Job ID in Exceptions — error exceptions now include `jobId` for debugging.
Python SDK
- Manual pagination helpers for iterating through results.
- Agent webhook support added to agent client.
- Agent endpoint now accepts model selection parameter.
- Metadata now includes concurrency limit information.
- Fixed `max_pages` handling in crawl requests.
Dashboard Improvements
- Dark mode is now supported.
- On the usage page, you can now view credit usage broken down by day.
- On the activity logs page, you can now filter by the API key that was used.
- The "images" output format is now supported in the Playground.
- All admins can now manage their team's subscriptions.
Quality & Performance
- Skip markdown conversion checks for large HTML documents.
- Export Google Docs as HTML instead of PDF for improved performance.
- Improved branding format with better logo detection and error messages for PDFs and documents.
- Improved `lopdf` metadata loading performance.
- Updated the `html-to-markdown` module with multiple bug fixes.
- Increased the markdown service body limit and added request ID logging.
- Better Sentry filtering for cancelled jobs and engine errors.
- Fixed extract race conditions and RabbitMQ poison pill handling.
- Centralized Firecrawl configuration across the codebase.
- Multiple security vulnerability fixes, including CVE-2025-59466 and lodash prototype pollution.
Self-Hosted Improvements
- CLI custom API URL support via `firecrawl --api-url http://localhost:3002` for local instances.
- ARM64 Docker support via multi-arch images for Apple Silicon and ARM servers.
- Fixed docker-compose database credentials out of the box.
- Fixed Playwright service startup caused by Chromium path issues.
- Updated Node.js to major version 22 instead of a pinned minor.
- Added RabbitMQ health check endpoint.
- Fixed PostgreSQL port exposure in docker-compose.
Full Changelog: v2.7.0...v2.8.0
What's Changed
- refactor(api): centralize firecrawl config by @amplitudesxd in #2496
- fix(config): add .catch to NUQ worker port defaults for error handling by @amplitudesxd in #2505
- (sdk)fix/same timeout as api now by @rafaelsideguide in #2503
- (sdks)feat/added concurrency info to metadata by @rafaelsideguide in #2502
- fix: make srcset URLs absolute in HTML transformation by @Chadha93 in #2515
- feat(api/admin/crawl-monitor): add endpoint for monitoring crawl system by @mogery in #2518
- feat(api/logRequest): associate requests with API keys by @mogery in #2519
- Fix Config Load on Tests by @abimaelmartell in #2506
- feat(api): update model usage to gpt-4o-mini by @amplitudesxd in #2520
- feat(api/scrapeURL): engpicker integ by @mogery in #2523
- fix(playwright-service-ts): wasn't starting up due to the lack of chromium under /tmp/.cache by @dmlarionov in #2512
- added timezone to metadata response by @rafaelsideguide in #2526
- Increase Go Service Write Timeout by @abimaelmartell in #2489
- (python-sdk)fix/max_pages by @rafaelsideguide in #2527
- Use invoiced billing for certain expansion packs by @micahstairs in #2532
- Fix PostgreSQL port exposure in docker-compose by @abimaelmartell in #2530
- (feat/partners) Allow email to be optional for partners API by @nickscamara in #2533
- Advanced model for recursive schemas by @amplitudesxd in #2535
- feat(api): update gpt-4o usage to gpt-4.1 by @amplitudesxd in #2536
- fix(api): cost tracking by @amplitudesxd in #2537
- Update Sentry for ZDR compliance by @abimaelmartell in #2529
- sanitize null-byte strings and report robustInsert failures to Sentry by @abimaelmartell in #2538
- Dont log Feature Flog Errors to Sentry by @abimaelmartell in #2540
- Debug logs to Extract Updates by @abimaelmartell in #2539
- Update test site build by @abimaelmartell in #2543
- feat: increase precrawl limits by @delong3 in #2544
- fix(api): engines for robots and scrape + reduced sitemap limit by @delong3 in #2545
- Webhook dispatcher by @amplitudesxd in #2534
- (feat/partner-integrations) Rotate endpoint by @nickscamara in #2547
- fix: extract race condition by @amplitudesxd in https://github.com/fir...
v2.7.0
Firecrawl v2.7.0 is here!
- ZDR Search support for enterprise customers.
- Improved Branding Format with better detection.
- Partner Integrations API now in closed beta.
- Faster and more accurate screenshots.
- Self-hosted improvements
And a lot more enhancements, check it out below!
New Features
- **Improved Branding Extract**
  Better logo and color detection for more accurate brand extraction results.
- **NOQ Scrape System (Experimental)**
  New scrape pipeline with improved stability and integrated concurrency checks.
- **Enhanced Redirect Handling**
  URLs now resolve before mapping, with safer redirect-chain detection and new abort timeouts.
- **Enterprise Search Parameters**
  New enterprise-level options available for the `/search` endpoint.
- **Integration-Based User Creation**
  Users can now be created automatically when coming from referring integrations.
- **`minAge` Scrape Parameter**
  Allows requiring a minimum cached age before re-scraping.
- **Extract Billing Credits**
  Extract jobs now use the same credit billing system as other endpoints.
- **Self-Host: Configurable Crawl Concurrency**
  Self-hosted deployments can now set custom concurrency limits.
- **Sentry Enhancements**
  Added Vercel AI integration, configurable sampling rates, and improved exception filtering.
- **UUIDv7 IDs**
  All new resources use lexicographically sortable UUIDv7 identifiers.
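As an illustration of the new `minAge` scrape parameter, a request might look like the following sketch. The notes confirm only the parameter name, so the value's units (milliseconds here) and its placement in the body are assumptions:

```json
{
  "url": "https://example.com",
  "minAge": 3600000
}
```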
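The UUIDv7 item is easy to demonstrate: since the first 48 bits of a UUIDv7 carry a millisecond timestamp, plain string comparison of the IDs tracks creation order. Below is a minimal illustrative generator (a sketch of the UUIDv7 layout, not Firecrawl's implementation):

```python
import os
import time

def uuid7() -> str:
    """Generate a UUIDv7-style identifier: 48-bit ms timestamp,
    version and variant bits, and a random tail (illustrative only)."""
    ts = int(time.time() * 1000) & ((1 << 48) - 1)
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    # Layout: 48-bit timestamp | 4-bit version (7) | 12 random bits
    #         | 2-bit variant (0b10) | 62 random bits
    hi = (ts << 16) | (0x7 << 12) | ((rand >> 64) & 0x0FFF)
    lo = (0b10 << 62) | (rand & ((1 << 62) - 1))
    h = f"{(hi << 64) | lo:032x}"
    return f"{h[:8]}-{h[8:12]}-{h[12:16]}-{h[16:20]}-{h[20:]}"

a = uuid7()
time.sleep(0.002)  # guarantee a later millisecond timestamp
b = uuid7()
assert a < b  # lexicographic order follows creation order
```

Because the timestamp occupies the leading hex digits, IDs sort chronologically in any system that compares them as strings or bytes — which is what makes them convenient as database keys.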
API Improvements
- DNS Resolution Errors Now Return 200 for more consistent failure handling.
- Improved URL Mapping Logic including sitemap `maxAge` fixes, recursive sitemap support, Vue/Angular router normalization, and skipping subdomain logic for IP addresses.
- Partial Results for Multi-Source Search instead of failing all sources.
- Concurrency Metadata Added to scrape job responses.
- Enhanced Metrics including total wait time, LLM usage, and format details.
- Batch Scrape Upgrades
  - Added missing `/v2/batch/scrape/:jobId/errors` endpoint
  - Fixed pagination off-by-one bug
- More Robust Error Handling for PDF/document engines, pydantic parsing, Zod validation, URL validation, and billing edge cases.
SDK Improvements
JavaScript SDK
- Returns job ID from synchronous methods.
- Improved WebSocket `document` event handling.
- Fixed types, Deno WS, and added support for `ignoreQueryParameter`.
- Version bump with internal cleanup.
Python SDK
- Added extra metadata fields.
- Improved batch validation handling.
Quality & Performance
- Reduced log file size and improved tmp file cleanup.
- Updated Express version and patched vulnerable packages.
- Disabled markdown conversion for sitemap scrapes for improved performance.
- Better precrawl logging and formatting.
- Skip URL rewriting for published Google Docs.
- Prevent empty cookie headers during webhook callbacks.
Self-Hosted Improvements
- Disabled concurrency limit enforcement for self-hosted mode.
- PostgreSQL credentials now configurable via environment variables.
- Docker-compose build instructions fixed.
Full Changelog: v2.6.0...v2.7.0
What's Changed
- (feat/dns) DNS Resolution errors should be a 200 by @nickscamara in #2402
- Improve Logo and Color Detection on Branding Extract by @abimaelmartell in #2362
- (js-sdk) fix: ws 'document' event implementation by @rafaelsideguide in #2415
- (js-sdk): Return job ID from synchronous methods by @abimaelmartell in #2414
- (js-sdk): Fix types by @abimaelmartell in #2416
- (js-sdk): Bump Version by @abimaelmartell in #2417
- fix(api): disable gcs logging without db auth by @delong3 in #2418
- feat: noq scrape system by @delong3 in #2419
- Muv2 exp add more logs by @tomkosm in #2421
- feat(api): noq concurrency check integration by @delong3 in #2424
- Fix concurrency backfill bug by @micahstairs in #2425
- feat: redirect to docs when hitting main api endpoint by @amplitudesxd in #2426
- feat(api): total wait time in request metrics by @delong3 in #2428
- (fix/search) rm legacy external search apis by @nickscamara in #2420
- fix(api): various vulnerable packages (2025/11/21) by @mogery in #2431
- fix(api): pdf + document engines not respecting skipTlsVerification flag and error handling for uncidi by @delong3 in #2435
- fix(api): /map returning less urls with sitemap include by @delong3 in #2440
- feat: resolve redirects before mapping urls by @amplitudesxd in #2439
- fix(api): tally system rework by @mogery in #2430
- fix(api): update URL handling of resolved redirects by @amplitudesxd in #2442
- fix(api): opaque fire engine delete + poll interval by @delong3 in #2443
- feat(api): add abort timeout for resolveRedirects by @amplitudesxd in #2444
- (python-sdk) feat: added extra fields to metadata by @rafaelsideguide in #2441
- fix(api): handle case with no billed teams in tallyBilling function by @amplitudesxd in #2446
- fix: Add support for ignoreQueryParameter in map SDKs by @Chadha93 in #2429
- fix(api): vue + angular router url normalization by @delong3 in #2447
- feat(api): usedLlm + formats in request metrics by @delong3 in #2448
- feat(api): switch to uuidv7 by @mogery in #2449
- Add Sentry Settings by @abimaelmartell in #2451
- Filter Sentry Exceptions by @abimaelmartell in #2453
- Add minAge parameter to scrape (ENG-4073) by @amplitudesxd in #2452
- Fix typos by @omahs in #2457
- fix docker-compose service build instructions by @davidkhala in #2406
- Cleanup tmp files from downloadFile by @abimaelmartell in #2455
- Optimize Logs File Size by @abimaelmartell in #2456
- Update express version by @abimaelmartell in #2465
- Annotate test failures on CI by @abimaelmartell in #2462
- fix(api/precrawl): precrawl logging + format + skip index by @delong3 in #2466
- fix(go-html-to-md): request body max 60MB by @delong3 in #2467
- fix(api): dns + crawl denial errors by @delong3 in #2469
- Disable markdown conversion for sitemap scrapes by @abimaelmartell in #2461
- Add missing /v2/batch/scrape/:jobId/errors endpoint by @devin-ai-integration[bot] in #2471
- fix: improve pydantic parsing error handling | ENG-4070 by @Chadha93 in #2450
- fix: Make PostgreSQL credentials configurable via environment variables by @DraPraks in #2388
- feat(api): create users via referring integrations by @mogery in #2463
- feat: muv2 exp apikey env by @tomkosm in #2472
- (feat/search) Enterprise params by @nickscamara in #2412
- Validate UUID from URL in Requests by @abimaelmartell in #2392
- Disable Concurrency Limit on Self Hosted by @abimaelmartell in #2475
- (js sdk)fix/ws deno by @rafaelsideguide in #2476
- fix(api): sitemap max age for map requests by @delong3 in #2479
- fix(api): sitemap max age for recursive sitemaps by @delong3 in #2480
- feat: new app database shape by @mogery in #2445
- chore(api): disable x-powered-by by @amplitudesxd in #2483
- Skip subdomain logic for IP addresses by @abimaelmartell in #2477
- Attempt Fix Search Tests by @abimaelmartell in #2478
- fix(api): don't bill where stealth proxy was unsupported by @amplitudesxd in #2484
- feat(extract): port to billing credits by @mogery in #2482
- feat: [self-host] - add support to configure concurrency for crawl...
v2.6.0
Highlights
- Unified Billing Model - Credits and tokens merged into a single system. Extract now uses credits (15 tokens = 1 credit); existing tokens work everywhere.
- Full Release of Branding Format - Full support across Playground, MCP, JS and Python SDKs.
- Change Tracking - Faster and more reliable detection of web page content updates.
- Reliability and Speed Improvements - All endpoints significantly faster with improved reliability.
- Instant Credit Purchases - Buy credit packs directly from dashboard without waiting for auto-recharge.
- Improved Markdown Parsing - Enhanced markdown conversion and main content extraction accuracy.
- Core Stability Fixes - Fixed change-tracking issues, PDF timeouts, and improved error handling.
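The unified billing rate above (15 tokens = 1 credit) can be written as a small helper. This is a sketch of the stated conversion only; the round-up behavior for partial credits is an assumption, not documented billing logic:

```python
import math

TOKENS_PER_CREDIT = 15  # stated rate: 15 tokens = 1 credit

def tokens_to_credits(tokens: int) -> int:
    """Convert legacy extract tokens to credits at the 15:1 rate,
    rounding up so a partial credit is not undercounted (assumed)."""
    return math.ceil(tokens / TOKENS_PER_CREDIT)

assert tokens_to_credits(15) == 1
assert tokens_to_credits(30) == 2
assert tokens_to_credits(16) == 2  # partial credits round up (assumed)
```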
What's Changed
- fix(mu): Bug fix on v2 exp by @tomkosm in #2345
- Allow index use with waitFor (ENG-3481) by @amplitudesxd in #2346
- Fix autoCharge return, add top level guard by @abimaelmartell in #2341
- fix: import MAX_MAP_LIMIT from types.ts to resolve 1000 URL cap by @prashu0705 in #2333
- chore: improve llm extract logging by @amplitudesxd in #2348
- fix: error truncation by @amplitudesxd in #2349
- feat(go-html-to-md): enhance markdown conversion with robust PRE and … by @rafaelsideguide in #2321
- feat(billing): merge credits and tokens by @mogery in #2352
- chore: update geoip database by @amplitudesxd in #2354
- Implement Branding Format by @abimaelmartell in #2326
- Filter non-HTTP(S) protocols with separate error message by @devin-ai-integration[bot] in #2357
- Add branding format support to JS and Python SDKs by @devin-ai-integration[bot] in #2360
- Fix: Handle invalid favicon URLs gracefully in metadata extraction by @abimaelmartell in #2361
- feat: allow disabling webhook delivery by @amplitudesxd in #2367
- feat: add engine forcing by domain pattern by @devin-ai-integration[bot] in #2371
- revert nuq commits by @amplitudesxd in #2376
- CI: Remove npm audit from server tests by @abimaelmartell in #2385
- ci: Fix dependency audit by @abimaelmartell in #2386
- (fix/ctracking) Fix change tracking issues by @nickscamara in #2391
- update: Adds support for recursive schema for `python-sdk` with model selection by @Chadha93 in #2266
- fix: image search field mapping in Python SDK by @naaa760 in #2244
- fix(api/scrape): document + pdf scrape loop by @delong3 in #2396
New Contributors
- @prashu0705 made their first contribution in #2333
- @naaa760 made their first contribution in #2244
Full Changelog: v2.5.0...v2.6.0
v2.5.0 - The World's Best Web Data API
We now have the highest quality and most comprehensive web data API available, powered by our new semantic index and custom browser stack.
New Features
- Implemented scraping for `.xlsx` (Excel) files.
- Introduced new crawl architecture and NUQ concurrency tracking system.
- Per-owner/group concurrency limiting + dynamic concurrency calculation.
- Added group backlog handling and improved group operations.
- Added `/search` pricing update.
- Added team flag to skip country check.
- Always populate NUQ metrics for improved observability.
- New test-site app for improved CI testing.
- Extract metadata from document head for richer output.
Enhancements & Improvements
- Improved blocklist loading and unsupported site error messages.
- Updated x402-express version.
- Improved includePaths handling for subdomains.
- Updated self-hosted search to use DuckDuckGo.
- JS & Python SDKs no longer require API key for self-hosted deployments.
- Python SDK timeout handling improvements.
- Rust client now uses `tracing` instead of `print`.
- Reduced noise in auto-recharge Slack notifications.
Fixes
- Ensured crawl robots.txt warnings surface reliably.
- Resolved concurrency deadlocks and duplicate job handling.
- Fixed search country defaults and pricing logic bugs.
- Fixed port conflicts in harness environments.
- Fixed viewport dimension support and screenshot behavior in Playwright.
- Resolved CI test flakiness (playwright cache, prod tests).
Full diff: v2.4.0...v2.5.0
What's Changed
- More verbose blocklist loading errors by @amplitudesxd in #2277
- Update x402-express Version by @abimaelmartell in #2279
- Revise unsupported site error message by @micahstairs in #2286
- feat: index precrawl by @delong3 in #2289
- fix: ensure includePaths apply to subdomains when allowSubdomains is enabled by @abimaelmartell in #2278
- Fix search country parameter to default to undefined when location is set by @devin-ai-integration[bot] in #2283
- Fix Port Conflict in Harness by @abimaelmartell in #2285
- js-sdk: require API key only for cloud API (not self-hosted) by @abimaelmartell in #2237
- feat: Implement Scraping Excel xlsx files by @abimaelmartell in #2284
- feat(nuq): concurrency tracking by @mogery in #2291
- fix(crawl): surface robots.txt warning reliably by @ftonato in #2287
- feat(nuq): add source for max_concurrency by @mogery in #2293
- feat(nuq/concurrency-tracking): fix deadlock by @mogery in #2295
- Replace self-hosted Google with DDG search (ENG-3499) by @amplitudesxd in #2225
- python-sdk: Fix timeout handling across api calls by @abimaelmartell in #2288
- python-sdk: Don't require API Key when running Self Hosted by @abimaelmartell in #2290
- Add team flag to skip country check by @devin-ai-integration[bot] in #2300
- Update /search endpoint pricing to 2 credits per 10 search results by @devin-ai-integration[bot] in #2299
- Fix search pricing bug by @devin-ai-integration[bot] in #2301
- feat(nuq): per-owner-per-group concurrency limiting by @mogery in #2302
- update: handle circular refs as well in recursive schema by @Chadha93 in #2298
- feat(nuq): dynamically calculate current concurrency by @mogery in #2305
- feat(nuq): group_id, job backlogs, and group add operations by @mogery in #2309
- feat(ci): new test-site app + updated jest tests by @delong3 in #2312
- feat: new crawl architecture by @mogery in #2320
- Moved index for backlog query after the table creation by @c4nc in #2323
- fix(ci): playwright cache + prod tests by @delong3 in #2314
- Improve slack notifications for scale auto-recharges by @micahstairs in #2325
- Make auto-recharge notifications less noisy by @micahstairs in #2327
- fix: viewport dimension support for Playwright engine screenshots by @ftonato in #2329
- feat: always populate nuq metrics by @amplitudesxd in #2328
- fix: scrape viewport test by @amplitudesxd in #2330
- Revert "Merge pull request #2329 from firecrawl/devin/ENG-3639-175924… by @micahstairs in #2332
- fix(nuq): per-instance listen channel ID by @mogery in #2336
- fix(auto_charge): add a cooldown to the new recharge route by @mogery in #2338
- chore: update last scrape rpc by @amplitudesxd in #2339
- Rust client: use `tracing` instead of print by @codetheweb in #2324
- Extract metadata from document head (ENG-3822) by @amplitudesxd in #2342
- fix(nuq,concurrency-limit): handle if there are duplicate jobs in the concurrency queue by @mogery in #2343
New Contributors
- @delong3 made their first contribution in #2289
- @c4nc made their first contribution in #2323
- @codetheweb made their first contribution in #2324
Full Changelog: v2.4.0...v2.5.0
v2.4.0
New Features
- New PDF Search Category — You can now search only for PDFs via our v2 `/search` endpoint by specifying the `.pdf` category
- Gemini 2.5 Flash CLI Image Editor — Create and edit images directly in the CLI using Firecrawl + Gemini 2.5 Flash integration (#2172)
- x402 Search Endpoint (`/v2/x402`) — Added a next-gen search API with improved accuracy and speed (#2218)
- RabbitMQ Event System — Firecrawl jobs now support event-based communication and prefetching from Postgres (#2230, #2233)
- Improved Crawl Status API — More accurate and real-time crawl status reporting using the new `crawl_status_2` RPC (#2239)
- Low-Results & Robots.txt Warnings — Users now receive clear feedback when crawls are limited by robots.txt or yield few results (#2248)
- Enhanced Tracing (OpenTelemetry) — Much-improved distributed tracing for better observability across services (#2219)
- Metrics & Analytics — Added request-level metrics for both Scrape and Search endpoints (#2216)
- Self-Hosted Webhook Support — Webhooks can now be delivered to private IP addresses for self-hosted environments (#2232)
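A search request using the new PDF category might look like the sketch below. The `categories` field name and the exact category value are assumptions; the notes only say to specify the `.pdf` category on the v2 `/search` endpoint:

```json
{
  "query": "transformer architecture survey",
  "categories": ["pdf"]
}
```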
Improvements
- Reduced Docker Image Size — Playwright service image size reduced by 1 GB by only installing Chromium (#2210)
- Python SDK Enhancements — Added `"cancelled"` job status handling and poll interval fixes (#2240, #2265)
- Faster Node SDK Timeouts — Axios timeouts now propagate correctly, improving reliability under heavy loads (#2235)
- Improved Crawl Parameter Previews — Enhanced prompts and validation for crawl parameter previews (#2220)
- Zod Schema Validation — Stricter API parameter validation with rejection of extra fields (#2058)
- Better Redis Job Handling — Fixed edge cases in `getDoneJobsOrderedUntil` for more stable Redis retrieval (#2258)
- Markdown & YouTube Fixes — Fixed YouTube cache and empty markdown summary bugs (#2226, #2261)
- Updated Docs & Metadata — README updates and new metadata fields added to the JS SDK (#2250, #2254)
- Improved API Port Configuration — The API now respects environment-defined ports (#2209)
Fixes
- Fixed recursive `$ref` schema validation edge cases (#2238)
- Fixed enum arrays being incorrectly converted to objects (#2224)
- Fixed harness timeouts and self-hosted `docker-compose.yaml` issues (#2242, #2252)
🔗 Full Changelog: v2.3.0 → v2.4.0
What's Changed
- fix: add missing `poll_interval` param in watcher by @Chadha93 in #2155
- feat: Add Firecrawl + Gemini 2.5 Flash Image CLI Editor by @MAVRICK-1 in #2172
- Add environment variable to disable blocklist by @amplitudesxd in #2197
- Fix ARM builds by @amplitudesxd in #2198
- fix(v1/search): if f-e search is available, only use that by @mogery in #2199
- Upgrade html-to-markdown dependency (ENG-3563) by @amplitudesxd in #2195
- feat(map): add crawler and scrape options to job logging by @ftonato in #2203
- refactor: integrate facilitator in payment middleware by @ftonato in #2213
- (feat/metrics) Scrape and Search Request Metrics by @nickscamara in #2216
- (feat/big-query) Big Query by @nickscamara in #2217
- feat(api): add x402 search endpoint to /v2 by @ftonato in #2218
- feat(api/otel): much improved tracing by @mogery in #2219
- fix: Add Zod validation to reject additionalProperties in schema parameters by @devin-ai-integration[bot] in #2058
- Reduce playwright-service image size by 1 GB by installing only Chromium by @bernie43 in #2210
- fix: enum arrays being converted to objects by @Chadha93 in #2224
- feat(nuq): RabbitMQ support for job finish events and waiting by @mogery in #2230
- fix: Use port from env.PORT for API by @abimaelmartell in #2209
- feat(nuq/rabbitmq): add prefetching jobs from psql to rabbitmq by @mogery in #2233
- fix: skip summary generation when markdown is empty by @devin-ai-integration[bot] in #2226
- Propagate timeout to Axios in Node SDK (ENG-3474) by @amplitudesxd in #2235
- feat(api/crawl-status): use crawl_status_2 RPC by @mogery in #2239
- Allow self-hosted webhook delivery to private IP addresses by @abimaelmartell in #2232
- Update harness timeout by @amplitudesxd in #2242
- python-sdk: include "cancelled" in CrawlJob.status and exit wait loop on cancel (fixes #2190) by @Jeelislive in #2240
- feat(api/ci): test with RabbitMQ on prod by @mogery in #2241
- (fix/crawl-params) Enhance crawl param preview prompt further by @nickscamara in #2220
- build(deps): bump actions/checkout from 3 to 5 by @dependabot[bot] in #2115
- fix: harness by @amplitudesxd in #2249
- Fix a self-hosted docker-compose.yaml bug caused by a recent firecrawl change by @th3w1zard1 in #2252
- fix: handle `$ref` for recursive schema validation by @Chadha93 in #2238
- Add missing metadata fields to JS SDK (ENG-3439) by @amplitudesxd in #2250
- Update README.md by @nickscamara in #2254
- fix: handle edge case in getDoneJobsOrderedUntil function for Redis job retrieval by @ftonato in #2258
- Fix YouTube cache markdown bug by @devin-ai-integration[bot] in #2261
- feat(api): add warnings for low results and robots.txt restrictions in map and crawl controllers by @ftonato in #2248
- Test new mu alternative by @tomkosm in #2263
- chore(python-sdk): Bump version to 4.3.7 for poll_interval fix by @devin-ai-integration[bot] in #2265
- Feat/test new mu alt by @tomkosm in #2267
- (feat/search-index) Search Index by @nickscamara in #2268
- Feat/test new mu alt by @tomkosm in #2270
- (feat/search-index) Separate service by @nickscamara in #2271
- fix: additional `queue_scrape` for nuq schema by @Chadha93 in #2272
- (feat/search) Pdf search category by @nickscamara in #2276
New Contributors
- @Chadha93 made their first contribution in #2155
- @MAVRICK-1 made their first contribution in #2172
- @bernie43 made their first contribution in #2210
- @abimaelmartell made their first contribution in #2209
- @th3w1zard1 made their first contribution in #2252
Full Changelog: v2.3.0...v2.4.0
v2.3.0
New Features
- YouTube Support: You can now get YouTube transcripts
- Enterprise Auto-Recharge: Added enterprise support for auto-recharge
- .odt and .rtf: Now supports `.odt` and `.rtf` file parsing
- Docx Parsing: 50x faster docx parsing
- K8s Deployment: Added NuQ worker deployment example
- Self Host: Tons of improvements for our self host users
Improvements & Fixes
- Stability: Fixed timeout race condition, infinite scrape loop, and location query bug
- Tooling: Replaced ts-prune with knip, updated pnpm with minimumReleaseAge
- Docs: Added Rust to CONTRIBUTING and fixed typos
- Security: Fixed `pkgvuln` issue
What's Changed
- Update blocklist by @micahstairs in #2150
- docs: fix typo and punctuation in CONTRIBUTING.md by @jarrensj in #2149
- Fix timeout error message race condition for ENG-3372 by @devin-ai-integration[bot] in #2144
- Add exceptions to blocklist by @micahstairs in #2156
- fix: pkgvuln by @mogery in #2158
- Replace ts prune with knip (ENG-3540) by @amplitudesxd in #2148
- feat(auto-recharge): enterprise by @mogery in #2127
- feat(scrapeURL/index): index metrics by @mogery in #2160
- Update pnpm and add minimumReleaseAge (ENG-3560) by @amplitudesxd in #2162
- feat(api/scrapeURL): add special support for YouTube watch pages by @mogery in #2157
- fix(scrapeURL/index): locations array querying bug by @mogery in #2164
- Fix infinite loop when scraping a forbidden webpage (ENG-3339) by @amplitudesxd in #2147
- Add Rust to CONTRIBUTING by @oalsing in #2180
- feat(scrapeURL/summary): use gpt-5-mini by @mogery in #2174
- Custom Rust document parser (ENG-3489) by @amplitudesxd in #2159
- feat: add NuQ worker deployment to Kubernetes examples by @devin-ai-integration[bot] in #2163
- feat(api): move blocklist to DB by @mogery in #2186
Full Changelog: v2.2.0...v2.3.0
v2.2.0
Features
- MCP version 3 is live. Stable support for cloud MCP with HTTP Transport and SSE modes. Compatible with v2 and v1.
- Webhooks: Now we support signatures + extract support + event failures
- Map is now 15x faster + supports more URLs
- Search reliability improvements
- Usage is now tracked by API Key
- Support for additional locations (CA, CZ, IL, IN, IT, PL, and PT)
- Queue status endpoint
- Added `maxPages` parameter to v2 scrape API for PDF parsing
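The new `maxPages` option for PDF parsing might be passed like the sketch below. Only the parameter name comes from these notes; the surrounding `parsers` shape is an assumption about how the option nests in a v2 scrape request:

```json
{
  "url": "https://example.com/report.pdf",
  "parsers": [{ "type": "pdf", "maxPages": 10 }]
}
```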
Improvements
- API:
  - New `/team/queue-status` endpoint.
  - Added `nuq` feature.
  - Added `VIASOCKET` integration.
  - Historical credit/token usage endpoints with expanded data.
- Student Program: Support for more universities + students to get free credits through our student program
- Map: 15x faster and increased the limit to 100k
- Scrape API: Added `maxPages` parameter for PDF parser.
- Python SDK:
  - Added `get_queue_status` to aio + normalization of docs in search results.
- SDKs: Added next cursor pagination and integration param support.
- Infrastructure: Added static IP proxy pool + proxy location support.
- Webhooks: Implemented signatures, refactored sending, added scrape error events.
- Performance: Optimized map, converted Rust natives to single NAPI library.
- CI/CD: Revamped CI, added pre-commit hooks, cross-platform harness.
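With webhook signatures implemented, receivers can check payload authenticity before trusting an event. The sketch below shows a typical HMAC-SHA256 verification pattern; the header name, hex encoding, and exact signing scheme are assumptions, not taken from Firecrawl's documentation:

```python
import hashlib
import hmac

def verify_webhook(secret: str, payload: bytes, signature: str) -> bool:
    """Recompute the HMAC-SHA256 of the raw request body and compare it,
    in constant time, against the signature delivered with the webhook."""
    expected = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

# Simulated delivery: sign a body with a shared secret, then verify it.
body = b'{"type":"crawl.completed","id":"job-123"}'
sig = hmac.new(b"shh", body, hashlib.sha256).hexdigest()
assert verify_webhook("shh", body, sig)
assert not verify_webhook("wrong-secret", body, sig)
```

Verifying against the raw bytes (before any JSON parsing) and using `hmac.compare_digest` avoids both re-serialization mismatches and timing side channels.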
🐛 Fixes
- Corrected concurrency limit scaling.
- Fixed search result links/descriptions and retry mechanism for empty results.
- Re-signed expired screenshot URLs.
- Trimmed null chars from PDF titles + fixed encoding.
- Fixed sitemap parsing and added `.gz` sitemap support.
- Fixed js-sdk `zod-to-json-schema` import.
- Fixed webhook data format regression.
- Improved credit handling in account object.
🛠️ Chores & Other
- Removed unused dependencies, updated CONTRIBUTING.md.
- Added debug logging, ignored scripts during CI build.
- Various dependency bumps and build improvements.
🔗 Full Changelog: v2.1.0...v2.2.0
What's Changed
- feat(sdks): next cursor pagination by @rafaelsideguide in #2067
- feat: add maxPages parameter to PDF parser in v2 scrape API by @devin-ai-integration[bot] in #2047
- fix(concurrency-limit): scale! by @mogery in #2071
- feat(api): add /team/queue-status endpoint by @mogery in #2063
- build(deps): bump actions/checkout from 3 to 5 by @dependabot[bot] in #1998
- build(deps): bump actions/setup-python from 4 to 5 by @dependabot[bot] in #2028
- build(deps): bump docker/login-action from 1 to 3 by @dependabot[bot] in #1996
- build(deps): bump docker/build-push-action from 5 to 6 by @dependabot[bot] in #1995
- build(deps): bump actions/setup-node from 3 to 4 by @dependabot[bot] in #1997
- feat: historical credit/token usage endpoints + more data in existing usage endpoints by @mogery in #2077
- fix(api/tsconfig): remove baseUrl by @mogery in #2078
- fix(search): get links and descriptions correctly by @mogery in #2076
- fix(api/crawler/sitemap): bump sitemap limit by @mogery in #2079
- fix(api/scrapeURL/index): re-sign expired screenshot URLs by @mogery in #2080
- fix(python-sdk): added missing get_queue_status in aio and added to t… by @rafaelsideguide in #2081
- Fix go-html-to-md on Windows (ENG-3398) by @amplitudesxd in #2082
- fix(js-sdk): zod-to-json-schema import by @rafaelsideguide in #2083
- Replace custom address validation functions with ipaddr.js (ENG-3404) by @amplitudesxd in #2084
- fix(api/native/pdf-parser): trim null chars out of pdf titles by @mogery in #2086
- Fix sitemap parsing (ENG-3361) by @amplitudesxd in #2085
- Implement webhook signatures (ENG-3018) by @amplitudesxd in #2087
- Format api and add pre-commit hooks (ENG-3408) by @amplitudesxd in #2088
- Fix pre-commit hook by @amplitudesxd in #2089
- Ignore scripts during CI build by @amplitudesxd in #2090
- Add more debug logging to crawler by @amplitudesxd in #2091
- Add proxy location support to crawl and map endpoints (ENG-3361) by @amplitudesxd in #2092
- post-incident changes by @mogery in #2095
- feat(python-sdk): normalize docs in search results by @rafaelsideguide in #2098
- Feat(sdks): integration param by @rafaelsideguide in #2096
- Refactor webhook sending (ENG-3426) by @amplitudesxd in #2094
- Fix webhook data format regression by @amplitudesxd in #2106
- Update Type Annotations for v2 Async Search (SearchResponse → SearchData) by @devin-ai-integration[bot] in #2097
- Fire webhook events for scrape/batch scrape errors (ENG-3463) by @amplitudesxd in #2107
- Add static IP proxy pool (ENG-3420) by @amplitudesxd in #2103
- Convert all Rust natives to a single library using NAPI (ENG-3397) by @amplitudesxd in #2105
- Filter out invalidated index records (ENG-3396) by @amplitudesxd in #2102
- feat(sdk): added agent option by @rafaelsideguide in #2108
- feat(api): nuq by @mogery in #1984
- Make harness cross-platform compatible (ENG-3477) by @amplitudesxd in #2110
- Update error messages for self-hosted instances by @devin-ai-integration[bot] in #2119
- Revise /extract's error message when no content could be fetched from URLs by @micahstairs in #2109
- Fix graceful credit handling in account object (ENG-3495) by @amplitudesxd in #2120
- fix(scrapeURL/f-e/scrape): bad failed schema by @mogery in #2123
- feat(ci): revamp by @mogery in #2124
- fix(api/native/pdf): get title with proper encoding by @mogery in #2125
- chore: remove unused dependencies + various CI fixes by @mogery in #2128
- Perform watching inside of harness (ENG-3514) by @amplitudesxd in #2131
- Add gzipped sitemap support (ENG-3520) by @amplitudesxd in #2132
- Remove .xml.gz from `include` entries by @amplitudesxd in #2134
- fix(api): rearchitect crawl kickoff by @mogery in #2133
- Update CONTRIBUTING.md by @nickscamara in #2141
- Update CONTRIBUTING.md by @nickscamara in #2142
- (fix/search) Implement retry mechanisms for empty results by @nickscamara in #2140
- Optimize map (ENG-3526) by @amplitudesxd in #2138
- feat(api): add VIASOCKET integration by @ftonato in #2143
v2.1.0
Firecrawl v2.1.0 is here!
✨ New Features
- Search Categories: Filter search results by specific categories using the `categories` parameter:
  - `github`: Search within GitHub repositories, code, issues, and documentation
  - `research`: Search academic and research websites (arXiv, Nature, IEEE, PubMed, etc.)
  - More coming soon
- Image Extraction: Added image extraction support to the v2 scrape endpoint.
- Data Attribute Scraping: Now supports extraction of `data-*` attributes.
- Hash-Based Routing: Crawl endpoints now handle hash-based routes.
- Improved Google Drive Scraping: Added ability to scrape TXT, PDF, and Sheets from Google Drive.
- PDF Enhancements: Extracts PDF titles and shows them in metadata.
- API Enhancements:
  - Map endpoint supports up to 100k results.
- Helm Chart: Initial Helm chart added for Firecrawl deployment.
- Security: Improved protection against XFF spoofing.
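A minimal sketch of a v2 search request using the new `categories` filter; the query and `limit` value are illustrative, not from these notes, and the key is a placeholder:

```shell
curl -X POST https://api.firecrawl.dev/v2/search \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -d '{
    "query": "web scraping rate limiting",
    "categories": ["github", "research"],
    "limit": 5
  }'
```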
🛠 Fixes
- Fixed UTF-8 encoding in Google search scraper.
- Restored crawl status in preview mode.
- Fixed missing methods in Python SDK.
- Corrected JSON response handling for v2 search with `scrapeOptions.formats`.
- Fixed field population for `credits_billed` in v0 scrape.
- Improved document field overlay in v2 search.
👥 New Contributors
What's Changed
- fix: handle UTF-8 encoding properly in Google search scraper by @kelter-antunes in #1924
- feat(api): add image extraction support to v2 scrape endpoint by @vishkrish200 in #2008
- feat(api): support extraction of data-* attributes in scrape endpoints by @vishkrish200 in #2006
- feat: add initial Helm chart for Firecrawl deployment by @JakobStadlhuber in #1262
- feat(api/crawl): support hash-based routing by @mogery in #2031
- fix(python-sdk): missing methods in client by @rafaelsideguide in #2050
- feat(countryCheck): better protection against XFF spoofing by @mogery in #2051
- fix: include json in v2 /search response when using scrapeOptions.formats by @ieedan in #2052
- feat(scrapeURL/rewrite): scrape Google Drive TXT/PDF files and sheets by @mogery in #2053
- Update README.md by @nickscamara in #2060
- (fix/crawl) Re-enable crawl status in preview mode by @nickscamara in #2061
- feat(pdf-parser): get PDF title and show in metadata by @mogery in #2062
- fix(v2/search): overlay doc fields via spread operator by @mogery in #2054
- feat(api): propagate api_key_id towards billing function by @mogery in #2049
- feat(api/map): use new RPCs + set limit max to 100k by @mogery in #2065
- fix(api/v0/scrape): populate credits_billed field by @mogery in #2066
New Contributors
- @kelter-antunes made their first contribution in #1924
- @vishkrish200 made their first contribution in #2008
- @ieedan made their first contribution in #2052
Full Changelog: v2.0.1...v2.1.0
v2.0.1
This release fixes the "SSRF Vulnerability via malicious webhook" security advisory. It is recommended that people using the self-hosted version of Firecrawl update to v2.0.1 immediately. More info in the advisory: GHSA-p2wg-prhf-jx79
v2.0.0
Introducing v2.0.0
Key Improvements
- Faster by default: Requests are cached with `maxAge` defaulting to 2 days, and sensible defaults like `blockAds`, `skipTlsVerification`, and `removeBase64Images` are enabled.
- New summary format: You can now specify `"summary"` as a format to directly receive a concise summary of the page content.
- Updated JSON extraction: JSON extraction and change tracking now use an object format: `{ type: "json", prompt, schema }`. The old `"extract"` format has been renamed to `"json"`.
- Enhanced screenshot options: Use the object form: `{ type: "screenshot", fullPage, quality, viewport }`.
- New search sources: Search across `"news"` and `"images"` in addition to web results by setting the `sources` parameter.
- Smart crawling with prompts: Pass a natural-language `prompt` to crawl and the system derives paths/limits automatically. Use the new crawl-params-preview endpoint to inspect the derived options before starting a job.
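The new summary format, for example, is just another entry in `formats` on the v2 scrape endpoint (API key is a placeholder):

```shell
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -d '{
    "url": "https://docs.firecrawl.dev/",
    "formats": ["summary"]
  }'
```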
Quick migration checklist
- Replace v1 client usage with v2 clients:
  - JS: `const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' })`
  - Python: `firecrawl = Firecrawl(api_key='fc-YOUR-API-KEY')`
  - API: use the new `https://api.firecrawl.dev/v2/` endpoints.
- Update formats:
  - Use `"summary"` where needed
  - JSON mode: Use `{ type: "json", prompt, schema }` for JSON extraction
  - Screenshot and Screenshot@fullPage: Use the screenshot object format when specifying options
- Adopt standardized async flows in the SDKs:
  - Crawls: `startCrawl` + `getCrawlStatus` (or `crawl` waiter)
  - Batch: `startBatchScrape` + `getBatchScrapeStatus` (or `batchScrape` waiter)
  - Extract: `startExtract` + `getExtractStatus` (or `extract` waiter)
- Crawl options mapping (see below)
- Check crawl `prompt` with crawl-params-preview
SDK surface (v2)
JS/TS
Method name changes (v1 → v2)
Scrape, Search, and Map
| v1 (FirecrawlApp) | v2 (Firecrawl) |
|---|---|
| `scrapeUrl(url, ...)` | `scrape(url, options?)` |
| `search(query, ...)` | `search(query, options?)` |
| `mapUrl(url, ...)` | `map(url, options?)` |
Crawling
| v1 | v2 |
|---|---|
| `crawlUrl(url, ...)` | `crawl(url, options?)` (waiter) |
| `asyncCrawlUrl(url, ...)` | `startCrawl(url, options?)` |
| `checkCrawlStatus(id, ...)` | `getCrawlStatus(id)` |
| `cancelCrawl(id)` | `cancelCrawl(id)` |
| `checkCrawlErrors(id)` | `getCrawlErrors(id)` |
Batch Scraping
| v1 | v2 |
|---|---|
| `batchScrapeUrls(urls, ...)` | `batchScrape(urls, opts?)` (waiter) |
| `asyncBatchScrapeUrls(urls, ...)` | `startBatchScrape(urls, opts?)` |
| `checkBatchScrapeStatus(id, ...)` | `getBatchScrapeStatus(id)` |
| `checkBatchScrapeErrors(id)` | `getBatchScrapeErrors(id)` |
Extraction
| v1 | v2 |
|---|---|
| `extract(urls?, params?)` | `extract(args)` |
| `asyncExtract(urls, params?)` | `startExtract(args)` |
| `getExtractStatus(id)` | `getExtractStatus(id)` |
Other / Removed
| v1 | v2 |
|---|---|
| `generateLLMsText(...)` | (not in v2 SDK) |
| `checkGenerateLLMsTextStatus(id)` | (not in v2 SDK) |
| `crawlUrlAndWatch(...)` | `watcher(jobId, ...)` |
| `batchScrapeUrlsAndWatch(...)` | `watcher(jobId, ...)` |
Type name changes (v1 → v2)
Core Document Types
| v1 | v2 |
|---|---|
| `FirecrawlDocument` | `Document` |
| `FirecrawlDocumentMetadata` | `DocumentMetadata` |
Scrape, Search, and Map Types
| v1 | v2 |
|---|---|
| `ScrapeParams` | `ScrapeOptions` |
| `ScrapeResponse` | `Document` |
| `SearchParams` | `SearchRequest` |
| `SearchResponse` | `SearchData` |
| `MapParams` | `MapOptions` |
| `MapResponse` | `MapData` |
Crawl Types
| v1 | v2 |
|---|---|
| `CrawlParams` | `CrawlOptions` |
| `CrawlStatusResponse` | `CrawlJob` |
Batch Operations
| v1 | v2 |
|---|---|
| `BatchScrapeStatusResponse` | `BatchScrapeJob` |
Action Types
| v1 | v2 |
|---|---|
| `Action` | `ActionOption` |
Error Types
| v1 | v2 |
|---|---|
| `FirecrawlError` | `SdkError` |
| `ErrorResponse` | `ErrorDetails` |
Python (sync)
Method name changes (v1 → v2)
Scrape, Search, and Map
| v1 | v2 |
|---|---|
| `scrape_url(...)` | `scrape(...)` |
| `search(...)` | `search(...)` |
| `map_url(...)` | `map(...)` |
Crawling
| v1 | v2 |
|---|---|
| `crawl_url(...)` | `crawl(...)` (waiter) |
| `async_crawl_url(...)` | `start_crawl(...)` |
| `check_crawl_status(...)` | `get_crawl_status(...)` |
| `cancel_crawl(...)` | `cancel_crawl(...)` |
Batch Scraping
| v1 | v2 |
|---|---|
| `batch_scrape_urls(...)` | `batch_scrape(...)` (waiter) |
| `async_batch_scrape_urls(...)` | `start_batch_scrape(...)` |
| `get_batch_scrape_status(...)` | `get_batch_scrape_status(...)` |
| `get_batch_scrape_errors(...)` | `get_batch_scrape_errors(...)` |
Extraction
| v1 | v2 |
|---|---|
| `extract(...)` | `extract(...)` |
| `start_extract(...)` | `start_extract(...)` |
| `get_extract_status(...)` | `get_extract_status(...)` |
Other / Removed
| v1 | v2 |
|---|---|
| `generate_llms_text(...)` | (not in v2 SDK) |
| `get_generate_llms_text_status(...)` | (not in v2 SDK) |
| `watch_crawl(...)` | `watcher(job_id, ...)` |
Python (async)
`AsyncFirecrawl` mirrors the same methods (all awaitable).
Formats and scrape options
- Use string formats for basics: `"markdown"`, `"html"`, `"rawHtml"`, `"links"`, `"summary"`.
- Instead of `parsePDF`, use `parsers: [ { "type": "pdf" } | "pdf" ]`.
- Use object formats for JSON, change tracking, and screenshots:
JSON format
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"url": "https://docs.firecrawl.dev/",
"formats": [{
"type": "json",
"prompt": "Extract the company mission from the page."
}]
}'

Screenshot format
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"url": "https://docs.firecrawl.dev/",
"formats": [{
"type": "screenshot",
"fullPage": true,
"quality": 80,
"viewport": { "width": 1280, "height": 800 }
}]
}'

Crawl options mapping (v1 → v2)
| v1 | v2 |
|---|---|
| `allowBackwardCrawling` | (removed) use `crawlEntireDomain` |
| `maxDepth` | (removed) use `maxDiscoveryDepth` |
| `ignoreSitemap` (bool) | `sitemap` (e.g., `"only"`, `"skip"`, or `"include"`) |
| (none) | `prompt` |
Crawl prompt + params preview
See crawl params preview examples:
curl -X POST https://api.firecrawl.dev/v2/crawl-params-preview \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"url": "https://docs.firecrawl.dev",
"prompt": "Extract docs and blog"
}'

What's Changed
- Add a couple exceptions to our blocked list by @micahstairs in #1816
- fix(api/v1/types): depth check throws error if URL is invalid by @mogery in #1821
- (feat/rtxt) Improved robots control on scrape via flags by @nickscamara in #1820
- fix/actions dict attributeError by @rafaelsideguide in https://gi...