fix(api/crawler/sitemap): bump sitemap limit#2079

Merged
mogery merged 2 commits into main from mogery/eng-3309
Sep 1, 2025

@mogery mogery commented Sep 1, 2025

Summary by cubic

Increase sitemap processing limit from 20 to 100 to improve crawl coverage on larger sites and avoid early cutoffs. Updates limit checks in crawler.ts and sitemap.ts to align with ENG-3309’s need for bigger sitemap support.

@mogery mogery requested a review from nickscamara as a code owner September 1, 2025 12:13

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 2 files

React with 👍 or 👎 to teach cubic. You can also tag @cubic-dev-ai to give feedback, ask questions, or re-run the review.

     }
 
-    if (this.sitemapsHit.size >= 20) {
+    if (this.sitemapsHit.size >= 100) {

Replace the magic number 100 with a shared constant to keep the sitemap limit consistent across modules and avoid future drift.

Prompt for AI agents
Address the following comment on apps/api/src/scraper/WebScraper/crawler.ts at line 867:

<comment>Replace the magic number 100 with a shared constant to keep the sitemap limit consistent across modules and avoid future drift.</comment>

<file context>
@@ -864,7 +864,7 @@ export class WebCrawler {
     }
 
-    if (this.sitemapsHit.size >= 20) {
+    if (this.sitemapsHit.size >= 100) {
       this.logger.warn("Sitemap limit hit!", { crawlId: this.jobId, url: this.baseUrl });
     }
</file context>
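
A minimal sketch of what the suggestion above could look like, not the change that was merged. The names `MAX_SITEMAPS` and `sitemapLimitHit` are hypothetical, as is the idea of placing them in a module shared by crawler.ts and sitemap.ts; only the value 100 and the `size >= limit` comparison come from the diff.

```typescript
// Hypothetical shared constant; in a real codebase it would be exported
// from one module and imported by both crawler.ts and sitemap.ts.
const MAX_SITEMAPS = 100;

// Hypothetical helper: returns true once the number of sitemaps already
// visited has reached the limit. Mirrors the `size >= 100` check in the diff.
function sitemapLimitHit(
  sitemapsHit: Set<string>,
  limit: number = MAX_SITEMAPS,
): boolean {
  return sitemapsHit.size >= limit;
}
```

With something like this in place, the check in the crawler could read `if (sitemapLimitHit(this.sitemapsHit)) { ... }`, so a later change to the limit would touch one line instead of two files.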

@mogery mogery left a comment

@cubic-dev-ai review pls

mogery commented Sep 1, 2025

@cubic-dev-ai review pls

cubic-dev-ai bot commented Sep 1, 2025

@cubic-dev-ai review pls

@mogery I've started the AI code review. It'll take a few minutes to complete.

@cubic-dev-ai cubic-dev-ai bot left a comment

No issues found across 2 files

@mogery mogery merged commit 68d3b84 into main Sep 1, 2025
9 of 11 checks passed