@cubic review
@nickscamara I've started the AI code review. It'll take a few minutes to complete.
3 issues found across 15 files
Prompt for AI agents (all 3 issues)
Understand the root cause of the following 3 issues and fix them.
<file name="apps/api/src/lib/search-index/query.ts">
<violation number="1" location="apps/api/src/lib/search-index/query.ts:102">
Because helpers already limit their results, slicing here makes any offset > 0 return an empty page. Pagination needs the upstream searches to fetch limit + offset (or another strategy) before slicing.</violation>
</file>
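A minimal sketch of the pagination failure and one possible fix. The real helpers in `query.ts` are not shown in this review, so `searchHelper`, `pageBuggy`, `pageFixed`, and the in-memory `CORPUS` are hypothetical stand-ins.

```typescript
// Hypothetical stand-in for an upstream search helper that caps its own results.
const CORPUS: string[] = ["a", "b", "c", "d", "e", "f", "g"];

function searchHelper(_query: string, limit: number): string[] {
  return CORPUS.slice(0, limit);
}

// Buggy shape: the helper returns at most `limit` rows, so slicing from
// `offset` leaves nothing once offset >= limit.
function pageBuggy(query: string, limit: number, offset: number): string[] {
  return searchHelper(query, limit).slice(offset, offset + limit);
}

// One possible fix: over-fetch limit + offset rows, then slice the page out.
function pageFixed(query: string, limit: number, offset: number): string[] {
  return searchHelper(query, limit + offset).slice(offset, offset + limit);
}
```

Over-fetching is the simplest option but grows with the offset; plumbing the offset into the underlying queries avoids that cost.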
<file name="apps/api/src/lib/search-index/pinecone-service.ts">
<violation number="1" location="apps/api/src/lib/search-index/pinecone-service.ts:70">
isPineconeEnabled returns false whenever PINECONE_INDEX_NAME is unset, but getPineconeIndex defaults to "firecrawl-search", so the Pinecone path is never exercised when relying on that fallback.</violation>
</file>
<file name="apps/api/src/lib/search-index/chunker.ts">
<violation number="1" location="apps/api/src/lib/search-index/chunker.ts:190">
When a section overflows, the overlap is computed from the new sentence rather than from the tail of the previous chunk, so the next chunk just repeats part of that sentence instead of carrying context forward. That breaks the intended overlap semantics and drops prior context.</violation>
</file>
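A minimal sketch of the intended overlap behavior, assuming character-based sizes rather than tokens. `chunkSentences` and its parameters are hypothetical and not taken from `chunker.ts`.

```typescript
// Illustrative sentence-based chunker; sizes are in characters for simplicity.
function chunkSentences(
  sentences: string[],
  maxChars: number,
  overlapChars: number
): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    if (current.length > 0 && current.length + sentence.length > maxChars) {
      chunks.push(current);
      // Overlap is taken from the tail of the chunk just emitted,
      // not from the incoming sentence.
      const overlap = current.slice(Math.max(0, current.length - overlapChars));
      current = overlap + sentence;
    } else {
      current += sentence;
    }
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

The sketch does not re-check the size limit after prepending the overlap, so a production version would still need to handle a sentence that is longer than `maxChars` on its own.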
React with 👍 or 👎 to teach cubic. Mention @cubic-dev-ai to give feedback, ask questions, or re-run the review.
@cubic re-run
@nickscamara I've started the AI code review. It'll take a few minutes to complete.
7 issues found across 17 files
Prompt for AI agents (all 7 issues)
Understand the root cause of the following 7 issues and fix them.
<file name="apps/api/src/lib/search-index/query.ts">
<violation number="1" location="apps/api/src/lib/search-index/query.ts:102">
Offset pagination always returns empty results when offset > 0 because the helpers already cap the list to `limit`. Fetch `limit + offset` items or plumb the offset into the queries before slicing.</violation>
</file>
<file name="apps/api/src/lib/search-index/chunker.ts">
<violation number="1" location="apps/api/src/lib/search-index/chunker.ts:164">
`currentOffset` stays 0 in the sentence-based branch, so all emitted chunks report a start offset of 0 rather than their true position in the document.</violation>
<violation number="2" location="apps/api/src/lib/search-index/chunker.ts:185">
Chunks created by splitting a large section always report `startOffset: section.offset`, so later chunks from that section point to the wrong location in the source text.</violation>
<violation number="3" location="apps/api/src/lib/search-index/chunker.ts:190">
Resetting the sentence-splitting chunk with `sentence.slice(...)` duplicates the current sentence (overlapText already contains it), so each new chunk includes the sentence twice and corrupts token counts.</violation>
</file>
<file name="apps/api/src/lib/search-index/service.ts">
<violation number="1" location="apps/api/src/lib/search-index/service.ts:418">
This metric increments the embedding quota even when the embedding batch failed, because it keyes off `embeddingsEnabled` instead of actual successful records.</violation>
<violation number="2" location="apps/api/src/lib/search-index/service.ts:426">
`embeddingsEnabled` only reflects feature toggles, so after a caught embedding failure this still returns `embeddingsGenerated: true`, preventing callers from detecting the failure and retrying.</violation>
</file>
<file name="apps/api/src/services/search-index-db.ts">
<violation number="1" location="apps/api/src/services/search-index-db.ts:64">
When SEARCH_INDEX_SUPABASE_* vars are missing, the proxy returns a function for every property, so nested accesses like `search_index_supabase_service.storage.from(...)` blow up with `TypeError` instead of the intended configuration error. Throw directly in the proxy when the client is absent.</violation>
</file>
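A minimal sketch of a proxy that throws as soon as any property is read on a missing client, so nested accesses surface the configuration error rather than a `TypeError`. `FakeClient` and `makeGuardedClient` are hypothetical; the real Supabase client type and the proxy in `search-index-db.ts` are not shown in this review.

```typescript
// Hypothetical stand-in for the Supabase client type.
type FakeClient = { storage: { from: (bucket: string) => string } };

function makeGuardedClient(client: FakeClient | null): FakeClient {
  return new Proxy({} as FakeClient, {
    get(_target, prop) {
      if (client === null) {
        // Throw on the first property access so nested lookups
        // never continue with an unusable value.
        throw new Error(
          "Search index Supabase client is not configured (accessed: " +
            String(prop) +
            ")"
        );
      }
      // Delegate to the real client when it exists.
      return (client as any)[prop as string];
    },
  });
}
```

With this shape, `makeGuardedClient(null).storage.from("x")` fails at `.storage` with the configuration message.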
6 issues found across 17 files
Prompt for AI agents (all 6 issues)
Understand the root cause of the following 6 issues and fix them.
<file name="apps/api/src/scraper/scrapeURL/transformers/sendToSearchIndex.ts">
<violation number="1" location="apps/api/src/scraper/scrapeURL/transformers/sendToSearchIndex.ts:41">
The Authorization/Cookie guard only checks exact-case keys, so lowercase headers ("authorization"/"cookie") slip through, letting authenticated or private content be indexed.</violation>
</file>
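A minimal sketch of a case-insensitive guard, assuming request headers arrive as a plain string map. `SENSITIVE_HEADERS` and `hasSensitiveHeader` are hypothetical names, not the code in `sendToSearchIndex.ts`.

```typescript
// Header names are compared in lowercase because HTTP header names are case-insensitive.
const SENSITIVE_HEADERS: string[] = ["authorization", "cookie"];

function hasSensitiveHeader(
  headers: Record<string, string> | undefined
): boolean {
  if (!headers) return false;
  return Object.keys(headers).some(
    (name) => SENSITIVE_HEADERS.indexOf(name.trim().toLowerCase()) !== -1
  );
}
```

A caller would skip indexing whenever `hasSensitiveHeader(requestHeaders)` returns true.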
<file name="apps/api/src/services/search-index-db.ts">
<violation number="1" location="apps/api/src/services/search-index-db.ts:64">
Returning a function here means nested Supabase properties (like auth.admin) resolve to undefined and trigger a TypeError instead of the intended configuration error, so the proxy no longer delivers the helpful message when the client is missing.</violation>
</file>
<file name="apps/api/src/lib/search-index/embeddings.ts">
<violation number="1" location="apps/api/src/lib/search-index/embeddings.ts:242">
The rate uses $0.0001 per million tokens, but OpenAI charges $0.02 per million for text-embedding-3-small, so this underestimates cost by 200×. Please update the multiplier to reflect the published pricing.</violation>
</file>
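The arithmetic behind the 200× figure can be sketched as follows, taking the $0.02 per million tokens price from the comment above as given. `estimateEmbeddingCostUsd` and the constant names are hypothetical and not the code in `embeddings.ts`.

```typescript
// Price per million tokens for text-embedding-3-small, as stated in the review comment.
const USD_PER_MILLION_TOKENS = 0.02;
// The rate the review says the code currently uses.
const CURRENT_USD_PER_MILLION_TOKENS = 0.0001;

function estimateEmbeddingCostUsd(tokens: number): number {
  return (tokens / 1000000) * USD_PER_MILLION_TOKENS;
}

// Ratio between the two rates: 0.02 / 0.0001 is approximately 200.
const UNDERESTIMATE_FACTOR =
  USD_PER_MILLION_TOKENS / CURRENT_USD_PER_MILLION_TOKENS;
```

Keeping the price in a single named constant makes it easier to update if the published pricing changes.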
<file name="apps/api/src/controllers/v2/f-search.ts">
<violation number="1" location="apps/api/src/controllers/v2/f-search.ts:117">
`hasMore` will always be false because `result.total` is capped at `offset + limit`, so clients never see that more pages are available.</violation>
</file>
<file name="apps/api/src/lib/search-index/chunker.ts">
<violation number="1" location="apps/api/src/lib/search-index/chunker.ts:164">
`startOffset`/`endOffset` metadata for sentence-based chunks is wrong because `currentOffset` never advances, so every chunk from this branch reports the document start instead of its true position.</violation>
</file>
<file name="apps/api/src/lib/search-index/service.ts">
<violation number="1" location="apps/api/src/lib/search-index/service.ts:426">
If embedding generation throws, this still reports `embeddingsGenerated` as true so callers and metrics think the vectors exist even though we logged a failure. Track the actual success (e.g. based on `pineconeRecords` or a flag) before returning.</violation>
</file>
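A minimal sketch of reporting success from what was actually produced rather than from the feature toggle. It is synchronous for brevity, and `indexDocument`, `IndexResult`, and the `embed` callback are hypothetical; the real flow in `service.ts` is not shown in this review.

```typescript
type IndexResult = { embeddingsGenerated: boolean; recordCount: number };

// `embed` is a hypothetical embedding call that may throw.
function indexDocument(
  chunks: string[],
  embeddingsEnabled: boolean,
  embed: (chunks: string[]) => number[][]
): IndexResult {
  let records: number[][] = [];
  if (embeddingsEnabled) {
    try {
      records = embed(chunks);
    } catch (err) {
      // A real implementation would log the error here; `records` stays
      // empty so the result reflects the failure.
      records = [];
    }
  }
  // Derive the flag (and any quota metric) from the records produced.
  return {
    embeddingsGenerated: records.length > 0,
    recordCount: records.length,
  };
}
```

The same `recordCount` could drive the embedding quota metric, so a failed batch is not counted.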
@cubic re-run
@nickscamara I've started the AI code review. It'll take a few minutes to complete.
3 issues found across 18 files
Prompt for AI agents (all 3 issues)
Understand the root cause of the following 3 issues and fix them.
<file name="apps/api/src/lib/search-index/query.ts">
<violation number="1" location="apps/api/src/lib/search-index/query.ts:184">
Hybrid mode hard-caps both the BM25 RPC and Pinecone query to 100 rows, so pagination after the first 100 documents fails (offsets ≥ 100 return empty results despite more matches). Please request at least the supplied fetchLimit so hybrid pagination works.</violation>
</file>
<file name="apps/api/src/lib/search-index/chunker.ts">
<violation number="1" location="apps/api/src/lib/search-index/chunker.ts:195">
Updating sentenceChunkOffset by the full chunk length drops the overlap’s original position, so the next chunk’s startOffset/endOffset no longer line up with the source text when overlapText is prepended.</violation>
</file>
<file name="apps/api/src/lib/search-index/service.ts">
<violation number="1" location="apps/api/src/lib/search-index/service.ts:478">
Hard-coding the normalized URL to https collapses http/https variants: http-only pages will be stored under an unreachable https URL, and distinct http/https resources will overwrite each other because the url_hash is built from this normalized value.</violation>
</file>
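A minimal sketch of a normalizer that keeps the original scheme, so http and https variants stay distinct when hashed. `normalizeUrl` here is a hypothetical, regex-based illustration and not the normalization in `service.ts`; it handles only http(s) URLs.

```typescript
// Illustrative normalizer: lowercases scheme and host, drops the fragment and
// a trailing slash, and keeps the original scheme instead of forcing https.
function normalizeUrl(raw: string): string {
  const match = /^(https?):\/\/([^\/?#]+)([^#]*)/i.exec(raw.trim());
  if (!match) throw new Error("Unsupported URL: " + raw);
  const scheme = match[1].toLowerCase();
  const host = match[2].toLowerCase();
  // Path and query string, without a trailing slash.
  const rest = match[3].replace(/\/$/, "");
  return scheme + "://" + host + rest;
}
```

If collapsing the two schemes is actually desired for deduplication, storing the original URL alongside the normalized key would keep http-only pages reachable.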
Summary by cubic
Introduces a real-time search index. Enables fast, filtered search with RRF ranking and a safe canary rollout via sampling.