
feat: support configurable embedding dimensions via VECTOR_STORE_DIMENSIONS#469

Open
TreyDong wants to merge 1 commit into plastic-labs:main from TreyDong:feat/embed-dim-config

Conversation

@TreyDong

@TreyDong TreyDong commented Mar 31, 2026

Summary

Currently embedding_client.py and models.py hardcode 1536 as the embedding dimension. This prevents using embedding providers with different dimensions (e.g., BGE-M3 at 1024 dimensions).

Changes

Replace hardcoded 1536 with settings.VECTOR_STORE.DIMENSIONS in:

  • embedding_client.py: 3 Gemini embed_content calls (lines ~85, ~119, ~255)
  • models.py: 2 Vector() columns — MessageEmbedding.embedding and Document.embedding — and add from .config import settings
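The wiring described above can be sketched roughly as follows. This is a hypothetical illustration: the settings object stands in for src/config.py, and the attribute names follow the PR description (settings.VECTOR_STORE.DIMENSIONS), not verified repository code.

```python
# Hypothetical sketch of the change; SimpleNamespace stands in for the
# real settings object from src/config.py.
from types import SimpleNamespace

settings = SimpleNamespace(VECTOR_STORE=SimpleNamespace(DIMENSIONS=1536))

def build_embed_request_config() -> dict:
    # Before this PR the value below was the literal 1536; now it is
    # read from centralized settings at each embed_content call site.
    return {"output_dimensionality": settings.VECTOR_STORE.DIMENSIONS}
```

With VECTOR_STORE_DIMENSIONS=1024 in the environment, the same call sites would request 1024-dimensional embeddings with no code change.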

Motivation

The VECTOR_STORE_DIMENSIONS config field already existed in src/config.py but was not being used in embedding_client.py or models.py. This change wires it up, allowing users to configure embedding dimensions via the VECTOR_STORE_DIMENSIONS environment variable instead of being locked to 1536.

This is particularly useful for embedding providers like BGE-M3 (1024 dimensions).

Backwards Compatibility

Default remains 1536 when the VECTOR_STORE_DIMENSIONS env var is unset.
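Concretely, the fallback behavior can be modeled with a stdlib-only sketch (this is not the project's actual config code, just the described semantics):

```python
import os

def vector_store_dimensions(default: int = 1536) -> int:
    """Return the configured embedding dimension, falling back to the
    legacy 1536 when VECTOR_STORE_DIMENSIONS is unset."""
    raw = os.environ.get("VECTOR_STORE_DIMENSIONS")
    return int(raw) if raw else default
```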

Issue

Closes #463

Summary by CodeRabbit

  • Chores
    • Updated embedding vector configuration to use centralized settings instead of hardcoded values, improving system flexibility and maintainability.
…NSIONS

Replace hardcoded 1536 with settings.VECTOR_STORE.DIMENSIONS in:
- embedding_client.py: 3 Gemini embed_content calls
- models.py: 2 Vector() columns (MessageEmbedding, Document)

The VECTOR_STORE_DIMENSIONS config field already existed but was not
being used. This change wires it up, allowing users to configure
embedding dimensions via the VECTOR_STORE_DIMENSIONS environment
variable (e.g., 1024 for BGE-M3) instead of being locked to 1536.

Backwards compatible: default remains 1536 when env var is unset.
@coderabbitai
Contributor

coderabbitai bot commented Mar 31, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e84ef657-4ff7-4052-bdc8-16ba7035dbdf

📥 Commits

Reviewing files that changed from the base of the PR and between a5423b5 and 82a08e9.

📒 Files selected for processing (2)
  • src/embedding_client.py
  • src/models.py

Walkthrough

The changes replace hardcoded embedding dimension values of 1536 with the configurable settings.VECTOR_STORE.DIMENSIONS setting across embedding client request configuration and vector column definitions in data models.

Changes

  • Embedding Client Configuration (src/embedding_client.py): Replaced hardcoded output_dimensionality: 1536 with settings.VECTOR_STORE.DIMENSIONS at three embedding request call sites: the embed(), simple_batch_embed(), and _process_batch() methods.
  • Database Model Definitions (src/models.py): Added a settings import from .config and updated the vector column definitions in the MessageEmbedding and Document classes from Vector(1536) to Vector(settings.VECTOR_STORE.DIMENSIONS).

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Poem

🐰 A rabbit hops with glee,
No more hardcoded 1536!
Dimensions flex, now configured free,
BGE-M3's 1024 won't be hexed. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 66.67%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)
  • Description Check ✅: check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check ✅: the title accurately and specifically describes the main change: replacing hardcoded embedding dimensions with a configurable settings value.
  • Linked Issues Check ✅: all requirements from issue #463 are met: hardcoded 1536 values in embedding_client.py (3 locations) and models.py (2 locations) are replaced with settings.VECTOR_STORE.DIMENSIONS, and a settings import is added to models.py.
  • Out of Scope Changes Check ✅: all changes are directly related to issue #463 and the stated objective of replacing hardcoded embedding dimensions with configurable settings values.


@VVoruganti
Collaborator

This would require additional changes to the alembic migrations and the models.py file to support storing different-sized embeddings.

Please also update the alembic migrations and SQLAlchemy files accordingly.
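For illustration, such a migration would have to alter the pgvector column type to the configured dimension. A hedged sketch of the SQL an alembic migration (e.g. via op.execute) might emit; the table and column names are assumed, and the USING cast only succeeds if the stored vectors already have the target dimension, so existing rows would otherwise need to be re-embedded or dropped:

```python
def alter_embedding_column_sql(table: str, dimensions: int) -> str:
    # pgvector encodes the dimension in the column type, so changing
    # VECTOR_STORE_DIMENSIONS requires ALTER COLUMN ... TYPE.
    # Table/column names here are hypothetical examples.
    return (
        f"ALTER TABLE {table} "
        f"ALTER COLUMN embedding TYPE vector({dimensions}) "
        f"USING embedding::vector({dimensions})"
    )
```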

