Release/v1.3.0 alpha.1 #6
Pull request overview
This PR updates configuration parameters and adds support for new datasets in the vector database benchmarking tool, specifically for the v1.3.0 alpha.1 release.
Changes:
- Updated IVF-GAS index parameters (nlist and nprobe values) in product configuration
- Added new food dataset configuration file with multiple index types
- Refactored dataset preparation script to use case-based configuration with support for new datasets
- Added utility script for generating random benchmark datasets
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| vectordb_bench/config-files/envector_products_config.yml | Updated nlist from 32768 to 1024 and nprobe from 6 to 16 for IVF-GAS configuration |
| vectordb_bench/config-files/envector_food_config.yml | New configuration file for FOOD512D101K dataset with FLAT, IVF-FLAT, and IVF-GAS index configurations |
| scripts/prepare_random_dataset.py | New script for generating random normalized vectors and ground truth neighbors for benchmarking |
| scripts/prepare_dataset.py | Refactored to use case-based dataset configuration with a simplified CLI |
| README.md | Updated documentation with new dataset references and corrected example commands |
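
For readers without the diff open, here is a minimal sketch of what "generating random normalized vectors and ground truth neighbors" can look like. It is not the contents of `scripts/prepare_random_dataset.py`: the function name, arguments, and the choice of inner-product similarity are assumptions made for illustration.

```python
import numpy as np


def make_random_dataset(num_train, num_test, dim, k, seed=0):
    """Hypothetical helper: unit-norm random vectors plus exact top-k neighbors."""
    rng = np.random.default_rng(seed)

    def unit_vectors(n):
        # Gaussian samples scaled to unit L2 norm are uniform on the sphere.
        x = rng.standard_normal((n, dim)).astype(np.float32)
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    train = unit_vectors(num_train)
    test = unit_vectors(num_test)

    # For unit-norm vectors the inner product equals cosine similarity.
    scores = test @ train.T
    # Brute-force ground truth: indices of the k highest-scoring train vectors.
    neighbors = np.argsort(-scores, axis=1)[:, :k]
    return train, test, neighbors


train, test, neighbors = make_random_dataset(1000, 10, 512, k=5)
print(train.shape, test.shape, neighbors.shape)
```

Brute-force search is quadratic in dataset size, so a sketch like this only suits small benchmark sets. Larger sets would need batching or an exact-search library.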
  --train-centroids True \
  --centroids-path "./centroids/embeddinggemma-300m/centroids.npy" \
- --nlist 32768 \
+ --nlist 1024 \
Shouldn't this be 32768 when using embedding-gemma?
  nlist: 128
  nprobe: 6
  train_centroids: true
  centroids_path: food/centroids/centroids_128.npy

# GAS: enVector-customized ANN
envectorivfgas:
  <<: [*base_dataset, *base_envector]
  index_name: food101_ivfgas
  db_label: FOOD512D101K-IVFGAS
  nlist: 1024
The nlist values for IVF-FLAT and IVF-GAS are different. Is that intentional?
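
To make the nlist/nprobe questions in this thread concrete, the rough arithmetic below compares the settings mentioned in the PR. It is a back-of-the-envelope sketch, not a claim about how enVector's IVF-GAS behaves. The vector counts are read off the dataset names (about 400K for `PRODUCTS512D400K`, about 101K for `FOOD512D101K`), lists are assumed to be evenly sized, and `ivf_stats` is a hypothetical helper.

```python
def ivf_stats(num_vectors, nlist, nprobe):
    """Return (avg vectors per list, fraction of lists probed, vectors scanned per query).

    Assumes vectors are spread evenly over the nlist inverted lists.
    """
    avg_list_size = num_vectors / nlist
    probed_fraction = nprobe / nlist
    scanned = avg_list_size * nprobe
    return avg_list_size, probed_fraction, scanned


# (label, assumed vector count, nlist, nprobe) taken from values quoted in this PR.
configs = [
    ("products IVF-GAS (old)", 400_000, 32768, 6),
    ("products IVF-GAS (new)", 400_000, 1024, 16),
    ("food IVF-FLAT", 101_000, 128, 6),
]

for label, n, nlist, nprobe in configs:
    avg, frac, scanned = ivf_stats(n, nlist, nprobe)
    print(f"{label}: {avg:.1f} vectors/list, {frac:.3%} of lists probed, "
          f"~{scanned:.0f} vectors scanned per query")
```

The nprobe for the food IVF-GAS entry is not visible in the excerpt above, so it is left out. With nlist 1024 and the assumed 101K vectors, its average list size would be roughly 99, compared with about 789 for IVF-FLAT at nlist 128.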
- `PRODUCTS512D400K`
- `FASHION512D200K`
- `FOOD512D75K`
- `PRODUCTS512D400K`: [cryptolab-playground/amazon-products-clip-vit-b-32](https://huggingface.co/datasets/cryptolab-playground/amazon-products-clip-vit-b-32)
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.
Update v1.3.0-alpha.1 to update datasets