Automate your RAG research
| Problem | What AutoRAG-Research does |
|---|---|
| Every dataset has a different format. | We unify the formats and pre-computed embeddings for you. Just download and use. |
| Comparing against SOTA pipelines requires implementing each one. | We implement SOTA pipelines from papers. Benchmark yours against them. |
| Every paper claims SOTA. Which one actually is? | Run all pipelines on your data with one command and compare. |
Which pipeline is really SOTA? What datasets are out there? Find it all here.
We provide pre-processed datasets with unified formats. Some include pre-computed embeddings.
Text
| Dataset | Pipeline Support | Description |
|---|---|---|
| BEIR | Retrieval | Standard IR benchmark across 14 diverse domains (scifact, nq, hotpotqa, ...) |
| MTEB | Retrieval | Large-scale embedding benchmark with any MTEB retrieval task |
| RAGBench | Retrieval + Generation | End-to-end RAG evaluation with generation ground truth across 12 domains |
| MrTyDi | Retrieval | Multilingual retrieval across 11 languages |
| BRIGHT | Retrieval + Generation | Reasoning-intensive retrieval with gold answers |
Image
| Dataset | Pipeline Support | Description |
|---|---|---|
| ViDoRe | Retrieval + Generation* | Visual document QA with 1:1 query-to-page mapping |
| ViDoRe v2 | Retrieval | Visual document retrieval with corpus-level search |
| ViDoRe v3 | Retrieval | Visual document retrieval across 8 industry domains |
| VisRAG | Retrieval + Generation* | Vision-based RAG benchmark (ChartQA, SlideVQA, DocVQA, ...) |
Text + Image
| Dataset | Pipeline Support | Description |
|---|---|---|
| Open-RAGBench | Retrieval + Generation | arXiv PDF RAG with generation ground truth and multimodal understanding |
* Generation ground truth is available only for some sub-datasets.
SOTA pipelines implemented from papers, ready to run. There are two ways to build a RAG pipeline:
These are standalone retrieval pipelines; use them on their own for retrieval-only evaluation. If you also want to evaluate generation quality, combine any retrieval pipeline with an LLM using the BasicRAG generation pipeline: it takes a retrieval pipeline as input, feeds the retrieved results to an LLM, and produces generated answers you can evaluate with generation metrics.
| Pipeline | Description | Reference |
|---|---|---|
| Vector Search (DPR) | Dense vector similarity search (single-vector and multi-vector MaxSim) | EMNLP 20 |
| BM25 | Sparse full-text retrieval | - |
| HyDE | Hypothetical Document Embeddings | ACL 23 |
| Hybrid RRF | Reciprocal Rank Fusion of two retrieval pipelines | - |
| Hybrid CC | Convex Combination fusion of two retrieval pipelines | - |
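Hybrid RRF and Hybrid CC follow standard fusion rules. A minimal sketch of the two algorithms (illustrative only, not the library's actual implementation; `k=60` is simply the value commonly used in the RRF literature):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over rankers of 1 / (k + rank_d)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def _minmax(scores):
    """Min-max normalize a {doc_id: score} mapping into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def cc_fuse(scores_a, scores_b, alpha=0.5):
    """Convex Combination: normalize each ranker's scores, then blend with weight alpha."""
    a, b = _minmax(scores_a), _minmax(scores_b)
    fused = {d: alpha * a.get(d, 0.0) + (1 - alpha) * b.get(d, 0.0) for d in set(a) | set(b)}
    return sorted(fused, key=fused.get, reverse=True)
```

RRF only needs the two rank orders, while CC blends the raw scores, so CC is sensitive to score-scale differences between rankers (hence the normalization step).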
These pipelines handle retrieval and generation together as a single algorithm. Each implements a specific paper's approach end-to-end.
| Pipeline | Description | Reference |
|---|---|---|
| BasicRAG | Any retrieval pipeline + LLM | NeurIPS 20 |
| IRCoT | Interleaving Retrieval with Chain-of-Thought | ACL 23 |
| ET2RAG | Majority voting on context subsets | Preprint / 25 |
| VisRAG | Vision-language model generation from retrieved images | ICLR 25 |
| MAIN-RAG | Multi-Agent Filtering RAG | ACL 25 |
Retrieval — Set-based: Recall, Precision, F1 / Rank-aware: nDCG, MRR, MAP
Generation — N-gram based: BLEU, METEOR, ROUGE / Embedding based: BERTScore, SemScore
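The retrieval metrics follow their standard definitions. As a rough illustration (not the library's code), set-based Recall and rank-aware nDCG with binary relevance can be computed like this:

```python
import math

def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant documents found in the top-k retrieved results."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant)

def ndcg_at_k(retrieved, relevant, k):
    """Normalized Discounted Cumulative Gain with binary relevance labels."""
    dcg = sum(1.0 / math.log2(i + 2) for i, doc in enumerate(retrieved[:k]) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0
```

Recall ignores result order entirely, while nDCG discounts relevant documents logarithmically by rank, which is why both families are reported.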
Missing something? Open an issue and we will implement it. Or check our Plugin guide.
We strongly recommend uv as your virtual-environment manager. If you use uv, activate the virtual environment before running the CLI; otherwise the CLI will not use your uv environment.
Option 1: Install Script (Recommended, macOS / Linux)
The install script handles Python environment, package installation, and PostgreSQL setup in one go.
curl -LsSf https://raw.githubusercontent.com/NomaDamas/AutoRAG-Research/main/scripts/install.sh -o install.sh
bash install.sh

Option 2: Manual Install
- Create and activate a virtual environment (Python 3.10+):
# uv (recommended)
uv venv .venv --python ">=3.10"
source .venv/bin/activate
# or standard venv
python3 -m venv .venv
source .venv/bin/activate

- Install the package:
# uv (recommended)
uv add autorag-research
# or pip
pip install autorag-research

- Set up PostgreSQL with VectorChord (Docker recommended):
autorag-research init
cd postgresql && docker compose up -d

- Initialize configuration files:
autorag-research init

This creates configs/ with database, pipeline, metric, and experiment YAML files.
Now you can edit the YAML files to set up your own experiments.
# 1. See available datasets
autorag-research show datasets
# 2-1. Ingest a dataset
autorag-research ingest --name beir --extra dataset-name=scifact
# 2-2. Or download a pre-ingested dataset including pre-computed embeddings
autorag-research show datasets beir # pass your ingestor name to see whether pre-ingested versions are available
autorag-research data restore beir beir_arguana_test_qwen_3_0.6b # example command
# 3. Configure LLM — pick or create a config in configs/llm/
vim configs/llm/openai-gpt5-mini.yaml
# You should set your embedding models in embedding/ folder if needed
# 4. Edit experiment config — choose pipelines and metrics
vim configs/experiment.yaml
# 5. Check your DB connection
vim configs/db.yaml
# 6. Run your experiment
autorag-research run --db-name=beir_scifact_test
# 7. View results in a Gradio leaderboard UI (requires your DB connection environment variables to be loaded)
python -m autorag_research.reporting.ui

configs/experiment.yaml is where you define which pipelines and metrics to run:
db_name: beir_scifact_test
pipelines:
  retrieval: [bm25, vector_search]
  generation: [basic_rag]
metrics:
  retrieval: [recall, ndcg]
  generation: [rouge]

Generation pipelines (and some retrieval pipelines like HyDE) require an LLM. The llm field in each pipeline config references a file in configs/llm/ by name (without .yaml):
# configs/pipelines/generation/basic_rag.yaml
llm: openai-gpt5-mini # → loads configs/llm/openai-gpt5-mini.yaml

Pre-configured LLM options include anthropic-claude-4.5-sonnet, openai-gpt5-mini, google-gemini-3-flash, ollama, vllm, and more. See all options in configs/llm/.
For the full YAML configuration guide, see the Documentation.
| Command | Description |
|---|---|
autorag-research init |
Download default config files to ./configs/ |
autorag-research show datasets |
List available pre-built datasets to download |
autorag-research show ingestors |
List available data ingestors and their parameters |
autorag-research show pipelines |
List available pipeline configurations |
autorag-research show metrics |
List available evaluation metrics |
autorag-research show databases |
List ingested database schemas |
autorag-research ingest --name <name> |
Ingest a dataset into PostgreSQL |
autorag-research drop database --db-name <name> |
Drop a PostgreSQL database quickly |
autorag-research run --db-name <name> |
Run experiment with configured pipelines and metrics |
You can also pass --help to any command to see detailed usage instructions.
We also provide a CLI Reference.
AutoRAG-Research supports a plugin system so you can add your own retrieval pipelines, generation pipelines, or evaluation metrics — and use them alongside the built-in ones in the same experiment.
A plugin is a standalone Python package. You implement your logic, register it via Python's entry_points, and the framework discovers and loads it automatically. No need to fork the repo or modify the core codebase.
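Under the hood, discovery relies on Python's standard entry-points mechanism declared in the plugin's pyproject.toml. The snippet below is a sketch only: the entry-point group name `autorag_research.plugins` and the class path are assumptions for illustration, and the scaffold generated by `plugin create` contains the actual values.

```toml
# pyproject.toml of a plugin package (sketch; group name and class path are hypothetical)
[project]
name = "my-search-plugin"
version = "0.1.0"

[project.entry-points."autorag_research.plugins"]
my_search = "my_search_plugin.pipeline:MySearchPipeline"
```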
What you can build:
| Plugin Type | What it does | Base Class |
|---|---|---|
| Retrieval Pipeline | Custom search/retrieval logic | BaseRetrievalPipeline |
| Generation Pipeline | Custom retrieve-then-generate logic | BaseGenerationPipeline |
| Retrieval Metric | Custom retrieval evaluation metric | BaseRetrievalMetricConfig |
| Generation Metric | Custom generation evaluation metric | BaseGenerationMetricConfig |
How it works:
# 1. Scaffold — generates a ready-to-edit project with config, code, YAML, and tests
autorag-research plugin create my_search --type=retrieval
# 2. Implement — edit the generated pipeline.py (or metric.py)
cd my_search_plugin
vim src/my_search_plugin/pipeline.py
# 3. Install — register the plugin in your environment
pip install -e .
# 4. Sync — copy the plugin's YAML config into your project's configs/ directory
autorag-research plugin sync
# 5. Use — add it to experiment.yaml and run like any built-in pipeline
autorag-research run --db-name=my_dataset

After plugin sync, your plugin appears in configs/pipelines/ or configs/metrics/ and can be referenced in experiment.yaml just like any built-in component.
For the full implementation guide, see the Plugin Documentation.
AutoRAG-Research ships with an agent skill that lets AI coding agents (Claude Code, Codex, Kiro, Cursor, etc.) query your pipeline results directly from PostgreSQL using natural language.
# Install globally
npx skills add NomaDamas/AutoRAG-Research --skill autorag-query

You: "Which pipeline has the best BLEU score?"
Agent: "hybrid_search_v2 achieved the highest BLEU score of 0.85."
For detailed usage, script options, and query templates, see the Agent Skill documentation.
We are an open-source project and always welcome contributions from anyone who loves RAG! Feel free to open issues or submit pull requests on GitHub. See our Contribution Guide for more details.
This project is made by the creators of AutoRAG, Jeffrey & Bobb Kim. All work is done at NomaDamas, an AI hacker house in Seoul, Korea.