Automate your RAG research
| Problem | What AutoRAG-Research does |
|---|---|
| Every dataset has a different format. | We unify the formats and pre-computed embeddings for you. Just download and use. |
| Comparing against SOTA pipelines requires implementing each one. | We implement SOTA pipelines from papers. Benchmark yours against them. |
| Every paper claims SOTA. Which one actually is? | Run all pipelines on your data with one command and compare. |
Which pipeline is really SOTA? What datasets are out there? Find it all here.
We provide pre-processed datasets with unified formats. Some include pre-computed embeddings.
Text
| Dataset | Pipeline Support | Description |
|---|---|---|
| BEIR | Retrieval | Standard IR benchmark across 14 diverse domains (scifact, nq, hotpotqa, ...) |
| MTEB | Retrieval | Large-scale embedding benchmark with any MTEB retrieval task |
| RAGBench | Retrieval + Generation | End-to-end RAG evaluation with generation ground truth across 12 domains |
| MrTyDi | Retrieval | Multilingual retrieval across 11 languages |
| BRIGHT | Retrieval + Generation | Reasoning-intensive retrieval with gold answers |
Image
| Dataset | Pipeline Support | Description |
|---|---|---|
| ViDoRe | Retrieval + Generation* | Visual document QA with 1:1 query-to-page mapping |
| ViDoRe v2 | Retrieval | Visual document retrieval with corpus-level search |
| ViDoRe v3 | Retrieval | Visual document retrieval across 8 industry domains |
| VisRAG | Retrieval + Generation* | Vision-based RAG benchmark (ChartQA, SlideVQA, DocVQA, ...) |
Text + Image
| Dataset | Pipeline Support | Description |
|---|---|---|
| Open-RAGBench | Retrieval + Generation | arXiv PDF RAG with generation ground truth and multimodal understanding |
* Generation ground truth is available only for some sub-datasets.
SOTA pipelines implemented from papers, ready to run. There are two ways to build a RAG pipeline:
These are standalone retrieval pipelines; use them on their own for retrieval-only evaluation. If you also want to evaluate generation quality, combine any retrieval pipeline with an LLM using the BasicRAG generation pipeline: it takes a retrieval pipeline as input, feeds the retrieved results to an LLM, and produces generated answers you can evaluate with generation metrics.
| Pipeline | Description | Reference |
|---|---|---|
| Vector Search (DPR) | Dense vector similarity search (single-vector and multi-vector MaxSim) | EMNLP 20 |
| BM25 | Sparse full-text retrieval | - |
| HyDE | Hypothetical Document Embeddings | ACL 23 |
| Hybrid RRF | Reciprocal Rank Fusion of two retrieval pipelines | - |
| Hybrid CC | Convex Combination fusion of two retrieval pipelines | - |
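Hybrid RRF and Hybrid CC follow standard fusion rules. A minimal sketch of the two algorithms (illustrative only, not the library's actual implementation; `k=60` is simply the value commonly used in the RRF literature):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over rankers of 1 / (k + rank_d)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def _minmax(scores):
    """Min-max normalize a {doc_id: score} mapping into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def cc_fuse(scores_a, scores_b, alpha=0.5):
    """Convex Combination: normalize each ranker's scores, then blend with weight alpha."""
    a, b = _minmax(scores_a), _minmax(scores_b)
    fused = {d: alpha * a.get(d, 0.0) + (1 - alpha) * b.get(d, 0.0) for d in set(a) | set(b)}
    return sorted(fused, key=fused.get, reverse=True)
```

RRF only needs the two rank orders, while CC blends the raw scores, so CC is sensitive to score-scale differences between rankers (hence the normalization step).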
These pipelines handle retrieval and generation together as a single algorithm. Each implements a specific paper's approach end-to-end.
| Pipeline | Description | Reference |
|---|---|---|
| BasicRAG | Any retrieval pipeline + LLM | NeurIPS 20 |
| IRCoT | Interleaving Retrieval with Chain-of-Thought | ACL 23 |
| ET2RAG | Majority voting on context subsets | Preprint / 25 |
| VisRAG | Vision-language model generation from retrieved images | ICLR 25 |
| MAIN-RAG | Multi-Agent Filtering RAG | ACL 25 |
Retrieval — Set-based: Recall, Precision, F1 / Rank-aware: nDCG, MRR, MAP
Generation — N-gram based: BLEU, METEOR, ROUGE / Embedding based: BERTScore, SemScore
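The retrieval metrics follow their standard definitions. As a rough illustration (not the library's code), set-based Recall and rank-aware nDCG with binary relevance can be computed like this:

```python
import math

def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant documents found in the top-k retrieved results."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant)

def ndcg_at_k(retrieved, relevant, k):
    """Normalized Discounted Cumulative Gain with binary relevance labels."""
    dcg = sum(1.0 / math.log2(i + 2) for i, doc in enumerate(retrieved[:k]) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0
```

Recall ignores result order entirely, while nDCG discounts relevant documents logarithmically by rank, which is why both families are reported.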
Missing something? Open an issue and we will implement it. Or check our Plugin guide.
We strongly recommend uv as your virtual-environment manager. If you use uv, activate the virtual environment before running the CLI; otherwise the CLI will not use your uv environment.
Option 1: Install Script (Recommended, macOS / Linux)
The install script handles Python environment, package installation, and PostgreSQL setup in one go.
curl -LsSf https://raw.githubusercontent.com/NomaDamas/AutoRAG-Research/main/scripts/install.sh -o install.sh
bash install.sh

Option 2: Manual Install
- Create and activate a virtual environment (Python 3.10+):
# uv (recommended)
uv venv .venv --python ">=3.10"
source .venv/bin/activate
# or standard venv
python3 -m venv .venv
source .venv/bin/activate

- Install the package:
# uv (recommended)
uv add autorag-research
# or pip
pip install autorag-research

- Set up PostgreSQL with VectorChord (Docker recommended):
autorag-research init
cd postgresql && docker compose up -d

- Initialize configuration files:
autorag-research init

This creates configs/ with database, pipeline, metric, and experiment YAML files.
Now you can edit the YAML files to set up your own experiments.
# 1. See available datasets
autorag-research show datasets
# 2-1. Ingest a dataset
autorag-research ingest --name beir --extra dataset-name=scifact
# 2-2. Or download a pre-ingested dataset including pre-computed embeddings
autorag-research show datasets beir # pass your ingestor name to see whether pre-ingested versions are available
autorag-research data restore beir beir_arguana_test_qwen_3_0.6b # example command
# 3. Configure LLM — pick or create a config in configs/llm/
vim configs/llm/openai-gpt5-mini.yaml
# You should set your embedding models in embedding/ folder if needed
# 4. Edit experiment config — choose pipelines and metrics
vim configs/experiment.yaml
# 5. Check your DB connection
vim configs/db.yaml
# 6. Run your experiment
autorag-research run --db-name=beir_scifact_test
# 7. View results in a Gradio leaderboard UI (requires your DB connection environment variables to be loaded)
python -m autorag_research.reporting.ui

configs/experiment.yaml is where you define which pipelines and metrics to run:
db_name: beir_scifact_test
pipelines:
  retrieval: [bm25, vector_search]
  generation: [basic_rag]
metrics:
  retrieval: [recall, ndcg]
  generation: [rouge]

Generation pipelines (and some retrieval pipelines like HyDE) require an LLM. The llm field in each pipeline config references a file in configs/llm/ by name (without .yaml):
# configs/pipelines/generation/basic_rag.yaml
llm: openai-gpt5-mini # → loads configs/llm/openai-gpt5-mini.yaml

Pre-configured LLM options include anthropic-claude-4.5-sonnet, openai-gpt5-mini, google-gemini-3-flash, ollama, vllm, and more. See all options in configs/llm/.
For the full YAML configuration guide, see the Documentation.
| Command | Description |
|---|---|
autorag-research init |
Download default config files to ./configs/ |
autorag-research show datasets |
List available pre-built datasets to download |
autorag-research show ingestors |
List available data ingestors and their parameters |
autorag-research show pipelines |
List available pipeline configurations |
autorag-research show metrics |
List available evaluation metrics |
autorag-research show databases |
List ingested database schemas |
autorag-research ingest --name <name> |
Ingest a dataset into PostgreSQL |
autorag-research drop database --db-name <name> |
Drop a PostgreSQL database quickly |
autorag-research run --db-name <name> |
Run experiment with configured pipelines and metrics |
You can also pass --help to any command to see detailed usage instructions.
We also provide a CLI Reference.
AutoRAG-Research supports a plugin system so you can add your own retrieval pipelines, generation pipelines, or evaluation metrics — and use them alongside the built-in ones in the same experiment.
A plugin is a standalone Python package. You implement your logic, register it via Python's entry_points, and the framework discovers and loads it automatically. No need to fork the repo or modify the core codebase.
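Under the hood, discovery relies on Python's standard entry-points mechanism declared in the plugin's pyproject.toml. The snippet below is a sketch only: the entry-point group name `autorag_research.plugins` and the class path are assumptions for illustration, and the scaffold generated by `plugin create` contains the actual values.

```toml
# pyproject.toml of a plugin package (sketch; group name and class path are hypothetical)
[project]
name = "my-search-plugin"
version = "0.1.0"

[project.entry-points."autorag_research.plugins"]
my_search = "my_search_plugin.pipeline:MySearchPipeline"
```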
What you can build:
| Plugin Type | What it does | Base Class |
|---|---|---|
| Retrieval Pipeline | Custom search/retrieval logic | BaseRetrievalPipeline |
| Generation Pipeline | Custom retrieve-then-generate logic | BaseGenerationPipeline |
| Retrieval Metric | Custom retrieval evaluation metric | BaseRetrievalMetricConfig |
| Generation Metric | Custom generation evaluation metric | BaseGenerationMetricConfig |
How it works:
# 1. Scaffold — generates a ready-to-edit project with config, code, YAML, and tests
autorag-research plugin create my_search --type=retrieval
# 2. Implement — edit the generated pipeline.py (or metric.py)
cd my_search_plugin
vim src/my_search_plugin/pipeline.py
# 3. Install — register the plugin in your environment
pip install -e .
# 4. Sync — copy the plugin's YAML config into your project's configs/ directory
autorag-research plugin sync
# 5. Use — add it to experiment.yaml and run like any built-in pipeline
autorag-research run --db-name=my_dataset

After plugin sync, your plugin appears in configs/pipelines/ or configs/metrics/ and can be referenced in experiment.yaml just like any built-in component.
For the full implementation guide, see the Plugin Documentation.
AutoRAG-Research ships with an agent skill that lets AI coding agents (Claude Code, Codex, Kiro, Cursor, etc.) query your pipeline results directly from PostgreSQL using natural language.
# Install globally
npx skills add NomaDamas/AutoRAG-Research --skill autorag-query

You: "Which pipeline has the best BLEU score?"
Agent: "hybrid_search_v2 achieved the highest BLEU score of 0.85."
For detailed usage, script options, and query templates, see the Agent Skill documentation.
We are an open-source project and always welcome contributions from anyone who loves RAG! Feel free to open issues or submit pull requests on GitHub. See our Contribution Guide for more details.
This project is made by the creators of AutoRAG, Jeffrey & Bobb Kim. All work is done at NomaDamas, an AI hacker house in Seoul, Korea.