An AI-powered research assistant that transforms vague research objectives into structured, citation-backed reports. Built for local execution on resource-constrained hardware (MacBook Pro M2, 8GB RAM).
Note: This project is not currently deployed as a hosted service. If you'd like to try it out, you're welcome to clone the repository and run it locally. The setup is straightforward and documented below. I'd appreciate any feedback or suggestions.
Traditional Retrieval-Augmented Generation (RAG) systems follow a simplistic pattern: retrieve documents, stuff them into context, and generate a response. This approach suffers from critical limitations in research-intensive tasks:
- Single-Pass Retrieval - Standard RAG retrieves once and hopes for the best. Complex research topics require iterative exploration: uncovering a relevant paper often reveals new subtopics worth investigating.
- No Critical Evaluation - RAG blindly synthesizes retrieved content without assessing source quality, identifying contradictions, or recognizing gaps in coverage.
- Flat Knowledge Representation - Documents are treated as isolated chunks. The semantic relationships between papers (citations, methodological similarities, conflicting findings) are lost.
- Context Limitations - Stuffing all retrieved documents into a single prompt leads to context overflow and poor synthesis quality.
- Zero Observability - Most RAG pipelines are black boxes. When output quality degrades, debugging is nearly impossible.
This project reimagines research synthesis as a multi-agent workflow rather than a single retrieval-generation step. The system operates like a research team:
| Agent Node | Role | Advantage Over RAG |
|---|---|---|
| Planner | Decomposes topic into sub-queries | Systematic coverage vs. single-shot retrieval |
| Retriever | Multi-source search (arXiv, Semantic Scholar, Wikipedia) | Broader, more authoritative sources |
| Reader | Extracts key claims and methods per paper | Structured understanding vs. raw context stuffing |
| Synthesizer | Combines findings with proper attribution | Coherent narrative with citations |
| Critic | Identifies gaps, contradictions, weak coverage | Self-correction loop—retrieves more if needed |
| Finalizer | Produces structured report with all sections | Consistent, actionable output format |
The Critic node evaluates synthesis quality and can trigger additional retrieval rounds. This mimics how humans actually research: read papers, identify what's missing, search again. Traditional RAG cannot do this—it's strictly feed-forward.
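The retrieve-critique loop described above can be sketched in plain Python. This is a minimal illustration, not the project's LangGraph code: `ResearchState`, `retrieve`, `critique`, `run_with_critic`, the dictionary-based `corpus`, and the `max_rounds` budget are all hypothetical names introduced here, and the "critic" is reduced to a substring check so the control flow stays visible.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchState:
    """Hypothetical state passed between nodes (not the project's schema)."""
    topic: str
    sources: list = field(default_factory=list)
    gaps: list = field(default_factory=list)
    rounds: int = 0


def retrieve(state, corpus):
    """Stand-in retriever: query the topic on the first pass, open gaps afterwards."""
    queries = state.gaps if state.rounds else [state.topic]
    for query in queries:
        for doc in corpus.get(query, []):
            if doc not in state.sources:
                state.sources.append(doc)
    state.rounds += 1
    return state


def critique(state, required_subtopics):
    """Stand-in critic: a gap is any required subtopic that no source mentions."""
    state.gaps = [
        sub for sub in required_subtopics
        if not any(sub in doc for doc in state.sources)
    ]
    return state


def run_with_critic(topic, corpus, required_subtopics, max_rounds=3):
    """Loop retrieve -> critique until no gaps remain or the round budget is spent."""
    state = ResearchState(topic=topic)
    while True:
        state = critique(retrieve(state, corpus), required_subtopics)
        if not state.gaps or state.rounds >= max_rounds:
            return state
```

The round budget is the important design choice in this sketch: without it, a gap that no source can fill would keep the loop running indefinitely.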
| Capability | Description |
|---|---|
| Multi-Agent Supervisor | LangGraph workflow with 7 specialized nodes |
| Multi-Source Retrieval | arXiv, Semantic Scholar, Wikipedia integration |
| Knowledge Graph | NetworkX-based paper relationship mapping |
| Semantic Cache | Reduces redundant LLM calls by approximately 40% |
| Local Observability | Full execution tracing without external dependencies |
| Streamlit UI | Real-time research progress visualization |
| Export Formats | Markdown, PDF, BibTeX, JSON |
| Trend Detection | Identifies emerging research directions from metadata |
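To make the Semantic Cache row concrete, here is a rough sketch of the general idea: reuse a stored response when a new prompt is similar enough to an earlier one. Everything in it is an assumption for illustration. The `SemanticCache` class, the 0.8 threshold, and the toy bag-of-words `embed` function are hypothetical; a real setup would presumably use the sentence-transformers embeddings listed elsewhere in this README.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Return a cached response when a new prompt is close enough to an old one."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed cutoff, not the project's value
        self.entries = []           # list of (embedding, response) pairs

    def get(self, prompt):
        """Return the best-matching cached response, or None on a miss."""
        query = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(query, e[0]), default=None)
        if best is not None and cosine(query, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, prompt, response):
        """Store a response under the embedding of its prompt."""
        self.entries.append((embed(prompt), response))
```

A caller would try `cache.get(prompt)` first and only call the LLM (followed by `cache.put`) on a miss.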
graph TB
subgraph "API Layer"
API[FastAPI<br/>REST + SSE]
end
subgraph "LangGraph Agent"
INTAKE[intake] --> PLANNER[planner]
PLANNER --> RETRIEVER[retriever]
RETRIEVER --> READER[reader]
READER --> SYNTHESIZER[synthesizer]
SYNTHESIZER --> CRITIC[critic]
CRITIC -->|gaps found| RETRIEVER
CRITIC --> FINALIZER[finalizer]
end
subgraph "Tools"
OLLAMA[Ollama<br/>llama3.2:3b]
ARXIV[arXiv API]
EMBED[sentence-transformers]
end
subgraph "Storage"
CHROMA[(ChromaDB)]
SQLITE[(SQLite)]
end
- Literature Review Acceleration: Reduce weeks of literature survey to hours with automated gap analysis and contradiction detection.
- Emerging Trend Identification: Surface research directions before they become mainstream, enabling early positioning.
- Competitive Intelligence: Rapidly understand state-of-the-art in any technical domain without manual paper hunting.
- Decision Support: Get structured, citation-backed answers to technical feasibility questions.
- Reproducible Research Workflows: Fully local, open-source stack eliminates vendor lock-in and ensures reproducibility.
- Resource-Efficient AI: Demonstrates that sophisticated AI workflows can run on consumer hardware (8GB RAM), democratizing access.
This project illustrates how an agentic architecture can address the limitations of monolithic RAG on complex research tasks. The patterns used here (iterative refinement, self-critique, structured decomposition) apply beyond research synthesis to other domains that require systematic analysis.
- Install Ollama: `brew install ollama`
- Pull the model: `ollama pull llama3.2:3b`
- Start Ollama server: `ollama serve`
# Clone and enter directory
cd agentic_research_copilot
# Install dependencies
pip install -r requirements.txt
# Copy environment file
cp .env.example .env
# Create data directories
make dirs
# Start the API server
make dev
# Start the Streamlit UI
make ui

Visit http://localhost:8501 for the Streamlit dashboard.
curl -X POST http://localhost:8000/v1/research \
-H "Content-Type: application/json" \
-d '{
"topic": "transformer attention mechanisms and their variants",
"depth": "normal",
"constraints": "Focus on papers from 2023-2024"
}'

Response:
{
"run_id": "abc123-...",
"status": "pending",
"message": "Research started..."
}

# Check run status (replace {run_id} with the ID returned above)
curl http://localhost:8000/v1/runs/{run_id}
# Stream progress events
curl -N http://localhost:8000/v1/runs/{run_id}/stream
# Export as Markdown (URLs are quoted so the shell does not interpret the ?)
curl "http://localhost:8000/v1/runs/{run_id}/export?format=markdown"
# BibTeX
curl "http://localhost:8000/v1/runs/{run_id}/export?format=bibtex"
# PDF
curl "http://localhost:8000/v1/runs/{run_id}/export?format=pdf" -o report.pdf
# Submit feedback
curl -X POST http://localhost:8000/v1/feedback \
-H "Content-Type: application/json" \
-d '{"run_id": "...", "rating": 5, "comment": "Comprehensive coverage."}'

Every research run produces a structured report:
- TL;DR - Three-bullet executive summary
- Background - Context and foundational concepts
- Key Papers/Sources - 5-10 papers with links and summaries
- Disagreements/Contradictions - Conflicting findings across sources
- Gaps and Open Questions - Identified unknowns in the literature
- Research Trends - Emerging directions based on publication patterns
- Proposed Experiments - Actionable next steps
- References - Complete bibliography with links
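The REST calls shown earlier can also be scripted from Python. The sketch below uses only the standard library and mirrors the `POST /v1/research` request and the `/stream` endpoint from the curl examples. The helper names (`build_research_request`, `parse_sse`, `start_run`) are hypothetical, and the stream parser assumes each event arrives as a single `data:` line, ideally JSON; the actual event format is not documented here, so treat that as a guess.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same host and port as the curl examples


def build_research_request(topic, depth="normal", constraints=None):
    """Build (but do not send) the POST /v1/research request."""
    payload = {"topic": topic, "depth": depth}
    if constraints:
        payload["constraints"] = constraints
    return urllib.request.Request(
        f"{BASE_URL}/v1/research",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_sse(lines):
    """Yield the payload of each `data:` line from a server-sent event stream.

    Assumption: one payload per line; JSON is decoded, anything else is
    yielded as a raw string.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other fields
        body = line[len("data:"):].strip()
        try:
            yield json.loads(body)
        except json.JSONDecodeError:
            yield body


def start_run(topic, **kwargs):
    """Send the request and return the run_id (needs the API server running)."""
    with urllib.request.urlopen(build_research_request(topic, **kwargs)) as resp:
        return json.load(resp)["run_id"]
```

With the server up, `start_run("transformer attention mechanisms", depth="quick")` would return the `run_id`, and the response object from `urllib.request.urlopen(f"{BASE_URL}/v1/runs/{run_id}/stream")` can be passed directly to `parse_sse`.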
# Run all tests
make test
# Run specific test file
pytest tests/test_tools_arxiv.py -v
# Run with coverage
pytest tests/ --cov=app --cov-report=html
# Run evaluation suite (20 golden prompts)
make eval
# Check results
cat eval/results.json

| Variable | Default | Description |
|---|---|---|
| OLLAMA_BASE_URL | http://localhost:11434 | Ollama API URL |
| OLLAMA_MODEL | llama3.2:3b | LLM model |
| OLLAMA_NUM_CTX | 4096 | Context window |
| EMBEDDING_MODEL | all-MiniLM-L6-v2 | Embedding model |
| CHROMA_DIR | ./chroma_data | ChromaDB path |
| SQLITE_PATH | ./app_data/app.db | SQLite path |
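
As an illustration of how these variables and defaults could be read, here is a small standard-library sketch. The `Settings` dataclass and `load_settings` function are hypothetical names, not the project's actual config module (which lives somewhere under `app/core/`); only the variable names and default values are taken from the table above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables in the table above."""
    ollama_base_url: str
    ollama_model: str
    ollama_num_ctx: int
    embedding_model: str
    chroma_dir: str
    sqlite_path: str


def load_settings(env=None):
    """Read settings from a mapping (os.environ by default), applying the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        ollama_model=env.get("OLLAMA_MODEL", "llama3.2:3b"),
        ollama_num_ctx=int(env.get("OLLAMA_NUM_CTX", "4096")),  # stored as int
        embedding_model=env.get("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        chroma_dir=env.get("CHROMA_DIR", "./chroma_data"),
        sqlite_path=env.get("SQLITE_PATH", "./app_data/app.db"),
    )
```

Passing a plain dictionary, for example `load_settings({"OLLAMA_NUM_CTX": "2048"})`, makes it easy to try overrides without touching the real environment.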
| Component | Usage |
|---|---|
| Ollama + llama3.2:3b | ~2.5 GB |
| sentence-transformers | ~200 MB |
| FastAPI + ChromaDB | ~300 MB |
| Total | ~3 GB |
- Use `depth: "quick"` - Limits to 5 sources
- Set `OLLAMA_NUM_CTX=2048` - Smaller context window
- Close Streamlit UI - Saves approximately 150 MB
- Stop Ollama when idle - `ollama stop llama3.2:3b`
app/
├── main.py # FastAPI application
├── api/ # API endpoints
├── core/ # Config, logging, SSE
├── db/ # SQLite models and repositories
├── agent/ # LangGraph workflow
├── tools/ # Ollama, arXiv, embeddings
├── intelligence/ # Knowledge graph, cache
├── traces/ # Local observability
└── export/ # Export formats
ui/
└── app.py # Streamlit dashboard
eval/
├── golden.json # 20 test prompts
└── run_eval.py # Evaluation runner
tests/
├── test_tools_arxiv.py
├── test_tools_vectordb.py
├── test_knowledge_graph.py
├── test_semantic_cache.py
└── test_api_research.py
- PDF Parsing - Skipped for memory efficiency; uses abstracts only
- Rate Limits - Semantic Scholar: 100 requests per 5 minutes
- Context Window - 4096 tokens limits long document processing
- CPU Inference - No GPU acceleration; slower but functional
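One way to stay under the Semantic Scholar limit listed above is a client-side sliding-window limiter. The sketch below is an assumption about how that could be done, not a description of how this project handles it. `SlidingWindowLimiter` is a hypothetical name, and the clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests calls within any window_seconds span."""

    def __init__(self, max_requests=100, window_seconds=300.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self.sent = deque()  # timestamps of recent requests, oldest first

    def wait_time(self):
        """Seconds until the next request is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window_seconds:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self.sent[0])

    def record(self):
        """Note that a request is being sent now."""
        self.sent.append(self.clock())

    def acquire(self, sleep=time.sleep):
        """Block until a request is allowed, then record it."""
        delay = self.wait_time()
        if delay > 0:
            sleep(delay)
        self.record()
```

Calling `limiter.acquire()` immediately before each outbound request keeps a single process within 100 requests per 5 minutes; it does not coordinate across multiple processes.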
Ollama not connecting:
ollama serve # Start the server
ollama list # Check available models

Out of memory:
# Use smaller context
export OLLAMA_NUM_CTX=2048
# Use quick depth
curl -X POST http://localhost:8000/v1/research \
-H "Content-Type: application/json" \
-d '{"topic": "...", "depth": "quick"}'

ChromaDB errors:
# Reset the database
rm -rf chroma_data
make dirs

Built by parth-6-5-4
For questions, issues, or contributions, please open an issue on GitHub.
This project is licensed under the MIT License. See the LICENSE file for details.
Built with LangGraph, Ollama, and FastAPI