A powerful web application for designing and implementing RAG (Retrieval-Augmented Generation) solutions with state-of-the-art techniques.
- Features
- Screenshots
- Getting Started
- Usage
- Project Structure
- Configuration
- Architecture
- Contributing
- Testing
- Evaluation & Benchmarks
- References
- License
- Contact & Acknowledgements
The application consists of two main components:
- Frontend (UI): Built with Streamlit, this is the user-facing application where you configure your RAG pipeline.
  - Data Section: Define your data sources (e.g., documents, PDFs, text files).
  - Indexing & Storage: Set up your vector store (e.g., FAISS, Pinecone, ChromaDB) to efficiently store and retrieve embeddings.
  - Retrieval & Reranking: Configure how to retrieve relevant documents and re-rank them for better quality.
  - Generation & Prompting: Define your prompt templates and generation strategy.
  - Model Management: Choose and configure your language models (e.g., GPT-4, Claude, LLaMA) for generation.
  - Evaluation: Measure the performance of your RAG system using metrics like MRR, MAP, and ROUGE.
- Backend (RAG Pipeline): The logic that runs in the background, handling data ingestion, embedding generation, retrieval, and generation.
  - Data Ingestion: Load and process your documents (e.g., chunking, cleaning, embedding).
  - Indexing: Store embeddings in your vector store.
  - Retrieval: Use your vector store to find relevant documents.
  - Reranking: Re-rank retrieved documents to improve the quality of the final answer.
  - Generation: Use your language model to generate the final answer based on the retrieved and re-ranked documents.
The app provides a user-friendly interface to configure these components and monitor their performance.
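The backend flow described above can be sketched end to end in a few lines of plain Python. This is a toy illustration, not the actual `backend.py` API: the character-frequency `embed` function stands in for a real embedding model, and the final generation step (the LLM call) is left out.

```python
def chunk(text, size=200):
    """Ingestion: split a document into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text):
    """Toy embedding: a 26-dim letter-frequency vector (stands in for a real model)."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

# Ingestion + indexing: chunk every document and embed every chunk.
docs = [
    "RAG combines retrieval with generation.",
    "Streamlit builds web UIs in Python.",
]
index = [(c, embed(c)) for d in docs for c in chunk(d)]

# Retrieval: rank indexed chunks by similarity to the query embedding.
query = "What does RAG combine?"
ranked = sorted(index, key=lambda item: cosine(embed(query), item[1]), reverse=True)
top_chunk = ranked[0][0]  # the chunk a generator would be prompted with
```

In the real pipeline, `top_chunk` (and a few runners-up) would be rendered into a prompt and sent to the configured language model.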
- Data Ingestion & Processing
- Indexing & Storage
- Retrieval & Reranking
- Generation & Prompting
- Model Management
- Evaluation & Analytics
- Python 3.8+
- OS: Windows, MacOS, or Linux
- Streamlit for UI
Install the dependencies with `pip install -r requirements.txt`, then start the app with `streamlit run app.py`.

- Data → Configure and upload your data sources (e.g., files, web, databases, APIs). This is where you tell the app what information to work with.
- Chunking → Choose how to split your documents into smaller, manageable pieces for processing. Chunking breaks big documents into parts that are easier for the model to handle.
- Embeddings → Select and generate vector representations (embeddings) for your document chunks. Embeddings turn text into numbers so the computer can compare meanings.
- Vector Stores → Set up where and how your embeddings are stored for fast retrieval (e.g., Chroma, FAISS). Vector stores act as databases for your embeddings, making search fast.
- Retrieval → Configure how the system finds relevant chunks for a user's query. Retrieval finds the most useful pieces of information for a question.
- Reranking → Improve the quality of results by reordering retrieved chunks using advanced models. Reranking makes sure the best answers are at the top.
- Generation → Set up your language model and prompt strategy to generate answers from retrieved information. This is where the app creates the final answer using the found information.
- Evaluation → Measure and analyze the performance of your RAG pipeline. Evaluation shows how well your setup is working.
This section walks you through the typical flow a user will follow in RAG Studio, from start to finish:
- Launch the App
  - Run `streamlit run app.py` in your terminal to start the web interface.
  - The app opens in your browser, ready for configuration.
- Add Your Data
  - Go to the Data section.
  - Upload or connect your data sources (documents, PDFs, text files, etc.).
  - The app will list your uploaded or connected files.
- Choose a Chunking Strategy
  - Select how to split your documents into smaller pieces (chunks).
  - Options include fixed size, recursive, or by document type.
  - Chunking helps the model process and retrieve information more efficiently.
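To make two of these options concrete, here is a dependency-free sketch (hypothetical helper names, not the app's internals): fixed-size splitting with overlap, and a simple recursive splitter that tries paragraph boundaries first, then sentences, then words, falling back to fixed-size slices.

```python
def fixed_size_chunks(text, size=100, overlap=20):
    """Slide a fixed-size window across the text, overlapping neighbors slightly."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def recursive_chunks(text, size=100, separators=("\n\n", ". ", " ")):
    """Split on the coarsest separator first; recurse into pieces still too big."""
    if len(text) <= size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to hard fixed-size slicing.
        return fixed_size_chunks(text, size, overlap=0)
    head, *rest = separators
    chunks = []
    for part in text.split(head):
        chunks.extend(recursive_chunks(part, size, tuple(rest)))
    return chunks
```

Fixed-size chunking is fast and predictable; recursive chunking tends to keep sentences and paragraphs intact, which usually retrieves better.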
- Generate Embeddings
  - Pick an embedding model (e.g., OpenAI, Google, or local models).
  - Click to generate embeddings for your document chunks.
  - Embeddings are stored as vectors for fast searching later.
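A real deployment calls an embedding model here (OpenAI, Google, or a local one). As a dependency-free stand-in, this sketch shows only the interface an embedder exposes (text in, fixed-length vector out) using a tiny hashed bag-of-words; it has none of the semantic quality of a real model.

```python
import hashlib
import math

def embed(text, dim=64):
    """Hash each word into one of `dim` buckets, then L2-normalize the counts."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]
```

Whatever model you pick, the key property is the same: identical inputs always map to identical vectors, and similar texts should map to nearby vectors.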
- Set Up Vector Store
  - Choose where to store your embeddings (e.g., Chroma, FAISS).
  - The app will handle saving and indexing your vectors for retrieval.
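Conceptually, a vector store only needs two operations: add a vector with its payload, and return the vectors nearest to a query. A minimal in-memory sketch (a hypothetical class, not Chroma's or FAISS's actual API):

```python
import math

class InMemoryVectorStore:
    """Toy vector store: brute-force cosine search over an in-memory list."""

    def __init__(self):
        self._items = []  # list of (vector, payload) pairs

    def add(self, vector, payload):
        """Index one embedding together with its payload (e.g., the chunk text)."""
        self._items.append((vector, payload))

    def search(self, query, top_k=3):
        """Return the top_k (score, payload) pairs by cosine similarity."""
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb) if na and nb else 0.0

        scored = [(cosine(query, v), p) for v, p in self._items]
        scored.sort(key=lambda sp: sp[0], reverse=True)
        return scored[:top_k]
```

Real stores like Chroma and FAISS add persistence and approximate-nearest-neighbor indexes so search stays fast at millions of vectors; the interface is essentially the same.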
- Configure Retrieval
  - Decide how the app will search for relevant chunks when you ask a question.
  - You can adjust retrieval settings for accuracy or speed.
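The accuracy/speed knobs usually boil down to two parameters: how many chunks to fetch (`top_k`) and a minimum similarity score below which chunks are discarded. A sketch of that filtering step (names are illustrative, not the app's settings):

```python
def retrieve(scored_chunks, top_k=3, min_score=0.2):
    """Keep the best `top_k` chunks whose similarity clears `min_score`.

    `scored_chunks` is a list of (score, chunk_text) pairs, e.g. the
    output of a vector-store similarity search.
    """
    kept = [(s, c) for s, c in scored_chunks if s >= min_score]
    kept.sort(key=lambda sc: sc[0], reverse=True)
    return kept[:top_k]
```

Raising `top_k` gives the generator more context (slower, more recall); raising `min_score` trims marginal chunks (faster, higher precision).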
- Enable Reranking (Optional)
  - Turn on reranking to improve the order of search results using advanced models.
  - This step helps ensure the best information is used for answers.
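Reranking takes the chunks the fast, approximate retriever returned and rescores them with a more careful model, typically a cross-encoder that reads query and chunk together. As a dependency-free stand-in, this sketch rescores by query-term overlap; a real setup would swap in a cross-encoder score.

```python
def rerank(query, chunks):
    """Reorder retrieved chunks by a second, more careful relevance score.

    Here the 'careful score' is just query-term overlap; in practice it
    would come from a cross-encoder model scoring (query, chunk) pairs.
    """
    q_terms = set(query.lower().split())

    def score(chunk):
        c_terms = set(chunk.lower().split())
        return len(q_terms & c_terms) / (len(q_terms) or 1)

    return sorted(chunks, key=score, reverse=True)
```

Because the reranker only sees the handful of chunks the retriever already selected, it can afford to be much slower per chunk than the first-stage search.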
- Set Up Generation
  - Choose your language model (e.g., GPT-4, Claude, LLaMA).
  - Define prompt templates or use the defaults.
  - The app will use the retrieved and reranked chunks to generate answers.
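The prompt template is where the retrieved chunks and the user's question get stitched together before being sent to the model. A minimal sketch (the template text below is illustrative, not the app's shipped default):

```python
DEFAULT_TEMPLATE = """Answer the question using ONLY the context below.
If the context does not contain the answer, say you don't know.

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(question, chunks, template=DEFAULT_TEMPLATE):
    """Fill the template with numbered retrieved chunks and the question."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return template.format(context=context, question=question)
```

The filled string is what actually gets sent to GPT-4, Claude, LLaMA, or whichever model you configured; numbering the chunks makes it easy for the model to cite its sources.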
- Ask Questions & Get Answers
  - Enter your questions in the app interface.
  - The app retrieves, reranks, and generates answers using your configured pipeline.
- Evaluate Performance
  - Go to the Evaluation section.
  - Run evaluations to see how well your setup is working (metrics like MRR, MAP, ROUGE).
  - Results are saved in `data/evaluation/results/` for review.
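Of the metrics above, MRR (mean reciprocal rank) is the easiest to compute by hand: for each query, take 1/rank of the first relevant result, then average over queries. A small sketch:

```python
def mean_reciprocal_rank(results, relevant):
    """Compute MRR over a batch of queries.

    results:  one ranked list of doc ids per query.
    relevant: one set of relevant doc ids per query.
    """
    total = 0.0
    for ranked, rel in zip(results, relevant):
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(results) if results else 0.0
```

An MRR of 1.0 means the first result was always relevant; lower values mean users (or the generator) must look further down the ranking.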
- Iterate & Improve
  - Adjust chunking, embeddings, retrieval, or generation settings as needed.
  - Re-run evaluations to track improvements.
This journey helps you build, test, and refine a complete RAG pipeline, all from a simple web interface.
```
rag_studio/
├── app.py            # Main Streamlit app
├── backend.py        # Backend logic for RAG pipeline
├── data/             # Data, embeddings, evaluation sets, vector stores
├── prompts/          # Prompt templates
├── requirements.txt  # Python dependencies
└── README.md         # This file
```
- Edit `app.py` and `backend.py` to set model parameters, data paths, and other options.
- Place your data and embeddings in the `data/` directory.
- Store API keys and other secrets in the `.streamlit/secrets.toml` file.
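For example, a `.streamlit/secrets.toml` might look like the following (the key names are illustrative; use whatever names `backend.py` reads):

```toml
# .streamlit/secrets.toml — keep this file out of version control
OPENAI_API_KEY = "sk-..."
PINECONE_API_KEY = "..."
```

Streamlit exposes these values in the app as `st.secrets["OPENAI_API_KEY"]`, so no keys need to be hard-coded.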
The application follows a modular design with separate sections for each major component of the RAG pipeline. The UI is built with Streamlit for a clean, responsive interface.
- Use the Evaluation section in the app to measure performance.
- Results are saved in `data/evaluation/results/`.
This project is licensed under the MIT License.
- Author: 0xlong
- Thanks to the open-source community and contributors.