A comprehensive FastAPI-based application for HR document processing, intelligent question answering, web scraping, and automated email generation using RAG (Retrieval-Augmented Generation) technology.
- Document Processing: Upload and process various document types (PDF, TXT, Markdown)
- RAG Question Answering: Intelligent question answering based on your document corpus
- Web Scraping & Summarization: Scrape company websites and generate detailed summaries
- HR Email Generation: Automated generation of professional HR emails for various scenarios
- Vector Database Integration: Semantic search capabilities using Qdrant
- Multiple LLM Providers: Support for OpenAI, Cohere, and HuggingFace models
- Folder Upload: Batch upload of documents while preserving folder structure
- Python 3.10 or later
- MongoDB (via Docker)
- Qdrant Vector Database (via Docker)
- Download and install Miniconda from the official documentation
- Create a new environment:

```bash
conda create -n hr-toolkit python=3.10
```

- Activate the environment:

```bash
conda activate hr-toolkit
```

- Install the dependencies:

```bash
pip install -r requirements.txt
```

- Copy the environment template:

```bash
cp .env.example .env
```

Edit the .env file with your configuration:
- Set your API keys (OpenAI/OpenRouter, Cohere, HuggingFace)
- Configure MongoDB connection settings
- Set LLM model preferences
- Adjust file upload and processing parameters
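For illustration, a minimal .env might look like the following. Apart from OPENROUTER_API_KEY (referenced under LLM providers below), the variable names here are assumptions — use the keys that actually appear in .env.example:

```bash
# Hypothetical key names for illustration — check .env.example for the real ones
OPENROUTER_API_KEY=sk-or-your-key-here
COHERE_API_KEY=your-cohere-key
MONGODB_URL=mongodb://admin:admin@localhost:27007
GENERATION_MODEL_ID=openai/gpt-4o-mini
FILE_MAX_SIZE=10
FILE_CHUNK_SIZE=512
```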
```bash
cd docker
cp .env.example .env
```

Update the Docker .env file with your credentials.
```bash
cd docker
docker compose up -d
```

This will start:
- MongoDB instance on port 27007
- Qdrant vector database
Start the FastAPI server:
```bash
uvicorn main:app --reload --host 0.0.0.0 --port 5000
```

The API will be available at http://localhost:5000 and the interactive API documentation at http://localhost:5000/docs.
- `GET /api/v1/` - Welcome endpoint with app information
- `POST /api/v1/data/upload/{project_id}` - Upload a single file
- `POST /api/v1/data/upload-folder/{project_id}` - Upload multiple files
- `POST /api/v1/data/process/{project_id}` - Process uploaded files into chunks
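As a sketch of how a client could call the upload endpoint, the snippet below builds a multipart/form-data body using only the standard library. The form field name "file" and the example project id are assumptions — verify them against the /docs schema:

```python
import uuid


def multipart_file_body(field_name: str, filename: str, data: bytes,
                        content_type: str = "application/octet-stream"):
    """Build a multipart/form-data body plus the matching Content-Type header."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


# With the server running, send it to the upload endpoint:
#   import urllib.request
#   body, ctype = multipart_file_body("file", "policy.pdf",
#                                     open("policy.pdf", "rb").read(), "application/pdf")
#   req = urllib.request.Request("http://localhost:5000/api/v1/data/upload/1",
#                                data=body, headers={"Content-Type": ctype}, method="POST")
#   print(urllib.request.urlopen(req).read())
```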
- `POST /api/v1/nlp/index/push/{project_id}` - Index processed chunks into the vector database
- `GET /api/v1/nlp/index/info/{project_id}` - Get vector database collection information
- `POST /api/v1/nlp/index/search/{project_id}` - Semantic search in the document collection
- `POST /api/v1/nlp/index/answer/{project_id}` - Get RAG-based answers to questions
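The search and answer endpoints accept JSON POSTs; the following standard-library sketch shows the call pattern. The request field names ("text", "limit") are assumptions — check them against the interactive /docs page before relying on them:

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000/api/v1"


def index_url(action: str, project_id: int) -> str:
    """Build a /nlp/index/... endpoint URL for a project."""
    return f"{BASE_URL}/nlp/index/{action}/{project_id}"


def post_json(url: str, payload: dict) -> dict:
    """POST a JSON payload and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# With the server running:
#   hits = post_json(index_url("search", 1), {"text": "vacation policy", "limit": 5})
#   answer = post_json(index_url("answer", 1), {"text": "How many vacation days do we offer?"})
```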
- `POST /api/v1/web-scraping/summarize` - Scrape and summarize a website
- `POST /api/v1/hr-email/generate` - Generate professional HR emails
The application supports multiple LLM providers:
- OpenAI/OpenRouter: For generation (configured via OPENROUTER_API_KEY)
- Cohere: Alternative generation provider
- HuggingFace: For embeddings (sentence-transformers)
Qdrant is used for vector storage with configurable:
- Distance metric (cosine, Euclidean, dot product)
- Embedding size
- Collection management
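The choice of distance metric changes how neighbours are ranked: cosine compares direction only, while dot product is also sensitive to vector magnitude. A small illustration:

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity: dot product of the vectors divided by their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))


def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))


# [1, 0] and [2, 0] point the same way: cosine calls them identical, dot does not
# cosine_similarity([1.0, 0.0], [2.0, 0.0]) -> 1.0
# dot_product([1.0, 0.0], [2.0, 0.0])       -> 2.0
```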
Supported formats:
- Text files (.txt)
- PDF documents (.pdf)
- Markdown files (.md)
Configurable parameters:
- Maximum file size
- Chunk size for processing
- Chunk overlap for context preservation
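Chunk size and overlap interact as in the character-based sketch below; the defaults are illustrative and the application's actual chunking strategy may differ:

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into fixed-size chunks, repeating `overlap` characters
    at each boundary so context is preserved across chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


# chunk_text("abcdefghij", chunk_size=4, overlap=2)
# -> ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```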