HR-Toolkit

A comprehensive FastAPI-based application for HR document processing, intelligent question answering, web scraping, and automated email generation using RAG (Retrieval-Augmented Generation) technology.

Features

Document Processing: Upload and process various document types (PDF, TXT, Markdown)
RAG Question Answering: Intelligent question answering based on your document corpus
Web Scraping & Summarization: Scrape company websites and generate detailed summaries
HR Email Generation: Automated generation of professional HR emails for various scenarios
Vector Database Integration: Semantic search capabilities using Qdrant
Multiple LLM Providers: Support for OpenAI, Cohere, and HuggingFace models
Folder Upload: Batch upload of documents while preserving folder structure

Requirements

Python 3.10 or later
MongoDB (via Docker)
Qdrant Vector Database (via Docker)

Installation

Install Python using Miniconda

Download and install Miniconda by following its official documentation
Create a new environment:

conda create -n hr-toolkit python=3.10

Activate the environment:

conda activate hr-toolkit

Install Dependencies

pip install -r requirements.txt

Configure Environment Variables

cp .env.example .env

Edit the .env file with your configuration:

Set your API keys (OpenAI/OpenRouter, Cohere, HuggingFace)
Configure MongoDB connection settings
Set LLM model preferences
Adjust file upload and processing parameters
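
As a rough illustration of how values in the .env file could be read at startup, the sketch below parses KEY=VALUE lines with the standard library only. The project's actual settings loader is not shown in this README, so parse_env is a hypothetical helper, and FILE_MAX_SIZE_MB is an invented variable name; only OPENROUTER_API_KEY comes from this document.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


# Sample content; FILE_MAX_SIZE_MB is a hypothetical variable name.
sample = """
# API keys
OPENROUTER_API_KEY="sk-placeholder"
FILE_MAX_SIZE_MB=10
"""

settings = parse_env(sample)
print(settings["OPENROUTER_API_KEY"])
```

In practice you would pass the contents of the real .env file, for example via pathlib.Path(".env").read_text().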

Docker Services Setup

Configure Docker Environment

cd docker
cp .env.example .env

Update the Docker .env file with your credentials.

Start Services

cd docker  # skip if you are already in the docker directory
docker compose up -d

This will start:

MongoDB instance on port 27007
Qdrant vector database

Run the Application

Start the FastAPI server:

uvicorn main:app --reload --host 0.0.0.0 --port 5000

The API will be available at http://localhost:5000

API documentation will be available at http://localhost:5000/docs

API Endpoints

Base

GET /api/v1/ - Welcome endpoint with app information
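
A quick way to confirm the server is up is to call the welcome endpoint. The sketch below uses only the standard library and assumes the server runs on localhost:5000 as started above and that the endpoint returns JSON; the response fields are not documented here, so the code simply returns whatever it receives. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:5000"  # host and port from the uvicorn command


def welcome_url(base_url: str = BASE_URL) -> str:
    """Return the full URL of the welcome endpoint."""
    return base_url.rstrip("/") + "/api/v1/"


def fetch_welcome(base_url: str = BASE_URL, timeout: float = 5.0):
    """GET the welcome endpoint and decode the body (assumed to be JSON)."""
    with urlopen(welcome_url(base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# With the server running:
#   print(fetch_welcome())
```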

Data Management

POST /api/v1/data/upload/{project_id} - Upload a single file
POST /api/v1/data/upload-folder/{project_id} - Upload multiple files
POST /api/v1/data/process/{project_id} - Process uploaded files into chunks

NLP & RAG

POST /api/v1/nlp/index/push/{project_id} - Index processed chunks into vector database
GET /api/v1/nlp/index/info/{project_id} - Get vector database collection information
POST /api/v1/nlp/index/search/{project_id} - Semantic search in document collection
POST /api/v1/nlp/index/answer/{project_id} - Get RAG-based answers to questions
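
The endpoints above suggest a pipeline: process the uploaded files, push the chunks into the index, then ask a question. The sketch below builds those three requests with the standard library. The endpoint paths come from this README, but the request body schemas are not documented here, so every payload key (chunk_size, overlap_size, do_reset, text, limit) and the project id are assumptions; check the interactive docs at /docs for the real fields.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:5000/api/v1"  # host and port from the uvicorn command


def build_post(path: str, project_id: str, payload: dict) -> Request:
    """Build a JSON POST request for a project-scoped endpoint."""
    url = f"{BASE_URL}/{path.strip('/')}/{project_id}"
    body = json.dumps(payload).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


def send(request: Request, timeout: float = 60.0):
    """Send a request and decode the response body (assumed to be JSON)."""
    with urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


project_id = "demo"  # hypothetical project id

# Payload keys below are assumptions, not the documented schema.
steps = [
    build_post("data/process", project_id,
               {"chunk_size": 500, "overlap_size": 50}),
    build_post("nlp/index/push", project_id, {"do_reset": False}),
    build_post("nlp/index/answer", project_id,
               {"text": "How many vacation days do new hires get?", "limit": 5}),
]

# With the server running and files already uploaded:
#   for request in steps:
#       print(send(request))
```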

Web Scraping

POST /api/v1/web-scraping/summarize - Scrape and summarize a website

HR Email Generation

POST /api/v1/hr-email/generate - Generate professional HR emails

Configuration

LLM Providers

The application supports multiple LLM providers:

OpenAI/OpenRouter: For generation (configured via OPENROUTER_API_KEY)
Cohere: Alternative generation provider
HuggingFace: For embeddings (sentence-transformers)

Vector Database

Qdrant is used for vector storage with configurable:

Distance method (cosine, euclidean, dot)
Embedding size
Collection management
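
To make the three distance options concrete, the sketch below computes each one for a pair of small vectors in plain Python. It illustrates the math only and does not call Qdrant; the function names are hypothetical.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    """Dot product: larger means more similar (sensitive to magnitude)."""
    return sum(x * y for x, y in zip(a, b))


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: compares direction only, ignoring magnitude."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


query = [3.0, 4.0]
same_direction = [6.0, 8.0]   # same direction, twice the length
orthogonal = [-4.0, 3.0]      # perpendicular to the query

print(cosine_similarity(query, same_direction))
print(dot(query, orthogonal))
print(euclidean(query, same_direction))
```

Cosine is a common choice for text embeddings because it ignores vector length; whichever method is configured should match what the embedding model was trained for.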

File Processing

Supported formats:

Text files (.txt)
PDF documents (.pdf)
Markdown files (.md)

Configurable parameters:

Maximum file size
Chunk size for processing
Chunk overlap for context preservation
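
The interaction between chunk size and chunk overlap can be sketched with a simple character-based splitter. This is an illustration under simplifying assumptions, not the project's actual splitter, which may split on separators or tokens instead; chunk_text is a hypothetical helper.

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into chunks of up to chunk_size characters, where each
    chunk repeats the last `overlap` characters of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij']
```

A larger overlap preserves more context across chunk boundaries at the cost of storing and embedding more duplicated text.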

Islam-Reda-13/HR-Toolkit

Folders and files

Latest commit

History

Repository files navigation

HR-Toolkit

Features

Requirements

Installation

Install Python using MiniConda

Install Dependencies

Configure Environment Variables

Docker Services Setup

Configure Docker Environment

Start Services

Run the Application

API Endpoints

Base

Data Management

NLP & RAG

Web Scraping

HR Email Generation

Configuration

LLM Providers

Vector Database

File Processing

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages