Generative AI that never leaves your server.
A complete, production-ready, 100% open-source stack to deploy LLMs and RAG (Retrieval-Augmented Generation) inside your corporate infrastructure. Zero data sent to external servers. Zero cloud vendor dependencies. Full GDPR compliance.
⚠️ Status: Active Development — v0.2.1
This project is under active development. APIs and configurations may change between releases. See ROADMAP.md for the planned feature timeline.
Author: Francesco Collovà
🇮🇹 Versione italiana disponibile: README.it.md
The core documentation is bilingual. Technical sections (installation, API reference, configuration) are in English. The full Italian version of this README is available at README.it.md — see also GUIDA_OPERATIVA.md for the complete Italian operational guide.
- Why This Project
- EU AI Act Compliance
- Architecture
- Advanced RAG Pipeline
- Technology Stack
- Requirements
- Quick Start
- Accessing the Interface
- Makefile Commands Reference
- Document Management Console
- API Reference
- Configuration
- Roadmap
- License
Every prompt sent to a cloud AI service travels through external networks, gets logged, and may be used to train future models. For confidential contracts, product strategies, HR data, or proprietary code — this is unacceptable.
Private Corporate AI solves the problem at its root: the entire stack runs locally. A prompt originates in the user's browser, passes through Nginx, is processed by Docker containers, reaches the local LLM model — and the response travels the reverse path. At no point does a single byte leave the corporate perimeter.
Beyond privacy, this project was built to address the growing need for regulatory compliance, particularly with the new EU Regulation on Artificial Intelligence (EU AI Act), giving organizations a powerful, safe, and verifiable AI tool.
The on-premise architecture offers compliance benefits that cloud-based AI systems cannot guarantee with the same simplicity:
| Requirement | How Private Corporate AI Addresses It |
|---|---|
| Data Sovereignty | No corporate data ever leaves the organization's servers. Eliminates data transfer issues to cloud GPAI providers (GPT-4, Gemini, etc.) subject to Art. 53 obligations. |
| Human Oversight by Design | Every system response cites verifiable documentary sources. The system generates advisory outputs, not autonomous decisions (Art. 14). |
| Integrated Cybersecurity | SSL/TLS, isolated Docker networks, randomly generated credentials at each installation — basis for Art. 15 requirements. |
| Documentary Traceability | Every indexed document is identifiable with a unique ID, timestamp and metadata — basis for Art. 12 record-keeping. |
| Transparency | (Phase 1 Roadmap) AI disclosure disclaimer and AI literacy module for end users (Art. 4 & 50). |
The risk profile changes if the system is used for:
- Personnel decisions, employee selection or evaluation
- Credit or insurance assessments
- Public Administration contexts
In these scenarios, additional compliance measures are required. See the EU AI Act analysis document for a detailed assessment.
Separate Docker networks by design:
- `frontend_net` — Nginx, Open WebUI, RAG Console, RAG Backend
- `backend_net` — RAG Backend, Ollama, Qdrant
The two-network separation ensures that the LLM inference engine and vector database are never directly accessible from the browser layer, reducing the attack surface.
The backend has been re-architected for corporate stability and performance:
- Persistent Metadata Store: Uses SQLite/SQLAlchemy to track document lifecycle, ensuring state persistence across restarts.
- Content De-duplication: Automatic SHA-256 hashing prevents redundant indexing of the same files.
- Parallel Ingestion: Async batch processing with semaphores speeds up document indexing by up to 75%.
- Redis Embedding Cache: Integrated Redis to cache vector embeddings, reducing latency and LLM load for repeated queries.
- SSE Streaming: Real-time answer generation via Server-Sent Events for a modern, responsive chat experience.
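The de-duplication and semaphore-bounded ingestion points above can be sketched in a few lines (a minimal illustration only; the in-memory hash set stands in for the project's SQLite metadata store, and all function and variable names here are hypothetical):

```python
import asyncio
import hashlib

# In-memory stand-in for the SQLite metadata store (illustrative only).
seen_hashes: set[str] = set()

# Bound how many documents are processed concurrently, in the spirit of
# the semaphore-based batch ingestion described above.
semaphore = asyncio.Semaphore(4)

def content_hash(data: bytes) -> str:
    """SHA-256 digest used to detect already-indexed content."""
    return hashlib.sha256(data).hexdigest()

async def index_document(name: str, data: bytes) -> str:
    digest = content_hash(data)
    if digest in seen_hashes:
        return f"{name}: skipped (duplicate)"
    # Claim the hash before the first await so duplicates inside the
    # same batch are caught deterministically on a single event loop.
    seen_hashes.add(digest)
    async with semaphore:
        await asyncio.sleep(0)  # placeholder for chunking + embedding work
        return f"{name}: indexed"

async def ingest_batch(files: dict[str, bytes]) -> list[str]:
    return list(await asyncio.gather(
        *(index_document(n, d) for n, d in files.items())
    ))

results = asyncio.run(ingest_batch({
    "a.pdf": b"contract text",
    "b.pdf": b"contract text",  # identical bytes -> de-duplicated
    "c.pdf": b"other text",
}))
```

The real pipeline persists the digests, so the skip also applies across restarts and re-uploads.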
Unlike traditional RAG systems, Private Corporate AI implements two state-of-the-art techniques to maximize response accuracy:
For each text fragment (chunk), the local LLM automatically generates a brief contextual prefix based on the entire document. This prevents loss of meaning when a chunk is retrieved in isolation (e.g., a table without the chapter heading it belongs to).
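As an illustration of the idea (a sketch only; the prompt wording and helper names below are assumptions, not the project's actual implementation):

```python
# Sketch of contextual chunk enrichment. The prompt text and function
# names are illustrative assumptions, not the project's actual code.

CONTEXT_PROMPT = """Here is the full document:
{document}

Here is one chunk of that document:
{chunk}

Write one or two sentences of context that situate this chunk within
the overall document, to improve retrieval. Answer with the context only."""

def build_context_prompt(document: str, chunk: str) -> str:
    """Assemble the prompt sent to the local LLM for one chunk."""
    return CONTEXT_PROMPT.format(document=document, chunk=chunk)

def enrich_chunk(chunk: str, context: str) -> str:
    """Prefix the LLM-generated context so the chunk stays meaningful alone."""
    return f"{context}\n\n{chunk}"

# Example: a table row retrieved in isolation loses its chapter heading;
# the generated prefix restores that information before embedding.
enriched = enrich_chunk(
    "| Q3 | 1.2M | +14% |",
    "This row is from the quarterly revenue table in Chapter 2 (Sales).",
)
```

The enriched text, not the bare chunk, is what gets embedded and stored in Qdrant.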
The system combines:
- Semantic vector search — finds conceptually related content
- BM25 text search — matches exact codes, acronyms, and specific terms
Results are merged using Reciprocal Rank Fusion (RRF), delivering 30–40% higher recall on corporate technical documents than semantic search alone.
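RRF itself is simple to state: each document's fused score is the sum of 1/(k + rank) over the result lists it appears in, with k = 60 as the conventional constant. A self-contained sketch (Qdrant performs this fusion server-side; the document IDs here are invented):

```python
from collections import defaultdict

def rrf_merge(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked result lists with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in;
    rank is 1-based, so appearing near the top of any list counts most.
    """
    scores: dict[str, float] = defaultdict(float)
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Dense (semantic) search and sparse (BM25) search return different orderings:
semantic = ["doc_contract", "doc_policy", "doc_memo"]
bm25 = ["doc_memo", "doc_contract", "doc_invoice"]

fused = rrf_merge([semantic, bm25])
# doc_contract ranks first: it appears high in both lists.
```

Because only ranks are used, RRF needs no score normalization between the dense and sparse retrievers, which use incomparable scales.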
| Container | Image | Role | License |
|---|---|---|---|
| `corporate_ai_nginx` | `nginx:1.27.4-alpine` | SSL/TLS reverse proxy, rate limiting, security headers | BSD |
| `corporate_ai_webui` | `ghcr.io/open-webui/open-webui:v0.8.8` | Web chat interface, conversation management | MIT |
| `corporate_ai_console` | `node:20-alpine` | Document Management Console (React + Vite) | MIT |
| `corporate_ai_rag` | Custom build | FastAPI + LangChain, RAG pipeline, advanced PDF table extraction (PyMuPDF4LLM), OpenAI-compatible API | Apache 2.0 |
| `corporate_ai_redis` | `redis:7.4.2-alpine` | Embedding & query cache | MIT |
| `corporate_ai_ollama` | `ollama/ollama:0.17.7` | Local LLM runtime, CPU and NVIDIA GPU support | MIT |
| `corporate_ai_qdrant` | `qdrant/qdrant:v1.17.0` | Vector database, hybrid search (dense + sparse/BM25) with RRF | Apache 2.0 |
| `corporate_ai_ollama_init` | `ollama/ollama` | One-shot init: downloads LLM and embedding model on first startup | MIT |
| Component | Minimum | Recommended |
|---|---|---|
| NVIDIA GPU | 8 GB VRAM | 16–24 GB VRAM (RTX 3090/4090) |
| RAM | 16 GB | 32–64 GB |
| Storage | 50 GB | 200–500 GB NVMe |
| OS | Linux / WSL2 | Ubuntu 22.04+ LTS |
| Response time | — | 2–15 seconds |
| Component | Minimum | Recommended |
|---|---|---|
| CPU | 4 cores (x86_64 with AVX2) | 8–16 cores |
| RAM | 8 GB | 16–32 GB |
| Storage | 30 GB | 60–200 GB SSD |
| OS | Linux / WSL2 | Ubuntu 22.04+ LTS |
| Response time | — | 30–180 seconds |
AVX2 Note: Ollama uses AVX2 instructions to accelerate CPU inference. Verify support with:
```bash
grep avx2 /proc/cpuinfo | head -1
```

Any modern CPU (2013 or later) supports it.
If you are installing Private Corporate AI on Windows via WSL2, please read the WSL2 Setup Section in the Deployment Guide. It covers critical information regarding Docker Desktop integration, GPU setup, and filesystem performance.
Installation is fully automated via an interactive script that configures the entire environment (Docker, models, database, certificates) based on detected hardware.
```bash
git clone https://github.com/fcollova/Private-Corporate-AI.git
cd private-corporate-ai
chmod +x install.sh
sudo ./install.sh
```

Non-interactive flags are also supported:

```bash
./install.sh --gpu
# or
./install.sh --cpu
```
The installer guides you through:
- Hardware Detection — Automatic analysis of CPU, RAM and NVIDIA GPU.
- Profile Selection — New in v0.2.1:
- SOLO Mode: Optimized for small professional practices (1–3 users). 5 containers, HTTP on port 80, integrated static console.
- CORPORATE Mode: Optimized for organizations. 7 containers, HTTPS on port 443, Redis cache for high concurrency.
- Mode Selection — Choose between FULL (GPU) for maximum performance or LITE (CPU) for GPU-less servers.
- LLM Model Selection — Choose the optimal model (e.g. Gemma 2, Llama 3.1, DeepSeek-R1).
- Client Customization — Enter company name and choose a color theme for interface branding.
Installation typically takes 5–15 minutes, primarily for LLM model download (several GB).
```bash
# Monitor initial model download
make logs-init

# Monitor system resource usage during build
make monitor

# Check health of all services
make health

# Send a test query to the RAG
make test-chat
```

Then navigate to https://localhost. Accept the security warning (self-signed certificate) and verify that the Open WebUI login screen appears.
The access URL depends on the selected profile:
| Profile | Service | URL | Notes |
|---|---|---|---|
| SOLO | Open WebUI | `http://localhost` | Main chat (HTTP) |
| SOLO | Console | `http://localhost/console/` | Static Document Console |
| CORPORATE | Open WebUI | `https://localhost` | Main chat (HTTPS/SSL) |
| CORPORATE | Console | `https://localhost/console/` | Containerized Console |
The entire stack is managed via `make` commands that automatically detect your profile and mode from the `.env` file.
| Command | Description |
|---|---|
| `make install` | Interactive installation (select Solo/Corporate and GPU/CPU) |
| `make up` | Start the stack based on your `.env` configuration |
| `make restart` | Quick restart of all services |
| `make down` | Stop all services |
| `make build` | Rebuild the RAG Backend image |
| `make rebuild-rag` | Recreate and restart only the RAG Backend |
| `make rebuild-console` | Recompile frontend (static for Solo, container for Corporate) |
| Command | Description |
|---|---|
| `make status` | Health status and uptime of all containers |
| `make logs` | Combined real-time logs for all services |
| `make monitor` | Resource dashboard: real-time CPU, RAM and network |
| `make gpu-monitor` | VRAM and GPU temperature monitoring (NVIDIA) |
| `make health` | Verify connectivity (handles HTTP/HTTPS automatically) |
| Command | Description |
|---|---|
| `make list-models` | List currently installed models on Ollama |
| `make active-model` | Show which model is currently loaded in RAM/VRAM |
| `make pull-model MODEL=...` | Download a specific model (e.g. `MODEL=llama3:8b`) |
| `make remove-model MODEL=...` | Remove a model from disk |
| `make pull-models-lite` | Force download of CPU-optimized models |
| Command | Description |
|---|---|
| `make health` | Verify connectivity between RAG, Ollama, Qdrant and Redis |
| `make upload-doc FILE=...` | Upload and index a file (PDF, DOCX, TXT, MD, XLSX, PPTX) |
| `make list-docs` | List indexed documents in the SQL metadata database |
| `make test-chat` | Send a query to the RAG and receive a response with sources |
| `make wipe-rag` | Wipe all RAG data (indexed documents and vectors) |
| `make init-collection` | Manually initialize the Qdrant collection |
| Command | Description |
|---|---|
| `make up-console` | Start only the Console container |
| `make rebuild-console` | Recompile the React (Vite) app from scratch |
| `make logs-console` | Console dev/production server logs |
| `make open-console` | Automatically open the console URL in the browser |
| Command | Description |
|---|---|
| `make client-info` | Display the currently active company profile |
| `make reconfigure-client` | Relaunch the wizard to change logos and domains |
| `make edit-system-prompt` | Open the editor to modify the AI "instructions" |
| `make export-client-config` | Create a `.tar.gz` package with all customizations |
| Command | Description |
|---|---|
| `make backup` | Create a compressed backup of all Docker volumes (including SQL) and `.env` |
| `make uninstall` | Guided safe removal procedure for the entire stack |
| `make help` | Show the interactive command guide |
The React console (/console/) enables advanced management of the corporate knowledge base:
- Multiple Domains — Organize documents into separate Qdrant collections (e.g. "Legal", "HR", "Technical")
- Monitoring — View the number of extracted fragments (chunks) per document
- Maintenance — Forced re-indexing and document migration between domains
- Dynamic Branding — Interface automatically adapts to the company name and colors configured during installation
The RAG backend exposes advanced endpoints for domain management. Full interactive documentation available at https://localhost/rag-docs.
| Method | Path | Description |
|---|---|---|
| `GET` | `/api/domains` | List all domains and vector statistics |
| `POST` | `/api/domains` | Create a new information domain |
| `DELETE` | `/api/domains/{name}` | Delete a domain and all its data |
| `POST` | `/api/documents/upload` | Upload and index a document |
| `PUT` | `/api/documents/{id}/domain` | Move a document between domains |
| `POST` | `/api/documents/{id}/reindex` | Force re-indexing of a file |
| `POST` | `/api/chat` | Native RAG query with cited sources |
The console now supports real-time monitoring of document processing:
- Status Tracking: Visual indicators for Processing, Indexed, and Error states.
- Automatic Polling: UI automatically refreshes while documents are being indexed.
- Dynamic Icons: Visual file type identification (.pdf, .docx, .xlsx, .pptx, .md).
The RAG Backend includes a unit test suite to ensure the integrity of the document processing pipeline.
```bash
# Run tests (requires pytest and pytest-asyncio)
pytest rag_backend/tests
```

The `.env` file is auto-generated by the installer. Key variables:
```bash
# LLM model to use (pulled automatically on first start)
LLM_MODEL=gemma2:9b

# Embedding model for RAG vector indexing
EMBEDDING_MODEL=nomic-embed-text

# Base URL for console API calls to the RAG backend
CONSOLE_RAG_API_BASE=/api/rag

# Company branding (set by the install wizard)
CLIENT_NAME=Your Company Name
```

A complete example is available in `.env.example`.
Customize the AI behavior for your specific domain:
```bash
make edit-system-prompt
```

The system prompt is stored in `rag_backend/system_prompt.txt` and controls how the LLM responds to queries, including tone, language, citation format, and domain-specific instructions.
This is an actively developed project. The roadmap is driven by EU AI Act compliance requirements (deadline: August 2026) and enterprise integration needs.
- XLSX and PPTX document loader support (via Microsoft MarkItDown)
- AI Transparency Disclaimer in Open WebUI (Art. 4 & 50 AI Act)
- AI Literacy onboarding module for end users
- Docker log retention policy (6-month persistence, Art. 12 AI Act)
- OCR pipeline via Tesseract (support for scanned PDFs and images)
- SharePoint / OneDrive sync connector (Microsoft Graph API)
- Google Workspace connector (Service Account)
- NAS / local file server auto-ingestion via Docker volume
- Human validation of retrieved chunks in Document Console (Art. 14 AI Act)
- Document versioning and in-place index update
- GDPR-compliant audit trail with PII anonymization
- Technical Documentation auto-generation (EU AI Act Annex IV)
- Multi-tenancy with granular domain permissions (HR / Legal / Tech isolation)
Full details in ROADMAP.md.
This project is distributed under the Apache 2.0 license. Included components retain their original licenses:
| Component | License |
|---|---|
| Ollama | MIT |
| Qdrant | Apache 2.0 |
| Open WebUI | MIT |
| Nginx | BSD |
| FastAPI | MIT |
| LangChain | MIT |
Francesco Collovà — Author & Maintainer
- Bug reports: Open an Issue
- Ideas & questions: GitHub Discussions
- Collaboration: LinkedIn
For inquiries reach out via LinkedIn with a brief description of your needs.
Built with ❤️ for organizations that take data privacy seriously.
Personal Project Disclaimer: This project is developed and maintained independently by Francesco Collovà as a personal initiative, in personal time and using exclusively personal resources. It is not affiliated with, sponsored by, or endorsed by any current or former employer. The views, architectural choices, and technical decisions expressed in this project reflect solely the author's personal expertise and do not represent the position of any organization the author is or has been associated with. No proprietary information, confidential data, or intellectual property belonging to any employer has been used in the development of this project.
