π₯ Overview
SentinelAI is a GPU-accelerated, production-grade AI inference platform featuring:
High-speed C++ ingestion
CUDA-enabled model inference
FastAPI serving layer
MLflow experiment tracking
InfinityFlow orchestration
Prometheus + Grafana monitoring
Kubernetes GPU deployment
CI/CD pipeline automation
Designed to demonstrate L5βL6 level AI Systems Engineering.
Architecture Flow C++ Ingestion Layer β Pybind11 Bridge β FastAPI (GPU Inference) β Model Service (CUDA / PyTorch) β MLflow (Tracking & Registry) β InfinityFlow (Orchestration) β Prometheus Metrics β Grafana Dashboard β Kubernetes GPU Deployment
Quickstart Clone Repo git clone https://github.com/Trojan3877/SentinelAI cd SentinelAI Run Locally (Docker) docker compose up --build Access Services
Prometheus: http://localhost:9090
Grafana: http://localhost:3000
Streamlit Dashboard: http://localhost:8501
π Metrics Snapshot Metric Value Accuracy 0.91 Avg Latency 34 ms p95 Latency 79 ms Throughput 145 req/s GPU Utilization 72% System Design Principles
GPU resource isolation
Horizontal scaling via HPA
Latency-aware inference
Observability-first design
CI-driven reliability
Modular service separation
Testing pytest Kubernetes Deployment kubectl apply -f k8s/
Q: Why C++ ingestion?
A: Reduces preprocessing latency and CPU bottlenecks in high-throughput environments.
Q: Why MLflow?
A: Enables experiment reproducibility and model registry versioning.
Q: Why InfinityFlow?
A: Abstracts orchestration logic to support scalable production pipelines.
Q: Why GPU deployment?
A: Reduces inference latency and increases throughput under heavy load.
Q: What level engineer built this?
A: Designed to reflect L5βL6 AI Systems engineering capability.
