# SentinelAI

**Tech stack:** Python · C++ · FastAPI · CUDA · PyTorch · MLflow · InfinityFlow · Prometheus · Grafana · Kubernetes · Docker · CI/CD · Locust

## 🔥 Overview

SentinelAI is a GPU-accelerated, production-grade AI inference platform featuring:

- High-speed C++ ingestion
- CUDA-enabled model inference
- FastAPI serving layer
- MLflow experiment tracking
- InfinityFlow orchestration
- Prometheus + Grafana monitoring
- Kubernetes GPU deployment
- CI/CD pipeline automation

Designed to demonstrate L5–L6-level AI systems engineering.

## Architecture Flow

```
C++ Ingestion Layer
        ↓
Pybind11 Bridge
        ↓
FastAPI (GPU Inference)
        ↓
Model Service (CUDA / PyTorch)
        ↓
MLflow (Tracking & Registry)
        ↓
InfinityFlow (Orchestration)
        ↓
Prometheus Metrics
        ↓
Grafana Dashboard
        ↓
Kubernetes GPU Deployment
```
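The request path above can be sketched end to end with stubbed components. Every class and method name here is hypothetical — a stand-in for the real C++/CUDA services, not code from this repo:

```python
# Hypothetical sketch of the SentinelAI request path; names are illustrative.

class IngestionBridge:
    """Stands in for the pybind11 bridge over the C++ ingestion layer."""
    def preprocess(self, raw: str) -> list[float]:
        # The real layer would decode/validate at C++ speed; we just tokenize.
        return [float(len(tok)) for tok in raw.split()]

class ModelService:
    """Stands in for the CUDA/PyTorch inference service."""
    def predict(self, features: list[float]) -> float:
        # Placeholder "model": mean of the feature vector.
        return sum(features) / len(features)

class MetricsRegistry:
    """Stands in for the Prometheus metrics exporter."""
    def __init__(self) -> None:
        self.requests_total = 0
    def observe_request(self) -> None:
        self.requests_total += 1

def handle_request(raw: str, bridge, model, metrics) -> float:
    """FastAPI-layer handler: ingest -> infer -> record metrics."""
    features = bridge.preprocess(raw)
    score = model.predict(features)
    metrics.observe_request()
    return score
```

The point of the sketch is the layering: the serving layer only orchestrates; preprocessing, inference, and observability each live behind their own interface, mirroring the modular service separation described below.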

## Quickstart

### Clone Repo

```bash
git clone https://github.com/Trojan3877/SentinelAI
cd SentinelAI
```

### Run Locally (Docker)

```bash
docker compose up --build
```

### Access Services

- API: http://localhost:8000
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000
- Streamlit Dashboard: http://localhost:8501
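Once the stack is up, a small script can verify the API is responding. This is a sketch only — the `/health` path is an assumption; substitute whatever endpoint the FastAPI service actually exposes:

```python
import json
import urllib.request

API_URL = "http://localhost:8000"  # FastAPI service from the Quickstart

def check_health(fetch=None):
    """Fetch and parse the API health endpoint (assumed path: /health).

    `fetch` is injectable so the function can be exercised without a
    running server; by default it performs a real HTTP GET.
    """
    fetch = fetch or (lambda url: urllib.request.urlopen(url, timeout=5).read())
    return json.loads(fetch(f"{API_URL}/health"))

if __name__ == "__main__":
    print(check_health())
```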

## 📊 Metrics Snapshot

| Metric          | Value     |
| --------------- | --------- |
| Accuracy        | 0.91      |
| Avg Latency     | 34 ms     |
| p95 Latency     | 79 ms     |
| Throughput      | 145 req/s |
| GPU Utilization | 72%       |

## System Design Principles

- GPU resource isolation
- Horizontal scaling via HPA
- Latency-aware inference
- Observability-first design
- CI-driven reliability
- Modular service separation
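"Latency-aware inference" usually means bounding how long a request may wait while the server accumulates a batch for the GPU. A minimal sketch of that trade-off — all parameter values are illustrative, not taken from this repo:

```python
import time

def collect_batch(queue, max_batch=8, max_wait_ms=5.0):
    """Drain up to `max_batch` requests from `queue`, but never wait
    longer than `max_wait_ms`. This bounds worst-case queueing latency
    while still amortizing each GPU kernel launch over several requests.
    """
    batch = []
    deadline = time.monotonic() + max_wait_ms / 1000.0
    while len(batch) < max_batch and time.monotonic() < deadline:
        if queue:
            batch.append(queue.pop(0))
        else:
            time.sleep(0.0005)  # brief yield while waiting for arrivals
    return batch
```

Tuning `max_wait_ms` against the p95 target is the design lever: a larger window raises throughput but pushes tail latency up.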

## Testing

```bash
pytest
```

## Kubernetes Deployment

```bash
kubectl apply -f k8s/
```
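A representative unit test in the `pytest` suite might look like this sketch. The `normalize` preprocessing helper is hypothetical, not a function from this repo:

```python
# test_preprocessing.py — hypothetical pytest example; names are illustrative.
import math

def normalize(values):
    """Scale a feature vector to zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var) or 1.0  # guard against constant inputs
    return [(v - mean) / std for v in values]

def test_normalize_zero_mean():
    out = normalize([1.0, 2.0, 3.0])
    assert abs(sum(out)) < 1e-9

def test_normalize_unit_variance():
    out = normalize([1.0, 2.0, 3.0])
    var = sum(v * v for v in out) / len(out)
    assert abs(var - 1.0) < 1e-9
```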

## FAQ

**Q: Why C++ ingestion?**
A: Reduces preprocessing latency and avoids CPU bottlenecks in high-throughput environments.

**Q: Why MLflow?**
A: Enables experiment reproducibility and versioned model-registry management.

**Q: Why InfinityFlow?**
A: Abstracts orchestration logic to support scalable production pipelines.

**Q: Why GPU deployment?**
A: Reduces inference latency and increases throughput under heavy load.

**Q: What level of engineer built this?**
A: Designed to reflect L5–L6 AI systems engineering capability.

## About

Production AI Monitoring & Inference System
