Maya combines a local LLM, real-time facial emotion detection, text sentiment analysis, guided mental exercises, and long-term conversational memory (RAG) to provide empathetic, context-aware support — all without any data ever leaving the device.
"Technology should care for people, not exploit them." — Maya was built on this principle.
- Overview
- Key Features
- System Architecture
- Project Structure
- Hardware Requirements
- LLM Model Research & Benchmarking
- Software Stack & Technology Choices
- Complete Processing Pipeline
- Guided Mental Exercises
- Web Interface & UI Design
- Installation — Raspberry Pi 5
- Installation — Windows (Development)
- Running the Application
- Configuration Reference
- Module Deep Dive
- API Reference
- Utility Scripts
- Troubleshooting
- Privacy & Security
- Future Scope
- License
Maya is an AI-powered wellbeing companion that listens, understands, and responds with empathy. It runs 100% offline on a Raspberry Pi 5, ensuring complete privacy. The system uses:
- A local LLM (Microsoft Phi-3 Mini via Ollama) for natural conversation
- VADER sentiment analysis to understand the emotional tone of text
- FER (Facial Expression Recognition) with a webcam for real-time facial emotion detection
- ChromaDB vector database for long-term conversational memory (RAG)
- An Emotion Engine that fuses text sentiment, facial emotion, and historical patterns into a unified mental state model
- A Guided Exercise System that detects stress and offers evidence-based 30-second exercises
The companion is named Maya and provides brief, warm, supportive responses tailored to the user's current emotional state.
| Problem | Maya's Solution |
|---|---|
| Mental health apps send data to the cloud | 100% offline — nothing leaves the device |
| AI assistants require internet | Runs on local LLM via Ollama |
| Text-only chatbots miss visual cues | Facial emotion detection via webcam |
| Chatbots forget past conversations | Long-term memory via ChromaDB (RAG) |
| Generic responses lack empathy | Emotion-aware prompting fuses sentiment + face + history |
| Expensive hardware requirements | Runs on an $80 Raspberry Pi 5 |
| Feature | Description |
|---|---|
| Fully Offline | No internet required after initial setup. All inference happens locally on-device. |
| Privacy-First | Zero data leaves the Raspberry Pi. No cloud APIs, no telemetry. |
| Multimodal Emotion Understanding | Combines text sentiment + facial expression + conversation history. |
| Guided Mental Exercises | When stress is detected, Maya offers quick 30-second exercises (breathing, grounding, gratitude, mindfulness). |
| Long-Term Memory (RAG) | Remembers past conversations using ChromaDB vector similarity search for context-aware responses. |
| Dual Interface | Terminal CLI for direct interaction, or a beautiful Flask web UI accessible from any device on LAN. |
| Real-Time Camera Feed | Web interface shows live camera feed with emotion overlay and bounding boxes. |
| Streaming Responses | LLM responses stream token-by-token via Server-Sent Events (SSE). |
| Modular Architecture | Clean separation of concerns with abstract base classes for camera and display. |
| RPi 5 Optimized | Tuned context windows, thread counts, token limits for ARM64 CPU inference. |
┌─────────────────────────────────────────────────────────────────┐
│ User Interfaces │
│ ┌──────────────────────┐ ┌───────────────────────────────┐ │
│ │ Terminal CLI │ │ Flask Web App (port 5000) │ │
│ │ (main.py) │ │ (web_app.py + index.html) │ │
│ └──────────┬───────────┘ └──────────────┬────────────────┘ │
└─────────────┼───────────────────────────────┼───────────────────┘
│ │
▼ ▼
┌─────────────────────────────────────────────────────────────────┐
│ AgentBrain (brain.py) │
│ Central orchestrator — coordinates all modules │
│ │
│ ┌─────────────┐ ┌──────────────┐ ┌───────────┐ ┌────────────┐ │
│ │ LLMClient │ │ Sentiment │ │ Emotion │ │ Conversa- │ │
│ │ (llm.py) │ │ Analyzer │ │ Engine │ │ tion │ │
│ │ │ │ (sentiment. │ │ (emotion. │ │ Memory │ │
│ │ Ollama API │ │ py) │ │ py) │ │ (memory.py)│ │
│ │ phi3:mini │ │ VADER │ │ Fusion │ │ ChromaDB │ │
│ └─────────────┘ └──────────────┘ └───────────┘ └────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│ │
▼ ▼
┌──────────────────────┐ ┌──────────────────────────┐
│ Ollama Server │ │ Camera Module │
│ (localhost:11434) │ │ (camera.py) │
│ LLM Inference │ │ OpenCV + FER │
└──────────────────────┘ └──────────────────────────┘
wellbeing_ai/
│
├── main.py # Terminal CLI entry point
├── web_app.py # Flask web application (REST API + SSE streaming)
├── benchmark_models.py # LLM model benchmark script (tests all Ollama models)
├── requirements.txt # Python dependencies with version pins
├── setup_rpi.sh # Automated setup script for Raspberry Pi (Linux/Bash)
├── setup_rpi.bat # Automated setup script for Windows development
├── patch_fer.py # Patches FER library to fix moviepy import on RPi
├── reset_memory.py # Utility: clear all stored conversations
├── view_memory.py # Utility: view stored conversations
├── test_camera.py # Camera & FER diagnostic test script
├── benchmark_results.json # Full benchmark data (auto-generated)
│
├── agent/ # Core AI agent modules
│ ├── __init__.py
│ ├── brain.py # AgentBrain — central orchestrator (5-step pipeline)
│ ├── llm.py # LLMClient — Ollama REST API integration
│ ├── sentiment.py # SentimentAnalyzer — VADER-based text sentiment
│ ├── emotion.py # EmotionEngine — multimodal emotion fusion
│ ├── memory.py # ConversationMemory — ChromaDB RAG store
│ └── exercises.py # ExerciseManager — 7 guided mental exercises
│
├── config/ # Configuration
│ ├── __init__.py
│ └── config.py # All tuneable parameters (LLM, camera, paths, etc.)
│
├── interface/ # Hardware abstraction layers
│ ├── __init__.py
│ ├── camera.py # BaseCamera / WebcamCamera — webcam + FER
│ └── display.py # BaseDisplay / TerminalDisplay — output rendering
│
├── templates/ # Flask HTML templates
│ └── index.html # Web chat interface (glassmorphism UI, ~1100 lines)
│
├── data/ # Runtime data (auto-created, gitignored)
│ └── memory/ # ChromaDB persistent vector storage
│
└── .gitignore # Git ignore rules
| Component | Specification |
|---|---|
| Board | Raspberry Pi 5 (4GB or 8GB RAM recommended) |
| Storage | 32GB+ microSD card (Class 10 / UHS-I minimum) |
| Camera | USB Webcam or Raspberry Pi Camera Module v2/v3 |
| Power | Official RPi 5 USB-C power supply (5V/5A) |
| Network | Required only for initial setup (downloading models & packages) |
| Display | Optional — web interface accessible from any device on LAN |
- Windows, macOS, or Linux with Python 3.10+
- Webcam (for testing emotion detection)
- 8GB+ RAM recommended
Selecting the right LLM is critical for a wellbeing companion running on resource-constrained hardware. The model must:
- Fit in memory — RPi 5 has 4–8GB RAM shared between OS, app, and model
- Respond quickly — Users in emotional distress need timely responses (<30s)
- Show empathy — Generic/robotic responses are harmful in a wellbeing context
- Follow instructions — Must stay in character as Maya, keep responses brief, not hallucinate
- Run offline — Must be available via Ollama for local inference
We benchmarked 10 locally available Ollama models spanning a wide range of sizes and architectures:
| # | Model | Architecture | Parameters | Quantized Size | Source |
|---|---|---|---|---|---|
| 1 | `phi3:mini` | Phi-3 Mini | 3.8B | 2.0 GB | Microsoft |
| 2 | `llama3.1:latest` | LLaMA 3.1 | 8B | 4.6 GB | Meta |
| 3 | `qwen2.5:latest` | Qwen 2.5 | 7B | 4.4 GB | Alibaba |
| 4 | `mistral:latest` | Mistral | 7B | 4.1 GB | Mistral AI |
| 5 | `gemma:2b` | Gemma | 2B | 1.6 GB | Google |
| 6 | `survival-gemma3:latest` | Gemma 3 (finetuned) | 2B | 1.6 GB | Custom |
| 7 | `survival-gemma2:latest` | Gemma 2 (finetuned) | 2B | 1.6 GB | Custom |
| 8 | `survival-gemma:latest` | Gemma (finetuned) | 2B | 1.6 GB | Custom |
| 9 | `tinyllama:latest` | TinyLlama | 1.1B | 0.6 GB | TinyLlama Team |
| 10 | `my-survival:latest` | TinyLlama (finetuned) | 1.1B | 0.6 GB | Custom |
Test Environment: Settings identical to production config — temperature=0.3, max_tokens=60, num_ctx=1024, num_thread=4. Each model was tested against 5 diverse wellbeing conversation prompts (50 total inferences).
Test Prompts:
| # | Category | User Message |
|---|---|---|
| 1 | General Greeting | "Hey, I just wanted someone to talk to." |
| 2 | Negative Emotion | "I've been feeling really down lately and nothing seems to help." |
| 3 | Anxiety | "I have a big exam tomorrow and I can't stop worrying about it." |
| 4 | Positive Emotion | "I got promoted at work today! I'm so excited!" |
| 5 | Context Recall | "It happened again last night. I barely slept 3 hours." (with memory context) |
Quality Scoring (0–10 weighted):
| Criterion | Weight | Description |
|---|---|---|
| Empathy | 30% | Empathetic language ("I understand", "sounds like", "here for you") |
| Brevity | 20% | 1–3 sentence responses score highest (optimized for RPi latency) |
| Naturalness | 20% | Absence of robotic phrases ("as an AI", "language model") |
| Length Fit | 15% | 20–200 character responses ideal for quick supportive replies |
| No Hallucination | 15% | Doesn't invent user's name or identity |
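The weighted combination above can be sketched as a small scoring function (the criterion names and example sub-scores are illustrative; the actual benchmark script's internals may differ):

```python
# Weighted quality score: each criterion is scored 0-10, then combined
# using the weights from the table above. Weights sum to 1.0.
WEIGHTS = {
    "empathy": 0.30,
    "brevity": 0.20,
    "naturalness": 0.20,
    "length_fit": 0.15,
    "no_hallucination": 0.15,
}

def quality_score(subscores: dict) -> float:
    """Combine per-criterion sub-scores (0-10) into a weighted 0-10 score."""
    return round(sum(WEIGHTS[name] * subscores[name] for name in WEIGHTS), 2)

# Example: strong empathy, slightly long response
print(quality_score({
    "empathy": 10, "brevity": 8, "naturalness": 9,
    "length_fit": 7, "no_hallucination": 10,
}))  # → 8.95
```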
| Model | Size | Avg Time | TTFT | Tok/s | Avg Tokens | Quality | Pass |
|---|---|---|---|---|---|---|---|
| qwen2.5:latest | 4.4 GB | 5.68s | 3.93s | 8.48 | 42.6 | 9.13/10 | 5/5 |
| llama3.1:latest | 4.6 GB | 8.54s | 6.96s | 6.29 | 32.0 | 8.65/10 | 5/5 |
| survival-gemma2 | 1.6 GB | 3.30s | 2.69s | 13.16 | 44.0 | 8.59/10 | 5/5 |
| phi3:mini | 2.0 GB | 3.68s | 3.03s | 11.18 | 36.6 | 8.56/10 | 5/5 |
| tinyllama | 0.6 GB | 2.66s | 2.27s | 17.88 | 47.6 | 8.41/10 | 5/5 |
| survival-gemma3 | 1.6 GB | 7.96s | 7.31s | 12.18 | 47.4 | 8.38/10 | 5/5 |
| mistral:latest | 4.1 GB | 4.53s | 3.25s | 10.18 | 41.2 | 8.35/10 | 5/5 |
| gemma:2b | 1.6 GB | 3.36s | 2.73s | 14.00 | 48.0 | 8.11/10 | 5/5 |
| survival-gemma | 1.6 GB | 3.36s | 2.67s | 14.57 | 49.4 | 7.96/10 | 5/5 |
| my-survival | 0.6 GB | 2.74s | 2.59s | 7.66 | 20.0 | 7.45/10 | 5/5 |
Legend: TTFT = Time To First Token | Tok/s = Tokens Per Second
qwen2.5 ██████████████████░░ 9.13 ✗ Too large (4.4GB)
llama3.1 █████████████████░░░ 8.65 ✗ Too large (4.6GB)
survival-gemma2 ████████████████░░░░ 8.59 ✓ GOOD (1.6GB)
phi3:mini ████████████████░░░░ 8.56 ✓ SELECTED (2.0GB)
tinyllama ████████████████░░░░ 8.41 ✓ Fast but less empathetic
survival-gemma3 ████████████████░░░░ 8.38 ~ Slow first token
mistral ████████████████░░░░ 8.35 ✗ Too large (4.1GB)
gemma:2b ███████████████░░░░░ 8.11 ✓ Acceptable fallback
survival-gemma ███████████████░░░░░ 7.96 ✓ Acceptable
my-survival ██████████████░░░░░░ 7.45 ✗ Poor empathy
tinyllama ██░░░░░░░░░░░░░░░░░░ 2.66s ← Fastest
my-survival ██░░░░░░░░░░░░░░░░░░ 2.74s
survival-gemma2 ███░░░░░░░░░░░░░░░░░ 3.30s
phi3:mini ███░░░░░░░░░░░░░░░░░ 3.68s ← SELECTED
mistral ████░░░░░░░░░░░░░░░░ 4.53s
qwen2.5 █████░░░░░░░░░░░░░░░ 5.68s
survival-gemma3 ████████░░░░░░░░░░░░ 7.96s
llama3.1 █████████░░░░░░░░░░░ 8.54s ← Slowest
tinyllama ██████████████████░░ 17.88 ← Fastest generation
survival-gemma █████████████████░░░ 14.57
gemma:2b ██████████████░░░░░░ 14.00
survival-gemma2 █████████████░░░░░░░ 13.16
phi3:mini ███████████░░░░░░░░░ 11.18 ← SELECTED
mistral ██████████░░░░░░░░░░ 10.18
qwen2.5 ████████░░░░░░░░░░░░ 8.48
llama3.1 ██████░░░░░░░░░░░░░░ 6.29 ← Slowest
| Model | General | Negative Emotion | Anxiety | Positive | Context Recall |
|---|---|---|---|---|---|
| phi3:mini | 7.75 | 9.55 | 10.00 | 7.75 | 7.75 |
| qwen2.5 | 9.25 | 9.55 | 9.40 | 10.00 | 7.45 |
| llama3.1 | 8.50 | 8.50 | 8.50 | 10.00 | 7.75 |
| mistral | 7.90 | 9.55 | 8.65 | 7.90 | 7.75 |
| survival-gemma2 | 7.90 | 8.95 | 8.65 | 8.50 | 8.95 |
| gemma:2b | 7.15 | 8.35 | 8.95 | 7.75 | 8.35 |
| tinyllama | 8.50 | 8.80 | 8.50 | 7.45 | 8.80 |
phi3:mini scored a perfect 10.00/10 on the Anxiety prompt — the most critical category for a wellbeing companion.
phi3:mini (9.55/10): "I'm truly sorry to hear that you're going through a tough time. Remember, it's okay to feel sad sometimes, but I believe in your strength too. If there's anything specific on your mind or if you need someone to talk to, I'm here for you."
qwen2.5 (9.55/10): "I'm sorry to hear you're feeling this way. It's okay to feel sad sometimes, and it's important to give yourself lots of kindness during these times. Would you like to talk about what's been on your mind?"
llama3.1 (8.50/10): "Sweetheart, I'm so sorry you're struggling. Would you like me to guide you through a gentle breathing exercise or simply listen while you share what's on your mind?"
my-survival (7.75/10): "Can you please suggest some ways I can make myself feel better?"
⚠️ This model responds as the user instead of the assistant — poor instruction following.
phi3:mini (10.00/10 — PERFECT): "It sounds like you're feeling anxious, which is completely normal before an important test. Remember to take breaks and get some rest tonight—you got this!"
qwen2.5 (9.40/10): "I understand how stressful it can be before an important exam. Remember, you've studied hard, and you've got this. Take some deep breaths and try to get a good night's rest tonight. You'll do great!"
| Model | Size OK? | Speed OK? | Quality OK? | Verdict |
|---|---|---|---|---|
| phi3:mini | ✓ (2.0 GB) | ✓ (3.68s) | ✓ (8.56) | ✅ RECOMMENDED |
| survival-gemma2 | ✓ (1.6 GB) | ✓ (3.30s) | ✓ (8.59) | ✅ Good Alternative |
| gemma:2b | ✓ (1.6 GB) | ✓ (3.36s) | ~ (8.11) | Acceptable fallback |
| tinyllama | ✓ (0.6 GB) | ✓ (2.66s) | ~ (8.41) | Fast but less empathetic |
| qwen2.5 | ✗ (4.4 GB) | ✓ (5.68s) | ✓ (9.13) | ❌ Too large for 4GB RPi |
| llama3.1 | ✗ (4.6 GB) | ~ (8.54s) | ✓ (8.65) | ❌ Too large, too slow |
| mistral | ✗ (4.1 GB) | ✓ (4.53s) | ✓ (8.35) | ❌ Too large for 4GB RPi |
| my-survival | ✓ (0.6 GB) | ✓ (2.74s) | ✗ (7.45) | ❌ Poor instruction following |
After benchmarking all 10 models, Microsoft Phi-3 Mini (phi3:mini) was selected as the default:
| Criterion | Phi-3 Mini | Best Alternative (qwen2.5) |
|---|---|---|
| Model Size | 2.0 GB ✓ | 4.4 GB ✗ (won't fit 4GB RPi) |
| Quality Score | 8.56/10 | 9.13/10 |
| Response Time | 3.68s | 5.68s |
| Empathy (Anxiety) | 10.00/10 ✓ | 9.40/10 |
| RPi 5 Compatible | ✓ | ✗ |
Key Insights:
- Best quality-to-size ratio — Achieves quality comparable to 7B models at nearly half the size
- Perfect score on anxiety prompts — 10/10 on the most critical wellbeing category
- Fits comfortably in 4GB RAM — Leaves room for OS, Python, ChromaDB, TensorFlow, and FER
- Excellent instruction following — Stays in character as Maya, keeps responses brief
- Natural empathy — Uses phrases like "I'm truly sorry", "it's okay to feel", "I believe in your strength"
Note: Users with an 8GB RPi 5 may try `qwen2.5:latest` (`LLM_MODEL=qwen2.5` in config) for higher quality at the cost of longer inference.
```bash
source venv/bin/activate   # Linux/RPi
# OR
venv\Scripts\activate      # Windows

python benchmark_models.py
```

The script discovers all local Ollama models, runs 5 prompts against each, measures latency/quality, prints a comparison table, and saves results to `benchmark_results.json`.
| Property | Value |
|---|---|
| Model | Microsoft Phi-3 Mini (phi3:mini) |
| Runtime | Ollama (local inference server) |
| Parameters | ~3.8B parameters |
| Quantization | Q4_K_M (default Ollama quantization) |
| Context Window | 1024 tokens (tuned for RPi CPU performance) |
| Max Output Tokens | 60 (brief, focused responses) |
| Temperature | 0.3 (low creativity, high consistency) |
| CPU Threads | 4 (matches RPi 5's quad-core Cortex-A76) |
| Timeout | 300 seconds (5 min for slow CPU inference) |
| API | Ollama REST API at http://localhost:11434 |
| Stop Sequences | \n\n, User:, Assistant: |
Why Phi-3 Mini? — It is one of the smallest high-quality LLMs that can run on RPi 5 hardware with acceptable latency. It handles empathetic conversation well within tight token budgets.
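With the settings above, a request to Ollama looks roughly like this. The payload fields (`num_predict`, `num_ctx`, `num_thread`, `stop`) follow Ollama's `/api/chat` option names; the helper function itself is a sketch, not the project's actual `LLMClient`:

```python
import json

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's local REST endpoint

def build_chat_payload(messages, system_prompt):
    """Build the JSON body for Ollama's /api/chat endpoint,
    mirroring the tuned RPi 5 settings from the table above."""
    return {
        "model": "phi3:mini",
        "messages": [{"role": "system", "content": system_prompt}, *messages],
        "stream": True,  # tokens arrive as newline-delimited JSON chunks
        "options": {
            "temperature": 0.3,
            "num_predict": 60,    # max output tokens
            "num_ctx": 1024,      # context window
            "num_thread": 4,      # matches the quad-core Cortex-A76
            "stop": ["\n\n", "User:", "Assistant:"],
        },
    }

payload = build_chat_payload(
    [{"role": "user", "content": "I can't stop worrying about my exam."}],
    "You are Maya, a warm, brief wellbeing companion.",
)
print(json.dumps(payload["options"], indent=2))
```

The body would then be POSTed to `OLLAMA_URL` (e.g. with `requests.post(OLLAMA_URL, json=payload, stream=True)`).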
| Property | Value |
|---|---|
| Library | VADER (Valence Aware Dictionary and sEntiment Reasoner) |
| Package | vaderSentiment>=3.3.2 |
| Type | Rule-based, lexicon-driven |
| Output | Compound score (-1.0 to +1.0), pos/neg/neu breakdown |
| Thresholds | Positive: ≥ 0.05, Negative: ≤ -0.05 |
| Why VADER? | Zero-latency, no GPU needed, specifically tuned for social/conversational text |
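The thresholding logic is simple enough to sketch in a few lines (the tuple of label and intensity mirrors the `SentimentResult` fields described later, but the function name is illustrative):

```python
def label_sentiment(compound: float):
    """Map a VADER compound score (-1.0..+1.0) to a label and intensity.

    Thresholds follow the table above: >= 0.05 is positive, <= -0.05 is
    negative, anything in between is neutral. Intensity is the absolute
    compound value.
    """
    if compound >= 0.05:
        label = "positive"
    elif compound <= -0.05:
        label = "negative"
    else:
        label = "neutral"
    return label, abs(compound)

print(label_sentiment(0.62))   # → ('positive', 0.62)
print(label_sentiment(-0.03))  # → ('neutral', 0.03)
```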
| Property | Value |
|---|---|
| Library | FER (Facial Expression Recognition) v22.5.1 |
| Backend | TensorFlow (Keras CNN) |
| Face Detector | OpenCV Haar Cascade (mtcnn=False for speed on RPi) |
| Detectable Emotions | happy, sad, angry, fear, surprise, neutral, disgust |
| Confidence Threshold | 0.30 (detections below this are discarded) |
| Sampling Interval | Every 3 conversation turns (CLI) or every 2.5s (web UI polling) |
| Why not MTCNN? | MTCNN is more accurate but significantly slower on CPU. Haar Cascade provides adequate speed on RPi 5. |
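Applying the 0.30 confidence threshold can be sketched as follows. The input shape (a list of faces, each with a `box` and an `emotions` score dict) is modeled on what the FER library's `detect_emotions` returns — treat the exact shape as an assumption:

```python
CONFIDENCE_THRESHOLD = 0.30  # detections below this are discarded

def dominant_emotion(detections):
    """Return the highest-scoring emotion across detected faces,
    or None if nothing clears the confidence threshold."""
    best_label, best_score = None, CONFIDENCE_THRESHOLD
    for face in detections:
        # face["emotions"] maps label -> confidence (scores sum to ~1.0)
        label, score = max(face["emotions"].items(), key=lambda kv: kv[1])
        if score >= best_score:
            best_label, best_score = label, score
    return best_label

# Sample shaped like a fer.FER.detect_emotions() result (illustrative)
sample = [{"box": [40, 30, 120, 120],
           "emotions": {"happy": 0.81, "neutral": 0.12, "sad": 0.07}}]
print(dominant_emotion(sample))  # → happy
print(dominant_emotion([]))      # → None
```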
| Property | Value |
|---|---|
| Database | ChromaDB (persistent mode) |
| Package | chromadb>=0.4.22 |
| Embedding | ChromaDB's default all-MiniLM-L6-v2 Sentence Transformer |
| Distance Metric | Cosine similarity (hnsw:space: cosine) |
| Retrieval Top-K | 2 (reduced for CPU performance) |
| Storage Location | data/memory/ (auto-created) |
| Collection Name | conversations |
| Stored Metadata | user_message, assistant_response, sentiment_label, sentiment_score, emotion, timestamp |
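The retrieval step is ordinary vector similarity search; here is a dependency-free sketch of what ChromaDB does conceptually. Real embeddings come from all-MiniLM-L6-v2 (384 dimensions); the 3-dimensional toy vectors below are purely illustrative:

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve_top_k(query_vec, stored, k=2):
    """Return metadata of the k stored entries most similar to the query.
    `stored` is a list of (embedding, metadata) pairs."""
    ranked = sorted(stored, key=lambda e: cosine_sim(query_vec, e[0]), reverse=True)
    return [meta for _, meta in ranked[:k]]

# Toy "embeddings" standing in for sentence-transformer output
memories = [
    ([0.9, 0.1, 0.0], {"user_message": "I barely slept last night"}),
    ([0.0, 1.0, 0.1], {"user_message": "I got promoted today!"}),
    ([0.8, 0.2, 0.1], {"user_message": "Still exhausted, slept 3 hours"}),
]
print(retrieve_top_k([1.0, 0.0, 0.0], memories, k=2))
```

A query about sleep retrieves both sleep-related memories — the behavior the RAG step relies on.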
| Property | Value |
|---|---|
| Framework | Flask 3.0+ |
| CORS | flask-cors 4.0+ |
| Streaming | Server-Sent Events (SSE) via /api/chat_stream |
| Host | 0.0.0.0:5000 (accessible on LAN) |
| Template | Single-page glassmorphism UI (templates/index.html) |
| Font | Google Quicksand (loaded via CDN on first access) |
| Property | Value |
|---|---|
| Library | OpenCV (headless) opencv-python-headless>=4.8.0 |
| Usage | Camera capture, color conversion (BGR→RGB), bounding box drawing, JPEG encoding |
| Package | Version | Purpose |
|---|---|---|
| `requests` | ≥2.31.0 | HTTP client for Ollama REST API |
| `numpy` | ≥1.24.0, <2.0.0 | Array operations for OpenCV/TF (pinned <2.0 for compatibility) |
| `tensorflow` | ≥2.15.0, <2.18.0 | Backend for FER emotion detection CNN |
When a user sends a message, the AgentBrain.process() method orchestrates this pipeline:
User Input (text)
│
▼
┌─────────────────────────────────┐
│ 1. SENTIMENT ANALYSIS │
│ SentimentAnalyzer.analyze() │
│ VADER scores the text → │
│ label: positive/negative/ │
│ neutral │
│ compound: -1.0 to +1.0 │
│ intensity: 0.0 to 1.0 │
└────────────┬────────────────────┘
│
▼
┌─────────────────────────────────┐
│ 2. MEMORY RETRIEVAL (RAG) │
│ ConversationMemory.retrieve()│
│ Query ChromaDB with user │
│ input → retrieve top-2 most │
│ semantically similar past │
│ conversations │
└────────────┬────────────────────┘
│
▼
┌─────────────────────────────────┐
│ 3. EMOTION ENGINE UPDATE │
│ EmotionEngine.update() │
│ Fuses: │
│ • Text sentiment (VADER) │
│ • Facial emotion (FER/cam) │
│ • Historical sentiment avg │
│ • Memory sentiment patterns │
│ Outputs: MentalState object │
│ (dominant_emotion, trend, │
│ historical_avg) │
└────────────┬────────────────────┘
│
▼
┌─────────────────────────────────┐
│ 4. LLM RESPONSE GENERATION │
│ LLMClient.chat() │
│ Builds system prompt with: │
│ • Base persona (Maya) │
│ • Current user mood │
│ • Emotional trend guidance │
│ • Retrieved memory context │
│ Sends last 4 messages + │
│ system prompt to Ollama │
│ Streams response tokens │
└────────────┬────────────────────┘
│
▼
┌─────────────────────────────────┐
│ 5. MEMORY STORAGE │
│ ConversationMemory.store() │
│ Stores in ChromaDB: │
│ • user_message │
│ • assistant_response │
│ • sentiment_label & score │
│ • dominant emotion │
│ • timestamp │
│ Embedded for future RAG │
└─────────────────────────────────┘
The Emotion Engine maintains a sliding window (10 turns) of sentiment scores and emotion labels to compute:
- Dominant Emotion Resolution — If a facial emotion is detected (not `unknown`/`None`), it takes priority. Otherwise, text sentiment is mapped: positive→happy, negative→sad, neutral→neutral.
- Historical Sentiment Average — Running mean of compound scores over the window.
- Emotional Trend — Compares the average sentiment of the first half vs second half of the window. Difference >0.15 = "improving", <-0.15 = "declining", else "stable".
- Memory Pattern Adjustment — If >60% of retrieved memories have negative sentiment, the historical average is shifted down by 0.1 to increase concern.
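The fusion and trend rules above can be sketched in a few lines (the short-window guard of 4 entries is an assumption; the priority rule, sentiment mapping, and ±0.15 trend thresholds come from the list above):

```python
from collections import deque

WINDOW = 10  # sliding window of recent turns

def fuse_emotion(face_emotion, sentiment_label):
    """Facial emotion wins when available; otherwise map text sentiment."""
    if face_emotion not in (None, "unknown"):
        return face_emotion
    return {"positive": "happy", "negative": "sad", "neutral": "neutral"}[sentiment_label]

def trend(scores):
    """Compare first-half vs second-half averages of the sentiment window."""
    if len(scores) < 4:       # too little history to call a trend (assumption)
        return "stable"
    half = len(scores) // 2
    first, second = scores[:half], scores[half:]
    diff = sum(second) / len(second) - sum(first) / len(first)
    if diff > 0.15:
        return "improving"
    if diff < -0.15:
        return "declining"
    return "stable"

window = deque([-0.1, -0.2, -0.4, -0.5, -0.6, -0.7], maxlen=WINDOW)
print(fuse_emotion("sad", "positive"))  # → sad (face takes priority)
print(trend(list(window)))             # → declining
```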
Browser polls /api/camera/snapshot every 2.5s
│
▼
WebcamCamera.capture_snapshot_with_overlay()
│
├── cv2.VideoCapture.read() → raw BGR frame
├── FER.detect_emotions(RGB frame) → bounding boxes + emotion scores
├── Draw green bounding box + emotion label on frame
├── cv2.imencode('.jpg') → JPEG bytes
│
▼
Response: { image: base64 JPEG, emotion: "happy" }
│
▼
Browser updates camera feed image + emotion emoji/label
Maya actively monitors the user's emotional state and offers guided mental exercises when stress is detected.
| Principle | Implementation |
|---|---|
| Non-intrusive | Exercises offered only when multiple stress indicators align |
| Opt-in | User can accept ("yes", "let's do it") or decline ("skip", "not now") |
| Quick | All exercises complete in under 30 seconds |
| Skippable | User can exit mid-exercise at any time |
| Cooldown | Only offered once every 5 turns (configurable) |
| # | Exercise | Category | Duration | Description |
|---|---|---|---|---|
| 1 | Box Breathing | 🌬️ Breathing | 24s | 4-4-4-4 pattern (inhale, hold, exhale, hold) |
| 2 | Calming Breath | 🍃 Breathing | 26s | Inhale 4s, exhale 6s — twice |
| 3 | 5-4-3-2-1 Grounding | 🌍 Grounding | 30s | Name 5 things you see, 4 feel, 3 hear, 2 smell, 1 taste |
| 4 | Quick Gratitude | 🙏 Gratitude | 20s | Reflect on one thing you're grateful for |
| 5 | Body Scan | 🧘 Mindfulness | 25s | Release tension in shoulders, jaw, chest, toes |
| 6 | Present Moment | 🧘 Mindfulness | 20s | Three deep breaths with awareness |
| 7 | Tension Release | 💪 Mindfulness | 25s | Progressive muscle relaxation (squeeze and release) |
Exercises are offered when any of these conditions are met:
┌─────────────────────────────────────────────────────────────┐
│ STRESS DETECTION ENGINE │
│ │
│ Condition 1: hist_avg < -0.3 → Persistent low mood │
│ Condition 2: trend="declining" → Getting worse │
│ AND hist_avg < -0.15 │
│ Condition 3: sentiment < -0.5 → Very negative now │
│ Condition 4: emotion ∈ {sad, → Stress emotion │
│ angry, fear, disgust} │
│ AND sentiment < -0.2 │
│ │
│ ANY condition true → needs_exercise = True │
│ Cooldown: min 5 turns between offers │
└─────────────────────────────────────────────────────────────┘
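The four conditions and the cooldown translate directly into a predicate like this (function and parameter names are illustrative, not the project's exact API):

```python
STRESS_EMOTIONS = {"sad", "angry", "fear", "disgust"}

def needs_exercise(hist_avg, trend, sentiment, emotion,
                   current_turn, last_offer_turn, cooldown=5):
    """Return True when any stress condition holds and the cooldown has passed."""
    if current_turn - last_offer_turn < cooldown:
        return False  # respect minimum turns between offers
    return (
        hist_avg < -0.3                                        # 1: persistent low mood
        or (trend == "declining" and hist_avg < -0.15)         # 2: getting worse
        or sentiment < -0.5                                    # 3: very negative now
        or (emotion in STRESS_EMOTIONS and sentiment < -0.2)   # 4: stress emotion
    )

print(needs_exercise(-0.1, "stable", -0.6, "neutral", 12, 3))  # → True (very negative now)
print(needs_exercise(-0.1, "stable", -0.6, "neutral", 6, 3))   # → False (cooldown active)
```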
Stress Detected → Exercise Offer (SSE event)
│
▼
┌──────────────────────────┐
│ Exercise Selection Card │
│ ┌────┐ ┌────┐ ┌────┐ │
│ │🌬️ │ │🍃 │ │🌍 │ │ ← User picks one
│ │Box │ │Calm│ │5421│ │
│ └────┘ └────┘ └────┘ │
│ [Skip] │
└──────────┬───────────────┘
│
▼
┌──────────────────────────┐
│ Step-by-Step Guide │
│ "Breathe IN slowly..." │
│ ┌──────────────────┐ │
│ │ ⏱️ 4 seconds │ │ ← Timer countdown
│ │ ████████░░░░ │ │
│ └──────────────────┘ │
│ [Next Step] [Skip] │
└──────────────────────────┘
The web interface is a single-page application built with a glassmorphism design language.
| Component | Description |
|---|---|
| Chat Panel | Message bubbles with avatars, typing indicator, auto-scroll |
| Emotion Sidebar | Live camera feed, emotion emoji, status indicators |
| Header | System status dots (LLM ●, Memory ●, Camera ●), reset button |
| Exercise Cards | Gradient cards with icons, timers, step-by-step guides |
| Property | Value |
|---|---|
| UI Style | Glassmorphism (frosted glass panels, gradient background) |
| Color Palette | Soft purple-blue gradient (#e0c3fc → #8ec5fc) |
| Font | Google Quicksand (warm, approachable) |
| Layout | Responsive — side-by-side on desktop, stacked on mobile |
| Streaming | SSE via ReadableStream for token-by-token display |
| Camera Polling | /api/camera/snapshot every 2.5 seconds |
| Accessibility | High contrast text, large touch targets |
| Emotion | Emoji | Color Accent |
|---|---|---|
| Happy | 😊 | Green |
| Sad | 😢 | Blue |
| Angry | 😠 | Red |
| Fear | 😨 | Purple |
| Surprise | 😲 | Yellow |
| Neutral | 😐 | Gray |
| Disgust | 🤢 | Green |
- Raspberry Pi OS (64-bit / Bookworm recommended)
- Python 3.10 or higher
- Internet connection (for initial setup only)
```bash
# Clone the repository
git clone <repository-url> ~/wellbeing_ai
cd ~/wellbeing_ai

# Run the setup script
chmod +x setup_rpi.sh
./setup_rpi.sh
```

The script will:
- Create a Python virtual environment
- Install all pip dependencies
- Create the `data/memory/` directory
- Install Ollama (if not already installed)
- Pull the `phi3:mini` model
```bash
# 1. Install system dependencies
sudo apt update && sudo apt install -y python3 python3-pip python3-venv libatlas-base-dev

# 2. Create & activate virtual environment
python3 -m venv venv
source venv/bin/activate

# 3. Upgrade pip
pip install --upgrade pip

# 4. Install Python dependencies
pip install -r requirements.txt

# 5. Patch FER for RPi (fixes moviepy import error)
python patch_fer.py

# 6. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# 7. Start Ollama and pull the model
ollama serve &
sleep 3
ollama pull phi3:mini

# 8. Create data directory
mkdir -p data/memory

# 9. Enable camera (if using RPi Camera Module)
sudo raspi-config
# Navigate to: Interface Options → Camera → Enable
# Reboot if prompted
```

```bat
REM Clone the repository
git clone <repository-url>
cd wellbeing_ai

REM Run the setup script
setup_rpi.bat
```

Or manually:
```bat
python -m venv venv
venv\Scripts\activate
pip install --upgrade pip
pip install -r requirements.txt

REM Install Ollama from https://ollama.com/download/windows
ollama pull phi3:mini
```

To run the terminal (CLI) mode:

```bash
source venv/bin/activate   # Linux/RPi
# OR
venv\Scripts\activate      # Windows

python main.py
```

This launches an interactive terminal session where you type messages and Maya responds. Camera emotion detection samples every 3 turns (configurable).
```bash
source venv/bin/activate   # Linux/RPi
python web_app.py
```

Then open in a browser:
- On the Pi: `http://localhost:5000`
- From another device on LAN: `http://<raspberry-pi-ip>:5000`
The web interface provides:
- A chat window with streaming responses
- Live camera feed with emotion overlay (bounding boxes + labels)
- Real-time emotion emoji display
- System status indicators (LLM, Memory, Camera)
- Conversation reset button
All configuration lives in `config/config.py`. Key settings:
| Setting | Default | Description |
|---|---|---|
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama server URL (env: `OLLAMA_BASE_URL`) |
| `LLM_MODEL` | `phi3:mini` | Ollama model name (env: `LLM_MODEL`) |
| `LLM_TEMPERATURE` | `0.3` | Creativity vs consistency (0.0–1.0) |
| `LLM_MAX_TOKENS` | `60` | Max response length in tokens |
| `LLM_NUM_CTX` | `1024` | Context window size |
| `LLM_NUM_THREAD` | `4` | CPU threads (matches RPi 5 quad-core) |
| `LLM_TIMEOUT` | `300` | Request timeout in seconds |
| `CAMERA_ENABLED` | `True` | Enable/disable camera subsystem |
| `CAMERA_INDEX` | `0` | OpenCV camera device index |
| `CAMERA_SAMPLE_INTERVAL` | `3` | Capture emotion every N turns (CLI) |
| `MEMORY_COLLECTION` | `conversations` | ChromaDB collection name |
| `MEMORY_TOP_K` | `2` | Number of memories to retrieve per query |
| `EXERCISE_TRIGGER_THRESHOLD` | `-0.3` | Sentiment threshold for offering exercises |
| `EXERCISE_COOLDOWN_TURNS` | `5` | Minimum turns between exercise offers |
| `DISPLAY_MODE` | `terminal` | Display mode (`terminal` or `eink`) |
| `SYSTEM_PROMPT` | Maya persona | System prompt sent to the LLM |
These settings can be overridden via environment variables:
- `OLLAMA_BASE_URL`
- `LLM_MODEL`
- `CAMERA_ENABLED` (set to `"true"`/`"false"`)
- `DISPLAY_MODE`
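Reading the overrides can be sketched with `os.environ` (variable names match the list above; the `env_bool` helper is illustrative, not the project's actual `config.py`):

```python
import os

def env_bool(name: str, default: bool) -> bool:
    """Parse "true"/"false" (case-insensitive) from an environment variable,
    falling back to the default when the variable is unset."""
    raw = os.environ.get(name)
    return default if raw is None else raw.strip().lower() == "true"

OLLAMA_BASE_URL = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
LLM_MODEL = os.environ.get("LLM_MODEL", "phi3:mini")
CAMERA_ENABLED = env_bool("CAMERA_ENABLED", True)
DISPLAY_MODE = os.environ.get("DISPLAY_MODE", "terminal")

print(LLM_MODEL, CAMERA_ENABLED)
```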
The central orchestrator. Initializes all subsystems and exposes:
- `check_systems()` → dict of subsystem health checks
- `process(user_input, face_emotion, stream)` → runs the full 5-step pipeline (sentiment → memory retrieval → emotion update → LLM generation → memory storage)
- Maintains a rolling conversation history (last 4 messages sent to the LLM)
- Builds a dynamic system prompt that includes Maya's persona, current user mood, emotional trend guidance, and retrieved memory context
Communicates with Ollama's REST API:
- `is_available()` → checks if Ollama is running and the model is loaded (via `/api/tags`)
- `generate(prompt, system)` → single-shot generation via `/api/generate` (streaming internally)
- `chat(messages, stream_output)` → chat-style generation via `/api/chat`. When `stream_output=True`, returns a generator that yields tokens one by one for SSE streaming.
- Handles connection errors, timeouts, and server unavailability gracefully with error messages.
Wraps VADER for conversational sentiment analysis:
- `analyze(text)` → returns `SentimentResult(label, compound, intensity, scores)`
- Labels: `positive` (compound ≥ 0.05), `negative` (compound ≤ -0.05), `neutral` otherwise
- Intensity = absolute value of the compound score (0.0–1.0)
- Zero dependencies beyond `vaderSentiment`; instant CPU execution
Maintains session-level emotional state:
- `update(sentiment, face_emotion, retrieved_memories)` → returns `MentalState`
- Tracks a sliding window of 10 sentiment scores and emotion labels
- Resolves dominant emotion (face > text mapping)
- Computes emotional trend (improving / declining / stable)
- Adjusts for long-term memory patterns (negative memory ratio)
ChromaDB-backed long-term memory with RAG:
- `store(MemoryEntry)` → stores a conversation turn with full metadata
- `retrieve(query, top_k)` → semantic similarity search, returns `list[RetrievedMemory]`
- Uses cosine distance in the HNSW index
- Each entry stores: user message, assistant response, sentiment label/score, emotion, timestamp
- Documents are formatted as `"The user said: ...\nMaya (the AI assistant) responded: ..."` for embedding (prevents role confusion)
Manages guided mental exercises for stress relief:
- `ExerciseManager` — tracks exercise state and cooldown periods
- `should_offer_exercise(current_turn, cooldown_turns)` → checks if enough time has passed since the last offer
- `get_random_exercise()` → selects a random exercise, avoiding repetition
- `mark_exercise_offered(turn)` → records when an exercise was offered
- `format_exercise_offer()` → generates the opt-in offer message
- `format_exercise_steps(exercise)` → formats exercise steps into a single message
- Contains 7 pre-built exercises: Box Breathing, Calming Breath, 5-4-3-2-1 Grounding, Quick Gratitude, Body Scan, Present Moment, Tension Release
- All exercises complete in under 30 seconds
Hardware abstraction for camera + emotion detection:
- `BaseCamera` — abstract base class defining the interface
- `WebcamCamera` — implementation using OpenCV + FER
- `capture_emotion()` → capture frame, detect dominant emotion, return label or `None`
- `capture_frame()` → return raw OpenCV frame
- `capture_snapshot_with_overlay()` → capture frame, detect emotion, draw bounding box + label, return (JPEG bytes, emotion label)
- `is_available()` → check if camera is accessible
- `release()` → release camera resources
- FER initialization is optional — if TensorFlow/FER is unavailable, the camera still works for frame capture without emotion detection
Hardware abstraction for output rendering:
- `BaseDisplay` — abstract base class
- `TerminalDisplay` — rich terminal output with text wrapping
- `show_message(sender, message)` — formatted chat output
- `show_welcome()` — welcome banner
- `show_emotion(emotion)` — emoji-mapped emotion display
- `show_status(status)` — system status messages
- `clear()` — clear terminal screen (platform-aware: `cls` on Windows, `clear` on Linux)
- Designed to be replaceable with an E-Ink display implementation
REST API + SSE streaming server:
- `GET /` — serves the chat interface (`templates/index.html`)
- `GET /api/status` — returns system health (LLM, memory, camera)
- `POST /api/chat` — synchronous chat endpoint (returns full response)
- `POST /api/chat_stream` — SSE streaming chat endpoint (yields tokens)
- `GET /api/camera/snapshot` — returns base64 JPEG with emotion overlay
- `GET /api/camera/emotion` — returns detected emotion label only
- `POST /api/reset` — resets conversation history
Single-page application with:
- Glassmorphism UI — frosted glass panels with gradient background
- Chat panel — message bubbles with avatars, typing indicator, auto-scroll
- Emotion sidebar — live camera feed, emotion emoji display, status indicators
- SSE streaming — reads token-by-token from `/api/chat_stream` using `ReadableStream`
- Emotion polling — polls `/api/camera/snapshot` every 2.5 seconds for live emotion updates
- Responsive — adapts to mobile screens with a stacked layout
- Font — Google Quicksand for a friendly, approachable feel
The Flask web application exposes the following REST API endpoints:
| Method | Endpoint | Description | Request Body | Response |
|---|---|---|---|---|
| `GET` | `/` | Serve the chat interface | — | HTML page |
| `GET` | `/api/status` | System health check | — | `{status, camera_enabled, model}` |
| `POST` | `/api/chat` | Synchronous chat | `{message, capture_emotion}` | `{response, face_emotion, turn_count}` |
| `POST` | `/api/chat_stream` | SSE streaming chat | `{message, capture_emotion}` | SSE stream of tokens |
| `GET` | `/api/camera/snapshot` | Camera frame + emotion | — | `{image: base64, emotion}` |
| `GET` | `/api/camera/emotion` | Detect emotion only | — | `{emotion, success}` |
| `POST` | `/api/reset` | Reset conversation | — | `{success: true}` |
| `POST` | `/api/trigger_exercise` | Force exercise offer | — | `{success, exercises}` |
| `GET` | `/api/exercises` | List all exercises | — | `{exercises: [...]}` |
| `POST` | `/api/exercise/start` | Start an exercise | `{name}` | `{success, exercise}` |
| `POST` | `/api/exercise/skip` | Skip current exercise | — | `{success: true}` |
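As a usage sketch, a synchronous chat call might look like this (assumes the `requests` package and a Maya server running on `localhost:5000`; the payload fields match the table above):

```python
# Hypothetical client call against a locally running Maya server.
payload = {"message": "I had a rough day at work", "capture_emotion": True}

def chat(base_url="http://localhost:5000", timeout=180):
    import requests  # assumed installed; only needed when actually calling
    r = requests.post(f"{base_url}/api/chat", json=payload, timeout=timeout)
    r.raise_for_status()
    data = r.json()  # {"response": ..., "face_emotion": ..., "turn_count": ...}
    return data["response"]
```

The generous timeout reflects the 30–120 second response times noted in the troubleshooting section for CPU-only inference on the RPi 5.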
The `/api/chat_stream` endpoint emits these Server-Sent Events:
| Event Type | Payload | Description |
|---|---|---|
| `emotion` | `{type: "emotion", emotion: "happy"}` | Detected facial emotion (sent first) |
| `token` | `{type: "token", token: "Hello"}` | Single token from LLM response |
| `exercise_offer` | `{type: "exercise_offer", exercises: [...]}` | Stress detected, offer exercises |
| `done` | `{type: "done"}` | Response complete |
| `error` | `{type: "error", error: "..."}` | Error occurred |
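A client decodes these frames by JSON-parsing each `data:` line. The browser UI does this in JavaScript via `ReadableStream`; the sketch below shows the same parsing in Python for clarity (the sample lines are illustrative):

```python
import json

def parse_sse_lines(lines):
    """Parse 'data: {...}' lines from /api/chat_stream into event dicts."""
    events = []
    for line in lines:
        line = line.strip()
        if line.startswith("data: "):
            events.append(json.loads(line[len("data: "):]))
    return events

# Example stream, matching the event table above:
sample = [
    'data: {"type": "emotion", "emotion": "happy"}',
    '',
    'data: {"type": "token", "token": "Hello"}',
    '',
    'data: {"type": "done"}',
]
events = parse_sse_lines(sample)
```

Blank lines are SSE message separators, so the parser can safely skip anything that does not start with `data: `.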
Comprehensive LLM benchmark that tests all locally installed Ollama models against wellbeing conversation prompts. Measures latency, TTFT (time to first token), tokens/second, and empathy quality scores. Results are saved to `benchmark_results.json`.
```bash
python benchmark_models.py
```

Diagnostic script that tests the full camera pipeline in 4 steps:
- Webcam access (OpenCV)
- FER library import
- FER detector initialization
- Live emotion detection with confidence scores
```bash
python test_camera.py
```

Patches the FER library's `classes.py` to make the `moviepy` import optional. This fixes the "No module named 'moviepy.editor'" error that occurs on Raspberry Pi, since `moviepy` is not needed for emotion detection.
```bash
python patch_fer.py
```

Deletes all stored conversations from ChromaDB. Asks for confirmation before proceeding.
```bash
python reset_memory.py
```

Displays all stored conversations with timestamps, sentiment labels, emotions, and message previews.
```bash
python view_memory.py
```

**[!] Ollama is not running or model not found.**
Fix:
```bash
ollama serve &        # Start Ollama server
ollama pull phi3:mini # Download the model
```

**⚠️ Camera enabled but not available**
Fix (RPi Camera Module):
```bash
sudo raspi-config   # Interface Options → Camera → Enable
sudo reboot
```

Fix (USB Webcam):
```bash
ls /dev/video*   # Check available camera devices
# If your camera is at /dev/video1, set CAMERA_INDEX=1 in config/config.py
```

**No module named 'moviepy.editor'**
Fix:
```bash
python patch_fer.py
```

**Slow LLM responses**

This is expected on the RPi 5 CPU. Responses may take 30–120 seconds. To improve:
- Reduce `LLM_MAX_TOKENS` in `config/config.py`
- Reduce `LLM_NUM_CTX` (a smaller context = faster)
- Ensure no other heavy processes are running
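Assuming `config/config.py` holds plain constants, the tuning might look like this (the exact values below are illustrative, not recommended defaults):

```python
# config/config.py — example values for faster inference on the RPi 5 CPU
LLM_MAX_TOKENS = 128   # shorter replies finish generating sooner
LLM_NUM_CTX = 2048     # a smaller context window reduces prompt processing time
```

Lowering the context window also reduces how much conversation history the LLM can see per turn, so there is a quality/latency trade-off.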
If you see numpy-related errors:
```bash
pip install "numpy>=1.24.0,<2.0.0"
pip install --force-reinstall "tensorflow>=2.15.0,<2.18.0"
```

On some RPi OS versions, the system SQLite may be too old:
```bash
pip install pysqlite3-binary
```

- All processing happens locally on the Raspberry Pi. No data is sent to any external server.
- Ollama runs locally — the LLM never contacts the internet during inference.
- ChromaDB stores data locally in `data/memory/` on the device's filesystem.
- Camera frames are processed in-memory and never saved to disk (only emotion labels are stored).
- The web interface binds to `0.0.0.0:5000` — it is accessible on the local network. For additional security, configure a firewall to restrict access to trusted devices only.
- No API keys are required. No accounts, no cloud services, no telemetry.
| Area | Enhancement | Description |
|---|---|---|
| Voice Input | Speech-to-text | Add offline whisper.cpp integration for voice conversations |
| Voice Output | Text-to-speech | Use Piper TTS for spoken Maya responses |
| E-Ink Display | Hardware display | Implement EInkDisplay class for Waveshare e-paper HAT |
| Multi-User | User profiles | Separate ChromaDB collections per user with face recognition |
| Journaling | Mood journal | Daily mood summaries and weekly trend reports |
| RPi Camera Module | Native camera | PiCameraModule class using picamera2 library |
| Larger Models | 8GB RPi option | Support qwen2.5 or llama3.1 on 8GB RPi 5 |
| Exercise Expansion | More exercises | Add progressive relaxation, visualization, and journaling prompts |
| Multilingual | Language support | Support Hindi, Spanish, and other languages via multilingual LLMs |
| Wearable Integration | Heart rate data | Integrate with fitness bands for physiological stress signals |
This project is intended for personal and educational use.