🛡️ Cyber-Risk Prediction Using Machine Learning

Project Overview

This project is a machine learning framework for cyber-risk prediction based on network traffic analysis of the CICIDS 2017 dataset. It detects network intrusions, classifies attack types, and provides risk assessments for cybersecurity analysts.

🎯 Objectives

🔍 Detect malicious network activities from traffic flow data
🎯 Classify network traffic into Normal vs Attack categories
📊 Map prediction probabilities to actionable risk levels (Low/Medium/High)
🤖 Provide explainable AI insights using SHAP
📈 Deliver real-time analysis through an interactive dashboard

📁 Project Structure

CyberRiskPrediction/
│
├── 📂 data/
│   ├── raw/                 # Original CICIDS 2017 CSV files
│   ├── processed/           # Cleaned and feature-engineered data
│   └── README.md
│
├── 📂 notebooks/
│   ├── 01_data_exploration.ipynb
│   ├── 02_preprocessing.ipynb
│   ├── 03_feature_engineering.ipynb
│   ├── 04_model_training.ipynb
│   └── 05_evaluation_and_shap.ipynb
│
├── 📂 src/
│   ├── data_preprocessing.py    # Data cleaning and preparation
│   ├── feature_engineering.py   # Feature selection and creation
│   ├── train_model.py           # Model training pipeline
│   ├── evaluate.py              # Model evaluation metrics
│   ├── risk_mapping.py          # Probability to risk conversion
│   ├── shap_explain.py          # SHAP explainability
│   └── utils.py                 # Utility functions
│
├── 📂 models/              # Saved trained models
├── 📂 dashboard/           # Streamlit web application
├── 📂 results/             # Evaluation plots and reports
├── 📂 logs/                # Training logs
│
├── requirements.txt
├── run_dashboard.bat       # Windows dashboard launcher
├── run_pipeline.bat        # Full pipeline runner
└── README.md

🚀 Quick Start

Prerequisites

🐍 Python 3.8 or higher
🪟 Windows 10/11 (for .bat scripts)
💾 8GB+ RAM recommended

Installation

Clone the repository

git clone https://github.com/yourusername/CyberRiskPrediction.git
cd CyberRiskPrediction

Create virtual environment

python -m venv venv
venv\Scripts\activate

Install dependencies

pip install -r requirements.txt

Download CICIDS 2017 Dataset
- Download the CICIDS 2017 dataset from the Canadian Institute for Cybersecurity
- Place the CSV files in the data/raw/ directory
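
Before running the pipeline, it can help to confirm that the CSV files landed where the scripts expect them. The sketch below is not part of the repository; `find_raw_csvs` is a hypothetical helper that only assumes the data/raw/ location named above.

```python
from pathlib import Path


def find_raw_csvs(raw_dir="data/raw"):
    """Return the sorted CSV paths found in raw_dir (empty list if none)."""
    root = Path(raw_dir)
    if not root.is_dir():
        return []
    # Match the extension case-insensitively, since downloads may use .CSV
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() == ".csv"
    )


# Usage: list what was found, or warn if the directory is empty
for path in find_raw_csvs():
    print(f"{path.name}: {path.stat().st_size / 1e6:.1f} MB")
```

An empty result usually means the files were extracted into a subfolder or a different directory.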

Running the Pipeline

Option 1: Run Complete Pipeline

run_pipeline.bat

Option 2: Run Individual Steps

python src/data_preprocessing.py
python src/feature_engineering.py
python src/train_model.py
python src/evaluate.py
python src/shap_explain.py
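
The contents of run_pipeline.bat are not shown here, so the following is only a sketch of an equivalent cross-platform runner. It assumes the five scripts are run in the order listed above and that each exits with a non-zero code on failure; `build_commands` and `run_pipeline` are hypothetical names.

```python
import subprocess
import sys

# Step order taken from the commands listed above
PIPELINE_STEPS = [
    "src/data_preprocessing.py",
    "src/feature_engineering.py",
    "src/train_model.py",
    "src/evaluate.py",
    "src/shap_explain.py",
]


def build_commands(steps=PIPELINE_STEPS, python=sys.executable):
    """Return one argv list per step, using the current interpreter."""
    return [[python, step] for step in steps]


def run_pipeline(commands, runner=subprocess.run):
    """Run commands in order; return the index of the first failure, or None."""
    for index, argv in enumerate(commands):
        result = runner(argv)
        if result.returncode != 0:
            # Stop early: later steps depend on the output of earlier ones
            return index
    return None


# Usage (from the repository root):
#   failed = run_pipeline(build_commands())
#   if failed is not None:
#       sys.exit(f"Step failed: {PIPELINE_STEPS[failed]}")
```

Stopping at the first failure avoids training or evaluating on stale data from a previous run.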

Launch Dashboard

run_dashboard.bat

Or:

streamlit run dashboard/app.py

📊 Dataset Information

CICIDS 2017 Dataset

Source: Canadian Institute for Cybersecurity
Records: 2.8+ million network flows
Features: 80+ network traffic attributes
Attack Types: DDoS, Port Scan, Brute Force, Web Attacks, etc.

Key Features Used

| Feature | Description |
| --- | --- |
| Flow Duration | Total duration of the flow |
| Total Fwd/Bwd Packets | Packet counts in each direction |
| Flow Bytes/s | Data transfer rate |
| Packet Length Stats | Mean, std, min, max packet sizes |
| Flag Counts | TCP flag statistics |
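
Rate features such as Flow Bytes/s are a byte count divided by the flow duration, so a zero-duration flow can produce an infinite or undefined value that has to be handled before training. The sketch below illustrates one way to guard against that. It assumes the duration is given in microseconds, and both function names and the fill value are hypothetical, not taken from src/data_preprocessing.py.

```python
import math


def flow_bytes_per_second(total_bytes, duration_us):
    """Bytes per second for one flow; NaN when the duration is not positive.

    Assumes the duration is in microseconds (an assumption of this sketch).
    """
    if duration_us <= 0:
        return math.nan
    return total_bytes / (duration_us / 1_000_000)


def clean_rate(value, fill=0.0):
    """Replace NaN or infinite rates with a fill value so models can train."""
    return value if math.isfinite(value) else fill
```

Whether to fill such values or drop the affected rows is a design choice; filling keeps the row count stable, while dropping avoids inventing a rate for flows that had none.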

🤖 Machine Learning Models

| Model | Accuracy | F1-Score | ROC-AUC |
| --- | --- | --- | --- |
| Random Forest | 99.2% | 0.991 | 0.998 |
| XGBoost | 99.5% | 0.994 | 0.999 |
| LightGBM | 99.3% | 0.992 | 0.998 |
| Gradient Boosting | 98.8% | 0.987 | 0.996 |
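
For readers unfamiliar with the three metrics in the table, here is a minimal plain-Python sketch of how each is defined for a binary Normal vs Attack task. It is not the code in src/evaluate.py, which is not shown here; the function names are hypothetical.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(labels, scores):
    """ROC-AUC as the chance that a random attack outscores a random normal flow.

    labels: 1 for attack, 0 for normal. Ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

On imbalanced traffic, accuracy alone can look high even when many attacks are missed, which is why F1 and ROC-AUC are reported alongside it. The pairwise `roc_auc` above is quadratic in the number of flows and is meant for illustration only.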

🎨 Risk Mapping Algorithm

def map_to_risk(probability):
    """Map prediction probability to risk level"""
    if probability < 0.25:
        return "🟢 Low"
    elif probability < 0.60:
        return "🟡 Medium"
    else:
        return "🔴 High"
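
As a usage sketch, the mapping can be applied to a batch of predicted probabilities to get a count per risk level. The function is repeated so the example runs on its own; `summarize_risk` is a hypothetical helper and the probabilities are illustrative values, not model output.

```python
from collections import Counter


def map_to_risk(probability):
    """Same thresholds as the function above: 0.25 and 0.60."""
    if probability < 0.25:
        return "🟢 Low"
    elif probability < 0.60:
        return "🟡 Medium"
    return "🔴 High"


def summarize_risk(probabilities):
    """Count how many flows fall into each risk level."""
    return Counter(map_to_risk(p) for p in probabilities)


# Illustrative probabilities chosen to sit on and around the boundaries
summary = summarize_risk([0.05, 0.24, 0.25, 0.59, 0.60, 0.97])
for level, count in summary.items():
    print(level, count)
```

Because the comparisons are strict, a probability of exactly 0.25 is Medium and exactly 0.60 is High.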

📈 Results

Confusion Matrix

ROC Curve

SHAP Feature Importance

🖥️ Dashboard Features

🔍 Real-time Predictions: Upload network logs for instant analysis
📊 Risk Visualization: Interactive charts and gauges
⚡ Batch Processing: Analyze multiple records at once
🤖 Explainability: SHAP-based feature explanations
📥 Export Results: Download predictions as CSV
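
The export feature amounts to serializing one row per analyzed record. The sketch below shows one way to build that CSV text with the standard library. The column names and the `predictions_to_csv` helper are assumptions for illustration, not the dashboard's actual schema.

```python
import csv
import io


def predictions_to_csv(rows):
    """Serialize (record_id, probability, risk_level) rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Hypothetical header; adjust to the columns the dashboard really uses
    writer.writerow(["record_id", "attack_probability", "risk_level"])
    for record_id, probability, risk_level in rows:
        writer.writerow([record_id, f"{probability:.4f}", risk_level])
    return buffer.getvalue()
```

In a Streamlit app, text like this could be handed to `st.download_button` as its `data` argument to offer the file for download.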

📝 Project Report Sections

1. Introduction

Problem statement
Objectives
Scope

2. Literature Review

Related work
Existing solutions

3. Methodology

Data collection
Preprocessing
Feature engineering
Model selection

4. Implementation

System architecture
Algorithm details
Dashboard design

5. Results & Discussion

Performance metrics
Comparison analysis
Risk assessment

6. Conclusion

Key findings
Future work

🔧 Configuration

Edit the risk thresholds in src/risk_mapping.py. Each value is the upper bound of that level's probability range:

risk_thresholds = {
    'low': 0.25,
    'medium': 0.60,
    'high': 1.0
}
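
How src/risk_mapping.py consumes this dictionary is not shown, so the following is a sketch under the assumption that each value is the exclusive upper bound of its level. `risk_from_thresholds` is a hypothetical name.

```python
# Upper bounds per level, copied from the configuration above
risk_thresholds = {
    'low': 0.25,
    'medium': 0.60,
    'high': 1.0
}


def risk_from_thresholds(probability, thresholds=risk_thresholds):
    """Return the first level whose upper bound exceeds the probability."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability out of range: {probability}")
    # Check levels in ascending order of their upper bound
    for level, upper in sorted(thresholds.items(), key=lambda item: item[1]):
        if probability < upper:
            return level
    # The probability equals the largest bound (1.0 here): use the top level
    return max(thresholds, key=thresholds.get)
```

Driving the mapping from the dictionary means that changing a threshold in one place changes the behavior everywhere, instead of keeping the values in sync with hard-coded comparisons.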

📚 References

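Sharafaldin, I., Habibi Lashkari, A., & Ghorbani, A. A. (2018). "Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization." Proceedings of the 4th International Conference on Information Systems Security and Privacy (ICISSP).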
CICIDS 2017 Dataset Documentation
XGBoost Documentation
SHAP Library Documentation

👥 Contributors

Your Name - Developer

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

⭐ Star this repo if you find it helpful!

For questions or support, please open an issue or contact the maintainers.
