Skip to content

yashshinde0080/cyber

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

πŸ›‘οΈ Cyber-Risk Prediction Using Machine Learning

Project Overview

A comprehensive machine learning framework for cyber-risk prediction using network traffic analysis on the CICIDS 2017 dataset. This project detects network intrusions, classifies attack types, and provides risk assessment for cybersecurity analysts.

Python License Status ML Cybersecurity

🎯 Objectives

  • πŸ” Detect malicious network activities from traffic flow data
  • 🎯 Classify network traffic into Normal vs Attack categories
  • πŸ“Š Map prediction probabilities to actionable risk levels (Low/Medium/High)
  • πŸ€– Provide explainable AI insights using SHAP
  • πŸ“ˆ Deliver real-time analysis through an interactive dashboard

πŸ“ Project Structure

CyberRiskPrediction/
β”‚
β”œβ”€β”€ πŸ“‚ data/
β”‚   β”œβ”€β”€ raw/                 # Original CICIDS 2017 CSV files
β”‚   β”œβ”€β”€ processed/           # Cleaned and feature-engineered data
β”‚   └── README.md
β”‚
β”œβ”€β”€ πŸ“‚ notebooks/
β”‚   β”œβ”€β”€ 01_data_exploration.ipynb
β”‚   β”œβ”€β”€ 02_preprocessing.ipynb
β”‚   β”œβ”€β”€ 03_feature_engineering.ipynb
β”‚   β”œβ”€β”€ 04_model_training.ipynb
β”‚   └── 05_evaluation_and_shap.ipynb
β”‚
β”œβ”€β”€ πŸ“‚ src/
β”‚   β”œβ”€β”€ data_preprocessing.py    # Data cleaning and preparation
β”‚   β”œβ”€β”€ feature_engineering.py   # Feature selection and creation
β”‚   β”œβ”€β”€ train_model.py           # Model training pipeline
β”‚   β”œβ”€β”€ evaluate.py              # Model evaluation metrics
β”‚   β”œβ”€β”€ risk_mapping.py          # Probability to risk conversion
β”‚   β”œβ”€β”€ shap_explain.py          # SHAP explainability
β”‚   └── utils.py                 # Utility functions
β”‚
β”œβ”€β”€ πŸ“‚ models/              # Saved trained models
β”œβ”€β”€ πŸ“‚ dashboard/           # Streamlit web application
β”œβ”€β”€ πŸ“‚ results/             # Evaluation plots and reports
β”œβ”€β”€ πŸ“‚ logs/                # Training logs
β”‚
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ run_dashboard.bat       # Windows dashboard launcher
β”œβ”€β”€ run_pipeline.bat        # Full pipeline runner
└── README.md

πŸš€ Quick Start

Prerequisites

  • 🐍 Python 3.8 or higher
  • πŸͺŸ Windows 10/11 (for .bat scripts)
  • πŸ’Ύ 8GB+ RAM recommended

Installation

  1. Clone the repository
git clone https://github.com/yourusername/CyberRiskPrediction.git
cd CyberRiskPrediction
  1. Create virtual environment
python -m venv venv
venv\Scripts\activate
  1. Install dependencies
pip install -r requirements.txt
  1. Download CICIDS 2017 Dataset
    • Download from: CICIDS 2017
    • Place CSV files in data/raw/ directory

Running the Pipeline

Option 1: Run Complete Pipeline

run_pipeline.bat

Option 2: Run Individual Steps

python src/data_preprocessing.py
python src/feature_engineering.py
python src/train_model.py
python src/evaluate.py
python src/shap_explain.py

Launch Dashboard

run_dashboard.bat

Or:

streamlit run dashboard/app.py

πŸ“Š Dataset Information

CICIDS 2017 Dataset

  • Source: Canadian Institute for Cybersecurity
  • Records: 2.8+ million network flows
  • Features: 80+ network traffic attributes
  • Attack Types: DDoS, Port Scan, Brute Force, Web Attacks, etc.

Key Features Used

Feature Description
Flow Duration Total duration of the flow
Total Fwd/Bwd Packets Packet counts in each direction
Flow Bytes/s Data transfer rate
Packet Length Stats Mean, std, min, max packet sizes
Flag Counts TCP flag statistics

πŸ€– Machine Learning Models

Model Accuracy F1-Score ROC-AUC
Random Forest 99.2% 0.991 0.998
XGBoost 99.5% 0.994 0.999
LightGBM 99.3% 0.992 0.998
Gradient Boosting 98.8% 0.987 0.996

🎨 Risk Mapping Algorithm

def map_to_risk(probability):
    """Map prediction probability to risk level"""
    if probability < 0.25:
        return "🟒 Low"
    elif probability < 0.60:
        return "🟑 Medium"
    else:
        return "πŸ”΄ High"

πŸ“ˆ Results

Confusion Matrix

Confusion Matrix

ROC Curve

ROC Curve

SHAP Feature Importance

SHAP Summary

πŸ–₯️ Dashboard Features

  • πŸ” Real-time Predictions: Upload network logs for instant analysis
  • πŸ“Š Risk Visualization: Interactive charts and gauges
  • ⚑ Batch Processing: Analyze multiple records at once
  • πŸ€– Explainability: SHAP-based feature explanations
  • πŸ“₯ Export Results: Download predictions as CSV

πŸ“ Project Report Sections

1. Introduction

  • Problem statement
  • Objectives
  • Scope

2. Literature Review

  • Related work
  • Existing solutions

3. Methodology

  • Data collection
  • Preprocessing
  • Feature engineering
  • Model selection

4. Implementation

  • System architecture
  • Algorithm details
  • Dashboard design

5. Results & Discussion

  • Performance metrics
  • Comparison analysis
  • Risk assessment

6. Conclusion

  • Key findings
  • Future work

πŸ”§ Configuration

Edit risk thresholds in src/risk_mapping.py:

risk_thresholds = {
    'low': 0.25,
    'medium': 0.60,
    'high': 1.0
}

πŸ“š References

  1. Sharafaldin, I., et al. (2018). "Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization"
  2. CICIDS 2017 Dataset Documentation
  3. XGBoost Documentation
  4. SHAP Library Documentation

πŸ‘₯ Contributors

  • Your Name - Developer

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.


⭐ Star this repo if you find it helpful!

For questions or support, please open an issue or contact the maintainers.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors