A comprehensive machine learning framework for cyber-risk prediction using network traffic analysis on the CICIDS 2017 dataset. This project detects network intrusions, classifies attack types, and provides risk assessment for cybersecurity analysts.
- Detect malicious network activities from traffic flow data
- Classify network traffic into Normal vs Attack categories
- Map prediction probabilities to actionable risk levels (Low/Medium/High)
- Provide explainable AI insights using SHAP
- Deliver real-time analysis through an interactive dashboard

    CyberRiskPrediction/
    │
    ├── data/
    │   ├── raw/                      # Original CICIDS 2017 CSV files
    │   ├── processed/                # Cleaned and feature-engineered data
    │   └── README.md
    │
    ├── notebooks/
    │   ├── 01_data_exploration.ipynb
    │   ├── 02_preprocessing.ipynb
    │   ├── 03_feature_engineering.ipynb
    │   ├── 04_model_training.ipynb
    │   └── 05_evaluation_and_shap.ipynb
    │
    ├── src/
    │   ├── data_preprocessing.py     # Data cleaning and preparation
    │   ├── feature_engineering.py    # Feature selection and creation
    │   ├── train_model.py            # Model training pipeline
    │   ├── evaluate.py               # Model evaluation metrics
    │   ├── risk_mapping.py           # Probability to risk conversion
    │   ├── shap_explain.py           # SHAP explainability
    │   └── utils.py                  # Utility functions
    │
    ├── models/                       # Saved trained models
    ├── dashboard/                    # Streamlit web application
    ├── results/                      # Evaluation plots and reports
    ├── logs/                         # Training logs
    │
    ├── requirements.txt
    ├── run_dashboard.bat             # Windows dashboard launcher
    ├── run_pipeline.bat              # Full pipeline runner
    └── README.md

- Python 3.8 or higher
- Windows 10/11 (for .bat scripts)
- 8GB+ RAM recommended
- Clone the repository

      git clone https://github.com/yourusername/CyberRiskPrediction.git
      cd CyberRiskPrediction

- Create virtual environment

      python -m venv venv
      venv\Scripts\activate

- Install dependencies

      pip install -r requirements.txt

- Download CICIDS 2017 Dataset
  - Download from: CICIDS 2017
  - Place CSV files in the data/raw/ directory
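
Before running the pipeline, it can help to confirm that the dataset files landed where the scripts expect them. The following is a minimal sketch, assuming the raw files keep a `.csv` extension and sit directly under `data/raw/`; the helper name `list_raw_csvs` is hypothetical and not part of this repository:

```python
from pathlib import Path


def list_raw_csvs(raw_dir="data/raw"):
    """Return the sorted CSV files found directly under raw_dir."""
    root = Path(raw_dir)
    if not root.is_dir():
        # A missing directory is treated the same as an empty one.
        return []
    # Match the extension case-insensitively (.csv or .CSV).
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() == ".csv"
    )


if __name__ == "__main__":
    files = list_raw_csvs()
    if not files:
        print("No CSV files found in data/raw/; download the dataset first.")
    for path in files:
        size_mb = path.stat().st_size / 1_000_000
        print(f"{path.name}: {size_mb:.1f} MB")
```

An empty result usually means the files were extracted into a nested subfolder rather than into `data/raw/` itself.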
Option 1: Run Complete Pipeline

    run_pipeline.bat

Option 2: Run Individual Steps

    python src/data_preprocessing.py
    python src/feature_engineering.py
    python src/train_model.py
    python src/evaluate.py
    python src/shap_explain.py

To launch the dashboard:

    run_dashboard.bat

Or:

    streamlit run dashboard/app.py

- Source: Canadian Institute for Cybersecurity
- Records: 2.8+ million network flows
- Features: 80+ network traffic attributes
- Attack Types: DDoS, Port Scan, Brute Force, Web Attacks, etc.
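
The dataset labels each flow with a specific attack name, while the classifier here separates Normal from Attack, so the labels have to be collapsed at some point. A rough sketch of that step, assuming the benign class is labelled `BENIGN`; the function name `to_binary_label` and the sample labels are illustrative and may not match what `src/data_preprocessing.py` actually does:

```python
from collections import Counter


def to_binary_label(label, benign_name="BENIGN"):
    """Return 0 for benign traffic and 1 for any attack label."""
    # Strip whitespace and ignore case so ' benign ' still counts as benign.
    return 0 if label.strip().upper() == benign_name else 1


if __name__ == "__main__":
    # Illustrative labels only, not read from the dataset.
    sample = ["BENIGN", "DDoS", "PortScan", " BENIGN", "FTP-Patator"]
    binary = [to_binary_label(x) for x in sample]
    print(binary)
    print(Counter(binary))
```

Keeping the original attack name in a separate column alongside the binary label leaves the door open to multi-class experiments later.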
| Feature | Description |
|---|---|
| Flow Duration | Total duration of the flow |
| Total Fwd/Bwd Packets | Packet counts in each direction |
| Flow Bytes/s | Data transfer rate |
| Packet Length Stats | Mean, std, min, max packet sizes |
| Flag Counts | TCP flag statistics |
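
Rate features such as Flow Bytes/s are ratios, so a zero-duration flow can produce infinite or undefined values that most models cannot ingest. Here is a minimal sketch of one way to guard against that, assuming durations are recorded in microseconds and that replacing non-finite rates with 0.0 is acceptable. Both are assumptions, and the function names are hypothetical; the project's own preprocessing may handle this differently:

```python
import math


def flow_bytes_per_second(total_bytes, duration_us, fallback=0.0):
    """Compute bytes/second from a byte count and a duration in microseconds."""
    if duration_us <= 0:
        # Zero or negative durations would divide by zero; use the fallback.
        return fallback
    return total_bytes / (duration_us / 1_000_000)


def clean_rate(value, fallback=0.0):
    """Parse a raw rate field, replacing NaN, Infinity, or unparseable text."""
    try:
        rate = float(value)
    except (TypeError, ValueError):
        return fallback
    return rate if math.isfinite(rate) else fallback


if __name__ == "__main__":
    print(flow_bytes_per_second(1000, 2_000_000))
    for raw in ["12.5", "Infinity", "NaN", ""]:
        print(repr(raw), "->", clean_rate(raw))
```

Whether 0.0, the column median, or dropping the row is the better fallback depends on how many flows are affected, so it is worth counting them before choosing.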
| Model | Accuracy | F1-Score | ROC-AUC |
|---|---|---|---|
| Random Forest | 99.2% | 0.991 | 0.998 |
| XGBoost | 99.5% | 0.994 | 0.999 |
| LightGBM | 99.3% | 0.992 | 0.998 |
| Gradient Boosting | 98.8% | 0.987 | 0.996 |
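
For readers comparing the columns above, accuracy and F1 can both be derived from confusion-matrix counts. The sketch below uses made-up counts purely for illustration; they are not this project's results, and the function names are not taken from `src/evaluate.py`:

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that were correct."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the attack class."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts: tp/fn are attacks, tn/fp are normal flows.
    tp, fp, tn, fn = 90, 5, 100, 5
    print(f"accuracy = {accuracy(tp, fp, tn, fn):.3f}")
    print(f"f1       = {f1_score(tp, fp, fn):.3f}")
```

Because F1 ignores true negatives, it tends to be the more informative of the two when normal traffic heavily outnumbers attacks.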

    def map_to_risk(probability):
        """Map prediction probability to risk level."""
        if probability < 0.25:
            return "🟢 Low"
        elif probability < 0.60:
            return "🟡 Medium"
        else:
            return "🔴 High"

- Real-time Predictions: Upload network logs for instant analysis
- Risk Visualization: Interactive charts and gauges
- Batch Processing: Analyze multiple records at once
- Explainability: SHAP-based feature explanations
- Export Results: Download predictions as CSV
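
The batch-processing and export features could be approximated outside the dashboard as follows. This is a sketch only: it reuses the 0.25 and 0.60 cut-offs from the risk mapping, takes probabilities as a plain list rather than from a trained model, and the names `risk_level` and `export_risks` and the CSV column headers are hypothetical:

```python
import csv
import io


def risk_level(probability, low=0.25, medium=0.60):
    """Map an attack probability in [0, 1] to Low, Medium, or High."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability out of range: {probability}")
    if probability < low:
        return "Low"
    if probability < medium:
        return "Medium"
    return "High"


def export_risks(probabilities, stream):
    """Write one CSV row per record: index, probability, risk level."""
    writer = csv.writer(stream)
    writer.writerow(["record", "attack_probability", "risk_level"])
    for i, p in enumerate(probabilities):
        writer.writerow([i, f"{p:.3f}", risk_level(p)])


if __name__ == "__main__":
    buffer = io.StringIO()
    export_risks([0.05, 0.40, 0.93], buffer)
    print(buffer.getvalue())
```

Passing the cut-offs as parameters keeps the mapping in step with whatever thresholds are configured, and rejecting out-of-range values surfaces upstream bugs instead of silently labelling them High.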
- Problem statement
- Objectives
- Scope
- Related work
- Existing solutions
- Data collection
- Preprocessing
- Feature engineering
- Model selection
- System architecture
- Algorithm details
- Dashboard design
- Performance metrics
- Comparison analysis
- Risk assessment
- Key findings
- Future work
Edit risk thresholds in src/risk_mapping.py:

    risk_thresholds = {
        'low': 0.25,
        'medium': 0.60,
        'high': 1.0
    }

- Sharafaldin, I., et al. (2018). "Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization"
- CICIDS 2017 Dataset Documentation
- XGBoost Documentation
- SHAP Library Documentation
- Your Name - Developer
This project is licensed under the MIT License - see the LICENSE file for details.
⭐ Star this repo if you find it helpful!
For questions or support, please open an issue or contact the maintainers.