A Hybrid CNN-Transformer Approach for Accurate Surface Defect Detection
This repository implements a hybrid CNN-Transformer architecture designed for surface defect segmentation on industrial images. It combines the local feature extraction power of CNNs with the global dependency modeling of Transformers, based on a modified TransUNet framework.
Key enhancements include:
- A Mean Filter Module in the encoder to denoise input images and preserve local details.
- An Attention Gate Module in the decoder to enhance positional and spatial precision.
- Evaluated on the Crack500 dataset, demonstrating superior segmentation accuracy, F1-score, and IoU.
```
Defect_Segmentation/
│
├── configs/
│   └── configs.py             # Configuration and training hyperparameters
│
├── data/
│   ├── traincrop/             # Training images and masks
│   ├── valcrop/               # Validation images and masks
│   └── testcrop/              # Test images and masks
│
├── docs/
│   ├── crack500_samples.jpg   # Sample images from the Crack500 dataset
│   ├── crack500_results.jpg   # Example segmentation results
│   └── modified_transunet.jpg # Proposed architecture illustration
│
├── networks/
│   ├── vit_seg_modeling.py    # Vision Transformer backbone for segmentation
│   └── vit_seg_configs.py     # ViT architecture configuration
│
├── utils/
│   ├── dataset.py             # Crack500Dataset class
│   ├── evaluate.py            # Evaluation metrics (IoU, F1, Precision, Recall)
│   ├── losses.py              # FocalLoss implementation
│   └── utils.py               # Utility functions (GPU usage, result saving)
│
├── results/                   # Saved metrics, checkpoints, and plots
├── main.py                    # Training & evaluation entry point
└── requirements.txt           # Project dependencies
```
- Data Loading & Normalization using PyTorch Dataset and DataLoader.
- Data Augmentation through random flips, rotations, and normalization.
- Model Initialization with Vision Transformer backbone and hybrid CNN layers.
- Training with Focal Loss and the AdamW optimizer under mixed precision (`torch.cuda.amp`).
- Validation & Early Stopping based on the lowest validation loss.
- Evaluation on the test set with accuracy, precision, recall, F1-score, and IoU metrics.
- Result Visualization: outputs, metrics, and segmentation maps are saved in `results/`.
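The training recipe above can be sketched as a minimal loop combining AdamW, mixed precision, and early stopping on validation loss. This is an illustrative sketch, not the repository's actual `main.py`; the checkpoint path and loop structure are assumptions.

```python
import os
import torch
from torch import nn

def train(model, train_loader, val_loader, criterion, epochs=100, patience=10,
          lr=5e-5, device="cuda" if torch.cuda.is_available() else "cpu"):
    """Sketch of the training loop: AdamW + torch.cuda.amp + early stopping."""
    model = model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    use_amp = (device == "cuda")  # mixed precision only on GPU
    scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
    best_val, wait = float("inf"), 0
    for epoch in range(epochs):
        model.train()
        for images, masks in train_loader:
            images, masks = images.to(device), masks.to(device)
            optimizer.zero_grad()
            with torch.cuda.amp.autocast(enabled=use_amp):
                loss = criterion(model(images), masks)
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()
        # validation: average loss over the validation loader
        model.eval()
        with torch.no_grad():
            val_loss = sum(criterion(model(x.to(device)), y.to(device)).item()
                           for x, y in val_loader) / max(len(val_loader), 1)
        if val_loss < best_val:
            best_val, wait = val_loss, 0
            os.makedirs("results", exist_ok=True)
            torch.save(model.state_dict(), "results/best_model.pt")  # assumed path
        else:
            wait += 1
            if wait >= patience:  # early stopping
                break
    return best_val
```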
- Pavement crack images (2000×1500 px) with pixel-level annotations.
- Split: 250 train, 50 validation, 200 test images.
- After cropping: 1896 train / 348 val / 1124 test samples.
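The cropping step can be illustrated with a simple non-overlapping tiler. The repository's exact cropping scheme (overlap, padding, empty-crop filtering) is not documented here, so treat this as a sketch only:

```python
import numpy as np

def crop_image(img, size=256):
    """Split an image into non-overlapping size x size tiles.

    Partial tiles at the right/bottom edges are dropped; the actual
    pipeline may instead use overlap or padding to reach its crop counts.
    """
    h, w = img.shape[:2]
    return [img[r:r + size, c:c + size]
            for r in range(0, h - size + 1, size)
            for c in range(0, w - size + 1, size)]
```

For a 2000×1500 image this yields 7×5 = 35 tiles of 256×256 pixels.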
Modified TransUNet Framework
- Encoder: ResNet backbone + Transformer blocks
- Decoder: CNN layers + Attention Gate Module
- Mean Filter Module: Reduces input noise via multi-scale averaging
- Attention Gate: Enhances feature localization for precise segmentation
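A minimal PyTorch sketch of the two added modules. The kernel sizes, channel layout, and fusion convolution are illustrative assumptions; the paper's exact designs may differ. The attention gate follows the common additive (Attention U-Net style) formulation and assumes the skip and gating features share spatial size:

```python
import torch
from torch import nn
import torch.nn.functional as F

class MeanFilterModule(nn.Module):
    """Multi-scale mean filtering: smooth the input with average filters of
    several kernel sizes, then fuse all scales with a 1x1 convolution."""
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):  # assumed scales
        super().__init__()
        self.kernel_sizes = kernel_sizes
        self.fuse = nn.Conv2d(channels * (len(kernel_sizes) + 1), channels, 1)

    def forward(self, x):
        # keep the raw input alongside its smoothed versions
        smoothed = [x] + [F.avg_pool2d(x, k, stride=1, padding=k // 2)
                          for k in self.kernel_sizes]
        return self.fuse(torch.cat(smoothed, dim=1))

class AttentionGate(nn.Module):
    """Additive attention gate: a decoder gating signal g reweights the
    encoder skip features x before they are concatenated in the decoder."""
    def __init__(self, in_ch, gate_ch, inter_ch):
        super().__init__()
        self.theta = nn.Conv2d(in_ch, inter_ch, 1)   # project skip features
        self.phi = nn.Conv2d(gate_ch, inter_ch, 1)   # project gating signal
        self.psi = nn.Conv2d(inter_ch, 1, 1)         # attention coefficients

    def forward(self, x, g):
        a = torch.sigmoid(self.psi(F.relu(self.theta(x) + self.phi(g))))
        return x * a  # spatially reweighted skip features
```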
Results on the Crack500 dataset: (a) Image, (b) Ground Truth, (c) DeepLabV3, (d) DeepLabV3+, (e) FPN, (f) MANet, (g) PAN, (h) PSPNet, (i) U-Net, (j) U-Net++, (k) UperNet, (l) TransUNet, (m) SegFormer, and (n) Our Proposed Method.
| Parameter | Value |
|---|---|
| Batch Size | 8 |
| Learning Rate | 5e-5 |
| Epochs | 100 (configurable) |
| Optimizer | AdamW |
| Early Stopping Patience | 10 |
| Loss Function | Focal Loss (α = 0.5, γ = 2.0) |
| Input Size | 256×256 |
| Framework | PyTorch 2.0+ |
| GPU | RTX 3090 |
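The Focal Loss in the table can be sketched for the binary-mask case as follows. The repository's `utils/losses.py` is the authoritative implementation and may differ in details such as reduction or input format; this version expects raw logits and {0, 1} masks:

```python
import torch
from torch import nn
import torch.nn.functional as F

class FocalLoss(nn.Module):
    """Binary focal loss sketch with alpha = 0.5, gamma = 2.0 (as configured)."""
    def __init__(self, alpha=0.5, gamma=2.0):
        super().__init__()
        self.alpha, self.gamma = alpha, gamma

    def forward(self, logits, targets):
        # per-element BCE, then down-weight easy examples by (1 - p_t)^gamma
        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        p_t = torch.exp(-bce)  # model probability of the true class
        alpha_t = self.alpha * targets + (1 - self.alpha) * (1 - targets)
        return (alpha_t * (1 - p_t) ** self.gamma * bce).mean()
```

With γ = 0 this reduces to α-weighted binary cross-entropy, which is a quick sanity check on the implementation.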
| Metric | Description |
|---|---|
| Accuracy | Correct predictions ratio |
| Precision | True positives over predicted positives |
| Recall | True positives over actual positives |
| F1-Score | Harmonic mean of Precision & Recall |
| IoU | Intersection-over-Union for defect regions |
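A minimal sketch of computing these metrics from binary prediction and ground-truth masks (the repository's `utils/evaluate.py` is the authoritative implementation):

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Compute accuracy, precision, recall, F1, and IoU from binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()     # defect pixels found
    fp = np.logical_and(pred, ~gt).sum()    # false alarms
    fn = np.logical_and(~pred, gt).sum()    # missed defect pixels
    tn = np.logical_and(~pred, ~gt).sum()   # correct background
    eps = 1e-8  # avoid division by zero on empty masks
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn + eps),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall + eps),
        "iou": tp / (tp + fp + fn + eps),
    }
```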
Results are exported to:
```
results/
├── training_metrics.xlsx
├── per_image_metrics.xlsx
└── evaluation_metrics.xlsx
```
| Dataset | Accuracy (%) | F1-Score (%) | IoU (%) |
|---|---|---|---|
| Crack500 | 96.72 | 66.95 | 52.18 |
✅ Outperforms classical CNN-based models (U-Net, PSPNet) and Transformer-based models (SegFormer, TransUNet).
✅ Strong balance between local detail preservation and global feature learning.
```
git clone https://github.com/rasoulameri/Defect_Segmentation.git
cd Defect_Segmentation
pip install -r requirements.txt
python main.py
```

torch >= 2.0
torchvision
numpy
pandas
tqdm
matplotlib
ml_collections
opencv-python
scikit-learn
thop
If you use this code in your research, please cite:
R. Ameri, C.-C. Hsu, and S. S. Band,
A Hybrid CNN-Transformer Approach for Accurate Surface Defect Detection,
TAAI 2024, Taiwan, 2024.
```
@inproceedings{ameri2024hybrid,
  title={A Hybrid CNN-Transformer Approach for Accurate Surface Defect Detection},
  author={Ameri, R. and Hsu, C.-C. and Band, S. S.},
  booktitle={TAAI 2024},
  year={2024},
  address={Taiwan}
}
```

Rasoul Ameri
📧 rasoulameri90@gmail.com
🔗 GitHub Profile


