This repository explores methods for unsupervised anomaly detection on the MVTec AD dataset, focusing on the 'carpet' class. The project demonstrates a progression from simple autoencoders to advanced feature-based and transformer-based approaches, with clear visualizations and benchmarking.
We use the MVTec Anomaly Detection (MVTec AD) dataset, a real-world benchmark for unsupervised anomaly detection. This project focuses on the 'carpet' class, which contains high-resolution images of carpets with various types of defects (color, cut, hole, metal contamination, thread) and normal samples.
Some sample images from the notebooks (the notebooks contain many more visualizations):
In test/ we have the defective images, and in ground_truth/ the corresponding segmentation masks. By mistake, this plot renders the single-channel grayscale masks with the viridis colormap instead of in grayscale; the masks themselves are still correct.

Reconstruction loss heatmap for autoencoder approach

2D t-SNE plot of ResNet50 features

ResNet50 + autoencoder reconstruction loss heatmap sample

Anomaly score distribution for PatchCore

Anomaly score distribution for ViT + KNN

- Notebook: train_autoencoder.ipynb
- Summary: Trains a simple convolutional autoencoder directly on image pixels. The model learns to reconstruct normal images; reconstruction error (L2 loss) is used as the anomaly score. Reconstruction loss heatmaps are used to visualize where the model detects anomalies.
- Result: AUC ≈ 0.44 (worse than random). Demonstrates the limitations of pixel-space autoencoders for complex textures.
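The reconstruction-error heatmap and image-level score described above can be sketched as follows. This is a minimal illustration with NumPy, not the notebook's code: the `reconstruction` array is a hypothetical stand-in for the autoencoder output, and the function names are made up for this example.

```python
import numpy as np

def reconstruction_heatmap(image, reconstruction):
    """Per-pixel squared error, averaged over channels.

    Both inputs are float arrays of shape (C, H, W) with values in [0, 1].
    """
    return ((image - reconstruction) ** 2).mean(axis=0)

def image_score(heatmap):
    """Image-level anomaly score: mean squared error over all pixels."""
    return float(heatmap.mean())

# Toy example: a flat gray "carpet" with a small bright defect.
image = np.full((3, 8, 8), 0.5)
image[:, 2:4, 5:7] = 1.0

# Hypothetical stand-in for the autoencoder output: it reproduces the
# normal texture but fails to reproduce the defect.
reconstruction = np.full((3, 8, 8), 0.5)

heatmap = reconstruction_heatmap(image, reconstruction)
score = image_score(heatmap)
print(heatmap.shape)  # (8, 8)
print(score)
```

The heatmap is only informative if the model fails to reconstruct the defect. If the model reconstructs defects as faithfully as normal texture, the error is close to zero everywhere and the score cannot separate the two classes.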
- Notebook: resnet_knn.ipynb
- Summary: Uses a pre-trained ResNet50 as a feature extractor. Features from normal ("good") training images are stored in a memory bank. For a test image, features are extracted and compared to the memory bank using K-nearest neighbors (KNN); the mean distance to the k closest features is the anomaly score. Thresholding (mean + 2*std) is used to classify anomalies. t-SNE plots are used to visualize feature separability.
- Result: AUC ≈ 0.74. Shows the power of deep features and simple non-parametric scoring.
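A rough sketch of the memory-bank scoring and the mean + 2*std threshold, assuming the features are already extracted. The random vectors below are hypothetical stand-ins for pooled ResNet50 features, and calibrating the threshold on held-out good samples is an assumption of this example; the notebook may compute it on a different set.

```python
import numpy as np

def knn_scores(queries, memory_bank, k=3):
    """Mean Euclidean distance from each query to its k nearest bank entries."""
    # Pairwise distances, shape (num_queries, num_bank).
    diffs = queries[:, None, :] - memory_bank[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Keep the k smallest distances per query and average them.
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)

rng = np.random.default_rng(0)
# Hypothetical stand-ins for image-level feature vectors.
memory_bank = rng.normal(0.0, 1.0, size=(200, 16))   # "good" training images
held_out_good = rng.normal(0.0, 1.0, size=(50, 16))  # good images not in the bank
test_good = rng.normal(0.0, 1.0, size=(5, 16))
test_defect = rng.normal(4.0, 1.0, size=(5, 16))     # shifted distribution

# Assumption: calibrate the threshold on good samples outside the bank.
calib = knn_scores(held_out_good, memory_bank)
threshold = calib.mean() + 2 * calib.std()

good_scores = knn_scores(test_good, memory_bank)
defect_scores = knn_scores(test_defect, memory_bank)
is_anomaly = defect_scores > threshold
print(threshold, good_scores.max(), defect_scores.min())
```

The brute-force distance matrix is fine for a few hundred vectors. For larger banks, a dedicated nearest-neighbor index would be the usual choice.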
- Notebook: resnet_backbone.ipynb
- Uses the ResNet50 model as a feature extractor, then trains an autoencoder to reconstruct the extracted features.
- This approach gave an AUC of 0.99, but the autoencoder adds training time.
- This method is also from a paper.
- Notebook: patch_core.ipynb
- A simplified implementation of PatchCore, as described in the PatchCore paper (see the references).
- It is similar to the second method (ResNet50 + KNN), except that the memory bank stores patch-level features instead of one feature vector per image.
- It gave an AUC of 0.98 without the training overhead of an autoencoder, since it only uses a pretrained ResNet50.
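To illustrate the difference between a patch-level and an image-level memory bank, here is a minimal sketch with NumPy. It assumes feature maps of shape (C, H, W) are already available (random arrays stand in for intermediate ResNet50 features) and leaves out parts of the published method such as coreset subsampling and neighborhood aggregation, so it should not be read as the notebook's implementation.

```python
import numpy as np

def to_patches(feature_map):
    """Flatten a (C, H, W) feature map into H*W patch vectors of length C."""
    c, h, w = feature_map.shape
    return feature_map.reshape(c, h * w).T

def patch_anomaly_map(feature_map, patch_bank):
    """Distance from every spatial location to its nearest patch in the bank."""
    patches = to_patches(feature_map)
    diffs = patches[:, None, :] - patch_bank[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    nearest = dists.min(axis=1)
    return nearest.reshape(feature_map.shape[1:])

rng = np.random.default_rng(0)
# Hypothetical stand-ins for feature maps of 10 good training images.
train_maps = rng.normal(0.0, 1.0, size=(10, 8, 6, 6))
patch_bank = np.concatenate([to_patches(m) for m in train_maps])  # (360, 8)

test_map = rng.normal(0.0, 1.0, size=(8, 6, 6))
test_map[:, 1, 4] += 6.0  # inject a localized "defect" at row 1, column 4

anomaly_map = patch_anomaly_map(test_map, patch_bank)
image_score = anomaly_map.max()  # image-level score: the worst patch
row, col = np.unravel_index(anomaly_map.argmax(), anomaly_map.shape)
print(image_score, (row, col))
```

Because every spatial location is scored separately, the same distances give both an image-level score and a coarse localization map.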
- Notebook: vit_knn.ipynb
- Uses a pre-trained Vision Transformer (ViT) model to extract features.
- Uses only the [CLS] token embedding, to reduce computation time.
- Builds a memory bank of features from the training set.
- Uses the k-nearest neighbors (KNN) algorithm to compute the anomaly score.
- Achieved an AUC of 0.95 using only ViT and KNN: no patch embeddings and no autoencoder. Everything ran on CPU because I ran out of free GPU credits, yet it performed well.
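For reading the AUC numbers quoted for each approach, the sketch below computes a ROC AUC directly from image-level anomaly scores in plain Python. It assumes that "AUC" refers to the area under the ROC curve and uses the pairwise-ranking definition; `roc_auc` is a name chosen for this example, not a function from the notebooks.

```python
def roc_auc(scores, labels):
    """Probability that a random anomalous sample outscores a random normal one.

    labels: 1 for anomalous, 0 for normal. Ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [0, 0, 0, 1, 1, 1]
print(roc_auc([0.1, 0.2, 0.3, 0.7, 0.8, 0.9], labels))  # 1.0
print(roc_auc([0.7, 0.8, 0.9, 0.1, 0.2, 0.3], labels))  # 0.0
print(roc_auc([0.5] * 6, labels))                        # 0.5
```

Under this definition, 0.5 corresponds to scores that carry no ranking information, and values below 0.5 mean the scores tend to rank good images above defective ones.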
- Notebook: skipae.ipynb
- First I built an autoencoder with symmetrical skip connections to reconstruct the image.
- It was trained on only the good images, then used to reconstruct test images.
- The reconstruction is very good, good enough that even defective images are reconstructed well, so the reconstruction error barely separates them from good ones. This gives the lowest AUC at 0.41, worse than the plain autoencoder.
- I also tried an SSIM loss instead of MSE, which significantly improved this model to 0.56 AUC with the same model and the same number of epochs; only the loss changed. This is still poor performance, as expected of pixel-space autoencoders.
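To give some intuition for why swapping MSE for SSIM can change the result, the sketch below compares the two on a toy image. It uses a simplified SSIM computed over the whole image as a single window; standard SSIM implementations average over local sliding windows, so this is an illustrative assumption rather than the loss used in the notebook.

```python
import numpy as np

def mse(x, y):
    """Mean squared error between two arrays of the same shape."""
    return float(((x - y) ** 2).mean())

def global_ssim(x, y, data_range=1.0):
    """Simplified SSIM: the whole image is treated as one window."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)

rng = np.random.default_rng(0)
texture = rng.uniform(0.2, 0.8, size=(32, 32))  # toy grayscale "carpet"

# Two distortions with (almost exactly) the same MSE:
brighter = texture + texture.std()               # brightness shift, structure intact
flattened = np.full_like(texture, texture.mean())  # structure wiped out

print(mse(texture, brighter), mse(texture, flattened))
print(global_ssim(texture, brighter), global_ssim(texture, flattened))
```

Both distortions have the same MSE, but the SSIM of the flattened image is far lower because its structure term collapses. A loss of the form `1 - SSIM` therefore penalizes lost texture much more than a uniform brightness change.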
Plans:
I am also looking to expand from carpets to other classes, and to deploy a model for a small demo.
I also want to look into a feature extractor backbone + VAE instead. It might perform better for more general use cases, where images can come from different angles, offsets, etc.
I will also look into more VAE + FSL possibilities.
- EDA & Dataset Download: Exploratory data analysis, dataset structure, and citation/license details.
- Make Dataset: PyTorch dataset and dataloader creation.
- Autoencoder Baseline
- ResNet50 + KNN
- ResNet50 + Autoencoder
- PatchCore
- ViT + KNN
- Deep Learning: PyTorch, torchvision, transformers
- Computer Vision: Feature extraction, autoencoders, memory banks, anomaly detection
- Visualization: Matplotlib, seaborn, t-SNE, heatmaps
- Experimentation: Jupyter Notebooks, Colab, implementing papers, and building on them with my own ideas, such as using a Vision Transformer backbone instead of ResNet50.
- Reproducibility: Dataset download, preprocessing, and clear notebook structure
- MVTec AD Dataset: Paul Bergmann, Michael Fauser, David Sattlegger, and Carsten Steger, "A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection", CVPR 2019. [project page]
- PatchCore: Karsten Roth, Latha Pemula, Joaquin Zepeda, Bernhard Schölkopf, Thomas Brox, and Peter Gehler, "Towards Total Recall in Industrial Anomaly Detection", CVPR 2022. [arXiv]
- Towards Total Recall in Industrial Anomaly Detection (PatchCore)
- Expand to other MVTec AD classes beyond carpets.
- Deploy a demo for real-time anomaly detection.
- Explore feature extractor backbones with variational autoencoders (VAE) and few-shot learning (FSL) approaches.
