AnomalyVFM - Transforming Vision Foundation Models into Zero-Shot Anomaly Detectors

1 University of Ljubljana, Faculty of Computer and Information Science, 2 *codeplain
CVPR 2026

Indicates Equal Contribution

Transforming VFMs into Zero-Shot Anomaly Detectors

TL;DR: AnomalyVFM is a framework for transforming any transformer-based VFM into a strong zero-shot anomaly detector. It does so by leveraging synthetic images and parameter-efficient finetuning, and achieves SOTA results across 9 industrial inspection datasets.

Abstract

Zero-shot anomaly detection aims to detect and localise abnormal regions in an image without access to any in-domain training images. While recent approaches leverage vision–language models (VLMs), such as CLIP, to transfer high-level concept knowledge, methods based on purely vision foundation models (VFMs), like DINOv2, have lagged behind in performance. We argue that this gap stems from two practical issues: (i) limited diversity in existing auxiliary anomaly detection datasets and (ii) overly shallow VFM adaptation strategies. To address both challenges, we propose AnomalyVFM, a general and effective framework that turns any pretrained VFM into a strong zero-shot anomaly detector. Our approach combines a robust three-stage synthetic dataset generation scheme with a parameter-efficient adaptation mechanism, utilising low-rank feature adapters and a confidence-weighted pixel loss. Together, these components enable modern VFMs to substantially outperform current state-of-the-art methods. More specifically, with RADIO as a backbone, AnomalyVFM achieves an average image-level AUROC of 94.1% across 9 diverse datasets, surpassing previous methods by a significant 3.3 percentage points.

Contributions

  • Synthetic Dataset Generation A new modular synthetic dataset generation scheme exploiting pretrained image generation models, such as FLUX. Upon acceptance we will also release a large dataset of synthetic images to enable faster development of future methods.
  • VFM Adaptation Framework A new framework (AnomalyVFM) that adapts pretrained transformer VFMs, leveraging the synthetic dataset and parameter-efficient finetuning for zero-shot anomaly detection.

Synthetic Dataset Generation


Anomaly-free images are created using an image generation model and then modified via inpainting to produce anomalous versions within a targeted region. Corresponding masks are generated by comparing feature-level differences between the normal and anomalous images, which also serves to filter out samples where the defect failed to generate.
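The mask-generation step above can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the paper's exact procedure: the cosine-distance measure, the thresholds, and the filtering rule for failed defects are all assumed here for concreteness.

```python
import numpy as np

def anomaly_mask_from_features(feat_normal, feat_anom, mask_thresh=0.2, keep_thresh=0.3):
    """Illustrative sketch: derive a pixel mask from per-patch feature
    differences between a normal image and its inpainted anomalous version,
    and flag samples where the defect likely failed to generate.

    feat_normal, feat_anom: (H, W, C) patch features from a frozen VFM.
    Thresholds are hypothetical, not the paper's values.
    """
    # Per-patch cosine distance: 1 - cosine similarity, shape (H, W)
    a = feat_normal / (np.linalg.norm(feat_normal, axis=-1, keepdims=True) + 1e-8)
    b = feat_anom / (np.linalg.norm(feat_anom, axis=-1, keepdims=True) + 1e-8)
    dist = 1.0 - (a * b).sum(axis=-1)
    mask = (dist > mask_thresh).astype(np.uint8)  # pseudo ground-truth anomaly mask
    keep = bool(dist.max() > keep_thresh)         # discard samples with no visible defect
    return mask, keep
```

Thresholding the feature distance both localises the inpainted defect and, via the `keep` flag, filters out generations where inpainting left the image effectively unchanged.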

Examples of Generated Images

AnomalyVFM


AnomalyVFM adapts a pretrained backbone by injecting LoRA-based feature adaptation modules into the transformer attention layers to refine internal representations. It utilizes a convolutional decoder and a confidence-weighted loss to generate segmentation masks, a combination specifically designed to remain robust against noise in synthetic training labels.
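The two key ingredients can be sketched in isolation. The snippet below is an illustrative NumPy sketch, not the paper's implementation: the rank and scaling of `LoRALinear`, and the binary-cross-entropy form of `confidence_weighted_bce`, are assumed shapes of the low-rank adapters and confidence-weighted pixel loss described above.

```python
import numpy as np

class LoRALinear:
    """Minimal LoRA sketch: a frozen weight W plus a trainable low-rank
    update B @ A, as would be injected into attention projections.
    Rank and alpha are illustrative, not the paper's configuration."""
    def __init__(self, W, rank=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = W                                         # frozen pretrained weight (out, in)
        self.A = rng.normal(0, 0.02, (rank, W.shape[1]))   # trainable down-projection
        self.B = np.zeros((W.shape[0], rank))              # trainable up-projection, zero-init
        self.scale = alpha / rank

    def __call__(self, x):
        # y = W x + scale * B (A x); zero-init B means the adapted layer
        # initially reproduces the frozen layer exactly.
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

def confidence_weighted_bce(pred, target, conf, eps=1e-7):
    """Illustrative confidence-weighted pixel loss: per-pixel binary
    cross-entropy down-weighted where synthetic labels are uncertain
    (the paper's exact weighting may differ)."""
    p = np.clip(pred, eps, 1 - eps)
    bce = -(target * np.log(p) + (1 - target) * np.log(1 - p))
    return (conf * bce).sum() / (conf.sum() + eps)
```

Zero-initialising `B` is the standard LoRA trick: adaptation starts from the unmodified backbone, so training can only move away from the pretrained representation as the loss demands. The confidence map lets noisy pixels in the synthetic masks contribute less to the gradient.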

Generalisation across various backbones


Experiments across various backbones demonstrate the generalisation of AnomalyVFM. In the figure, SD stands for Synthetic Dataset and FA for Feature Adaptors.

Results


BibTeX

@InProceedings{fucka2026anomaly_vfm,
    title     = {AnomalyVFM -- Transforming Vision Foundation Models into Zero-Shot Anomaly Detectors},
    author    = {Fučka, Matic and Zavrtanik, Vitjan and Skočaj, Danijel},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026}
}