A Probabilistic Transformation of Distance-Based Outliers

David Muhr

Michael Affenzeller

Josef Küng

2023-07-11

Distance-based outlier detection methods are widely used across data domains, yet the results of those methods are often tricky to interpret. In particular, distance-based outlier scores require some additional context for interpretation to convert the scores into binary decisions. Previous methods to transform distance-based scores into some interpretable form were either algorithm-specific, or completely algorithm-agnostic based purely on the resulting scores. In our work, we propose to use the distance-information to neighboring data points, a prerequisite common across distance-based outlier detection algorithms, to determine distance probability distributions and, subsequently, use the distributions to turn distance-based outlier scores into interpretable outlier probabilities. We show that this transformation does not impact detection performance and significantly increases the contrast between normal and outlier scores. To evaluate the proposed probabilistic transformation, we generalize commonly used k-nearest neighbors outlier detection methods as weighted k-nearest neighbors outlier detection and evaluate it on a wide range of tabular datasets. We further integrate our probabilistic transformation into the popular PatchCore method and show how the resulting ProbabilisticPatchCore method improves upon the original specification.

@article{Muhr2023,
  doi = {https://doi.org/10.3390/make5030042},
  url = {https://www.mdpi.com/2504-4990/5/3/42},
  year = {2023},
  publisher = {TBD},
  volume = {5},
  number = {3},
  pages = {782-802},
  author = {David Muhr, Michael Affenzeller and Josef Küng},
  title = {A Probabilistic Transformation of Distance-Based Outliers},
  journal = {Machine Learning and Knowledge Extraction}
}

To demonstrate the difference between distance-based and probabilistic outlier scores, we visualize the distance-based and probabilistic PatchCore scores for all test images of the MVTecAD dataset.

Dataset Train Normal Outlier Contrast
Carpet 280 28 89 787
Grid 264 21 57 258
Leather 245 32 92 186
Tile 230 33 84 1024
Wood 247 19 60 279
Bottle 209 20 63 1024
Cable 224 58 92 1024
Capsule 219 23 109 191
Hazelnut 391 40 70 667
Metal Nut 220 22 93 713
Pill 267 26 141 1024
Screw 320 41 119 105
Toothbrush 60 12 30 346
Transistor 213 60 40 1024
Zipper 240 32 119 900