A Probabilistic Transformation of Distance-Based Outliers

David Muhr

Michael Affenzeller

Josef Küng

2023-07-11

Distance-based outlier detection methods are widely used across data domains, yet their results are often difficult to interpret. In particular, distance-based outlier scores require additional context before they can be converted into binary decisions. Previous methods to transform distance-based scores into an interpretable form were either algorithm-specific or completely algorithm-agnostic, relying purely on the resulting scores. In our work, we propose to use the distance information to neighboring data points, a prerequisite common across distance-based outlier detection algorithms, to determine distance probability distributions and, subsequently, to use these distributions to turn distance-based outlier scores into interpretable outlier probabilities. We show that this transformation does not impact detection performance and significantly increases the contrast between normal and outlier scores. To evaluate the proposed probabilistic transformation, we generalize commonly used k-nearest neighbors outlier detection methods as weighted k-nearest neighbors outlier detection and evaluate it on a wide range of tabular datasets. We further integrate our probabilistic transformation into the popular PatchCore method and show how the resulting ProbabilisticPatchCore method improves upon the original specification.
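The idea described above can be illustrated with a minimal sketch. It is not the implementation from the paper: it assumes an empirical CDF over the pooled neighbor distances of the training data as the distance probability distribution, and it uses a brute-force neighbor search. The names `knn_distances`, `weighted_knn_score`, and `ProbabilisticKNN` are hypothetical and chosen only for this example.

```python
import bisect
import math


def knn_distances(query, data, k, skip_self=False):
    """Return the k smallest Euclidean distances from `query` to the rows of `data`."""
    dists = sorted(math.dist(query, row) for row in data)
    if skip_self:
        dists = dists[1:]  # drop the zero distance of a training point to itself
    return dists[:k]


def weighted_knn_score(values, weights):
    """Weighted mean of k neighbor values (distances or probabilities)."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


class ProbabilisticKNN:
    """Illustrative weighted kNN detector with a probabilistic transformation.

    Assumption: the distance distribution is an empirical CDF of all
    neighbor distances observed within the training data.
    """

    def __init__(self, k=3, weights=None):
        self.k = k
        # Uniform weights give the mean kNN distance; weights (0, ..., 0, 1)
        # would give the distance to the k-th neighbor instead.
        self.weights = weights or [1.0] * k

    def fit(self, train):
        self.train = list(train)
        pool = []
        for row in self.train:
            pool.extend(knn_distances(row, self.train, self.k, skip_self=True))
        self.pool = sorted(pool)
        return self

    def _cdf(self, distance):
        # Fraction of training neighbor distances that are <= `distance`.
        return bisect.bisect_right(self.pool, distance) / len(self.pool)

    def score(self, query):
        """Plain distance-based outlier score (unbounded, hard to interpret)."""
        dists = knn_distances(query, self.train, self.k)
        return weighted_knn_score(dists, self.weights)

    def probability(self, query):
        """Outlier probability in [0, 1] derived from the neighbor distances."""
        dists = knn_distances(query, self.train, self.k)
        probs = [self._cdf(d) for d in dists]
        return weighted_knn_score(probs, self.weights)


# Toy data: a regular 5 x 5 grid of training points.
train = [(float(x), float(y)) for x in range(5) for y in range(5)]
model = ProbabilisticKNN(k=3).fit(train)

inside = (2.5, 2.5)     # lies between grid points
outside = (10.0, 10.0)  # lies far away from the grid

print(model.score(inside), model.probability(inside))
print(model.score(outside), model.probability(outside))
```

In this sketch the raw scores of the two query points differ only by a scale that depends on the data, whereas the transformed values are bounded in [0, 1] and can be thresholded directly. A smoother parametric distribution could replace the empirical CDF, which is a deliberately simple choice here.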

@article{Muhr2023,
  doi = {10.3390/make5030042},
  url = {https://www.mdpi.com/2504-4990/5/3/42},
  year = {2023},
  publisher = {MDPI},
  volume = {5},
  number = {3},
  pages = {782--802},
  author = {David Muhr and Michael Affenzeller and Josef Küng},
  title = {A Probabilistic Transformation of Distance-Based Outliers},
  journal = {Machine Learning and Knowledge Extraction}
}

To demonstrate the difference between distance-based and probabilistic outlier scores, we visualize the distance-based and probabilistic PatchCore scores for all test images of the MVTecAD dataset.

Dataset	Train images	Normal test images	Outlier test images	Contrast
Carpet	280	28	89	787
Grid	264	21	57	258
Leather	245	32	92	186
Tile	230	33	84	1024
Wood	247	19	60	279
Bottle	209	20	63	1024
Cable	224	58	92	1024
Capsule	219	23	109	191
Hazelnut	391	40	70	667
Metal Nut	220	22	93	713
Pill	267	26	141	1024
Screw	320	41	119	105
Toothbrush	60	12	30	346
Transistor	213	60	40	1024
Zipper	240	32	119	900