Perform outlier detection more effectively using subsets of features

Identifying outliers is difficult, especially in high-dimensional datasets. Working with subspaces (subsets of features) can improve outlier detection by mitigating the curse of dimensionality, improving accuracy, and making results easier to interpret. Distance- and density-based detectors such as KNN and LOF benefit in particular. The post gives an overview of creating and using subspaces, and covers the PyOD library along with its SOD (Subspace Outlier Detection) and FeatureBagging detectors, which help implement these techniques.
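To make the subspace idea concrete, here is a minimal pure-Python sketch, not the article's own code and not PyOD's implementation. It scores each row by its distance to its k-th nearest neighbour within every 2-feature subspace, then averages the normalised scores. The function names, the synthetic data, the exhaustive choice of 2-feature subspaces, and the max-normalisation step are all assumptions made for illustration.

```python
import itertools
import math
import random


def knn_scores(rows, features, k=3):
    """Score each row by the distance to its k-th nearest neighbour,
    measured only on the given feature indices (one subspace)."""
    scores = []
    for i, a in enumerate(rows):
        dists = sorted(
            math.dist([a[f] for f in features], [b[f] for f in features])
            for j, b in enumerate(rows)
            if j != i
        )
        scores.append(dists[k - 1])
    return scores


def subspace_scores(rows, subspace_size=2, k=3):
    """Average max-normalised kNN scores over all subspaces of a given size."""
    n_features = len(rows[0])
    subspaces = list(itertools.combinations(range(n_features), subspace_size))
    totals = [0.0] * len(rows)
    for features in subspaces:
        scores = knn_scores(rows, features, k)
        top = max(scores) or 1.0  # avoid dividing by zero if all rows coincide
        for i, s in enumerate(scores):
            totals[i] += s / top
    return [t / len(subspaces) for t in totals]


# Hypothetical data: 50 inliers in 6 dimensions, plus one row (appended last)
# that is extreme in features 0 and 1 only.
rng = random.Random(0)
rows = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(50)]
rows.append([20.0, 20.0] + [rng.uniform(-1, 1) for _ in range(4)])

scores = subspace_scores(rows)
top_index = max(range(len(rows)), key=scores.__getitem__)
print(top_index, round(scores[top_index], 3))
```

Enumerating every pair of features is only practical for a handful of columns. With more features, one would typically sample subspaces instead, which is roughly what a feature-bagging style detector does.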

#machine-learning

#python

#data-science

#algorithms

Nov 25, 2024 • 36m read time • From towardsdatascience.com

Table of contents

Perform outlier detection more effectively using subsets of features
Challenges with outlier detection
An example of outliers based on their distances to other datapoints
KNN and LOF algorithms
Issues with moderate numbers of features
Subspaces
Further Motivations for Subspaces
Choosing the subspaces
PyOD
SOD (Subspace Outlier Detection)
FeatureBagging
Using other detectors
Ongoing outlier detection projects
Conclusions
