Identifying outliers in data, especially in high-dimensional datasets, poses significant challenges. Running detectors on subspaces (subsets of the features) rather than on the full feature set can make outlier detection more effective: it mitigates the curse of dimensionality, can improve accuracy, and makes results easier to interpret. Distance-based techniques such as KNN and LOF benefit from this approach. The post gives an overview of creating and using subspaces. It also covers the PyOD library and two of its subspace-based detectors, SOD (Subspace Outlier Detection) and FeatureBagging, which help implement these techniques effectively.
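The core idea above can be sketched in plain Python: score points separately within several feature subsets, then combine the results. This is a minimal, dependency-free sketch, not PyOD's implementation. It assumes a KNN-distance scorer (distance to the k-th nearest neighbour), randomly chosen subspaces of a fixed size, and a simple average of the per-subspace scores, roughly in the spirit of feature bagging. The function names `knn_scores` and `subspace_outlier_scores` and the toy data are hypothetical and used only for illustration.

```python
import math
import random
from typing import List, Sequence


def knn_scores(points: List[Sequence[float]], features: Sequence[int], k: int) -> List[float]:
    """Score each point by its distance to its k-th nearest neighbour,
    measured only over the given feature indices (one subspace)."""
    scores = []
    for i, p in enumerate(points):
        p_sub = [p[f] for f in features]
        dists = sorted(
            math.dist(p_sub, [q[f] for f in features])
            for j, q in enumerate(points)
            if j != i  # a point is not its own neighbour
        )
        scores.append(dists[k - 1])
    return scores


def subspace_outlier_scores(points, n_subspaces=10, subspace_size=2, k=2, seed=0):
    """Average KNN scores over randomly chosen feature subsets.
    Assumption: features are on comparable scales (no normalisation is done)."""
    rng = random.Random(seed)
    n_features = len(points[0])
    totals = [0.0] * len(points)
    for _ in range(n_subspaces):
        # Pick a random subspace (subset of feature indices) without repeats.
        features = rng.sample(range(n_features), subspace_size)
        for i, s in enumerate(knn_scores(points, features, k)):
            totals[i] += s
    return [t / n_subspaces for t in totals]


# Hypothetical toy data: five inliers near the origin and one point far away
# in features 0-2. Every 2-feature subset of 4 features includes one of those.
data = [
    [0.0, 0.0, 0.0, 0.0],
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.1, 0.0, 0.1],
    [0.1, 0.0, 0.2, 0.2],
    [0.0, 0.1, 0.1, 0.1],
    [20.0, 20.0, 20.0, 0.1],
]
scores = subspace_outlier_scores(data)
print(max(range(len(scores)), key=scores.__getitem__))  # prints 5
```

Averaging is only one way to combine subspace scores. Taking the maximum instead emphasises points that stand out strongly in any single subspace, at the cost of being more sensitive to noise.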

36 min read, from towardsdatascience.com
Table of contents
Perform outlier detection more effectively using subsets of features
Challenges with outlier detection
An example of outliers based on their distances to other datapoints
KNN and LOF algorithms
Issues with moderate numbers of features
Subspaces
Further Motivations for Subspaces
Choosing the subspaces
PyOD
SOD (Subspace Outlier Detection)
FeatureBagging
Using other detectors
Ongoing outlier detection projects
Conclusions
