This experiment injects domain knowledge as a differentiable soft constraint into a neural network's loss function for fraud detection on a severely imbalanced dataset (0.17% positive rate). The hybrid neuro-symbolic approach adds a rule penalty that fires on transactions with high amounts and unusual PCA norms, so it still supplies a training signal when a batch contains no labeled fraud. Across 5 random seeds, the hybrid shows a consistent but small ROC-AUC improvement (0.970 vs 0.967), while the F1 and PR-AUC differences fall within seed-to-seed noise. Key lessons: both models must be evaluated with the same threshold-selection procedure for a fair comparison on imbalanced data, single-seed results are unreliable, and high lambda values (≥1.0) can override the BCE signal and degrade performance. Full code with a lambda sweep and variance analysis is available on GitHub.
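To make the idea of a rule penalty concrete, here is a minimal sketch in plain Python. It is not the article's implementation: the function names, the thresholds `amount_thr` and `norm_thr`, the `sharpness` of the soft gates, and the exact form of the penalty are all assumptions chosen for illustration. It only shows how a smooth "high amount AND unusual PCA norm" gate can be added to BCE with a weight `lam`.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over a batch of predicted probabilities."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def rule_penalty(probs, amounts, pca_norms,
                 amount_thr=2.0, norm_thr=3.0, sharpness=5.0):
    """Soft rule (hypothetical form): penalize a LOW fraud probability on
    transactions whose amount and PCA norm both exceed their thresholds.
    Sigmoid gates replace hard comparisons so the term stays smooth.
    Labels are not used, so the term is active in fraud-free batches."""
    total = 0.0
    for p, a, n in zip(probs, amounts, pca_norms):
        suspicious = (sigmoid(sharpness * (a - amount_thr))
                      * sigmoid(sharpness * (n - norm_thr)))
        total += suspicious * (1.0 - p)
    return total / len(probs)


def hybrid_loss(probs, labels, amounts, pca_norms, lam=0.1):
    """BCE plus the lambda-weighted rule penalty."""
    return bce(probs, labels) + lam * rule_penalty(probs, amounts, pca_norms)


# A batch with no labeled fraud: the second transaction trips the rule.
probs = [0.1, 0.1]
labels = [0, 0]
amounts = [0.0, 5.0]      # assumed to be standardized amounts
pca_norms = [0.5, 6.0]    # assumed norm of the PCA feature vector
print(bce(probs, labels), rule_penalty(probs, amounts, pca_norms))
print(hybrid_loss(probs, labels, amounts, pca_norms, lam=0.1))
```

In this sketch the penalty is scaled by `lam`, so with a large enough `lam` the rule term dominates the BCE term, which is in line with the summary's warning about lambda values of 1.0 and above.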

14 min read · From towardsdatascience.com
Table of contents
Abstract
The Problem: When ROC-AUC Lies
The Setup
The Model
The Rule Loss
Tuning Lambda
Results
Variance Analysis — 5 Random Seeds
Why Does the Rule Loss Help ROC-AUC?
On Threshold Evaluation in Imbalanced Classification
Things to Watch Out For
Closing Thoughts
References
Disclosure
