Classical ML models like tree-based ensembles typically require the entire dataset in memory, making them impractical for large-scale data. The Random Patches technique addresses this by training each tree on a random subset of both rows and columns (a "patch") and combining the trees into an ensemble. This extends the core idea of Bagging: building maximally diverse trees reduces the ensemble's variance. Empirical results across 13 datasets show it often outperforms traditional random forests. The approach only works in an ensemble setting, but it enables training on datasets that don't fit in memory without resorting to big-data frameworks like Spark MLlib.
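
As a concrete illustration, scikit-learn's `BaggingClassifier` supports this row-and-column sampling directly through its `max_samples` and `max_features` parameters. The snippet below is a minimal sketch of the idea; the synthetic dataset and the specific fractions are illustrative choices, not values from the original post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Illustrative synthetic dataset (stand-in for a large real one).
X, y = make_classification(n_samples=10_000, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree (the default base estimator) sees a random "patch":
# a subset of the rows and a subset of the columns.
patches = BaggingClassifier(
    n_estimators=100,
    max_samples=0.1,           # each tree trains on 10% of the rows
    max_features=0.5,          # ...and sees 50% of the columns
    bootstrap=False,           # sample rows without replacement
    bootstrap_features=False,  # sample columns without replacement
    random_state=0,
)
patches.fit(X_train, y_train)
print(f"Held-out accuracy: {patches.score(X_test, y_test):.3f}")
```

Note that this sketch still holds the full array in memory. For truly out-of-core training, each patch would be loaded from disk (for example, a row/column slice of a Parquet file) before fitting its tree, so that only one small patch and the fitted trees are ever resident at once.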
