
Daily Dose of Data Science | Avi Chawla | Substack

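An unconstrained decision tree keeps splitting until it classifies every training instance perfectly, so it overfits and generalizes poorly. Random Forest mitigates this by injecting randomness in two places: each tree is trained on a bootstrapped sample of the dataset, and each node split considers only a random subset of candidate features. The ExTra Trees (Extremely Randomized Trees) algorithm adds a further layer of randomness: instead of searching for the optimal threshold of each candidate feature, it draws a threshold at random and keeps the best of these random splits, which further reduces model variance. In sklearn, `ExtraTreesClassifier` and `ExtraTreesRegressor` default to `bootstrap=False`, so every tree is trained on the full dataset; set `bootstrap=True` if you want each tree to see a bootstrapped sample instead.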
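A minimal sketch of how the `bootstrap` flag differs between the two estimators in sklearn. The synthetic dataset, the number of trees, and the variable names are assumptions chosen for illustration, not taken from the original post, and the scores it prints say nothing about which model is better in general.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data, used only to have something to fit (an assumption).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Random Forest bootstraps the training set by default.
rf = RandomForestClassifier(n_estimators=100, random_state=0)

# Extra Trees trains every tree on the full dataset by default.
et_default = ExtraTreesClassifier(n_estimators=100, random_state=0)

# Opt in to bootstrapped samples explicitly.
et_boot = ExtraTreesClassifier(n_estimators=100, bootstrap=True, random_state=0)

models = {
    "random forest": rf,
    "extra trees (default)": et_default,
    "extra trees (bootstrap=True)": et_boot,
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: bootstrap={model.bootstrap}, test accuracy={scores[name]:.3f}")
```

Printing `model.bootstrap` makes the differing defaults visible without relying on any particular accuracy figure.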

Random Forest vs. ExTra Trees


<p>How do you get more random than random?</p>


<p>That raises the question: isn't the whole point of calculating which feature and threshold to split on in a decision tree to learn the distribution efficiently? Are the decision trees in Extra Trees generally bigger than those in a Random Forest because of the unintelligent splits? I guess that even with random splits, the trees inside Extra Trees will eventually fit the whole dataset, given that growth is not constrained.</p>
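One way to probe the tree-size question empirically is to compare the average node count of fully grown trees in each ensemble. This is a rough sketch under stated assumptions: the synthetic dataset, the ensemble size, and the helper `mean_node_count` are all hypothetical choices for illustration, and the result on one dataset does not settle the general case.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

# Synthetic data without label noise (an assumption for illustration).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, flip_y=0.0, random_state=0
)

# Both ensembles use sklearn defaults, so tree growth is unconstrained.
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
et = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(X, y)


def mean_node_count(forest):
    """Average number of nodes per fitted tree in the ensemble."""
    return float(np.mean([tree.tree_.node_count for tree in forest.estimators_]))


rf_nodes = mean_node_count(rf)
et_nodes = mean_node_count(et)

# With bootstrap=False and no growth limit, each Extra Trees member
# sees the full dataset and splits until its leaves are pure.
et_train_acc = et.score(X, y)

print(f"mean nodes per tree, random forest: {rf_nodes:.1f}")
print(f"mean nodes per tree, extra trees:   {et_nodes:.1f}")
print(f"extra trees training accuracy:      {et_train_acc:.3f}")
```

Note that the comparison is not perfectly like for like: with default settings each Random Forest tree is fit on a bootstrapped sample, while each Extra Trees member is fit on the full dataset.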