Best of Data Analysis, September 2024

  1. Article
    Machine Learning Mastery · 2y

    5 Real-World Machine Learning Projects You Can Build This Weekend

    Applying machine learning with real-world datasets teaches valuable skills like cleaning data and handling class imbalance. This guide provides five weekend projects with suggested datasets, goals, and focus areas, such as predicting house prices, sentiment analysis of tweets, customer segmentation, churn prediction, and movie recommendations. By building APIs and dashboards, you gain end-to-end machine learning experience.

  2. Article
    Machine Learning Mastery · 2y

    Automating Data Cleaning Processes with Pandas

    Discover how to automate data cleaning processes using the Pandas library. Learn about typical data cleaning functions like filling missing values, removing duplicates, manipulating strings, and converting date formats. The post also introduces a custom class, DataCleaner, to encapsulate these steps into a reusable pipeline for an efficient and systematic approach to data cleaning.
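
A minimal sketch of what such a reusable cleaning class might look like. The post's actual `DataCleaner` is not reproduced here: the constructor arguments, the `clean` method, the median fill strategy, and the sample data are all assumptions made for illustration.

```python
import pandas as pd


class DataCleaner:
    """Hypothetical sketch: chains a few common cleaning steps."""

    def __init__(self, date_columns=None, text_columns=None):
        # Column names are supplied by the caller (assumed interface).
        self.date_columns = date_columns or []
        self.text_columns = text_columns or []

    def clean(self, df: pd.DataFrame) -> pd.DataFrame:
        df = df.copy()  # leave the caller's frame untouched
        # String manipulation: trim whitespace and lowercase.
        for col in self.text_columns:
            df[col] = df[col].str.strip().str.lower()
        # Date conversion: unparseable values become NaT.
        for col in self.date_columns:
            df[col] = pd.to_datetime(df[col], errors="coerce")
        # Missing values: fill numeric columns with the column median.
        numeric_cols = df.select_dtypes(include="number").columns
        df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
        # Duplicates: drop exact repeats that remain after normalization.
        return df.drop_duplicates().reset_index(drop=True)


# Made-up sample data for illustration only.
raw = pd.DataFrame({
    "name": [" Alice", "alice ", "Bob"],
    "signup": ["2024-09-01", "2024-09-01", "not a date"],
    "spend": [10.0, 10.0, None],
})
cleaned = DataCleaner(date_columns=["signup"], text_columns=["name"]).clean(raw)
```

Normalizing strings before dropping duplicates matters in this sketch: the two "Alice" rows only become identical once whitespace and case are cleaned up.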

  3. Article
    Medium · 2y

    An Introduction to Bayesian A/B Testing

    A/B testing, also known as split testing, helps businesses optimize conversion rates by experimenting with different versions of a webpage. The post compares frequentist and Bayesian methods for analyzing A/B test results. It highlights the limitations of the chi-squared test in the frequentist setting and demonstrates Bayesian modeling with Python's PyMC package. A more complex example, modeling customer behavior after an intervention, shows how flexible the Bayesian approach is when the data are uncertain. The post advocates Bayesian inference for its intuitive interpretation and adaptability, especially when data are sparse and modeling uncertainty is crucial.
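
To make the Bayesian idea concrete, here is a small sketch that does not use PyMC. It swaps in a conjugate Beta-Binomial model with Monte Carlo draws from Python's standard library, which is enough to estimate the probability that variant B converts better than A. The function name, the uniform Beta(1, 1) prior, and the conversion counts are assumptions chosen for illustration, not figures from the post.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20_000, seed=0):
    """Estimate P(rate_B > rate_A) under independent Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # A Beta(1, 1) prior with a binomial likelihood gives a
        # Beta(1 + conversions, 1 + non-conversions) posterior.
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Hypothetical counts: 40/1000 conversions for A, 60/1000 for B.
p_better = prob_b_beats_a(conv_a=40, n_a=1000, conv_b=60, n_b=1000)
# Identical data for both variants should give a probability near 0.5.
p_same = prob_b_beats_a(conv_a=50, n_a=1000, conv_b=50, n_b=1000)
```

The result reads directly as "the probability that B is better than A given the data", which is the kind of intuitive interpretation the summary refers to.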

  4. Article
    Community Picks · 2y

    Frappe Insights: Open Source Data Analytics

    Frappe Insights is an open-source business intelligence tool designed to help businesses make data-driven decisions faster and more efficiently. It seamlessly integrates data from various sources, supports complex queries, and offers powerful visualisation features. With its intuitive dashboards, users can track metrics, receive data alerts, and share insights, enhancing overall operational efficiency and decision-making.

  5. Article
    Machine Learning Mastery · 2y

    The Power of Pipelines

    Machine learning projects often involve a sequence of data preprocessing steps followed by a learning algorithm. scikit-learn pipelines chain these steps, such as preprocessing, feature engineering, and model fitting, into a single object, so the same transformations are applied consistently at training and prediction time. This improves consistency, reproducibility, and model reliability. Key highlights include the foundational concept of pipelines, the impact of feature engineering on model performance, and the use of SimpleImputer for handling missing data.
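
A short sketch of a pipeline that includes `SimpleImputer`, assuming scikit-learn and NumPy are installed. The step names, the choice of scaler and regressor, and the toy data are illustrative assumptions, not the article's example.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data with one missing value in the second feature (made up).
X = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0], [4.0, 40.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])

model = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="mean")),  # fill NaN with column mean
    ("scale", StandardScaler()),                 # zero mean, unit variance
    ("regress", LinearRegression()),             # final estimator
])

# fit() learns the imputation means and scaling statistics from X only.
model.fit(X, y)

# predict() reuses those learned statistics on new rows, including NaNs.
predictions = model.predict(np.array([[5.0, np.nan]]))
```

Because the imputer and scaler are fitted inside the pipeline, passing it to a cross-validation routine refits them on each training fold, which avoids leaking statistics from held-out data.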

  6. Article
    Daily Dose of Data Science | Avi Chawla | Substack · 2y

    Cyclical Feature Engineering

    Cyclical feature engineering handles features with recurring patterns, such as hour of the day, day of the week, and month of the year, whose wrap-around structure is often overlooked. Encoding them with sine and cosine captures their periodic nature, so values at the two ends of a cycle (for example, hour 23 and hour 0) are represented as adjacent rather than far apart. Models can then interpret such features correctly, which can improve accuracy.
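
The sine and cosine encoding can be sketched with the standard library alone. The helper name `encode_cyclical` is an assumption for illustration; the formula itself is the usual mapping of a value onto the unit circle.

```python
import math


def encode_cyclical(value, period):
    """Map a periodic value to (sin, cos) coordinates on the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


# Hours of the day have a period of 24.
h0 = encode_cyclical(0, 24)
h12 = encode_cyclical(12, 24)
h23 = encode_cyclical(23, 24)

# Euclidean distance in the encoded space.
near = math.dist(h23, h0)   # 23:00 vs 00:00, one hour apart
far = math.dist(h12, h0)    # 12:00 vs 00:00, half a day apart
```

As raw integers, 23 and 0 are 23 units apart. In the encoded space they sit next to each other on the circle, while noon and midnight land on opposite sides. Both coordinates are needed: sine alone would map two different hours to the same value.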

  7. Article
    Tech Pioneers · 2y

    Embracing Growth: My Journey as a Software Engineer and Data Analyst

    A passionate software engineer and data analyst shares their journey of continuous learning, evolution, and boundary-pushing. They invite others to connect and share knowledge.

  8. Article
    R-bloggers · 2y

    📦 {alone} v0.5 is now available

    Version 0.5 of the {alone} package is now available, adding subtitle and description fields for episodes. The release includes data from Alone Season 11, which is now ready for analysis. Notably, Season 11 is tied with Season 6 as the second-highest-rated season on IMDb, while Season 7 remains the highest rated. Three participants survived more than 80 days in Season 11, similar to Season 7.