Best of Data Analysis, October 2024

  1. Article
    Community Picks · 2y

    Step by step, from zero to advanced.

    Regular expressions (regex) are character patterns, written in a specific syntax, that are used for finding, matching, and editing text. They are supported in programming languages such as Python, SQL, and JavaScript, and in tools like Google Analytics. Online resources such as RegexLearn offer tutorials and examples for learning regex. After completing the learning modules, users can test and practice their knowledge with tutorials at different levels.
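
As a small illustration of the three uses named above (finding, matching, editing), here is a sketch using Python's built-in `re` module. The sample text and the date pattern are made up for illustration and do not come from the RegexLearn tutorials.

```python
import re

# Hypothetical sample text; the date pattern is an illustrative assumption.
text = "Orders shipped on 2024-10-03 and 2024-10-17."
date_pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# Finding: collect every substring that matches the pattern.
dates = [m.group(0) for m in date_pattern.finditer(text)]

# Matching: check that a whole string has the expected shape.
is_date = date_pattern.fullmatch("2024-10-03") is not None

# Editing: reorder the captured groups into day/month/year.
rewritten = date_pattern.sub(r"\3/\2/\1", text)
```

The parentheses create capture groups, which is what lets the replacement string refer back to the year, month, and day as `\1`, `\2`, and `\3`.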

  2. Video
    Web Dev Simplified · 2y

    Why You Should Replace Your Booleans With Timestamps

    Storing booleans as timestamps in databases offers benefits for debugging and analytics, such as knowing when certain actions occurred. However, it increases storage requirements significantly. This method is beneficial for enterprise applications but may not be necessary for smaller projects.
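
A minimal sketch of the idea, assuming a hypothetical `users` table where a nullable `deleted_at` column stands in for an `is_deleted` flag. SQLite is used here only because it ships with Python; the table and column names are illustrative and not taken from the video.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: a nullable timestamp replaces an is_deleted boolean.
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, deleted_at TEXT)"
)
conn.executemany(
    "INSERT INTO users (name, deleted_at) VALUES (?, ?)",
    [("ada", None), ("bob", "2024-10-05T09:30:00"), ("cy", None)],
)

# The boolean is still recoverable: NULL means "not deleted".
active = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM users WHERE deleted_at IS NULL ORDER BY id"
    )
]

# The timestamp also answers "when", which a plain boolean cannot.
deleted = conn.execute(
    "SELECT name, deleted_at FROM users WHERE deleted_at IS NOT NULL"
).fetchall()
```

The trade-off is the one the summary mentions: each flag now costs a timestamp's worth of storage instead of a single bit or byte.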

  3. Article
    Data Engineer Things · 2y

    I spent 6 hours learning Apache Arrow: Overview

    Apache Arrow is a standard memory format designed for efficient data processing in analytics workloads. It focuses on performance and interoperability by leveraging a columnar in-memory format and aligned memory allocation. Arrow minimizes serialization and deserialization costs, enabling efficient data sharing between systems. Key elements include physical memory layouts for arrays, record batch serialization, and IPC formats enabling seamless inter-process and network data transfers. Arrow is widely adopted by various data projects, enhancing their performance and data handling capabilities.

  4. Article
    Medium · 2y

    From Data Collection to Deployment: Mastering the Data Science Workflow

    Data science has evolved into a critical tool for strategic decision-making. The workflow from data collection to deployment is not linear but iterative. Key steps include defining the problem, gathering and cleaning data, conducting exploratory data analysis, feature engineering, model selection, training and tuning, evaluating performance, and finally deploying the model. Effective communication of results to stakeholders is also vital.

  5. Article
    JetBrains · 2y

    Where To Get Data for Your Data Science Projects

    Finding good data for data science projects can be challenging. This post discusses what makes data 'good,' including relevance, consistency, and timeliness. It contrasts structured and unstructured data, and explains common data formats like CSV and databases. The post also lists resources to find datasets, such as the UCI Machine Learning Repository, Kaggle, and Hugging Face. It highlights the importance of starting with structured data and provides guidance on the next steps after choosing a dataset.

  6. Video
    YouTube · 2y

    Learn SQL Beginner to Advanced in Under 4 Hours

    This post provides a comprehensive SQL tutorial, covering everything from setting up the MySQL environment to fundamental and advanced SQL topics. It includes exercises on data selection, querying, data cleaning, and exploratory data analysis, with two practical projects at the end. The tutorial is designed for beginners and progresses to advanced topics such as CTEs and temp tables.
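
To make the two advanced topics named above concrete, here is a sketch that contrasts a CTE with a temp table. The tutorial itself uses MySQL; SQLite is swapped in here so the example runs without a database server, and the `sales` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100.0), ("north", 50.0), ("south", 30.0), ("south", 20.0)],
)

# CTE: a named subquery that exists only for this one statement.
cte_rows = conn.execute(
    """
    WITH region_totals AS (
        SELECT region, SUM(amount) AS total
        FROM sales
        GROUP BY region
    )
    SELECT region, total FROM region_totals
    WHERE total > 100
    ORDER BY region
    """
).fetchall()

# Temp table: the same intermediate result, kept for the rest of the session.
conn.execute(
    """
    CREATE TEMP TABLE region_totals_tmp AS
    SELECT region, SUM(amount) AS total FROM sales GROUP BY region
    """
)
tmp_rows = conn.execute(
    "SELECT region, total FROM region_totals_tmp ORDER BY region"
).fetchall()
```

A CTE suits a one-off query; a temp table is the better fit when several later queries reuse the same intermediate result.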

  7. Article
    LogRocket · 2y

    Using Polars in Rust for high-performance data analysis

    Learn how to use Rust and Polars to create a high-performance data analysis application with a REST-based Web API. This guide covers setting up a Rust project, importing and manipulating data from CSV files, creating a web server using Axum and Tokio, and implementing API endpoints for retrieving and analyzing data. Polars is highlighted for its performance and versatility in handling large data sets.

  8. Article
    Community Picks · 2y

    Data Projects to Land Your First Job

    Job applications being ignored and portfolios blending in are common issues for aspiring data analysts. The key to standing out lies in undertaking meaningful projects for local charities, non-profits, and churches who need data assistance but lack the resources. These real-world projects can build credibility, offer valuable experience, and foster connections—all of which are more impressive to employers than generic, overused datasets.

  9. Article
    Community Picks · 2y

    Data Cleaning: 9 Ways to Clean Your ML Datasets

    Clean data is essential for accurate and reproducible machine learning models. This post details nine crucial data cleaning techniques for 2024, including handling missing values, outlier detection, duplicate removal, and using tools like DagsHub’s Data Engine, Apache Airflow, and scikit-learn. By ensuring datasets are clean and well-prepared, engineers can meaningfully benchmark model performance. Automated pipelines and advanced imputation methods are also discussed to streamline the data cleaning process.
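
Three of the techniques listed above (duplicate removal, outlier handling, missing-value imputation) can be sketched in plain Python. The records, the 0 to 120 plausibility range, and median imputation are all illustrative assumptions; the post itself discusses dedicated tools such as scikit-learn rather than this hand-rolled version.

```python
from statistics import median

# Hypothetical raw records: (id, age); None marks a missing value.
raw = [(1, 34), (2, None), (2, None), (3, 29), (4, 31), (5, 240)]

# 1. Duplicate removal: keep the first occurrence of each record.
deduped = list(dict.fromkeys(raw))

# 2. Outlier handling: drop ages outside an assumed plausible range.
in_range = [
    (rid, age) for rid, age in deduped if age is None or 0 <= age <= 120
]

# 3. Imputation: fill missing ages with the median of the observed ones.
fill = median(age for _, age in in_range if age is not None)
cleaned = [(rid, fill if age is None else age) for rid, age in in_range]
```

Order matters here: removing the outlier before computing the median keeps the implausible value from influencing the imputed ages.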

  10. Article
    Towards Data Science · 2y

    Advanced Techniques in Lying Using Data Visualizations

    Readers learn how data visualizations can be manipulated to support any narrative by omitting data points, exploiting pattern psychology, selectively categorizing data, and adjusting readability. The content encourages critical evaluation of presented data and emphasizes the importance of ethical practices in data presentation.

  11. Article
    MotherDuck · 2y

    Introducing the prompt() Function: Use the Power of LLMs with SQL!

    The new prompt() function allows the integration of small language models (SLMs) like OpenAI's gpt-4o-mini into SQL, enabling text summarization and structured data extraction directly within SQL queries. This function is currently in Preview on MotherDuck and supports various use cases such as bulk text summarization and unstructured to structured data conversion. Users can start exploring the function via the Free Trial or Standard Plan, with certain usage quotas in place.

  12. Article
    FREE COURSES! · 2y

    100% FREE COURSE - Automate Excel Data Manipulation with Python and ChatGPT

    A 100% free course on how to automate Excel data manipulation using Python and ChatGPT is available on CodeCast. The course aims to teach efficient data handling techniques, combining the power of Python programming with the advanced capabilities of ChatGPT.

  13. Article
    Towards Data Science · 2y

    Mastering Back-of-the-Envelope Math Will Make You a Better Data Scientist

    Back-of-the-envelope math, or quick-and-dirty estimates, can be more useful than complex models in many business scenarios. Such estimates help cut through complexity, enabling quicker and often sufficiently accurate decision-making. Scenarios are outlined for when rough estimates are appropriate, including assessing minimum viability, ranking options, and making best guesses. The post provides guidance on creating structured estimates and getting stakeholders comfortable with their accuracy.

  14. Video
    YouTube · 2y

    How I would learn Data Analysis (If i could start over) | Data Analyst Roadmap 2024

    Data analysts play a crucial role in helping organizations make data-driven decisions. Key steps include data collection, data cleaning and pre-processing, exploratory data analysis, and data visualization. Essential skills for the role include knowledge of statistics, Excel, SQL, and Python, along with strong communication skills. To stand out as a newcomer, focus on unique projects, use datasets of personal interest, and publish dashboards online.

  15. Article
    Salesforce Engineering · 2y

    Engineering 360 Dashboard: Transforming Complex Data into Powerful Engineering Insights

    The Engineering 360 Dashboard at Salesforce provides actionable insights that support data-driven decisions in engineering operations, focusing on developer productivity, agile practices, and high product standards. It integrates with tools like Data Cloud, Tableau, and MuleSoft to eliminate data silos and ensure consistent metrics. Key challenges addressed include data unification, scalability, and maintaining data trust and security. Ongoing improvements include incorporating AI and anomaly detection to enhance the platform’s capabilities.

  16. Article
    Daily Dose of Data Science | Avi Chawla | Substack · 2y

    Clean ML Datasets With Cleanlab

    Cleanlab, an open-source library developed by MIT researchers, helps clean datasets in just four lines of Python code. By identifying issues such as out-of-distribution samples, outliers, label problems, and duplicates, Cleanlab significantly improves dataset quality, which is crucial for training accurate machine learning models. Several demo notebooks are available for further learning.

  17. Video
    YouTube · 2y

    Database Normalization for Beginners | How to Normalize Data w/ Power Query (full tutorial!)

    Learn how to transform a single merged table into a star schema using Power Query in Excel. The post explains the concept of database normalization and its benefits in organizing data to eliminate redundancy. It demonstrates how to create separate lookup or dimension tables for customer, store, and product information, improving data integrity and scalability. The guide also covers advanced steps like splitting the transactions table into order-level and line-item-level tables, achieving second normal form, and further normalization into snowflake schemas, which optimize data for various analytical tasks.
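
The core step described above, splitting a merged table into a dimension table and a fact table, can be sketched as follows. The video does this in Power Query; plain Python is swapped in here, and the rows and column names are hypothetical.

```python
# Hypothetical merged table: one row per sale, customer details repeated.
merged = [
    {"order_id": 1, "customer": "Ada", "city": "Leeds", "amount": 20},
    {"order_id": 2, "customer": "Bob", "city": "York", "amount": 35},
    {"order_id": 3, "customer": "Ada", "city": "Leeds", "amount": 15},
]

# Dimension table: each distinct customer appears once with a surrogate key.
customer_ids = {}
for row in merged:
    key = (row["customer"], row["city"])
    customer_ids.setdefault(key, len(customer_ids) + 1)

dim_customer = [
    {"customer_id": cid, "customer": name, "city": city}
    for (name, city), cid in customer_ids.items()
]

# Fact table: each transaction keeps only the key of its dimension row.
fact_sales = [
    {
        "order_id": row["order_id"],
        "customer_id": customer_ids[(row["customer"], row["city"])],
        "amount": row["amount"],
    }
    for row in merged
]
```

After the split, a customer's city is stored in exactly one place, so correcting it means changing one dimension row rather than every matching sale.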

  18. Video
    YouTube · 2y

    Excel for Data Analytics - Full Course for Beginners

    The post covers a comprehensive course on Excel for data analytics, designed for absolute beginners. It starts with basic functions, charts, and tables and progresses to advanced features like pivot tables, Power Query, and Power Pivot. The course includes hands-on exercises and real-world projects. All necessary resources, including Excel workbooks and datasets, are provided for free. Additionally, there are optional paid resources for guided practice and community support.

  19. Article
    freeCodeCamp · 2y

    Microsoft Excel: 14 Time-Saving Keyboard Shortcuts

    Microsoft Excel is essential for professionals in various fields. Learning keyboard shortcuts can greatly enhance productivity. Key shortcuts include `CTRL + T` to create a table, `ALT + N, V, T` for pivot tables, and `ALT + H, O, I` for autofitting column sizes. Mastering these shortcuts reduces dependency on the mouse, speeding up common tasks.

  20. Article
    Towards Data Science · 2y

    Who Really Owns the Airbnbs You’re Booking? — Marketing Perception vs Data Analytics Reality

    Airbnb's marketing promotes an authentic, local experience, but data from InsideAirbnb reveals that many listings are owned by professional hosts with multiple properties, making the experience more hotel-like and profit-driven. An analysis of listings in European cities shows a high prevalence of multi-property ownership in cities with lax regulations. The post provides a guide on replicating this analysis for other cities using Python and data from InsideAirbnb.
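
The central measurement in an analysis like this, the share of listings that belong to hosts with more than one property, can be sketched in a few lines. The rows below are invented, and the `host_id` field name is an assumption about the data layout; this is not the post's own code.

```python
from collections import Counter

# Hypothetical listing rows; real data would be loaded from a listings file.
listings = [
    {"id": 101, "host_id": "h1"},
    {"id": 102, "host_id": "h1"},
    {"id": 103, "host_id": "h2"},
    {"id": 104, "host_id": "h1"},
    {"id": 105, "host_id": "h3"},
]

# Count how many listings each host has.
per_host = Counter(row["host_id"] for row in listings)

# Hosts with more than one listing, and the share of listings they hold.
multi_hosts = {host for host, count in per_host.items() if count > 1}
share = sum(per_host[host] for host in multi_hosts) / len(listings)
```

Comparing this share across cities is one way to reproduce the kind of city-by-city contrast the post describes.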