Best of Data Science: July 2024

  1. Data Structures Cheat Sheet (Hacker News)

    This guide provides an introduction to data structures and their representation in Memgraph. It explains the basics of graphs, linked lists, queues, stacks, and trees, along with examples and queries to create these data structures using Memgraph. The document also discusses tree traversal algorithms like BFS and DFS and demonstrates how to run these algorithms in Memgraph.
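
The guide runs BFS and DFS inside Memgraph. As a database-independent illustration, here is a minimal Python sketch of both traversals over a small hypothetical tree stored as an adjacency dict (the tree and function names are assumptions, not from the article). BFS uses a queue and DFS uses a stack, which ties the traversals back to the data structures above.

```python
from collections import deque

# Hypothetical example tree: each node maps to its list of children.
TREE: dict[str, list[str]] = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F"],
    "D": [],
    "E": [],
    "F": [],
}


def bfs(graph: dict[str, list[str]], start: str) -> list[str]:
    """Breadth-first search: visit nodes level by level using a FIFO queue."""
    order = []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()  # oldest discovered node first
        order.append(node)
        for child in graph.get(node, []):
            if child not in seen:  # guard against revisits in general graphs
                seen.add(child)
                queue.append(child)
    return order


def dfs(graph: dict[str, list[str]], start: str) -> list[str]:
    """Depth-first search (preorder) using an explicit LIFO stack."""
    order = []
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()  # most recently discovered node first
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push children in reverse so the leftmost child is explored first.
        for child in reversed(graph.get(node, [])):
            if child not in seen:
                stack.append(child)
    return order


print(bfs(TREE, "A"))
print(dfs(TREE, "A"))
```

On this tree BFS visits `A, B, C, D, E, F` (one level at a time), while DFS visits `A, B, D, E, C, F` (one branch at a time).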

  2. 25 Open Source AI Tools to Cut Your Development Time in Half (Community Picks)

    A comprehensive overview of 25 open-source AI tools designed to streamline various stages of ML/AI projects, from data preparation to deployment and monitoring. Each tool is evaluated based on factors like popularity, impact, innovation, community engagement, and relevance to emerging AI trends. The guide aids in selecting appropriate tools by examining their unique features and suitability for specific use cases, thereby enhancing productivity and project success.

  3. Done is Better Than Perfect (Towards Data Science)

    In high-growth companies, prioritizing completion over perfection can be crucial for career advancement. Perfectionism can hinder output, limit growth opportunities, and create workplace friction. Identifying and addressing perfectionist tendencies can help data scientists and other professionals deliver timely results, make decisions with incomplete data, and move projects forward while maintaining a balance between quality and efficiency.

  4. QuestDB (Hacker News)

    QuestDB is an open-source time-series database with SQL analytics designed to efficiently handle data ingestion and analysis. The post details the development and debugging of a primary-replica replication feature, addressing a performance issue related to excessive network bandwidth usage. The author implemented a custom network profiling tool using Rust to capture and analyze network traffic, identifying the root cause of the problem. The solution involved optimizing how metadata was uploaded, ultimately improving bandwidth efficiency. Techniques used within QuestDB for high ingestion performance were also highlighted.

  5. 10 GitHub Repositories to Master Data Science (KDnuggets)

    Discover 10 essential GitHub repositories to master data science, offering interactive courses, books, guides, code examples, and free resources based on top university curricula. These repositories cover a wide range of topics, from statistics and Python to machine learning and data visualization techniques. Beginners and experienced practitioners alike can benefit from the comprehensive resources and best practices provided.

  6. Building Data Science Pipelines Using Pandas (KDnuggets)

    Learn to build end-to-end data science pipelines with the pandas `DataFrame.pipe` method. Piping improves readability, enables function chaining, and keeps code organized. The tutorial restructures code into a pipeline that handles data ingestion, cleaning, analysis, and visualization, and compares it with the equivalent non-pipeline version.
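
A minimal sketch of the pipe pattern, assuming pandas is installed. The data, column names, and step functions are hypothetical, not taken from the tutorial. Each step is a plain function that takes and returns a DataFrame, and `DataFrame.pipe` forwards any extra keyword arguments (here `rate`) to the step.

```python
import pandas as pd

# Hypothetical raw data standing in for the ingestion step.
RAW = pd.DataFrame(
    {
        "city": ["Oslo", "Lima", None, "Oslo", "Lima"],
        "sales": [120.0, 80.0, 50.0, None, 100.0],
    }
)


def drop_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Cleaning step: remove rows that contain any missing value."""
    return df.dropna()


def add_tax(df: pd.DataFrame, rate: float) -> pd.DataFrame:
    """Feature step: add a taxed sales column (rate is a hypothetical parameter)."""
    return df.assign(sales_with_tax=df["sales"] * (1 + rate))


def total_by_city(df: pd.DataFrame) -> pd.DataFrame:
    """Analysis step: sum taxed sales per city."""
    return df.groupby("city", as_index=False)["sales_with_tax"].sum()


# Without pipe: nested calls read inside-out.
nested = total_by_city(add_tax(drop_missing(RAW), rate=0.25))

# With pipe: the same steps read top to bottom.
piped = (
    RAW
    .pipe(drop_missing)
    .pipe(add_tax, rate=0.25)
    .pipe(total_by_city)
)
print(piped)
```

Both versions produce the same result; the piped form simply lets each stage be read, reordered, or removed on its own line.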

  7. 5 AI Real-World Projects To Set Foot in The Door (Towards AI)

    Explore five real-world AI projects to kickstart your journey in data science. Learn how to build a RAG chatbot, create autonomous agents, train your own language model, fine-tune a BERT model for legal texts, and evaluate models effectively. Perfect for newcomers aiming to gain hands-on experience.

  8. How to Use the Python SDK to Build Your Own Web Scraper (freeCodeCamp)

    Learn how to use the Requests and Beautiful Soup libraries to build your own web scraper in Python. This guide walks through scraping data from the UC Irvine Machine Learning Repository, covering the necessary libraries, defining functions to scrape and parse data, and saving the data to a CSV file. It also covers legal guidelines, ethical practices, and compliance with each website's policies.
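
A minimal sketch of the parse-and-save steps, assuming `beautifulsoup4` is installed. To stay self-contained it parses a hypothetical inline HTML table instead of fetching a page; in a real scraper the HTML would come from a Requests call such as `requests.get(url).text`. The table layout and values are hypothetical, not the UCI repository's actual markup.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page.
HTML = """
<table id="datasets">
  <tr><th>Name</th><th>Instances</th></tr>
  <tr><td>Iris</td><td>150</td></tr>
  <tr><td>Wine</td><td>178</td></tr>
</table>
"""


def parse_rows(html: str) -> list[list[str]]:
    """Extract the text of every header and data cell, row by row, from the first table."""
    soup = BeautifulSoup(html, "html.parser")  # stdlib parser backend, no lxml needed
    table = soup.find("table")
    return [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]


def to_csv(rows: list[list[str]]) -> str:
    """Serialize rows to CSV text; use open(path, "w", newline="") instead to write a file."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


rows = parse_rows(HTML)
print(to_csv(rows))
```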

  9. Satyrn (Hacker News)

    Satyrn is a modern Jupyter client for Mac that offers faster startup than VS Code and JupyterLab, inline code generation, and a minimalist design. Features include Black code formatting, easy copying of graphs and tables, and a kernel manager for virtual environments, and it works with existing `.ipynb` files without any setup.

  10. How SQL Enhances Your Data Science Skills (Community Picks)

    SQL is vital for data scientists due to its ability to efficiently retrieve, manipulate, and analyze large datasets. Key SQL concepts such as SELECT statements, WHERE clauses, JOIN operations, and aggregate functions enhance data exploration, preparation, and integration. Mastering these SQL skills complements other data science tools and improves overall data handling capabilities.
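
A minimal sketch using Python's built-in `sqlite3` module and an in-memory database; the `customers` and `orders` tables are hypothetical. A single query combines the concepts the article names: a SELECT statement, a JOIN, a WHERE clause, and aggregate functions over a GROUP BY.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'EU'), (2, 'US'), (3, 'EU');
    INSERT INTO orders VALUES (1, 1, 40.0), (2, 1, 10.0), (3, 2, 25.0), (4, 3, 5.0);
    """
)

QUERY = """
SELECT c.region,
       COUNT(*)      AS n_orders,
       SUM(o.amount) AS total
FROM orders AS o
JOIN customers AS c ON c.id = o.customer_id
WHERE o.amount >= 10          -- filter individual rows before aggregating
GROUP BY c.region
ORDER BY c.region
"""
result = conn.execute(QUERY).fetchall()
print(result)
```

The WHERE clause drops the 5.0 order before grouping, so only three orders contribute to the per-region counts and totals.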

  11. Exercises Course: Introduction to Web Scraping With Python (Real Python)

    Web scraping involves collecting and parsing raw data from the Web. This course covers parsing website data using string methods, regular expressions, and an HTML parser. The course includes 23 lessons, downloadable resources, and a certificate of completion.
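
A minimal sketch of the three parsing approaches the course lists, applied to a hypothetical inline HTML string. The course may use a different HTML parser; this sketch swaps in the standard library's `html.parser.HTMLParser` so it needs no third-party packages. All names here are illustrative.

```python
import re
from html.parser import HTMLParser
from typing import Optional

# Hypothetical HTML standing in for a downloaded page.
PAGE = (
    "<html><head><title>Example Page</title></head>"
    "<body><a href='/docs'>Docs</a> <a href='/about'>About</a></body></html>"
)


def title_with_string_methods(html: str) -> str:
    """Slice out the title with str.find; breaks if the tag has attributes or other casing."""
    start = html.find("<title>") + len("<title>")
    end = html.find("</title>")
    return html[start:end]


def title_with_regex(html: str) -> Optional[str]:
    """Tolerate attributes and letter case on the tag using a non-greedy regex."""
    match = re.search(r"<title.*?>(.*?)</title.*?>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1) if match else None


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags with the stdlib event-driven HTML parser."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag: str, attrs: list) -> None:
        # HTMLParser lowercases tag and attribute names before calling this hook.
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href" and value)


collector = LinkCollector()
collector.feed(PAGE)
print(title_with_string_methods(PAGE), title_with_regex(PAGE), collector.links)
```

String methods are the most fragile, regular expressions handle small markup variations, and a real parser copes with document structure, which is why scrapers usually end up with the last option.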

  12. Landing a Data Engineer Role: Free Courses and Certifications (KDnuggets)

    Training for a data engineer role doesn't have to be expensive. This curated list of 10 free data engineering courses offers quality education at no cost, covering key areas such as SQL, Python, cloud data engineering, ETL and data pipelines, data warehousing, and Apache Spark. Many of the courses are offered through edX, and some assume prior knowledge of SQL and relational databases. The article argues that, with dedication and persistence, these free resources can help you reach your data engineering goals.

  13. 9 Python Command Line Flags (Daily Dose of Data Science | Avi Chawla | Substack)

    Discover 9 common Python command-line flags and how they change the interpreter's behavior. These include `-c` to run a program passed as a string, `-i` to enter interactive mode after a script finishes, `-O` to strip `assert` statements, and `-OO` to additionally strip docstrings. The post also covers `-W` to control warnings (for example, `-W ignore` to suppress them), `-m` to run a module as a script, `-v` for verbose mode, which traces module imports, `-x` to skip the first line of the source file, and `-E` to ignore `PYTHON*` environment variables.
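
A minimal sketch that shows the effect of `-O` and `-OO` by launching fresh interpreters with `subprocess`. The tiny test program and the helper name `run` are hypothetical. The program is passed with `-c`, and `-E` is always added so a `PYTHONOPTIMIZE` environment variable cannot change the baseline.

```python
import subprocess
import sys

# Tiny program that reports whether assert statements and docstrings survived.
PROGRAM = (
    "def f():\n"
    "    '''doc'''\n"
    "try:\n"
    "    assert False\n"
    "    asserts = 'stripped'\n"
    "except AssertionError:\n"
    "    asserts = 'active'\n"
    "print(asserts, f.__doc__)\n"
)


def run(*flags: str) -> str:
    """Run PROGRAM in a new interpreter with the given extra flags; return its stdout."""
    completed = subprocess.run(
        [sys.executable, "-E", *flags, "-c", PROGRAM],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


for flags in [(), ("-O",), ("-OO",)]:
    print(flags or "(no flags)", "->", run(*flags))
```

Without flags the assert fires and the docstring is kept, `-O` removes the assert, and `-OO` also removes the docstring, so `f.__doc__` becomes `None`.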

  14. 5 Free Courses to Master Natural Language Processing (KDnuggets)

    Explore five free courses that provide comprehensive training in Natural Language Processing (NLP). Courses range from beginner to advanced levels, covering fundamentals, Python libraries, AI-powered chatbots, and specialized NLP techniques using Google Cloud and deep learning models. Perfect for those looking to transition into the NLP field without incurring high costs.

  15. GROUPING SETS in SQL (Daily Dose of Data Science | Avi Chawla | Substack)

    Learn how to run multiple aggregations in a single SQL query with GROUPING SETS, which lets the database compute all of them in one scan of the table. This is more efficient than combining separate GROUP BY queries with UNION, which scans the table once per query. The post walks through a detailed example and links to a Jupyter Notebook for hands-on practice.
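
SQLite, the database bundled with Python, does not support GROUPING SETS, so instead of a SQL demo this is a plain-Python sketch of the semantics of `SUM(amount) ... GROUP BY GROUPING SETS ((region), (product), ())`. One pass over the rows updates a running total for every grouping set at once, and columns outside a set are reported as `None`, mirroring SQL's NULLs. The rows and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales rows: (region, product, amount).
ROWS = [
    ("EU", "A", 10),
    ("EU", "B", 5),
    ("US", "A", 7),
    ("US", "A", 3),
]
COLUMNS = ("region", "product")


def grouping_sets_sum(rows, columns, sets):
    """Emulate SUM(amount) with GROUP BY GROUPING SETS in a single pass over rows.

    Each grouping set is a tuple of column names; columns not in a set are
    reported as None in the output, as SQL reports them as NULL.
    """
    totals = {s: defaultdict(int) for s in sets}
    for *keys, amount in rows:  # one scan of the data
        record = dict(zip(columns, keys))
        for s in sets:  # update every grouping set for this row
            key = tuple(record[c] if c in s else None for c in columns)
            totals[s][key] += amount
    return [(*key, total) for s in sets for key, total in totals[s].items()]


result = grouping_sets_sum(ROWS, COLUMNS, [("region",), ("product",), ()])
for row in result:
    print(row)
```

The empty grouping set `()` yields the grand total. The UNION approach would instead loop over `ROWS` once per grouping set.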

  16. Key Data Job Trends and Opportunities in 2024 (Collections)

    The data job market in 2024 is highly competitive, with strong demand for skilled professionals. Python and SQL remain critical programming languages, while AI engineering roles are becoming increasingly important. Opportunities in freelancing are growing, and low-code/no-code tools are making data analytics more accessible. Key data engineering roles include Data Engineer, Big Data Engineer, and Machine Learning Engineer. Staying updated with industry trends and obtaining relevant certifications are crucial for success.

  17. Polars vs. pandas: What’s the Difference? (JetBrains)

    Polars is a dataframe library built for speed and efficiency on a single machine, and it often outperforms pandas in both memory usage and speed. Written in Rust and built on Apache Arrow, Polars offers safe concurrency and query optimization through lazy execution. Despite these performance advantages, Polars is currently less widely supported than pandas by data visualization and machine learning libraries.

  18. 5 Tools Every Data Scientist Needs in Their Toolbox in 2024 (KDnuggets)

    To excel in data science in 2024, the article recommends five essentials: Python for programming, a solid foundation in maths and statistics, data visualization tools such as Matplotlib and Tableau, SQL for managing databases, and machine learning frameworks such as TensorFlow and PyTorch. Together, these help streamline your workflow and improve your ability to extract and communicate insights.

  19. What are Markov Chains? Explained With Python Code Examples (freeCodeCamp)

    Markov chains are probabilistic models of systems that move between states, where the probability of the next state depends only on the current state. This "memoryless" property lets them model complex systems efficiently, with applications in fields such as finance, genetics, and robotics. The guide explains the key variants, including discrete-time and continuous-time Markov chains and Hidden Markov Models, and includes Python code that implements a Gaussian Hidden Markov Model.
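
The guide's own example is a Gaussian HMM. The sketch below illustrates the simpler discrete-time Markov chain instead, using a hypothetical two-state weather model in pure Python. It propagates a probability distribution with $\pi_{t+1} = \pi_t P$ and simulates a path where each draw depends only on the current state.

```python
import random

STATES = ("sunny", "rainy")

# Hypothetical transition matrix: P[i][j] is the probability of moving from
# state i to state j in one step; each row sums to 1.
P = [
    [0.9, 0.1],  # from sunny
    [0.5, 0.5],  # from rainy
]


def step(dist: list[float], matrix: list[list[float]]) -> list[float]:
    """One step of pi_{t+1} = pi_t P (row vector times transition matrix)."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def distribution_after(dist: list[float], matrix: list[list[float]], steps: int) -> list[float]:
    """State probabilities after `steps` transitions from an initial distribution."""
    for _ in range(steps):
        dist = step(dist, matrix)
    return dist


def simulate(start: int, matrix: list[list[float]], steps: int, seed: int = 0) -> list[str]:
    """Sample a path; each draw uses only the current state's row (memoryless)."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        current = path[-1]
        path.append(rng.choices(range(len(matrix)), weights=matrix[current])[0])
    return [STATES[i] for i in path]


print(distribution_after([1.0, 0.0], P, 1))
print(distribution_after([1.0, 0.0], P, 50))
print(simulate(0, P, 10))
```

With these hypothetical probabilities, the distribution converges to the stationary distribution that solves $\pi = \pi P$, namely $\pi = (5/6, 1/6) \approx (0.833, 0.167)$, regardless of the starting state.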

  20. Is Data Science Still Worth It In 2024? (KDnuggets)

    Despite the rise of generative AI tools like ChatGPT, there is still significant demand for data scientists, especially in tech-driven companies and sectors like healthcare, finance, and AI fields such as NLP and computer vision. The U.S. News & World Report ranks data science highly among tech and STEM jobs for 2024. Those considering a career in data science can benefit from online courses like DataCamp's Data Scientist Certification, which can be completed in 30 days and offers extensive resources for job placement and career growth.

  21. How to Perform Memory-Efficient Operations on Large Datasets with Pandas (KDnuggets)

    Learn techniques for handling large datasets memory-efficiently with pandas. Tips include using the `low_memory` parameter of `read_csv` when loading data, converting columns to smaller data types, processing data in chunks, and using vectorized operations instead of `apply` with lambda functions. The article also suggests using `inplace=True` for DataFrame modifications (although in current pandas versions `inplace=True` often still copies data internally, so its memory savings are not guaranteed) and filtering data before performing operations.
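
A minimal sketch of three of these tips, assuming pandas is installed. A hypothetical in-memory CSV stands in for a large file, and the column names, dtypes, and chunk size are illustrative choices, not the article's. The sketch declares compact dtypes at load time, streams the file in chunks, and computes revenue with vectorized column arithmetic instead of a row-wise `apply`.

```python
import io

import pandas as pd

# Hypothetical CSV standing in for a large file on disk.
CSV = "store,units,price\n" + "\n".join(
    f"{'north' if i % 2 else 'south'},{i % 5},1.5" for i in range(1000)
)

# Tip 1: compact dtypes instead of the default object/int64/float64.
DTYPES = {"store": "category", "units": "int8", "price": "float32"}


def revenue_by_store(source: str, chunksize: int = 250) -> pd.Series:
    """Tip 2: read in chunks so only one chunk is held in memory at a time."""
    partials = []
    for chunk in pd.read_csv(io.StringIO(source), dtype=DTYPES, chunksize=chunksize):
        # Tip 3: vectorized arithmetic instead of chunk.apply(lambda row: ..., axis=1).
        revenue = chunk["units"] * chunk["price"]
        partials.append(revenue.groupby(chunk["store"], observed=True).sum())
    # Combine the per-chunk partial sums into final totals.
    return pd.concat(partials).groupby(level=0, observed=True).sum()


result = revenue_by_store(CSV)

# Compare the memory footprint of default vs. compact dtypes.
default = pd.read_csv(io.StringIO(CSV))
compact = pd.read_csv(io.StringIO(CSV), dtype=DTYPES)
saved = default.memory_usage(deep=True).sum() - compact.memory_usage(deep=True).sum()
print(result)
print(f"bytes saved by compact dtypes: {saved}")
```

Chunked aggregation only works when the per-chunk results can be combined afterward. Sums combine directly, but a statistic such as a median needs a different strategy.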

  22. Learn Data Analysis with Julia (KDnuggets)

    Learn how to set up the Julia programming environment for data science, load and manipulate data, and create visualizations. This tutorial covers installing necessary packages, loading data into DataFrames, exploring and manipulating data, creating visualizations, and building a data processing pipeline using Julia. Perfect for beginners and those looking to expand their data analysis toolkit.

  23. Free Daily Dose of Data Science Archive (Daily Dose of Data Science | Avi Chawla | Substack)

    The 2024 edition of the Daily Dose of Data Science archive has been released. It features categorized posts on key data science and machine learning topics, a 2-minute assessment to recommend relevant chapters, and a focus on practical, no-fluff content. The edition aims to maximize learning efficiency and enhance readers' skills significantly.

  24. Deep learning vs. machine learning: Understanding the differences (Elastic)

    Machine learning (ML) and deep learning (DL) are pivotal AI technologies that are transforming industries by enabling data-driven decision-making. ML models, which learn from data without explicit programming, excel with structured data and simpler tasks. DL, a subset of ML, uses neural networks inspired by the human brain to handle vast amounts of unstructured data and complex tasks. The two differ in structure, complexity, and data handling, and DL performs better on tasks such as image and speech recognition. However, DL models require more computational resources and are less interpretable than traditional ML models.

  25. Dashboards in Python with Streamlit (Planet Python)

    A discussion with Chanin Nantasenamat covers Python and the Streamlit web framework, teaching bioinformatics, the differences between data science disciplines, and his experience as a YouTuber.