Best of Data Science · June 2024

  1.
    Article
    Community Picks · 2y

    6 python libraries to make beautiful maps

    This post introduces 6 Python libraries for making informative and stylish maps. The libraries mentioned include Cartopy, Folium, Plotly, ipyleaflet, geemap, and ridgemap. They offer various features and capabilities for static and interactive map visualizations.

  2.
    Video
    Tech With Tim · 2y

    10 Useful Python Modules You NEED to Know

    Discover essential Python modules including requests for HTTP requests, Flask for lightweight web development, pydantic for data validation, FastAPI for creating fast APIs, and Django for professional web applications. Automate tasks with selenium, perform mathematical operations with numpy, manipulate data with pandas, and visualize data with matplotlib. Additionally, delve into TensorFlow for deep learning and LangChain for advanced AI applications.

  3.
    Article
    Hacker News · 2y

    Python-based ETL

    Amphi is a Python-based ETL tool designed for efficient data extraction, transformation, and loading with a low-code approach. It features a graphical user interface for designing data pipelines, generating deployable native Python code, and supporting various data formats like CSV and JSON. Amphi ensures flexibility, ease of sharing pipeline definitions, and guarantees data privacy as processing is done locally. The platform is aimed at fostering community collaboration among data practitioners of all levels.

  4.
    Article
    freeCodeCamp · 2y

    Learn Python for Data Science – Hands-on Projects with EDA, AB Testing & Business Intelligence

    A comprehensive Python data science course covering data analytics, A/B testing, and end-to-end case studies with hands-on projects.

  5.
    Article
    freeCodeCamp · 2y

    Practical Guide to Linear Algebra in Data Science and AI

    Linear algebra is a practical tool that can be used to solve real-world problems in data science and AI. It is applied across various industries, and understanding its core concepts is essential for working with machine learning, deep learning, computer vision, and generative AI. A linear algebra roadmap for 2024 is provided to guide your learning journey, and there are numerous resources available to help you master linear algebra.

  6.
    Article
    Medium · 2y

    10 Best Practices for Data Science

    This post discusses 10 best practices for data science, including starting and staying organized, using version control, separating notebooks and source files, writing tests and sanity checks, automating the data pipeline, centralizing important parameters, making project runs verbose, and starting with a simple end-to-end pipeline. These practices promote reproducibility, collaboration, reliability, and efficiency in data science projects.

  7.
    Article
    Medium · 2y

    Linear Algebra Concepts Every Data Scientist Should Know

    Linear algebra is fundamental in transforming theoretical data science models into practical solutions. It is crucial for data representation, dimensionality reduction, optimization, feature engineering, and similarity measures. Concepts such as vectors, vector spaces, and matrices, along with operations like dot products and matrix multiplication, are key foundational topics. Understanding basis, rank, determinants, eigenvectors, and eigenvalues is vital for advanced applications in data science and machine learning.
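
    As a rough illustration of the operations named in this summary (not code from the linked article), the sketch below implements a dot product, a matrix multiplication, and a cosine similarity in plain Python. The function names and sample vectors are assumptions made up for this example.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def matmul(A, B):
    """Multiply an (m x n) matrix by an (n x p) matrix, both given as lists of rows."""
    if len(A[0]) != len(B):
        raise ValueError("inner dimensions must match")
    cols_of_B = list(zip(*B))  # transpose B so each column can be dotted with a row of A
    return [[dot(row, col) for col in cols_of_B] for row in A]


def cosine_similarity(u, v):
    """Similarity measure: cosine of the angle between u and v."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


# Made-up sample data: v is a scaled copy of u, so the two point in the same direction.
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]
A = [[1, 2], [3, 4]]
identity = [[1, 0], [0, 1]]

print(dot(u, v))                # 28.0
print(matmul(A, identity))      # [[1, 2], [3, 4]]
print(cosine_similarity(u, v))  # approximately 1.0 for parallel vectors
```

    In practice a library such as NumPy would handle these operations; the pure-Python version is only meant to make the definitions concrete.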

  8.
    Article
    Machine Learning News · 2y

    Firecrawl: A Powerful Web Scraping Tool for Turning Websites into Large Language Model (LLM) Ready Markdown or Structured Data

    Firecrawl, developed by Mendable AI, is a state-of-the-art web scraping tool designed to efficiently extract data from websites, including those with dynamic JavaScript-rendered content. It outputs clean, well-formatted Markdown suitable for Large Language Model (LLM) applications, while incorporating caching mechanisms and generative feedback loops to enhance data quality and extraction efficiency. Users can access Firecrawl via an intuitive API and multiple SDKs for different programming languages.

  9.
    Article
    KDnuggets · 2y

    Why You Should Learn SQL in 2024

    Learning SQL is crucial in 2024 as it remains a highly demanded skill for data professionals, enabling efficient data management and analysis. SQL's readability, standardization, and integration with other tools like Python and R make it an invaluable asset in any data-centric environment. Mastering SQL can significantly enhance one's ability to handle large datasets, perform complex queries, and interact with various database systems.
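
    To make the point about SQL working alongside Python concrete, here is a minimal sketch that uses Python's built-in sqlite3 module. The table name, columns, and rows are invented for illustration and do not come from the linked article.

```python
import sqlite3

# An in-memory database, so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 20.0), ("ben", 15.0), ("ana", 30.0), ("cy", 5.0)],
)

# Aggregate per customer, keep customers who spent at least 15, largest total first.
rows = conn.execute(
    """
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) >= 15
    ORDER BY total DESC
    """
).fetchall()
conn.close()

print(rows)  # [('ana', 2, 50.0), ('ben', 1, 15.0)]
```

    The same query text would run largely unchanged against other relational databases, which is the standardization benefit the summary refers to.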

  10.
    Article
    Towards Data Science · 2y

    Exploratory Data Analysis in 11 Steps

    Exploratory Data Analysis (EDA) involves a structured process that starts with stakeholder communication to identify objectives, followed by defining analysis goals and research questions. Analysts should review existing knowledge, assess data accessibility, clean and transform data, and use summary statistics to understand data patterns. Key findings should be documented as the analysis progresses and shared appropriately with stakeholders.

  11.
    Article
    Daily Dose of Data Science | Avi Chawla | Substack · 2y

    4 Ways to Test ML Models in Production

    Testing ML models in production is crucial to ensure reliability and performance on real-world data. Four common strategies are A/B testing, canary testing, interleaved testing, and shadow testing. A/B testing distributes requests non-uniformly between models, while canary testing gradually rolls out the candidate model to a subset of users. Interleaved testing mixes predictions from both models, and shadow testing logs outputs without affecting user experience. These techniques help mitigate risks and validate the model effectively.
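
    Two of these strategies, A/B testing and shadow testing, can be sketched as simple request routers. This is a toy illustration under stated assumptions: the two model functions, the 10% candidate share, and the in-memory log are hypothetical stand-ins, not the setup described in the linked post.

```python
import random


def legacy_model(x):
    """Hypothetical model currently serving production traffic."""
    return x * 2


def candidate_model(x):
    """Hypothetical new model under evaluation."""
    return x * 2 + 1


def ab_route(x, candidate_share, rng):
    """A/B test: send a chosen share of requests to the candidate, the rest to legacy."""
    if rng.random() < candidate_share:
        return "candidate", candidate_model(x)
    return "legacy", legacy_model(x)


shadow_log = []


def shadow_route(x):
    """Shadow test: the user always gets the legacy output; the candidate output is only logged."""
    shadow_log.append((x, candidate_model(x)))
    return legacy_model(x)


rng = random.Random(0)  # fixed seed so the run is repeatable
served_by = [ab_route(x, 0.1, rng)[0] for x in range(1000)]
candidate_count = served_by.count("candidate")
print(candidate_count)  # roughly 100 of the 1000 requests

shadow_outputs = [shadow_route(x) for x in range(3)]
print(shadow_outputs)  # [0, 2, 4]
print(shadow_log)      # [(0, 1), (1, 3), (2, 5)]
```

    A canary rollout could reuse the same routing idea while raising the candidate share step by step as confidence grows.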

  12.
    Article
    Collections · 2y

    From Full-Stack to Data Science: My Journey So Far

    Saad Fazal shares his diverse career journey from full-stack development to data science. Exploring various roles like Shopify developer, Flutter mobile app developer, and Unity game developer, he highlights the importance of continuous learning, networking, and strategic planning in achieving career goals. His story emphasizes the value of flexibility and dedication in navigating tech career shifts.

  13.
    Article
    KDnuggets · 2y

    5 Free Templates for Data Science Projects on Jupyter Notebook

    Boost your data science project with these 5 free templates for Jupyter Notebook.

  14.
    Article
    Medium · 2y

    The Internet isn’t Fun Anymore

    The Internet used to be a safe space for exploring and connecting, but now it feels suffocating due to constant AI tools, ads, and monetization. Users have limited agency and feel overexposed. Companies are pushing AI features that are often annoying or useless. Some companies are marketing active refusal of AI. People are losing their ability to make choices and think for themselves online.

  15.
    Article
    Planet Python · 2y

    [June 2024] Python Monthly Newsletter 💻🐍

    Stay updated with the latest in Python and tech through this monthly newsletter. It includes curated important Python articles, resources, and tools. Highlights include NVIDIA's new Warp library for high-performance simulation, a recap of the annual Python conference, and insights into AI strategies from tech giants like Apple, Meta, and Google. Additionally, the newsletter touches on interesting tech stories and discussions about industry trends and software practices.

  16.
    Article
    JetBrains · 2y

    How to Move From pandas to Polars

    Polars is gaining popularity in the data science community due to its speed and security benefits, being written in Rust and based on Apache Arrow. Polars offers a similar API to pandas, which lowers the barrier for migration. It handles large data sets more efficiently with its lazy API and better concurrency capabilities. Tools like PyCharm support Polars, smoothing the transition. The primary differences in syntax and migration tips are provided, ensuring a relatively seamless switch from pandas to Polars.

  17.
    Article
    Machine Learning Mastery · 2y

    5 Free YouTube Channels Dedicated to Machine Learning Education

    Discover five YouTube channels that offer free, high-quality tutorials on machine learning, data science, and programming. These channels—StatQuest with Josh Starmer, Codebasics, freeCodeCamp, Sentdex, and Data School—provide content ranging from beginner to advanced levels. Topics covered include machine learning algorithms, Python programming, statistical analysis, and deep learning. The post emphasizes the importance of hands-on practice for effective learning.

  18.
    Article
    KDnuggets · 2y

    5 Free Competitions for Aspiring Data Scientists

    Learn about 5 free data science competitions for aspiring data scientists, including Kaggle Competitions, DataHack by Analytics Vidhya, AI Hackathons by MachineHack, AIcrowd, and DrivenData.

  19.
    Article
    KDnuggets · 2y

    Beginner’s Guide to Machine Learning Testing With DeepChecks

    DeepChecks is a Python package providing built-in checks for machine learning testing, including data integrity and model evaluation. This guide explains how to validate datasets, test trained models, and generate comprehensive reports using minimal code. You will learn to use DeepChecks for data integrity tests, train machine learning models, and run model evaluation suites. Also included are instructions on performing single checks, saving reports as JSON or HTML, and automating processes using GitHub Actions.

  20.
    Article
    Medium · 2y

    Forecasting Gold Prices with TimeGPT

    This post explores how TimeGPT, a time series LLM, can be used with gold price data to forecast future prices. The post covers the process of retrieving gold price data, preprocessing the data, setting up TimeGPT, and interpreting the forecasted prices and confidence intervals.

  21.
    Article
    Machine Learning News · 2y

    MaxKB: Knowledge Base Question Answering System Based on Large Language Models (LLMs)

    MaxKB is an advanced question-answering system based on large language models, simplifying knowledge management for businesses. It supports direct document uploads, automatic crawling, and intelligent text processing, and it improves data accessibility through automatic text splitting and vectorization. With retrieval-augmented generation (RAG) for precise answers and a user-friendly interface, MaxKB offers both power and accessibility, suitable for integration in various business environments.

  22.
    Article
    NVIDIA Developer · 2y

    Machine Learning – What Is It and Why Does It Matter?

    Many industries use data science and machine learning to recognize patterns, detect changes, and make predictions to enhance their operations. The availability of open-source tools has facilitated this trend since the mid-2000s. Today, improvements in predictive models can result in significant financial gains. However, training these models requires significant computational resources, with GPUs offering a solution to scalability issues that CPUs can no longer handle due to the limitations posed by Moore's law.

  23.
    Article
    Daily Dose of Data Science | Avi Chawla | Substack · 2y

    Poisson Regression vs. Linear Regression

    Linear regression may not be suitable for count data as it can produce negative predictions, which don't make sense for certain types of data like the number of calls received. Poisson regression, a type of generalized linear model (GLM), is better suited for count-based responses as it assumes the data follows a Poisson distribution. It ensures non-negative predictions and acknowledges that outcomes are not equally likely around the mean.
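
    The contrast can be seen on a tiny example. The sketch below fits both models to made-up count data in plain Python: ordinary least squares in closed form, and a Poisson regression with a log link fitted by Newton's method. The data, variable names, and iteration count are assumptions chosen for illustration, not taken from the linked post.

```python
import math

# Made-up count data (for example, calls received per hour) that grows quickly with x.
x = [0, 1, 2, 3, 4, 5]
y = [0, 1, 1, 3, 6, 12]
n = len(x)

# Linear regression: closed-form ordinary least squares for y = intercept + slope * x.
x_bar = sum(x) / n
y_bar = sum(y) / n
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar
linear_pred = [intercept + slope * xi for xi in x]

# Poisson regression: log(mu) = a + b * x, fitted by Newton's method on the log-likelihood.
a, b = math.log(y_bar), 0.0  # start from a constant model at the sample mean
for _ in range(25):
    mu = [math.exp(a + b * xi) for xi in x]
    # Gradient of the log-likelihood with respect to a and b.
    g_a = sum(yi - mi for yi, mi in zip(y, mu))
    g_b = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
    # Entries of the 2x2 information matrix (negative Hessian).
    h_aa = sum(mu)
    h_ab = sum(xi * mi for xi, mi in zip(x, mu))
    h_bb = sum(xi * xi * mi for xi, mi in zip(x, mu))
    det = h_aa * h_bb - h_ab ** 2
    # Newton update: solve the 2x2 system directly.
    a += (h_bb * g_a - h_ab * g_b) / det
    b += (h_aa * g_b - h_ab * g_a) / det

poisson_pred = [math.exp(a + b * xi) for xi in x]

print([round(p, 2) for p in linear_pred])   # the first prediction is negative
print([round(p, 2) for p in poisson_pred])  # every prediction is positive
```

    Because the Poisson model predicts exp(a + b * x), its output can never drop below zero, whereas the fitted line has a negative intercept on this data.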

  24.
    Article
    KDnuggets · 2y

    I Took the Google Data Analytics Certification Where 2,148,697 Have Already Enrolled

    A personal review of the Google Data Analytics Certification, highlighting its flexibility, content, and suitability for beginners in the tech industry.

  25.
    Article
    Medium · 2y

    How to Maximize Your Impact as a Data Scientist

    Learn why focusing on impact is important for data scientists' career growth, the challenges in driving real impact, and how to become more impact-focused in your work.