Best of Data Analysis — June 2024

1
Article
KDnuggets·2y
10 GitHub Repositories to Master SQL
This post lists 10 GitHub repositories that can help readers master SQL and database management. The repositories include tutorials, practice exercises, comprehensive courses, and tools for SQL-related tasks.
199
4
2
Article
System Design Codex·2y
3 Types of Event Patterns in EDA
Event-Driven Architecture (EDA) revolves around components sending and receiving events to communicate. There are three primary event patterns: Event Notifications, which inform other components of an occurrence with minimal data; Event-Based State Transfer, where events containing necessary information are pushed to consuming components; and Event Sourcing, which involves storing and replaying events to reconstruct entity states. Each pattern offers unique advantages for different scenarios.
110
1
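To make the three patterns in the summary above concrete, here is a minimal Python sketch. The event names, payload fields, and the bank-account example are hypothetical illustrations, not taken from the article.

```python
from dataclasses import dataclass

# Event notification: carries minimal data; consumers fetch details themselves.
# (Field names here are hypothetical.)
notification = {"type": "OrderPlaced", "order_id": 42}

# Event-based state transfer: the event carries the data consumers need.
state_transfer = {"type": "OrderPlaced", "order_id": 42, "total_cents": 1999}


# Event sourcing: state is rebuilt by replaying an append-only event log.
@dataclass(frozen=True)
class Deposited:
    amount: int


@dataclass(frozen=True)
class Withdrawn:
    amount: int


def replay(events):
    """Fold over the stored events in order to reconstruct the balance."""
    balance = 0
    for event in events:
        if isinstance(event, Deposited):
            balance += event.amount
        elif isinstance(event, Withdrawn):
            balance -= event.amount
    return balance


event_log = [Deposited(100), Withdrawn(30), Deposited(5)]
current_balance = replay(event_log)        # state after all events
balance_after_two = replay(event_log[:2])  # state at an earlier point in time
```

Replaying a prefix of the log, as in the last line, is what lets event sourcing answer questions about past states.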
3
Article
freeCodeCamp·2y
Learn Python for Data Science – Hands-on Projects with EDA, AB Testing & Business Intelligence
A comprehensive Python data science course covering data analytics, A/B testing, and end-to-end case studies with hands-on projects.
98
4
Article
Code Like A Girl·2y
SQL Essentials: GROUP BY vs. PARTITION BY explained
Understanding the difference between the GROUP BY and PARTITION BY clauses in SQL is crucial for efficient data analysis. GROUP BY summarizes data by collapsing rows that share the same values in the specified columns into one row per group, while PARTITION BY, used inside a window function's OVER clause, performs calculations within each partition and keeps every row. GROUP BY therefore reduces the number of rows, whereas PARTITION BY adds information without reducing them. Both work with aggregate functions, but PARTITION BY also supports ranking and time-series functions.
85
6
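The row-count difference described in the summary above can be seen in a small sketch using Python's built-in sqlite3 module. The `sales` table and its values are made up for illustration, and the window query assumes the bundled SQLite is version 3.25 or newer.

```python
import sqlite3

# Hypothetical in-memory table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 20), ("west", 5)],
)

# GROUP BY collapses each region into a single summary row.
grouped = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# PARTITION BY keeps every row and attaches the per-region total to each one.
windowed = conn.execute(
    "SELECT region, amount, SUM(amount) OVER (PARTITION BY region) "
    "FROM sales ORDER BY region, amount"
).fetchall()

conn.close()
```

Here `grouped` has one row per region, while `windowed` has as many rows as the table itself.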
5
Article
Engineering Enablement·2y
Measuring Developer Experience at Google
Google's Engineering Satisfaction (EngSat) survey, in use since 2018, measures developer productivity by combining survey insights with system-based metrics. The survey is conducted quarterly and has adapted over time to include consistent staffing, effective processes, and robust infrastructure. EngSat has helped Google track productivity changes, address technical debt, and validate new metrics. The program faces challenges such as increasing survey length and decreasing response rates, which are managed through strategic sampling and transparency in reporting. Google's approach offers valuable advice for other organizations wanting to implement developer surveys.
52
6
Article
KDnuggets·2y
Why You Should Learn SQL in 2024
Learning SQL is crucial in 2024 because it remains a highly in-demand skill for data professionals, enabling efficient data management and analysis. SQL's readability, standardization, and integration with other tools like Python and R make it an invaluable asset in any data-centric environment. Mastering SQL can significantly enhance one's ability to handle large datasets, perform complex queries, and interact with various database systems.
49
7
Article
Towards Data Science·2y
Exploratory Data Analysis in 11 Steps
Exploratory Data Analysis (EDA) involves a structured process that starts with stakeholder communication to identify objectives, followed by defining analysis goals and research questions. Analysts should review existing knowledge, assess data accessibility, clean and transform data, and use summary statistics to understand data patterns. Key findings should be documented as the analysis progresses and shared appropriately with stakeholders.
47
1
8
Article
JetBrains·2y
How to Move From pandas to Polars
Polars is gaining popularity in the data science community due to its speed and security benefits, being written in Rust and based on Apache Arrow. Polars offers a similar API to pandas, which lowers the barrier for migration. It handles large data sets more efficiently with its lazy API and better concurrency capabilities. Tools like PyCharm support Polars, smoothing the transition. The primary differences in syntax and migration tips are provided, ensuring a relatively seamless switch from pandas to Polars.
39
1
9
Article
Medium·2y
Forecasting Gold Prices with TimeGPT
This post explores how TimeGPT, a time series LLM, can be used with gold price data to forecast future prices. The post covers retrieving and preprocessing the gold price data, setting up TimeGPT, and interpreting the forecasted prices and confidence intervals.
28
1
10
Article
Hacker News·2y
goldmansachs/gs-quant: Python toolkit for quantitative finance
GS Quant is a Python toolkit developed by Goldman Sachs for quantitative finance, facilitating the development of trading strategies, derivative structuring, and risk management solutions. It leverages 25 years of experience in global markets and includes statistical packages for data analytics applications. It requires Python 3.6 or greater and can be installed via pip.
25
11
Article
NVIDIA Developer·2y
Machine Learning – What Is It and Why Does It Matter?
Many industries use data science and machine learning to recognize patterns, detect changes, and make predictions to enhance their operations. The availability of open-source tools has facilitated this trend since the mid-2000s. Today, improvements in predictive models can result in significant financial gains. However, training these models requires significant computational resources, and GPUs offer a way to scale that CPUs alone can no longer provide as the gains predicted by Moore's law slow down.
23
12
Article
Product Hunt·2y
SQL Workbench - In-browser SQL Workbench for data querying & visualization
SQL Workbench, launched on June 19th, 2024, offers an in-browser solution for data querying and visualization. Featured under Developer Tools and Data & Analytics, it marks the first release of this tool. It suits users looking for a browser-based SQL workbench to manage their data efficiently.
22
13
Article
KDnuggets·2y
I Took the Google Data Analytics Certification Where 2,148,697 Have Already Enrolled
A personal review of the Google Data Analytics Certification, highlighting its flexibility, content, and suitability for beginners in the tech industry.
20
14
Article
Medium·2y
How to Maximize Your Impact as a Data Scientist
Learn why focusing on impact is important for data scientists' career growth, the challenges in driving real impact, and how to become more impact-focused in your work.
20
15
Article
Code Like A Girl·2y
SQL Window Functions: The Ultimate Tool for Data Enthusiasts
Learn about SQL window functions, their syntax, benefits, and common use cases like ranking and time-series analysis. Mastering these functions can greatly enhance your data analysis skills.
19
1
16
Article
Towards Data Science·2y
From Code to Insights: Software Engineering Best Practices for Data Analysts
This post provides software engineering best practices for data analysts. It covers key lessons such as writing readable code, automating repetitive tasks, mastering tools, managing environments, optimizing program performance, applying the DRY principle, leveraging testing, using version control systems, seeking code reviews, and staying up-to-date.
18
17
Article
Hacker News·2y
My thoughts on Python in Excel
Python in Excel is an alternative to the Excel formula language and has use cases for computationally intensive tasks, AI, advanced visualizations, and time-series analysis. However, it is not suitable for beginners or interactive data analysis. There are also restrictions such as not being able to use custom packages or connect to web APIs.
15
18
Article
Towards AI·2y
A Data Analysis Project — Smart Phones Data Analysis.
A data analysis project on smartphone data that extracts insights on brands, models, prices, ratings, 5G capability, IR blaster, processor brands, cores, battery capacity, RAM capacity, screen size, operating systems, resolution, refresh rate, and more.
15
19
Article
InfoWorld·2y
11 surprising ways developers are using Wasm
WebAssembly, or Wasm, has a wide range of surprising applications including speech decoding, data analysis, old video games, functions as a service, and plugin integrations.
15
1
20
Article
Grafana Labs·2y
5 useful transformations you should know to get the most out of Grafana
Discover five useful transformations in Grafana that can help you better understand your data, including grouping data, organizing fields by name, filtering data by value, sorting data, and partitioning data by values.
14
21
Article
Daily Dose of Data Science | Avi Chawla | Substack·2y
Even Two Outliers Can Distort Your Data Analysis
Outliers can significantly distort the results of data analysis, such as correlation and regression fits, leading to misleading conclusions. Visualizing data through plots like PairPlot is crucial to identify these outliers and validate statistical measures.
12
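The effect described in the summary above can be reproduced with a small, self-contained sketch. The data points are invented for illustration and the Pearson correlation is computed by hand so that only the standard library is needed.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical base data with a weak negative relationship.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [5, 3, 6, 2, 5, 3, 6, 2]
r_without = pearson(x, y)

# Adding just two extreme points flips the picture to a strong positive fit.
x_out = x + [30, 32]
y_out = y + [40, 42]
r_with = pearson(x_out, y_out)

print(f"without outliers: r = {r_without:.2f}")
print(f"with two outliers: r = {r_with:.2f}")
```

A scatter plot of `x_out` against `y_out` would show the two added points sitting far from the rest, which is the kind of check the summary recommends before trusting the coefficient.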
22
Article
Towards Data Science·2y
Back to Basics: Databases, SQL, and Other Data-Processing Must-Reads
Relational databases and SQL queries remain vital for daily workflows of data professionals, despite the buzz around LLMs. This post highlights essential reads on maintaining and growing skills in data and ML tasks, emphasizing the interconnectedness of foundational data operations and advanced AI tasks. Featured topics include simplifying Python code for data engineering, learning SQL for data analytics, using pivot tables in SQL, managing Excel charts with VBA, and turning relational databases into graph databases.
12
23
Article
MotherDuck·2y
All-in-SQL Hybrid Search in DuckDB: Integrating Full Text and Embedding Methods
This post explores integrating Full Text Search (FTS) and Embedding Search to create a Hybrid Search system in DuckDB. It details the methods and SQL implementations used to combine these search techniques, addressing the need for both exact keyword matching and semantic understanding. The post also covers how to rank documents using Reciprocal Rank Fusion and Convex Combination metrics, providing examples using the Kaggle Movies dataset.
11
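As a rough illustration of the rank-fusion idea mentioned in the summary above, here is a plain-Python sketch of reciprocal rank fusion. It is not the post's SQL implementation: the document IDs are hypothetical, and the constant `k=60` is a commonly used default assumed here rather than a value taken from the post.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists: each document scores sum(1 / (k + rank))."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a full-text search and an embedding search.
fts_ranking = ["m1", "m2", "m3"]
embedding_ranking = ["m3", "m1", "m4"]

fused = reciprocal_rank_fusion([fts_ranking, embedding_ranking])
```

Documents that appear near the top of both lists, such as `m1` and `m3` here, rise above documents found by only one method.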
24
Article
Hacker News·2y
qStudio Release Version 3.0
qStudio 3.0 is the leading SQL Editor for data analysis. It includes powerful local qDuckDB and AI features like Text2SQL. It also introduces Pulse-Pivot for pivoting data.
11
25
Article
Medium·2y
What is a Data Analyst? Everything You Need to Know
Learn what a data analyst is, their role in businesses, and the skills required to become one. Discover the data analysis process and the difference between a data analyst and other data roles.
11
