Explore Apache Spark, a distributed computing framework for big data processing, with coverage of data analytics, machine learning, and stream processing. Topics include Spark architecture, RDDs, and Spark SQL. Whether you're a data engineer, data scientist, or big data enthusiast, this category provides tips on using Spark for large-scale data processing.

Apache Spark

Deploy an on-premise data hub with Canonical MAAS, Spark, Kubernetes and Ceph

Apache Hadoop and Apache Spark for Big Data Analysis

Amazon EMR Serverless introduces Shuffle-optimized disks delivering improved performance for I/O intensive workloads

Amazon EMR on EKS now supports Apache Livy

Understanding Distributed Computing

Iris - Turning observations into actionable insights for enhanced decision making

Cost Optimization Strategies for scalable Data Lakehouse

Enhancing Data Security with Spark: A Guide to Column-Level Encryption - Part 1

Sentiment Analysis of Yelp Restaurants Reviews in Real-Time

Enabling near real-time data analytics on the data lake