Ray is a distributed execution engine for AI workloads that lets Python developers scale from a single machine to hundreds of GPUs with minimal code changes. It was created at UC Berkeley's RISELab, the same lineage that produced Apache Spark. Originally built for reinforcement learning research, Ray faded when RL hit a wall, then surged back when OpenAI used it to train GPT-3 and RL-based post-training became central to LLMs like ChatGPT.

The framework is organized in layers: Ray Core provides the distributed primitives (tasks and actors), while higher-level libraries such as Ray Data, Ray Train, Ray Tune, Ray Serve, and RLlib handle specific workloads. Ray Data in particular excels at heterogeneous CPU+GPU pipelines for multimodal data.

Deployment options range from a lightweight cluster launcher to KubeRay (a Kubernetes operator) to the managed Anyscale platform. The episode also covers observability via the Ray Dashboard, a VS Code remote debugger, fast iteration via runtime environments, and how Ray compares to Dask, Spark, multiprocessing, and asyncio.