AWS Parallel Computing Service now supports Slurm 25.11

AWS Parallel Computing Service (AWS PCS) now supports Slurm 25.11, bringing several new capabilities to HPC workloads on AWS. Key additions include expedited re-queue, which automatically reschedules jobs when nodes fail; a Prometheus-compatible OpenMetrics endpoint for real-time monitoring of jobs and nodes; and expanded logging options. Slurm daemon logs (slurmdbd and slurmrestd) can now be sent to Amazon CloudWatch Logs, Amazon S3, or Amazon Data Firehose. Scheduler audit logs are now a dedicated log type, so their ingestion and storage costs can be controlled independently of other logs. These features are available in all AWS Regions where AWS PCS is supported.
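As a rough illustration of what consuming a Prometheus-compatible OpenMetrics endpoint might look like, the sketch below parses a small block of metrics text using only the Python standard library. The announcement does not give the endpoint URL or the metric names, so the sample payload and every metric name in it (`example_jobs_pending`, `example_nodes`) are hypothetical placeholders, and the parser handles only simple `name{labels} value` lines rather than the full OpenMetrics grammar.

```python
import re

# Hypothetical payload: these metric names are placeholders for illustration,
# not names documented by Slurm or AWS PCS.
SAMPLE = """\
# HELP example_jobs_pending Number of pending jobs (hypothetical metric).
# TYPE example_jobs_pending gauge
example_jobs_pending 12
# TYPE example_nodes gauge
example_nodes{state="idle"} 4
example_nodes{state="down"} 1
# EOF
"""

# Matches: metric name, optional {label="value",...} block, then the value.
LINE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# Matches one label="value" pair inside the braces.
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from metrics text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment/metadata lines (# HELP, # TYPE, # EOF).
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # Ignore lines this simplified parser does not understand.
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def total_for(samples, name):
    """Sum the values of every sample with the given metric name."""
    return sum(value for sample_name, _, value in samples if sample_name == name)


if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```

In practice the text would come from an HTTP request to the cluster's metrics endpoint, and a Prometheus server would normally scrape it directly; a hand-rolled parser like this is mainly useful for quick checks or scripts.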