Best of Cloud — October 2025

1
Video
Fireship·31w
US-EAST-1 is humanity’s weakest link…
A major AWS outage in the US-EAST-1 region caused widespread service disruptions across thousands of companies including Netflix, Reddit, and PlayStation. The root cause was a DNS resolution failure affecting API endpoints, particularly DynamoDB, which cascaded into serverless job queues. The incident highlights the risks of centralized cloud infrastructure dependency and the challenges of single-provider reliance even with availability zones designed for redundancy.
120
18
2
Article
InfoQ·30w
Cloudflare Introduces Email Service to Compete with Amazon SES, Resend, and SendGrid
Cloudflare announced a private preview of its Email Service during Birthday Week, enabling developers to send and receive emails directly from Workers without API keys. The globally managed service automatically configures SPF, DKIM, and DMARC for improved deliverability, supports both REST APIs and SMTP, and integrates with Workers AI for routing and parsing incoming emails. Unlike regional services like Amazon SES, Cloudflare offers a single global endpoint. The beta launches in November with message-based pricing and requires a paid Workers subscription.
74
9
3
Article
80 LEVEL·31w
Amazon Allegedly Replaced 40% of AWS DevOps With AI Days Before Crash
AWS experienced a major outage affecting platforms like Snapchat, Roblox, and Fortnite. An unverified report claims Amazon laid off 40% of its DevOps team days before the crash, replacing them with AI systems that handle IAM permissions, VPC configs, and Lambda deployments. While the connection between layoffs and the outage remains speculative, the incident highlights concerns about cloud service provider concentration and automation risks.
64
11
4
Article
Jakarta EE·30w
The Dark Side of IT: How US-EAST-1 Took Europe Offline and Why GDPR is in the Crosshairs
An AWS US-EAST-1 outage in October 2025 took down European digital services despite companies believing their infrastructure was EU-only. The incident exposed hidden architectural dependencies where critical services like IAM, authentication, and control planes route through Virginia data centers. European banks, healthcare providers, and government agencies experienced severe disruptions. The analysis examines GDPR compliance failures, Schrems II implications, and how cross-border data flows occur without user notification. CIOs are advised to map control-plane dependencies, review AWS contracts for regional sovereignty gaps, and prepare for regulatory scrutiny as European data protection authorities investigate cloud provider compliance.
62
7
5
Article
Groww Engineering·32w
A Framework for Cloud Cost Optimization: How We Saved 40% of our Cloud cost
Groww Engineering reduced their cloud costs by 40% over three months through a systematic framework combining visibility, ownership, and architectural changes. They built an internal FinOps dashboard for granular cost tracking, standardized resource labeling across teams, shifted from fixed to elastic infrastructure, deprecated legacy services, migrated analytics to an in-house query engine, and established continuous optimization practices with team-level budgets and regular audits.
54
6
Article
Hacker News·31w
Today is when Amazon brain drain finally caught up with AWS
A major AWS outage in the US-EAST-1 region on October 20, 2025, caused by DNS resolution issues with DynamoDB endpoints, took 75 minutes to diagnose and affected much of the internet. The incident highlights concerns about AWS's loss of institutional knowledge due to significant employee departures, layoffs (27,000+ since 2022), and high regretted attrition rates (69-81%). Senior engineers who understood deep system failure modes have left, potentially leaving newer teams without the tribal knowledge needed to quickly detect and resolve complex infrastructure issues. The outage suggests that cost-cutting measures and talent drain may be compromising AWS's operational resilience.
43
6
7
Article
Where's Your Ed At·31w
This Is How Much Anthropic and Cursor Spend On Amazon Web Services
Anthropic spent $2.66 billion on AWS through September 2025, exceeding its estimated $2.55 billion in revenue for the same period. The company's AWS costs increased 174% from January to September 2025, consuming 88-227% of monthly revenue depending on the period. Cursor, Anthropic's largest customer, saw its AWS bill roughly double, from $6.2M to $12.6M, in June 2025 after Anthropic introduced Priority Service Tiers that significantly increased costs for prompt caching. The analysis argues that AI model providers' operational costs scale linearly with revenue, suggesting current pricing models are unsustainable without dramatic price increases that could drive away customers.
36
5
8
Article
InfoWorld·31w
AWS DNS error hits DynamoDB, causing problems for multiple services and customers
A DNS resolution error in AWS's US-EAST-1 region caused widespread DynamoDB API failures, affecting multiple AWS services and customers including Perplexity, Canva, Venmo, and others. The incident began shortly after midnight Pacific Time and was resolved within three hours through initial mitigations. The outage highlighted how single points of failure in cloud infrastructure can have global consequences, even when the root cause is isolated to one region.
36
2
9
Article
The Last Week in AWS·32w
AWS Deprecates Two Dozen Services (Most of Which You’ve Never Heard Of)
AWS has deprecated approximately two dozen services in its quarterly cleanup: 19 are entering maintenance mode, four are being sunset, and one is reaching end of support. Notable deprecations include the Glacier APIs, S3 Object Lambda, CodeCatalyst, and Snowball Edge. Most of the deprecated services were commercial failures that never gained significant traction. Removing the Glacier APIs is largely inconsequential because Glacier lives on as an S3 storage class. CodeCatalyst failed to gain momentum after launch. Snowball Edge customers can keep using existing deployments but should not plan new architectures around it. Many modernization tools are being consolidated into AWS Transform, while Systems Manager components are being wound down in favor of third-party alternatives.
32
4
10
Article
The Verge·30w
‘There isn’t really another choice:’ Signal chief explains why the encrypted messenger relies on AWS
Signal president Meredith Whittaker defends the encrypted messenger's reliance on AWS following a major outage, explaining that AWS, Microsoft Azure, and Google Cloud are the only viable options for providing global-scale, low-latency communication services. She emphasizes that the real issue is not Signal's choice but the concentration of power among three or four cloud infrastructure providers, which makes it practically impossible for services to avoid depending on these hyperscalers without spending billions to build their own infrastructure.
24
3
11
Article
Hacker News·33w
Battering RAM
Researchers demonstrate a $50 hardware interposer that bypasses memory encryption on Intel SGX and AMD SEV-SNP cloud processors. The device sits between processor and DDR4 memory, passing boot-time security checks before activating to redirect encrypted memory addresses. This enables plaintext access to protected workloads and breaks attestation on fully patched systems. The attack exposes fundamental limitations in current scalable memory encryption designs, which lack cryptographic freshness guarantees. Open-source schematics are available, and both Intel and AMD have acknowledged the findings but consider physical DRAM attacks out of scope for current products.
20
3
12
Article
InfoWorld·31w
Anthropic extends Claude Code to browsers
Anthropic launched Claude Code on the web as a beta research preview for Pro and Max users, allowing developers to use the AI coding assistant directly from browsers or smartphones without a terminal. The service runs coding tasks on Anthropic-managed cloud infrastructure in isolated sandboxes, supports parallel task execution across repositories, handles Git interactions via secure proxy, and automatically creates pull requests. It's particularly effective for repository mapping, routine tasks, bug fixes, and back-end changes with test-driven development.
19
13
Article
Tech World With Milan·29w
How Google, Amazon, and CrowdStrike broke millions of systems
Deep dive into three major outages from 2024 and 2025: AWS's DNS race condition that cascaded through 113 services for 15 hours, Google Cloud's null pointer exception in Service Control that crashed 50+ services globally for 7 hours, and CrowdStrike's kernel driver bug that locked 8.5 million Windows machines in boot loops. Each incident reveals critical lessons about race conditions, dependency chains, deployment strategies, and the fragility of centralized control planes at hyperscale. Includes technical root cause analysis, cascading failure patterns, and actionable takeaways for building resilient distributed systems.
18
14
Article
Deleted user·30w
Knock knock
17
8
15
Article
ArangoDB·32w
ArangoDB: Multi-Model Database for Your Modern Apps
ArangoDB is a multi-model database that combines graph, document, key-value, and search capabilities in a single system with a unified query language (AQL). It offers flexible deployment options, including a fully managed cloud service (ArangoGraph), on-premises deployment, and Kubernetes support across major cloud platforms. The database provides native client libraries for multiple programming languages and emphasizes database consolidation: replacing multiple specialized databases with one unified solution.
16
16
Article
Where's Your Ed At·29w
Big Tech Needs $2 Trillion In AI Revenue By 2030 or They Wasted Their Capex
Major tech companies have spent over $776 billion on AI infrastructure between 2023 and 2025, yet none are showing meaningful revenue from AI services. Microsoft reports only $13 billion in annual recurring revenue from AI, with much of Azure's AI revenue coming from OpenAI's discounted compute costs. Analysis suggests these companies need to generate $2 trillion in AI revenue by 2030 to justify their capital expenditures, while currently every AI service provider except GPU manufacturers is losing money. The high costs of GPUs, data centers, and rapid hardware depreciation compound the challenge of achieving profitability.
12
1
17
Article
Hacker News·29w
$5 PlanetScale
PlanetScale announces a new $5/month single-node tier for its Postgres database service, making it more accessible to developers from day one. The PS-5 node type offers a non-HA configuration suitable for development, testing, and non-critical workloads, while retaining the ability to scale vertically. This complements the existing $30/month three-node HA setup, allowing teams to start small and scale to production without platform migrations.
12
18
Video
Tech With Lucy·32w
Cloud Layoffs explained…
Tech companies are reducing traditional engineering roles while investing heavily in AI infrastructure, with predictions that 40% of companies will see AI-related workforce reductions by 2030. Major tech leaders like Microsoft and Meta report significant portions of their code now being written by AI, creating what's called the AI paradox where the industry grows while traditional roles shrink. The cloud industry is transforming rather than dying, requiring engineers to adapt their skills to remain relevant.
12
1
19
Article
Software Testing Magazine·30w
Free Load Testing Tools & Services
A comprehensive comparison of free load testing tools and services available in 2025. Covers 10 platforms including BlazeMeter, Grafana Cloud k6, Loader.io, and others, detailing their limitations on concurrent users, test duration, and features. The free tiers differ in their limits on virtual users (ranging from 10 to 50), test duration (30 seconds to 20 minutes), and capabilities. Most tools are JMeter-compatible and cloud-based, designed for testing web applications, APIs, and mobile apps under simulated user load from multiple geographic locations.
10
