Best of Cloud · October 2025

  1. Video
    Fireship · 31w

    US-EAST-1 is humanity’s weakest link…

    A major AWS outage in the US-EAST-1 region caused widespread service disruptions across thousands of companies including Netflix, Reddit, and PlayStation. The root cause was a DNS resolution failure affecting API endpoints, particularly DynamoDB, which cascaded into serverless job queues. The incident highlights the risks of centralized cloud infrastructure dependency and the challenges of single-provider reliance even with availability zones designed for redundancy.

  2. Article
    InfoQ · 30w

    Cloudflare Introduces Email Service to Compete with Amazon SES, Resend, and SendGrid

    Cloudflare announced a private preview of its Email Service during Birthday Week, enabling developers to send and receive emails directly from Workers without API keys. The globally managed service automatically configures SPF, DKIM, and DMARC for improved deliverability, supports both REST APIs and SMTP, and integrates with Workers AI for routing and parsing incoming emails. Unlike regional services like Amazon SES, Cloudflare offers a single global endpoint. The beta launches in November with message-based pricing and requires a paid Workers subscription.

  3. Article
    80 LEVEL · 31w

    Amazon Allegedly Replaced 40% of AWS DevOps With AI Days Before Crash

    AWS experienced a major outage affecting platforms like Snapchat, Roblox, and Fortnite. An unverified report claims Amazon laid off 40% of its DevOps team days before the crash, replacing them with AI systems that handle IAM permissions, VPC configs, and Lambda deployments. While the connection between layoffs and the outage remains speculative, the incident highlights concerns about cloud service provider concentration and automation risks.

  4. Article
    Jakarta EE · 30w

    The Dark Side of IT: How US-EAST-1 Took Europe Offline and Why GDPR is in the Crosshairs

    An AWS US-EAST-1 outage in October 2025 took down European digital services despite companies believing their infrastructure was EU-only. The incident exposed hidden architectural dependencies where critical services like IAM, authentication, and control planes route through Virginia data centers. European banks, healthcare providers, and government agencies experienced severe disruptions. The analysis examines GDPR compliance failures, Schrems II implications, and how cross-border data flows occur without user notification. CIOs are advised to map control-plane dependencies, review AWS contracts for regional sovereignty gaps, and prepare for regulatory scrutiny as European data protection authorities investigate cloud provider compliance.

  5. Article
    Groww Engineering · 32w

    A Framework for Cloud Cost Optimization: How We Saved 40% of our Cloud cost

    Groww Engineering reduced their cloud costs by 40% over three months through a systematic framework combining visibility, ownership, and architectural changes. They built an internal FinOps dashboard for granular cost tracking, standardized resource labeling across teams, shifted from fixed to elastic infrastructure, deprecated legacy services, migrated analytics to an in-house query engine, and established continuous optimization practices with team-level budgets and regular audits.

  6. Article
    Hacker News · 31w

    Today is when Amazon brain drain finally caught up with AWS

    A major AWS outage in the US-EAST-1 region on October 20, 2025, caused by DNS resolution issues with DynamoDB endpoints, took 75 minutes to diagnose and affected much of the internet. The incident highlights concerns about AWS's loss of institutional knowledge due to significant employee departures, layoffs (27,000+ since 2022), and high regretted attrition rates (69-81%). Senior engineers who understood deep system failure modes have left, potentially leaving newer teams without the tribal knowledge needed to quickly detect and resolve complex infrastructure issues. The outage suggests that cost-cutting measures and talent drain may be compromising AWS's operational resilience.

  7. Article
    Where's Your Ed At · 31w

    This Is How Much Anthropic and Cursor Spend On Amazon Web Services

    Anthropic spent $2.66 billion on AWS through September 2025, exceeding its estimated $2.55 billion revenue for the same period. The company's AWS costs increased 174% from January to September 2025, consuming 88-227% of monthly revenue depending on the period. Cursor, Anthropic's largest customer, saw its AWS bills double from $6.2M to $12.6M in June 2025 after Anthropic introduced Priority Service Tiers that significantly increased costs for prompt caching. The analysis reveals that AI model providers' operational costs scale linearly with revenue, suggesting current pricing models are unsustainable without dramatic price increases that could drive away customers.

  8. Article
    InfoWorld · 31w

    AWS DNS error hits DynamoDB, causing problems for multiple services and customers

    A DNS resolution error in AWS's US-EAST-1 region caused widespread DynamoDB API failures, affecting multiple AWS services and customers including Perplexity, Canva, Venmo, and others. The incident began shortly after midnight Pacific Time and was resolved within three hours through initial mitigations. The outage highlighted how single points of failure in cloud infrastructure can have global consequences, even when the root cause is isolated to one region.

  9. Article
    The Last Week in AWS · 32w

    AWS Deprecates Two Dozen Services (Most of Which You’ve Never Heard Of)

    AWS has deprecated approximately two dozen services in its quarterly cleanup: 19 are entering maintenance mode, four are being sunset, and one is reaching end of support. Notable deprecations include the Glacier APIs, S3 Object Lambda, CodeCatalyst, and Snowball Edge. Most of the deprecated services were commercial failures that never gained significant traction. Removing the Glacier APIs is largely inconsequential because Glacier lives on as an S3 storage class. CodeCatalyst failed to gain momentum after launch. Snowball Edge customers can continue using existing deployments but shouldn't plan new architectures around it. Many modernization tools are being consolidated into AWS Transform, while Systems Manager components are being wound down in favor of third-party alternatives.

  10. Article
    The Verge · 30w

    ‘There isn’t really another choice:’ Signal chief explains why the encrypted messenger relies on AWS

    Signal president Meredith Whittaker defends the encrypted messenger's reliance on AWS following a major outage, explaining that AWS, Microsoft Azure, and Google Cloud are the only viable options for providing global-scale, low-latency communication services. She emphasizes that the real issue isn't Signal's choice but the concentration of power among three or four cloud infrastructure providers, which makes it practically impossible for services to avoid depending on these hyperscalers without spending billions to build their own infrastructure.

  11. Article
    Hacker News · 33w

    Battering RAM

    Researchers demonstrate a $50 hardware interposer that bypasses memory encryption on Intel SGX and AMD SEV-SNP cloud processors. The device sits between processor and DDR4 memory, passing boot-time security checks before activating to redirect encrypted memory addresses. This enables plaintext access to protected workloads and breaks attestation on fully patched systems. The attack exposes fundamental limitations in current scalable memory encryption designs, which lack cryptographic freshness guarantees. Open-source schematics are available, and both Intel and AMD have acknowledged the findings but consider physical DRAM attacks out of scope for current products.

  12. Article
    InfoWorld · 31w

    Anthropic extends Claude Code to browsers

    Anthropic launched Claude Code on the web as a beta research preview for Pro and Max users, allowing developers to use the AI coding assistant directly from browsers or smartphones without a terminal. The service runs coding tasks on Anthropic-managed cloud infrastructure in isolated sandboxes, supports parallel task execution across repositories, handles Git interactions via secure proxy, and automatically creates pull requests. It's particularly effective for repository mapping, routine tasks, bug fixes, and back-end changes with test-driven development.

  13. Article
    Tech World With Milan · 29w

    How Google, Amazon, and CrowdStrike broke millions of systems

    A deep dive into three major outages: AWS's 2025 DNS race condition that cascaded through 113 services for 15 hours, Google Cloud's 2025 null pointer exception in Service Control that crashed 50+ services globally for 7 hours, and CrowdStrike's July 2024 kernel driver bug that locked 8.5 million Windows machines in boot loops. Each incident reveals critical lessons about race conditions, dependency chains, deployment strategies, and the fragility of centralized control planes at hyperscale. The article includes technical root cause analysis, cascading failure patterns, and actionable takeaways for building resilient distributed systems.

  14. Article
    Deleted user · 30w

    Knock knock

  15. Article
    ArangoDB · 32w

    ArangoDB: Multi-Model Database for Your Modern Apps

    ArangoDB is a multi-model database that combines graph, document, key-value, and search capabilities in a single system with a unified query language (AQL). It offers flexible deployment options including fully-managed cloud service (ArangoGraph), on-premises, and Kubernetes support across major cloud platforms. The database provides native client libraries for multiple programming languages and emphasizes database consolidation by replacing multiple specialized databases with one unified solution.

  16. Article
    Where's Your Ed At · 29w

    Big Tech Needs $2 Trillion In AI Revenue By 2030 or They Wasted Their Capex

    Major tech companies spent over $776 billion on AI infrastructure between 2023 and 2025, yet none are showing meaningful revenue from AI services. Microsoft reports only $13 billion in annual recurring revenue from AI, with much of Azure's AI revenue coming from OpenAI's discounted compute costs. The analysis suggests these companies need to generate $2 trillion in AI revenue by 2030 to justify their capital expenditures, while currently every AI service provider except GPU manufacturers is losing money. The high costs of GPUs and data centers, together with rapid hardware depreciation, compound the challenge of achieving profitability.

  17. Article
    Hacker News · 29w

    $5 PlanetScale

    PlanetScale announces a new $5/month single-node tier for its Postgres database service, making the service accessible to developers from day one. The PS-5 node type offers a non-HA configuration suitable for development, testing, and non-critical workloads, while retaining the ability to scale vertically. It complements the existing $30/month three-node HA setup, allowing teams to start small and scale to production without platform migrations.

  18. Video
    Tech With Lucy · 32w

    Cloud Layoffs explained…

    Tech companies are reducing traditional engineering roles while investing heavily in AI infrastructure, with predictions that 40% of companies will see AI-related workforce reductions by 2030. Major tech leaders like Microsoft and Meta report significant portions of their code now being written by AI, creating what's called the AI paradox where the industry grows while traditional roles shrink. The cloud industry is transforming rather than dying, requiring engineers to adapt their skills to remain relevant.

  19. Article
    Software Testing Magazine · 30w

    Free Load Testing Tools & Services

    A comprehensive comparison of free load testing tools and services available in 2025. It covers 10 platforms, including BlazeMeter, Grafana Cloud k6, and Loader.io, detailing their limits on concurrent users, test duration, and features. The free tiers cap virtual users at between 10 and 50 and test durations at between 30 seconds and 20 minutes, with capabilities varying by service. Most tools are JMeter-compatible and cloud-based, designed for testing web applications, APIs, and mobile apps under simulated user load from multiple geographic locations.