Best of AWS: October 2025

  1.
    Article
    All Things Distributed · 31w

    Development gets better with Age

    Experience in software development provides invaluable perspective when evaluating new technologies like generative AI. Seasoned developers recognize patterns that recur across decades, from programming languages to platforms, and apply that experience to cut through hype. Rather than rushing to adopt AI out of FOMO, experienced builders focus on understanding customer problems first, then select appropriate solutions. The key lessons: maintain healthy skepticism, prioritize fundamentals like security and privacy, and remember that new technologies often follow familiar patterns from the past.

  2.
    Video
    CodeHead · 30w

    Should YOU Become A Devops Engineer

    DevOps engineering combines development and operations to automate software delivery through CI/CD pipelines, containerization, and infrastructure management. The role requires skills in Linux, scripting, cloud platforms, and tools like Docker, Kubernetes, and Terraform. DevOps engineers earn competitive salaries (up to $190,000+) due to specialized expertise, but the work focuses on infrastructure, automation, and system stability rather than product development. Success requires enjoying problem-solving, automation, and behind-the-scenes technical work.

  3.
    Video
    Fireship · 29w

    US-EAST-1 is humanity’s weakest link…

    A major AWS outage in the US-EAST-1 region caused widespread service disruptions across thousands of companies including Netflix, Reddit, and PlayStation. The root cause was a DNS resolution failure affecting API endpoints, particularly DynamoDB, which cascaded into serverless job queues. The incident highlights the risks of centralized cloud infrastructure dependency and the challenges of single-provider reliance even with availability zones designed for redundancy.

  4.
    Article
    InfoQ · 28w

    Cloudflare Introduces Email Service to Compete with Amazon SES, Resend, and SendGrid

    Cloudflare announced a private preview of its Email Service during Birthday Week, enabling developers to send and receive emails directly from Workers without API keys. The globally managed service automatically configures SPF, DKIM, and DMARC for improved deliverability, supports both REST APIs and SMTP, and integrates with Workers AI for routing and parsing incoming emails. Unlike regional services like Amazon SES, Cloudflare offers a single global endpoint. The beta launches in November with message-based pricing and requires a paid Workers subscription.

  5.
    Article
    ByteByteGo · 31w

    How Airbnb Runs Distributed Databases on Kubernetes at Scale

    Airbnb deployed distributed SQL databases across multiple Kubernetes clusters, each mapped to a different AWS Availability Zone, to achieve high availability and fault tolerance. They built custom Kubernetes operators to safely manage stateful workloads, coordinate node replacements, and maintain quorum during failures. Using AWS EBS for persistent storage, PVCs for volume management, and techniques like replica reads and stale reads, they mitigated latency issues while maintaining consistency. Their largest production cluster handles 3 million queries per second across 150 nodes with 300TB of data, achieving 99.95% availability through careful sequencing of upgrades, canary deployments, and overprovisioning for resilience.
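As a rough sanity check on the figures in this summary, the Python sketch below derives per-node averages and the downtime budget implied by 99.95% availability. It assumes load and data are spread evenly across the 150 nodes and that availability is measured over a 365-day year; the summary states neither, and the function names are illustrative only.

```python
def per_node(total: float, nodes: int) -> float:
    """Average share per node, assuming a perfectly even spread (an assumption)."""
    return total / nodes


def downtime_budget_hours(availability_pct: float, days: float = 365.0) -> float:
    """Hours of permitted downtime over `days` at the given availability."""
    return (1 - availability_pct / 100) * days * 24


qps_per_node = per_node(3_000_000, 150)   # queries per second per node
tb_per_node = per_node(300, 150)          # terabytes of data per node
budget = downtime_budget_hours(99.95)     # hours per year

print(f"{qps_per_node:,.0f} QPS/node, {tb_per_node:.0f} TB/node, "
      f"{budget:.2f} h/year downtime budget")
```

Under those assumptions, each node averages about 20,000 queries per second and 2 TB of data, and the availability target leaves a bit over four hours of downtime per year.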

  6.
    Article
    80 LEVEL · 29w

    Amazon Allegedly Replaced 40% of AWS DevOps With AI Days Before Crash

    AWS experienced a major outage affecting platforms like Snapchat, Roblox, and Fortnite. An unverified report claims Amazon laid off 40% of its DevOps team days before the crash, replacing them with AI systems that handle IAM permissions, VPC configs, and Lambda deployments. While the connection between layoffs and the outage remains speculative, the incident highlights concerns about cloud service provider concentration and automation risks.

  7.
    Article
    Jakarta EE · 28w

    The Dark Side of IT: How US-EAST-1 Took Europe Offline and Why GDPR is in the Crosshairs

    An AWS US-EAST-1 outage in October 2025 took down European digital services despite companies believing their infrastructure was EU-only. The incident exposed hidden architectural dependencies where critical services like IAM, authentication, and control planes route through Virginia data centers. European banks, healthcare providers, and government agencies experienced severe disruptions. The analysis examines GDPR compliance failures, Schrems II implications, and how cross-border data flows occur without user notification. CIOs are advised to map control-plane dependencies, review AWS contracts for regional sovereignty gaps, and prepare for regulatory scrutiny as European data protection authorities investigate cloud provider compliance.

  8.
    Article
    All Things Distributed · 28w

    What is USSD (and who cares)?

    USSD, a 30-year-old messaging protocol requiring only 2G connectivity, powers hundreds of billions in financial transactions across Sub-Saharan Africa through companies like M-Pesa and Moniepoint. Behind simple menu-driven interfaces on feature phones, these platforms run sophisticated cloud architectures with ML-powered fraud detection and IoT systems. The technology demonstrates how builders solve real customer problems by choosing suitable tools over shiny ones, creating profitable businesses while serving communities with limited internet access and smartphone penetration.

  9.
    Article
    ByteByteGo · 28w

    How Nubank Built an In-house Logging Platform for 1 Trillion Log Entries

    Nubank built an in-house logging platform to replace a costly third-party vendor, handling 1 trillion daily log entries at 50% lower cost. The solution uses a two-phase architecture: an ingestion pipeline with Fluent Bit, custom buffering, and processing services, plus a query/storage layer combining Trino, AWS S3, and Parquet format. The platform processes 1 petabyte daily, maintains 45 petabytes of searchable data with 45-day retention, and serves 15,000 queries daily scanning 150 petabytes. Key design decisions included decoupling ingestion from querying, implementing micro-batching for reliability, and achieving 95% data compression with Parquet.
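The volume figures in this summary can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the numbers quoted above. It assumes decimal units (1 PB = 1000 TB) and reads "95% data compression" as a 95% reduction in size; both are interpretations, not statements from the article. The summary also does not say whether the 1 PB per day is measured before or after compression, so the sketch does not try to resolve that.

```python
PB_INGESTED_PER_DAY = 1      # from the summary
RETENTION_DAYS = 45          # from the summary
QUERIES_PER_DAY = 15_000     # from the summary
PB_SCANNED_PER_DAY = 150     # from the summary
SIZE_REDUCTION = 0.95        # assumed meaning of "95% data compression"

# Steady-state searchable data: daily volume times the retention window.
retained_pb = PB_INGESTED_PER_DAY * RETENTION_DAYS

# Average data scanned per query, in TB (assuming 1 PB = 1000 TB).
tb_scanned_per_query = PB_SCANNED_PER_DAY * 1000 / QUERIES_PER_DAY

# A 95% size reduction corresponds to roughly a 20:1 compression ratio.
compression_ratio = 1 / (1 - SIZE_REDUCTION)

print(f"retained: {retained_pb} PB")
print(f"scanned per query: {tb_scanned_per_query:.0f} TB")
print(f"compression ratio: about {compression_ratio:.0f}:1")
```

The retained volume works out to 45 PB, matching the summary, and the average query scans on the order of 10 TB.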

  10.
    Article
    Hacker News · 29w

    Today is when Amazon brain drain finally caught up with AWS

    A major AWS outage in the US-EAST-1 region on October 20, 2025, caused by DNS resolution issues with DynamoDB endpoints, took 75 minutes to diagnose and affected much of the internet. The incident highlights concerns about AWS's loss of institutional knowledge due to significant employee departures, layoffs (27,000+ since 2022), and high regretted attrition rates (69-81%). Senior engineers who understood deep system failure modes have left, potentially leaving newer teams without the tribal knowledge needed to quickly detect and resolve complex infrastructure issues. The outage suggests that cost-cutting measures and talent drain may be compromising AWS's operational resilience.

  11.
    Article
    Where's Your Ed At · 29w

    This Is How Much Anthropic and Cursor Spend On Amazon Web Services

    Anthropic spent $2.66 billion on AWS through September 2025, exceeding its estimated $2.55 billion revenue for the same period. The company's AWS costs increased 174% from January to September 2025, consuming 88-227% of monthly revenue depending on the period. Cursor, Anthropic's largest customer, saw its AWS bills double from $6.2M to $12.6M in June 2025 after Anthropic introduced Priority Service Tiers that significantly increased costs for prompt caching. The analysis reveals that AI model providers' operational costs scale linearly with revenue, suggesting current pricing models are unsustainable without dramatic price increases that could drive away customers.

  12.
    Article
    InfoWorld · 29w

    AWS DNS error hits DynamoDB, causing problems for multiple services and customers

    A DNS resolution error in AWS's US-EAST-1 region caused widespread DynamoDB API failures, affecting multiple AWS services and customers including Perplexity, Canva, Venmo, and others. The incident began shortly after midnight Pacific Time and was resolved within three hours through initial mitigations. The outage highlighted how single points of failure in cloud infrastructure can have global consequences, even when the root cause is isolated to one region.

  13.
    Article
    The Last Week in AWS · 30w

    AWS Deprecates Two Dozen Services (Most of Which You’ve Never Heard Of)

    AWS has deprecated approximately two dozen services in its quarterly cleanup, including 19 services entering maintenance mode, four being sunset, and one reaching end of support. Notable deprecations include Glacier APIs (the S3 storage class remains), S3 Object Lambda, CodeCatalyst, and Snowball Edge. Most deprecated services were commercial failures that never gained significant traction. Glacier's API removal is largely inconsequential since it's now an S3 storage class. CodeCatalyst failed to gain momentum after launch. Snowball Edge customers can continue using existing deployments but shouldn't plan new architectures around it. Many modernization tools are being consolidated into AWS Transform, while Systems Manager components are being wound down in favor of third-party alternatives.

  14.
    Article
    Kiro · 29w

    The wait(list) is over, get started with Kiro today

    Kiro, an AI-powered IDE from AWS, has removed its waitlist and is now publicly available. New users receive 500 free bonus credits (50% of the Pro plan) valid for 30 days. Version 0.4.0 introduces spec-driven development features including optional MVP tasks, per-prompt credit consumption visibility, dev server integration, and the ability to reference specs as context. The IDE uses a unified credit system that meters usage in 0.01 increments, with different models consuming credits at varying rates.

  15.
    Article
    InfoQ · 31w

    AWS Introduces M4 and M4 Pro Mac Instances for Faster Apple App Development

    AWS launched M4 and M4 Pro Mac instances powered by Apple's latest M4 silicon, offering up to 20% better build performance compared to M2 instances. The M4 variant features a 10-core CPU with 24 GB unified memory, while the M4 Pro includes a 14-core CPU with 48 GB memory. Both provide 2 TB local storage and are designed for building, testing, and signing iOS and macOS applications with Xcode. The instances are available as dedicated hosts with per-second billing but require a 24-hour minimum allocation period. Currently available only in US regions (Northern Virginia and Ohio), they support macOS Sequoia 15.6 and newer, though they come at a higher price point than previous generations.

  16.
    Article
    The Verge · 28w

    ‘There isn’t really another choice:’ Signal chief explains why the encrypted messenger relies on AWS

    Signal president Meredith Whittaker defends the encrypted messenger's reliance on AWS following a major outage, explaining that AWS, Microsoft Azure, and Google Cloud are the only viable options for providing global-scale, low-latency communication services. She emphasizes that the real issue isn't Signal's choice, but the concentration of power among 3-4 cloud infrastructure providers, making it practically impossible for services to avoid dependency on these hyperscalers without spending billions to build their own infrastructure.

  17.
    Article
    freeCodeCamp · 29w

    How to Build a Full-Stack Serverless CRUD App using AWS and React

    A comprehensive guide to building a serverless coffee shop management system using AWS services. Covers setting up DynamoDB for data storage, creating Lambda functions with reusable layers, exposing APIs through API Gateway, implementing authentication with Cognito, and deploying a React frontend via S3 and CloudFront. Includes detailed steps for configuring IAM roles, handling CORS, creating CRUD operations, and troubleshooting common deployment issues.

  18.
    Article
    Tech World With Milan · 27w

    How Google, Amazon, and CrowdStrike broke millions of systems

    Deep dive into three major 2025 cloud outages: AWS's DNS race condition that cascaded through 113 services for 15 hours, Google Cloud's null pointer exception in Service Control that crashed 50+ services globally for 7 hours, and CrowdStrike's kernel driver bug that locked 8.5 million Windows machines in boot loops. Each incident reveals critical lessons about race conditions, dependency chains, deployment strategies, and the fragility of centralized control planes at hyperscale. Includes technical root cause analysis, cascading failure patterns, and actionable takeaways for building resilient distributed systems.

  19.
    Article
    Deleted user · 27w

    Knock knock

  20.
    Article
    Stack Overflow Blog · 27w

    Vibe coding needs a spec, too

    Spec-driven development is emerging as a structured approach to AI-assisted coding, where developers write specifications that AI agents translate into code. AWS's Kiro IDE implements this methodology by breaking development into three phases: requirements, design, and tasks. Senior engineers are adopting this approach faster because it mirrors their natural problem-solving process of whiteboarding and documenting before coding. The shift emphasizes systems thinking and critical problem decomposition over raw coding skills, as AI handles implementation while developers focus on architecture and context. Kiro enforces test-driven development and uses techniques like neuro-symbolic AI for spec validation, addressing challenges around context management and hallucination reduction.

  21.
    Article
    ClickHouse · 30w

    Inside Laravel Nightwatch’s Observability Pipeline: Real-Time Event Processing with Amazon MSK and ClickHouse Cloud

    Laravel Nightwatch processes over 1 billion observability events daily using Amazon MSK and ClickHouse Cloud. The platform combines MSK Express brokers for event ingestion, ClickPipes for streaming data to ClickHouse, and AWS Lambda for validation. This architecture achieves sub-second query latency while handling millions of events per second. On launch day, the system processed 500 million events with 97ms average dashboard latency for 5,300 users. The dual-database design separates transactional workloads on RDS PostgreSQL from analytical workloads on ClickHouse Cloud, enabling horizontal scaling and cost-effective real-time monitoring at global scale.
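To put the daily event counts in this summary on a per-second scale, the Python sketch below converts them to average rates. It assumes events arrive evenly over a 24-hour day, which real traffic will not do, and the helper name is illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def avg_events_per_second(events_per_day: int) -> float:
    """Average rate, assuming events are spread evenly across the day."""
    return events_per_day / SECONDS_PER_DAY


launch_day_rate = avg_events_per_second(500_000_000)    # launch-day volume
steady_rate = avg_events_per_second(1_000_000_000)      # "over 1 billion" daily

print(f"launch day: about {launch_day_rate:,.0f} events/s on average")
print(f"steady state: about {steady_rate:,.0f} events/s on average")
```

Both averages come out in the thousands of events per second, so the "millions of events per second" figure presumably describes peak or burst capacity rather than sustained load.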