Best of AWS: November 2025

  1. Article
    ByteByteGo · 24w

    How Disney Hotstar (now JioHotstar) Scaled Its Infra for 60 Million Concurrent Users

Disney+ Hotstar scaled from 25 million to 61 million concurrent users during the 2023 Cricket World Cup through a comprehensive infrastructure overhaul. Key improvements included separating cacheable from non-cacheable APIs at the CDN layer, migrating from self-managed kOps clusters to Amazon EKS, implementing distributed NAT gateways per subnet, and introducing a Datacenter Abstraction model. This abstraction unified multiple Kubernetes clusters into logical data centers with a centralized Envoy-based API gateway, replacing 200+ individual load balancers. The team also eliminated NodePort limitations by switching to ClusterIP services, standardized service endpoints, and adopted single-manifest deployments. The final architecture distributed 200+ microservices across six optimized EKS clusters, each designed for specific workload types.
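
The cacheable/non-cacheable split described above boils down to a routing rule at the edge. A minimal sketch, assuming invented path prefixes and TTLs (the article does not show Hotstar's actual CDN configuration):

```python
# Hypothetical illustration of splitting cacheable from non-cacheable
# APIs: classify each request path so the CDN can cache aggressively
# where it is safe and bypass the cache everywhere else.
CACHEABLE_PREFIXES = {
    "/api/match/scorecard": 5,     # seconds of CDN TTL (invented values)
    "/api/content/catalog": 300,
}

def cdn_policy(path: str) -> dict:
    """Return a Cache-Control policy for a request path."""
    for prefix, ttl in CACHEABLE_PREFIXES.items():
        if path.startswith(prefix):
            return {"cacheable": True, "cache_control": f"public, max-age={ttl}"}
    # Personalised or stateful endpoints must not be cached at the edge.
    return {"cacheable": False, "cache_control": "private, no-store"}
```

The win is that cacheable traffic never reaches the origin during spikes, while per-user endpoints keep their own scaling path.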

  2. Article
    Product Hunt · 24w

    Better Upload: Simple and easy file uploads for React, use your S3 bucket

    A lightweight React library for handling file uploads directly to S3-compatible storage services. Designed to minimize setup complexity and avoid unnecessary dependencies while providing direct-to-bucket upload functionality.
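
Libraries like this build on the standard presigned-URL pattern: the server signs an upload URL, the browser PUTs the file straight to the bucket, and the app server never handles the bytes. A sketch of that flow with a stand-in signer (this is not Better Upload's API, and the HMAC below is not AWS's real SigV4 algorithm):

```python
# Generic direct-to-bucket upload flow. The signing scheme here is a
# simplified stand-in for illustration only - real presigned URLs use
# AWS Signature Version 4.
import hashlib
import hmac
import time

SECRET = b"demo-signing-key"  # assumption: stands in for AWS credentials

def presign_put(bucket: str, key: str, expires_in: int = 300) -> str:
    """Return a signed URL the client can PUT a file to directly."""
    expiry = int(time.time()) + expires_in
    payload = f"PUT:{bucket}:{key}:{expiry}".encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return f"https://{bucket}.s3.amazonaws.com/{key}?expires={expiry}&sig={sig}"

url = presign_put("my-uploads", "avatars/user-42.png")
```

The client then issues a plain HTTP PUT of the file body to `url`; the bucket verifies the signature and expiry, so no credentials ever reach the browser.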

  3. Article
    Hacker News · 26w

    Send this article to your friend who still thinks the cloud is a good idea

    A developer shares their experience moving projects from AWS to bare-metal servers with Hetzner, achieving 10x cost savings and 2x performance improvement. The piece argues that cloud services like AWS charge excessive markups (10x-100x) compared to renting or buying servers directly, and that most small-to-medium businesses don't need expensive managed cloud services. It challenges common fears about server management, suggesting that with modern tools like AI assistants, managing Linux servers is accessible and cost-effective for most developers.

  4. Article
    InfoWorld · 26w

    Perplexity’s open-source tool to run trillion-parameter models without costly upgrades

    Perplexity AI released TransferEngine, an open-source tool that enables trillion-parameter language models to run across different cloud providers' GPU hardware at full speed. The software solves vendor lock-in by creating a universal interface for GPU-to-GPU communication that works on both Nvidia ConnectX and AWS EFA networking protocols. This allows companies to run massive models like DeepSeek V3 and Kimi K2 on older H100 and H200 systems instead of purchasing expensive next-generation hardware. TransferEngine achieves 400 Gbps throughput using RDMA technology and is already powering Perplexity's production AI search engine, handling disaggregated inference, reinforcement learning, and Mixture-of-Experts routing.
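
Some quick arithmetic puts the quoted 400 Gbps figure in context: at that rate, shipping the full weights of a model in the DeepSeek V3 class takes on the order of seconds. The byte-per-parameter figure is an assumption (e.g. FP8 weights) used only for illustration:

```python
# Back-of-envelope transfer time for a large model's weights over the
# quoted 400 Gbps RDMA link.
link_gbps = 400
bytes_per_sec = link_gbps * 1e9 / 8      # 50 GB/s
params = 671e9                           # DeepSeek V3, ~671B parameters
bytes_per_param = 1                      # assumption: FP8 weights
seconds = params * bytes_per_param / bytes_per_sec
print(round(seconds, 1))                 # ~13.4 s for a full weight transfer
```

That kind of latency is what makes disaggregated inference and cross-node Mixture-of-Experts routing practical on the older H100/H200 systems the article mentions.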

  5. Article
    ByteByteGo · 25w

    EP189: How to Design Good APIs

    Covers fundamental principles of API design including idempotency, versioning, resource naming, security, and pagination. Explores big data pipeline architectures across AWS, Azure, and GCP. Provides a structured learning path for AWS services from fundamentals through certifications. Explains RAG application architecture on AWS and compares virtualization approaches from bare metal to containers on VMs.
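
Idempotency, the first principle listed, can be shown in a few lines: the client sends a key with each mutating request, and the server replays the stored response for a retried key instead of re-executing the operation. A minimal sketch with invented names and an in-memory store:

```python
# Idempotency-key pattern: a retried request with the same key returns
# the original result instead of, say, charging a card twice.
_responses: dict[str, dict] = {}  # stand-in for a persistent store

def create_payment(idempotency_key: str, amount: int) -> dict:
    """Create a payment at most once per idempotency key."""
    if idempotency_key in _responses:
        return _responses[idempotency_key]   # retry: replay, don't re-charge
    result = {"id": f"pay_{len(_responses) + 1}", "amount": amount}
    _responses[idempotency_key] = result
    return result
```

In a real API the key would arrive as an `Idempotency-Key` header and the store would need an expiry policy, but the contract is the same.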

  6. Article
    PostgreSQL · 24w

    Autobase 2.5.0 released

    Autobase 2.5.0 introduces Expert Mode to its UI, enabling advanced cluster configuration options for experienced users. Key features include a YAML editor for custom parameters, updated cloud provider pricing and instance specifications (Hetzner ARM instances, 4th-gen Intel on AWS/GCP), configurable IOPS and throughput for AWS EBS volumes, and Ansible 12 compatibility. Autobase is an open-source tool for deploying and managing highly available PostgreSQL clusters, automating tasks like deployment, failover, backups, and scaling without requiring deep DBA expertise.

  7. Video
    Awesome · 24w

    The whole internet was down... again...

    Recent major outages from Cloudflare and AWS exposed critical vulnerabilities in modern internet infrastructure. While cloud services promised decentralization and resilience, the industry has consolidated around a few vendors using default configurations. Cloudflare's outage was caused by an oversized feature file in their Bot Manager component. The real issue isn't the outages themselves, but the illusion of resilience created by cloud-native tools while actually centralizing failure points. Modern developers increasingly lack the knowledge to build systems that gracefully handle failures, relying instead on configuration wizards and AI assistance.
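
The graceful-degradation skill the video argues is being lost is not exotic. A sketch of the basic pattern, with illustrative names: try the primary dependency, and on failure serve the last known-good data instead of failing the whole page:

```python
# Fallback-on-outage pattern: degrade to cached data when an upstream
# vendor (CDN, cloud API) is unreachable, rather than returning a 500.
def fetch_recommendations(primary, cache):
    """Serve fresh data when possible, stale data when the vendor is down."""
    try:
        fresh = primary()
        cache["recommendations"] = fresh   # refresh the known-good copy
        return {"data": fresh, "degraded": False}
    except ConnectionError:
        # Upstream outage: serve the last successful payload and flag
        # the response so callers can show a reduced experience.
        return {"data": cache.get("recommendations", []), "degraded": True}
```

Systems built this way stay partially useful through exactly the kind of vendor outage described above.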

  8. Article
    Marc Brooker · 24w

    Why Strong Consistency?

    Eventual consistency in database architectures creates significant challenges for both application developers and end users. Common issues include race conditions where newly created resources appear to not exist, complex retry logic requirements, and limitations on read replica effectiveness for read-modify-write operations. Aurora DSQL addresses these problems by providing strongly consistent reads across all replicas while maintaining read scalability, eliminating the need for applications to handle replication lag and routing complexity.
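
The read-after-write race the post describes is easy to model in miniature. This toy (not Aurora DSQL's actual implementation) shows a lagging replica making a just-created row invisible, which is precisely the bug class strongly consistent reads eliminate:

```python
# Toy model of replication lag: writes land on the primary, but the
# replica only catches up on sync(), so eventual reads can miss rows.
class LaggingReplica:
    def __init__(self):
        self.primary = {}
        self.replica = {}              # applies writes only on sync()

    def write(self, key, value):
        self.primary[key] = value

    def read(self, key, strong=False):
        source = self.primary if strong else self.replica
        return source.get(key)

    def sync(self):
        self.replica = dict(self.primary)

db = LaggingReplica()
db.write("order-1", "created")
db.read("order-1")                     # None - eventual read misses the row
db.read("order-1", strong=True)        # "created" - strong read sees it
```

The retry loops and routing workarounds the post criticizes all exist to paper over that `None`.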

  9. Article
    AWS · 24w

    Build production-ready applications without infrastructure complexity using Amazon ECS Express Mode

    Amazon ECS Express Mode is a new capability that automates containerized application deployment with a single command. It handles infrastructure setup including load balancers, auto scaling, networking, and security groups automatically. Developers can deploy production-ready applications using AWS best practices without managing hundreds of configuration parameters. The service provisions ECS clusters, task definitions, Application Load Balancers, and Route 53 domains from one entry point. Available in all AWS Regions with no additional cost beyond standard AWS resource usage.

  10. Video
    Be A Better Dev · 27w

    AWS Explained: The Most Important AWS Services To Know

    A comprehensive walkthrough of essential AWS services organized by function: networking (Route 53, CloudFront), storage (S3, EBS, EFS), compute (EC2, Lambda, ECS, Fargate), databases (RDS, DynamoDB, Aurora), security (WAF, Cognito, Certificate Manager), AI/ML (Bedrock, SageMaker), messaging (SNS, SQS, EventBridge), analytics (Athena, EMR, Redshift), monitoring (CloudWatch, X-Ray), and CI/CD (CodeBuild, CodeDeploy, CodePipeline). Uses an e-commerce application as a practical example to demonstrate how these services integrate to build production systems.

  11. Article
    Tech Lead Digest · 27w

    My AWS Account Got Hacked - Here Is What Happened

A cloud architect shares a detailed account of how their personal AWS account was compromised through an access key exposed in a Next.js application. The attacker created IAM users, launched EC2 instances for crypto-mining, flooded the victim's inbox with spam to hide AWS notifications, and attempted to use SES for phishing. The post walks through the detection process, containment steps, timeline reconstruction using CloudTrail, and root cause analysis. Key lessons include proper secret management, enabling GuardDuty, avoiding root user access, and responding quickly to suspicious activity.
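
The secret-management lesson can be partially automated: AWS access key IDs follow a documented format (20 characters, starting with a prefix like `AKIA` or `ASIA`), so a pre-commit scan can catch the kind of leak described above before it ships. A minimal sketch (the key in the test below is AWS's published example placeholder, not a real credential):

```python
# Scan source text for strings matching the AWS access key ID format.
# This catches hardcoded keys before they land in a client-side bundle.
import re

# AKIA = long-term key, ASIA = temporary (STS) key, per AWS's
# documented key ID prefixes; 16 more uppercase/digit characters follow.
AWS_KEY_RE = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")

def find_exposed_keys(source: str) -> list:
    """Return any AWS access key IDs embedded in the given source text."""
    return AWS_KEY_RE.findall(source)
```

Tools like gitleaks and GitHub secret scanning apply the same idea at scale, and GuardDuty (as the post recommends) flags the resulting anomalous API use.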

  12. Article
    The Last Week in AWS · 23w

    AWS Finally Lets You Find Your Idle NAT Gateways

AWS Compute Optimizer now identifies idle NAT Gateways, helping users eliminate unnecessary costs. Each idle gateway costs approximately $35/month plus data processing fees. A NAT Gateway is considered idle when it has no active connections, no incoming packets from VPC clients or destinations for 32 days, and isn't associated with a route table. The feature addresses the small but recurring cost of forgotten resources; high-volume data processing charges remain a separate concern.
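
The "~$35/month" figure follows directly from the hourly charge, which accrues whether or not any traffic flows. A quick check, assuming the us-east-1 list price of $0.045/hour (rates vary by region):

```python
# An idle NAT Gateway bills its hourly rate around the clock; data
# processing ($/GB) would be billed on top, but is zero when idle.
hourly_rate = 0.045            # USD/hour, us-east-1 list price (assumption)
hours_per_month = 730          # average hours in a month
idle_cost = hourly_rate * hours_per_month
print(round(idle_cost, 2))     # ~32.85 USD per idle gateway per month
```

That lands just under the article's ~$35 figure; a handful of forgotten gateways across accounts adds up quickly.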

  13. Article
    Jobs · 27w

    The Day Our Database Bill Nearly Sank the Company

    A SaaS startup reduced their database costs by 60% through systematic optimization. The team identified missing indexes causing full-table scans on 50M+ row tables, removed unused indexes, archived old logs to S3, and refactored queries. Results included 5x faster queries, 40% lower CPU load, and 50% storage cost reduction. The key takeaway emphasizes database optimization as an ongoing practice requiring regular monitoring of queries, maintaining lean indexes, and proper data lifecycle management.
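
The "missing index" fix at the heart of the story can be reproduced in miniature with SQLite: without an index the lookup scans the whole table; with one it becomes an index search. Table and column names below are invented for illustration:

```python
# Demonstrate a full-table scan turning into an index search once the
# missing index is created, using SQLite's EXPLAIN QUERY PLAN.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT)"
)
conn.executemany(
    "INSERT INTO events (user_id, payload) VALUES (?, ?)",
    [(i % 100, "x") for i in range(1000)],
)

query = "SELECT * FROM events WHERE user_id = 42"

# Before: the plan's detail column reports a scan of the whole table.
before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]

conn.execute("CREATE INDEX idx_events_user ON events (user_id)")

# After: the same query is satisfied by searching the new index.
after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]
```

On 50M+ row tables like those in the story, that plan change is the difference between seconds and milliseconds, and it is why the team's takeaway is to monitor query plans continuously rather than once.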