When building a chatbot on top of OpenAI, a single user can exhaust your organization's rate limits across both requests per minute (RPM) and tokens per minute (TPM), effectively causing a denial of service for all other users. A practical solution is to track per-user RPM and TPM in Redis using its increment and expire operations.
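The pattern above can be sketched as a fixed-window counter: each user gets a key that is incremented on every request, with an expiry set on the first hit of the window. Here is a minimal Python sketch; the `InMemoryStore` and `PerUserRateLimiter` names are hypothetical, and the in-memory store is a stand-in for a real Redis client (redis-py exposes the same `incr`/`expire` methods, so the limiter works unchanged against `redis.Redis`).

```python
import time


class InMemoryStore:
    """Stand-in with Redis-like INCR/EXPIRE semantics, for illustration only.
    A real deployment would pass a redis.Redis client instead."""

    def __init__(self):
        self._data = {}  # key -> [count, expires_at or None]

    def incr(self, key, amount=1):
        now = time.monotonic()
        entry = self._data.get(key)
        # Treat an expired key as absent, like Redis does.
        if entry is None or (entry[1] is not None and entry[1] <= now):
            entry = [0, None]
            self._data[key] = entry
        entry[0] += amount
        return entry[0]

    def expire(self, key, seconds):
        if key in self._data:
            self._data[key][1] = time.monotonic() + seconds


class PerUserRateLimiter:
    """Fixed-window limiter: one counter per user per window."""

    def __init__(self, store, limit, window_seconds=60):
        self.store = store
        self.limit = limit
        self.window = window_seconds

    def allow(self, user_id, amount=1):
        key = f"rate:{user_id}"
        count = self.store.incr(key, amount)
        if count == amount:
            # First hit in this window: start the countdown.
            self.store.expire(key, self.window)
        return count <= self.limit
```

For RPM, call `allow(user_id)` once per request; for TPM, the same limiter can count tokens by passing the request's token count as `amount` (Redis supports this via `INCRBY`), with a separate key per metric.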

6 min read · From thoughtbot.com
Table of contents
Our base
Rate limit messages
Limit token usage
Calculating per-user rate limits
Additional considerations
Wrapping up
