Iterating on LLM prompts requires systematic evaluation across models, providers, and configurations. Spreadsheets work at first, but they fragment across teams, lack structure, and stay disconnected from code. RubyLLM::Evals is a Rails engine that manages prompt configurations, test samples, and evaluation runs inside your application.
Table of contents

- The pragmatic choice: spreadsheets
- Spreadsheets don’t scale
- RubyLLM::Evals
- Where this leaves us