Allowing users to select their own LLMs often degrades quality and consistency. Most users lack the evaluation data and benchmarking infrastructure to make informed model choices, so their choices tend to follow hype rather than measured performance and produce suboptimal results. Different models interpret prompts differently, causing workflow
