LLMs Struggle with Real Conversations: Microsoft and Salesforce Researchers Reveal a 39% Performance Drop in Multi-Turn Underspecified Tasks

We are a community of AI/ ML/Generative AI enthusiasts/researchers/journalists/writers who share interesting news and articles about the applications of AI. 

Machine Learning News

Large language models (LLMs) face challenges in handling real conversations where information is disclosed gradually across multiple turns. Microsoft and Salesforce researchers have found a significant 39% performance drop in multi-turn tasks due to increased unreliability when instructions are underspecified. The study used SHARDED simulation methods to evaluate LLMs on tasks like coding and math problems, revealing a need for improved reliability in adapting to evolving dialogue.