Large language models (LLMs) struggle with real conversations, where information is disclosed gradually across multiple turns. Researchers at Microsoft and Salesforce found an average 39% performance drop in multi-turn tasks compared with single-turn ones, driven largely by increased unreliability when instructions are underspecified. The study used