Researchers have developed a machine learning technique to improve red-teaming for large language models. By training a red-team model to generate diverse prompts that elicit toxic responses from a chatbot, they achieved better coverage and effectiveness compared to human testers and other automated methods. The method provides
From news.mit.edu