A new study appears to lend credence to allegations that OpenAI trained at least some of its AI models on copyrighted content.

TechCrunch

The study found that some of OpenAI's models, including GPT-4 and GPT-3.5, have memorized portions of copyrighted content such as fiction books and news articles. Conducted by researchers from the University of Washington, the University of Copenhagen, and Stanford, it introduces a method for identifying training data that AI models have "memorized." The finding supports plaintiffs in ongoing lawsuits against OpenAI, which has defended its use of such data as "fair use." The researchers argue that AI training data needs greater transparency.

OpenAI’s models ‘memorized’ copyrighted content, new study suggests