A new study reveals that some of OpenAI's models, including GPT-4 and GPT-3.5, have memorized portions of copyrighted content, such as fiction books and news articles. The study, conducted by researchers from the University of Washington, the University of Copenhagen, and Stanford, introduces a method to identify 'memorized' data in AI models. This finding lends support to ongoing legal cases against OpenAI, which has defended its use of data based on 'fair use.' The study emphasizes the need for more transparency in AI training data.

3m read time · From techcrunch.com