Why Meta stole millions of books to train AI

Meta downloaded 82 terabytes of pirated books from shadow libraries like LibGen to train its Llama AI model, despite legal concerns from engineers. After publishers refused licensing deals deemed too expensive and slow, Meta chose piracy over falling behind competitors like OpenAI and Google. The pirated data improved Llama's performance by 5%, leading to 800 more correct answers. Meta covered its tracks by masking IP addresses and removing copyright tags, and it is relying on a fair use defense, a legal strategy shared across the AI industry, against the inevitable lawsuits from authors and publishers.

4m watch time