Many of the top LLMs today are closed source. What if we could discover their internal weights? 
In this video we dive into a recent research paper from Google DeepMind that presents an attack on large language models. The attack targets transformer-based LLMs that expose log probabilities as part of their API, which includes GPT-4 and PaLM-2. The researchers successfully used the attack to discover internal data about OpenAI models. The extracted data includes the hidden dimension size of gpt-3.5-turbo, and the researchers estimate it would cost less than $2,000 to extract the weights of that model's embedding projection layer.

Stealing Part of a Production Language Model paper page - https://arxiv.org/abs/2403.06634

Blog post - https://aipapersacademy.com/stealing-part-of-a-production-language-model/

-----------------------------------------------------------------------------------------------
✉️ Join the newsletter - https://aipapersacademy.com/newsletter/

👍 Please like & subscribe if you enjoy this content

We use VideoScribe to edit our videos - https://tidd.ly/44TZEiX (affiliate)
-----------------------------------------------------------------------------------------------

Chapters:
0:00 Introduction
1:13 Attack Targets
2:36 Hidden Dimension Extraction
5:29 Weights Extraction
6:18 Recover Logits From Log Probabilities
8:10 Results

AI Papers Academy

A Google DeepMind research paper demonstrates a model-stealing attack that targets the embedding projection layer (the final layer) of closed-source LLMs such as GPT-4 and PaLM-2, using only standard API access. By exploiting APIs that expose log probabilities or logit bias, attackers can reconstruct the full logit vector and apply SVD to recover the hidden dimension size and an approximation of the weight matrix. The researchers estimated that the full embedding projection layer of GPT-3.5-Turbo could be extracted for under $2,000 in API queries. OpenAI and Google have since deployed mitigations. The video explains the math behind the attack, including how partial top-k log probabilities can be extended to full logit vectors using biased queries.
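
For readers who want to see the idea in code, here is a minimal sketch in Python on a toy model. It is not the paper's implementation. Everything in it is an assumption made for illustration: the `toy_api` function stands in for a real API, and the sizes (`VOCAB`, `HIDDEN`, `TOP_K`, `BIAS`), the mean-centering step, and the `1e-8` threshold are choices made for this sketch. It first uses biased queries to turn top-k log probabilities into a full logit vector (relative to the top token), then counts the significant singular values of the stacked logit vectors to estimate the hidden dimension.

```python
import numpy as np

# Toy stand-in for a closed model. All sizes are assumptions for illustration.
VOCAB, HIDDEN, TOP_K, BIAS = 200, 16, 5, 100.0
W = np.random.default_rng(0).normal(size=(VOCAB, HIDDEN))  # secret projection matrix


def toy_api(prompt_id, logit_bias):
    """Hypothetical API: top-k {token: logprob} after applying a logit bias."""
    hidden = np.random.default_rng(1000 + prompt_id).normal(size=HIDDEN)
    logits = W @ hidden  # final hidden state projected onto the vocabulary
    for token, bias in logit_bias.items():
        logits[token] += bias
    logprobs = logits - np.logaddexp.reduce(logits)  # log-softmax
    top = np.argsort(logprobs)[::-1][:TOP_K]
    return {int(t): float(logprobs[t]) for t in top}


def recover_logits(prompt_id):
    """Recover every logit relative to the unbiased top token."""
    unbiased = toy_api(prompt_id, {})
    ref = max(unbiased, key=unbiased.get)  # reference token
    relative = np.zeros(VOCAB)
    others = [t for t in range(VOCAB) if t != ref]
    # Bias TOP_K - 1 tokens per query so they appear next to the reference.
    for i in range(0, len(others), TOP_K - 1):
        group = others[i:i + TOP_K - 1]
        out = toy_api(prompt_id, {t: BIAS for t in group})
        for t in group:
            # The shared normaliser cancels in the difference of logprobs.
            relative[t] = out[t] - out[ref] - BIAS
    return relative


def estimate_hidden_dim(num_prompts):
    """Stack recovered logit vectors and count significant singular values."""
    Q = np.stack([recover_logits(p) for p in range(num_prompts)])
    # Remove the per-prompt shift, which would otherwise add one to the rank.
    Q = Q - Q.mean(axis=1, keepdims=True)
    s = np.linalg.svd(Q, compute_uv=False)
    return int(np.sum(s > 1e-8 * s[0]))


# The number of prompts must exceed the hidden dimension we hope to find.
print("estimated hidden dimension:", estimate_hidden_dim(num_prompts=40))
```

In this toy setting the logit vectors are exact, so the singular values fall off sharply after the hidden dimension. Against a real API the values are noisy, so the cutoff would have to be chosen by inspecting where the singular values drop.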

Stealing Part of a Production Language Model | AI Paper Explained