A deep dive into Representational Similarity Analysis (RSA) applied to GPT-2-XL's internal attention matrices. Using PyTorch hook functions to extract Query, Key, and Value activations across all 48 transformer layers, the analysis shows that although the Q, K, and V vectors are nearly orthogonal to one another (direct correlations near zero), the RSA scores between them remain consistently high (~0.85–0.9). A category separability analysis using Cohen's d indicates that semantic world knowledge is encoded within each attention matrix: within-category RSA scores exceed across-category scores in every layer. Includes runnable Google Colab code.
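
The contrast between near-zero direct correlations and high RSA scores can look paradoxical, so here is a toy sketch of how both can hold at once. It is not the post's code: the data are random, the "K" matrix is a hypothetical rotation of "Q", and using Pearson correlation over the upper triangle of cosine-similarity matrices is an assumption about how the RSA score is computed.

```python
import numpy as np

def cosine_similarity_matrix(acts):
    """Pairwise cosine similarity between the rows (tokens) of acts, shape (n_tokens, dim)."""
    norms = np.linalg.norm(acts, axis=1, keepdims=True)
    unit = acts / norms
    return unit @ unit.T

def rsa_score(acts_a, acts_b):
    """Pearson correlation between the off-diagonal upper triangles of two similarity matrices."""
    sim_a = cosine_similarity_matrix(acts_a)
    sim_b = cosine_similarity_matrix(acts_b)
    upper = np.triu_indices(sim_a.shape[0], k=1)  # skip the diagonal of ones
    return float(np.corrcoef(sim_a[upper], sim_b[upper])[0, 1])

rng = np.random.default_rng(0)
n_tokens, dim = 20, 64

# Toy stand-ins for real activations (assumption: random data, not GPT-2-XL output).
q = rng.standard_normal((n_tokens, dim))

# Hypothetical "K": the same token geometry expressed in a randomly rotated basis.
rotation, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
k = q @ rotation

# Element-wise correlation between the two activation matrices.
direct_corr = float(np.corrcoef(q.ravel(), k.ravel())[0, 1])

# Correlation between their token-by-token similarity structures.
rsa = rsa_score(q, k)

print(f"direct correlation: {direct_corr:.3f}")
print(f"RSA score:          {rsa:.3f}")
```

Because a rotation preserves the angles between token vectors, the two similarity matrices match even though the raw coordinates do not. Real Q, K, and V activations are not exact rotations of one another, which is consistent with reported RSA scores that are high but below 1.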

16m read time
From thepalindrome.org
Table of contents
What you will learn in this 2-part post series
What are the Q, K, and V vectors in the attention algorithm?
Import and inspect GPT-2-XL
Access the internal calculations using hooks
Correlating Q, K, and V activations
Cosine similarities and RSA (one layer)
Laminar profile of RSA scores
Category separability in one layer
Laminar profile of category separability and RSA
So you wanna learn more?
