A hands-on walkthrough of implementing LLM attention mechanisms in Elixir using the Nx and Axon libraries, based on Sebastian Raschka's 'Build a LLM from Scratch'. Covers four progressively complex variants: simplified self-attention (no trainable weights), self-attention with trainable weight matrices (V1 uniform init, V2 Axon …)

52m read time · From karlosmid.com
Table of contents

- TL;DR
- The problem with modeling long sequences
- Capturing data dependencies with attention mechanisms
- Attending to different parts of the input with self-attention
- Implementing self-attention with trainable weights
- Hiding future words with causal attention
- Extending single-head attention to multi-head attention
- Example
- Step 1: reshape
- Step 2: transpose
- Why transpose?
- Conclusion
