How Taalas "prints" an LLM onto a chip


Taalas, a startup, has built a fixed-function ASIC chip with Llama 3.1 8B weights physically etched into silicon transistors, achieving 17,000 tokens/second inference — roughly 10x faster, cheaper, and more energy-efficient than GPU-based systems. By hardwiring the model's 32 layers sequentially on-chip, data flows directly

4m read time. From anuragk.com
Table of contents
Basics
How NVIDIA GPUs process stuff? (Inefficiency 101)
Breaking the wall!
So, they don't use any RAM?
But isn't fabricating a custom chip for every model super expensive?
