StripedHyena is an open-source model architecture that offers an alternative to Transformers. It is faster and more memory efficient than Transformers in both training and inference, while achieving comparable performance. Its design combines attention with gated convolutions and uses new model grafting techniques. Future plans include exploring larger models, multi-modal support, and further performance optimizations.
Table of contents
A single architecture for short and longer context tasks
Understanding the architecture design space: Many ways to improve scaling
A shift in the computational footprint of language models: Cheaper fine-tuning, faster inference
Reducing memory for inference
From signal processing to language models
What’s ahead
Acknowledgments