Hypura is a storage-tier-aware LLM inference scheduler for Apple Silicon that can run models larger than physical memory by intelligently placing tensors across GPU, RAM, and NVMe tiers. It supports three inference modes, including full-resident (the model fits entirely in GPU+RAM) and expert-streaming for MoE models like Mixtral (exploiting the fact that only a small subset of experts is activated per token).
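As a rough illustration of what "storage-tier-aware placement" means, the sketch below greedily assigns the hottest tensors to the fastest tier with remaining capacity. This is a minimal toy model, not Hypura's actual algorithm; the tensor names, hotness scores, and capacities are all made up for the example.

```python
from dataclasses import dataclass

@dataclass
class Tensor:
    name: str
    size_gb: float
    hotness: float  # hypothetical access-frequency score

def place_tensors(tensors, capacities):
    """Greedy placement: hottest tensors land in the fastest tier
    that still has room. Tier order is fastest-first: GPU, RAM, NVMe."""
    placement = {}
    free = dict(capacities)  # remaining capacity per tier, in GB
    for t in sorted(tensors, key=lambda t: t.hotness, reverse=True):
        for tier in ("gpu", "ram", "nvme"):
            if free[tier] >= t.size_gb:
                placement[t.name] = tier
                free[tier] -= t.size_gb
                break
    return placement

# Example: a hot attention weight and two cold MoE experts.
tensors = [
    Tensor("attn.wq", 2.0, 0.9),
    Tensor("expert.0", 6.0, 0.4),
    Tensor("expert.1", 6.0, 0.1),
]
caps = {"gpu": 4.0, "ram": 8.0, "nvme": 64.0}
print(place_tensors(tensors, caps))
# → {'attn.wq': 'gpu', 'expert.0': 'ram', 'expert.1': 'nvme'}
```

The real scheduler must additionally account for streaming bandwidth and expert activation patterns, but the core idea is the same: rank tensors by expected access frequency and fill tiers fastest-first.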
Table of contents

- Why does this matter?
- How it works
- Performance
- Install
- Quick start
- Ollama-compatible server
- Architecture
- FAQ
- Safety notes
- License
- Ethics