From x.com

antirez @antirez

If you want, you can run not just a 1T model from SSD, but a 10T model with manual pen-and-paper math. The fact is that even small LLMs like Qwen 3.5 35B are already almost too slow for serious usage on high-end Mx Apple Silicon. Running big LLMs slowly is nice, but useless.
