Effort is a new algorithm for LLM inference that lets you adjust, in real time, how much calculation is performed during inference. It is implemented for Mistral and does not require retraining. The implementation currently supports FP16 only.