Shimmy is a 5.1 MB Rust-based inference server that provides 100% OpenAI-compatible API endpoints for local GGUF models. It offers zero-configuration setup, auto-discovering models from the Hugging Face cache, Ollama, and local directories. The tool enables privacy-first AI development by running models locally without external
