SelfHostLLM is a tool that helps developers calculate GPU memory requirements and maximum concurrent requests for self-hosted large language model inference. It supports popular models like Llama, Qwen, DeepSeek, and Mistral, letting users plan AI infrastructure around their own hardware and model configurations.
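The kind of estimate such a calculator produces can be sketched with a common back-of-envelope formula: weight memory is parameter count times bytes per parameter, and each in-flight request additionally holds a KV cache proportional to its context length. This is an illustrative approximation, not necessarily SelfHostLLM's exact method; the function name and parameters below are hypothetical.

```python
def estimate_capacity(model_params_b, vram_gb, context_len,
                      n_layers, n_kv_heads, head_dim, bytes_per_param=2):
    """Rough VRAM budget for transformer inference (fp16/bf16 by default)."""
    # Weight memory: parameter count (in billions) x bytes per parameter.
    weights_gb = model_params_b * bytes_per_param
    # KV cache per token: K and V vectors for every layer and KV head.
    kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_param
    kv_per_request_gb = kv_bytes_per_token * context_len / 1e9
    # Whatever VRAM is left after weights is divided among request KV caches.
    free_gb = vram_gb - weights_gb
    max_requests = max(0, int(free_gb // kv_per_request_gb))
    return weights_gb, kv_per_request_gb, max_requests

# Example: a Llama-2-7B-style model (32 layers, 32 KV heads, head dim 128)
# serving 4096-token contexts on a single 24 GB GPU.
weights, kv_req, n = estimate_capacity(7, 24, 4096, 32, 32, 128)
print(f"weights ~ {weights:.0f} GB, KV cache/request ~ {kv_req:.1f} GB, "
      f"max concurrent requests ~ {n}")
```

Real capacity is lower in practice (activation memory, framework overhead, fragmentation), which is why a dedicated calculator with per-model presets is useful.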