RamaLama is a tool that simplifies running AI models locally inside containers using Podman, Docker, or Kubernetes. It supports multiple model registries including Ollama, HuggingFace, and OCI registries, and automatically handles GPU acceleration. The guide walks through installing RamaLama and Podman on macOS, running models like tinyllama and Gemma, and integrating them with a Spring Boot application via Spring AI's OpenAI-compatible module. It also covers deploying AI model containers on Minikube with GPU support using the generic-device-plugin and krunkit driver for Apple Silicon GPU acceleration.
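As a quick orientation before the full walkthrough, the core workflow can be sketched with a few RamaLama commands. This is a minimal sketch: the `tinyllama` model name and `ollama://` registry prefix follow the article's examples, and exact flags or default ports may differ by RamaLama version.

```shell
# Pull a small model from the Ollama registry (model name from the article)
ramalama pull ollama://tinyllama

# Chat with the model interactively; RamaLama runs it inside a container
# via Podman or Docker and picks GPU acceleration when available
ramalama run ollama://tinyllama

# Or expose the model as an OpenAI-compatible REST endpoint, which is
# what a Spring AI client can point at
ramalama serve ollama://tinyllama
```

The `serve` mode is what makes the Spring Boot integration possible: Spring AI's OpenAI module only needs a base URL pointing at the locally served model.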

From piotrminkowski.com (10 min read)
Table of contents
- Source Code
- Install RamaLama
- Install and Configure Podman
- Run Model with Ramalama
- Integrate Spring AI with Models on RamaLama
- Use RamaLama to Run Containers with AI Models in Kubernetes
- Conclusion
