RamaLama is a Red Hat open source project that uses containers (Podman or Docker) to run open source LLMs locally and in production environments such as Kubernetes. The talk covers running models with the llama.cpp or vLLM inference engines, benchmarking local model performance, containerizing AI workloads with security-isolation flags, building RAG pipelines that use Docling for document ingestion together with a vector database, generating systemd Quadlets and Kubernetes YAML manifests for deployment, and building agentic AI applications with LangChain4j, as sketched below. The core idea is solving the 'works on my machine' problem for AI by treating models as versioned container artifacts that move consistently from laptop to cluster.
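Because `ramalama serve` exposes an OpenAI-compatible HTTP endpoint for the model it runs, a LangChain4j agent can point its OpenAI client at the local server. Below is a minimal sketch of the agentic pattern the talk describes, assuming a model is already being served locally (e.g. via `ramalama serve <model>`); the port, model name, and the `Clock` tool are illustrative assumptions, and method names reflect the LangChain4j 0.x API, which may differ in newer releases.

```java
import dev.langchain4j.agent.tool.Tool;
import dev.langchain4j.model.openai.OpenAiChatModel;
import dev.langchain4j.service.AiServices;

public class LocalAgent {

    // Hypothetical tool; any @Tool-annotated method can be offered to the agent.
    static class Clock {
        @Tool("Returns the current time in ISO-8601 format")
        String currentTime() {
            return java.time.OffsetDateTime.now().toString();
        }
    }

    // AiServices generates an implementation of this interface at runtime.
    interface Assistant {
        String chat(String userMessage);
    }

    public static void main(String[] args) {
        // Point LangChain4j's OpenAI client at the locally served model.
        // Port 8080 and the model name "granite" are assumptions; match them
        // to your own `ramalama serve` invocation.
        OpenAiChatModel model = OpenAiChatModel.builder()
                .baseUrl("http://localhost:8080/v1")
                .apiKey("not-needed")      // local server ignores the API key
                .modelName("granite")
                .build();

        // Wire the model and the tool together into an agent.
        Assistant assistant = AiServices.builder(Assistant.class)
                .chatLanguageModel(model)
                .tools(new Clock())
                .build();

        // The model decides when to invoke the Clock tool to answer.
        System.out.println(assistant.chat("What time is it right now?"));
    }
}
```

The same agent code runs unchanged whether the endpoint is a laptop container or a cluster service, which is the portability point the talk makes.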