RamaLama is a Red Hat open source project that uses containers (Podman or Docker) to run open source LLMs locally and in production environments such as Kubernetes. The talk covers running models via the llama.cpp or vLLM inference engines, benchmarking local model performance, and containerizing AI workloads with security isolation.

47m watch time