Deploy OpenViking on OpenShift AI to improve AI agent memory

OpenViking is an open source context database from ByteDance's Volcengine team that replaces flat vector storage with a hierarchical virtual filesystem for AI agent memory. Instead of dumping context into embeddings, it organizes memories, resources, and skills into directories addressed through a viking:// protocol. Context loads in tiers (L0 abstract, L1 overview, L2 full content), so an agent pulls in full content only when it needs it, which reduces token costs compared to traditional RAG. The openviking-openshift project provides Kustomize-based deployment manifests for Red Hat OpenShift AI. They run Qwen3-Embedding-0.6B and Qwen3-32B via vLLM on a single A100 GPU shared through GPU time-slicing, with TLS-terminated routes and full restricted-v2 SCC compliance. The entire pipeline (embedding generation, summarization, and VLM inference) runs in-cluster with no external API dependencies. Clients can interact via REST API, Python SDK, or CLI, and sessions let agents accumulate persistent memories across conversations.
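To make the tiered loading idea concrete, here is a minimal, self-contained Python sketch. It does not use the OpenViking SDK; the `Node` class, the `build_context` helper, and the example viking:// URIs are all hypothetical, chosen only to illustrate how serving L0 abstracts by default and L2 content on demand can keep a prompt small.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One entry in a hypothetical viking://-style tree (not OpenViking's real data model)."""
    uri: str
    abstract: str   # L0: one-line summary
    overview: str   # L1: short outline
    content: str    # L2: full text
    children: list[Node] = field(default_factory=list)


def load(node: Node, tier: int) -> str:
    """Return the text for a tier: 0 = abstract, 1 = overview, 2 = full content."""
    return (node.abstract, node.overview, node.content)[tier]


def walk(node: Node):
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def build_context(root: Node, expand_uri: str) -> str:
    """Use the L0 abstract for every node except the one being expanded, which gets L2."""
    lines = []
    for node in walk(root):
        tier = 2 if node.uri == expand_uri else 0
        lines.append(f"{node.uri}: {load(node, tier)}")
    return "\n".join(lines)


# Illustrative data only; the URIs and contents are made up for this sketch.
root = Node(
    "viking://memories",
    "User memories",
    "Preferences and past sessions",
    "",
    [
        Node(
            "viking://memories/prefs",
            "Editor preferences",
            "Theme and keybindings",
            "Uses a dark theme and Vim keybindings; tabs are 4 spaces.",
        ),
        Node(
            "viking://memories/projects",
            "Active projects",
            "Two repositories",
            "Detailed project notes. " * 50,
        ),
    ],
)

tiered = build_context(root, "viking://memories/prefs")
everything = "\n".join(f"{n.uri}: {n.content}" for n in walk(root))
print(len(tiered), "characters with tiered loading vs", len(everything), "with full content")
```

In this toy tree, only the expanded directory contributes its full text; the large projects entry is represented by its abstract alone. A real deployment would generate the abstracts and overviews with the in-cluster summarization model rather than hard-coding them.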

#python

#ai-agents

#rag

#openshift

Apr 23 • 8m read time • From developers.redhat.com

Table of contents

What OpenViking actually does
The deployment architecture with Red Hat AI
Why this architecture matters for OpenShift AI teams
From deployment to usage
Client access: REST API, Python SDK, and CLI
The bigger picture
