OpenViking is an open source context database from ByteDance's Volcengine team that replaces flat vector storage with a hierarchical virtual filesystem for AI agent memory. Instead of dumping context into embeddings, it organizes memories, resources, and skills into directories addressed through a viking:// protocol. Context is loaded in tiers (L0 abstract, L1 overview, L2 full content), so an agent reads full content only when it needs it, which reduces token costs compared to traditional RAG. The openviking-openshift project provides Kustomize-based deployment manifests for Red Hat OpenShift AI, running Qwen3-Embedding-0.6B and Qwen3-32B via vLLM on a single A100 GPU shared through GPU time-slicing, with TLS-terminated routes and full restricted-v2 SCC compliance. The entire pipeline (embedding generation, summarization, and VLM inference) runs in-cluster with no external API dependencies. Clients can interact via REST API, Python SDK, or CLI, and sessions allow agents to accumulate persistent memories across conversations.
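To make the tiered loading idea concrete, here is a minimal sketch of why serving L0 abstracts by default and escalating to L2 only for selected entries shrinks the context. It does not use the OpenViking SDK: the `ContextNode` class, the `build_context` helper, the example viking:// paths, and the word-count token estimate are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class ContextNode:
    """One entry in a hypothetical hierarchical context store."""
    uri: str       # illustrative viking:// path, not taken from a real deployment
    abstract: str  # L0: one-line summary
    overview: str  # L1: short outline
    content: str   # L2: full text

    def load(self, tier: int) -> str:
        """Return the text for tier 0 (L0), 1 (L1), or 2 (L2)."""
        return (self.abstract, self.overview, self.content)[tier]


def approx_tokens(text: str) -> int:
    """Crude token estimate: count whitespace-separated words."""
    return len(text.split())


def build_context(nodes, selected_uris):
    """Load L2 for selected nodes and only the L0 abstract for the rest."""
    parts = []
    for node in nodes:
        tier = 2 if node.uri in selected_uris else 0
        parts.append(f"{node.uri}\n{node.load(tier)}")
    return "\n\n".join(parts)


# Two made-up resources with large L2 bodies.
nodes = [
    ContextNode("viking://resources/docs/install", "Install guide.",
                "Prerequisites, steps, verification.", "step " * 400),
    ContextNode("viking://resources/docs/routes", "Route setup.",
                "TLS termination, hostnames.", "route " * 400),
]

# Tiered: full content for one entry, abstract for the other.
tiered = build_context(nodes, {"viking://resources/docs/install"})
# Flat: full content for everything, as a naive retrieval might do.
full = build_context(nodes, {n.uri for n in nodes})

print(approx_tokens(tiered), "vs", approx_tokens(full))
```

In this toy setup the savings grow with the number of entries that stay at L0, which is the intuition behind the tiered design. A real deployment would decide which entries to escalate based on retrieval relevance, not on a hard-coded set.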

Table of contents
What OpenViking actually does
The deployment architecture with Red Hat AI
Why this architecture matters for OpenShift AI teams
From deployment to usage
Client access: REST API, Python SDK, and CLI
The bigger picture