A conference talk covering three levels of caching for Java applications: in-memory caching with Caffeine, distributed caching with Redis/Valkey using the Redisson library, and semantic caching powered by vector similarity search. The speaker explains how semantic caching works by vectorizing LLM inputs, using HNSW graphs for approximate nearest neighbor search, and storing vectors alongside cached LLM responses in Valkey Search to avoid expensive inference calls. Key concepts include cosine similarity, Euclidean distance, inner product comparisons, similarity thresholds, TTL management, and memory trade-offs. Practical demos show Caffeine cache, Redisson-based distributed cache with per-key TTL and rate limiting, and a semantic cache that correctly matches 'France capital' to a cached response for 'capital of France'.
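The semantic-cache lookup described above can be sketched in a few lines of plain Java: cached entries store an embedding vector alongside the LLM response, and a new query is served from cache when its cosine similarity to a stored vector clears a threshold. The embeddings and the 0.95 threshold below are toy values for illustration, not output of a real embedding model or the speaker's actual demo; a production version would use Valkey Search with an HNSW index rather than a linear scan over a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SemanticCacheSketch {
    // Cosine similarity between two equal-length vectors.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Cached entry: toy embedding for "capital of France" plus the stored response.
        Map<double[], String> cache = new LinkedHashMap<>();
        cache.put(new double[] {0.90, 0.10, 0.40}, "Paris is the capital of France.");

        // New query "France capital" vectorizes to a nearby point (toy values).
        double[] queryVec = {0.88, 0.12, 0.41};
        double threshold = 0.95; // similarity threshold; hypothetical value

        // Linear scan stands in for the ANN (HNSW) search a real vector index performs.
        for (Map.Entry<double[], String> entry : cache.entrySet()) {
            if (cosine(entry.getKey(), queryVec) >= threshold) {
                System.out.println("cache hit: " + entry.getValue());
                return;
            }
        }
        System.out.println("cache miss: call the LLM");
    }
}
```

On a hit the expensive inference call is skipped entirely, which is the whole point of the third caching level in the talk.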

48m watch time
