A conference talk covering three levels of caching for Java applications: in-memory caching with Caffeine, distributed caching with Redis/Valkey using the Redisson library, and semantic caching powered by vector similarity search. The speaker explains how semantic caching works by vectorizing LLM inputs, using HNSW graphs for approximate nearest neighbor search, and storing vectors alongside cached LLM responses in Valkey Search to avoid expensive inference calls. Key concepts include cosine similarity, Euclidean distance, inner product comparisons, similarity thresholds, TTL management, and memory trade-offs. Practical demos show Caffeine cache, Redisson-based distributed cache with per-key TTL and rate limiting, and a semantic cache that correctly matches 'France capital' to a cached response for 'capital of France'.
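The semantic-cache lookup described above can be sketched in a few lines of plain Java: cached entries store an embedding vector alongside the LLM response, and a new query is served from cache when its cosine similarity to a stored vector clears a threshold. The embeddings and the 0.95 threshold below are toy values for illustration, not output of a real embedding model or the speaker's actual demo; a production version would use Valkey Search with an HNSW index rather than a linear scan over a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SemanticCacheSketch {
    // Cosine similarity between two equal-length vectors.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Cached entry: toy embedding for "capital of France" plus the stored response.
        Map<double[], String> cache = new LinkedHashMap<>();
        cache.put(new double[] {0.90, 0.10, 0.40}, "Paris is the capital of France.");

        // New query "France capital" vectorizes to a nearby point (toy values).
        double[] queryVec = {0.88, 0.12, 0.41};
        double threshold = 0.95; // similarity threshold; hypothetical value

        // Linear scan stands in for the ANN (HNSW) search a real vector index performs.
        for (Map.Entry<double[], String> entry : cache.entrySet()) {
            if (cosine(entry.getKey(), queryVec) >= threshold) {
                System.out.println("cache hit: " + entry.getValue());
                return;
            }
        }
        System.out.println("cache miss: call the LLM");
    }
}
```

On a hit the expensive inference call is skipped entirely, which is the whole point of the third caching level in the talk.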

48m watch time
