Building real-time personalization systems that stay under the 200ms latency threshold requires deliberate architectural choices. A two-tower approach separates fast candidate retrieval (under 20ms using vector search and HNSW indexing) from heavier AI ranking, so the expensive model scores only a short list of candidates instead of the entire catalog on every request.
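
To make the split concrete, here is a minimal sketch of a retrieve-then-rank request path. It rests on several assumptions that are not from the original text: the function names (`retrieve_candidates`, `rank_candidates`, `recommend`), the `item_popularity` feature, and the 0.8/0.2 score weights are all hypothetical. The retrieval stage uses a brute-force dot product as a stand-in for an HNSW index, so it shows the shape of the pipeline rather than its real latency.

```python
import heapq
import random
import time


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_candidates(user_vec, item_vecs, k):
    """Stage 1: cheap retrieval of the k most similar items.

    Brute force here; a production system would query an approximate
    nearest-neighbor index (for example HNSW) instead.
    """
    return heapq.nlargest(
        k, item_vecs, key=lambda item_id: dot(user_vec, item_vecs[item_id])
    )


def rank_candidates(user_vec, candidates, item_vecs, item_popularity):
    """Stage 2: heavier scoring, applied only to the shortlist.

    The weighted sum is a placeholder for a real ranking model.
    """
    def score(item_id):
        similarity = dot(user_vec, item_vecs[item_id])
        return 0.8 * similarity + 0.2 * item_popularity[item_id]

    return sorted(candidates, key=score, reverse=True)


def recommend(user_vec, item_vecs, item_popularity, k=50, n=10, budget_ms=200.0):
    """Run both stages and report whether the request met the latency budget."""
    start = time.perf_counter()
    candidates = retrieve_candidates(user_vec, item_vecs, k)
    ranked = rank_candidates(user_vec, candidates, item_vecs, item_popularity)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return ranked[:n], elapsed_ms, elapsed_ms <= budget_ms


if __name__ == "__main__":
    random.seed(0)
    dim = 16
    # Synthetic catalog: random embeddings and popularity scores (made-up data).
    catalog = {f"item-{i}": [random.gauss(0, 1) for _ in range(dim)] for i in range(5000)}
    popularity = {item_id: random.random() for item_id in catalog}
    user = [random.gauss(0, 1) for _ in range(dim)]

    top, elapsed_ms, within_budget = recommend(user, catalog, popularity)
    print(top, f"{elapsed_ms:.1f} ms", within_budget)
```

The point of the design is that `rank_candidates` only ever sees `k` items, so the cost of the ranking model is bounded by the shortlist size and not by the catalog size.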