Senior Software Engineer
Current
GenAI Efficiency: improve LLM efficiency to reduce latency, TPU/GPU cost, etc.
- Distributed KV cache to reduce cost and latency, including implicit prefix caching and explicit context caching; see https://ai.google.dev/gemini-api/docs/caching
- 2024 Q3 Perfy Gold Award for EEVEE (efficient estimation and verification with early exit) to speed up LLM inference.