2026-07-17, Chamber Hall B (S3B)
Most LLM tutorials end where production begins. When OpenAI returns a 429, when Claude’s latency spikes 10x, or when your streaming response dies mid-generation—what happens to your users?
This talk covers battle-tested architecture patterns for production LLM streaming, moving beyond simple API calls to resilient systems. We will explore multi-provider failover chains (OpenAI → Anthropic → local), circuit breakers specifically configured for AI workloads, and token-aware rate limiting that protects both latency and cost.
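As a rough illustration of the failover-plus-circuit-breaker idea, here is a minimal sketch in plain asyncio. It is not the speaker's implementation: the `Provider` signature, the `CircuitBreaker` class, `complete_with_failover`, and the two demo providers are all hypothetical names chosen for this example, and the thresholds are arbitrary. It handles whole responses only; recovering a stream that dies mid-generation would need extra logic that is not shown.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional, Tuple

# Hypothetical provider signature: takes a prompt, returns the completion text.
Provider = Callable[[str], Awaitable[str]]


@dataclass
class CircuitBreaker:
    """Opens after `max_failures` consecutive errors; allows calls again after `cooldown_s`."""
    max_failures: int = 3
    cooldown_s: float = 30.0
    failures: int = 0
    opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # Once the cooldown has elapsed, calls are let through again;
        # a further failure re-opens the breaker with a fresh timestamp.
        return time.monotonic() - self.opened_at >= self.cooldown_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic()


async def complete_with_failover(
    prompt: str,
    chain: List[Tuple[str, Provider, CircuitBreaker]],
    timeout_s: float = 10.0,
) -> Tuple[str, str]:
    """Try each provider in order; return (provider_name, text) from the first success."""
    errors = []
    for name, provider, breaker in chain:
        if not breaker.allow():
            errors.append(f"{name}: circuit open")
            continue
        try:
            # A per-call timeout turns a latency spike into a failover trigger.
            text = await asyncio.wait_for(provider(prompt), timeout=timeout_s)
        except Exception as exc:  # includes timeouts; cancellation is not swallowed
            breaker.record_failure()
            errors.append(f"{name}: {exc!r}")
            continue
        breaker.record_success()
        return name, text
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in providers for the demo (assumptions, not real SDK calls).
async def flaky_primary(prompt: str) -> str:
    raise ConnectionError("simulated 429")


async def healthy_fallback(prompt: str) -> str:
    return f"echo: {prompt}"
```

In a real system each provider callable would wrap an SDK or an abstraction layer such as LiteLLM. Keeping the breaker state per provider means a provider that keeps failing is skipped outright during its cooldown instead of adding a timeout to every request.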
You will learn framework-agnostic Python patterns using asyncio and LiteLLM for provider abstraction. We will examine real incident patterns—including the December 2025 Anthropic outage—and the architectural decisions that separate 99.5% availability from 99.9%.
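To put the availability figures in perspective, the arithmetic below converts a percentage into hours of downtime per year, and shows how a fallback provider changes the number. The function names are made up for this example, and the combined figure assumes provider failures are independent and failover itself never fails, which is optimistic for real systems.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_pct: float) -> float:
    """Expected unavailable hours per year at the given availability."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


def combined_availability_pct(*provider_pcts: float) -> float:
    """Availability when any one provider being up is enough.

    ASSUMPTION: failures are independent and failover is perfect.
    """
    p_all_down = 1.0
    for pct in provider_pcts:
        p_all_down *= 1 - pct / 100
    return 100 * (1 - p_all_down)


if __name__ == "__main__":
    for pct in (99.5, 99.9):
        print(f"{pct}% -> {downtime_hours_per_year(pct):.2f} h/year down")
    print(f"two 99.5% providers -> {combined_availability_pct(99.5, 99.5):.4f}%")
```

At 99.5% the budget is roughly 43.8 hours of downtime a year; at 99.9% it is roughly 8.76 hours, a fivefold difference.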
Nitish Agarwal is a technical leader with over 14 years of experience building and scaling engineering organizations. Currently serving as a Principal Engineer (previously Sr. Manager) at GoDaddy, he leads the AI and digital transformation initiatives for the company's customer care platform. His recent work focuses on deploying LLM-powered intelligent assistants and architecting customer data platforms that serve over 20 million customers globally.
Prior to GoDaddy, Nitish was a Product Owner at Balena in London, where he spearheaded IoT marketplace developments that reduced software adoption times by 90%. His career spans pivotal roles at major tech firms including Skyscanner, where he optimized flight search response times by 93% for 80 million monthly users, and Expedia Group, where he helped build the Vrbo engineering teams from the ground up in India.
Nitish holds an MBA from City University of London and a Master of Engineering from the University of Waterloo. A specialist in Cloud Architecture (AWS) and GenAI integration, he is passionate about bridging the gap between complex distributed systems and practical, revenue-driving product strategy.