Surviving LLM Traffic Spikes: Routing, Rate Limits, and Failover in Python EuroPython 2026

2026-07-16 13:05–13:35, S3B

Your team ships an AI feature and users love it. Then one viral post turns "normal load" into hundreds of LLM requests per second.

LLM calls don't behave like traditional API requests. They're slow (sometimes seconds), expensive, rate-limited by providers, and a single provider outage can take your entire feature down. You can't just "add more servers." You need a routing layer that knows where to send traffic, when to back off, and how to fail without taking everything with it.

In this talk, we'll walk through the LLM traffic routing architecture we built in Python at Manychat, where we serve AI-powered automation to thousands of Instagram and messaging accounts. Everything we'll show is running in production.

We'll cover the core gateway patterns for multi-provider LLM traffic, implemented using LiteLLM Router as a reference design.
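To make the pattern concrete, here is a minimal standard-library sketch of weighted routing with cooldown and failover. It is not the code from the talk and it does not use the LiteLLM API; `Deployment`, `WeightedRouter`, `allowed_fails`, `cooldown_s`, and the `send` callable are hypothetical names chosen for illustration.

```python
import random
import time
from dataclasses import dataclass


@dataclass
class Deployment:
    name: str                    # hypothetical provider label
    weight: float                # relative share of traffic
    failures: int = 0            # consecutive failures seen so far
    cooldown_until: float = 0.0  # clock value until which we skip it


class WeightedRouter:
    def __init__(self, deployments, allowed_fails=3, cooldown_s=30.0,
                 clock=time.monotonic, rng=None):
        self.deployments = list(deployments)
        self.allowed_fails = allowed_fails
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.rng = rng or random.Random()

    def healthy(self):
        # A deployment is eligible once its cooldown has expired.
        now = self.clock()
        return [d for d in self.deployments if d.cooldown_until <= now]

    def pick(self, exclude=()):
        # Weighted random choice among healthy, not-yet-tried deployments.
        candidates = [d for d in self.healthy() if d.name not in exclude]
        if not candidates:
            return None
        weights = [d.weight for d in candidates]
        return self.rng.choices(candidates, weights=weights, k=1)[0]

    def record_failure(self, deployment):
        # After `allowed_fails` consecutive failures, bench the deployment.
        deployment.failures += 1
        if deployment.failures >= self.allowed_fails:
            deployment.cooldown_until = self.clock() + self.cooldown_s
            deployment.failures = 0

    def call(self, send, prompt):
        # Try deployments until one succeeds or none are left.
        tried = set()
        last_error = None
        while True:
            deployment = self.pick(exclude=tried)
            if deployment is None:
                raise RuntimeError("no healthy deployment left") from last_error
            try:
                result = send(deployment.name, prompt)
            except Exception as exc:  # real code would catch narrower errors
                self.record_failure(deployment)
                tried.add(deployment.name)
                last_error = exc
                continue
            deployment.failures = 0
            return result
```

In this sketch a failing provider is skipped for the rest of the current request and, once it crosses the failure threshold, for every request until the cooldown expires, so a single outage degrades capacity instead of taking the feature down.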

By the end, you'll walk away with:

A weighted routing blueprint you can adapt to your own provider mix
Fallback and cooldown rules designed to survive real outages
Practical rate limiting (requests and tokens) with retry backoff
The monitoring baseline (latency, tokens, errors by provider, weight drift) to catch issues before they cascade
A checklist for rolling this out safely, incrementally
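As a rough illustration of the rate limiting item above (requests and tokens, with retry backoff), the following standard-library sketch pairs two token buckets with exponential backoff and full jitter. It is an assumption about how such a limiter could look, not the implementation shown in the talk; `TokenBucket`, `DualLimiter`, `rpm`, `tpm`, and `estimated_tokens` are hypothetical names.

```python
import random
import time


class TokenBucket:
    """Refills continuously at `rate` units per second, up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.level = capacity
        self.updated = clock()

    def available(self, amount):
        # Top up based on elapsed time, then check the budget.
        now = self.clock()
        elapsed = now - self.updated
        self.level = min(self.capacity, self.level + elapsed * self.rate)
        self.updated = now
        return self.level >= amount

    def take(self, amount):
        self.level -= amount


class DualLimiter:
    """Admits a call only if both the request and the token budget allow it."""

    def __init__(self, rpm, tpm, clock=time.monotonic):
        self.requests = TokenBucket(rpm / 60.0, rpm, clock)
        self.tokens = TokenBucket(tpm / 60.0, tpm, clock)

    def try_acquire(self, estimated_tokens):
        if self.requests.available(1) and self.tokens.available(estimated_tokens):
            self.requests.take(1)
            self.tokens.take(estimated_tokens)
            return True
        return False


def backoff_delays(retries, base=0.5, cap=8.0, rng=random.random):
    # Full jitter: each delay is uniform in [0, min(cap, base * 2**attempt)].
    return [rng() * min(cap, base * 2 ** attempt) for attempt in range(retries)]


def acquire_with_backoff(limiter, estimated_tokens, retries=4, sleep=time.sleep):
    # One immediate attempt, then up to `retries` attempts after backing off.
    if limiter.try_acquire(estimated_tokens):
        return True
    for delay in backoff_delays(retries):
        sleep(delay)
        if limiter.try_acquire(estimated_tokens):
            return True
    return False
```

Checking both budgets before spending either keeps a large prompt from consuming a request slot it cannot use. The jitter spreads retries out so that many clients hitting the limit at once do not all retry at the same moment.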

Expected audience expertise: Intermediate

Sergi Porta

Sergi Porta is a Python Team Lead at Manychat, where he leads the integration of AI features into the product. He assembled a team of Python developers from scratch and makes the architectural decisions behind a system that powers millions of interactions between accounts and their subscribers. His role combines technical leadership, onboarding, and helping the team grow. Previously, he was an Engineering Manager at a medical imaging company and the first engineer at an early-stage startup; he started his career as a software engineer at HP. He has 10+ years of experience across healthcare, consumer tech, and developer tooling, with a strong interest in engineering management and helping teams perform at their best.
