EuroPython 2025

Exploring LLM latency
2025-07-16, Main Hall B

Which LLM provider should you choose based on latency?

As LLMs become integral to modern applications, speed can make or break your UX. This poster highlights the performance trade-offs of OpenAI, Anthropic, local models, and other providers.

I will show how different models and providers perform across key metrics, helping you determine which option delivers the best user experience. Beyond the raw numbers, I will explain how each metric influences user perception.

We will measure:
- Time to first token
- Time to last token
- Latency variability throughout the day
- How structured responses affect performance
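
To make the first two metrics concrete, here is a minimal sketch of how time to first token and time to last token can be measured with a single streamed request. It assumes the `openai` Python client and uses `"gpt-4o-mini"` purely as an illustrative model name; the same idea applies to any provider that supports streaming.

```python
import time
from openai import OpenAI  # assumes the openai Python client is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def measure_latency(prompt: str, model: str = "gpt-4o-mini") -> dict:
    """Measure time to first token (TTFT) and time to last token (TTLT) for one streamed request."""
    start = time.perf_counter()
    first_token_at = None

    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Record the arrival of the first content-bearing chunk
        if first_token_at is None and chunk.choices and chunk.choices[0].delta.content:
            first_token_at = time.perf_counter()
    end = time.perf_counter()

    return {
        "time_to_first_token": first_token_at - start if first_token_at else None,
        "time_to_last_token": end - start,
    }


print(measure_latency("Name three Python web frameworks."))
```

Running this at different times of day, and with responses constrained to a structured format, gives the variability and structured-output measurements as well.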

We will evaluate:
- State-of-the-art models
- Open-source models hosted by cloud providers
- Local models you can run on your own infrastructure

What you will learn
- Which model to choose based on its latency characteristics
- How prompt caching affects performance
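
As a rough illustration of the prompt-caching point, the comparison below reuses the `measure_latency` helper from the earlier sketch: it sends two prompts that share a long prefix and compares their latencies. Whether and how the provider caches that prefix is provider-specific, so treat the "warm is faster" expectation as an assumption to verify, not a guarantee.

```python
# Hypothetical comparison: send two prompts with the same long prefix and compare latencies.
long_prefix = "You are a helpful assistant. " * 200  # long shared context to make caching visible

cold = measure_latency(long_prefix + "Summarise PEP 8 in one sentence.")
warm = measure_latency(long_prefix + "Summarise PEP 20 in one sentence.")

print("cold TTFT:", cold["time_to_first_token"])
print("warm TTFT:", warm["time_to_first_token"])  # often lower if the provider cached the prefix
```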

You’ll have a clearer view of which solution fits your needs and how to balance performance, costs, and practical considerations.


Expected audience expertise:

Intermediate

I work as a freelance Python developer, which gives me firsthand insights into the challenges companies face when implementing LLMs.

I enjoy exploring their performance across different languages, analyzing latency, and weighing cost considerations.

My experience spans a range of roles, from DevOps to Django developer, giving me a broad perspective on how these models can be effectively integrated into diverse workflows.