Pavel Král
I work as a freelance Python developer, which gives me firsthand insights into the challenges companies face when implementing LLMs.
I enjoy exploring their performance across different languages, analyzing latency, and weighing cost considerations.
My experience spans a range of roles, from DevOps engineer to Django developer, giving me a broad perspective on how these models can be effectively integrated into diverse workflows.
Session
Which LLM provider should you choose based on latency?
As LLMs become integral to modern applications, speed can make or break your UX. This poster highlights the performance trade-offs between OpenAI, Anthropic, locally hosted models, and other providers.
I will show how different models and providers perform across key metrics, helping you determine which option delivers the best user experience. Beyond the raw numbers, I will also show how each metric influences user perception.
We will measure (see the sketch after this list):
- Time to first token
- Time to last token
- Latency variability throughout the day
- How structured responses affect performance
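To make the first two metrics concrete, here is a minimal sketch of how such timings can be captured, assuming the official OpenAI Python SDK and a streamed chat completion; the model name and prompt are placeholders, and the same pattern carries over to other providers' streaming APIs.

```python
import time

from openai import OpenAI  # assumes the official openai SDK, v1+

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def measure_latency(model: str, prompt: str) -> dict:
    """Stream one completion and record time to first and last token."""
    start = time.perf_counter()
    first_token_at = None

    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Each chunk carries an incremental piece of the response text.
        delta = chunk.choices[0].delta.content if chunk.choices else None
        if delta and first_token_at is None:
            first_token_at = time.perf_counter()
    end = time.perf_counter()

    return {
        "time_to_first_token": (first_token_at - start) if first_token_at else None,
        "time_to_last_token": end - start,
    }


# Illustrative call; the model name is a placeholder.
# print(measure_latency("gpt-4o-mini", "Say hello in one sentence."))
```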
We will evaluate:
- State-of-the-art models
- Open-source models hosted by cloud providers
- Local models you can run on your own infrastructure
What you will learn
- Which model to use based on its latency
- How prompt caching affects performance (see the sketch below)
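As one illustration of the prompt-caching point, the sketch below marks a long system prompt as cacheable via Anthropic's Messages API and then inspects the cache-related token counts in the response. This is a sketch under the assumption of the official anthropic Python SDK with prompt caching available; the model name and prompt text are placeholders. Other providers, such as OpenAI, apply caching automatically for sufficiently long prompts, so no code change is needed there.

```python
import anthropic  # assumes the official anthropic SDK with prompt-caching support

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# A long, reusable system prompt is the typical caching candidate (placeholder text).
long_system_prompt = "You are a meticulous assistant. " * 200

response = client.messages.create(
    model="claude-3-5-haiku-latest",  # placeholder model name
    max_tokens=256,
    system=[
        {
            "type": "text",
            "text": long_system_prompt,
            # Mark this block as cacheable; repeated calls sharing the
            # same prefix can then be served from the cache.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize your instructions in one line."}],
)

# Usage metadata shows whether the prompt was written to or read from the cache.
print(response.usage.cache_creation_input_tokens, response.usage.cache_read_input_tokens)
```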
You’ll have a clearer view of which solution fits your needs and how to balance performance, costs, and practical considerations.