Pavel Král
I work as a freelance Python developer, which gives me firsthand insights into the challenges companies face when implementing LLMs.
I enjoy exploring their performance across different languages, analyzing latency, and weighing cost considerations.
My experience spans a range of roles, from DevOps engineer to Django developer, giving me a broad perspective on how these models can be effectively integrated into diverse workflows.
Session
Which LLM provider should you choose based on latency?
As LLMs become integral to modern applications, speed can make or break your UX. This poster highlights the performance trade-offs between OpenAI, Anthropic, locally hosted models, and other providers.
I will show how different models and providers perform across key metrics, helping you determine which option delivers the best user experience. Beyond the raw numbers, I will also show how each metric influences user perception.
We will measure (see the sketch after this list):
- Time to first token
- Time to last token
- Latency variability throughout the day
- How structured responses affect performance
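To make the first two metrics concrete, here is a minimal sketch of how such timings can be captured, assuming the official OpenAI Python SDK and a streamed chat completion; the model name and prompt are placeholders, and the same pattern carries over to other providers' streaming APIs.

```python
import time

from openai import OpenAI  # assumes the official openai SDK, v1+

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def measure_latency(model: str, prompt: str) -> dict:
    """Stream one completion and record time to first and last token."""
    start = time.perf_counter()
    first_token_at = None

    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Each chunk carries an incremental piece of the response text.
        delta = chunk.choices[0].delta.content if chunk.choices else None
        if delta and first_token_at is None:
            first_token_at = time.perf_counter()
    end = time.perf_counter()

    return {
        "time_to_first_token": (first_token_at - start) if first_token_at else None,
        "time_to_last_token": end - start,
    }


# Illustrative call; the model name is a placeholder.
# print(measure_latency("gpt-4o-mini", "Say hello in one sentence."))
```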
We will evaluate:
- State-of-the-art models
- Open-source models hosted by cloud providers
- Local models you can run on your own infrastructure
What you will learn
- Which model to use based on its latency
- How prompt caching affects performance (see the sketch below)
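As one illustration of the prompt-caching point, the sketch below marks a long system prompt as cacheable via Anthropic's Messages API and then inspects the cache-related token counts in the response. This is a sketch under the assumption of the official anthropic Python SDK with prompt caching available; the model name and prompt text are placeholders. Other providers, such as OpenAI, apply caching automatically for sufficiently long prompts, so no code change is needed there.

```python
import anthropic  # assumes the official anthropic SDK with prompt-caching support

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# A long, reusable system prompt is the typical caching candidate (placeholder text).
long_system_prompt = "You are a meticulous assistant. " * 200

response = client.messages.create(
    model="claude-3-5-haiku-latest",  # placeholder model name
    max_tokens=256,
    system=[
        {
            "type": "text",
            "text": long_system_prompt,
            # Mark this block as cacheable; repeated calls sharing the
            # same prefix can then be served from the cache.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize your instructions in one line."}],
)

# Usage metadata shows whether the prompt was written to or read from the cache.
print(response.usage.cache_creation_input_tokens, response.usage.cache_read_input_tokens)
```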
You’ll have a clearer view of which solution fits your needs and how to balance performance, costs, and practical considerations.