2025-07-16, Main Hall B
Which LLM provider should you choose based on latency?
As LLMs become integral to modern applications, speed can make or break your UX. This poster highlights the performance trade-offs of OpenAI, Anthropic, local models, and other providers.
I will show how different models and providers perform across key metrics, helping you determine which option delivers the best user experience. Beyond the raw numbers, I will also show how each metric influences user perception.
We will measure:
- Time to first token
- Time to last token (a short measurement sketch for both follows this list)
- Latency variability throughout the day
- How structured responses affect performance
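If you want to reproduce these numbers yourself, here is a minimal sketch of how time to first token and time to last token can be measured over a streaming call. It assumes the OpenAI Python SDK and an example model name; the same timing logic applies to any provider that streams tokens.

```python
import time

from openai import OpenAI  # assumes the openai Python SDK (>=1.0) is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def measure_latency(model: str, prompt: str) -> dict:
    """Time to first token and time to last token for one streamed completion."""
    start = time.perf_counter()
    first_token_at = None

    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Record the moment the first content chunk arrives.
        if first_token_at is None and chunk.choices and chunk.choices[0].delta.content:
            first_token_at = time.perf_counter()
    end = time.perf_counter()

    return {
        "time_to_first_token": (first_token_at or end) - start,
        "time_to_last_token": end - start,
    }


# Example usage; "gpt-4o-mini" is just an illustrative model name.
print(measure_latency("gpt-4o-mini", "Explain TCP slow start in two sentences."))
```

Running a loop like this at different hours of the day is also how the latency-variability numbers in the talk are collected.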
We will evaluate:
- State-of-the-art models
- Open-source models hosted by cloud providers
- Local models you can run on your own infrastructure (see the sketch after this list)
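The same timing code can be reused for locally hosted models when they expose an OpenAI-compatible endpoint. A hedged sketch, assuming a local runtime such as Ollama on its default port; the base URL and model name are illustrative and depend on your setup:

```python
from openai import OpenAI

# Point the same client at a local OpenAI-compatible server.
# Adjust base_url and model to whatever your runtime (Ollama, vLLM,
# llama.cpp server, ...) actually exposes; these values are assumptions.
local_client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="not-needed-locally",          # most local servers ignore the key
)

stream = local_client.chat.completions.create(
    model="llama3.1",  # example local model name
    messages=[{"role": "user", "content": "Say hello in one word."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```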
What you will learn:
- Which model to use based on its latency
- How prompt caching affects performance (illustrated in the sketch below)
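One way to see the caching effect is to time two calls that share a long, static prefix. The sketch below assumes Anthropic's prompt caching via cache_control; the model name and the size of the cached prefix are illustrative, and other providers expose caching differently or apply it automatically.

```python
import time

import anthropic  # assumes the anthropic Python SDK is installed

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# The long, stable system prompt is the part worth caching; the user turn changes.
long_context = "You are a support assistant. " + "Product manual text... " * 500


def timed_call(question: str) -> float:
    start = time.perf_counter()
    client.messages.create(
        model="claude-3-5-haiku-latest",  # example model name
        max_tokens=256,
        system=[
            {
                "type": "text",
                "text": long_context,
                # Mark the static prefix as cacheable so later calls can reuse it.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        messages=[{"role": "user", "content": question}],
    )
    return time.perf_counter() - start


print("cold:", timed_call("How do I reset the device?"))
print("warm:", timed_call("How do I update the firmware?"))  # typically faster once the prefix is cached
```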
You’ll have a clearer view of which solution fits your needs and how to balance performance, costs, and practical considerations.
Intermediate
I work as a freelance Python developer, which gives me firsthand insights into the challenges companies face when implementing LLMs.
I enjoy exploring their performance across different languages, analyzing latency, and weighing cost considerations.
My experience spans a range of roles, from DevOps to Django developer, giving me a broad perspective on how these models can be effectively integrated into diverse workflows.