BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//programme.europython.eu//europython-2026//speaker//QAHDB
 D
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T040000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-europython-2026-JQSQBB@programme.europython.eu
DTSTART;TZID=CET:20260716T112500
DTEND;TZID=CET:20260716T115500
DESCRIPTION:GPUs power almost every modern Python workload in machine learn
 ing\, vision\, and scientific computing. Yet most Python developers treat 
 GPUs like “faster CPUs” and hope frameworks will handle performance au
 tomatically.\nThat mental model is wrong and it is the main reason Python 
 GPU code underperforms by orders of magnitude.\n\nThis introductory talk b
 uilds the correct mental model for writing fast GPU code in Python. We wil
 l explore how modern GPUs are structured\, why global memory dominates exe
 cution time\, and how to determine whether a kernel is compute-bound or me
 mory-bound using the roofline model. Through concrete examples\, attendees
  will see why naive parallel kernels fail to scale and which hardware-awar
 e patterns (tiling\, data reuse\, and kernel fusion) actually lead to spee
 dups.\n\nThe session concludes with a practical look at how Python can exp
 ress these ideas using Triton\, showing how high-level code is compiled in
 to PTX and how Python can achieve near-CUDA performance when it respects 
 hardware constraints.
DTSTAMP:20260524T121707Z
LOCATION:Chamber Hall A (S3A)
SUMMARY:An Introduction to Writing Fast GPU Code in Python - Abhik Sarkar
URL:https://programme.europython.eu/europython-2026/talk/JQSQBB/
END:VEVENT
END:VCALENDAR