2026-07-16, Chamber Hall A (S3A)
GPU programming can be scary, but it doesn't need to be! Did you know you can access the full performance of CUDA purely in Python? The CUDA Python stack gives you a friendly interface for getting started with GPU acceleration.
In this example-driven talk, we'll begin with a general discussion of the CUDA model and how to manage accelerator devices in Python with cuda.core. Next, we'll teach you how to create arrays and launch work with CuPy. Then, you'll learn how to customize parallel algorithms with cuda.compute, write your own kernels that use cooperative algorithms with cuda.coop, and integrate seamlessly with accelerated libraries such as cuBLAS and cuDNN.
We'll look at a variety of parallel examples, from counting words to implementing softmax and row-wise reductions.
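To give a flavor of the softmax and row-wise reduction examples, here is a minimal sketch written against CuPy's NumPy-compatible array API. It is not taken from the talk materials. The helper names `get_array_module`, `softmax_rows`, and `to_numpy` are hypothetical, and the sketch assumes it is acceptable to fall back to NumPy on the CPU when CuPy or a usable GPU is not available.

```python
import numpy as np


def get_array_module():
    """Return CuPy if it is installed and a GPU is usable, else NumPy.

    Hypothetical helper for this sketch, not part of any CUDA Python API.
    """
    try:
        import cupy as cp
        cp.zeros(1)  # forces device initialization; raises if no GPU is usable
        return cp
    except Exception:
        return np


xp = get_array_module()


def softmax_rows(x):
    """Numerically stable softmax over each row of a 2-D array."""
    row_max = x.max(axis=1, keepdims=True)    # row-wise max reduction
    e = xp.exp(x - row_max)                   # shift so the largest exponent is 0
    return e / e.sum(axis=1, keepdims=True)   # row-wise sum reduction


def to_numpy(a):
    """Copy a CuPy array back to the host; pass NumPy arrays through."""
    return a.get() if hasattr(a, "get") else np.asarray(a)


logits = xp.asarray([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
probs = softmax_rows(logits)
row_sums = probs.sum(axis=1)  # each row of a softmax should sum to 1
```

Because the same array expressions run under either module, the function body stays identical whether the data lives on the GPU or the CPU; only the choice of `xp` changes.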
By the time the talk is over, you'll be ready to start accelerating your Python code with GPUs!
Bryce Adelstein Lelbach has spent over a decade developing programming languages, compilers, and libraries. He is passionate about parallel programming and strives to make it more accessible for everyone.
Bryce is a Principal Architect at NVIDIA, where he founded the Core C++ Compute Libraries team and now leads the Vanguard Programming group that drives NVIDIA's roadmap for programming languages, compilers, and core libraries.
He is a leader of the systems programming language community, having served as chair of the C++ Library Evolution group and of the US programming language standards committee. He has been an organizer and program chair for many conferences over the years. On the C++ committee, he has worked on concurrency primitives, parallel algorithms, senders, and multidimensional arrays.
He previously worked at Lawrence Berkeley National Laboratory and Louisiana State University. He is one of the founding developers of the HPX parallel runtime system.
Outside of work, Bryce is passionate about airplanes and watches. He lives in Midtown Manhattan with his girlfriend and dog.