Mateusz Modrzejewski

Mateusz Modrzejewski, PhD, is a software engineer, researcher, conference speaker, and author and co-author of papers on music information retrieval and audio AI. He is an Assistant Professor at the Institute of Computer Science of Warsaw University of Technology, where he leads an Audio Intelligence Lab. He previously worked at Apple (Music Machine Learning team, Apple Music) and has also worked with research and engineering teams of other Fortune 500 companies, providing AI solutions and analytics.

Apart from his scientific and engineering work, he is also an experienced touring musician, having performed for audiences of up to 150,000 people and having toured in Poland, China, Vietnam, the UK, Germany, Ukraine, Lithuania and Estonia, among others. The artists he has played with include The Dumplings, Grubson, Marek Dyjak, Chłopcy Kontra Basia, Maria Sadowska, Pablopavo i Ludziki, Majka Jeżowska and Michał Milczarek Trio.


Session

07-17
11:05
30min
How Music Generation Actually Works
Mateusz Modrzejewski

Music generation has gone from a research curiosity to something you can try in a browser. Commercial platforms and open-source models can produce full songs from a text prompt. Between the hype and the technical papers, it’s hard to get a straight answer about what’s actually going on under the hood. This talk is a clear, honest walkthrough of how music generation systems work, in plain language, with no deep machine learning knowledge needed.

We start with the core challenge: how do you turn a continuous audio signal into something a generative model can work with? Neural audio codecs solve this by compressing waveforms into sequences of discrete tokens, and this idea is the foundation everything else builds on. From there, we look at the two main modeling strategies: token prediction and diffusion. We compare what each does well, where it struggles, and why the choice between them matters.
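To make the tokenization idea concrete, here is a minimal sketch of the general principle, not of any real codec and not necessarily what the talk presents. It cuts a waveform into short frames and replaces each frame with the index of its nearest entry in a small codebook. Real neural audio codecs learn both the encoder and the codebook from data; here the codebook values, the frame size, and all function names are hypothetical and hand-picked purely for illustration.

```python
import math

# Hypothetical, hand-picked codebook: each entry is a 4-sample frame "shape".
# A real neural codec would learn these entries (and an encoder) from audio data.
FRAME_SIZE = 4
CODEBOOK = [
    [0.0, 0.0, 0.0, 0.0],      # silence
    [0.5, 0.5, 0.5, 0.5],      # flat positive
    [-0.5, -0.5, -0.5, -0.5],  # flat negative
    [0.0, 0.7, 1.0, 0.7],      # positive bump
    [0.0, -0.7, -1.0, -0.7],   # negative bump
]


def make_sine(freq_hz, sample_rate, num_samples):
    """Generate a sine wave as a list of floats in [-1, 1]."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(num_samples)]


def frame_signal(samples, frame_size):
    """Split samples into non-overlapping frames, dropping any partial tail."""
    usable = len(samples) - len(samples) % frame_size
    return [samples[i:i + frame_size] for i in range(0, usable, frame_size)]


def nearest_code(frame, codebook):
    """Return the index of the codebook entry closest to the frame (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(frame, codebook[k]))


def encode(samples, codebook, frame_size):
    """Continuous waveform -> sequence of discrete token ids."""
    return [nearest_code(f, codebook) for f in frame_signal(samples, frame_size)]


def decode(tokens, codebook):
    """Token ids -> coarse waveform, by concatenating codebook entries."""
    out = []
    for t in tokens:
        out.extend(codebook[t])
    return out


waveform = make_sine(440.0, 16000, 64)
tokens = encode(waveform, CODEBOOK, FRAME_SIZE)
reconstruction = decode(tokens, CODEBOOK)
print(tokens)
```

The resulting list of integers is the kind of sequence a token-prediction model could be trained on, in the same way a language model is trained on text tokens. The reconstruction is deliberately crude: with five fixed codebook entries most of the signal detail is lost, which is the gap that learned codebooks are meant to close.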

On the practical side, we walk through the open source models and Python tools available today, and what you can build with them. Then we get into evaluation, one of the most important open problems in the field. Current metrics only tell part of the story, and there is no standard benchmark for comparing systems. This has real consequences for how research moves forward and how models get used.

We close with a discussion that often gets skipped: how artists and musicians see these tools, what legal questions remain around training data and copyright, and why these conversations matter for the future of the field.

Python for Games, Art, Play and Expression
Theatre Hall (S2)