EuroPython 2025

Sharing is caring: Efficient Data Exchange with pyarrow
2025-07-17 , North Hall

Apache Arrow was designed with multiple goals in mind, one of the most important being the ability to exchange data between systems efficiently. In this talk we will explore what that really means and how the Arrow project's data exchange capabilities have evolved over the years.

We will cover how to share Arrow data in-process using the C Data Interface, C Device Interface, and C Stream Interface, along with the Arrow PyCapsule Interface. We will show examples of how popular dataframe libraries (pandas, polars) use those exchange methods.

We will also give an overview of the Arrow IPC (inter-process communication) format used to share Arrow data between processes, and show how to build your own network exchange on top of the Arrow format with Flight RPC. These overviews will be accompanied by Python examples.

By the end of the session, attendees will have a clear understanding of how pyarrow can be used to exchange data faster within and between their data applications. We will provide examples of each exchange method and share our tips on when to use which.


Expected audience expertise:

Advanced

My software development journey began with open source and the Apache Arrow project. In 2021, I made my first contribution to the Arrow R package, an experience that sparked my interest in software development and open-source collaboration. During my internship at Quansight, I was introduced to the Python DataFrame API standard, which deepened my understanding of interoperability challenges.

In 2022, after more than a year of contributions, I became an Apache Arrow committer, focusing primarily on the Python implementation. I continued my work as a PyArrow maintainer at Voltron Data until mid-2024.

Apache Arrow remains the project I’m most passionate about, and I’m still actively involved in its development as a freelancer.