2025-07-17, North Hall
Apache Arrow was designed with multiple goals in mind, one of the most important being the ability to exchange data between systems efficiently. In this talk we will explore what that really means and how the Arrow project's data exchange capabilities have evolved over the years.
We will cover how to share Arrow data in-process using the C Data Interface, the C Device Interface and the C Stream Interface, along with the Arrow PyCapsule Interface. We will show examples of how popular dataframe libraries (pandas, Polars) use these exchange methods.
We will also give an overview of the Inter-Process Communication (IPC) protocol used to share Arrow data between processes, and of how to build your own network exchange on top of the Arrow format with Flight RPC. Python examples will accompany these overviews.
By the end of the session, attendees will have a clear understanding of how pyarrow can be used to exchange data faster within and between their data applications. We will show examples of each approach and share our tips on when to use which.
Advanced
My software development journey started with open source and the Apache Arrow project. More specifically, I started by contributing to the Arrow R package in 2021. After that, I contributed to other open source projects connected to the Python dataframe API standard as part of my internship at Quansight. I became an Apache Arrow committer in 2022 after contributing regularly to Apache Arrow (Python) since 2021.