Alenka Frim
My software development journey began with open source and the Apache Arrow project. In 2021, I made my first contribution to the Arrow R package, an experience that sparked my interest in software development and open-source collaboration. During my internship at Quansight, I was introduced to the Python DataFrame API standard, which deepened my understanding of interoperability challenges.
In 2022, after more than a year of contributions, I became an Apache Arrow committer, focusing primarily on the Python implementation. I continued my work as a PyArrow maintainer at Voltron Data until mid-2024.
Apache Arrow remains the project I’m most passionate about, and I’m still actively involved in its development as a freelancer.
Session
pandas now natively supports PyArrow-backed data types. But what does that actually mean? If you've ever wondered how these two libraries relate to each other, whether they compete or complement each other, and what happens to your data when it moves between them, this talk is for you.
As PyArrow maintainers, we took on the challenge of digging into the conversion code between PyArrow and pandas, and we're here to share what we've learned. We'll show you what's really going on under the hood: how Arrow's columnar format differs from pandas' block-based memory layout (including what a BlockManager actually is), when data can be shared without copying, and when a full copy is unavoidable.
We'll also clarify what each library is designed for and how they work together rather than against each other. With pandas increasingly adopting PyArrow as a backend, understanding this relationship is becoming essential rather than optional.
This talk is aimed at Python developers and data engineers who want to deepen their understanding of what's happening beneath the surface.