Vlad-Stefan Harbuz

I work on software and philosophy that contribute to the public good. I've been a programmer for 19 years and my current focus is making our Open Source ecosystem healthier and more sustainable.

I'm a maintainer of the Open Source Pledge, which is creating a new social norm of companies paying the maintainers they depend on, and has raised $6,896,498 for maintainers since launching. I also advise the board of the Open Source Endowment.

Alongside my work in software, I am also a PhD researcher in philosophy at the University of Edinburgh, where I research the ethics and epistemology of Open Source software.

I think that being kind is important, and I love cats and birds. I write on vlad.website.


Session

07-16
15:50
30min
Binary Dependencies: Identifying the Hidden Packages We All Depend On
Vlad-Stefan Harbuz

Package manifests like pyproject.toml record source-level dependencies: pandas depends on numpy's code. The story is different for binary dependencies, which exist whenever compiled code, like C code, is called from Python. numpy depends on OpenBLAS's binaries, but this dependency relationship is not recorded anywhere. This makes OpenBLAS a phantom binary dependency.

Phantom dependencies are therefore hidden from programmers and researchers, which is bad for at least two reasons.

First, security. If one of your binary dependencies has a vulnerability, your project is probably vulnerable too, but you won't reliably find out, since the dependency is invisible.

Second, sustainability. If we can't keep track of our binary dependencies, we can't keep track of their maintainers either, which means we can't credit and financially support them. This can lead to maintainer burnout, which has already created serious supply chain issues.

Python is not only tremendously popular, but also valued for its ability to easily interface with compiled libraries. According to my research, around 20% of Python packages have binary dependencies.

This means that the problem of phantom binary dependencies is widespread and puts the public at risk of harm, e.g. if critical infrastructure like hospitals or transportation is compromised through the weaknesses described above.

I aim to describe how the problem of phantom binary dependencies can be fixed within the Python ecosystem, and demo some of my preliminary work.

First, binary dependencies must be identified. Tools like auditwheel and elfdeps are able to identify a project's required dynamic libraries. If we create better APIs for these tools, and integrate them with package managers such as pip and uv, we can give developers and researchers visibility into binary dependencies, dispelling the phantom.
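
As a rough illustration of the identification step, the sketch below lists the shared libraries bundled inside a wheel archive. It does not use auditwheel or elfdeps themselves. It relies only on the convention that Linux wheels repaired by auditwheel vendor their external libraries into a top-level `<package>.libs/` directory. The function name `bundled_libraries` is hypothetical, and a real tool would inspect the ELF headers rather than file names.

```python
import re
import zipfile
from pathlib import PurePosixPath

# Matches names such as "libfoo.so", "libfoo.so.3" or "libfoo-1a2b3c.so.0.3".
_SHARED_LIB = re.compile(r"\.so(\.\d+)*$")


def bundled_libraries(wheel_path):
    """Return the sorted file names of shared libraries vendored in a wheel.

    Assumption: vendored libraries live in a top-level directory whose name
    ends in ".libs" (the layout produced by `auditwheel repair`). Extension
    modules elsewhere in the wheel are deliberately ignored.
    """
    found = set()
    with zipfile.ZipFile(wheel_path) as wheel:
        for name in wheel.namelist():
            path = PurePosixPath(name)
            if len(path.parts) < 2:
                # Skip top-level files and bare directory entries.
                continue
            if path.parts[0].endswith(".libs") and _SHARED_LIB.search(path.name):
                found.add(path.name)
    return sorted(found)
```

Running `bundled_libraries` on a downloaded wheel file gives a first, name-based view of binary dependencies that the package metadata does not mention. It cannot see libraries that are expected to come from the operating system, which is where ELF-level tools are needed.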

Beyond this, standards like PEP 725, PEP 770 and PEP 804 specify how we might record binary dependency relationships in an easily accessible way. I'll explain how we can build on these standards to create tools that will allow users and researchers to explore binary dependencies and identify security issues by default.
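
To give a flavour of what building on these standards could look like, here is a minimal sketch that looks for Software Bill-of-Materials documents shipped with an installed package. It assumes the PEP 770 layout, in which SBOM files are placed in a `sboms/` directory inside the package's `.dist-info` directory. The helper names `is_sbom_path` and `sbom_files` are hypothetical, and many packages do not ship SBOMs yet, so an empty result is expected in most cases.

```python
from importlib import metadata
from pathlib import PurePosixPath


def is_sbom_path(path):
    """Return True if `path` is a file inside a `*.dist-info/sboms/` directory."""
    parts = PurePosixPath(str(path)).parts
    # Stop two short of the end so that a file name always follows "sboms".
    for i in range(len(parts) - 2):
        if parts[i].endswith(".dist-info") and parts[i + 1] == "sboms":
            return True
    return False


def sbom_files(distribution_name):
    """List the SBOM files recorded for an installed distribution.

    Returns an empty list if the distribution is not installed, has no
    file record, or ships no SBOM documents.
    """
    try:
        dist = metadata.distribution(distribution_name)
    except metadata.PackageNotFoundError:
        return []
    return sorted(str(f) for f in (dist.files or []) if is_sbom_path(f))
```

A tool that surfaces security issues by default could start from a list like this, parse each SBOM document, and match the components it names against vulnerability databases.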

Lastly, I'll discuss the road towards the ultimate aim: having binary dependencies managed by system package managers, as they should be, rather than by Python package managers. This will require interoperation between package managers, and I'll explain how this might work.

Tooling, Packaging, Developer Productivity
Theatre Hall (S2)