2025-07-16, South Hall 2B
Petabytes of unstructured data are the foundation on which successful Machine Learning (ML) models are built. Researchers commonly copy subsets of that data to their local environments for model training. This approach allows for iterative experimentation, but it also makes data management inefficient when developing machine learning models: results are hard to reproduce, data transfer is slow, and local compute power is limited.
Data version control technologies can help computer vision researchers overcome these challenges. In this workshop we'll cover:
- How to use open source tooling to version control your data when working with data locally.
- Best practices for working with data that remove the need to copy it locally while enabling you to train models at scale directly in the cloud. This will be demoed with an OSS stack:
- LangChain
- TensorFlow
- PyTorch
- Keras
You will come away with practical methods to improve your data management when developing and iterating upon Machine Learning models, built for modern computer vision research.
Intermediate
Tal Sofer is a product manager at Treeverse, the company behind lakeFS, an open-source platform that delivers a git-like experience to object-storage based data lakes. Tal is a former engineering manager who led engineering teams building scalable tools for developers and started her journey at Treeverse as an R&D team lead.
Tal holds a B.Sc. in Computer Science and Chinese studies from the Hebrew University of Jerusalem. In her free time you can find her running, cooking or brushing up on her Chinese.
Itai is a seasoned software engineer, passionate about clean code and design, and about simplifying what is complex. Itai does what’s needed, whether it’s backend, full-stack, or mobile development, and enjoys creating well-crafted products.