BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//programme.europython.eu//europython-2026//talk//GP39VP
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T040000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-europython-2026-GP39VP@programme.europython.eu
DTSTART;TZID=CET:20260717T125500
DTEND;TZID=CET:20260717T135500
DESCRIPTION:Over the last year\, LLM-based coding agents have matured to th
 e point where they can autonomously navigate codebases\, edit files\, run 
 tests\, and iterate on solutions with minimal human input. Many Python tea
 ms have started applying these agents to machine learning projects\, where
  the volume of repetitive experimental work makes automation appealing.\n\
 nHowever\, coding execution and ML optimization are not the same problem. 
 In software engineering\, success is often local and binary: a feature wor
 ks or it does not\, a bug is reproduced or fixed. In ML\, code correctness
  is only a prerequisite. Progress is determined by measured model behavior
  across repeated experiments\, and that measurement only happens after tra
 ining and evaluation are complete. This distinction creates a coordination
  gap. Coding agents can generate and modify training code\, but without ex
 ternal structure they drift over long experiment horizons. Teams encounter
  recurring failure modes: multiple coupled changes in a single step make r
 esults unattributable\, LLMs lose context and rediscover already tested hy
 potheses in slightly different wording\, etc.\n\nThis poster presents an a
 rchitecture that addresses the gap by pairing coding agents with a determi
 nistic (non-LLM) orchestrator. The orchestrator manages experimentation as
  a tree search. It starts from a reproducible baseline\, samples hypothesi
 s-driven modifications constrained to a single aspect per step\, evaluates
  each modification through a fixed entrypoint that returns multiple metric
 s\, and decides which branches to expand or prune based on statistical evi
 dence. Each experiment runs in an isolated git worktree\, so every branch 
 has a clean file state\, an independent diff\, and a separate log. An anti
 -repetition memory tracks previously attempted hypotheses across the tree 
 to prevent the agent from regenerating equivalent ideas. When stopping cri
 teria are met\, an integration stage combines winning branches and evaluat
 es the result.\n\nThis presentation is aimed at ML engineers and data sci
 entists who have experimented with (or are considering) coding agents for 
 their workflows. Attendees will learn how to use coding agents for ML task
 s and how to structure experiments as an LLM-searchable trajectory with de
 terministic controls rather than a sequence of ad hoc edits.
DTSTAMP:20260524T130630Z
LOCATION:Poster Hall B
SUMMARY:Why Coding Agents Fail at ML (and How to Fix It) - Olha Poliuliakh
URL:https://programme.europython.eu/europython-2026/talk/GP39VP/
END:VEVENT
END:VCALENDAR
