BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//programme.europython.eu//europython-2026//speaker//XCBLV
 D
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T040000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-europython-2026-JEN7WA@programme.europython.eu
DTSTART;TZID=CET:20260715T160500
DTEND;TZID=CET:20260715T163500
DESCRIPTION:Modern data systems rarely stay unchanged. Schemas evolve\, sea
 rch indices are rebuilt\, and for some period of time\, multiple versions 
 of the same dataset need to be kept in sync. A typical example is a zero-d
 owntime migration\, where several versions of the same data must be synchr
 onised in parallel while the system remains live. This creates a subtle bu
 t important challenge: how to keep each version consistent without duplica
 ting extraction work or increasing database load.\n\nIn this talk\, I’ll
  describe a production ETL architecture built in Python that processes mul
 tiple data versions in parallel using a single streaming pipeline. The sys
 tem synchronises data from PostgreSQL into OpenSearch\, keeps each version
  independently consistent\, and guarantees that no version ever moves back
 wards — while querying the database only once per batch. The talk is bas
 ed on a real production system and explains the design decisions and trade
 offs behind it.\n\nThe design is based on generator pipelines and function
 al composition using the functools module. Instead of relying on threads o
 r async frameworks\, the ETL flow is expressed as a sequence of small\, co
 mposable functions: page extraction\, DTO normalisation\, version-aware fi
 ltering\, transformation\, bulk loading\, and dead-letter handling. The re
 ference implementation uses Django as the ORM layer and Celery for orchest
 ration\, but the core design is not framework-specific and can be applied 
 equally with SQLAlchemy or raw SQL.\n\nI’ll show how this design makes i
 t possible to:\n1. Synchronise multiple versions efficiently without dupli
 cate database queries\n2. Process large datasets in a streaming\, memory-e
 fficient way\n3. Build extensible pipelines from protocol-defined function
 al stages\n4. Maintain a clear separation of concerns with strong typing a
 nd isolated tests\n5. Handle failures safely using bulk retries and dead-l
 etter queues\n\nAttendees will leave with concrete patterns for building f
 ast and maintainable ETL pipelines in Python\, and with a clearer understa
 nding of how generators and functional composition can be used to model co
 mplex data flows — borrowing ideas from Go-style concurrency while stayi
 ng entirely within the Python ecosystem. While the examples focus on ETL p
 ipelines\, the patterns discussed apply to any Python system that processe
 s large streams of data and needs to balance performance\, correctness\, a
 nd extensibility.\n\nAudience: Intermediate to advanced Python developers.
  Familiarity with generators and basic ETL concepts is helpful\; interest 
 in functional design patterns and backend data systems will be beneficial.
DTSTAMP:20260524T121641Z
LOCATION:Conference Hall Complex (S4)
SUMMARY:Fast Multi-Version ETL Pipelines in Python with Generators and func
 tools - Nikita Smirnov
URL:https://programme.europython.eu/europython-2026/talk/JEN7WA/
END:VEVENT
END:VCALENDAR
