
Decouple ALIS from SLIS

Marco Fossati requested to merge T374434 into main

The first step towards decoupling ALIS from SLIS: separate DAGs. We now have one DAG for ALIS and one for SLIS.

Important

ALIS now has a sensor that waits for the SLIS DAG to succeed, and times out after 3 days. The timeout needs production monitoring.

However, SLIS needs the output of the ALIS commons_index task, so it has its own sensor that waits for that task. In effect, the two DAGs now depend on each other, which isn't ideal. https://phabricator.wikimedia.org/T378004 tracks the next iteration.
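To illustrate the mutual dependency, here is a minimal sketch using Python's standard `graphlib`. It is not the DAG code from this MR. Apart from `commons_index`, the task names are hypothetical, and the task-level graph assumes that `commons_index` runs upstream of the ALIS sensor that waits for SLIS. Under that assumption the DAG-level view contains a cycle, while the task-level view can still be ordered.

```python
from graphlib import CycleError, TopologicalSorter

# Each key depends on every node in its set.

# DAG-level view of the two sensors described above.
dag_level = {
    "ALIS": {"SLIS"},  # ALIS sensor waits for the SLIS DAG to succeed
    "SLIS": {"ALIS"},  # SLIS sensor waits for the ALIS commons_index task
}

# Task-level view. Only commons_index is named in the description;
# all other task names are hypothetical, and the ordering is an assumption.
task_level = {
    "alis.commons_index": set(),
    "slis.wait_for_commons_index": {"alis.commons_index"},
    "slis.suggestions": {"slis.wait_for_commons_index"},
    "alis.wait_for_slis": {"slis.suggestions"},
    "alis.suggestions": {"alis.commons_index", "alis.wait_for_slis"},
}


def has_cycle(graph):
    """Return True if the dependency graph contains a cycle."""
    try:
        TopologicalSorter(graph).prepare()
    except CycleError:
        return True
    return False


def run_order(graph):
    """Return one valid execution order for an acyclic graph."""
    return list(TopologicalSorter(graph).static_order())


print("DAG-level cycle:", has_cycle(dag_level))
print("Task-level cycle:", has_cycle(task_level))
print("Task-level order:", run_order(task_level))
```

If the assumption holds, the cycle exists only when dependencies are read per DAG rather than per task, which is consistent with the runs completing despite the circular wording.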

This MR also:

  • schedules all main DAGs (ALIS, SLIS, section topics, and SEAL) on Thursdays
  • pulls out some default DAG properties, namely schedule, timeout, catchup, sensor poke interval, tags, and alerts email. They now live in dag_config
  • tries to tidy up code
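As a rough illustration of shared defaults like the ones listed above, a configuration module might look like the sketch below. This is not the actual content of dag_config. Every name and value is hypothetical, except that the schedule falls on Thursdays and the 3-day timeout matches the sensor timeout mentioned earlier. Applying that timeout here is itself an assumption.

```python
from datetime import timedelta

# Hypothetical defaults shared by all main DAGs.
DEFAULT_DAG_PROPERTIES = {
    # Cron day-of-week 4 is Thursday; the midnight start time is an assumption.
    "schedule": "0 0 * * 4",
    "catchup": False,                      # assumed value
    "tags": ["image-suggestions"],         # hypothetical tag
    "alerts_email": "alerts@example.org",  # placeholder address
}

# Hypothetical defaults shared by all sensors, expressed in seconds.
DEFAULT_SENSOR_PROPERTIES = {
    "timeout": timedelta(days=3).total_seconds(),
    "poke_interval": timedelta(minutes=15).total_seconds(),  # assumed value
}


def dag_properties(**overrides):
    """Return the shared DAG defaults, with per-DAG overrides applied on top."""
    return {**DEFAULT_DAG_PROPERTIES, **overrides}


# Example: a DAG that keeps every default except its tags.
alis_properties = dag_properties(tags=["image-suggestions", "alis"])
print(alis_properties)
```

Keeping the defaults in one place means a schedule change touches a single value rather than each DAG file.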

Test run

prod = spark.read.table('analytics_platform_eng.image_suggestions_suggestions').where('snapshot="2024-10-21"')
dev = spark.read.table('T374434.image_suggestions_suggestions').where('snapshot="2024-10-21"')
print('ALIS prod:', prod.where('section_index is null').count(), '- ALIS dev:', dev.where('section_index is null').count(), '- SLIS prod:', prod.where('section_index is not null').count(), '- SLIS dev:', dev.where('section_index is not null').count())

ALIS prod: 24302041 - ALIS dev: 24293783 - SLIS prod: 1287597 - SLIS dev: 1287597
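To make the comparison easier to read, the counts above can be turned into differences. The sketch below copies the four numbers from the test run output. The helper name `compare` is only illustrative.

```python
# Row counts copied from the test run output above.
counts = {
    "ALIS": {"prod": 24302041, "dev": 24293783},
    "SLIS": {"prod": 1287597, "dev": 1287597},
}


def compare(counts):
    """Return the absolute and relative prod-vs-dev difference per dataset."""
    report = {}
    for name, c in counts.items():
        diff = c["prod"] - c["dev"]
        report[name] = {"diff": diff, "relative": diff / c["prod"]}
    return report


for name, row in compare(counts).items():
    print(f"{name}: prod - dev = {row['diff']} ({row['relative']:.4%} of prod)")
```

SLIS matches exactly, while the dev ALIS output has 8258 fewer rows than prod, about 0.03% of the prod count.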

Bug: T374434

Bug: T350012

Edited by Marco Fossati
