Decouple ALIS from SLIS
The first step towards decoupling ALIS from SLIS: separate DAGs. We now have one DAG for ALIS and one for SLIS.
Important
ALIS now has a sensor that waits for the SLIS DAG to succeed and times out after 3 days. The timeout needs to be monitored in production.
However, SLIS needs the output of the ALIS commons_index task, so it has its own sensor that waits for that task.
The two DAGs now wait on each other, which effectively introduces a dependency cycle and isn't ideal.
https://phabricator.wikimedia.org/T378004 is the next iteration.
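The mutual wait described above can be illustrated with a small, self-contained sketch. It assumes the ALIS sensor sits downstream of commons_index. Only commons_index is named in this MR, so every other node name below is a hypothetical placeholder. Under that assumption the graph is cyclic when viewed per DAG, but still acyclic when viewed per task, which is why the runs can complete at all.

```python
from graphlib import CycleError, TopologicalSorter

# DAG-level view: each DAG maps to the set of DAGs it waits on.
dag_level = {
    "alis": {"slis"},  # ALIS sensor waits for the SLIS DAG to succeed
    "slis": {"alis"},  # SLIS sensor waits for the ALIS commons_index task
}

# Task-level view: each node maps to the nodes it waits on.
# Assumption: the ALIS sensor is placed after commons_index.
# All names except commons_index are hypothetical placeholders.
task_level = {
    "alis.commons_index": set(),
    "slis.wait_for_commons_index": {"alis.commons_index"},
    "slis.all_tasks": {"slis.wait_for_commons_index"},
    "alis.wait_for_slis": {"slis.all_tasks"},
    "alis.downstream_tasks": {"alis.wait_for_slis"},
}


def has_cycle(graph):
    """Return True if the dependency graph cannot be topologically sorted."""
    try:
        tuple(TopologicalSorter(graph).static_order())
    except CycleError:
        return True
    return False


print("DAG-level cycle:", has_cycle(dag_level))
print("Task-level cycle:", has_cycle(task_level))
```

If the ALIS sensor were instead placed upstream of commons_index, the task-level graph would also be cyclic and both DAGs would wait until their sensors time out.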
This MR also ...
- schedules all main DAGs (ALIS, SLIS, section topics, and SEAL) on Thursdays
- pulls out some default DAG properties, i.e., schedule, timeout, catchup, sensor poke interval, tags, and alerts email. They now live in dag_config
- tries to tidy up code
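As a rough illustration of the shared-defaults idea in the list above, here is a minimal sketch of what a module like dag_config could look like. It is not the actual file: the key names, the helper dag_properties, and every value (time of day, poke interval, tags, email address) are assumptions made for the example. Only the Thursday schedule and the 3-day timeout come from this description.

```python
from datetime import timedelta

# Hypothetical stand-in for dag_config; all values are illustrative.
DEFAULTS = {
    "schedule": "0 0 * * 4",              # cron: Thursdays (00:00 is assumed)
    "timeout": timedelta(days=3),         # matches the 3-day sensor timeout
    "catchup": False,                     # assumed
    "poke_interval": timedelta(hours=1),  # assumed
    "tags": ["image-suggestions"],        # assumed
    "alerts_email": "alerts@example.org", # placeholder address
}


def dag_properties(**overrides):
    """Return the shared defaults with per-DAG overrides applied on top."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown DAG properties: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


# Each DAG starts from the same defaults and overrides only what differs.
alis = dag_properties(tags=["image-suggestions", "alis"])
print(alis["schedule"], alis["timeout"], alis["tags"])
```

Keeping the defaults in one place means a schedule change touches a single module rather than every DAG definition.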
Test run
prod = spark.read.table('analytics_platform_eng.image_suggestions_suggestions').where('snapshot="2024-10-21"')
dev = spark.read.table('T374434.image_suggestions_suggestions').where('snapshot="2024-10-21"')
print(
    'ALIS prod:', prod.where('section_index is null').count(),
    '- ALIS dev:', dev.where('section_index is null').count(),
    '- SLIS prod:', prod.where('section_index is not null').count(),
    '- SLIS dev:', dev.where('section_index is not null').count(),
)
ALIS prod: 24302041 - ALIS dev: 24293783 - SLIS prod: 1287597 - SLIS dev: 1287597
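To quantify the gap between the two runs, the counts above can be compared directly. This is a small sketch that hard-codes the numbers from the test run output. The helper name compare is made up for the example.

```python
# Row counts copied from the test run output above.
counts = {
    "ALIS": {"prod": 24302041, "dev": 24293783},
    "SLIS": {"prod": 1287597, "dev": 1287597},
}


def compare(prod, dev):
    """Return (absolute difference, difference as a percentage of prod)."""
    diff = prod - dev
    return diff, 100 * diff / prod


for name, c in counts.items():
    diff, pct = compare(c["prod"], c["dev"])
    print(f"{name}: prod - dev = {diff} rows ({pct:.3f}% of prod)")
```

With these counts, SLIS matches exactly and ALIS dev has 8258 fewer rows than prod.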
Bug: T374434
Bug: T350012