T370851-T376721 External task sensor and update paths
Contributor checklist
- I have written tests for this DAG that will be merged into data-engineering/airflow-dags/tests/wmde
- I have locally run the above tests and code quality checks as outlined in the tests section of the Airflow DAGs project readme
- I have tested the jobs for this DAG in my local database using the process defined in:
  Note: Tests were run for previous MRs and do not need to be rerun, as there are no changes to the job queries.
- I have tested the included DAGs in my local database using the process outlined in TEST_AIRFLOW_DAGS.md and the test variable files provided for each DAG
  Box is checked for the tests for T370851.
  Note: T376721 only changes some file paths, so going to production is acceptable: in the worst case, the HQL file that generates the CSV isn't accessed and thus no data is exported.
- All Hive tables that are needed by the included DAG jobs have been created and are accessible by the analytics-wmde Airflow user
  All tables were created for prior MRs.
Description
This MR contains work for two different tasks:
- T370851: It was found that the `wd_query_segments_daily` DAG will not pick up data for every hour within the partitioned database, as it is possible that the Wikidata Query Service has no requests in an hour. We thus need to switch the base sensor over to an `ExternalTaskSensor`.
- T376721: Some of the identifiers for the previously deployed `wd_rest_api_metrics_daily` and `wd_item_sitelink_segments_weekly` DAG jobs need to be updated to be more standardized. Specifically, `TASK_ID_gen_csv_DAG_INTERVAL` and `create_DAG_ID_table` should be renamed `gen_csv_DAG_ID` and `create_table_DAG_ID` respectively to ensure that the DAG ID is always unbroken and that jobs outside the main DAG job query are always prepended to the DAG ID.
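
As a rough illustration of the T376721 naming convention, the sketch below builds the standardized identifiers from a DAG ID. The helper name `standardized_task_ids` and the dictionary keys are hypothetical and are not taken from the repository; only the `gen_csv_DAG_ID` and `create_table_DAG_ID` patterns come from this MR.

```python
def standardized_task_ids(dag_id: str) -> dict[str, str]:
    """Return auxiliary job identifiers that keep the DAG ID unbroken.

    Hypothetical helper for illustration only: the job type is
    prepended to the full DAG ID rather than inserted into it.
    """
    return {
        "gen_csv": f"gen_csv_{dag_id}",
        "create_table": f"create_table_{dag_id}",
    }


# The two DAGs named in this MR.
for dag_id in ("wd_rest_api_metrics_daily", "wd_item_sitelink_segments_weekly"):
    ids = standardized_task_ids(dag_id)
    print(ids["gen_csv"], ids["create_table"])
```

With this pattern the full DAG ID always appears as a single contiguous suffix, so the jobs belonging to a given DAG can be found by searching for its ID.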
Test outputs
T370851 job to show the new sensor works:
day | total_wdqs_queries | total_single_entity | total_all_entity_statements | total_single_term_statement | total_single_inverse_statement | total_single_instance_or_subclass_statement | total_single_known_relation_statement | total_single_unknown_relation_statement | total_complex_queries |
---|---|---|---|---|---|---|---|---|---|
2024-10-17 | 10747313 | 79493 | 16465 | 99509 | 666507 | 94454 | 76772 | 17461 | 9696652 |
T376721: Jobs are not changed, so testing isn't required.