Contributor checklist
- I have written tests for this DAG that will be merged into data-engineering/airflow-dags/tests/wmde
- I have locally run the above tests and code quality checks as outlined in the tests section of the Airflow DAGs project readme
- I have tested the jobs for this DAG in my local database using the process defined in:
  - `wd_query_segments_daily`: wmde/analytics/hql/airflow_jobs/wd_query_segments/_test_daily
  - `wd_device_type_edits_daily`: wmde/analytics/hql/airflow_jobs/wd_device_type_edits/_test_daily
  - `wd_rest_api_user_agents_monthly`: wmde/analytics/hql/airflow_jobs/wd_rest_api_user_agents/_test_monthly
- I have tested the included DAGs in my local database using the process outlined in TEST_AIRFLOW_DAGS.md and the test variable files provided for each DAG:
  - `wd_query_segments_daily`
  - `wd_device_type_edits_daily`
  - `wd_rest_api_user_agents_monthly`
- All Hive tables and HDFS directories that are needed by the included DAG jobs have been created and are accessible by the `analytics-wmde` Airflow user
  - Hive
    - wmde.wd_query_segments_daily
    - wmde.wd_device_type_edits_daily
    - wmde.wd_rest_api_user_agents_monthly
  - HDFS
    - Not needed for this MR
    - A follow-up MR will add exports to the published datasets for `wd_query_segments_daily` and `wd_device_type_edits_daily` once these DAGs have been deployed and the data release has been approved
Description
This MR adds three new DAGs to the `wmde` Airflow instance:
- `wd_query_segments_daily`: Segments Wikidata Query Service queries each day into a partition based on query triple characteristics
- `wd_device_type_edits_daily`: Daily metrics of Wikidata edits by device type, covering mobile, desktop and other
- `wd_rest_api_user_agents_monthly`: Monthly export of WD REST API user agents and their total requests, for easy inspection by WMDE Product
Test outputs
wd_query_segments_daily
day | total_wdqs_queries | total_single_entity | total_all_entity_statements | total_single_term_statement | total_single_inverse_statement | total_single_instance_or_subclass_statement | total_single_known_relation_statement | total_single_unknown_relation_statement | total_complex_queries |
---|---|---|---|---|---|---|---|---|---|
2024-09-01 | 8833718 | 74316 | 17143 | 284645 | 437955 | 311713 | 44941 | 33178 | 629827 |
wd_device_type_edits_daily
day | total_wd_edits | total_edits_mobile | total_edits_desktop | total_edits_other |
---|---|---|---|---|
2024-09-01 | 362925 | 985 | 71214 | 290726 |
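
As a quick sanity check on the row above, the three device columns can be summed and compared against `total_wd_edits`. The sketch below is not part of the DAG or its test suite. It assumes that mobile, desktop and other are meant to cover every edit, and the helper name `device_totals_match` is hypothetical.

```python
# Test output row for wd_device_type_edits_daily, copied from the table above.
row = {
    "day": "2024-09-01",
    "total_wd_edits": 362925,
    "total_edits_mobile": 985,
    "total_edits_desktop": 71214,
    "total_edits_other": 290726,
}


def device_totals_match(row: dict) -> bool:
    """Return True when the per-device counts add up to the overall edit count.

    Assumption: mobile, desktop and other are mutually exclusive and together
    cover all edits. `device_totals_match` is a hypothetical helper name.
    """
    parts = (
        row["total_edits_mobile"]
        + row["total_edits_desktop"]
        + row["total_edits_other"]
    )
    return parts == row["total_wd_edits"]


print(device_totals_match(row))  # prints True: 985 + 71214 + 290726 == 362925
```

The same check could be run on each day of a larger test window to catch edits that fall outside the three device categories.
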
wd_rest_api_user_agents_monthly
Running the tests produced the expected data: user agents and their total WD REST API requests for the period.