Restructure output datasets
Generalize output formats. Metrics are now generated at these four aggregation levels:

- `metrics_by_category`: metrics for content gap categories (e.g. the female category of the gender gap)
- `metrics_by_content_gap`: metrics for content gaps (e.g. aggregated across all categories)
- `metrics_by_category_all_wikis`: metrics across all wikis per category
- `metrics_by_content_gap_all_wikis`: metrics across all wikis per content gap
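The relationship between the four aggregation levels can be illustrated with a minimal sketch. It assumes that `metrics_by_category` and `metrics_by_content_gap` are computed per wiki and that the `_all_wikis` variants drop the wiki from the grouping key. The toy rows and the `aggregate` helper are hypothetical, and a plain sum stands in for the real metrics.

```python
from collections import defaultdict

# Hypothetical toy input: (wiki, content gap, category, article count).
rows = [
    ("enwiki", "gender", "female", 120),
    ("enwiki", "gender", "male", 480),
    ("frwiki", "gender", "female", 30),
    ("frwiki", "gender", "male", 70),
]


def aggregate(rows, key_fn):
    """Sum the metric over all rows that share the same grouping key."""
    totals = defaultdict(int)
    for wiki, gap, category, value in rows:
        totals[key_fn(wiki, gap, category)] += value
    return dict(totals)


# One grouping key per aggregation level (assumed semantics, see lead-in).
metrics_by_category = aggregate(rows, lambda w, g, c: (w, g, c))
metrics_by_content_gap = aggregate(rows, lambda w, g, c: (w, g))
metrics_by_category_all_wikis = aggregate(rows, lambda w, g, c: (g, c))
metrics_by_content_gap_all_wikis = aggregate(rows, lambda w, g, c: g)

print(metrics_by_content_gap_all_wikis)  # {'gender': 700}
```

Each coarser level is a regrouping of the same rows under a shorter key, so the totals stay consistent across all four outputs.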
Additional changes
- Improved configuration, including a new `sub_content_gaps` argument to select which content gaps to compute
- Updated the validation notebook to use the new output files
- Bumped version to 0.3.0
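A selection argument like `sub_content_gaps` might behave roughly as follows. This is only a sketch: the registry `CONTENT_GAP_FUNCTIONS`, the gap names, and `compute_gaps` are hypothetical, and it assumes that omitting the argument means "compute every gap".

```python
from typing import Callable, Dict, List, Optional

# Hypothetical registry mapping gap names to their computations.
CONTENT_GAP_FUNCTIONS: Dict[str, Callable[[], str]] = {
    "gender": lambda: "gender metrics",
    "geography": lambda: "geography metrics",
    "time": lambda: "time metrics",
}


def compute_gaps(sub_content_gaps: Optional[List[str]] = None) -> Dict[str, str]:
    """Compute all gaps by default, or only those named in sub_content_gaps."""
    if sub_content_gaps is None:
        selected = list(CONTENT_GAP_FUNCTIONS)
    else:
        selected = sub_content_gaps
    # Fail early on typos instead of silently skipping a gap.
    unknown = set(selected) - set(CONTENT_GAP_FUNCTIONS)
    if unknown:
        raise ValueError(f"unknown content gaps: {sorted(unknown)}")
    return {gap: CONTENT_GAP_FUNCTIONS[gap]() for gap in selected}


print(compute_gaps(["gender"]))  # {'gender': 'gender metrics'}
```

Validating the selection up front keeps a misspelled gap name from producing an incomplete run that looks successful.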