Add aggregation and content gap pipelines
- added aggregation pipeline (gender/sexual orientation/geographic)
- aggregation pipeline generates a dataset with the following schema (not prescriptive): https://docs.google.com/document/d/1Z-EpXMnfzHAp-M5vdQ-NbFkXx3tQRMQpeZyYwBaK4xU/edit?usp=sharing
- added content gap pipeline (previously under interactive/ directory); some function names modified in func.py and util.py
- deleted unnecessary code
- tested on a small dataset using spark2-submit --master local on the stat machine