The Dataset Nutrition Label aims to create a standard for interrogating datasets for measures that will ultimately drive the creation of better, more inclusive machine learning models. Our current prototype includes several ‘modules’ covering a variety of qualitative and quantitative information that we believe is useful for exploring several aspects of a dataset before model development.

We developed this Label on ProPublica’s Dollars for Docs (2013-2015) dataset, which details payments made by pharmaceutical companies to doctors. You can navigate through the modules using the links on the left.
To learn more, please visit our website, read our paper abstract, or email us at nutrition@media.mit.edu.