Garbage In, Garbage Out? Do Machine Learning Application Papers in Social Computing Report Where Human-Labeled Training Data Comes From?
Published in Proceedings of the 2020 ACM Conference on Fairness, Accountability, and Transparency (FAT* 2020), 2019
Many machine learning projects for new application areas involve teams of humans who label data for a particular purpose, from hiring crowdworkers to the paper’s authors labeling the data themselves. In this paper, we investigate to what extent a sample of machine learning application papers in social computing – specifically papers from ArXiv and traditional publications performing an ML classification task on Twitter data – give specific details about whether best practices in human annotation were followed.