Abstract
The lack of annotated datasets for training and benchmarking is one of the main challenges of Clinical Natural Language Processing. In addition, current methods for collecting annotations attempt to minimize disagreement between annotators, and therefore fail to model the ambiguity inherent in language. In this presentation, I will discuss the CrowdTruth method for collecting medical ground truth through crowdsourcing, based on the observation that disagreement between annotators can be used to capture ambiguity in text. I will present the results of an experiment in training a classification model for relation extraction. Our findings show that the crowd can perform at least as well as medical experts when training over two difficult relations (treats and cause), while also outperforming automated relation extraction with distant supervision. Finally, I will discuss preliminary work on expanding this experiment to open-domain relation extraction.
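To make the disagreement-aware labeling idea concrete, the sketch below shows one simplified way to turn conflicting crowd annotations into soft sentence-relation scores rather than forcing a single majority label. It is a minimal illustration in the spirit of CrowdTruth, not the exact metrics used in the experiments; the relation names, sentences, and worker votes are invented for demonstration.

```python
import numpy as np

RELATIONS = ["treats", "cause", "none"]

# Hypothetical raw crowd data: each worker marks the relations they see
# expressed between the two medical terms in a sentence.
crowd_annotations = {
    "sent_1": [{"treats"}, {"treats"}, {"treats", "cause"}, {"none"}],
    "sent_2": [{"cause"}, {"none"}, {"cause"}, {"cause"}],
}

def relation_scores(worker_votes, relations=RELATIONS):
    """Aggregate worker selections into soft labels: each relation's score is
    the cosine similarity between the summed vote vector and that relation's
    unit vector, so disagreement produces graded scores instead of being
    discarded."""
    votes = np.zeros(len(relations))
    for selection in worker_votes:
        for rel in selection:
            votes[relations.index(rel)] += 1
    norm = np.linalg.norm(votes)
    return {rel: float(votes[i] / norm) if norm else 0.0
            for i, rel in enumerate(relations)}

if __name__ == "__main__":
    for sent, workers in crowd_annotations.items():
        # Soft scores like these can serve as training targets for a
        # relation-extraction classifier instead of hard expert labels.
        print(sent, relation_scores(workers))
```

A sentence on which all workers agree yields a score near 1 for one relation and 0 for the rest, while an ambiguous sentence spreads its mass across several relations, which is exactly the signal the classifier is trained on in the disagreement-aware setting.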