Abstract
I will present the CrowdTruth (http://crowdtruth.org/) approach to performing relation extraction from medical data. CrowdTruth exploits inter-annotator disagreement as a useful signal, allowing us to measure quality along three dimensions: the data (e.g., ambiguity and vagueness at the sentence level), the workers, and the target semantics. I will introduce a workflow for generating gold standard annotations for medical relation extraction through a series of crowdsourcing tasks. I will then present an evaluation of the crowd data against the current gold standard in medical relation extraction: a relation extraction classifier is trained on each dataset, and the resulting F1 scores and accuracy are compared in a cross-validation experiment.
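To illustrate how disagreement can serve as a signal rather than noise, the sketch below computes a disagreement-aware sentence-relation score in the spirit of CrowdTruth: each worker's annotation for a sentence is a binary vector over candidate relations, the vectors are summed into a sentence vector, and the score for a relation is the cosine similarity between the sentence vector and that relation's unit vector. The exact CrowdTruth metrics are defined in the project's publications; the function names and data here are purely illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def sentence_vector(worker_vectors):
    """Sum per-worker binary relation vectors into one sentence vector."""
    return [sum(col) for col in zip(*worker_vectors)]

def sentence_relation_score(worker_vectors, relation_index, n_relations):
    """Cosine between the sentence vector and the relation's unit vector.
    High scores mean workers agree the relation is expressed; low scores
    flag ambiguous or vague sentences."""
    sv = sentence_vector(worker_vectors)
    unit = [1 if i == relation_index else 0 for i in range(n_relations)]
    return cosine(sv, unit)

# Hypothetical example: 5 workers, relations = [treats, causes, other].
clear = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 1, 0]]
ambiguous = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]
print(round(sentence_relation_score(clear, 0, 3), 3))      # strong agreement on "treats"
print(round(sentence_relation_score(ambiguous, 0, 3), 3))  # disagreement lowers the score
```

A sentence where most workers choose the same relation yields a score near 1, while a sentence whose annotations are spread across relations yields a lower score, quantifying its ambiguity at the sentence level.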