Title : Crowdsourcing Ambiguity-Aware Ground Truth - a Cross-Task Evaluation

Presenter Anca Dumitrache
Abstract The process of gathering ground truth data through human annotation is a major bottleneck in the use of information extraction methods for populating the Semantic Web. Crowdsourcing-based approaches are gaining popularity as a way to address the volume of data and the shortage of annotators. Typically, these practices use inter-annotator agreement as a measure of quality. However, in many domains, such as event detection, ambiguity in the data and multiple perspectives on the same examples are pervasive. In this talk, I will present the CrowdTruth methodology for efficiently gathering ground truth data, as it applies to a number of diverse use cases that cover a variety of domains and annotation tasks. Central to this approach is the use of CrowdTruth metrics, which capture inter-annotator disagreement. I will present the results of comparing the quality of data aggregated with CrowdTruth metrics against data aggregated by majority vote, over a set of diverse crowdsourcing tasks: medical relation extraction, Twitter event identification, news event extraction and sound interpretation. This evaluation shows that capturing and interpreting disagreement is essential for acquiring a high-quality ground truth. The experiments also show that increasing the number of crowd workers leads to growth and stabilization in annotation quality, contrary to the usual practice of employing a small number of annotators.
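To make the contrast between majority vote and disagreement-aware aggregation concrete, here is a minimal sketch. It assumes each worker selects a set of labels for a unit (e.g. one sentence), and it computes a simplified, unweighted version of a CrowdTruth-style unit-annotation score: the cosine similarity between the unit vector (the workers' annotation vectors summed) and each label's one-hot vector. The full CrowdTruth metrics are more involved (for example, they also account for worker quality), so this is an illustration, not the methodology presented in the talk. The label names "cause" and "treat" are hypothetical.

```python
import math
from collections import Counter


def unit_vector(worker_judgments):
    """Sum the workers' binary annotation vectors for one unit.

    Each judgment is the set of labels one worker selected (hypothetical
    input format); a worker contributes at most 1 to each label.
    """
    vec = Counter()
    for labels in worker_judgments:
        vec.update(set(labels))
    return vec


def majority_vote(worker_judgments):
    """Baseline: keep only the labels selected by more than half of the workers."""
    vec = unit_vector(worker_judgments)
    n = len(worker_judgments)
    return {label for label, count in vec.items() if count > n / 2}


def unit_annotation_scores(worker_judgments):
    """Simplified, unweighted CrowdTruth-style score per label.

    The cosine between the unit vector and a label's one-hot vector
    reduces to count(label) / ||unit vector||.
    """
    vec = unit_vector(worker_judgments)
    norm = math.sqrt(sum(c * c for c in vec.values()))
    return {label: count / norm for label, count in vec.items()}


# Two units that majority vote cannot tell apart.
clear = [{"cause"}] * 5
ambiguous = [{"cause"}, {"cause"}, {"cause"}, {"treat"}, {"treat"}]

print(majority_vote(clear), majority_vote(ambiguous))
print(unit_annotation_scores(clear))
print(unit_annotation_scores(ambiguous))
```

Majority vote returns {"cause"} for both units, while the scores separate the clear-cut unit ("cause" scores 1.0) from the ambiguous one ("cause" scores about 0.83 and "treat" about 0.55). Keeping that graded signal, rather than collapsing it to a single label, is the kind of disagreement information the abstract argues is essential.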

Title : Constructing Disease-centric Knowledge Graphs: a case study for depression

Presenter Zhisheng Huang
Abstract A large number of medical knowledge sources have been converted to knowledge graphs, covering everything from drugs to trials and from vocabularies to gene-disease associations. These knowledge graphs are typically generic, covering very large areas of medicine (e.g. all of internal medicine, or arbitrary drugs and arbitrary trials). As a result, they become prohibitively large, hampering both efficiency for machines and usability for people. In this talk we show how we used multiple large knowledge sources to construct a much smaller knowledge graph that is focused on a single disease (in our case, major depressive disorder). Such a disease-centric knowledge graph makes it more convenient for doctors (in our case, psychiatrists) to explore the relationships among various knowledge resources and to answer realistic clinical queries.