Description

Title: Crowdsourcing for Distant Supervision with Active Learning
Abstract: Distant supervision for relation extraction generates many false positives in the training data for natural language processing (NLP) models. Crowdsourcing is effective at gathering ground truth for training NLP systems, but it can also be quite expensive. Active learning optimizes crowdsourcing by picking the examples that are most representative or most likely to need correction. In this talk, I will discuss ongoing work on predicting which distant supervision seeds are likely to be false positives and having them annotated by the crowd. Compared to annotating a random sub-sample, we expect our active learning method to provide higher-quality training data and result in better performance of our relation extraction model.
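
To make the selection step concrete, here is a minimal Python sketch of the kind of active learning loop described above, assuming a classifier that estimates how likely each seed is to be a false positive. All names here (Seed, score_false_positive, select_for_crowd, k) are hypothetical illustrations, not part of the actual system being presented.

    import random
    from dataclasses import dataclass

    @dataclass
    class Seed:
        pair: tuple            # (entity1, entity2) matched by distant supervision
        sentence: str          # sentence the pair was found in
        fp_score: float = 0.0  # estimated probability the label is a false positive

    def score_false_positive(seed: Seed) -> float:
        # Stand-in for a trained classifier; a random score keeps the sketch runnable.
        return random.random()

    def select_for_crowd(seeds: list, k: int) -> list:
        # Active learning step: rank seeds by how suspicious they look and pick
        # the top k, rather than drawing a random sub-sample for annotation.
        for seed in seeds:
            seed.fp_score = score_false_positive(seed)
        return sorted(seeds, key=lambda s: s.fp_score, reverse=True)[:k]

    seeds = [Seed(("Obama", "Hawaii"), "Obama visited Hawaii last week.")]
    to_annotate = select_for_crowd(seeds, k=1)  # send these to the crowd

The crowd's answers would then replace the noisy distant supervision labels for the selected seeds before the relation extraction model is retrained.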