Description

Title Identity Clusters Quality Estimation Under Low Power Discriminative Identity Criteria
Abstract In this presentation I’d like to share the results of a fruitful collaboration between the Golden Agents Project, in which I’m currently working, and Al Idrissou’s research on cluster validation, supervised by Frank. The main goal of the presentation is to obtain feedback before submission. In short, in their previous work they developed a network-based metric that was successfully applied to validate clusters in two experiments with “well-behaved” data, so to speak, where the entities’ features provide highly discriminative criteria. In our project, however, that is not the case. The data is very noisy and has little discriminative power, as it consists mainly of names and dates. This, together with the size of the data, leads to a frighteningly large number of possible links. The number remains considerable even when considering only the derived clusters. Under these circumstances, their previous method does not perform well, since it relies on highly discriminative criteria. The new method uses context as evidence to assess the quality of the links in a cluster and produces more reliable clusters, which can then be evaluated by an enhanced version of the network-based metric.