Abstract |
Linking entities between datasets is a crucial step in data integration
in general, and in the use of multiple datasets on the semantic web in
particular. A rich literature exists on different approaches to the
entity linking problem, and a fair number of tools are available for
practical use. However, much less work has been done on how to assess
the quality of such entity links once they have been generated by
any of these tools. Evaluation methods for link quality are typically
limited to either comparison with a ground truth (which is often not at
one's disposal), manual work (which is cumbersome and prone to error),
or crowdsourcing (which is not always feasible, especially if
background information is required). Furthermore, the problem of link
evaluation is greatly exacerbated for links between more than two
datasets, because the number of possible links grows rapidly with the
number of datasets.
In this paper we propose a method to estimate the quality of such
entity links between multiple datasets. We exploit the fact that the links between
entities from multiple datasets form a network, and we show how
simple metrics on this network of entity links can
reliably predict the quality of these links.
We verify our results in a large experimental study using six datasets from the
domain of science and innovation studies. |