Description

Title Annotating tables with quantities and units
Abstract We study how to automatically annotate quantitative research data stored in tables. Scientists, government agencies and other parties construct such tables on the fly, typically without computational reuse in mind. Usually no metadata is assigned to table headers and cells. Quantities are abbreviated or contracted, units are omitted, and spelling mistakes are made. The resulting ambiguity severely limits retrieval and integration of relevant data. Our objective is to enrich the little information that is given by inferring the quantities and units it implies and expressing these semantically. We use the Ontology of Units of Measure and related concepts (OUM) as a source of annotation concepts. Our approach has five steps: (1) table extraction; (2) tokenization; (3) matching; (4) compound unit detection; and (5) disambiguation. We use heuristics and knowledge contained in OUM to perform the last three steps. Existing generic text annotation systems do not perform the kind of disambiguation we apply in this work. We evaluate our approach in terms of precision and recall on a dataset provided by a food research institute and a dataset of tables downloaded from the Web. This is joint work with Hajo Rijgersberg (WUR), Mari Wigham (WUR) and Jan Top (VU and WUR).
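To make steps (2) to (5) and the evaluation measures concrete, here is a minimal sketch in Python. It is not the authors' implementation. The tiny `UNITS` and `QUANTITIES` dictionaries are hypothetical stand-ins for OUM, and the function names and the single disambiguation heuristic (prefer the unit whose quantity is named in the same header) are assumptions made for illustration only.

```python
import re

# Hypothetical mini-lexicon standing in for OUM (the real ontology is far larger).
# Each unit symbol maps to candidate (unit name, quantity) pairs; "c" is ambiguous.
# Symbols are lowercased for simplicity, which a real system could not afford
# (for example "m" and "M" denote different things).
UNITS = {
    "c": [("degree Celsius", "temperature"), ("coulomb", "electric charge")],
    "g": [("gram", "mass")],
    "kg": [("kilogram", "mass")],
    "s": [("second", "time")],
}
QUANTITIES = {"temp": "temperature", "mass": "mass", "charge": "electric charge"}


def tokenize(header):
    """Step 2: split a header on punctuation, keeping '/' inside tokens."""
    return [t for t in re.split(r"[^A-Za-z0-9/]+", header) if t]


def disambiguate(candidates, quantities):
    """Step 5: prefer the candidate whose quantity also occurs in the header."""
    for name, quantity in candidates:
        if quantity in quantities:
            return name
    return candidates[0][0]  # no supporting context: fall back to the first candidate


def annotate(header):
    """Steps 3 to 5: match tokens, assemble compound units, disambiguate."""
    tokens = [t.lower() for t in tokenize(header)]
    quantities = [QUANTITIES[t] for t in tokens if t in QUANTITIES]
    units = []
    for token in tokens:
        if token in QUANTITIES:
            continue
        parts = token.split("/")  # step 4: "g/s" is treated as a compound unit
        if all(p in UNITS for p in parts):
            units.append(" per ".join(disambiguate(UNITS[p], quantities) for p in parts))
    return {"quantities": quantities, "units": units}


def precision_recall(predicted, gold):
    """Evaluation: precision and recall of predicted annotations against a gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

With this sketch, `annotate("Temp (C)")` resolves the symbol to degree Celsius because the header names a temperature, while `annotate("Charge C")` resolves the same symbol to coulomb. A header such as `"flow g/s"` yields the compound unit "gram per second".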

Other presentations by Mark van Assem

Date Title
02 October 2006
23 June 2008 ROC: a method for proto-ontology construction by domain experts
23 March 2009 Cognitive theory for the SemWeb: Natural Categories
15 March 2010 Annotating tables with quantities and units