Title : Annotating tables with quantities and units

Presenter Mark van Assem
Abstract We study how to automatically annotate quantitative research data stored in tables. Scientists, government agencies and other parties construct such tables on the fly, typically without computational reuse in mind. Usually no metadata is assigned to table headers and cells. Quantities are abbreviated or contracted, units are omitted, and spelling mistakes are made. The resulting ambiguity severely limits retrieval and integration of relevant data. Our objective is to enrich the little information that is given by inferring the quantities and units it implies, and to express these semantically. We use the Ontology of Units of Measure and related concepts (OUM) as a source of annotation concepts. Our approach has five steps: (1) table extraction; (2) tokenization; (3) matching; (4) compound unit detection; and (5) disambiguation. We use heuristics and knowledge contained in OUM to perform the last three steps. Existing, generic text annotation systems do not perform the kind of disambiguation we apply in this work. We evaluate our approach in terms of precision and recall on a dataset provided by a food research institute and a dataset of tables downloaded from the Web. This is joint work with Hajo Rijgersberg (WUR), Mari Wigham (WUR) and Jan Top (VU and WUR).
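To make steps (2) to (5) concrete, here is a minimal sketch in Python of how a single table header might be annotated. It is an illustration only, not the method presented in the talk: the dictionaries `UNITS` and `QUANTITIES` are a hypothetical toy vocabulary standing in for OUM, all function names are invented for this example, and the heuristics are much simpler than anything the abstract implies. Table extraction (step 1) is skipped, and the header is assumed to be available as a string.

```python
import re

# Hypothetical toy vocabulary standing in for OUM (not the real ontology).
# Unit symbol -> list of (unit name, quantity kind) candidates.
UNITS = {
    "g": [("gram", "mass")],
    "kg": [("kilogram", "mass")],
    "m": [("metre", "length")],
    "s": [("second", "time")],
    "C": [("degree Celsius", "temperature"), ("coulomb", "electric charge")],
}

# Lower-cased header word or abbreviation -> quantity kind.
QUANTITIES = {
    "temp": "temperature",
    "temperature": "temperature",
    "weight": "mass",
    "mass": "mass",
    "length": "length",
    "speed": "speed",
}


def tokenize(header):
    """Step 2: split a header into alphabetic tokens, keeping '/' as a token."""
    return re.findall(r"[A-Za-z]+|/", header)


def detect_compound(tokens):
    """Step 4: return (numerator, denominator) for a 'unit / unit' pattern."""
    for i in range(1, len(tokens) - 1):
        if tokens[i] == "/" and tokens[i - 1] in UNITS and tokens[i + 1] in UNITS:
            return tokens[i - 1], tokens[i + 1]
    return None


def disambiguate(symbol, quantities):
    """Step 5: prefer unit candidates whose quantity kind occurs in the header."""
    candidates = UNITS[symbol]
    preferred = [c for c in candidates if c[1] in quantities]
    # If the header gives no hint, keep every candidate rather than guess.
    return preferred or candidates


def annotate(header):
    tokens = tokenize(header)
    # Step 3: match tokens against the quantity vocabulary (case-insensitive).
    quantities = [QUANTITIES[t.lower()] for t in tokens if t.lower() in QUANTITIES]
    compound = detect_compound(tokens)
    if compound:
        numerator, denominator = compound
        # Simplification: take the first candidate of each part of the compound.
        units = [f"{UNITS[numerator][0][0]} per {UNITS[denominator][0][0]}"]
    else:
        units = [
            name
            for t in tokens
            if t in UNITS
            for name, _ in disambiguate(t, quantities)
        ]
    return {"quantities": quantities, "units": units}


print(annotate("Temp (C)"))
print(annotate("speed m/s"))
print(annotate("C"))
```

In this sketch the header "Temp (C)" resolves "C" to degree Celsius because the matched quantity is temperature, while a bare "C" keeps both candidates. That is the kind of context-driven choice step (5) refers to, here reduced to a single lookup.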

Title :

Presenter Ghazanfar Farooq
Abstract