Abstract |
Inspired by the Freedom of Information Act and the open data trend among governments and government-funded organizations, many proprietary open data archives are being made available to the general public. These data sources differ technically, syntactically, semantically and contextually. In order to process these data for a task they were not initially meant for, we can apply basic semantic web incubation technology, such as named-entity recognition. We investigate what kind of knowledge needs to be added to a simple named-entity recognizer in order to find patterns within and between individual non-linked data archives. The challenges that we have to resolve are (1) selecting the concepts and relations to recognize, (2) discovering relations that are not described in the source and (3) relating the concepts and relations to the task of the system, in this case cultural heritage within the Netherlands. Preliminary results are presented based on four data archives: Dutch Wikipedia, Amsterdam Historical Museum, Dutch broadcast data (Nederland 1, 2, 3) and Dutch governmental public relations data (Postbus 51).
|