Abstract |
Web pages contain a large amount of information in the form of semi-structured text. Semantic technologies can make this information machine-processable, which is a step forward in integrating heterogeneous data sources and enables different querying mechanisms, including reasoning and inference over the data.
Several named entity recognition (NER) tools are now available for this purpose.
Moreover, the NERD API already offers a unified named-entity extraction framework that embeds different extractors and returns a single annotation of a text, combining the outputs of the individual extractors through a conflict resolution mechanism.
Taking the enrichment of BBC programme content as a case study, I will introduce some limitations of the NERD framework when it is used in its combined strategy. I will then discuss the challenges that arise in creating an automatic, extractor-independent annotation, and present the current research results and open issues.
|