Abstract:
We are experimenting with a task that involves evaluating conjunctive
queries over RDF (subject-property-object) graphs that are generated
from the results of processing text with state-of-the-art Information
Extraction. Because NLP results are imperfect, errors in
individual query conjuncts multiply, often causing Recall to drop
dramatically as queries add terms. To address this, we present a
hypothesis generation technique based on identifying missing graph edges
representing type or other binary relations that were not extracted from
the source text, and show that with suitable hypothesis validation
techniques drawn from the literature, we significantly improve Recall of
conjunctive queries while still improving F-measure.
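
The way errors multiply across conjuncts can be illustrated with a minimal
sketch. It assumes that each conjunct is matched independently and with the
same recall, which is a simplification and not a claim made in the talk. The
function name and the value 0.8 are hypothetical, chosen only for illustration.

```python
def conjunctive_recall(per_conjunct_recall: float, num_conjuncts: int) -> float:
    """Expected recall of a conjunctive query.

    Assumption (not from the abstract): every conjunct is matched
    independently and with the same recall, so a query result is found
    only if all of its conjuncts are found.
    """
    if not 0.0 <= per_conjunct_recall <= 1.0:
        raise ValueError("recall must lie in [0, 1]")
    if num_conjuncts < 1:
        raise ValueError("a query needs at least one conjunct")
    return per_conjunct_recall ** num_conjuncts


if __name__ == "__main__":
    # Hypothetical per-conjunct recall of 0.8; watch it shrink as terms are added.
    for k in range(1, 6):
        print(f"{k} conjunct(s): recall ~ {conjunctive_recall(0.8, k):.3f}")
```

Under these assumptions, recall falls geometrically with the number of
conjuncts, which is why recovering even a few missing edges can have a large
effect on longer queries.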
Bio:
Chris Welty is a Research Scientist at the IBM T.J. Watson Research
Center in New York. Previously, he taught Computer Science at Vassar
College, taught at and received his Ph.D. from Rensselaer Polytechnic
Institute, and accumulated over 14 years of teaching experience before
moving to industrial research. Chris' principal area of research is
Knowledge Representation, specifically ontologies and the semantic web,
and he spends most of his time applying this technology to Natural
Language Question Answering as a member of the DeepQA/Watson team and,
in the past, to Software Engineering. Dr. Welty is a co-chair of the W3C
Rule Interchange Format Working Group (RIF), serves on the steering
committee of the Formal Ontology in Information Systems Conferences, is
president of KR.ORG, serves on the editorial boards of AI Magazine, The Journal
of Applied Ontology, and The Journal of Web Semantics, and was an editor
in the W3C Web Ontology Working Group. While on sabbatical in 2000, he
co-developed the OntoClean methodology with Nicola Guarino. Chris Welty's
work on ontologies and ontology methodology has appeared in CACM and
numerous other publications.