Abstract |
Automatic conversion from existing databases has replaced manual engineering as the norm for knowledge graph
creation. This shift comes at the price of data quality, which is often reduced either by copying existing
artefacts or by introducing new ones through improper conversion. Detecting these artefacts automatically is
difficult, and detection generally follows a top-down approach in which data points are validated against a
provided set of constraints (e.g., business rules). In this research, we introduce a bottom-up approach that
generates constraints directly from the data themselves. Specifically tailored to knowledge graphs, these
constraints take contexts (i.e., subgraphs) into account, exploit common semantics (RDF/RDFS), and incorporate
prior knowledge (e.g., schemas). Once generated, the constraints can be used to check any knowledge graph with an
arbitrary SHACL validator. Experiments were conducted in the asset management domain, and involved generating
constraints and validating data against them. The results were evaluated by 1) comparison with a gold
standard and 2) an assessment of the method's usefulness within a focus group. |
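
To make the bottom-up idea concrete, the following is a minimal sketch of deriving constraints from data and emitting them as SHACL. It is not the method described in this work: it ignores contexts, RDFS semantics, and prior knowledge, and simply records the observed minimum and maximum number of values per property for each class. The function names, the `ex:` namespace, and the sample triples are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"  # prefixed names are used as plain strings in this sketch


def derive_cardinalities(triples):
    """Return {class: {predicate: (min_count, max_count)}} as observed in the data.

    Assumption: every subject of interest has at least one rdf:type triple.
    """
    types = defaultdict(set)                          # subject -> classes
    counts = defaultdict(lambda: defaultdict(int))    # subject -> predicate -> count
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
        else:
            counts[s][p] += 1

    members = defaultdict(list)                       # class -> subjects
    for s, classes in types.items():
        for cls in classes:
            members[cls].append(s)

    result = {}
    for cls, subjects in members.items():
        # Every predicate used by at least one instance of the class.
        predicates = {p for s in subjects for p in counts[s]}
        result[cls] = {
            p: (min(counts[s].get(p, 0) for s in subjects),
                max(counts[s].get(p, 0) for s in subjects))
            for p in predicates
        }
    return result


def to_shacl(cardinalities):
    """Serialise the observed cardinalities as SHACL node shapes in Turtle."""
    blocks = [
        "@prefix sh: <http://www.w3.org/ns/shacl#> .",
        "@prefix ex: <http://example.org/> .",  # hypothetical namespace
    ]
    for cls in sorted(cardinalities):
        parts = ["a sh:NodeShape", f"sh:targetClass {cls}"]
        for pred in sorted(cardinalities[cls]):
            lo, hi = cardinalities[cls][pred]
            bounds = f"sh:maxCount {hi}"
            if lo > 0:  # a minimum of 0 is no constraint, so it is omitted
                bounds = f"sh:minCount {lo} ; " + bounds
            parts.append(f"sh:property [ sh:path {pred} ; {bounds} ]")
        blocks.append("[] " + " ;\n    ".join(parts) + " .")
    return "\n".join(blocks)


# Hypothetical asset-management data: two pumps, one with a known location.
triples = [
    ("ex:p1", "rdf:type", "ex:Pump"),
    ("ex:p1", "ex:serialNumber", '"A1"'),
    ("ex:p2", "rdf:type", "ex:Pump"),
    ("ex:p2", "ex:serialNumber", '"B2"'),
    ("ex:p2", "ex:locatedIn", "ex:site1"),
]
print(to_shacl(derive_cardinalities(triples)))
```

The resulting Turtle could then be passed, together with a data graph, to any SHACL validator. Constraints derived this naively will overfit the data they came from, which is one reason a more careful generation step and an evaluation against a gold standard are needed.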