Description

Title: Provenance In and Outside the Database
Abstract: Domains such as drug discovery, web science, and policy studies increasingly rely on combining complex analysis pipelines with integrated data sources to reach their conclusions. A key question then arises: what are these conclusions based upon (i.e., what is their provenance)? In this talk, I describe recent work that attempts to combine provenance within databases with the provenance of the data integration and analytics pipelines that feed them. In particular, I examine how we can combine the concepts of dataset description, provenance polynomials, and Web-based provenance models. I discuss this with respect to the large-scale drug discovery platform Open PHACTS (http://www.openphacts.org), which combines tens of databases containing billions of facts.