Abstract |
It is a common practice to store copies of entire RDF datasets locally, to improve accessibility and performance, and to ensure reliable access to the data.
This will no longer be feasible once the Linked Data cloud approaches web-like proportions. For many purposes (e.g. demos, development or testing) we do not need the complete graph, and a representative sample suffices.
We created SampLD, a method for generating samples of graphs, based on the topology of the graph alone.
However, sampling large-scale graphs poses challenges and imposes limitations on both the sampling process and our evaluation measures. In this presentation I will discuss these challenges and limitations, and how I dealt with them. |