Abstract
Benchmarking graph-oriented database workloads and graph-oriented database
systems is becoming increasingly relevant in analytical Big Data tasks,
such as social network analysis. In graph data, structure is found not mainly
inside the nodes but in the way nodes happen to be
connected, i.e., in structural correlations. Because such structural
correlations determine join fan-outs experienced by graph analysis
algorithms and graph query executors, they are an essential, yet typically
neglected, ingredient of synthetic graph generators. To address this, we
present S3G2: a Scalable Structure-correlated Social Graph Generator.
This graph generator creates a synthetic social graph, containing
non-uniform value distributions and structural correlations, and is intended
as a testbed for scalable graph analysis algorithms and graph database
systems. We generalize the problem by decomposing correlated graph generation
into multiple passes, each of which focuses on one so-called correlation
dimension and can be mapped to a MapReduce task. We show that S3G2 can
generate social graphs that (i) share well-known graph connectivity
characteristics typically found in real social graphs, (ii) contain certain
plausible structural correlations that influence the performance of graph
analysis algorithms and queries, and (iii) can be quickly generated at huge
sizes on common cluster hardware.