Universal Graph Compression: Stochastic Block Models
by Alankrita Bhatt, Ziao Wang, Chi Wang, Lele Wang (2021)
Abstract
Motivated by the prevalent data science applications of processing and mining
large-scale graph data such as social networks, web graphs, and biological
networks, as well as the high I/O and communication costs of storing and
transmitting such data, this paper investigates lossless compression of data
appearing in the form of a labeled graph. A universal graph compression scheme
is proposed, which does not depend on the underlying statistics/distribution of
the graph model. For graphs generated by a stochastic block model, which is a
widely used random graph model capturing the clustering effects in social
networks, the proposed scheme achieves the optimal theoretical limit of
lossless compression without the need to know edge probabilities, community
labels, or the number of communities.
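For concreteness, the following is a minimal sketch, not taken from the paper, of sampling a symmetric two-parameter stochastic block model: each node is assigned uniformly to one of k communities, and an edge appears with probability p_in inside a community and p_out across communities. The function name and parameter values here are illustrative assumptions.

    import numpy as np

    def sample_sbm(n, k, p_in, p_out, seed=None):
        # Assign each of the n nodes uniformly at random to one of k communities.
        rng = np.random.default_rng(seed)
        labels = rng.integers(k, size=n)
        # Edge probability is p_in within a community, p_out across communities.
        same = labels[:, None] == labels[None, :]
        probs = np.where(same, p_in, p_out)
        # Sample the strict upper triangle (simple graph, no self-loops),
        # then symmetrize to obtain the full adjacency matrix.
        upper = np.triu(rng.random((n, n)) < probs, k=1)
        return (upper | upper.T).astype(np.uint8), labels

    adj, labels = sample_sbm(n=200, k=4, p_in=0.2, p_out=0.02, seed=0)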
The key ideas in establishing universality for stochastic block models
include: 1) a block decomposition of the adjacency matrix of the graph; and
2) a generalization of the Krichevsky-Trofimov probability assignment, which
was initially designed for i.i.d. random processes. On four benchmark graph
datasets (protein-to-protein interaction, LiveJournal friendship, Flickr, and
YouTube), the compressed files produced by competing algorithms (including
CSR, Ligra+, a PNG image compressor, and a Lempel-Ziv compressor for
two-dimensional data) take 2.4 to 27 times the space needed by the proposed
scheme.
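As a reference point for the second ingredient, here is a minimal sketch of the classic Krichevsky-Trofimov sequential probability assignment for a binary i.i.d. source (the add-1/2 rule); the paper's generalization to blocks of the adjacency matrix is not reproduced here, and the function name and example sequence are illustrative.

    import math
    from fractions import Fraction

    def kt_probability(bits):
        # Sequential KT assignment: after seeing t bits containing `ones` ones,
        # the next bit is predicted to be 1 with probability
        # (ones + 1/2) / (t + 1), i.e. the add-1/2 (Jeffreys prior) rule.
        prob = Fraction(1)
        ones = 0
        for t, b in enumerate(bits):
            p_one = Fraction(2 * ones + 1, 2 * (t + 1))
            prob *= p_one if b else 1 - p_one
            ones += b
        return prob

    # The ideal code length of the sequence is -log2 of the assigned probability.
    seq = [1, 0, 1, 1, 0, 1, 1, 1]
    p = kt_probability(seq)
    print(p, -math.log2(p))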
arXiv: 2006.02643v2