ENFrame: A Platform for Processing Probabilistic Data
by
Sebastiaan J. van Schaik, Dan Olteanu, and Robert Fink
2013
Abstract
This paper introduces ENFrame, a unified data processing platform for
querying and mining probabilistic data. Using ENFrame, users can write programs
in a fragment of Python with constructs such as bounded-range loops, list
comprehension, aggregate operations on lists, and calls to external database
engines. The program is then interpreted probabilistically by ENFrame.
The realisation of ENFrame required novel contributions along several
directions. We propose an event language that is expressive enough to
succinctly encode arbitrary correlations, trace the computation of user
programs, and allow for computation of discrete probability distributions of
program variables. We exemplify ENFrame on three clustering algorithms:
k-means, k-medoids, and Markov Clustering. We introduce sequential and
distributed algorithms for computing the probability of interconnected events
exactly or approximately with error guarantees. Experiments with k-medoids
clustering of sensor readings from energy networks show orders-of-magnitude
improvements of exact clustering using ENFrame over naïve clustering in each
possible world, of approximate over exact, and of distributed over sequential
algorithms.
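
To make the "naïve clustering in each possible world" baseline concrete, the following is a minimal sketch, not ENFrame's implementation or API. It assumes each data point exists independently with a given probability, uses one-dimensional values, and computes a single medoid per world; the names possible_worlds, medoid, and medoid_distribution and the sample readings are hypothetical. It enumerates every possible world and accumulates the discrete probability distribution of one program variable (the medoid). The number of worlds is exponential in the number of points, which is the cost that the event-based approach described in the abstract is meant to avoid.

```python
from collections import defaultdict
from itertools import product


def possible_worlds(points):
    """Yield (world, probability) pairs for independently uncertain points.

    points: list of (value, probability_of_existence) pairs (an assumed model).
    """
    for mask in product([False, True], repeat=len(points)):
        prob = 1.0
        world = []
        for present, (value, p) in zip(mask, points):
            prob *= p if present else 1.0 - p
            if present:
                world.append(value)
        yield world, prob


def medoid(world):
    """Return the point with the smallest total distance to all others.

    Ties go to the earliest point in the list; an empty world has no medoid.
    """
    if not world:
        return None
    return min(world, key=lambda c: sum(abs(c - x) for x in world))


def medoid_distribution(points):
    """Naive baseline: run the deterministic program once per possible world."""
    dist = defaultdict(float)
    for world, prob in possible_worlds(points):
        dist[medoid(world)] += prob
    return dict(dist)


# Hypothetical sensor readings: (value, probability that the reading exists).
readings = [(1.0, 0.5), (2.0, 0.5), (10.0, 0.5)]
dist = medoid_distribution(readings)
print(dist)
```

With three uncertain readings the sketch visits 2^3 = 8 worlds; each additional reading doubles that count.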
Archived Files and Locations
application/pdf, 988.0 kB
arxiv.org (repository), web.archive.org (webarchive)
arXiv:1309.0373v1