ENFrame: A Platform for Processing Probabilistic Data

by Sebastiaan J. van Schaik, Dan Olteanu, and Robert Fink

Released as an article.

2013  

Abstract

This paper introduces ENFrame, a unified data processing platform for querying and mining probabilistic data. Using ENFrame, users can write programs in a fragment of Python with constructs such as bounded-range loops, list comprehension, aggregate operations on lists, and calls to external database engines. The program is then interpreted probabilistically by ENFrame. The realisation of ENFrame required novel contributions along several directions. We propose an event language that is expressive enough to succinctly encode arbitrary correlations, trace the computation of user programs, and allow for computation of discrete probability distributions of program variables. We exemplify ENFrame on three clustering algorithms: k-means, k-medoids, and Markov Clustering. We introduce sequential and distributed algorithms for computing the probability of interconnected events exactly or approximately with error guarantees. Experiments with k-medoids clustering of sensor readings from energy networks show orders-of-magnitude improvements of exact clustering using ENFrame over naïve clustering in each possible world, of approximate over exact, and of distributed over sequential algorithms.

Archived Files and Locations

application/pdf  988.0 kB
file_ptsldinxmjbollf5lrfoksn7ia
arxiv.org (repository)
web.archive.org (webarchive)
Type  article
Stage   submitted
Date   2013-09-02
Version   v1
Language   en
arXiv  1309.0373v1
Catalog Record
Revision: 5dcb2d43-dbaa-40f7-ac19-cccbce73365b