Distributed Publish/Subscribe Query Processing on the Spatio-Textual
Data Stream
by
Zhida Chen, Gao Cong, Zhenjie Zhang, Tom Z.J. Fu, Lisi Chen
2016
Abstract
A huge amount of data with both spatial and textual information, e.g., geo-tagged
tweets, is flooding the Internet. Such spatio-textual data streams contain
valuable information for millions of users with various interests in different
keywords and locations. Publish/subscribe systems enable efficient and
effective information distribution by allowing users to register continuous
queries with both spatial and textual constraints. However, the explosive
growth of data scale and user base has posed challenges to the existing
centralized publish/subscribe systems for spatio-textual data streams.
In this paper, we propose a distributed publish/subscribe system, called
PS2Stream, which digests a massive spatio-textual data stream and directs the
stream to target users with registered interests. Compared with existing
systems, PS2Stream achieves a better workload distribution in terms of both
minimizing the total amount of workload and balancing the load of workers. To
achieve this, we propose a new workload distribution algorithm that considers
both the spatial and textual properties of the data. Additionally, PS2Stream
supports dynamic load adjustments to adapt to changes in the workload.
Extensive empirical evaluation on a commercial cloud computing platform with
real data validates the superiority of our system design and the advantages of
our techniques in improving system performance.
arXiv: 1612.02564v1