Evolution of Privacy Loss in Wikipedia
release_cizyxpf7pvf2xifgfz6xjwj7vu
by
Marian-Andrei Rizoiu, Lexing Xie, Tiberio Caetano, Manuel Cebrian
2015
Abstract
The cumulative effect of collective online participation has an important and
adverse impact on individual privacy. As an online system evolves over time,
new digital traces of individual behavior may uncover previously hidden
statistical links between an individual's past actions and her private traits.
To quantify this effect, we analyze the evolution of individual privacy loss by
studying the edit history of Wikipedia over 13 years, including more than
117,523 different users performing 188,805,088 edits. We trace each Wikipedia
contributor using apparently harmless features, such as the number of edits
performed on predefined broad categories in a given time period (e.g.
Mathematics, Culture or Nature). We show that even at this unspecific level of
behavior description, it is possible to use off-the-shelf machine learning
algorithms to uncover usually undisclosed personal traits, such as gender,
religion or education. We provide empirical evidence that the prediction
accuracy for almost all private traits consistently improves over time.
Surprisingly, the prediction performance for users who stopped editing after a
given time still improves. The activities performed by new users seem to have
contributed more to this effect than additional activities from existing (but
still active) users. Insights from this work should help users, system
designers, and policy makers understand and make long-term design choices in
online content creation systems.
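
The feature description in the abstract (edit counts per broad category per time period, fed to an off-the-shelf classifier) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy edit log, the yearly time windows, the trait labels "A" and "B", and the nearest-centroid classifier, which is a simple stand-in and not necessarily an algorithm the authors used.

```python
from collections import Counter, defaultdict
from math import dist

# Hypothetical toy edit log: (user, year, broad category). Not data from the paper.
EDITS = [
    ("u1", 2005, "Mathematics"), ("u1", 2005, "Mathematics"), ("u1", 2006, "Nature"),
    ("u2", 2005, "Culture"), ("u2", 2006, "Culture"), ("u2", 2006, "Culture"),
    ("u3", 2005, "Mathematics"), ("u3", 2006, "Mathematics"),
    ("u4", 2006, "Culture"), ("u4", 2006, "Nature"),
]
CATEGORIES = ["Mathematics", "Culture", "Nature"]
YEARS = [2005, 2006]  # assumed yearly time windows


def edit_count_features(edits, up_to_year):
    """Per-user vector of edit counts, one entry per (year, category) pair.

    Only edits made in or before `up_to_year` are counted, so the same
    function can describe the system at different points in its history.
    """
    counts = defaultdict(Counter)
    for user, year, category in edits:
        if year <= up_to_year:
            counts[user][(year, category)] += 1
    keys = [(y, c) for y in YEARS for c in CATEGORIES]
    return {user: [cnt[k] for k in keys] for user, cnt in counts.items()}


def nearest_centroid_predict(features, labels, query):
    """Return the label whose mean feature vector is closest to `query`."""
    grouped = defaultdict(list)
    for user, label in labels.items():
        grouped[label].append(features[user])
    centroids = {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }
    return min(centroids, key=lambda label: dist(centroids[label], query))


features = edit_count_features(EDITS, up_to_year=2006)
# Hypothetical trait labels for users who disclosed the trait.
known = {"u1": "A", "u2": "B"}
# Infer the undisclosed trait of u3 from edit counts alone.
prediction = nearest_centroid_predict(features, known, features["u3"])
print(prediction)
```

Calling `edit_count_features` with a smaller `up_to_year` restricts the log to an earlier snapshot, which is one way to compare what can be inferred at different points in time.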
Archived Files and Locations
application/pdf 2.5 MB
file_jo73smbgijg4pbv2tlmubq6teu
arxiv.org (repository), web.archive.org (webarchive)
1512.03523v2