Learning to de-anonymize social networks

by Kumar Sharad, Apollo-University Of Cambridge Repository, Ross Anderson

Published by Apollo - University of Cambridge Repository.

2017  

Abstract

Releasing anonymized social network data for analysis has been a popular idea among data providers. Despite evidence to the contrary, the belief that anonymization will solve the privacy problem in practice refuses to die. This dissertation contributes to the field of social graph de-anonymization by demonstrating that even automated models can be quite successful in breaching the privacy of such datasets. We propose novel machine-learning-based techniques to learn the identities of nodes in social graphs, thereby automating manual, heuristic-based attacks. Our work extends the vast literature on social graph de-anonymization attacks by systematizing them. We present a random-forest-based classifier which uses structural node features, based on neighborhood degree distribution, to predict the similarity of nodes. Using these simple and efficient features, we design versatile and expressive learning models which can learn the de-anonymization task from just a few examples. Our evaluation establishes their efficacy in transforming de-anonymization into a learning problem. The learning is transferable in that the model can be used to attack one graph when trained on another. Moving on, we demonstrate the versatility and wider applicability of the proposed model by using it to solve the long-standing problem of benchmarking social graph anonymization schemes. Our framework bridges a fundamental research gap by making cheap, quick and automated analysis of anonymization schemes possible, without even requiring their full description. The benchmark is based on a comparison of structural information leakage vs. utility preservation. We study the trade-off of anonymity vs. utility for six popular anonymization schemes, including those promising k-anonymity. Our analysis shows that none of the schemes is fit for purpose.
Finally, we present an end-to-end social graph de-anonymization attack which uses the proposed machine learning techniques to recover node mappings across intersecting graphs. Our attack enhances the state of [...]
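
The abstract describes structural node features based on neighborhood degree distribution, which a classifier then uses to judge whether two nodes are similar. As a rough illustration only, the sketch below builds one such feature vector for a node pair in plain Python. The function names, the number of bins and the bin width are assumptions made for this example and are not taken from the dissertation; a vector like this could then be passed to a random-forest implementation for training.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Return an undirected adjacency map {node: set of neighbours}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def neighbour_degree_histogram(adj, node, n_bins=4, bin_width=2):
    """Histogram of the degrees of `node`'s 1-hop neighbours.

    n_bins and bin_width are illustrative assumptions. Degrees that
    fall beyond the last bin are clamped into it.
    """
    hist = [0] * n_bins
    for neighbour in adj[node]:
        degree = len(adj[neighbour])  # always >= 1: it is linked to `node`
        index = min((degree - 1) // bin_width, n_bins - 1)
        hist[index] += 1
    return hist


def pair_features(adj_a, node_a, adj_b, node_b, **bins):
    """Concatenate the two nodes' histograms into one feature vector."""
    return (neighbour_degree_histogram(adj_a, node_a, **bins)
            + neighbour_degree_histogram(adj_b, node_b, **bins))


# Hypothetical toy graph: a triangle a-b-c with a pendant node d on c.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
toy_adj = build_adjacency(toy_edges)
toy_vector = pair_features(toy_adj, "a", toy_adj, "c")
```

In a de-anonymization setting the two adjacency maps would come from different graphs (for example an anonymized release and an auxiliary graph), and each pair vector would carry a label saying whether the two nodes are the same individual.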

Archived Files and Locations

application/pdf  6.9 MB
file_ehg76wxwxzgx5b2g7acwi2jzki
www.cl.cam.ac.uk:80 (web)
web.archive.org (webarchive)
application/pdf  6.0 MB
file_xrhfghgobjeipkcvgmur2jdy2y
www.repository.cam.ac.uk (publisher)
web.archive.org (webarchive)
Type  article-journal
Stage   published
Date   2017-02-23
Language   en