Quantifying Privacy Risks of Masked Language Models Using Membership Inference Attacks
by
Fatemehsadat Mireshghallah, Kartik Goyal, Archit Uniyal, Taylor Berg-Kirkpatrick, Reza Shokri
2022
Abstract
The wide adoption and application of masked language models (MLMs) on
sensitive data (from legal to medical) necessitates a thorough quantitative
investigation into their privacy vulnerabilities -- to what extent do MLMs leak
information about their training data? Prior attempts at measuring leakage of
MLMs via membership inference attacks have been inconclusive, implying the
potential robustness of MLMs to privacy attacks. In this work, we posit that
prior attempts were inconclusive because they based their attack solely on the
MLM's model score. We devise a stronger membership inference attack based on
likelihood ratio hypothesis testing that involves an additional reference MLM
to more accurately quantify the privacy risks of memorization in MLMs. We show
that masked language models are extremely susceptible to likelihood ratio
membership inference attacks: Our empirical results, on models trained on
medical notes, show that our attack improves the AUC of prior membership
inference attacks from 0.66 to an alarmingly high 0.90 level, with a
significant improvement in the low-error region: at 1% false positive rate, our
attack is 51X more powerful than prior work.
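The abstract describes scoring each candidate sample with a likelihood ratio between the target MLM and a reference MLM, then thresholding that ratio to decide membership. Below is a minimal, hedged sketch of that idea; the model names, the pseudo-log-likelihood scoring, the shared tokenizer, and the threshold are illustrative assumptions, not the authors' exact implementation.

# Sketch of a likelihood-ratio membership inference test for MLMs.
# Model names, threshold, and scoring details are illustrative assumptions.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

def pseudo_log_likelihood(model, tokenizer, text):
    """Approximate the MLM score of `text` by masking one token at a time
    and summing the log-probability of the true token at each position."""
    enc = tokenizer(text, return_tensors="pt")
    input_ids = enc["input_ids"][0]
    total = 0.0
    for i in range(1, input_ids.size(0) - 1):  # skip [CLS]/[SEP] special tokens
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        log_probs = torch.log_softmax(logits, dim=-1)
        total += log_probs[input_ids[i]].item()
    return total

def likelihood_ratio_score(target, reference, tokenizer, text):
    """Membership signal: target-model score minus reference-model score.
    Larger values suggest the sample is more likely a training member."""
    return (pseudo_log_likelihood(target, tokenizer, text)
            - pseudo_log_likelihood(reference, tokenizer, text))

if __name__ == "__main__":
    # Both models share one tokenizer here for simplicity (an assumption).
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    target = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()
    reference = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()
    score = likelihood_ratio_score(target, reference, tokenizer,
                                   "Patient presents with chest pain.")
    threshold = 0.0  # in practice, calibrated on held-out data (e.g., for a target false positive rate)
    print("member" if score > threshold else "non-member", score)

In practice the target would be the fine-tuned model under attack (e.g., one trained on medical notes) and the reference a general-domain MLM; the gap between the two scores is what the attack thresholds.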
Archived Files and Locations
application/pdf, 2.0 MB
arxiv.org (repository)
web.archive.org (webarchive)
arXiv: 2203.03929v1