How to keep text private? A systematic review of deep learning methods for privacy-preserving natural language processing
release_rksy7oxxlbde5bol3ay44yycru
by
Samuel Sousa, Roman Kern
2022
Abstract
Deep learning (DL) models for natural language processing (NLP) tasks often
handle private data, demanding protection against breaches and disclosures.
Data protection laws, such as the European Union's General Data Protection
Regulation (GDPR), thereby enforce the need for privacy. Although many
privacy-preserving NLP methods have been proposed in recent years, no
categories to organize them have been introduced yet, making it hard to follow
the progress of the literature. To close this gap, this article systematically
reviews over sixty DL methods for privacy-preserving NLP published between 2016
and 2020, covering theoretical foundations, privacy-enhancing technologies, and
analysis of their suitability for real-world scenarios. First, we introduce a
novel taxonomy for classifying the existing methods into three categories: data
safeguarding methods, trusted methods, and verification methods. Second, we
present an extensive summary of privacy threats, datasets for applications, and
metrics for privacy evaluation. Third, throughout the review, we describe
privacy issues in the NLP pipeline in a holistic view. Further, we discuss open
challenges in privacy-preserving NLP regarding data traceability, computation
overhead, dataset size, the prevalence of human biases in embeddings, and the
privacy-utility tradeoff. Finally, this review presents future research
directions to guide successive research and development of privacy-preserving
NLP models.
In text/plain
format
Archived Files and Locations
application/pdf 1.4 MB
file_vlqubnppvvdwxhyjevwkwu45le
|
arxiv.org (repository) web.archive.org (webarchive) |
2205.10095v1
access all versions, variants, and formats of this works (eg, pre-prints)