Just Train Twice: Improving Group Robustness without Training Group Information
by
Evan Zheran Liu, Behzad Haghgoo, Annie S. Chen, Aditi Raghunathan, Pang Wei Koh, Shiori Sagawa, Percy Liang, Chelsea Finn
2021
Abstract
Standard training via empirical risk minimization (ERM) can produce models
that achieve high accuracy on average but low accuracy on certain groups,
especially in the presence of spurious correlations between the input and
label. Prior approaches that achieve high worst-group accuracy, like group
distributionally robust optimization (group DRO) require expensive group
annotations for each training point, whereas approaches that do not use such
group annotations typically achieve unsatisfactory worst-group accuracy. In
this paper, we propose a simple two-stage approach, JTT, that first trains a
standard ERM model for several epochs, and then trains a second model that
upweights the training examples that the first model misclassified.
Intuitively, this upweights examples from groups on which standard ERM models
perform poorly, leading to improved worst-group performance. Averaged over four
image classification and natural language processing tasks with spurious
correlations, JTT closes 75% of the gap in worst-group accuracy between
standard ERM and group DRO, while only requiring group annotations on a small
validation set in order to tune hyperparameters.
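The two-stage procedure described in the abstract can be sketched in a short PyTorch-style script. The sketch below is an illustrative reconstruction from the abstract, not the authors' released code: the epoch counts, the upweighting factor, and the use of a weighted sampler to upweight misclassified examples are assumptions made for the example.

```python
# JTT-style two-stage training (illustrative sketch, not the authors' code).
# Stage 1: train a standard ERM model for a few epochs.
# Stage 2: upweight the examples the stage-1 model misclassified and
#          train a second model on the reweighted data.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler


def train_erm(model, loader, epochs, lr=1e-3):
    """Plain ERM training loop."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model


def jtt(dataset, make_model, id_epochs=1, final_epochs=5, upweight=20, batch_size=32):
    """Two-stage Just Train Twice procedure (sketch; hyperparameters are placeholders)."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

    # Stage 1: identification model trained with standard ERM for a few epochs.
    id_model = train_erm(make_model(), loader, epochs=id_epochs)

    # Build the error set: examples the identification model misclassifies.
    id_model.eval()
    weights = []
    with torch.no_grad():
        for x, y in DataLoader(dataset, batch_size=batch_size):
            preds = id_model(x).argmax(dim=1)
            # Misclassified points receive `upweight`x sampling weight.
            weights.extend([upweight if p != t else 1 for p, t in zip(preds, y)])

    # Stage 2: retrain from scratch, sampling misclassified points more often.
    sampler = WeightedRandomSampler(torch.tensor(weights, dtype=torch.double),
                                    num_samples=len(dataset), replacement=True)
    reweighted_loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)
    return train_erm(make_model(), reweighted_loader, epochs=final_epochs)


# Toy usage on synthetic data.
if __name__ == "__main__":
    X = torch.randn(512, 10)
    y = (X[:, 0] > 0).long()
    final_model = jtt(TensorDataset(X, y), lambda: nn.Linear(10, 2))
```

The paper's upweighting is described as increasing the weight on the error set; sampling with a WeightedRandomSampler is one simple way to realize that, and duplicating the misclassified examples in the training set would be an equivalent alternative.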
Archived Files and Locations
application/pdf 1.9 MB
arxiv.org (repository), web.archive.org (webarchive)
arXiv: 2107.09044v1