Locally Adaptive Label Smoothing for Predictive Churn
by
Dara Bahri, Heinrich Jiang
2021
Abstract
Training modern neural networks is an inherently noisy process that can lead
to high prediction churn – disagreements between re-trainings of the
same model due to factors such as randomization in the parameter initialization
and mini-batches – even when the trained models all attain similar accuracies.
Such prediction churn can be very undesirable in practice. In this paper, we
present several baselines for reducing churn and show that training on soft
labels obtained by adaptively smoothing each example's label based on the
example's neighboring labels often outperforms the baselines on churn while
improving accuracy on a variety of benchmark classification tasks and model
architectures.
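As a rough illustration of the idea in the abstract, the sketch below smooths each training example's one-hot label toward the empirical label distribution of its nearest neighbors in some feature space. The neighbor count `k`, mixing weight `alpha`, and the use of Euclidean distance are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def adaptive_smooth_labels(features, labels, num_classes, k=5, alpha=0.1):
    """Sketch of locally adaptive label smoothing.

    Each example's one-hot label is mixed with the mean one-hot label of
    its k nearest neighbors (Euclidean distance in `features` space).
    k and alpha are illustrative hyperparameters, not the paper's values.
    """
    n = features.shape[0]
    one_hot = np.eye(num_classes)[labels]
    # Pairwise squared Euclidean distances between all examples.
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)  # exclude each example from its own neighbors
    soft = np.empty_like(one_hot)
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]
        neighbor_dist = one_hot[nbrs].mean(axis=0)
        # Convex combination: mostly the original label, nudged toward
        # the local neighborhood's label distribution.
        soft[i] = (1 - alpha) * one_hot[i] + alpha * neighbor_dist
    return soft
```

Training on `soft` in place of hard labels is then the usual cross-entropy setup; where an example's neighbors agree with its label, the target stays close to one-hot, and where they disagree, the target is softened more.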
Archived file: application/pdf, 1.3 MB (arXiv 2102.05140v2; available via arxiv.org and web.archive.org)