Muddling Label Regularization: Deep Learning for Tabular Datasets
by Karim Lounici, Katia Meziani and Benjamin Riu (2021)
Abstract
Deep Learning (DL) is considered the state-of-the-art in computer vision,
speech recognition and natural language processing. Until recently, it was also
widely accepted that DL is irrelevant for learning tasks on tabular data,
especially in the small sample regime where ensemble methods are acknowledged
as the gold standard. We present a new end-to-end differentiable method to
train a standard FFNN. Our method, Muddling labels for Regularization
(MLR), penalizes memorization through the generation of uninformative
labels and the application of a differentiable closed-form regularization
scheme on the last hidden layer during training. MLR outperforms classical
NNs and the gold standard (GBDT, RF) for regression and classification
tasks on several datasets from the UCI database and Kaggle, covering a
large range of sample sizes and feature-to-sample ratios. Researchers and
practitioners can use MLR on its own as an off-the-shelf solution or
integrate it into the most advanced ML pipelines.
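The core idea in the abstract — generate uninformative (label-permuted) copies of the targets and apply a closed-form regularization scheme on the last hidden layer — can be sketched as follows. This is a minimal illustration, not the paper's actual training objective: the ridge solve, the permutation step, the criterion combining the two fits, and all variable names are assumptions for exposition; in the real method these operations would live inside an autodiff framework so the closed-form fit stays differentiable.

```python
import numpy as np

rng = np.random.default_rng(0)

def ridge_predictions(H, y, lam=1.0):
    """Closed-form ridge fit on last-hidden-layer activations H.

    beta = (H^T H + lam * I)^{-1} H^T y, then in-sample predictions H @ beta.
    Plain NumPy here for illustration; an autodiff framework would make this
    differentiable with respect to H (and hence the network weights)."""
    d = H.shape[1]
    beta = np.linalg.solve(H.T @ H + lam * np.eye(d), H.T @ y)
    return H @ beta

# Toy stand-in for last-hidden-layer activations and targets.
n, d = 50, 8
H = rng.normal(size=(n, d))
y = H @ rng.normal(size=d) + 0.1 * rng.normal(size=n)  # informative labels
y_muddled = rng.permutation(y)  # "muddled", uninformative copy of the labels

# Hypothetical criterion in the spirit of the abstract: reward fitting the
# true labels while penalizing the capacity to also fit muddled labels,
# i.e. penalizing memorization of arbitrary label assignments.
fit_true = np.mean((ridge_predictions(H, y) - y) ** 2)
fit_muddled = np.mean((ridge_predictions(H, y_muddled) - y_muddled) ** 2)
criterion = fit_true - fit_muddled  # lower is better
```

Because the true labels are (nearly) linear in the activations while the permuted ones are not, the representation fits the former far better than the latter; a representation that could fit both equally well would be memorizing.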
arXiv: 2106.04462v2