Muddling Label Regularization: Deep Learning for Tabular Datasets

by Karim Lounici, Katia Meziani, and Benjamin Riu

Released as an article.

2021  

Abstract

Deep Learning (DL) is considered the state-of-the-art in computer vision, speech recognition, and natural language processing. Until recently, it was also widely accepted that DL is irrelevant for learning tasks on tabular data, especially in the small-sample regime, where ensemble methods are acknowledged as the gold standard. We present a new end-to-end differentiable method to train a standard FFNN. Our method, Muddling labels for Regularization (MLR), penalizes memorization through the generation of uninformative labels and the application of a differentiable closed-form regularization scheme on the last hidden layer during training. MLR outperforms classical NN and the gold standard (GBDT, RF) for regression and classification tasks on several datasets from the UCI database and Kaggle, covering a large range of sample sizes and feature-to-sample ratios. Researchers and practitioners can use MLR on its own as an off-the-shelf solution or integrate it into the most advanced ML pipelines.
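The abstract's core idea — penalizing memorization by checking how well the last hidden layer can fit deliberately uninformative ("muddled") labels via a closed-form fit — can be sketched as follows. This is an illustrative assumption-laden sketch, not the paper's implementation: the activations, the permutation scheme, and the use of ridge regression as the closed-form fit are all stand-ins for whatever the authors actually use.

```python
import numpy as np

rng = np.random.default_rng(0)
n, h = 200, 32                      # samples, hidden width (illustrative sizes)
H = rng.normal(size=(n, h))         # stand-in for last-hidden-layer activations
y = H @ rng.normal(size=h) + 0.1 * rng.normal(size=n)  # informative labels
y_muddled = rng.permutation(y)      # "muddled" labels: same marginals, no signal

def ridge_fit_error(H, y, lam=1.0):
    """Closed-form ridge regression of y on H; returns mean squared residual."""
    beta = np.linalg.solve(H.T @ H + lam * np.eye(H.shape[1]), H.T @ y)
    return float(np.mean((y - H @ beta) ** 2))

err_true = ridge_fit_error(H, y)
err_muddled = ridge_fit_error(H, y_muddled)

# A representation that memorizes would fit muddled labels almost as well as
# real ones; the gap between the two errors can serve as a regularization
# signal during training (the penalty direction here is an assumption).
print(err_true, err_muddled)
```

Because the ridge solve is closed-form and differentiable in `H`, such a penalty can in principle be backpropagated through the network end-to-end, which matches the "end-to-end differentiable" claim in the abstract.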

Archived Files and Locations

application/pdf, 459.9 kB (file_liajxvs4kjgl7hg3yfqcvzqimu)
arxiv.org (repository)
web.archive.org (webarchive)
Type: article
Stage: submitted
Date: 2021-06-29
Version: v2
Language: en
arXiv: 2106.04462v2
Work Entity

Access all versions, variants, and formats of this work (e.g., pre-prints).
Catalog Record
Revision: af7c07b1-950c-4d64-b88a-4821f4db8f3e