Classification Method Performance in High Dimensions

by Claus Weihs, Tobias Kassner (Technische Universität Dortmund)

Published by Technische Universität Dortmund.

2018  

Abstract

We discuss standard classification methods for high-dimensional data with a small number of observations. By means of designed simulations illustrating the practical relevance of theoretical results, we show that in the 2-class case the following rules of thumb should be followed in such a situation to avoid the worst error rate, namely the probability π1 of the smaller class. Avoid "complicated" classifiers: the independence rule (ir) might be adequate; the support vector machine (svm) should only be considered as an expensive alternative, which is additionally sensitive to noise factors. From the outset, look for stochastically independent dimensions and balanced classes. Only take into account features which influence class separation sufficiently. Variable selection might help, though filters might be too rough. Compare your result with the result of the data-independent rule "Always predict the larger class".
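The comparison recommended in the abstract can be illustrated with a minimal sketch. It is not the authors' simulation design. It assumes that the independence rule is taken as two-class linear discriminant analysis with a pooled diagonal covariance (correlations between features are ignored), and all function names, the Gaussian data model, and the parameter values (n_train, p, n_informative, shift, pi1) are hypothetical choices made for illustration only.

```python
import numpy as np


def fit_independence_rule(X, y):
    """Fit a two-class diagonal LDA (assumed form of the independence rule)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled per-feature variance; covariances between features are ignored.
    resid = np.vstack([X0 - mu0, X1 - mu1])
    var = (resid ** 2).sum(axis=0) / (len(X) - 2)
    var = np.maximum(var, 1e-12)  # guard against zero variance
    prior1 = len(X1) / len(X)
    return mu0, mu1, var, prior1


def predict_independence_rule(model, X):
    """Predict class 1 where the linear discriminant score is positive."""
    mu0, mu1, var, prior1 = model
    w = (mu1 - mu0) / var
    b = -0.5 * ((mu1 ** 2 - mu0 ** 2) / var).sum() + np.log(prior1 / (1 - prior1))
    return (X @ w + b > 0).astype(int)


def majority_baseline_error(y_train, y_test):
    """Error of the data-independent rule 'always predict the larger class'."""
    majority = int(np.mean(y_train) > 0.5)
    return float(np.mean(y_test != majority))


def simulate(n_train=40, n_test=2000, p=200, n_informative=5,
             shift=1.0, pi1=0.3, seed=0):
    """Hypothetical setup: independent Gaussian features, few of them informative."""
    rng = np.random.default_rng(seed)

    def draw(n):
        n1 = int(round(pi1 * n))  # fixed class sizes, class 1 is the smaller class
        y = np.r_[np.zeros(n - n1, dtype=int), np.ones(n1, dtype=int)]
        X = rng.standard_normal((n, p))
        X[:, :n_informative] += shift * y[:, None]  # only these features separate the classes
        return X, y

    X_tr, y_tr = draw(n_train)
    X_te, y_te = draw(n_test)
    model = fit_independence_rule(X_tr, y_tr)
    ir_error = float(np.mean(predict_independence_rule(model, X_te) != y_te))
    return ir_error, majority_baseline_error(y_tr, y_te)


if __name__ == "__main__":
    ir_error, baseline_error = simulate()
    print(f"independence rule: {ir_error:.3f}, larger-class baseline: {baseline_error:.3f}")
```

With fixed class sizes, the baseline error equals the share of the smaller class in the test set, which is the reference value π1 named in the abstract. Varying p and n_informative in this sketch shows how noise features affect the independence rule relative to that reference.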

Archived Files and Locations

application/pdf  802.1 kB
file_qhmkfuewk5bqrpd34ak6kvsaa4
eldorado.tu-dortmund.de (publisher)
web.archive.org (webarchive)
Type  report
Stage   published
Date   2018-04-17
Language   en