M-HMOGA: A New Multi-Objective Feature Selection Algorithm for Handwritten Numeral Classification
by
Ritam Guha, Manosij Ghosh, Pawan Kumar Singh, Ram Sarkar, Mita Nasipuri
Abstract
Feature selection is very important in the field of pattern recognition: it selects the informative features so as to reduce the curse of dimensionality, thereby improving the overall classification accuracy. In this paper, a new feature selection approach named Memory-Based Histogram-Oriented Multi-objective Genetic Algorithm (M-HMOGA) is introduced to identify the informative feature subset to be used for a pattern classification problem. The proposed M-HMOGA approach is applied to two recently used feature sets, namely Mojette transform and Regional Weighted Run Length features. The experiments are carried out on Bangla, Devanagari, and Roman numeral datasets, which correspond to the three most popular scripts used in the Indian subcontinent. In-house Bangla and Devanagari script datasets and the Competition on Handwritten Digit Recognition (HDRC) 2013 Roman numeral dataset are used for evaluating our model. Moreover, as proof of robustness, we have applied an innovative approach of using different datasets for training and testing. We have used the in-house Bangla and Devanagari script datasets for training the model, and the trained model is then tested on the Indian Statistical Institute numeral datasets. For Roman numerals, we have used the HDRC 2013 dataset for training and the Modified National Institute of Standards and Technology dataset for testing. Comparison of the results obtained by the proposed model with the existing HMOGA and MOGA techniques clearly indicates the superiority of M-HMOGA over both of its ancestors. Moreover, the use of K-nearest neighbor as well as multi-layer perceptron as classifiers speaks for the classifier-independent nature of M-HMOGA.
The proposed M-HMOGA model uses only about 45–50% of the total feature set while achieving around a 1% increase in classification accuracy when the same datasets are partitioned for training and testing, and a 2–3% increase while using only 35–45% of the features when different datasets are used for training and testing, both measured with respect to using all features for classification.
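The core idea of genetic-algorithm-based feature selection with two competing objectives (maximize accuracy, minimize the number of selected features) can be sketched as follows. This is a minimal, generic multi-objective GA illustration, not the authors' M-HMOGA (it has no memory or histogram components); the toy dataset, population size, and mutation rate are all hypothetical choices made for the example.

```python
import random

random.seed(0)

# Hypothetical toy dataset: 2 informative features plus 3 pure-noise features.
# This stands in for a real feature matrix (e.g. numeral-image features).
def make_data(n=60):
    X, y = [], []
    for _ in range(n):
        label = random.randint(0, 1)
        informative = [label + random.gauss(0, 0.3), -label + random.gauss(0, 0.3)]
        noise = [random.gauss(0, 1) for _ in range(3)]
        X.append(informative + noise)
        y.append(label)
    return X, y

def knn_accuracy(X, y, mask):
    """Leave-one-out 1-NN accuracy using only the features selected by mask."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best, best_d = None, float("inf")
        for j in range(len(X)):
            if i == j:
                continue
            d = sum((X[i][k] - X[j][k]) ** 2 for k in idx)
            if d < best_d:
                best, best_d = y[j], d
        correct += best == y[i]
    return correct / len(X)

def dominates(a, b):
    """Pareto dominance for (accuracy, n_features): maximize acc, minimize count."""
    return a[0] >= b[0] and a[1] <= b[1] and a != b

def moga_feature_selection(X, y, pop_size=20, gens=15):
    n = len(X[0])
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(gens):
        scored = [(knn_accuracy(X, y, m), sum(m)) for m in pop]
        # Non-dominated individuals form the Pareto front and act as parents.
        front = [pop[i] for i in range(pop_size)
                 if not any(dominates(scored[j], scored[i]) for j in range(pop_size))]
        children = []
        while len(children) < pop_size:
            p1, p2 = (random.sample(front, 2) if len(front) > 1
                      else (front[0], front[0]))
            cut = random.randrange(1, n)          # single-point crossover
            child = p1[:cut] + p2[cut:]
            if random.random() < 0.2:             # bit-flip mutation
                k = random.randrange(n)
                child[k] = 1 - child[k]
            children.append(child)
        pop = children
    scored = [(knn_accuracy(X, y, m), sum(m)) for m in pop]
    best = max(range(pop_size), key=lambda i: (scored[i][0], -scored[i][1]))
    return pop[best], scored[best]

X, y = make_data()
mask, (acc, k) = moga_feature_selection(X, y)
print(f"selected {k}/{len(X[0])} features, LOO 1-NN accuracy {acc:.2f}")
```

A run of this sketch typically converges toward masks that drop the noise features, which mirrors the paper's reported behavior of reaching higher accuracy with roughly half the feature set; M-HMOGA additionally maintains a memory of good solutions across generations, which this bare-bones version omits.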
Archived Files and Locations
application/pdf, 1.5 MB — www.degruyter.com (publisher); web.archive.org (webarchive)
Type: article-journal
Stage: published
Date: 2019-06-14