Addressing Scalability Issues Of Named Entity Recognition Using Multi-Class Support Vector Machines
release_jea3vravl5g7tax5fjik7xfhxa
by
Mona Soliman Habib
2008
Abstract
This paper explores the scalability issues associated
with solving the Named Entity Recognition (NER) problem using
Support Vector Machines (SVM) and high-dimensional features. The
performance results of a set of experiments conducted using binary
and multi-class SVM with increasing training data sizes are
examined. The NER domain chosen for these experiments is the
biomedical publications domain, especially selected due to its
importance and inherent challenges. A simple machine learning
approach is used that eliminates prior language knowledge such as
part-of-speech or noun phrase tagging thereby allowing for its
applicability across languages. No domain-specific knowledge is
included. The accuracy measures achieved are comparable to those
obtained using more complex approaches, which constitutes a
motivation to investigate ways to improve the scalability of multiclass
SVM in order to make the solution more practical and useable.
Improving training time of multi-class SVM would make support
vector machines a more viable and practical machine learning
solution for real-world problems with large datasets. An initial
prototype results in great improvement of the training time at the
expense of memory requirements.
In text/plain
format
Archived Files and Locations
application/pdf 526.7 kB
file_gjxw3clxwnhlvix5tjchs5lrnm
|
zenodo.org (repository) web.archive.org (webarchive) |
access all versions, variants, and formats of this works (eg, pre-prints)
Datacite Metadata (via API)
Worldcat
wikidata.org
CORE.ac.uk
Semantic Scholar
Google Scholar