Mining Protein Databases using Machine Learning Techniques

by Renata da Silva Camargo, Mahesan Niranjan

Published in Journal of Integrative Bioinformatics by Walter de Gruyter GmbH.

2008  

Abstract

With a large amount of information relating to proteins accumulating in databases widely available online, it is of interest to apply machine learning techniques that, by extracting underlying statistical regularities in the data, make predictions about the functional and evolutionary characteristics of unseen proteins. Such predictions can help reduce the space over which experiment designers need to search in order to improve our understanding of the biochemical properties. Previously it has been suggested that features computable by comparing a pair of proteins can be integrated by an artificial neural network, hence predicting the degree to which the pair may be evolutionarily related and homologous.
We compiled two datasets of pairs of proteins, each pair being characterised by seven distinct features. We performed an exhaustive search through all possible combinations of features. For the problem of separating remote homologous from analogous pairs, we note that a significant performance gain was obtained by the inclusion of sequence and structure information. We find that a linear classifier was enough to discriminate a protein pair at the family level. However, at the superfamily level, detecting remote homologous pairs was a relatively harder problem. We find that the use of nonlinear classifiers achieves significantly higher accuracies.
In this paper, we compare three different pattern classification methods on two problems formulated as detecting evolutionary and functional relationships between pairs of proteins, and from extensive cross-validation and feature-selection studies quantify the average limits and uncertainties with which such predictions may be made. Feature selection points to a "knowledge gap" in currently available functional annotations.
We demonstrate how the scheme may be employed in a framework to associate an individual protein with an existing family of evolutionarily related proteins.
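The exhaustive search over feature combinations mentioned in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic (seven random features, of which features 0 and 1 are made informative), the classifier is a simple nearest-centroid rule standing in for a linear classifier, and the function names (make_pairs, cv_accuracy, exhaustive_search) are hypothetical. It is not the authors' data, features, or classifiers.

```python
import itertools
import random


def make_pairs(n, n_features=7, informative=(0, 1), seed=0):
    """Synthetic stand-in for protein-pair data (an assumption, not the
    paper's datasets): label-1 rows are shifted on the informative features."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 2.0 * label
        data.append((row, label))
    return data


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def predict(x, c0, c1):
    """Nearest-centroid rule; its decision boundary is a hyperplane."""
    d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
    d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
    return 1 if d1 < d0 else 0


def cv_accuracy(data, subset, k=5):
    """k-fold cross-validated accuracy using only the features in `subset`."""
    correct = 0
    for fold in range(k):
        train = [d for i, d in enumerate(data) if i % k != fold]
        test = [d for i, d in enumerate(data) if i % k == fold]
        c0 = centroid([[r[j] for j in subset] for r, y in train if y == 0])
        c1 = centroid([[r[j] for j in subset] for r, y in train if y == 1])
        for r, y in test:
            correct += predict([r[j] for j in subset], c0, c1) == y
    return correct / len(data)


def exhaustive_search(data, n_features=7):
    """Score every non-empty feature subset (2**7 - 1 = 127 for seven features)."""
    results = {}
    for size in range(1, n_features + 1):
        for subset in itertools.combinations(range(n_features), size):
            results[subset] = cv_accuracy(data, subset)
    return results


data = make_pairs(200)
results = exhaustive_search(data)
best = max(results, key=results.get)
print("best subset:", best, "accuracy:", results[best])
```

With only seven features the full enumeration is cheap, which is what makes an exhaustive search practical here; with many more features the number of subsets grows exponentially and a greedy or regularised selection method would usually be substituted.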

Archived Files and Locations

application/pdf  499.4 kB
file_dxkktbqng5f7tet2npjp2hrzx4
web.archive.org (webarchive)
eprints.ecs.soton.ac.uk (web)
application/pdf  360.9 kB
file_ql6ostsydbhv5okeucxnzqzh4u
www.degruyter.com (publisher)
web.archive.org (webarchive)
Type  article-journal
Stage   published
Date   2008-06-01
Journal Metadata
Open Access Publication
In DOAJ
In Keepers Registry
ISSN-L:  1613-4516