Fast and scalable learning of neuro-symbolic representations of
biomedical knowledge
by
Asan Agibetov, Matthias Samwald
2018
Abstract
In this work we address the problem of fast and scalable learning of
neuro-symbolic representations for general biological knowledge. Based on a
recently published comprehensive biological knowledge graph (Alshahrani, 2017)
that was used for demonstrating neuro-symbolic representation learning, we show
how to train log-linear neural embeddings of the entities quickly (in under
1 minute). We utilize these representations as inputs for machine learning
classifiers to enable important tasks such as biological link prediction.
Classifiers are trained on concatenated entity embeddings, which represent
entity relations, to discern true relations from automatically generated
negative examples. Our simple embedding methodology greatly reduces
classification error compared to previously published state-of-the-art results,
yielding a maximum increase of 0.28 in F-measure and 0.22 in ROC AUC for the
most difficult biological link prediction problem. Finally, our embedding
approach is orders of magnitude faster to train (≤ 1 minute vs. hours), much
more economical in terms of embedding dimensions (d=50 vs. d=512), and
naturally encodes the directionality of asymmetric biological relations, which
can be controlled by the order in which we concatenate the embeddings.
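The link prediction setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the entity names, the 4-dimensional random vectors standing in for learned embeddings, the helper names (`edge_features`, `sample_negatives`, `train_logreg`, `predict`), and the choice of a small hand-written logistic regression are all assumptions made for illustration. It shows only the three ideas named above: concatenating head and tail embeddings (so order encodes direction), drawing random negative pairs, and training a binary classifier on the result.

```python
import math
import random


def edge_features(emb, head, tail):
    """Represent a directed relation head -> tail by concatenating embeddings."""
    return emb[head] + emb[tail]  # list concatenation; order encodes direction


def sample_negatives(entities, positives, n, rng):
    """Draw n distinct random ordered pairs that are not known positive edges."""
    known = set(positives)
    negatives = set()
    while len(negatives) < n:
        h, t = rng.choice(entities), rng.choice(entities)
        if h != t and (h, t) not in known:
            negatives.add((h, t))
    return sorted(negatives)


def predict(w, b, x):
    """Logistic-regression probability that feature vector x is a true edge."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, epochs=200, lr=0.1):
    """Fit logistic regression with plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            grad = predict(w, b, x) - target
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


# Hypothetical toy data: random vectors stand in for learned embeddings.
rng = random.Random(0)
entities = ["geneA", "geneB", "diseaseX", "diseaseY", "drugP", "drugQ"]
emb = {e: [rng.uniform(-1, 1) for _ in range(4)] for e in entities}
positives = [("geneA", "diseaseX"), ("geneB", "diseaseY"), ("drugP", "diseaseX")]
negatives = sample_negatives(entities, positives, len(positives), rng)

X = [edge_features(emb, h, t) for h, t in positives + negatives]
y = [1] * len(positives) + [0] * len(negatives)
w, b = train_logreg(X, y)

# Score a candidate edge; swapping head and tail gives a different input vector.
score = predict(w, b, edge_features(emb, "geneA", "diseaseX"))
```

Because the feature vector for (head, tail) differs from the one for (tail, head), a classifier trained this way can in principle assign different scores to the two directions of an asymmetric relation.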
Archived Files and Locations
application/pdf, 367.1 kB (file_x6jxbs7yx5ej7ilmjzjrezkpd4)
arxiv.org (repository), web.archive.org (webarchive)
arXiv:1804.11105v1