ALOHA: Auxiliary Loss Optimization for Hypothesis Augmentation
by
Ethan M. Rudd, Felipe N. Ducau, Cody Wild, Konstantin Berlin, and
Richard Harang
2019
Abstract
Malware detection is a popular application of Machine Learning for
Information Security (ML-Sec), in which an ML classifier is trained to predict
whether a given file is malware or benignware. Parameters of this classifier
are typically optimized such that outputs from the model over a set of input
samples most closely match the samples' true malicious/benign (1/0) target
labels. However, there are often a number of other sources of contextual
metadata for each malware sample, beyond an aggregate malicious/benign label,
including multiple labeling sources and malware type information (e.g.,
ransomware, trojan, etc.), which we can feed to the classifier as auxiliary
prediction targets. In this work, we fit deep neural networks to multiple
additional targets derived from metadata in a threat intelligence feed for
Portable Executable (PE) malware and benignware, including a multi-source
malicious/benign loss, a count loss on multi-source detections, and a semantic
malware attribute tag loss. We find that incorporating multiple auxiliary loss
terms yields a marked improvement in performance on the main detection task. We
also demonstrate that these gains likely stem from a more informed neural
network representation and are not due to a regularization artifact of
multi-target learning. Our auxiliary loss architecture yields a significant
reduction in detection error rate (false negatives) of 42.6% at a false
positive rate (FPR) of 10^-3 when compared to a similar model with only one
target, and a decrease of 53.8%
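
The multi-target objective described in the abstract can be illustrated with a minimal sketch. It assumes a weighted sum of four terms: binary cross-entropy for the main malicious/benign label, binary cross-entropy per labeling source, a Poisson negative log-likelihood for the detection count, and binary cross-entropy per attribute tag. The abstract names the loss terms but not their exact form or weighting, so the Poisson choice, the dictionary keys, and the weight values below are all illustrative assumptions rather than the authors' implementation.

```python
import math


def bce(p, y, eps=1e-12):
    """Binary cross-entropy for one predicted probability p and a 0/1 label y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def poisson_nll(rate, count):
    """Negative log-likelihood of an observed count under Poisson(rate), rate > 0.

    Assumption: the abstract only says "count loss"; Poisson is one common choice.
    """
    return rate - count * math.log(rate) + math.lgamma(count + 1)


def combined_loss(pred, target, weights):
    """Weighted sum of the main detection loss and three auxiliary losses.

    All key names and the weighting scheme are hypothetical.
    """
    main = bce(pred["malware"], target["malware"])
    sources = sum(bce(p, y) for p, y in zip(pred["sources"], target["sources"]))
    count = poisson_nll(pred["count_rate"], target["count"])
    tags = sum(bce(p, y) for p, y in zip(pred["tags"], target["tags"]))
    return (weights["main"] * main
            + weights["sources"] * sources
            + weights["count"] * count
            + weights["tags"] * tags)


# Hypothetical single-sample example: model outputs and metadata-derived targets.
pred = {"malware": 0.5, "sources": [0.5, 0.5], "count_rate": 2.0, "tags": [0.5]}
target = {"malware": 1, "sources": [1, 0], "count": 2, "tags": [1]}
weights = {"main": 1.0, "sources": 0.1, "count": 0.1, "tags": 0.1}
loss = combined_loss(pred, target, weights)
```

Setting every auxiliary weight to zero reduces this to the single-target baseline, which is the comparison the abstract reports against.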
arXiv:1903.05700v1