Robust Neural Malware Detection Models for Emulation Sequence Learning
by
Rakshit Agrawal, Jack W. Stokes, Mady Marinescu, Karthik Selvaraj
2018
Abstract
Malicious software, or malware, presents a continuously evolving challenge in
computer security. These snippets of code, delivered as standalone malicious
files or hidden within legitimate ones, pose a major risk to systems through
their ability to run malicious command sequences. Malware authors even use
polymorphism to reorder these commands and create many malicious variants.
However, by executing a file in a secure environment, one can perform early
malware detection on its emulated command sequences.
The models presented in this paper leverage this sequential data derived via
emulation in order to perform Neural Malware Detection. These models target the
core of the malicious operation by learning the presence and co-occurrence
patterns of malicious event actions within these sequences. Our models can
capture entire event sequences and be trained directly using the known target
labels. These end-to-end learning models are powered by two commonly used
structures: Long Short-Term Memory (LSTM) networks and Convolutional Neural
Networks (CNNs). Previously proposed sequential malware classification models
process no more than 200 events, so attackers can evade detection by delaying
any malicious activity beyond the beginning of the file. To tackle this
vulnerability, we present specialized models, built on an implementation of the
Convoluted Partitioning of Long Sequences (CPoLS) approach, that handle
extremely long sequences while performing malware detection efficiently. We
report results on a large dataset of 634,249 file sequences, including
extremely long ones.
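To make the partitioned architecture concrete, below is a minimal sketch of a
Convoluted-Partitioning-style model. This is not the authors' implementation:
the choice of PyTorch, all hyperparameter values (chunk length, embedding
width, channel and hidden sizes), and the ReLU and max-pooling details are
illustrative assumptions. The sketch only mirrors the structure the abstract
describes: a long event sequence is split into fixed-size chunks, a small CNN
summarizes each chunk, and an LSTM reads the chunk summaries to produce a
malware score.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CPoLSSketch(nn.Module):
    """Hypothetical CPoLS-style model; all sizes are illustrative."""

    def __init__(self, vocab_size=256, embed_dim=32, conv_channels=64,
                 chunk_len=100, lstm_hidden=128):
        super().__init__()
        self.chunk_len = chunk_len
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, conv_channels,
                              kernel_size=3, padding=1)
        self.lstm = nn.LSTM(conv_channels, lstm_hidden, batch_first=True)
        self.classifier = nn.Linear(lstm_hidden, 1)

    def forward(self, events):
        # events: (batch, seq_len) of integer event ids from emulation.
        b, t = events.shape
        # Pad with a dummy event so the sequence divides into whole chunks.
        pad = (-t) % self.chunk_len
        if pad:
            events = F.pad(events, (0, pad))
        n_chunks = events.shape[1] // self.chunk_len
        x = self.embed(events)                          # (b, t', embed_dim)
        # Partition into chunks and summarize each chunk with the CNN.
        x = x.view(b * n_chunks, self.chunk_len, -1).transpose(1, 2)
        x = torch.relu(self.conv(x))                    # (b*n_chunks, C, L)
        x = x.max(dim=2).values                         # max-pool each chunk
        # Read the sequence of chunk summaries with the LSTM.
        x = x.view(b, n_chunks, -1)
        out, _ = self.lstm(x)                           # (b, n_chunks, hidden)
        h = out.max(dim=1).values                       # pool over chunks
        return torch.sigmoid(self.classifier(h)).squeeze(-1)

# Example: score two sequences of 10,000 emulated events each.
model = CPoLSSketch()
events = torch.randint(0, 256, (2, 10_000))
print(model(events))  # two malware probabilities in [0, 1]

Because the CNN compresses each chunk to a single vector before the LSTM runs,
the recurrence only steps once per chunk rather than once per event, which is
what lets this family of models scale to very long sequences.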
Archived Files and Locations
application/pdf, 289.0 kB: arXiv:1806.10741v1, available via arxiv.org (repository) and web.archive.org (webarchive)