Extending Isolation Forest for Anomaly Detection in Big Data via K-Means
by
Md Tahmid Rahman Laskar, Jimmy Huang, Vladan Smetana, Chris Stewart, Kees Pouw, Aijun An, Stephen Chan, Lei Liu
2021
Abstract
Industrial Information Technology (IT) infrastructures are often vulnerable
to cyberattacks. Securing the computer systems in an industrial environment
requires effective intrusion detection systems that monitor cyber-physical
systems (e.g., computer networks) for malicious activity. This paper aims to
build such an intrusion detection system to protect computer networks from
cyberattacks. More specifically,
we propose a novel unsupervised machine learning approach that combines the
K-Means algorithm with the Isolation Forest for anomaly detection in industrial
big data scenarios. Because our target is the industrial big data setting, we
implement the proposed model on the Apache Spark framework and train it on
large-scale network traffic data (about 123 million instances) stored in
Elasticsearch. We also evaluate the model on live streaming data and find that
the system can be used for real-time anomaly detection in an industrial setup.
In addition, we describe the challenges we faced while training the model on
large datasets and how we resolved them. Our empirical evaluation across
different anomaly detection use cases on real-world network traffic data shows
that the proposed system is effective at detecting anomalies in big data
scenarios. Finally, we evaluate the model on several academic datasets and find
that its performance is comparable to that of other state-of-the-art approaches.
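
The abstract does not say how K-Means and the Isolation Forest are combined. The sketch below assumes one plausible arrangement, purely for illustration: score every point with a small Isolation Forest, then run 2-means on the one-dimensional scores so that the cut-off between normal and anomalous points is learned rather than set by hand. It is pure Python on toy data and does not reproduce the Apache Spark implementation; every function name, parameter, and data value here is hypothetical.

```python
import math
import random

def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(point, tree, depth=0):
    """Depth at which the point lands, plus a correction for unsplit leaves."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, dim, split, left, right = tree
    child = left if point[dim] < split else right
    return path_length(point, child, depth + 1)

def isolation_scores(data, n_trees=100, sample_size=64, seed=0):
    """Anomaly score in (0, 1]; higher means easier to isolate. Needs len(data) >= 2."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(sample_size)
    return [2.0 ** (-sum(path_length(p, t) for t in trees) / (n_trees * norm))
            for p in data]

def two_means_flags(values, iters=50):
    """1-D K-Means with k=2; True marks membership in the higher-scoring cluster."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not high:  # all scores identical: nothing to separate
            break
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return [abs(v - hi) < abs(v - lo) for v in values]

# Toy data (assumed): one dense Gaussian blob plus five far-away points.
demo_rng = random.Random(1)
data = [(demo_rng.gauss(0, 1), demo_rng.gauss(0, 1)) for _ in range(200)]
data += [(8.0, 8.0), (-8.0, 8.0), (8.0, -8.0), (-8.0, -8.0), (0.0, 12.0)]

scores = isolation_scores(data)
flags = two_means_flags(scores)
print(sum(flags), "of", len(data), "points fall in the high-score cluster")
```

Clustering the scores replaces a hand-set contamination rate with a boundary derived from the data. Whether this matches the design in the paper cannot be determined from the abstract alone.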
arXiv:2104.13190v1