Comparative study of ML models for IIoT intrusion detection: impact of data preprocessing and balancing

Eid, Abdulrahman Mahmoud; Soudan, Bassel; Nassif, Ali Bou; Injadat, MohammadNoor

doi:10.1007/s00521-024-09439-x

Original Article
Published: 11 February 2024

Volume 36, pages 6955–6972, (2024)
Neural Computing and Applications

A Correction to this article was published on 08 May 2024

This article has been updated

Abstract

This study investigates the effectiveness of six prominent machine learning models—random forest, decision trees, K-nearest neighbor, logistic regression, support vector machines, and Naïve Bayes—for intrusion detection systems in industrial Internet of Things environments. The evaluation encompasses the effects of data preprocessing techniques, including feature engineering, data normalization, recoding, and missing data mitigation. Furthermore, the research delves into dataset balancing, examining the effects of six different techniques on model performance. The investigations are conducted using the domain-specific WUSTL-IIOT-2021 dataset, which captures the unique characteristics of IIoT data. The study also investigates multi-class attack identification utilizing an innovative SMOTE-based multi-class balancing approach to tackle dataset imbalances. The results indicate that data preprocessing and intelligent dataset balancing produce consistent enhancements in the classification performance of the selected models across binary and multi-classification tasks. Random forest emerges as the standout algorithm, delivering consistently high performance with computational efficiency.
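The abstract does not spell out how the SMOTE-based multi-class balancing is built, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes numeric features, min-max normalization as the scaling step, and oversampling of every class up to the size of the largest one by interpolating between same-class neighbours. The function names, the toy data, and the labels `normal`, `attack_a`, and `attack_b` are all hypothetical.

```python
import math
import random
from collections import Counter


def min_max_normalize(rows):
    """Scale each feature column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - lo for c, lo in zip(cols, lows)]
    return [
        [(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
        for row in rows
    ]


def smote_like(samples, n_new, k=3, rng=None):
    """Create n_new synthetic points, each lying on the segment between a
    randomly chosen sample and one of its k nearest same-class neighbours."""
    if n_new <= 0:
        return []
    if len(samples) < 2:
        raise ValueError("need at least two samples to interpolate")
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        neighbours = sorted(
            (s for s in samples if s is not base),
            key=lambda s: math.dist(base, s),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance_multiclass(X, y, k=3, seed=0):
    """Oversample every class up to the size of the largest class
    (an assumed balancing target, chosen here for illustration)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        extra = smote_like(members, target - count, k=k, rng=rng)
        X_out.extend(extra)
        y_out.extend([label] * len(extra))
    return X_out, y_out


# Toy, made-up data: two features, three imbalanced classes.
X = [[0.0, 10.0], [1.0, 12.0], [2.0, 11.0], [3.0, 13.0], [4.0, 10.0], [5.0, 12.0],
     [9.0, 30.0], [10.0, 32.0],
     [20.0, 5.0], [21.0, 6.0], [22.0, 4.0]]
y = ["normal"] * 6 + ["attack_a"] * 2 + ["attack_b"] * 3

X_bal, y_bal = balance_multiclass(min_max_normalize(X), y)
print(Counter(y_bal))
```

In a sketch like this, normalizing before balancing keeps the nearest-neighbour distances from being dominated by whichever feature has the largest raw range. In practice the balancing step would normally be applied to the training split only, so that synthetic points do not leak into the evaluation data.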

Data availability

This study relies on a publicly available dataset (WUSTL-IIOT-2021). This dataset is available from the following reference: Zolanvari, M., Gupta, L., Khan, K. M., & Jain, R. (2021), WUSTL-IIOT-2021 Dataset for IIoT Cybersecurity Research, Washington University in St. Louis, USA.

Change history

08 May 2024
A Correction to this paper has been published: https://doi.org/10.1007/s00521-024-09841-5

References

Stouffer K, Pillitteri V, Lightman S, et al (2015) Guide to industrial control systems (ICS) security NIST special publication 800–82 revision 2, pp 1–157
Smadi AA, Ajao BT, Johnson BK et al (2021) A comprehensive survey on cyber-physical smart grid testbed architectures: requirements and challenges. Electronics 10:1043. https://doi.org/10.3390/electronics10091043
Bonetto R, Sychev I, Zhdanenko O, et al (2020) Smart grids for smarter cities. In: 2020 IEEE 17th annual consumer communications and networking conference (CCNC). https://doi.org/10.1109/CCNC46108.2020.9045309
Attar H (2023) Joint IoT/ML platforms for smart societies and environments: a review on multimodal information-based learning for safety and security. J Data Inf Qual. https://doi.org/10.1145/3603713
Calabretta M, Pecori R, Vecchio M, Veltri L (2018) MQTT-AUTH: a token-based solution to endow MQTT with authentication and authorization capabilities. J Commun Softw Syst 14:320–331. https://doi.org/10.24138/jcomss.v14i4.604
Calabretta M, Pecori R, Veltri L (2018) A token-based protocol for securing MQTT communications. In: Proceedings of the 26th international conference on software, telecommunications and computer networks, SoftCOM 2018, pp 373–378. https://doi.org/10.23919/SOFTCOM.2018.8555834
Nti IK, Adekoya AF, Narko-Boateng O, Somanathan AR (2022) Stacknet based decision fusion classifier for network intrusion detection. Int Arab J Inf Technol 19:478–490. https://doi.org/10.34028/iajit/19/3A/8
Abdul Rahman Al-chikh Omar A, Soudan B, Ala’ Altaweel (2023) A comprehensive survey on detection of sinkhole attack in routing over low power and Lossy network for internet of things. Internet Things (Netherlands). https://doi.org/10.1016/j.iot.2023.100750
Samara G, Aljaidi M, Alazaidah R, et al (2023) A comprehensive review of machine learning-based intrusion detection techniques for IoT networks. In: Artificial intelligence, Internet of Things, and society 5.0. pp 465–473
Manderna A, Kumar S, Dohare U et al (2023) Vehicular Network Intrusion Detection Using a Cascaded Deep Learning Approach with Multi-Variant Metaheuristic. Sensors 23:8772. https://doi.org/10.3390/s23218772
Alamleh A, Albahri OS, Zaidan AA et al (2023) Federated Learning for IoMT Applications: A Standardization and Benchmarking Framework of Intrusion Detection Systems. IEEE J Biomed Heal Informatics 27:878–887. https://doi.org/10.1109/JBHI.2022.3167256
Surakhi O, García A, Jamoos M, Alkhanafseh M (2022) The Intrusion detection system by deep learning methods: issues and challenges. Int Arab J Inf Technol 19:501–513. https://doi.org/10.34028/iajit/19/3A/10
Keliris A, Salehghaffari H, Cairl B, et al (2016) Machine learning-based defense against process-aware attacks on industrial control systems. In: Proceedings of 2016 IEEE international test conference (ITC), pp 1–10. https://doi.org/10.1109/TEST.2016.7805855
Ullah I, Mahmoud QH (2017) A hybrid model for anomaly-based intrusion detection in SCADA networks. In: Proceedings of 2017 IEEE international conference on big data (big data), pp 2160–2167. https://doi.org/10.1109/BigData.2017.8258164
Vulfin AM, Vasilyev VI, Kuharev SN et al (2021) Algorithms for detecting network attacks in an enterprise industrial network based on data mining algorithms. J Phys Conf Ser. https://doi.org/10.1088/1742-6596/2001/1/012004
Beaver JM, Borges-Hink RC, Buckner MA (2013) An evaluation of machine learning methods to detect malicious SCADA communications. In: Proceedings of 2013 12th international conference on machine learning and applications ICMLA, vol 2, pp 54–59. https://doi.org/10.1109/ICMLA.2013.105
Zhang Y, Ilić MD, Tonguz OK (2011) Mitigating blackouts via smart relays: a machine learning approach. Proc IEEE 99:94–118. https://doi.org/10.1109/JPROC.2010.2072970
Maglaras LA, Jiang J (2014) Intrusion detection in SCADA systems using machine learning techniques. In: Proceedings of 2014 science and information conference, pp 626–631. https://doi.org/10.1109/SAI.2014.6918252
Song Y, Luo W, Li J, et al (2021) SDN-based Industrial Internet Security Gateway. In: 2021 International conference on security, pattern analysis, and cybernetics (SPAC), pp 238–243. https://doi.org/10.1109/SPAC53836.2021.9539961
Zolanvari M, Teixeira MA, Gupta L et al (2019) Machine learning-based network vulnerability analysis of industrial Internet of Things. IEEE Internet Things J 6:6822–6834. https://doi.org/10.1109/JIOT.2019.2912022
Zolanvari M, Gupta L, Khan KM, Jain R (2021) WUSTL-IIOT-2021 dataset for IIoT cybersecurity research. Washington University in St. Louis, USA
Siebert J, Joeckel L, Heidrich J et al (2022) Construction of a quality model for machine learning systems. Softw Qual J 30:307–335. https://doi.org/10.1007/s11219-021-09557-y
Sarker IH (2021) Deep learning: a comprehensive overview on techniques, taxonomy, applications and research directions. SN Comput Sci. https://doi.org/10.1007/s42979-021-00815-1
Eid AM, Nassif AB, Soudan B, Injadat MN (2023) IIoT network intrusion detection using machine learning. In: 2023 6th International conference on intelligent robotics and control engineering (IRCE). IEEE, pp 196–201
Ting KM (1998) Inducing cost-sensitive trees via instance weighting. Lect Notes Comput Sci (Subser Lect Notes Artif Intell Lect Notes Bioinf) 1510:139–147. https://doi.org/10.1007/bfb0094814
Zhang YP, Zhang LN, Wang YC (2010) Cluster-based majority under-sampling approaches for class imbalance learning. In: Proceedings of 2010 2nd IEEE international conference on information and financial engineering, pp 400–404. https://doi.org/10.1109/ICIFE.2010.5609385
Richman R, Wuthrich MV (2020) Nagging predictors. SSRN Electron J. https://doi.org/10.2139/ssrn.3627163
Mesevage TG (2021) Data cleaning steps and process to prep your data for success. MonkeyLearn, Montevideo
Tableau (2022) Data cleaning: definition, benefits, and how-to. Tableau, Mountain View
Chicco D, Jurman G (2020) The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics. https://doi.org/10.1186/s12864-019-6413-7
Chicco D, Jurman G (2023) The Matthews correlation coefficient (MCC) should replace the ROC AUC as the standard metric for assessing binary classification. BioData Min. https://doi.org/10.1186/s13040-023-00322-4
Khafajeh H (2020) An efficient intrusion detection approach using light gradient boosting. J Theor Appl Inf Technol 98:825–835

Author information

Authors and Affiliations

Department of Computer Engineering, College of Computing and Informatics, University of Sharjah, Sharjah, UAE
Abdulrahman Mahmoud Eid, Bassel Soudan & Ali Bou Nassif
Department of Data Science and AI, Faculty of Information Technology, Zarqa University, Zarqa, Jordan
MohammadNoor Injadat

Corresponding author

Correspondence to Bassel Soudan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

The authors would like to convey their thanks and appreciation to the “University of Sharjah” for supporting this work.

Informed consent

This study does not involve any experiments on animals.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised to correct the third author's name.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Eid, A.M., Soudan, B., Nassif, A.B. et al. Comparative study of ML models for IIoT intrusion detection: impact of data preprocessing and balancing. Neural Comput & Applic 36, 6955–6972 (2024). https://doi.org/10.1007/s00521-024-09439-x

Received: 10 October 2023
Accepted: 15 January 2024
Published: 11 February 2024
Version of record: 11 February 2024
Issue date: May 2024
DOI: https://doi.org/10.1007/s00521-024-09439-x
