Balancing the act? Resampling versus imbalanced data for Wi-Fi IDS

  • Regular Contribution
  • International Journal of Information Security

Abstract

Ensuring the security of Wi-Fi networks is a top priority and an ongoing research topic for academic and industry research groups. This work investigates how the class distribution of benchmark datasets affects the detection performance of machine learning (ML)-driven wireless intrusion detection systems (WIDS). More specifically, we address a critical question not yet explored in the literature: To what extent does class distribution influence WIDS performance? To answer it, we consider a multiclass problem and apply several resampling schemes, namely oversampling, undersampling, and combined under- and oversampling, and examine their impact on the classification results in terms of standard metrics. We evaluate a variety of ML models, both traditional and deep, and contrast their performance with that obtained when the same algorithms are trained on the original imbalanced dataset. The key finding is that training on the original imbalanced dataset significantly reduces false positives vis-à-vis a balanced dataset created by any resampling technique. The reduction in macro-averaged false positive rate is particularly noteworthy: the rate is approximately 139 and 43 times lower than that of the worst and best resampling scheme, respectively. Resampled balanced datasets can, however, lead to fewer false negatives (missed attacks), but the improvement is far less pronounced, at about 1.2 times lower relative to the imbalanced dataset.
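The macro-averaged rates mentioned above can be illustrated with a minimal sketch. It assumes the conventional one-vs-rest definition, in which the false positive rate (FPR) and false negative rate (FNR) are computed per class from a multiclass confusion matrix and then averaged with equal weight per class; the preview does not show the exact formulas used in the paper, so this may differ in detail. The function name `macro_fpr_fnr` and the toy matrix are hypothetical and are not taken from the article or its dataset.

```python
from typing import List, Tuple


def macro_fpr_fnr(cm: List[List[int]]) -> Tuple[float, float]:
    """Return (macro FPR, macro FNR) for a square confusion matrix.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j. Each class is scored one-vs-rest, and the
    per-class rates are averaged with equal weight (macro averaging).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    fprs, fnrs = [], []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                         # true k, predicted as another class
        fp = sum(cm[i][k] for i in range(n)) - tp    # other classes predicted as k
        tn = total - tp - fn - fp
        # Guard against empty denominators (class absent from the data).
        fprs.append(fp / (fp + tn) if (fp + tn) else 0.0)
        fnrs.append(fn / (fn + tp) if (fn + tp) else 0.0)
    return sum(fprs) / n, sum(fnrs) / n


# Hypothetical, imbalanced 3-class toy matrix (class 0 is the majority).
cm = [
    [90, 5, 5],
    [2, 8, 0],
    [1, 0, 9],
]
macro_fpr, macro_fnr = macro_fpr_fnr(cm)
print(f"macro FPR = {macro_fpr:.4f}, macro FNR = {macro_fnr:.4f}")
```

Because macro averaging weights every class equally, a small minority class influences the reported rate as much as the majority class, which is why the metric is commonly used when class distributions are skewed.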

Data availability

No datasets were generated or analysed during the current study.

Author information

Contributions

Conceptualization: All authors; Methodology: All authors; Validation: All authors; Writing: G.K., K.K., E.C.; Review & editing: G.K., D.S.; Supervision: D.S., G.K.

Corresponding author

Correspondence to Georgios Kambourakis.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kampourakis, K.E., Chatzoglou, E., Kambourakis, G. et al. Balancing the act? Resampling versus imbalanced data for Wi-Fi IDS. Int. J. Inf. Secur. 24, 47 (2025). https://doi.org/10.1007/s10207-024-00958-1

  • DOI: https://doi.org/10.1007/s10207-024-00958-1

Profiles

  1. Konstantinos E. Kampourakis