Augmented Framework to Enhance the Detection Accuracy of Rare Classes in Intrusion Detection Systems

Abstract

Protecting critical infrastructure has become an urgent priority as the digital landscape grows more interconnected and data proliferates. Intrusion Detection Systems (IDS) are essential for detecting and addressing potential security threats and for maintaining the integrity and confidentiality of sensitive information. By scrutinizing patterns and anomalies in network traffic, an IDS aims to strengthen the overall security posture by identifying and mitigating threats in real time or through post-incident analysis. IDS can be categorized into two main types: signature-based and anomaly-based. Signature-based detection relies on a database of known attack patterns, while anomaly-based detection focuses on deviations from established baselines. Machine learning has become increasingly popular in IDS because of its proactive approach to evolving cyber threats. It enables an IDS to recognize patterns and anomalies in network or system behavior, thereby supporting both detection modes. Although machine learning offers a resilient defense against cyber-attacks, it is not infallible. First, given the sheer volume and complexity of data generated by network activity, feature selection becomes critical to designing an effective IDS. Second, the imbalance between normal and malicious activity biases the learning process, favoring overall accuracy at the expense of detecting actual threats. The objective of this research is to propose a model for building a network-based intrusion detection system using machine learning that selects the most relevant features from incoming traffic using meta-heuristic algorithms and facilitates the detection of rare attack classes by training on oversampled data. The present research addresses the inherent imbalance in network data.
It applies the hybrid meta-heuristic-based feature selection algorithm CFS-MHA (Correlation-based Feature Selection using Meta-Heuristic Approaches) to reduce the number of features from 41 to 15 in the benchmark dataset NSL-KDD. To address the imbalance in network data and reduce the false negative rate for rare attack classes, the study proposes PASMOTE (Proximity Adaptive Synthetic Minority Oversampling Technique), a new decision-based SMOTE variant. This method performs adaptive synthetic sampling on rare and complex samples by using decision boundaries. Both CFS-MHA and PASMOTE are compared with existing state-of-the-art techniques. Datasets containing attack records were used to validate the models with performance metrics such as accuracy, recall, precision, and F1-score.
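For readers unfamiliar with synthetic oversampling, the following is a minimal sketch of plain SMOTE interpolation, the baseline that SMOTE variants build on. It is not the PASMOTE algorithm described in the abstract: the function name `smote_oversample`, its parameters, and the toy data are hypothetical and chosen only for illustration, and the sketch does not use decision boundaries or any adaptive weighting.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by plain SMOTE interpolation.

    Illustrative only: this is baseline SMOTE, not PASMOTE.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)  # a sample cannot have more neighbours than n - 1

    # Pairwise Euclidean distances between minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour

    # Indices of the k nearest minority neighbours of each sample.
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base sample and one of its neighbours for each new point.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]

    # Place the synthetic point at a random position on the connecting segment.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


# Toy minority class: the four corners of the unit square (made-up data).
X_rare = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_oversample(X_rare, n_new=10, k=2)
```

Each synthetic point lies on the line segment between a minority sample and one of its nearest minority neighbours, so the new samples stay inside the region already occupied by the rare class. A boundary-aware variant would, by contrast, concentrate sampling on the minority samples that are hardest to classify.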

