The Importance of the Clustering Model to Detect New Types of Intrusion in Data Traffic
In the current digital age, the volume of data generated by cyber activity is enormous and constantly increasing. This data may contain valuable insights that can be harnessed to improve cybersecurity measures. However, much of it is unlabeled and qualitative, which poses significant challenges for traditional analysis methods. To overcome these challenges, clustering, a crucial method in machine learning (ML) and data analysis, has become increasingly effective. Clustering identifies hidden patterns and structures in data by grouping similar data points, which makes it simpler to identify and address threats. It can be defined as a data mining (DM) approach that uses similarity calculations to divide a data set into several categories. The points within each cluster identified by the algorithm are highly similar to one another, whereas points in different clusters share little similarity. Typical clustering algorithms are hierarchical, density-based, and partitioning methods. The presented work uses the K-means algorithm, a popular partitioning technique. Using K-means, we worked with two different types of data. First, we used data that we gathered ourselves, to which the XGBoost algorithm was applied after the clustering (aggregation) step with K-means was complete. The data were gathered in a Kali Linux environment using the CICFlowMeter traffic tool and PuTTY while running a variety of simple attacks. Because cyber threats are dynamic, new attack types often emerge for which labeled data does not yet exist; this approach could help identify such attacks, which are distinct from known ones, and label them based on the characteristics they exhibit. The model counted the attacks and assigned a number to each of them.
Second, we applied the same approach to a ready-made dataset from the Kaggle repository, "Intrusion Detection in Internet of Things Network"; the clustering model worked well and detected the number of attacks correctly, as shown in the results section.
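To make the clustering step concrete, the following is a minimal pure-Python sketch of K-means applied to a handful of traffic flows. It is an illustration under stated assumptions, not the implementation used in this work: the flow values and the two features (packets per second, mean packet size in bytes) are hypothetical, and the deterministic farthest-point initialisation is a simplification chosen so that the example is reproducible. In practice the features would typically be scaled before clustering.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def init_centroids(points, k):
    """Farthest-point initialisation (an assumption made for reproducibility)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point whose nearest existing centroid is farthest away.
        nxt = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain K-means: alternate assignment and centroid update until stable."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical flows: (packets per second, mean packet size in bytes).
flows = [
    (10, 500), (12, 520), (11, 480), (9, 510),   # one behaviour group
    (900, 60), (950, 64), (880, 58),             # a second, very different group
    (300, 1400), (320, 1380), (290, 1420),       # a third group
]

labels, centroids = kmeans(flows, k=3)
counts = Counter(labels)  # number of flows assigned to each numbered cluster

for cluster_id, size in sorted(counts.items()):
    print(f"cluster {cluster_id}: {size} flows")
```

Each cluster index plays the role of the number assigned to a group of similar flows, and the per-cluster counts correspond to counting the members of each group. Choosing the number of clusters `k` is left to the analyst in this sketch.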