 Open Access
 Total Downloads : 310
 Authors : Kalaimathi B, Annapoorani. G, Gayathri N
 Paper ID : IJERTV3IS10704
 Volume & Issue : Volume 03, Issue 01 (January 2014)
 Published (First Online): 24012014
 ISSN (Online) : 22780181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
Analysis of Intrusion Detection System Based on Kmeans Algorithm and Particle Swarm Optimization
Kalaimathi B1, Annapoorani. G2 and Gayathri N3
1Assistant Professor, Department of CSE , MVJ College of Engineering Channasandra, Near ITPL, Bangalore67, India
2Assistant Professor, Department of CSE, MVJ College of Engineering Channasandra, Near ITPL, Bangalore67, India
3Assistant Professor& HOD,Department of ISE, MVJ College of Engineering Channasandra, Near ITPL, Bangalore67, India
Abstract:
In this modern world internet security has been one of the most threatening problems in the world. Intrusion detection is the important process of monitoring the events in the network system to find out the hackers. Clustering is the most important unsupervised learning process used to find the structures or patterns in a collection of unlabeled data. A intrusion detection system model based on PSO optimization and Kmeans was analysed in this paper.
Keywords: Intrusion Detection System, Cluster, Particle swarm optimization, K means algorithm.

Introduction:
Clustering analysis is an important part of the Data Mining research; it is an important method of the unsupervised learning. It divides the data into certain polymerization classes according to the attribute of the data. We need effective intrusion detection systems to protect our computers from
unauthorized or malicious actions. Data mining can improve variants detection rate, control false alarm rate, and reduce false dismissals .Data mining based on intrusion detection systems can be roughly categorized into major two Groups misuse detection and anomaly detection. Network intrusion detection is the process of monitoring the events occurring in a computing system or network and analysing them for signs of intrusions, defined as attempts to compromise the confidentiality.
The intrusion attacks can be divided into four categories: Probe (e.g. IP sweep, vulnerability scanning), denial of service (DoS) (e.g. mail bomb, UDP storm), usertoroot (U2R) (e.g. buffer overflow attacks, root kits) and remotetolocal (R2L) (e.g. password guessing, worm attack).Clustering is the method of grouping objects into meaningful subclasses so that the members from the same cluster are quite similar, and the members from different clusters are quite different from each other[1]. Therefore clustering methods can be useful for classifying log data and detecting intrusion.
Clustering algorithms can be categorized into
four main groups: partitioning algorithm, hierarchical algorithm, densitybased algorithm and
1
gridbased algorithm. Partitioning algorithms
then terminate the algorithm, else make Ci=C 1 and
construct a partition of a database of N objects into a set of K clusters. Usually they start with an initial partition and then use an iterative control strategy to optimize an objective function. PSO algorithm converge rapidly .It is easy to adjust parameter and can be applied to the condition when there is large number of samples and the dimensions of samples are large. Kmeans clustering algorithm is an effective method has been proved for apply to the intrusion detection system but it is part of the optimal solution.

Kmeans Algorithm:
Clustering is the method of grouping objects into meaningful subclasses so that the members from the same cluster are quite similar, and the members from different clusters are quite different from each other groups: partitioning algorithm, hierarchical algorithm, densitybased algorithm and gridbased algorithm[7]. Partitioning algorithms construct a partition of a data base of N objects into a set of K clusters. Usually they start with an initial partition and then use an iterative control strategy to optimize an objective function.
Idea of Algorithm:
Given the Ddimensional data set X= {xi/xi Rd i=1,2,N}
Clusters are w1, w2, w3 .wk. To define K centroids (c1, c2, c3.ck), one for each cluster,
Ci = 1/ni x ni is the number of datasets in the cluster.
The basic Kmeans algorithm[1] consists of the following steps:

Assigns the clustering number K.

Random select K points C1C2 CK as the initial clustering centers from the given point set
{x1x2 xN}

Select C1C2CK as the clustering centers and divide the set {x1x2 xN} according to the following regulation:
If d (xi, cp) <d (xi, cq) p, q=1, 2k and p! =q,
then xi Kp ( Kp is a class and its center is Cp)

Recalculate the new clustering centers
C 1 ,C2 1 C 1 according to the equation
return to point 3
Define KCentroids
Take each point belonging to the nearest centroid
No
Repeat no point is pending
Yes
Recalculate Centroids
Until no more changes are done
Figure1:Kmeans algorithm flow chart


Intrusion detection system in kmeans algorithm:
Due to data preprocessing different features of raw data are on different scales[3]. This causes bias toward some larger features over other smaller features. This leads to intrusion in clusters. To solve the problem a measurement is performed as follows

First calculate the mean absolute eviation hf:
hf =1/n(x1fmf+x2fmf+.+xnfmf) where x1f,x2f..xnf are n measuring values of variants ,tf is the mean value of the variant f,
that is
tf=(x1f+x2f+..+xnf)/n;

Calculate the standardized measurement:
Yif=xiftf/hf

Then convert every instance in the training sets to a new one based on previous algorithm.
This is the method that transforms the standardized space based on statistical information retrieved from the training sets and also to detect intrusion.
1 K
C 1=1/ki x
1 j
i =1, 2 K


Particle swam optimization algorithm:
ki is the point number in ki

If c 1=c
i=1, 2, k (or the algorithm has
PSO is an effectively global optimization
i i
achieved the hypothesis biggest iterative times)
algorithm, it guides optimization search by the Swarm Intelligence, which comes from cooperation
and competition between particles, Compares with the evolutionary algorithm, PSO retains the global search strategy based on population, its operation is simple, and solution of each generation population has Dual advantages of Selflearning and learning from others. So it can find the optimal solution by lesser iterative times.
Particle Swarm Optimization (PSO) is a swarm intelligence method developed by Kennedy and Eberhart in 1995 .The behavior of PSO mimics the social interaction between individuals such as interactions between the birds in flocks trying to locate an optimal food source. The direction of the movement of each bird is controlled by its current location, the best food location it ever found, and the best food location any bird in the flock ever found. The basic process of the PSO[5] algorithm is given by:
Step 1: (Initialization) randomly generate initial particles.
Step 2: (Fitness) Measure the fitness of each particle in the population.
Step 3: (Update) Compute the velocity of each particle. V=m/h; M represent at which time the connection can open. It represent at which the connection can request.
Step 4: (Construction) For each particle, move to the next position.
Step 5: (Termination) Stop the algorithm if the termination criterion is satisfied; returnto Step 2 otherwise
Trainee Data set
Initialization Population
Update particle
Update Velocity
Condition next
Output
Figure 2:PSO Algorithm flowchart
In PSO, a number of simple entities called particles are placed in the search space of the target problem, and each particle evaluates its position based on a given objective function calculating the fitness value. Each particle then determines its movement (velocity) through the problem search space by taking the history of its own current position and best position achieved by the whole swarm. Furthermore, the movement of a particle is affected by its inertia, and other constants. However, the whole swarm after several iterations is likely to move close to the global best solution.PSO[5] updates the particles positions inside the problem search space using the following equations:
Xi (t+ 1) =Xi (t) +Vi (t+ 1)
Where Xi is the position of particle i,t is the iteration number, and Vi is the velocity of particle I given by the following equation:
Vi (t+ 1) =WVi (t) +r1c1 [XPiXi (t)] +r2c2[XGXi(t)] Where W is inertia weight,r1andr2 are randomly generated numbers uniformly distributed between 0 and 1,c1,c2 are constant coefficients, XPi is the current best position of particle i and XG is the current global best position of the whole swarm.


Particle Swarm Optimization for Intrusion Detection systems:
A module of the system namely Red Teams emulates the behavior of hackers[5]. The Red Teams component employs PSO techniques in their intrusion methodology. The acquired results can dynamically help the IDS reconfigure onthefly in order to be more effective. Since most of the PSO based IDS are hybrid anomaly detection systems, it is possible to categorize them according to the additional ML method that is employed. We distinguish
(a)Hybrid PSONeural Network Systems, (b)Hybrid PSOSVMSystems,
(c)Hybrid PSOKmeans Systems

PSO & neural network hybrid approaches:
Artificial Neural Networks (ANN)[2] is one of the most popular soft computing techniques for data classification. PSO is a technique which is used extensively in combination with various types of ANN for improving the performance of the system. During the training phase the PSO is executed recursively to train the network. In PSOANN approach, So corresponds to synaptic weigths are fed to ANN during the test phase. To detect intrusion in system PSOANN approach is divided into two processes:

ANN, a system component does the classification process

PSO is used to improve the critical parameters and train the synaptic weights.

Advantages:
A hybrid PSOANN system but also introduce an evolutionary mutation algorithm as an extra step in order to

Protect PSO from trapping into local minima

Increase the diversity of the population

Expand the scope of the search

Hybrid PSOSVM Systems:
Similarly to ANN, another technique frequently used in combination with PSO[5] is Support Vector Machines (SVM). SVM is based on structural risk minimization of statistical learning theory and shows good learning ability and generalization skill in high dimensional or noisy datasets, two attributes highly appreciated in intrusion detection They used two different flavors of PSO the Standard Particle Swarm Optimization (SPSO) and Binary Particle Swarm Optimization (BPSO) for seeking optimal SVM parameters and extracting a feature subset respectively.

In BPSO each particle features and
parameters values should be taken as first then that results and training datasets are fed to the SVM classifiers.

In SPSO, the classification process and optimum features happen simultaneously .so the dataset features and the crucial SVM parameters are represented by each particle position.

Advantage: Hybrid PSOSVM System is more accurate to detect intrusion.
c) hybrid PSOKmeans Systems
In PSOKmeans systems[6], each particles position is the set of D dimensional centroids produced by the KMeans algorithm. Thus, each particles position can be represented as an array:
Z11 Z12 . Z1D Z21 Z22 . Z2D
. . . .
Zk1 Zk2 . ZkD
where D is the number of the dimensions of the dataset (therefore the centroids dimension) and k represents the
number of clusters. Initially, data points are assigned to k clusters in a random manner. Then the centroids are calculated and the position of each particle is
deduced. For each particle, the fitness function evaluates the position and if necessary the Pbest and Gbest values are updated along with that of velocity and position.
Finally, the KMeans algorithm runs in order to optimize the new generation of particles. The algorithm converges to local optimum with very low probability and has high convergence speed.


Comparison of kmeans and Particle Swarm Optimization:
Advantages of Kmean clustering
Kmean clustering is simple and flexible.
Kmean clustering algorithm is easy to understand and implements.
Disadvantages of Kmean clustering

In Kmean clustering user need to specify the number of cluster in advanced
Kmean clustering algorithm performance depends on a initial centroids that why the algorithm doesnt have guarantee for optimal solution
Advantages of PSO

PSO based on the intelligence and it is applied on both scientific research and engineering.

PSO has no mutation and overlapping calculation. The search can be take place by the speed of the particle. Most optimist particle can able to transmit the information onto the other particles during the development of several generations, and the speed of researching is faster.[8]

PSO accepts the real number code, and that is decided directly by the solution. Calculation in PSO is simpler and efficient in global search
Disadvantages of PSO

It is slow convergence in refined search stage and weak local search ability.

The method cannot work on the problems of non coordinate systems like the solution of energy field and the moving rules for the particles in the energy field


Conclusion:
Survey of Kmean clustering algorithm is most widely used clustering local search method for detecting intrusion in systems. Kmean which is depending on initial condition, which causes the algorithm, may converge to suboptimal solution. K means method is an effective algorithm for partioning large data set. One basic point to be taken into account is that the use of PSO has significantly boosted the performance of all the machine learning techniques in which it was applied. So Particle swarm optimization is more likely to find near optimal solution. Compare to Kmeans as we have discussed in this paper, the PSO has more approaches to detect intrusion in the systems.

References:

International Journal of Science and Modern Engineering (IJISME) ISSN: 23196386, Volume1, Issue 3, February 2013 A Survey on Kmean Clustering and Particle Swarm Optimization Pritesh Vora, Bhavesh Oza

Swarm intelligence in intrusion detection: A survey C. Kolias a,b,*, G. Kambourakis a,b, M. Maragoudakis a,b a Laboratory of Information and Communication Systems Security, University of the Aegean, Samos GR83200, Greece b Department of Information and Communication Systems Engineering,

University of the Aegean, Samos GR83200, Greece,2011 [3]The Application on intrusion detection based
on kmeans algorithm.Meng Jianling Shang Haikum Bian Ling Department of computer science and technology college,North China Electic Power University,Hebei Boading 2009International Conference on information technology and applications.
[4]2008 International Confernce on Intelligent Computation Technology and Automation The Clustering Algorithm Based on Particle Swarm Optimization Algorithm Pei Zhenkui1,2 , Hua Xia1 , and Han Jinfeng1 1College of Computer and Communication ngineering, China University of Petroleum,Dongying 257061, China 2School of Computer and Information Technology,Beijing Jiaotong University, Beijing 100044

The Clustering Algorithm based on particle swarm optimization algorithm Pei Zhenkui,Hua Xia,and Han Jinfeng ,2008 International Conference on Intelligent Computation Technology and Automation.

2008 International Conference on Intelligent Computation Technology and Automation The Clustering Algorithm Based on Particle Swarm Optimization Algorithm Pei Zhenkui1,2 , Hua Xia1 , and Han Jinfeng1 1College of Computer and Communication ngineering, China University of Petroleum,Dongying 257061, China 2School of Computer and Information Technology,
Beijing Jiaotong University, Beijing 100044
[6]S. Gao, J.Y. Yang, New Clustering Method Based on Particle Swarm Algorithm, Journal of Nanjing University of Aeronautics &Astronautics, 2006, pp.6265. 
Bradley,Fayyad.Refining Initial Point for K means Clustering.Proceedings of the Fifteenth International Conference on Machine Learning,1998