 Open Access
 Authors : Sreepradhana.T, Soundarya.S
 Paper ID : IJERTCONV2IS01008
 Volume & Issue : IFET – 2014 (Volume 2 – Issue 01)
 Published (First Online): 30-07-2018
 ISSN (Online) : 2278-0181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
COMPARISON OF ANOMALY DETECTION TECHNIQUES FOR WIRELESS SENSOR NETWORK
Sreepradhana.T and Soundarya.S
ABSTRACT: A wireless sensor network with a large number of sensor nodes can be used as an effective tool for gathering data in various situations. As wireless sensor networks continue to grow, so does the need for effective security mechanisms. Because sensor networks may interact with sensitive data and operate in hostile, unattended environments, it is imperative that these security concerns be addressed from the beginning of the system design. The main issue in a sensor network is data reliability. The factors that affect data reliability include noise, missing values, and duplicated or inconsistent data. When the battery power of a node is exhausted during transmission, the probability of obtaining erroneous data grows rapidly. This leads to faulty data and reduces the data reliability of the system. An efficient technique to detect anomalous data is proposed. Outlier detection at the aggregator level is done with a density-based technique using the Local Outlier Factor (LOF). An enhancement over LOF is introduced here. The enhanced LOF is simpler and is used to find sparse clusters.
Keywords: outlier detection, Local Outlier Factor

INTRODUCTION
The LOF outlier mining algorithm can effectively identify local outliers that deviate from their clusters. Each data reading is assigned an outlier factor, which is the outlying degree of the reading. Sensor readings whose local outlier factor does not lie close to 1 are termed outliers. The only parameter required by LOF is MinPts, the minimum number of points used to define the neighborhood of a given data reading. MinPts is used to compute the density of each point, so if MinPts is set too high, some outliers near dense clusters may be misidentified as clustering points. If MinPts is set too low, groups of outlying objects will be wrongly identified as clusters.
The enhanced version of LOF uses two MinPts values, thereby defining two different neighborhoods: (1) the neighbors used in computing the density and (2) the neighbors used in comparing the densities. In LOF, these two neighborhoods are identical because only one MinPts is used. Hence sparse anomalous clusters can be identified as outliers rather than being detected as normal data, as happens in LOF.

NETWORK STRUCTURE
Sensor networks consist of a huge number of small sensor nodes, which communicate wirelessly. These sensor nodes can be spread out in areas that are hard to access, which opens up new application fields. A sensor node combines the abilities to compute, communicate and sense. In all these applications the data is prone to attacks, so its reliability cannot be taken for granted. There may be outliers. Outliers are those patterns that deviate from the normal pattern of the sensed data. These outliers decrease the performance and quality of the system, so outlier detection is necessary for a wireless sensor network. At each cluster, data is aggregated and outlier detection is performed to reduce energy consumption. Data aggregation is the process in which one or several sensors collect the detection results from other sensors. The collected data must be processed by the sensor to reduce the transmission burden before it is transmitted to the base station or sink. After the data is aggregated, density-based outlier detection is performed. The simplest version of density-based outlier detection is the Local Outlier Factor (LOF). LOF measures the degree of outlierness of an object with regard to its surrounding neighborhood; it scores outliers on the basis of the density of their neighborhood.
Fig 1.: Cluster based mechanism in WSN
The nodes are clustered based on the LEACH protocol.
The Low-Energy Adaptive Clustering Hierarchy (LEACH) protocol helps to minimize energy dissipation in sensor networks. It is a well-known hierarchical routing algorithm for sensor networks that forms clusters of sensor nodes based on the received signal strength.
LEACH forms clusters by using a distributed algorithm, where nodes make autonomous decisions without any centralized control. The advantages of this approach are that no long-distance communication with the base station is required and that distributed cluster formation can be done without knowing the exact location of any of the nodes in the network. In addition, no global communication is needed to set up the clusters, and nothing is assumed about the current state of any other node during cluster formation. The goal is to achieve the global result of forming good clusters out of the nodes, purely via local decisions made autonomously by each node.
In LEACH, data fusion and aggregation are local to the cluster. Cluster heads change randomly over time to balance the energy dissipation of nodes. The node chooses a random number between 0 and 1. The node becomes a cluster head for the current round if the number is less than the following threshold:
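A minimal sketch of this election step, assuming the threshold commonly given for LEACH, T(n) = P / (1 - P * (r mod 1/P)) for nodes that have not served as cluster head in the current cycle of 1/P rounds, and T(n) = 0 otherwise. The symbols P (desired fraction of cluster heads) and r (current round), and all function names, are assumptions for illustration. Any weighting by residual energy is not specified here and is left out.

```python
import random


def leach_threshold(p, r, eligible):
    """Commonly cited LEACH threshold T(n) for round r (assumed form).

    p        -- desired fraction of cluster heads (assumed parameter)
    r        -- current round number
    eligible -- True if the node has not been cluster head in the
                current cycle of round(1 / p) rounds
    """
    if not eligible:
        return 0.0
    cycle = round(1 / p)
    return p / (1 - p * (r % cycle))


def elect_cluster_heads(node_ids, p, r, recent_heads, rng=random):
    """Each node draws a number in [0, 1) and becomes head if it is below T(n)."""
    return [n for n in node_ids
            if rng.random() < leach_threshold(p, r, n not in recent_heads)]


rng = random.Random(0)  # seeded only to make the illustration repeatable
heads = elect_cluster_heads(range(20), p=0.1, r=0, recent_heads=set(), rng=rng)
print(heads)
```

The threshold rises as the cycle progresses, so every node that has not yet served is elected by the last round of the cycle.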
Fig 2.: Enhanced LOF methodology (k-distance neighborhood, reachability distance, enhanced LOF)
B. K-Distance
The k-distance is defined as the furthest distance among the k-nearest neighbors of a data point p, where k is the minimum number of points used to define the neighborhood of a given data point.
Consider the point p shown in Fig 3. Let k be 2. The 2-nearest neighbours of p are the data points O1 and O. Of the two, O is the farther from p. As per the definition, the k-distance is the distance between p and O.
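The k = 2 example can be sketched in a few lines. The coordinates below are hypothetical and chosen only so that O1 is nearer to p than O; Euclidean distance and the function name are assumptions as well.

```python
import math


def k_distance(p, data, k):
    """Distance from p to its k-th nearest neighbour in data (p itself excluded)."""
    dists = sorted(math.dist(p, q) for q in data if q != p)
    return dists[k - 1]


# Hypothetical readings: O1 is the nearest neighbour of p, O the second nearest.
p, o1, o = (0.0, 0.0), (1.0, 0.0), (0.0, 2.0)
far = (5.0, 5.0)

# With k = 2 the result is the distance between p and O.
print(k_distance(p, [p, o1, o, far], k=2))  # 2.0
```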
The first phase of the LEACH protocol is the setup phase, whose main purpose is cluster formation. Residual energy and the threshold are taken into account when electing cluster heads. After the clusters are formed and the cluster heads are elected, the protocol moves on to the steady-state phase, in which data transmission takes place in the network.

PROPOSED NETWORK
In the proposed system, a consensus-based outlier detection approach is applied by enhancing the previous versions so that it addresses the problems of the existing system. The main feature of the system is to eliminate malicious data at the cluster head level. The system provides mechanisms to identify sparse anomalous clusters, and the problem of aggregator compromise is also addressed. The enhanced version of LOF uses two MinPts values, thereby defining two different neighborhoods: (1) the neighbors used in computing the density and (2) the neighbors used in comparing the densities. In LOF, these two neighborhoods are identical because only one MinPts is used. Hence sparse anomalous clusters can be identified as outliers rather than being detected as normal data, as happens in LOF.

Enhanced LOF
The enhanced LOF can be calculated as follows.
Fig 3.: k-dist(p) for k=2
The k-distance of object p is defined as the distance $d(p,o)$ between p and an object $o \in D$ such that:

for at least $k$ objects $o' \in D \setminus \{p\}$ it holds that $d(p,o') \le d(p,o)$, and

for at most $k-1$ objects $o' \in D \setminus \{p\}$ it holds that $d(p,o') < d(p,o)$.
C. K-Distance Neighborhood
The k-distance neighborhood is defined as the set of neighbors that lie within the k-distance of a point p.
Consider the point p shown in Fig 4. Let k be 2. The 2-nearest neighbors of p are the data points O1 and O. As per the definition, the k-distance neighborhood of p consists of O and O1.
With the reachability distance, the statistical fluctuations of d(p,o) for all the p's close to o can be significantly reduced. The strength of this smoothing effect can be controlled by the parameter k. The higher the value of k, the more similar the reachability distances are for objects within the same neighborhood.
E. Enhanced LOF
The enhanced LOF of an object p is defined as the average of the ratios of the local reachability densities of the k1-nearest neighbours of p to the local reachability density of p.
Fig 4.: k-distance neighborhood of p
The k-distance neighborhood of p contains every object whose distance from p is not greater than the k-distance:
$N_{k\text{-distance}(p)}(p) = \{\, q \in D \setminus \{p\} \mid d(p, q) \le k\text{-distance}(p) \,\}$
These objects q are called the k-nearest neighbors of p.
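A short sketch of this set definition, assuming Euclidean distance and hypothetical coordinates. Because the condition is "not greater than", ties at the k-distance are all included, so the neighborhood can hold more than k objects.

```python
import math


def k_distance_neighborhood(p, data, k):
    """All objects other than p whose distance from p does not exceed k-distance(p)."""
    others = [q for q in data if q != p]
    k_dist = sorted(math.dist(p, q) for q in others)[k - 1]
    return [q for q in others if math.dist(p, q) <= k_dist]


# Hypothetical points: b and c are both at distance 2 from p.
p = (0.0, 0.0)
a, b, c, d = (1.0, 0.0), (0.0, 2.0), (2.0, 0.0), (5.0, 5.0)

# For k = 2 the k-distance is 2, and the tie brings in three neighbours.
print(k_distance_neighborhood(p, [p, a, b, c, d], k=2))
```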
In order to detect density-based outliers, the density of the neighborhood of each object is determined. The neighborhood is defined by a parameter MinPts (a positive integer) that specifies the minimum number of points that reside in p's neighborhood.
D. Reachability Distance
The reachability distance of p from o is the radius of the neighborhood of o (its k-distance) if p lies within that neighborhood; otherwise it is the real distance from p to o. In other words, it is the maximum of the two.
Consider the points p, O1 and O2 shown in Fig 5. The neighbourhood of p is drawn as a dotted circle whose radius is the k-distance of p. Because O1 lies inside this circle, the maximum of the radius and the distance between p and O1 is the radius itself. As per the definition, the reachability distance reach-dist(o1, p) is therefore the k-distance of p, that is, the distance between p and O.
Fig 5.: reach-dist(o1,p) and reach-dist(o2,p), for k=2
The reachability distance of object p with respect to object o is defined as
$\text{reach-dist}(p, o) = \max\{\, k\text{-distance}(o),\; d(p, o) \,\}$, where $o \in N_{MinPts}(p)$
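The definition translates directly into code. This is a sketch assuming Euclidean distance; the point names and coordinates are hypothetical.

```python
import math


def k_distance(o, data, k):
    """Distance from o to its k-th nearest neighbour in data (o itself excluded)."""
    dists = sorted(math.dist(o, q) for q in data if q != o)
    return dists[k - 1]


def reach_dist(p, o, data, k):
    """reach-dist(p, o) = max{ k-distance(o), d(p, o) }."""
    return max(k_distance(o, data, k), math.dist(p, o))


o = (0.0, 0.0)
near, kth, far = (1.0, 0.0), (0.0, 2.0), (6.0, 0.0)
data = [o, near, kth, far]

print(reach_dist(near, o, data, k=2))  # 2.0: near is inside the neighbourhood, so k-distance(o) is used
print(reach_dist(far, o, data, k=2))   # 6.0: far is outside it, so the actual distance is used
```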
Figure 5 illustrates the idea of reachability distance with k = 2. Intuitively, if two objects are far apart (e.g., o2 and p in the figure), the reachability distance between them is simply their actual distance. However, if they are sufficiently close (e.g., o1 and p in the figure), the actual distance is replaced by the k-distance of the reference object, so that reach-dist(o1, p) = k-dist(p). This replacement smooths out small differences in distance among objects that are close to the reference object.
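$$\text{Enhanced LOF}(p) = \frac{\displaystyle\sum_{o \in N_{MinPts_1\text{-dist}(p)}(p)} \frac{lrd_{MinPts_2}(o)}{lrd_{MinPts_2}(p)}}{\left| N_{MinPts_1\text{-dist}(p)}(p) \right|}$$
Here lrd denotes the local reachability density, computed with MinPts2, and $N_{MinPts_1\text{-dist}(p)}(p)$ is the MinPts1-distance neighborhood of p over which the density ratios are averaged.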
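A minimal sketch of the enhanced LOF computation. It assumes, following the description of the two neighborhoods, that MinPts2 sets the neighborhood used to compute each local reachability density and MinPts1 sets the neighborhood over which densities are compared. The local reachability density is taken as the inverse of the mean reachability distance, as in the original LOF formulation. The function names and the toy data are hypothetical.

```python
import math


def neighborhood(p, data, k):
    """Return (k-distance neighbourhood of p, k-distance of p), excluding p itself."""
    others = [q for q in data if q != p]
    k_dist = sorted(math.dist(p, q) for q in others)[k - 1]
    return [q for q in others if math.dist(p, q) <= k_dist], k_dist


def lrd(p, data, k):
    """Local reachability density: inverse of the mean reachability distance from p."""
    nbrs, _ = neighborhood(p, data, k)
    total = sum(max(neighborhood(o, data, k)[1], math.dist(p, o)) for o in nbrs)
    return len(nbrs) / total


def enhanced_lof(p, data, minpts1, minpts2):
    """Average ratio of neighbour density to own density.

    minpts1 -- neighbourhood size used when comparing densities
    minpts2 -- neighbourhood size used when computing each density
    """
    nbrs, _ = neighborhood(p, data, minpts1)
    lrd_p = lrd(p, data, minpts2)
    return sum(lrd(o, data, minpts2) / lrd_p for o in nbrs) / len(nbrs)


# Toy data: a 3 x 3 unit grid of readings plus one distant reading.
grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
outlier = (10.0, 10.0)
data = grid + [outlier]

print(enhanced_lof((1.0, 1.0), data, minpts1=3, minpts2=2))  # close to 1 for a point inside the cluster
print(enhanced_lof(outlier, data, minpts1=3, minpts2=2))     # well above 1 for the distant reading
```

Setting minpts1 equal to minpts2 reduces this sketch to the basic LOF, where a single neighborhood serves both purposes.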



PERFORMANCE ANALYSIS
A. Selection of Input Parameters
Fig 6.: Selection of MinPts2
Various samples of sensor readings were considered, each containing 30% anomalous values. With MinPts1 set to 30-40% of the sample size, a suitable value for MinPts2 must be determined such that the detection rate is high. To identify sparse anomalous clusters, the setting of MinPts2 is crucial, as it is used to determine the density of each sensed reading. The detection rate is therefore plotted against the MinPts2 value, varied in proportion to MinPts1, for the various samples of sensor data. The graph shows that when MinPts2 is more than 50% of MinPts1 the detection rate is high for all samples, and when it is less than 50% of MinPts1 the detection rate is low.
B. Performance for Dense Clustered Data
The following graphs are plotted for the different versions of the LOF scheme, named as follows: LOF denotes the basic versions and LOF3 denotes the enhanced version, which is based on density-based outlier detection.
C. Detection Rate
The detection rate of a system is the number of correctly detected anomalies divided by the total number of anomalies. The graph below depicts the detection rate of the various LOF versions. The enhanced version of LOF maintains a higher DR up to 40% outliers, whereas the basic versions deviate after 30% outliers.
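The detection rate as defined above can be computed as follows. This is a sketch in which readings are identified by hypothetical integer ids.

```python
def detection_rate(true_anomalies, flagged):
    """Fraction of the actual anomalies that the detector flagged."""
    true_anomalies = set(true_anomalies)
    if not true_anomalies:
        raise ValueError("detection rate is undefined when there are no anomalies")
    return len(true_anomalies & set(flagged)) / len(true_anomalies)


# Hypothetical ids: four readings are anomalous, and three of them are flagged.
print(detection_rate(true_anomalies=[2, 5, 7, 9], flagged=[2, 5, 9, 11]))  # 0.75
```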
Fig 7.: Comparison of Detection rate
D. False Alarm Rate
The graph represents the variation of the false alarm rate across various outlier percentages for the different LOF schemes. The enhanced version of LOF with clustering produces a very low FAR up to 50% outliers, whereas the basic versions deviate after 30% to 40% outliers.
Fig 8.: Comparison of False alarm rate
E. False Positive Rate
The following graph depicts the false positive rate of the various versions of LOF. The FPR values stay low up to 40% outliers for LOF3 and up to 30% for the other versions, and increase steeply thereafter.
Fig 9.: Comparison of False positive rate

PERFORMANCE FOR SPARSE CLUSTERED DATA
A. Detection Rate
The main feature of the enhanced LOF version is its ability to detect sparse anomalous clusters. The graph confirms that the enhanced version of LOF has a higher DR up to 40% outliers, whereas the basic versions give mixed results after 30% anomalies.
Fig 10.: Comparison of Detection rate
B. False Alarm Rate
The graph represents the variation of the false alarm rate across various outlier percentages. The enhanced version of LOF with clustering produces a very low FAR in comparison with the other versions, whose FAR is very high due to the presence of sparse clusters.
Fig 11.: Comparison of False alarm rate
C. False Positive Rate
The variation of the false positive rate against various outlier percentages is monitored for the sensor measurements under consideration. The basic versions give a very high FPR after 30% outliers, whereas the enhanced versions give a stable result across the outlier ranges.
Fig 12.: Comparison of False positive rate
D. Data Accuracy Rate
After the outlier detection mechanism is applied, the data accuracy rate stays close to that of the normal data for as long as the detection rate is high, after which it coincides with that of the anomalous data.

CONCLUSION
In this project, an efficient outlier detection mechanism is implemented such that anomalies are detected and eliminated from the entire system by monitoring all levels. The performance of the protocol in detecting outliers is analyzed using the detection rate, false alarm rate and false positive rate on data collected from sensors. The observed results show that the protocol can be used to detect dense as well as sparse clusters at the cluster head level. There is also a considerable reduction in communication overhead and energy consumption due to the data aggregation mechanism based on the LEACH protocol.

Fig 13.: Data accuracy rate