COMPARISON OF ANOMALY DETECTION TECHNIQUES FOR WIRELESS SENSOR NETWORK

DOI : 10.17577/IJERTCONV2IS01008

Download Full-Text PDF Cite this Publication

Text Only Version

COMPARISON OF ANOMALY DETECTION TECHNIQUES FOR WIRELESS SENSOR NETWORK

Sreepradhana.T and Soundarya.S

ABSTRACT:A wireless sensor network with a large number of sensor nodes can be used as an effective tool for gathering data in various situations. As wireless sensor networks continue to grow, so does the need for effective security mechanisms. Because sensor networks may interact with sensitive data and operate in hostile unattended environments, it is imperative that these security concerns be addressed from the beginning of the system design. The main issue in the sensor network is the data reliability. There are some factors that affect the data reliability in the sensor networks. It includes the noise and missing values, duplicated or inconsistent data. During transmission using sensor networks when the battery power is exhausted, then the probability of getting the erroneous data will grow rapidly. It leads to faulty data and it affects the data reliability of the system.An efficient technique to detect anomalous data is proposed. The outlier detection at the aggregator level is done using density based technique by using Local Outlier Factor (LOF).Here enhancement over LOF is introduced. Enhanced LOF is simpler and are used to find the sparse clusters.

Keywords-outlier detection,Local Outlier Factor

  1. INTRODUCTION

    algorithm can effectively identify local outliers which deviate from their clusters.Each data reading is assigned an outlier factor, which is the outlying degree of the reading. The sensor data whose local outlier factor does not lie nearer to 1 are termed as outliers. The only parameter required by LOF is Minpts, which corresponds to the minimum number of points that are used to define a neighborhood of a given data reading.MinPts reveals the minimum number of points used to define the neighborhood of a data point. Minpts are used to compute the density of each point, so if MinPts is set too high, some outliers near dense clusters may be misidentified as clustering points. If the MinPts is set too low, the groups of outlying objects will be wrongly identified as clusters.

    The enhanced version of LOF involves the use of two Minpts thereby defining two different neighborhoods: (1) neighbors in computing the density and (2) neighbors in comparing the densities. In LOF, these two neighborhoods are identical due to the fact that only one Minpts is used. Hence the sparse anomalous clusters can be identified as outliers rather detected as normal data set as in LOF.

  2. NETWORK STRUCTURE

    Sensor networks consist of a huge number of small sensor nodes, which communicate wirelessly. These sensor nodescan be spread out in hard accessible areas by what newapplications fields can be pointed out. A sensor nodecombines the abilities to compute, communicate and sense. In all the applications data is prone to attacks. The reliability of the data cannot be trusted. There may be outliers. Outliers are those patterns that deviate from the normal pattern of the sensed data. These outliers decrease the performance and quality of the system. So, the outlier detection is necessary for the wireless sensor network. At each cluster data is aggregated and outlier detection is performed to reduce energy consumption. Data aggregation is the process of one or several sensors that collects the detection result from other sensor. The collected data must be processed by sensor to reduce transmission burden before they are transmitted to the base station or sink. After the data is aggregated density based

    WSN

    Fig1.: Cluster based mechanism in

    outlierdetection is performed. The simplest version

    of density-based outlier detection is named as Local Outlier Factor (LOF). LocalOutlier Factor (LOF) measuresthedegree of outlierness of an object withregard to itssurrounding neighborhood. LOF basically scores outliers on the basis of the density of their neighborhood. LOF-outlier mining

    The nodes are clustered based on LEACH protocol.

    Low-Energy Adaptive ClusteringHierarchy (LEACH)protocol for sensor network helps to minimizes energy dissipation in sensor networks. It is very famous hierarchical routing algorithms for sensor networks which make clusters of the sensor nodes based on the received signal strength.

    LEACH forms clusters by using a distributed algorithm, where nodes make autonomous decisions without any centralized control. The advantages of this approach are that no long-distance communication with the base station is required and distributed cluster formation can be done without knowing the exact location of any of the nodes in the network. In addition, no global communication is needed to set up the clusters, and nothing is assumed about the current state of any other node during cluster formation. The goal is to achieve the global result of forming good clusters out of the nodes, purely via local decisions made autonomously by each node.

    In LEACH, data fusion and aggregation are local to the cluster. Cluster heads change randomly over time to balance the energy dissipation of nodes. The node chooses a random number between 0 and 1. The node becomes a cluster head for the current round if the number is less than the following threshold:

    K-distance neighborhood

    Reachability distance Enhanced LOF

    Fig2.: Enhanced LOF methodology

    B. K-Distance

    k-distance is defined as the furthest distance among the k-nearest neighbors of a data point p, where k is the minimum number of points that are used to define a neighborhood of a given data point.

    Consider the point p as shown in the Fig 3. Let k be

    2. The 2-nearest neighbours of p are the data points O1 and O. Among O and O1, the furthest distance from p is O. As per definition, k-distance is distance between p and O.

    The first phase is set-up phase in leach protocol where the main motive is cluster formation. For electing cluster heads residual energy and threshold are taken into account. After the clusters are formed and cluster heads are elected we move on to the next steady state phase. In the steady state phase the data transmission takes place in the network.

  3. PROPOSED NETWORK

    O1

    O1O1

    p

    K-dist(p)

    o

    k=2

    In the proposed system, consensus-based outlier detection approach has been applied by enhancing the previous versions such that it addresses the problems of existing system. The main feature of the system is to eliminate the malicious data at the cluster head level. The defined system provides mechanisms to identify the sparse anomalous clusters and also the problem of aggregator compromise is addressed. The enhanced version of LOF involves the use of two Minpts thereby defining two different neighborhoods: (1) neighbors in computing the density and (2) neighbors in comparing the densities. In LOF, these two neighborhoods are identical due to the fact that only one Minpt is used. Hence the sparse anomalous clusters can be identified as outliers rather detected as normal data set as in LOF.

    1. Enhanced LOF

      The enhanced LOF can be calculated be calculated as follows

      Fig3.: k-dist(p) for k=2

      The k-distance of object p, is defined as the distance d(p,o) between p and an object o D such that:

      1. for at least k objects oD \ {p} it holds that d(p,o) d(p,o), and

      2. for at most k-1 objects oD \ {p} it holds that d(p,o) < d(p,o).

      k=2

      o1C. K-Distance Neighborhood:

      Reach-dist(o1,p)=k-dist(p) p

      Reach-dist(o2,p)

      o

      K-distance

      o2

      The k-distnce neighborhood is defined as the set of k neighbors which lie within k-distance of a point p.

      Consider the point p as shown in the Fig 4. Let k be

      2. The 2-nearest neighbors of p are the data points O1 and O. As per definition, k-distance neighborhood of p is O and O1.

      O1 k=2

      p

      O

      o

      statistical fluctuations of d(p,o) for all the p's close to o can be significantly reduced. The strength of this smoothing effect can be controlled by the parameter k. The higher the value of k, the more similar are the reachability distances for objects within the same neighborhood.

      E.Enhanced LOF

      Enhanced LOF of an object p is defined as the average of the ratio of the local reachability density of p and those k1-nearest neighbours of p.

      Fig 4.: k-distance neighborhood of p

      The k-distance neighborhood of p contains every object, whose distance from p is not greater than the k-distance,

      Nk-distance(p) = { q D\{p} | d(p, q)

      kdistance(p)}

      These objects q are called the k-nearest neighbors of p.

      In order to detect density-based outliers, the density of the neighborhood of each object is determined which is defined by a parameter Minpts(positive integer) that specifies the minimum number of points that resides in ps neighborhood.

      D. Reachability Distance

      Reachability distance of p from o is the maximum of the radius of the neighborhood of o if p is in the neighborhood of o or the real distance from p to o.

      Consider the point p, O1 and O2 as shown in the Fig

      5. The neighbourhood circle of p is denoted using dotted circle and the radius denoted the k-distance. The maximum of radius of neighbourhood circle of p and the distance of p from O1 is the radius of neighbourhood circle of p. As per definition, reachability distance is the distance between p and O.

      Fig5.: reach-dist(o1,p) and reach-dist(o2,p), for k=2

      The reachability distance of object p with respect to object o is defined as

      reach-dist (p, o) = max { k-distance(o), d(p, o) } where o NMinPts(p)

      Figure 5 illustrates the idea of reachability distance with k =2. Intuitively, if object p is far away from o (e.g. o2 in the figure), then the reachability distance between the two is simply their actual distance. However, if they are sufficiently close (e.g., o1 in the figure), the actual distance is replaced by the k- distance of p. The reason is that in so doing, the

      oNMinPts1 dist(p)

      Enhanced LOF(p)=

      N MinPts1 – dist(p)(p)

  4. PERFORMANCE ANALYSIS

    A.Selection Of Input Parameters

    Fig 6.: Selection of Minpts2

    Considered various samples of sensor reading of sensor nodes with sample having 30% of anomalous value. Minpts1 is set to 30-40% of sample size, we have to determine a suitable value for Minpts2 such that detection rate is high. To identify the sparse anomalous clusters, setting of Minpts2 value is crucial as it is used to determine the density of each sensed data. Thus a plot is done by varying the Minpts2 value proportional to the Minpts1 value for various sample sensor data against detection rate. From the graph it is found that when Minpts2 value is more than 50% of Minpts1 value the detection rate is high for all samples. When it is less than 50% of Minpts1 value the detection rate is low.

    B.Performance For Dense Clustered Data

    The following graphs are plotted for various versions of LOF scheme named as follows, LOF the basic versions and LOF3 denote the enhanced version where LOF3 is based on density based outlier detection.

    C.Detection Rate

    The number of correctly detected anomalies from the total number of anomalies is said to be the detection rate of a system. The graph below depicts the detection rate of various LOF versions. It is found that the enhanced version of LOF has higher DR until 40% outliers whereas basic versions deviate after 30% outliers.

    Fig 7.: Comparison of Detection rate

    D.False Alarm Rate

    The graph represents the variation of false positive rate across various outlier percentages for different LOF schemes. It is found that the enhanced version of LOF with clustering produces very low FPR until 50% outliers whereas basic versions deviate after 30% to 40% outliers.

    Fig 8.: Comparison of False alarm rate E.False Positive Rate

    The following graph depicts the false positive rate of various versions of LOF. It is found that the FPR values stay low until 40% for LOF3 and until 30% for other version and finds a steep increase henceforth.

    Fig 9.: Comparison of False positive rate

  5. PERFORMANCE FOR SPARSE CLUSTERED DATA

    A.Detection Rate

    The main feature of enhanced LOF version is their ability to detect sparse anomalous cluster. It is rightly justified from the graph that the enhanced version of LOF has higher DR until 40% outliers whereas basic versions provide mixed results after 30% anomalies.

    Fig 10.: Comparison of Detection rate

    B.False Alarm Rate

    The graph represents the variation of false positive rate across various outlier percentages. It is found that the enhanced version of LOF with clustering produces very low FAR in comparison to other versions where FAR is very high due to the presence of sparse clusters.

    Fig 11.: Comparison of False alarm rate

    C.False Positive Rate

    The variation of false positive rate against various outlier percentages is monitored for the sensor measurements under consideration. The basic versions provide very high FPR value after 30% whereas the enhanced versions provide a stable result across various outlier ranges.

    Fig 12.: Comparison of False positive rate

    D.Data Accuracy Rate

    The data accuracy rate, after applying outlier detection mechanism, is nearer to normal data until detection is high after which it coincides with the anomalous data.

  6. CONCLUSION

    In this project, an efficient outlier detection mechanism is implemented such that anomalies are detected and eliminated from the entire system by monitoring all levels. The performance of the protocol in detecting outliers is analyzed using the detection rate, false alarm rate and false positive rate using the data collected from sensors. The observed results show that the protocol can be used to detect dense as well as sparse clusters at the cluster head level. There is also a considerable reduction in communication overhead and energy consumption due to the application of data aggregation mechanism using LEACH protocol.

  7. REFERENCES

[1]. Anny Lai-mei Chiu, Ada Wai-chee Fu, "Enhancements on Local Outlier Detection," ideas, pp.298, Seventh International Database Engineering and Applications Symposium (IDEAS'03), 2003

[2]. Er. Barjinder Singh Kaler ,Er. Manpreet Kaur Kaler, Challenges in Wireless Sensor Networks , RIMT-MAEC,

Mandigobindgarh

[3]. KiranMaraiya, Kamal Kant, Nitin Gupta, Wireless Sensor Network: A Review on DataAggregation, ISSN 2229-5518,IJSER

© 2011,International Journal of Scientific & Engineering Research Volume 2, Issue 4,

April -2011

[4]. MarkusM. Breunig, Hans-Peter Kriegel, Raymond T. Ng, Jörg Sander.: LOF: Identifying Density-Based Local Outliers Proc. ACMSIGMOD 2000 Int. Conf. On Management of Data, Dalles, TX, 2000.

[5]. K.Padmanabhan , P.Kamalakkannan A Study On Energy EfficientRouting Protocols In Wireless Sensor Networks, European Journal of ScientificReasearch, ISSN 1450- 216X vol.60 No.4(2011) © Euro Journals Publishing, Inc,2011.

[6]. Rajesh Patel, Sunil Pariyani, Vijay Ukani,Energy and Throughput Analysis of Hierarchical Routing Protocol (LEACH) for Wireless Sensor Network, International Journal of Computer Applications (0975 8887), Volume 20 No.4, April 2011.

[7]. SutharshanRajasegarar, Christopher Leckie, andMarimuthuPalaniswami, University Of Melbourne, Australia., Anomaly Detection In Wireless Sensor Networks, 1536- 1284/08/© 2008 IEEE, IEEE Wireless Communications ,August 2008.

Fig 13.: Data accuracy rate

Leave a Reply