Predicting Critical Events in Dynamic Social Networks

A social network is usually conceived as a graph in which individuals are represented by nodes and the relations among individuals are represented by links. The term "community" refers to any group of nodes that are densely connected among themselves and sparsely connected to the rest of the network. As time evolves, communities in a social network may undergo various changes (split, expand, shrink, remain stable, merge) known as critical events. Predicting critical events is an important and difficult issue in the study of social networks. This paper proposes a sliding window analysis combined with an autoregressive model and survival analysis techniques. The autoregressive model simulates the evolution of the community structure, while the survival analysis techniques allow the prediction of future changes a community may undergo. In our approach, critical events are treated according to a weighting scheme.


INTRODUCTION
A social network is a social structure of people related to each other through a common relationship or interest. Usually, a social network is conceived as a graph in which individuals in the network are represented by nodes and the relations among the individuals are represented by links. Social network analysis studies the interactions between people and groups of people, as well as the associated resources, in order to understand their behavior. Tracking community structures over time and predicting their future changes has important applications in various domains such as criminology, public health, and education.
Online social networks are dynamic and evolve with time, most often because i) new members join the network, ii) existing members leave the network, and iii) members establish or break ties and/or change the intensity or weight of their interactions with other members. The term "community" refers to any group of nodes that are densely connected among themselves and sparsely connected to others. A community structure can be drastically affected by changes in nodes and variations in their links, such as nodes appearing and disappearing over time. Hence, from one time point t_i to another time point t_j, with t_j > t_i, a community can split into several other communities, expand into a larger community, shrink into a smaller community or remain stable; in the same way, several communities can merge into one community. We call these changes (split, expand, shrink, stable, merge) the critical events which communities may undergo over time.
To learn the evolution of communities over time and predict the critical events the communities may undergo, this paper proposes a sliding window analysis, an autoregressive model and survival analysis techniques. The autoregressive model simulates the evolution of the community structure, whereas the survival analysis techniques allow the prediction of future changes a community may undergo.
The sliding window concept is useful for tracking consistently evolving communities, but defining the size of the window is a challenge. This paper automatically identifies the optimal window size in order to address the aforementioned drawbacks of existing approaches.
Most approaches in the recent literature treat critical events with equal importance. However, in some applications, different communities may have their own life cycles, and critical events should be treated accordingly. In our paper, critical events are treated according to a weighting scheme in which the importance of events such as "appear" and "disappear" is not considered equal but depends on the dynamics of communities.
Our contributions can be summarized as follows: 1) An approach for automatically detecting the size of the window to adopt when identifying and tracking communities over time; the window size is estimated from the numbers of nodes appearing, disappearing and remaining in the dynamic network at two consecutive, independent timestamps. 2) Autoregressive modeling and survival analysis for obtaining the critical events: autoregressive modeling simulates the evolution of the community structure, and the survival analysis techniques allow the prediction of future changes a community may undergo. 3) A weighting scheme for treating critical events according to their importance.

RELATED WORK
There has been increasing interest in studying the evolution of community structures in dynamic social networks. The important issues are how to track communities and how to discover the critical events a community can undergo over time.
S. Y. Bhat [1] proposes a unified framework, HOCTracker, for tracking the evolution of hierarchical and overlapping communities in online social networks. It is a density-based approach for detecting overlapping community structures, and it automatically tracks evolutionary events like birth, growth, contraction, merge, split, and death of communities. HOCTracker adapts a preliminary community structure (identified through a novel density-based overlapping community detection approach) to the changes occurring in a network and processes only active nodes for the new time step. However, time complexity remains a challenge.

International Journal of Engineering Research & Technology (IJERT)
ISSN: 2278-0181 http://www.ijert.org
P. Lee [2] models social streams as dynamically evolving post networks and models events as clusters over these networks, obtained by means of a clustering approach that is robust to the large amount of noise present in social streams. Typical cluster evolution patterns include birth, death, growth, decay, merge and split. Event detection can be viewed as a subproblem of cluster evolution tracking in social streams. However, this approach does not predict future events. N. Du [3] defines a community as having high strength if its internal interactions connecting its members are relatively stronger than the external interactions connecting those members to the rest of the world. Community strength analysis discovers how the strength of each detected community changes over the entire observation period, within a framework that provides reliable and consistent community strength scores. However, the information provided by these studies is limited to adjacent snapshots, which cannot give a whole picture of the community evolution.
E. G. Tajeuna [4] presented a new framework to track community structures in time-evolving social networks and to detect changes that may occur in communities. The authors propose a new similarity measure, named mutual transition, for tracking the communities, together with rules for capturing the significant transition events a community can undergo. This framework is not capable of predicting the future transitions a community may undergo.
Xiujuan Xu [5] describes the problems of dynamic social networks using data mining theory and introduces a novel dynamic social network algorithm called iDBMM, based on an improvement of the dynamic behavioral mixed membership model (DBMM) algorithm. The iDBMM algorithm classifies the training set to obtain the basic characteristics of each role, then scores the test set relative to each role and assigns the role with the highest score to the corresponding node. Finally, the transition model is obtained by a statistical method. The experimental results are largely affected by the selected characteristics: if the characteristic differences between the roles are too large, the conversion between roles (except for a role's own conversion) tends to 0; if the differences are too small, errors appear in the allocation of roles.
3. PROBLEM DEFINITION
Learning the evolution of communities over time is a key step towards predicting the critical events the communities may undergo. This is an important and difficult issue in the study of social networks. In the work to date, there is a lack of formal approaches for modeling and predicting critical events over time, and critical events are treated with equal importance.

4. PROPOSED SYSTEM
This section presents the theories and techniques used for (1) tracking communities, (2) estimating feature values by vector autoregression (VAR) and (3) predicting critical events by survival analysis.

4.1) Notation
Tracking communities means aligning communities at different time points in such a way as to represent an evolution. For instance, a sequence S = {C_{t_1}, C_{t_2}, ..., C_{t_k}} is considered an evolution of a community if all consecutive communities C_{t_i}, C_{t_{i+1}} ∈ S are similar in terms of nodes. Note that in most cases the similarity is given in terms of shared nodes. An evolving community may undergo critical events during its evolution. Understanding how these occur requires an analysis of the past history of the topological features in relation to the critical events. Model-based approaches such as VAR can thus be used to generate feature values. VARs are multi-linear autoregressive models in which each vector observation is represented as a combination of previous observations. Considering X_{t_n} to be a random d-dimensional vector observed at time t_n, this vector can be expressed as a linear combination of the p lag vectors X_{t_n-1}, X_{t_n-2}, ..., X_{t_n-p}, as follows:

X_{t_n} = A_1 X_{t_n-1} + A_2 X_{t_n-2} + ... + A_p X_{t_n-p} + ε_{t_n}    (1)

where each A_k is a d-by-d matrix representing the coefficients (weights) associated with the lag vector X_{t_n-k} and ε_{t_n} is additive Gaussian noise with zero mean.
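As a concrete illustration of Eq. (1), the lag coefficient matrices of a VAR(p) model can be estimated by ordinary least squares. The sketch below is a minimal illustration, not the paper's implementation; the function names `fit_var` and `predict_next` are our own.

```python
import numpy as np

def fit_var(X, p):
    """Fit a VAR(p) model by ordinary least squares.

    X is a (T, d) array of d-dimensional feature vectors observed at T
    consecutive window-stamps. Returns [A_1, ..., A_p], the d-by-d lag
    coefficient matrices of Eq. (1).
    """
    T, d = X.shape
    # Each regressor row stacks the p previous observations.
    rows = [np.concatenate([X[t - k] for k in range(1, p + 1)])
            for t in range(p, T)]
    Z = np.array(rows)              # shape (T - p, p * d)
    Y = X[p:]                       # shape (T - p, d)
    # Least-squares estimate of the stacked coefficients.
    B, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return [B[k * d:(k + 1) * d].T for k in range(p)]

def predict_next(X, A):
    """One-step-ahead forecast from the last p observations (noise-free)."""
    return sum(A[k] @ X[-(k + 1)] for k in range(len(A)))
```

For example, a series generated by X_{t_n} = 0.5 X_{t_n-1} with no noise is recovered exactly by a VAR(1) fit, in the sense that the one-step forecast reproduces 0.5 times the last observation.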
Survival analysis is a statistical method for studying the occurrence and timing of events. Its aim is to estimate, via a probability (generally called the survivor function S(t)), the risk of an event occurring given the past history of a set of time-varying observations. Formally, given the risk or hazard λ(t) of an event occurring at a specific time t, the survivor function is given as the cumulative risk over time:

S(t) = exp(−∫_0^t λ(u) du)    (2)

Let us take a dynamic social network from which individual interrelationships are collected at regular timestamps for a duration going from t_1 to t_m. At any time t_i (i = 1, ..., m), we use the graph structure G_i = (V_i, E_i) to represent the snapshot of the social network, where V_i stands for the set of nodes and E_i for the set of edges. We then use the series G = {(V_i, E_i) | t_1 ≤ t_i ≤ t_m} to denote the dynamic social network over the whole period [t_1, t_m]. We use D = {s_1, s_2, ..., s_{m-1}} to denote the set of interval durations. For a fixed duration s ∈ D, we use the notation W_s^→ to mean a window W of size s that slides from left to right by a step of one timestamp; the graph instance corresponding to a window instance is obtained by aggregating the snapshots the window covers. For each such graph we define a partition {C_1, ..., C_k} representing the communities detected at that window-stamp; each detected community C_j has E_{C_j} and V_{C_j} as its sets of edges and vertices, respectively. We write S_C to denote the sequence of communities that reflects the evolution of a community C over successive window-stamps.
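A discrete version of the survivor function in Eq. (2) can be computed by accumulating per-window hazard rates, replacing the integral with a sum. This is an illustrative sketch under the assumption that the hazard is given per window-stamp; the function name is hypothetical.

```python
import math

def survivor(hazards):
    """Survivor probabilities S(t) = exp(-sum_{u<=t} lambda(u)),
    a discrete analogue of Eq. (2) in which the cumulative risk is
    the running sum of per-window hazard rates."""
    S, cum = [], 0.0
    for lam in hazards:
        cum += lam
        S.append(math.exp(-cum))
    return S
```

By construction S starts at exp(−λ_1) ≤ 1 and is non-increasing, matching the interpretation of survival as cumulative risk.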

4.2) Determining window size.
Given a sliding window W_s^→ moving through the dynamic network, we consider the variation in nodes between two consecutive window instances:

n_s = |V_t ∪ V_{t+s}| − Δ(V_t, V_{t+s})    (6)

where Δ(V_t, V_{t+s}) counts the nodes appearing or disappearing between the two window instances. Note that the larger the value of s, the more the network tends to be static, which may make it impossible to capture changes such as merge, split, shrink and expand that evolving communities may undergo. In the same way, the smaller the value of s, the more the network changes over time, which may result in several cases where communities evolve in a non-consecutive way. Note also that the number of nodes remaining in the graph over time increases with the size of the window. We first calculate the fluctuation fl_s of the graph given a size s of the sliding window; this fluctuation is bounded within [0, 1]. To study the various changes the evolving communities undergo, we select the minimum window size ŝ such that the standard deviation of the fluctuations stays lower than a small value ε:

ŝ = min{ s | σ(F_s) < ε }    (7)

where F_s is the set of fluctuations given a sliding window W_s^→ and σ(F_s) its standard deviation. Algorithm 1 estimates this minimal window size.

4.3) Modeling critical events.
Let E = {Split, Merge, Shrink, Expand, Stable} be the set of critical events an evolving community may undergo. We assume that these events are mutually independent, which means the probability that a community may pass through one event is not affected by another event. In other words, the probability that a community may undergo an event can be evaluated independently of the other events. Hence, for a critical event e, in observing a community evolving over time we will either see this event occurring or we will not. Two possible responses will then be recorded when observing evolving communities: either the event e occurs (codified as 1) or it does not (codified as 0). We generalize this process by modeling the instantaneous risk that a community may undergo an event e.

4.3.1) Hazard function.
Given an event e and a sliding window W_s^→, for each window-stamp W we calculate the number of times N_e(W) the event e has occurred, which corresponds to the number of evolving communities that passed through the critical event e during the transition from one window-stamp to the next. The number of events N_e(W) is governed at each window-stamp by a hazard function. The counting process can thus be decomposed as follows:

N_e(W) = Λ_e(W) + M_e(W)    (9)

where Λ_e(W) is a non-decreasing predictable process, called the cumulative intensity process, and M_e(W) is a mean-zero martingale. Considering Λ_e(W) to be continuous, there exists a predictable non-negative intensity process λ_e(W) whose accumulation over window-stamps gives Λ_e(W) (10). Given an evolving community C, at each window-stamp this community may or may not be observed. Hence the intensity of community C with regard to the event e at any window-stamp W is

λ_e(W | C) = Y(W | C) α_e(W | C)    (11)

where Y(W | C) takes the value 1 if C has an instance at window-stamp W and 0 otherwise, and α_e(W | C) is the intensity or hazard rate for the evolving community C undergoing event e at window-stamp W.
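The cumulative intensity process Λ_e(W) of Eq. (9) can be estimated in a Nelson-Aalen style from the observed event counts N_e(W) and the number of communities at risk (those with Y = 1). A minimal sketch, assuming counts and at-risk sizes are given per window-stamp; the function name is ours.

```python
def cumulative_intensity(event_counts, at_risk):
    """Nelson-Aalen-style estimate of the cumulative intensity: at each
    window-stamp, add (observed events) / (communities at risk, i.e.
    those with Y = 1). Window-stamps with no community at risk
    contribute nothing."""
    Lam, total = [], 0.0
    for n, y in zip(event_counts, at_risk):
        if y > 0:
            total += n / y
        Lam.append(total)
    return Lam
```

The resulting sequence is non-decreasing, as required of Λ_e(W).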

4.3.2) Probability of an event occurring
The intensity process defined in (11) gives the risk that a community will pass through an event at a given time point. From this, we can calculate, at any time point, the probability that a community will undergo an event by computing the cumulative probability P_e(W | C) (12).

4.3.3) Estimation of the Cox parameters
Suppose that we observe communities from the first window-stamp up to some window-stamp which is not the last. Clearly, over this period we do not have a total view of all communities, which implies that the full likelihood cannot be made explicit. Due to this constraint, we instead define the partial likelihood of the parameters β̄ = (β_1, ..., β_d), in which a binary indicator takes the value 1 when a community has an instance at a given window-stamp and 0 otherwise.
Having obtained the partial log-likelihood ℓ(·), we can approximate the parameters by solving the following recursive equation:

β̄^(it+1) = β̄^(it) − [ℓ''(β̄^(it))]^(−1) ℓ'(β̄^(it))

where (it) denotes the current iteration step, ℓ''(·) corresponds to the second derivative of the partial log-likelihood, and ℓ'(·) to its first derivative.
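The recursive update above is a Newton-Raphson iteration on the partial log-likelihood. The sketch below illustrates it for a scalar parameter, using central finite differences for ℓ' and ℓ''; the paper's model uses a vector β̄ and analytic derivatives, so this is only an illustration of the recursion itself.

```python
def newton_raphson(loglik, beta0, tol=1e-8, max_iter=100, h=1e-5):
    """Maximize a scalar log-likelihood by the recursion
    beta_{it+1} = beta_it - l'(beta_it) / l''(beta_it),
    approximating both derivatives with central finite differences."""
    beta = beta0
    for _ in range(max_iter):
        d1 = (loglik(beta + h) - loglik(beta - h)) / (2 * h)
        d2 = (loglik(beta + h) - 2 * loglik(beta) + loglik(beta - h)) / h ** 2
        step = d1 / d2
        beta -= step
        if abs(step) < tol:
            break
    return beta
```

For a concave quadratic such as ℓ(β) = −(β − 2)², the iteration converges to the maximizer β = 2 in essentially one step, since Newton-Raphson is exact on quadratics.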

4.4) Predicting an event
Once the parameters of our model have been estimated and the hazard function identified, we can calculate the probability that an evolving community will undergo a critical event at any time (observable or not) by applying (12). However, at each more distant unobservable time, all the parameters need to be re-estimated in order to calculate the new probability value. At any unobservable time, we thus predict that a given community will undergo an event e if its probability (12) of occurrence is the highest compared to the probabilities of occurrence of the other events. Hence, assuming that we have observed the evolution of communities over the set of observable window-stamps W = {W_1, ..., W_v}, at later unobservable times corresponding to the set of unobservable window-stamps Ŵ = {W_{v+1}, W_{v+2}, ...}, we can run Algorithm 2 to predict critical events at more distant times.
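The prediction step can be sketched as follows: compute, for each critical event, its probability of occurrence as 1 − exp(−cumulative hazard) and pick the event with the highest probability. This is a simplified stand-in for Algorithm 2 and Eq. (12), and the function names are ours.

```python
import math

def event_probability(hazards):
    """Probability that the event occurs by the last window-stamp:
    P = 1 - exp(-cumulative hazard)."""
    return 1.0 - math.exp(-sum(hazards))

def predict_event(hazards_per_event):
    """Predict the critical event with the highest probability of
    occurrence (sketch of Algorithm 2); input maps each event name
    to its list of per-window hazard rates."""
    probs = {e: event_probability(h) for e, h in hazards_per_event.items()}
    return max(probs, key=probs.get)
```

With mutually independent events, as assumed in Section 4.3, each probability can be computed separately and only the maximum matters for the predicted label.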

4.5) Assigning priority to critical events.
This is a method of assigning weights which applies the hierarchy structure of the analytic hierarchy process (AHP) and pairwise comparison. It has the advantages that the number of comparisons can be reduced and that consistency is automatically maintained, by first determining priorities over multiple entities and subsequently comparing entities with adjoining priorities.
To determine the priorities of multiple attributes, we use the following steps. The first step is to create a hierarchy using attributes and entities. The second is to set the priorities of entities within each group. After giving priorities to entities in the same group, a priority is determined between entities holding the same priority in different groups. Thus A, D, and G (the entities with the highest priority in groups I, II, and III, respectively) are compared and priorities are assigned between them. The same practice is repeated for the entities with the second and third priorities from each group. Finally, a priority is set between entities with adjoining priorities. Once the priority of all entities is determined, the weight for each attribute is assigned. If weights are assigned while this priority is kept unchanged, consistency is consequently maintained. Thus, comparisons are made between entities of adjoining priorities while the priority is maintained. In this pairwise comparison, the entity with a higher priority is given a higher score and the entity with a lower priority is in turn given a lower relative score.
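The weight-assignment idea can be sketched with a simple rank-based scoring, assuming the priority ordering of events has already been determined by the comparisons above; the actual pairwise-comparison scores of the method may differ, so this is only an illustration.

```python
def weights_from_priorities(events):
    """Given events ordered from highest to lowest priority, assign
    rank-based scores (highest priority gets the largest score) and
    normalize them into weights summing to 1. Illustrative scheme:
    the paper's pairwise-comparison scores may differ."""
    n = len(events)
    scores = {e: n - i for i, e in enumerate(events)}
    total = sum(scores.values())
    return {e: s / total for e, s in scores.items()}
```

Because the scores strictly decrease with rank, the resulting weights preserve the priority ordering, which is the consistency property the method relies on.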

5. EXPERIMENT SETUP AND RESULTS
To evaluate our proposed approach, we first determined the appropriate window size to use. After obtaining the appropriate window size for each of our networks, we detected the communities and tracked them over time in terms of nodes and the extracted features. Finally, to predict the critical events, we first detected and modeled these critical events.
In our experiment, the code takes the edge list of the graph in a CSV file. Every row indicates an edge between two nodes separated by a comma. The first row is a header. Nodes should be indexed starting with 0. Sample graphs for Facebook Politicians and Facebook TV Shows are included in the input/ directory.

6. CONCLUSION
Learning the evolution of communities over time is a key step towards predicting the critical events the communities may undergo. This is an important and difficult issue in the study of social networks. In the work to date, there is a lack of formal approaches for modeling and predicting critical events over time. In this paper we propose a model which predicts the critical events the communities may undergo. The sliding window analysis tracks communities over time; autoregressive modeling and survival analysis then predict not only the next event an evolving community may undergo, but also future events more distant in time. The current literature treats critical events with equal importance. The advantage of our approach is a weighting scheme in which the importance of events such as "appear" and "disappear" is not considered equal but depends on the dynamics of communities.