Predicting Critical Events in Dynamic Social Network

DOI : 10.17577/IJERTV9IS060989


Nedha Abdulla. P

Department of Computer Science and Engineering AWH Engineering College

Kozhikode, India

Sona. P

Department of Computer Science and Engineering AWH Engineering College

Kozhikode, India

Abstract: A social network is usually conceived as a graph in which individuals in the network are represented by nodes, and the nodes are connected to each other by links which depict the relations among the individuals. The term community refers to any group of nodes that are densely connected among themselves and sparsely connected to the rest of the network. As time evolves, communities in a social network may undergo various changes (split, expand, shrink, stable, merge) known as critical events. Predicting critical events is an important and difficult issue in the study of social networks. This paper proposes a sliding window analysis, an autoregressive model and survival analysis techniques. The autoregressive model simulates the evolution of the community structure, and the survival analysis techniques allow the prediction of future changes the community may undergo. In our approach, critical events are treated based on a weighting scheme.

Keywords: dynamic network; community evolution; critical events; prediction

  1. INTRODUCTION

A social network is a social structure of people related to each other through a common relationship or interest. Usually, a social network is conceived as a graph in which individuals in the network are represented by nodes and the relations among the individuals are represented by links. Social network analysis studies the interactions between people and groups of people, as well as the associated resources, in order to understand their behavior. Tracking community structures over time and predicting their future changes has important applications in various domains such as criminology, public health and education. Online social networks are dynamic and evolve with time: most often, (i) new members join the network, (ii) existing members leave the network, and (iii) members establish or break ties and/or change the intensity or weight of their interactions with other members. The term community refers to any group of nodes that are densely connected among themselves and sparsely connected to the rest of the network. A community structure can be drastically affected by changes in nodes and variations in their links, such as nodes appearing and disappearing over time. Hence, from one time point $t_i$ to another time point $t_j > t_i$, a community can split into several other communities, expand into a larger community, shrink into a smaller community or remain stable; in the same way, several communities can merge into one community. We call these changes (split, expand, shrink, stable, merge) the critical events which communities may undergo over time.

To learn the evolution of communities over time and to predict the critical events the communities may undergo, this paper proposes a sliding window analysis, an autoregressive model and survival analysis techniques. The autoregressive model simulates the evolution of the community structure, whereas the survival analysis techniques allow the prediction of future changes the community may undergo.

The sliding window concept is useful to track consistently evolving communities, but defining the size of the window is a challenge. This paper aims to automatically identify the optimal window size in order to address the aforementioned drawbacks of existing approaches.

The recent literature treats critical events with equal importance. However, in some applications, different communities may have their own life cycles, and critical events should be treated accordingly. In our paper, critical events are treated based on a weighting scheme in which the importance of events such as appear and disappear is not considered equal but depends on the dynamics of communities.

    Our work can be summarized as follows:

1. An approach for automatically detecting the size of the window to adopt when identifying and tracking communities over time. The size of the window is estimated based on the numbers of nodes appearing, disappearing and remaining in the dynamic network at two consecutive, independent time-stamps.

2. To predict the critical events, we propose autoregressive modeling and survival analysis. Autoregressive modeling simulates the evolution of the community structure, and the survival analysis techniques allow the prediction of future changes the community may undergo.

    3. Critical events are treated based on a weighting scheme.

  2. RELATED WORK

There has been increasing interest in studying the evolution of community structures in dynamic social networks. The important issues are how to track communities and how to discover the critical events a community can undergo over time.

S. Y. Bhat [1] proposes a unified framework, HOCTracker, for tracking the evolution of hierarchical and overlapping communities in online social networks. It is a density-based approach for detecting overlapping community structures, and it automatically tracks evolutionary events like birth, growth, contraction, merge, split, and death of communities. HOCTracker adapts a preliminary community structure (identified through a novel density-based overlapping community detection approach) to the changes occurring in a network and processes only active nodes for the new time step. However, time complexity remains a challenge.

P. Lee [2] models social streams as dynamically evolving post networks and models events as clusters over these networks, obtained by means of a clustering approach that is robust to the large amount of noise present in social streams. Typical cluster evolution patterns include birth, death, growth, decay, merge and split. Event detection can be viewed as a subproblem of cluster evolution tracking in social streams. However, this approach does not predict future events.

N. Du [3] defines a community to have high strength if its internal interactions connecting its members are relatively stronger than the external interactions connecting those members to the rest of the world. Community strength analysis discovers how the strength of each detected community changes over the entire observation period, within a framework that provides reliable and consistent community strength scores. However, the information provided by these studies is limited to adjacent snapshots, which cannot give a whole picture of the community evolution.

E. G. Tajeuna [4] presented a new framework to track community structures in time-evolving social networks and to detect changes that may occur in communities. The authors propose a new similarity measure, named mutual transition, for tracking the communities, together with rules for capturing significant transition events a community can undergo. This framework is not capable of predicting the future transitions a community may undergo.

Xiujuan Xu [5] describes the problems of dynamic social networks using data mining theory and introduces a novel dynamic social network algorithm called iDBMM, based on an improvement of the dynamic behavioral mixed membership model (DBMM) algorithm. The iDBMM algorithm classifies the training set to obtain the basic characteristics of each role, then scores the test set relative to each role and assigns the role with the highest score to the corresponding node. Finally, the transition model is obtained by a statistical method. The experimental results are largely affected by the selected characteristics: if the characteristic differences between the roles are too large, the conversion between roles (except for a role's own conversion) tends to 0; if the differences are too small, errors appear in the allocation of roles.

3. PROBLEM DEFINITION

Learning the evolution of communities over time is a key step towards predicting the critical events the communities may undergo. This is an important and difficult issue in the study of social networks. In the work to date, there is a lack of formal approaches for modeling and predicting critical events over time, and the critical events are treated with equal importance.

4. PROPOSED SYSTEM

This section presents the theories and techniques used for (1) tracking communities, (2) estimating feature values by vector autoregression (VAR) and (3) predicting critical events by survival analysis.

An evolving community may undergo critical events during its evolution. Understanding how these occur requires an analysis of the past history of the topological features in relation to the critical events. Model-based approaches such as VAR can thus be used to generate feature values. VARs are multi-linear autoregressive models in which each vector observation is represented as a combination of previous observations. Considering $X_{t_n}$ to be a random d-dimensional vector observed at time $t_n$, this vector can be expressed as a linear combination of the $p$ lag vectors $X_{t_{n-1}}, \ldots, X_{t_{n-p}}$, as follows:

$$X_{t_n} = \sum_{i=1}^{p} A_i X_{t_{n-i}} + \varepsilon_{t_n} \qquad (1)$$

where $A_i$ is a d-by-d matrix representing the coefficients (weights) associated with the lag vectors and $\varepsilon_{t_n}$ is additive Gaussian noise with zero mean.

Survival analysis is a statistical method for studying the occurrence and timing of events. Its aim is to estimate, via a probability (generally called the survivor function $S(\cdot)$), the risk of an event occurring given the past history of a set of time-varying observations. Formally, given the risk or hazard $\lambda(t)$ of an event occurring at a specific time $t$, the survivor function is given as the cumulative risk over time:

$$S(t) = 1 - \exp\left(-\int_0^t \lambda(u)\, du\right) \qquad (2)$$
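To make the role of (2) concrete, the short Python sketch below computes the cumulative event probability from a discretized hazard series; the hazard values and the unit window width are illustrative assumptions, not taken from the paper's data.

```python
import numpy as np

def survivor_probability(hazard, dt=1.0):
    """Discretized version of Eq. (2): 1 - exp(-integral of the hazard).

    hazard: array of hazard-rate values, one per window-stamp.
    Returns the cumulative probability that the event has occurred
    by each window-stamp.
    """
    cumulative_risk = np.cumsum(hazard) * dt   # approximate the integral
    return 1.0 - np.exp(-cumulative_risk)

# Illustrative hazard values for one community and one event type.
hazard = np.array([0.05, 0.10, 0.20, 0.15])
print(survivor_probability(hazard))  # risk accumulates over window-stamps
```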

4.1 Notation

Let us take a dynamic social network from which individual interrelationships are collected at regular time-stamps for a duration going from $t_1$ to $t_m$. At any time $t_i$ ($i = 1, \ldots, m$), we use the graph structure $G_{t_i} = (V_{t_i}, E_{t_i})$ to represent the snapshot of the social network, where $V_{t_i}$ stands for the set of nodes and $E_{t_i}$ for the set of edges. We then use the series $G = \{G_{t_i} = (V_{t_i}, E_{t_i}) \mid i = 1, \ldots, m\}$ to denote the dynamic social network over the whole period $[t_1, t_m]$. We use $T = \{s_1, s_2, \ldots\}$ to denote the set of interval durations.

For a fixed duration $s$, we use the notation $W^s$ to mean a window $W$ of size $s$ that slides from left to right by a step of one time-stamp. It is worth noting that $W^s$ can also be represented as the set $\{\{t_1, \ldots, t_s\}, \{t_2, \ldots, t_{s+1}\}, \ldots, \{t_{m-s+1}, \ldots, t_m\}\} = \{W_1, W_2, \ldots, W_{m-s+1}\}$, where each $W_j$ is an instance of $W^s$. Hence, the graph instance $G_{W_j}$ corresponding to the window instance $W_j$ can be expressed as follows:

$$G_{W_j} = \bigcup_{t_i \in W_j} G_{t_i} \qquad (3)$$

For each graph $G_{W_j}$ we define a partition $\{C_1^{W_j}, \ldots, C_k^{W_j}\}$ representing the communities detected at window-stamp $W_j$; each detected community $C_i^{W_j}$ has $E_i^{W_j}$ and $V_i^{W_j}$ as its sets of edges and vertices, respectively. We use $S_C$ to denote the sequence of communities that reflects the evolution of a community $C$ from window-stamp $W_j$ to $W_{j+l}$.

Tracking communities means aligning communities at different time points in such a way as to represent an evolution. For instance, a sequence $S_C = \{C^{W_1}, C^{W_2}, \ldots, C^{W_n}\}$ is considered an evolution of community $C$ if all consecutive communities $C^{W_j}$, $C^{W_{j+1}}$ are similar in terms of nodes. Note that in most cases the similarity is given in terms of the nodes shared.
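As an illustration of Eq. (3), the sketch below builds the window graph $G_{W_j}$ as the union of the consecutive snapshot graphs it covers; it assumes the networkx library and toy snapshot data rather than anything from the paper's experiments.

```python
import networkx as nx

def window_graph(snapshots, j, s):
    """Eq. (3): union of the s snapshot graphs covered by window W_j."""
    g = nx.Graph()
    for snap in snapshots[j:j + s]:
        g.add_nodes_from(snap.nodes())
        g.add_edges_from(snap.edges())
    return g

# Toy dynamic network: three time-stamps.
g1 = nx.Graph([(0, 1), (1, 2)])
g2 = nx.Graph([(1, 2), (2, 3)])
g3 = nx.Graph([(2, 3), (3, 4)])
print(window_graph([g1, g2, g3], j=0, s=2).edges())
```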

4.2 Determining window size

Given a sliding window $W^s$ moving through the interval $[t_1, t_m]$, for any window-stamp $W_j$ the set of nodes observed at this instance is given by $V_{W_j} = \bigcup_{t_i \in W_j} V_{t_i}$. At the next window-stamp $W_{j+1}$ we have the set of nodes given by $V_{W_{j+1}} = \bigcup_{t_i \in W_{j+1}} V_{t_i}$. At each step in which a window of size $s$ moves from step $j$ to step $j+1$, we then calculate the numbers of nodes that remain ($n_r$), disappear ($n_d$), and appear ($n_a$) in the graph as follows:

$$n_r(W_j, W_{j+1}) = |V_{W_j} \cap V_{W_{j+1}}| \qquad (4)$$

$$n_d(W_j, W_{j+1}) = |V_{W_j} \setminus (V_{W_j} \cap V_{W_{j+1}})| = |V_{W_j}| - n_r(W_j, W_{j+1}) \qquad (5)$$

$$n_a(W_j, W_{j+1}) = |V_{W_{j+1}} \setminus (V_{W_j} \cap V_{W_{j+1}})| = |V_{W_{j+1}}| - n_r(W_j, W_{j+1}) \qquad (6)$$

Note that the larger the value of $s$, the more the network tends to be static, which may make it impossible to capture changes such as merge, split, shrink and expand that evolving communities may undergo. In the same way, the smaller the value of $s$, the more the network changes over time, which may result in several cases where communities are evolving in a non-consecutive way. Note also that the number of nodes remaining in the graph over time increases with the size of the window. We first calculate the fluctuation $fl_s$ of the graph given a size $s$ of the sliding window, as follows:

$$fl_s = fl_s(W_j, W_{j+1}) = 1 - \frac{n_r(W_j, W_{j+1}) \times n_r(W_j, W_{j+1})}{|V_{W_j}| \cdot |V_{W_{j+1}}|} \qquad (7)$$

Note that if the network is static, i.e., when $n_r(W_j, W_{j+1}) = |V_{W_j}| = |V_{W_{j+1}}|$, the fluctuation is 0. When the graph is highly changing, $n_r(W_j, W_{j+1}) = N \ll |V_{W_j}|, |V_{W_{j+1}}|$, where $N$ is the minimum number of nodes remaining in the network; the fluctuation is then $1 - N^2 / (|V_{W_j}| \cdot |V_{W_{j+1}}|)$. Hence, the fluctuation is always bounded in $[0,\; 1 - N^2 / (|V_{W_j}| \cdot |V_{W_{j+1}}|)]$.

In this work, to study the various changes the evolving communities are undergoing, we select the minimum window size such that the fluctuation is bounded within $[0, 1]$ with a deviation lower than a small value $\epsilon$:

$$s^{*} = \min\{\, s \in T \mid \mathrm{Dev}(F_s) < \epsilon \,\} \qquad (8)$$

where $F_s$ is the set of fluctuations given a sliding window $W^s$ and $\mathrm{Dev}(F_s)$ is its standard deviation. Algorithm 1 estimates the minimal window size.

Algorithm 1: Determining window size
1: Input: G // The graph at the different time-stamps
2: Output: Size // The window size
3: for s in T do
4:   Size <- s
5:   F_s <- {} // Initialize the set of fluctuations
6:   for W_j, W_{j+1} in W^s do
7:     Resize the graphs G_{W_j}, G_{W_{j+1}}
8:     Calculate the fluctuation fl_s using (7)
9:     F_s <- F_s U {fl_s}
10:  end for
11:  Dev <- Dev(F_s) // Calculate the variation
12:  if Dev < epsilon and Dev != 0 then
13:    break // Stop the algorithm
14:  end if
15: end for
16: Return Size
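For illustration, here is a minimal Python sketch of Algorithm 1, assuming the snapshots are given as sets of node ids; the toy snapshots and the threshold value are illustrative assumptions.

```python
import numpy as np

def fluctuation(v_prev, v_next):
    """Eq. (7): fluctuation between the node sets of two window-stamps."""
    n_r = len(v_prev & v_next)                  # nodes that remain, Eq. (4)
    return 1.0 - (n_r * n_r) / (len(v_prev) * len(v_next))

def window_size(node_sets, sizes, eps=0.05):
    """Algorithm 1: smallest size whose fluctuations deviate less than eps."""
    for s in sizes:
        windows = [set().union(*node_sets[j:j + s])
                   for j in range(len(node_sets) - s + 1)]
        fls = [fluctuation(a, b) for a, b in zip(windows, windows[1:])]
        dev = np.std(fls)
        if 0 < dev < eps:
            return s
    return sizes[-1]  # fall back to the largest candidate size

# Toy snapshots: each entry is the node set observed at one time-stamp.
snaps = [{0, 1, 2}, {1, 2, 3, 4}, {2, 3, 4}, {4, 5}, {4, 5, 6}]
print(window_size(snaps, sizes=[1, 2, 3]))
```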

4.3 Modeling critical events

Let $E = \{\text{Split, Merge, Shrink, Expand, Stable}\}$ be the set of critical events an evolving community may undergo. We assume that these events are mutually independent, which means that the probability that a community may pass through one event is not affected by another event. In other words, the probability that a community may undergo an event can be evaluated independently of the other events. Hence, for a critical event $e$, in observing a community evolving over time we will either see this event occurring or we will not. Two possible responses will then be recorded when observing evolving communities: either the event $e$ occurs (codified as 1) or it does not (codified as 0). We generalize this process by modeling the instantaneous risk that a community may undergo an event $e$.

4.3.1 Hazard function

Given an event $e$ and a sliding window $W^s$, for each window-stamp $W \in W^s$ we calculate the number of times $N_e(W)$ the event $e$ has occurred, which corresponds to the number of evolving communities that passed through the critical event $e$ during the transition from one window-stamp to the next. The number of events $N_e(W)$ is affected at each window-stamp by a hazard function. The counting process can thus be decomposed as follows:

$$N_e(W) = \Lambda_e(W) + M_e(W) \qquad (9)$$

where $\Lambda_e(\cdot)$ is a non-decreasing predictable process, called the cumulative intensity process, and $M_e(\cdot)$ is a mean-zero martingale. Considering $\Lambda_e(\cdot)$ to be continuous, there exists a predictable non-negative intensity process $\lambda_e(W)$ such that:

$$\Lambda_e(W) = \int \lambda_e(u)\, du \qquad (10)$$

Given an evolving community $C$, at each window-stamp this community may or may not be observed. Hence the intensity of community $C$ with regard to the event $e$ at any window-stamp $W$ is:

$$\lambda_e(C \mid W) = Y(C \mid W)\, \alpha_e(C \mid W) \qquad (11)$$

where $Y(C \mid W)$ takes the value 1 if $C$ has an instance at window-stamp $W$ and 0 otherwise, and $\alpha_e(C \mid W)$ is the intensity, or hazard rate, for evolving community $C$ undergoing event $e$ at window-stamp $W$.
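To illustrate how the counting process in (9)-(11) can be estimated empirically, the sketch below computes a per-window hazard rate as the fraction of at-risk communities that underwent the event; the event-log format and the counts are assumptions for illustration.

```python
def empirical_hazard(event_counts, at_risk):
    """Estimate the hazard rate of one event type per window-stamp.

    event_counts: event_counts[w] = number of communities that underwent
                  the event at window-stamp w (the increment of N_e, Eq. (9)).
    at_risk:      at_risk[w] = number of communities observed at w,
                  i.e. those with Y(C|W) = 1 in Eq. (11).
    """
    return [n / r if r else 0.0 for n, r in zip(event_counts, at_risk)]

# Illustrative counts over four window-stamps for the event "split".
splits_per_window = [1, 0, 2, 1]
communities_at_risk = [10, 9, 8, 8]
print(empirical_hazard(splits_per_window, communities_at_risk))
```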

4.3.2 Probability of an event occurring

The intensity process defined in (11) gives the risk that a community will pass through an event at a given time point. From this, we can then calculate, at any time point, the probability that a community will undergo an event $e$ by computing the cumulative probability $P_e(C \mid W)$:

$$P_e(C \mid W) = 1 - \exp\left\{ -\int Y(C \mid u)\, \alpha_e(C \mid u)\, du \right\} \qquad (12)$$
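A direct discretization of (12), reusing the empirical hazard above, might look as follows; this is a minimal sketch assuming unit-width window-stamps.

```python
import math

def event_probability(hazard, observed):
    """Eq. (12): cumulative probability that the event occurs.

    hazard:   per-window hazard rates alpha_e(C|W).
    observed: per-window indicators Y(C|W) (1 if the community has an
              instance at that window-stamp, else 0).
    """
    cumulative = sum(h * y for h, y in zip(hazard, observed))
    return 1.0 - math.exp(-cumulative)

print(event_probability([0.1, 0.0, 0.25, 0.125], [1, 1, 1, 1]))
```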

Consider a new d-dimensional space in which the features extracted from a given community at window-stamp $W_j$ are represented by the vector $X_{W_j} = (x_1, \ldots, x_d)$. To calculate the risk at a more distant, unobservable time, we need to predict the feature values at that unobservable time. For this, we assume that the compositional data vector can be written as a linear combination of the $p$ lag compositional vectors:

$$X_{W_j} = \sum_{i=1}^{p} A_i X_{W_{j-i}} \qquad (13)$$
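The coefficient matrices $A_i$ in (1)/(13) can be estimated by ordinary least squares over the observed feature history; below is a minimal numpy sketch for lag order $p = 1$, with random data standing in for real community features.

```python
import numpy as np

def fit_var1(X):
    """Least-squares fit of X_t ~ A @ X_{t-1} (Eq. (13) with p = 1).

    X: (n_windows, d) array of community feature vectors.
    Returns the (d, d) coefficient matrix A.
    """
    past, present = X[:-1], X[1:]
    # Solve past @ A.T ~ present in the least-squares sense.
    A_T, *_ = np.linalg.lstsq(past, present, rcond=None)
    return A_T.T

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))   # stand-in feature history
A = fit_var1(X)
next_features = A @ X[-1]      # forecast at the next window-stamp
print(next_features)
```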

4.3.3 Estimation of the Cox parameters

Suppose that we observe communities from $W_1$ to some window-stamp $W_c$ which is not the last window-stamp. Clearly, from $W_1$ to $W_c$ we do not have a total view of all communities, which implies that the notion of likelihood cannot be made explicit. Due to this constraint, we instead define the partial likelihood of the parameters $\beta_e = (\beta_{e_1}, \ldots, \beta_{e_d})$ as:

$$L(\beta_e) = \prod_{j=1}^{c} \left[ \frac{\exp\!\big(\beta_e^{\top} X_{W_j}\big)}{\sum_{C'} Y(C' \mid W_j)\, \exp\!\big(\beta_e^{\top} X_{W_j}^{C'}\big)} \right]^{\delta_{W_j}} \qquad (14)$$

where $\delta_{W_j}$ is a binary indicator which takes the value 1 when the community has an instance at window-stamp $W_j$ and 0 otherwise.

Having obtained the partial log-likelihood, we can approximate the parameter $\beta_e$ by solving the following recursive equation:

$$\beta_e^{(it+1)} = \beta_e^{(it)} - \left\{\ell''\big(\beta_e^{(it)}\big)\right\}^{-1} \ell'\big(\beta_e^{(it)}\big) \qquad (15)$$

where $(it)$ denotes the current iteration step, $\ell''(\cdot)$ corresponds to the second derivative of the partial log-likelihood, and $\ell'(\cdot)$ to its first derivative.
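As a sketch of this estimation step, the code below minimizes a negative Cox partial log-likelihood with scipy's general-purpose optimizer instead of the hand-rolled Newton-Raphson recursion in (15); the at-risk structure and the toy data are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def neg_partial_loglik(beta, X, event, at_risk):
    """Negative Cox partial log-likelihood (cf. Eq. (14)).

    X:       (n, d) feature vectors, one row per community observation.
    event:   (n,) 1 if the observation underwent the event, else 0.
    at_risk: maps each event observation i to the indices of the
             observations still at risk at that window-stamp.
    """
    scores = X @ beta
    ll = 0.0
    for i in np.flatnonzero(event):
        ll += scores[i] - np.log(np.sum(np.exp(scores[at_risk[i]])))
    return -ll

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))
event = np.array([1, 0, 1, 0, 0, 1, 0, 0])
at_risk = {i: np.arange(i, 8) for i in np.flatnonzero(event)}
res = minimize(neg_partial_loglik, np.zeros(3), args=(X, event, at_risk))
print(res.x)  # estimated Cox coefficients beta_e
```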

4.4 Predicting an event

Once the parameters of our model have been estimated and the hazard function identified, we can calculate the probability that an evolving community will undergo a critical event at any time (observable or not) by applying (12). However, at each more distant unobservable time, all the parameters need to be re-estimated in order to calculate the new probability value. At any unobservable time, we thus predict that a given community will undergo an event $e$ if its probability of occurrence (12) has the highest value compared to the probabilities of occurrence of the other events. Hence, assuming that we have observed the evolution of communities from $W_1$ to $W_{j_0}$, which corresponds to the set of observable window-stamps $O_w = \{W_1, \ldots, W_{j_0}\}$, at later unobservable times corresponding to the set of unobservable window-stamps $\bar{O}_w = \{W_{j_0+1}, W_{j_0+2}, \ldots, W_{j_0+n}\}$ we can run Algorithm 2 to predict critical events at more distant times.

Algorithm 2: Predicting critical events
1: Input: Ec, Cre, Ow, Ōw // Sets of evolving communities, critical events, observable and non-observable window-stamps
2: Output: Prc // Set of predicted events
3: Initialization:
   M <- Ow // First period used to estimate the parameters
   Prc <- {}
   Par <- {}
4: Estimating parameters:
   (a) Calculate the VAR parameters
   (b) Estimate the Cox parameters at the period M for each event using (15)
   Par <- Par U {{(a), (b)}}
5: Updating and Predicting:
   Updating: for each evolving community Sc in Ec, at each window-stamp W_j in Ōw:
     M <- M U {W_j} // Update the period
     Calculate the community's features using (13) at W_j // Update the feature values at the next window-stamp
     Run step 4 to update the parameters
   Predicting: for each evolving community Sc in Ec, at each window-stamp W_j in Ōw and for each event e in Cre:
     Calculate the probability Se using (12)
     Prc <- Prc U max{Se, e | e in Cre} // Select the event having the highest Se
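The following is a compact sketch of the prediction loop of Algorithm 2 for one community, wiring together the VAR forecast and the event-probability computation from the earlier sketches; the constant baseline hazard and the toy inputs are simplifying assumptions for illustration.

```python
import numpy as np

def predict_events(features, A, betas, n_future):
    """Sketch of Algorithm 2's predicting step for one community.

    features: (n_obs, d) observed feature history of the community.
    A:        (d, d) VAR(1) coefficient matrix (step 4a).
    betas:    dict event -> (d,) Cox coefficients (step 4b).
    Returns the predicted event at each unobservable window-stamp.
    """
    x = features[-1]
    cum = {e: 0.0 for e in betas}
    predictions = []
    for _ in range(n_future):
        x = A @ x                                # Eq. (13): forecast features
        for e, b in betas.items():               # Cox hazard exp(beta . x),
            cum[e] += np.exp(b @ x)              # unit baseline hazard assumed
        probs = {e: 1.0 - np.exp(-c) for e, c in cum.items()}  # Eq. (12)
        predictions.append(max(probs, key=probs.get))
    return predictions

rng = np.random.default_rng(2)
feats = rng.normal(size=(5, 3))
A = 0.5 * np.eye(3)
betas = {"split": np.array([0.2, -0.1, 0.0]),
         "merge": np.array([-0.3, 0.4, 0.1])}
print(predict_events(feats, A, betas, n_future=3))
```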

4.5 Assigning priority to critical events

It is a method of assigning weights which applies the hierarchical structure of the analytic hierarchy process (AHP) and pairwise comparison. This method has the advantage that the number of comparisons can be reduced, and consistency is automatically maintained by determining priorities first on multiple entities and then making subsequent comparisons between entities with adjoined priorities.

To determine the priorities of multiple attributes we use the following steps. The first step is to create a hierarchy using the attributes and entities. The next step is to set the priorities of the entities within each group. After giving priorities to the entities in the same group, a priority between the entities with the same priority from the different groups is determined. Thus A, D and G (the entities with the highest priority in groups I, II and III, respectively) are compared and priorities are given between them. The same practice is repeated for the entities with the second and third priorities from each group. Finally, a priority is set between the entities with adjoined priorities.

Once the priority of all the entities is determined, the weight for each attribute is assigned. If weights are assigned while this priority is kept unchanged, consistency is consequently maintained. Thus, comparisons are made between entities of adjoined priorities while the priority is maintained. In this pairwise comparison, the entity with a higher priority is given a high score, and the entity with a lower priority is in turn given a lower relative score.

Algorithm 3: Assigning priority to critical events
1: Determination of priorities
   • Create the hierarchical structure
   • Set priorities of entities within each group
   • Set a priority between the entities with the same priority from the different groups
   • Set a priority between the entities with adjoined priorities
2: Assign weights for each attribute.
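As a minimal sketch of the weighting idea: events ordered by the priority obtained above receive descending relative scores, mimicking the pairwise comparisons of adjoined priorities, and the scores are then normalized into weights. The scoring scale and the priority order are illustrative assumptions.

```python
def weights_from_priority(ordered_events, base=1.0, step=1.0):
    """Turn a priority order over critical events into normalized weights.

    ordered_events: events sorted from highest to lowest priority.
    Entities of adjoined priorities differ by a fixed relative score
    'step', so the priority order is preserved in the weights.
    """
    scores = [base + step * i for i in range(len(ordered_events) - 1, -1, -1)]
    total = sum(scores)
    return {e: s / total for e, s in zip(ordered_events, scores)}

# Illustrative priority order over the critical events.
print(weights_from_priority(["merge", "split", "expand", "shrink", "stable"]))
```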

5. EXPERIMENT SETUP AND RESULTS

To evaluate our proposed approach, we first determined the appropriate window size to use. After obtaining the appropriate window size for each of our networks, we detected the communities and tracked them over time in terms of nodes and the extracted features. Finally, to predict the critical events, we first detected these critical events and then modeled them.

In our experiment the code takes the edge list of the graph in a CSV file. Every row indicates an edge between two nodes, separated by a comma. The first row is a header. Nodes should be indexed starting with 0. Sample graphs for the Facebook Politicians and Facebook TV Shows networks are included in the input/ directory.
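A loader for this input format might look as follows; the file name politicians_edges.csv is a placeholder for whichever sample file is used from the input/ directory.

```python
import csv
import networkx as nx

def load_edge_list(path):
    """Read the CSV edge-list format described above into a graph."""
    g = nx.Graph()
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)                      # skip the header row
        for u, v in reader:
            g.add_edge(int(u), int(v))    # nodes are 0-indexed integers
    return g

g = load_edge_list("input/politicians_edges.csv")  # placeholder path
print(g.number_of_nodes(), g.number_of_edges())
```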

Figure 1: Input and output options.

Figure 2: Training with the default resolution of 1.0.

Figure 3: Training a model with a higher resolution of 2.5.

Figure 4: Training a model with a lower resolution of 0.5.

Figure 5: Training a model on the Facebook TV Shows dataset.

Figure 6: The resulting plot.

Figure 7: The output is saved as a JSON file.

  6. CONCLUSION

Learning the evolution of communities over time is a key step towards predicting the critical events the communities may undergo. This is an important and difficult issue in the study of social networks, and in the work to date there is a lack of formal approaches for modeling and predicting critical events over time. In this paper we proposed a model which predicts the critical events the communities may undergo. The sliding window analysis tracks communities over time; then autoregressive modeling and survival analysis predict not only the next event an evolving community may undergo, but also future events more distant in time. The current literature treats critical events with equal importance. The advantage of our approach is that we use a weighting scheme in which the importance of events such as appear and disappear is not considered equal but is based on the dynamics of communities.

REFERENCES

1. S. Y. Bhat and M. Abulaish, "HOCTracker: Tracking the evolution of hierarchical and overlapping communities in dynamic social networks," IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 4, pp. 1019–1032, 2015.

2. P. Lee, L. V. Lakshmanan, and E. E. Milios, "Incremental cluster evolution tracking from highly dynamic network data," in Proceedings of the IEEE 30th International Conference on Data Engineering (ICDE), 2014, pp. 3–14.

3. N. Du, X. Jia, J. Gao, V. Gopalakrishnan, and A. Zhang, "Tracking temporal community strength in dynamic networks," IEEE Transactions on Knowledge and Data Engineering, no. 1, pp. 1–1, 2015.

4. E. G. Tajeuna, M. Bouguessa, and S. Wang, "Tracking the evolution of community structures in time-evolving social networks," in Proceedings of the IEEE International Conference on Data Science and Advanced Analytics (DSAA), 2015, pp. 1–10.

5. X. Xu, W. Wang, Y. Liu, H. Yu, and X. Zhao, "iDBMM: a novel algorithm to model dynamic behavior in large evolving graphs," in Proceedings of the 2014 IEEE 12th International Conference on Dependable, Autonomic and Secure Computing.
