 Open Access
 Total Downloads : 310
 Authors : Mrs. Kanchan Jadhav, Prof. Nalini Mhetre
 Paper ID : IJERTV2IS110273
 Volume & Issue : Volume 02, Issue 11 (November 2013)
 Published (First Online): 06112013
 ISSN (Online) : 22780181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
Collective Behaviour Prediction Via Social Dimensions Extraction
Mrs. Kanchan Jadhav SKNCOE Pune,
University of Pune.
Abstract
Day by day the clicks are increasing in a particular network. Behaviour is nothing but to know the interest and requirement of users. Collective behaviour, which indicates the group of data generated on a large scale. In this framework affiliations of actors are capture by extracting social dimensions and then classify the actors using extracted dimensions. As existing approaches to extract social dimensions are not scalable and can not handle network of huge size. We solve these problem by sparsifying social dimension to make this extraction scalable by using edgecentric clustering scheme and kmeans variant algorithm. In social media, multiple modes of actors can be involved in the same network, resulting in a multimode network. In this work, we attempt to harness the predictive power of social connections to determine the preferences or behaviours of individuals such as whether a user supports a certain political view, whether one likes one product, whether he/she would like to vote for a presidential candidate, etc.
Keywords: Collective Behaviour, Affiliations, Scalable learning, Edge Clustering, kmeans variant.

Introduction
Human intention is the valuable source in the present world. As the trends in the technology are increasing, there is a need to know and resolve the issues of the visitors who leave their footprints in their fascinated areas. We have so many social networking websites. If analysis is made on the fascinated areas of different people, the united behaviour of the people can be identified. In Social
Prof. Nalini Mhetre Dept. Of computer Engg.
SKNCOE Pune,
University of Pune
networking sites and enterprises collaborative software coupled with advancement in computing communication technologies are enabling people to share information in innovative ways from wikipedia, facebook to microsoft share points the numerous advancement which are helping using sharing information. These tools and software provides ampere opportunities to study human Interactions and collective behaviour on unprecedented scale[1]. In social media usually network consist of millions of actors. Hence size of extracted social media is huge. It is not possible to hold such amount of huge information in RAM and hence it is extremely difficult to do mathematical operations. These problems are solve by sparsifying social dimension to make this extraction scalable. In this work we present edge centric approach to extract sparse social dimensions[2].
In this work, we study how networks in social media can predict some human behaviour and individual performances, that means when behaviour of some Individuals in social media is given we can conclude or predict behaviour of other Individuals in networks.
Connections in social media networks are not homogeneous for e.g. One user may have multiple connections. One for his friend, family, classmates and colleagues. Unfortunately this relative information is not fully available in reality. We also dont know why the users are connected with each other. To study this heterogeneity a new framework is designed and it is called as social dimensions. as shown in Fig 1.
468
Fig 1: Contacts of one user in a Social Network

Collective Behaviour
Collective behaviour is a behaviour of individuals in social networking environment but it is not addition of individual behaviours[3]. In connectual environment behaviour of one person is usually influenced by behaviour of friend. Another observation is that actors of same affiliation tend to connect with each other for e.g. If a person likes shoes then chances of some one who likes football can also likes shoes. it is not simply the aggregation of individual behaviours. This study of collective behaviour is to understand how individuals act in a social networking environment. Our approach follows a socialdimension based learning framework. Social dimensions are extracted to represent the potential affiliations of actors before discriminative learning occurs.

Sparse Social Dimensions
The latest trend in the social network enables us to study intersection of behaviour or interests on a large scale. The interests include connecting a person, clicking an add, joining in a group, marking friends as buddies, becoming a fan of a celebrity, searching for friends, sending gifts for their special days etc. We have a given network with the behavioural outcome of the remaining users within the same network[3].
Existing methods to extract social dimensions can be categorized into nodeview and edgeview. Node view methods concentrate on clustering nodes of a network into communities. It take one node and their
affiliation and find out in how many communities that node is involved. So it required larger memory space for these all nodes.
Edgeview methods concentrate on clustering edges of a network into communities. It find out in how many communities that edge is involved
We implement an edge centric view using kmeans clustering by using following methodologies. Table 1 shows how affiliation is represented.
Table1: shows how an affiliation is represented
Actors
Affiliation1
Affiliation2
..
Affiliationk
1
2
.
.
.
0
0.5
.
.
.
1
0.3
.
.
.
..
..
.
.
.
0.8
0
.
.
.

Communities In An Edge Centric View
Instead of directly clustering the nodes of a network into some communities, we can take an edgecentric view, i.e., partitioning the edges into disjoint sets such that each set represents one latent affiliation. SocioDim with soft clustering for social dimension extraction verified promising results, but its limitation is scalability. A network may be sparse (i.e., the density of connectivity is very low), whereas the extracted social dimensions are not sparse.
Fig 2: Toy example
Lets look at the toy network example with two communities in Figure 3. Its social dimensions following modularity maximization are shown in Table 1. There is no one of the entries is zero. Then extracting dimensions from the large network may be in big number and maintaining these many dimensions also a big problem to the persons who
469
are taking care of this work. And the memory required to do this is also a problem. Maintainability is also one of the problem for this.
Hence, we need minimum measurable dimensions. Hence, it is imperative to develop some other approach so that the extracted social dimensions are sparse.
Table2:Social dimension of Toy example
Actors
Modularity Maximization
Edge centric Clustering
1
0.1185
1
1
2
0.4043
1
0
3
0.4473
1
0
4
0.4473
1
0
5
0.3093
0
1
6
0.2628
0
1
0.1690
0
1
8
0.3241
0
1
9
0.3522
0
1
An actor is considered associated with one relationship if one of his connections is assigned to that affiliation. There are the two communities in Figure 2. In Figure 3 the dashed edges represent one affiliation, and the remaining edges denote the second affiliation.
Figure 3: Edge Clusters
In Table 2, where an entry is 1 (0) if an actor is (not) involved in that corresponding social dimension. To extract sparse social dimensions, we partition edges into disjoint sets rather instead of nodes. In addition, the extracted social dimensions following edge partition are guaranteed to be sparse. This is because the number of ones affiliations is no more than that of her connections. The density of extracted social dimension is find by following theorem.
density
=
Where k is number of social dimensions to be extracted, m is number of edges, n is number of nodes and di is the degree of node. Moreover, for many realworld networks whose node degree follows a power law distribution, the upper bound in above equation can be approximated as bellow.
–
Where > 2 is the exponent of the power law distribution.

Edge Partition Via Line Graph
In order to partition edges into disjoint sets, one way is to look at the dual view of a network, i.e., the line graph. A graph represented communities in a network can be drawn with respect to edge partition based on edge connections. An edge represents the connectivity of two vertices. Hence an edge based connections are very useful to calculate approximately the relationships of a user with different communities.
In a line graph L(G), each node corresponds to an edge in the original network G, and edges in the line graph represent the adjacency between two edges in the original graph. The line graph of the toy example is shown in Figure 4.
Figure 4: The line graph of Toy example
470
For instance, e(1, 3) and e(2, 3) are connected in the line graph as they share one terminal node 3. They are adjacent to each other. Each node in the line graph corresponds to an edge in the original graph. Following equation is use to increase many more edges in the graph.
N = m, M m
Where n denotes the number of nodes, m denotes number of connections in a network. N & M denotes number of nodes and connections in its line graph respectively.

Edge Partition Via Clustering Edge Instances
By using edge centric view it easy to identify which nodes are connected each other .Then we apply edge clustering methods for finding the similarity between the edges. Edgecentric clustering essentially treats each edge as one data instance with its ending nodes . Then a typical kmeans clustering algorithm can be applied to find out disjoint partitions. One apprehension with this scheme is that the total number of edges might be too huge.
Then, clustering algorithm like kmeans clustering can be applied to Table 3 to find disjoint partitions[1].
Table3: Edge centric view
Edge 
Feature 1 2 3 4 5 6 7 8 9 
e(1,3) 
1 0 1 0 0 0 0 0 0 
e(1,4) 
1 0 0 1 0 0 0 0 0 
e(2,3) 
0 1 1 0 0 0 0 0 0 
. 
. 
. 
.. 
. 
… 
KMeans clustering can be implemented abiding following algorithm
Input: data instances {xi1 i m}, number of clusters k
Output: {idxi}
Procedure:

Construct a mapping from features to instances

initialize the centroid of cluster {Cj 1 j k}

repeat

Reset {MaxSimi}, {idxi}

for j=1:k

identify relevant instances Sj to centroid Cj

for i in Sj

compute sim(i,Cj) of instance i and Cj

if sim(i,Cj)>MaxSimi

MaxSimi = sim(i,Cj)

idxi = j;

for i=1:m 1 i m}, number of clusters k

update centroid Cidxi

until change of objective value <
Algorithm of Scalable kmeans Variant
As a simple kmeans is used to extract social dimensions, it is easy to update social dimensions if a given network changes. If a new member joins the network and a new connection emerges, we can simply assign the new edge to the corresponding clusters. The updation of centroids when the new connection is appear also straightforward. This k means scheme is mainly appropriate for dynamic large scale networks[1].
The detailed algorithm is summarized.
Input: network data, labels of some nodes, number of social dimensions;
Output: labels of unlabeled nodes.
Procedure:
1. Convert network into edgecentric view. 2.Perform edge clustering as in above algorithm.

Construct social dimensions based on edge partition. A node belongs to one community as long as any of its neighbouring edges is in that community.

Apply regularization to social dimensions.

Construct classifier based on social dimensions of labelled nodes.

Use the classifier to predict labels of unlabeled ones based on their social dimensions. Algorithm for Learning of Collective Behaviour
471

Regularization On Communities
After clustering of the edges we construct classifier based on the social dimensions. We designed a new classifier
The extracted social dimensions are treated as features of nodes. In order to handle largescale data with high dimensionality and vast numbers of instances, we adopt a linear SVM, which can be finished in linear time generally; the larger a community is, the weaker the connections within the community are. so we build an SVM classifier for the classification.

Conclusion
We propose an edgecentric clustering scheme to extract social dimensions and a scalable kmeans variant to handle edge clustering. ,in the propose framework we introduced new edge centric based classification. Compared to existing algorithms it more advantageous and reduce time for unlabeled edge classification. We used incremental clustering for grouping the labelled edge, it is one of the best clustering algorithm. We tested theoretically and give best results.
With this edgecentric view, we guaranteed that the extracted social dimensions are to be sparse. An incomparable advantage of our model is that it easily scales to handle networks with millions of actors while the earlier models fail. This scalable approach offers a feasible solution to effective learning of online collective behaviour on a large scale.
In social media, multiple modes of actors can be involved in the same network, resulting in a multi mode network. For instance, in YouTube, users, videos, tags, and comments are intertwined with each other in coexistence. Extending the edge centric clustering scheme to address this object heterogeneity can be a promising future direction.

References

Lei Tang, Xufei Wang, and Huan Liu, Scalable Learning of Collective Behavior, 2012.

L. Tang and H. Liu, Toward predicting collective behavior via social dimension extraction, IEEE Intelligent Systems, vol. 25,pp. 1925, 2010

L. Tang and H. Liu, Scalable learning of collective behavior based on sparse social dimensions, in CIKM 09: Proceeding of the 18th
ACM conference on Information and knowledge management. New York, NY, USA: ACM, 2009.

M. Newman, Power laws, Pareto distributions and Zipfs law, Contemporary physics vol. 46, no. 5, pp. 323352, 2005.

T. Evans and R. Lambiotte, Line graphs, link partitions, and overlapping communities, Physical Review E, vol. 80, no. 1, p.16105, 2009

L. Tang, S. Rajan, and V. K. Narayanan, Large scale multilabel classification via metalabeler, in WWW 09: Proceedings of the 18th international conference on World Wide Web. New York, NY,
USA: ACM, 2009

M. Newman, Finding community structure in networks using the eigenvectors of atrices, Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), vol. 74, no. 3, 2006.

X. Zhu, Semisupervised learning literature survey, 2006.

F. Sebastiani, Machine learning in automated text categorization, ACM Comput. Surv., vol. 34, no. 1, pp. 147, 2002.

S. Fortunato, Community detection in graphs, Physics Reports, vol. 486, no. 35, pp. 75
174, 2010.

J. Bentley, Multidimensional binary search trees used for associative searching, Comm.
ACM, vol. 18, pp. 509175, 1975.

T. Kanungo, D. M. Mount, N. S. Netanyahu, C.
D. Piatko, R. Silverman, and A. Y. Wu, An efficient kmeans clustering algorithm: Analysis and implementation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, pp. 881 892, 2002.

P. Bradley, U. Fayyad, and C. Reina, Scaling clustering algorithms to large databases, in ACM KDD Conference, 1998.
472