Filtering Unwanted Multimedia Messages from Online Social Network User Walls

DOI : 10.17577/IJERTV3IS050034

1Mr. N. Venkateswarulu

Asst.Professor, Dept of Computer Science & Engineering G.Narayanamma Institute of Technology & Science Hyderabad, Andhra Pradesh, India

2J. Divya

PG Scholar, Dept of Computer Science & Engineering G.Narayanamma Institute of Technology & Science

Hyderabad, Andhra Pradesh, India

Abstract – Online social networks (OSNs) are today among the most popular interactive media for sharing, communicating, and exchanging many types of information, such as text, images, audio, and video. Publicly shared information is visible to connected people in the blog or network and can have a considerable social impact. Posts or comments on particular public or private areas, called walls, may include superfluous messages or sensitive data. Information filtering can therefore play an important role in online social networks: it gives users the ability to organize the messages written on public areas by filtering out unwanted content. In this paper, we propose a system that gives OSN users direct control over the posts and comments on their walls by means of information filtering. Whenever a user posts a message, it is intercepted by the filtered wall, which applies Filtering Rules and Black List Rules to it. If the message violates neither the filtering rules nor the black list rules, it is displayed on the user's wall.

Keywords: Content Based Message Filtering, Demographic Filtering, Collaborative Filtering.

1 INTRODUCTION

A social networking service is a platform for building social networks or social relations among people who, for example, share interests and activities and distribute a considerable amount of information about their lives. Daily and continuous communication implies the exchange of several types of content, including free text, image, audio, and video data. With the rapid growth of social media, users, especially adolescents, spend a significant amount of time on social networking sites to connect with others, share information, and pursue common interests. OSNs provide very little support for preventing unwanted messages on user walls. A main part of social network content consists of short text; a notable example is the messages permanently written by OSN users on particular public or private areas, generally called walls. Without classification or filtering tools, a user receives all messages posted by the users he or she follows, and in most cases this is a noisy stream of updates. There is a need for more security mechanisms for different communication technologies, particularly online social networks. A major task of today's OSNs is therefore information filtering. Information filtering has been extensively explored for textual documents and, more recently, for web content[1][2][3]. Message filtering can give users the ability to automatically control the messages written on their walls by filtering out unwanted messages.

The filtered wall is proposed to give OSN users direct control over the messages posted on their walls. As its filtering mechanism, the filtered wall uses a Machine Learning technique to assign categories to each message, together with Filtering Rules that let a user explicitly specify which contents should not be displayed on his or her wall. The filtered wall also supports Black List Rules for blocking a particular user for a certain period of time. In this way the proposed system adds a layer of protection to online social networks.

  1. EXISTING SYSTEM

    Online Social Networks (OSNs) provide very little support for preventing unwanted messages on user walls. For example, Facebook allows friends, friends of friends, or defined groups to post any kind of message on a user's wall and to share and upload photos to it. However, it does not provide any content-based support, so it is not possible to prevent undesired messages, such as political messages, general advertisements, or product-based advertisements, regardless of the user who posts them. Providing this service is not only a matter of applying previously defined web content mining techniques to a different application; it requires the design of ad hoc classification strategies. This is because wall messages consist of short text, for which traditional classification methods have serious limitations, since short texts do not provide adequate word occurrences. The existing system has no mechanism for filtering unwanted content on user walls.

  2. PROPOSED SYSTEM

    The Filtered Wall (FW) architecture is proposed for filtering unwanted messages from OSN user walls. It uses Machine Learning (ML) text categorization techniques to automatically assign a category to each message according to its content. The main effort in building a robust short text classifier (STC) is concentrated on the extraction and selection of a set of characterizing and discriminating features. The filtered wall uses a neural learning model, which is today recognized as one of the most efficient solutions in text classification.

    The overall short text classification strategy is based on Radial Basis Function Networks (RBFN)[5][10]. Besides classification facilities, the system provides a powerful rule layer that exploits a flexible language to specify Filtering Rules (FRs), by which users can state what contents should not be displayed on their walls. In addition, the system supports user-defined Blacklists (BLs), that is, lists of users who are temporarily prevented from posting any kind of message on a user wall. The filtering rules use different semantics to better fit the considered domain, and an online setup assistant (OSA) helps users in FR specification. The main components of the proposed system are Short Text Classification (STC), Content Based Message Filtering (CBMF), Collaborative Filtering, Filtering Rules (FRs), and Black List Rules (BLs)[12]. In addition to text filtering, this paper also covers how to filter the text contained in a given image[17][19][20], and how to insert particular objects in the black list in order to avoid specific advertisements.

  3. WHAT IS INFORMATION FILTERING?

    An information filtering system removes unwanted information from an information stream using (semi)automated or computerized methods before the stream is presented to a human user. On social networking sites a user may receive many types of messages that are unrelated to his or her interests and therefore of no use, so the user needs a mechanism for avoiding unwanted messages: information filtering. Information filtering first stores the user's preferences for items; based on these preferences it filters out unwanted data and accepts only recommended items. Only these recommended items are displayed on the user wall. Information filtering is needed because it saves the user's time and lets through only the items the user is interested in.

    Figure: Information Filtering

  4. FILTERED WALL ARCHITECTURE

Filtered wall architecture filters the unwanted messages from online social networks. It consists of three layers.

  • Social Network Manager (SNM).

  • Social Network Applications (SNA).

  • Graphical User Interface (GUI).

Figure: Filtered wall architecture

The Social Network Manager (SNM) provides the basic OSN functionalities and represents user profiles as a social graph, in which each node represents a network user and each edge represents a relationship between two users. It maintains the data related to user profiles and provides this data to the second layer, where filtering rules (FRs) and blacklists (BLs) are applied. The second layer is composed of Content Based Message Filtering (CBMF) and a short text classifier (STC). The third layer consists of a graphical user interface through which the user provides input and sees the published wall messages. The GUI also lets the user define filtering rules for his or her wall messages and maintain the list of BL users who are temporarily prevented from publishing messages on the user's wall. Finally, the GUI contains the Filtered Wall (FW), where the user sees only the messages he or she wants.

In the filtered wall architecture, when a user posts a message on the private wall of one of his or her contacts, the message is intercepted by the filtered wall. The short text classifier then categorizes the message according to its content, and CBMF applies the FRs and BLs using the data provided by the third layer. Based on the result of this step, the message is either published or filtered out by the FW.
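The message flow just described can be sketched in a few lines of Python. This is only an illustrative sketch: the names `Message`, `filtered_wall`, `toy_classifier`, and `block_ads`, as well as the 0.8 threshold, are assumptions introduced here and are not part of the paper's implementation. The keyword classifier merely stands in for the short text classifier.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Message:
    creator: str
    text: str


def filtered_wall(
    message: Message,
    classify: Callable[[str], Dict[str, float]],
    blacklist: Set[str],
    rules: List[Callable[[Message, Dict[str, float]], bool]],
) -> bool:
    """Return True if the intercepted message may be published on the wall."""
    # Step 1: the short text classifier yields a membership level per class.
    memberships = classify(message.text)
    # Step 2: blacklisted creators are rejected regardless of content.
    if message.creator in blacklist:
        return False
    # Step 3: the message is filtered out if any filtering rule matches.
    return not any(rule(message, memberships) for rule in rules)


# Toy stand-ins (assumptions): a keyword classifier and a single rule.
def toy_classifier(text: str) -> Dict[str, float]:
    return {"advertisement": 1.0 if "buy now" in text.lower() else 0.0}


def block_ads(message: Message, memberships: Dict[str, float]) -> bool:
    return memberships.get("advertisement", 0.0) >= 0.8


blacklist = {"mallory"}
for msg in (
    Message("alice", "Happy birthday!"),
    Message("bob", "Buy now, 50% off"),
    Message("mallory", "Hello"),
):
    published = filtered_wall(msg, toy_classifier, blacklist, [block_ads])
    print(msg.creator, "published" if published else "filtered")
```

In this sketch only the first message reaches the wall: the second matches the advertisement rule and the third comes from a blacklisted creator.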

    1. Short Text Classifier (STC)

      Short text classifier consists of two components.

      • Text Representation

      • Machine Learning Classification

        In the text representation step, the short text classifier extracts the features of the text using the vector space model. The machine learning classification step then classifies messages using the Radial Basis Function Network method.

        Text Representation

        In automatic text classification, it has been shown that the term is the best unit for text representation and classification [6]. Although a text document expresses a vast range of information, it lacks the imposed structure of a traditional database. Unstructured data, particularly free running text, therefore has to be transformed into structured data [15]. Many preprocessing techniques have been proposed in the literature for this purpose [7,8]. After converting unstructured data into structured data, an effective document representation model is needed to build an efficient classification system.

        Text representation extracts three types of features: Bag of Words (Bow), Document properties (Dp), and Contextual Features (CF)[4][8][9][10]. The first two types of features are endogenous, that is, they are entirely derived from the information contained within the text of the message. Bag of Words (Bow) is one of the basic methods of representing a document: it forms a vector representing the document from the frequency count of each term in the document.

        The underlying model for text representation is the Vector Space Model (VSM)[16][11]. In the vector space model a document D is represented as an m-dimensional vector, where each dimension corresponds to a distinct term and m is the total number of terms used in the collection of documents. The document vector is written as $D = (w_1, w_2, \ldots, w_m)$, where $w_i$ is the weight of term $t_i$ and indicates its importance. If document D does not contain term $t_i$, then weight $w_i$ is zero. In the Boolean vector approach, value 1 is assigned to a term if it occurs in the document; otherwise value 0 is assigned. A more sophisticated measure is the tf-idf scheme, in which each term is assigned a weight based on how often it appears in a particular document and how frequently it occurs in the entire document collection. Here tf is the term frequency $tf_i$, i.e., the number of occurrences of term $t_i$ in document D, and idf is the inverse document frequency, calculated as follows.

        $idf_i = \log(n / df_i)$

        where n is the total number of documents in the collection and $df_i$ is the number of documents in which term $t_i$ appears at least once. The weight $w_i$ of term $t_i$ in document D is the product of the term frequency and the inverse document frequency, $w_i = tf_i \cdot idf_i$. In the Bow representation, terms are identified with words. In the case of nonbinary weighting, the weight $w_{kj}$ of term $t_k$ in document $d_j$ is computed according to the standard term frequency inverse document frequency (tf-idf) weighting function, defined as

        $\text{tf-idf}(t_k, d_j) = \#(t_k, d_j) \cdot \log \frac{|Tr|}{\#_{Tr}(t_k)}$

        where $\#(t_k, d_j)$ denotes the number of times $t_k$ occurs in $d_j$, and $\#_{Tr}(t_k)$ denotes the document frequency of term $t_k$, i.e., the number of documents in Tr in which $t_k$ occurs.
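The tf-idf weighting above can be illustrated with a short Python sketch. The function name `tf_idf`, the toy documents, and the choice of the natural logarithm are assumptions made here for illustration; the paper does not state the base of the logarithm or any preprocessing steps.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(documents: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each term of each document by tf * log(n / df)."""
    n = len(documents)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        tf = Counter(doc)  # bag of words: raw term counts
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights


# Hypothetical wall messages, already tokenized on whitespace.
docs = [
    "free offer click now".split(),
    "see you at lunch".split(),
    "free lunch offer".split(),
]
weights = tf_idf(docs)
for doc, w in zip(docs, weights):
    print(doc, {t: round(v, 3) for t, v in w.items()})
```

A term that occurs in only one message (such as "click") gets a higher weight than one shared by several messages (such as "free"), and a term that occurs in every document would get weight zero.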

        Machine Learning Classification

        Short text classification is a hierarchical two-level classification. At the first level, a Radial Basis Function Network (RBFN) classifies a message as neutral or non-neutral. At the second level, non-neutral messages are classified, producing gradual estimates of appropriateness for each of the considered categories.

        RBFNs have a single hidden layer of processing units with a local, restricted activation domain; a Gaussian function is commonly used[12]. The main advantages of RBFNs are that the classification function is nonlinear, the model can produce confidence values, and it can be robust to outliers. The first-level classifier is structured as a regular RBFN[13]. The second level of the classification stage uses a modification of the standard RBFN[6]. The regular use of an RBFN in classification includes a hard decision on the output values: according to the winner-take-all rule[14], a given input pattern is assigned to the class corresponding to the winner output neuron, i.e., the one with the highest value. The proposed approach instead considers the values of all output neurons as the result of the classification task and interprets them as gradual estimates of multi-membership to classes. The collection of preclassified messages presents some critical aspects that greatly affect the performance of the overall classification strategy.
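The difference between the winner-take-all decision and the gradual interpretation of the outputs can be sketched as follows. This is a minimal forward pass only, with hand-set centers, widths, and output weights chosen for illustration; a real RBFN would learn these parameters from data, and none of the names below come from the paper.

```python
import math
from typing import List, Sequence


def gaussian(x: Sequence[float], center: Sequence[float], width: float) -> float:
    """Gaussian radial basis unit: it responds only to inputs near its center."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def rbfn_outputs(x, centers, widths, out_weights) -> List[float]:
    """Single hidden layer of Gaussian units followed by a linear output layer."""
    hidden = [gaussian(x, c, s) for c, s in zip(centers, widths)]
    return [sum(w * h for w, h in zip(row, hidden)) for row in out_weights]


# Hypothetical, hand-set parameters (assumption, not learned).
centers = [(0.0, 0.0), (1.0, 1.0)]
widths = [0.5, 0.5]
out_weights = [[1.0, 0.0], [0.0, 1.0]]  # one row per output class
classes = ["neutral", "non-neutral"]

outputs = rbfn_outputs((0.9, 0.8), centers, widths, out_weights)

# Regular use: hard decision, keep only the class of the highest output.
hard = classes[max(range(len(outputs)), key=outputs.__getitem__)]

# Gradual use: keep every output as a membership level for its class.
gradual = dict(zip(classes, outputs))

print(hard)
print({c: round(v, 3) for c, v in gradual.items()})
```

The gradual form is what a threshold such as a minimum membership level can later be compared against, whereas the hard decision discards that information.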

        The overall classification strategy is as follows. Let $\Omega$ be the set of classes to which each message can belong. Each element of the supervised collected set of messages

        $D = \{(m_1, y_1), \ldots, (m_{|D|}, y_{|D|})\}$

        is composed of the text $m_i$ and the supervised label $y_i \in \{0,1\}^{|\Omega|}$ describing the belongingness to each of the defined classes. The set D is then split into two partitions, namely the training set $TrS_D$ and the test set $TeS_D$. The two classification levels are trained on the training set, and their performance is evaluated on the test set.
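As an illustration of the labeled set and its partition, the sketch below builds a tiny collection of (text, label vector) pairs and splits it into a training and a test partition. The class names, the example messages, the 80/20 ratio, and the fixed seed are all assumptions made for this example; the paper does not specify them.

```python
import random
from typing import List, Tuple

# Hypothetical class set; each label vector has one 0/1 entry per class.
CLASSES = ["neutral", "political", "advertisement"]

Example = Tuple[str, List[int]]  # (message text, multi-label vector)

dataset: List[Example] = [
    ("see you at lunch", [1, 0, 0]),
    ("vote for our party", [0, 1, 0]),
    ("buy now and save", [0, 0, 1]),
    ("party discount on election day", [0, 1, 1]),
    ("happy birthday", [1, 0, 0]),
]


def split(data: List[Example], train_fraction: float = 0.8, seed: int = 0):
    """Shuffle a copy of the data and cut it into training and test partitions."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train_set, test_set = split(dataset)
print(len(train_set), "training examples,", len(test_set), "test examples")
```

A message may belong to more than one class at once, which is why the label is a vector rather than a single class name.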

    2. Content Based Message Filtering (CBMF)

      Content-based filtering, also referred to as cognitive filtering, recommends items based on a comparison between the user profile and the content of the items. Each item's content is represented as a set of descriptors or terms, typically the words that occur in a document [7][18]. There are several ways in which terms can be represented in order to be used as a basis for the learning component; a representation method that is often used is the vector space model. In addition, we use another approach, namely natural language processing techniques for categorizing text written in a local language[21][22][23].
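One common way to compare a user profile with item content in the vector space model is cosine similarity. The sketch below assumes this measure and uses made-up term weights; the paper does not state which comparison function it uses, so this is only one possible instance.

```python
import math
from typing import Dict


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical user profile and two candidate messages (term -> weight).
profile = {"football": 1.0, "music": 0.5}
sports_post = {"football": 0.8, "score": 0.2}
ad_post = {"discount": 1.0}

print(round(cosine_similarity(profile, sports_post), 3))
print(cosine_similarity(profile, ad_post))
```

The message that shares terms with the profile scores higher, while a message with no terms in common scores zero and would be a candidate for filtering.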

    3. Collaborative Filtering

      Unlike content-based recommendation methods, collaborative recommender systems [2][18] (or collaborative filtering systems) try to predict the utility of items for a particular user based on the items previously rated by other users. More formally, the utility $u(c, s)$ of item s for user c is estimated based on the utilities $u(c_j, s)$ assigned to item s by those users $c_j \in C$ who are similar to user c.
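A simple way to estimate u(c, s) from the utilities given by similar users is a similarity-weighted average. The sketch below assumes cosine similarity over commonly rated items and uses invented ratings; the aggregation function and the data are illustrative assumptions, not the method of the paper.

```python
import math
from typing import Dict

Ratings = Dict[str, Dict[str, float]]  # user -> item -> utility


def similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity restricted to the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_utility(ratings: Ratings, c: str, s: str) -> float:
    """Estimate u(c, s) as the similarity-weighted mean of u(c_j, s)."""
    num = den = 0.0
    for other, items in ratings.items():
        if other == c or s not in items:
            continue
        sim = similarity(ratings[c], items)
        num += sim * items[s]
        den += abs(sim)
    return num / den if den else 0.0


# Hypothetical utilities on a 1 to 5 scale.
ratings: Ratings = {
    "c": {"a": 5, "b": 1},
    "u1": {"a": 5, "b": 1, "s": 4},
    "u2": {"a": 1, "b": 5, "s": 2},
}
print(round(predict_utility(ratings, "c", "s"), 3))
```

User u1 rates the shared items exactly like c and therefore dominates the estimate, which lands closer to u1's rating of 4 than to u2's rating of 2.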

    4. Demographic Filtering

      Demographic filtering allows users to establish criteria to sort information by age, gender and education to identify the types of users that like a certain item [18].

    5. Filtering Rules (FRs)

      The filtered wall provides a powerful rule layer that uses a flexible language to define Filtering Rules (FRs), by which users can specify which contents should not appear on their walls. Users can create their own rules[1]. This means stating conditions on the type, depth, and trust values of the relationship(s) in which message creators must be involved for the specified rules to apply to them.

      Definition 1 (Creator specification)

      A creator specification creatorSpec implicitly denotes a set of OSN users. It can have one of the following forms, possibly combined [1].

      1. A set of attribute constraints of the form an OP av, where an is an attribute name of the user profile,

        OP is a comparison operator compatible with the domain of an, and av is an attribute value of the user profile.

      2. A set of relationship constraints of the form (m, rt, minDepth, maxTrust), denoting all OSN users participating with user m in a relationship of type rt, having a depth greater than or equal to minDepth and a trust value less than or equal to maxTrust.
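Definition 1 can be made concrete with a small Python sketch. It assumes that the depth and trust of each relationship have already been computed from the social graph and stored in a lookup table; that table, the sample profiles, and the function names are hypothetical and only illustrate how the two kinds of constraints could be checked.

```python
import operator
from typing import Any, Dict, List, Tuple

# Comparison operators usable as OP in an attribute constraint.
OPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}

AttrConstraint = Tuple[str, str, Any]        # (an, OP, av)
RelConstraint = Tuple[str, str, int, float]  # (m, rt, minDepth, maxTrust)
# Assumed precomputed table: (m, user, rt) -> (depth, trust).
Relationships = Dict[Tuple[str, str, str], Tuple[int, float]]


def matches_attributes(profile: Dict[str, Any],
                       constraints: List[AttrConstraint]) -> bool:
    """True if the profile satisfies every 'an OP av' constraint."""
    return all(an in profile and OPS[op](profile[an], av)
               for an, op, av in constraints)


def matches_relationship(user: str, constraint: RelConstraint,
                         relationships: Relationships) -> bool:
    """True if user is related to m by rt with depth >= minDepth and trust <= maxTrust."""
    m, rt, min_depth, max_trust = constraint
    link = relationships.get((m, user, rt))
    if link is None:
        return False
    depth, trust = link
    return depth >= min_depth and trust <= max_trust


# Hypothetical data.
profiles = {"bob": {"age": 17, "city": "Hyderabad"},
            "carol": {"age": 30, "city": "Pune"}}
relationships: Relationships = {("alice", "bob", "friendOf"): (1, 0.9),
                                ("alice", "carol", "friendOf"): (2, 0.4)}

minors = [("age", "<", 18)]
distant_low_trust = ("alice", "friendOf", 2, 0.5)

print([u for u, p in profiles.items() if matches_attributes(p, minors)])
print([u for u in profiles if matches_relationship(u, distant_low_trust, relationships)])
```

The first specification selects bob through his profile attributes, and the second selects carol because her relationship with alice is at depth 2 with trust 0.4.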

      Definition 2 (Filtering rule)

      A filtering rule FR is a tuple (author, creatorSpec, contentSpec, action) where

      • author is the user who specifies the rule.

      • creatorSpec is a creator specification, specified according to Definition 1.

      • contentSpec is a Boolean expression defined on content constraints of the form (C, ml), where C is a class of the first or second level and ml is the minimum membership level threshold required for class C to make the constraint satisfied.

      • action ∈ {block, notify} specifies the action to be performed by the system on the messages matching contentSpec and created by users identified by creatorSpec.
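A filtering rule in the sense of Definition 2 could be represented and evaluated as in the sketch below. To keep it short, the content specification is restricted to a conjunction of (C, ml) constraints, whereas the definition allows a general Boolean expression; the creator specification is passed in as a predicate. The class names, threshold, and users are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class FilteringRule:
    author: str                             # wall owner who defined the rule
    creator_spec: Callable[[str], bool]     # does the rule apply to this creator?
    content_spec: List[Tuple[str, float]]   # (C, ml) pairs, all of which must hold
    action: str                             # "block" or "notify"


def apply_rule(rule: FilteringRule, creator: str,
               memberships: Dict[str, float]) -> Optional[str]:
    """Return the rule's action if the message matches, otherwise None."""
    if not rule.creator_spec(creator):
        return None
    if all(memberships.get(c, 0.0) >= ml for c, ml in rule.content_spec):
        return rule.action
    return None


# Hypothetical rule: block vulgar messages from bob on alice's wall.
rule = FilteringRule(
    author="alice",
    creator_spec=lambda creator: creator in {"bob"},
    content_spec=[("vulgar", 0.8)],
    action="block",
)

print(apply_rule(rule, "bob", {"vulgar": 0.85}))
print(apply_rule(rule, "bob", {"vulgar": 0.50}))
print(apply_rule(rule, "carol", {"vulgar": 0.90}))
```

Only the first call returns the action: the second message stays below the membership threshold, and the third comes from a creator the rule does not cover.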

    6. Black list Rules (BLs)

BL users are users whose messages are blocked regardless of their content. BL rules enable the wall owner to determine which users are to be blocked on the basis of their profiles and their relationship with the wall owner. The ban can last for a specified period or forever, according to the wall owner's wishes. Like an FR, a BL rule depends on the author and the creator specification, and additionally on the creator behavior[1].

Definition 3 (BL rule)

A BL rule is a tuple (author, creatorSpec, creatorBehavior, T), where

    • creatorBehavior consists of two components, RFBlocked and minBanned.

      RFBlocked = (RF, mode, window) is defined such that RF = #bMessages / #tMessages, where #tMessages is the total number of messages and #bMessages is the number of messages among those in #tMessages that have been blocked.

    • window is the time interval of creation of the messages that have to be considered for the RF computation.

      minBanned = (min, mode, window), where min is the minimum number of times, in the time interval specified in window, that the OSN users identified by creatorSpec have to be inserted into the BL, due to BL rules specified by the author's wall [1] (mode = myWall) or by all OSN users (mode = SN), in order to satisfy the constraint.

    • T denotes the time period for which the users identified by creatorSpec and creatorBehavior are banned.
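The relative frequency RF can be computed over the messages created inside the window, as in the following sketch. The record type `PostAttempt`, the timestamps in seconds, and the decision to ban when the measured RF reaches a threshold are assumptions introduced for illustration; the mode component and the minBanned constraint are left out for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PostAttempt:
    timestamp: float  # creation time in seconds (assumed unit)
    blocked: bool     # True if the message was blocked by a filtering rule


def relative_frequency(attempts: List[PostAttempt], now: float, window: float) -> float:
    """RF = #bMessages / #tMessages over the messages created within the window."""
    recent = [a for a in attempts if now - window <= a.timestamp <= now]
    if not recent:
        return 0.0
    return sum(a.blocked for a in recent) / len(recent)


def should_ban(attempts: List[PostAttempt], now: float,
               window: float, rf_threshold: float) -> bool:
    """Assumed policy: ban once the measured RF reaches the threshold."""
    return relative_frequency(attempts, now, window) >= rf_threshold


# Hypothetical history of one creator on one wall.
history = [
    PostAttempt(0.0, True),
    PostAttempt(10.0, True),
    PostAttempt(20.0, False),
    PostAttempt(100.0, True),
]

print(relative_frequency(history, now=100.0, window=100.0))
print(relative_frequency(history, now=100.0, window=50.0))
print(should_ban(history, now=100.0, window=100.0, rf_threshold=0.7))
```

With the wider window three of four messages were blocked, so RF is 0.75; narrowing the window to the last 50 seconds leaves a single blocked message and RF becomes 1.0.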

Online setup assistant for FRs thresholds

OSA presents the user with a set of messages selected from the dataset [1]. For each message, the user tells the system whether to accept or reject it.
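One possible way to turn such accept/reject decisions into a membership threshold is to pick the value that disagrees least with the user's choices. The sketch below assumes exactly this criterion and invents the sample data; the paper does not describe how OSA derives the thresholds, so this should be read as a hypothetical illustration only.

```python
from typing import List, Tuple


def pick_threshold(samples: List[Tuple[float, bool]]) -> float:
    """Choose the membership level ml that best reproduces the user's decisions.

    Each sample is (membership level of the message, True if the user rejected it).
    A message is predicted as rejected when its level is >= the threshold.
    """
    # Candidate thresholds: every observed level, plus "never reject".
    candidates = sorted({level for level, _ in samples}) + [float("inf")]

    def errors(threshold: float) -> int:
        return sum((level >= threshold) != rejected for level, rejected in samples)

    # min() keeps the lowest candidate among those with the fewest errors.
    return min(candidates, key=errors)


# Hypothetical decisions collected for one class.
decisions = [(0.1, False), (0.3, False), (0.6, True), (0.9, True)]
threshold = pick_threshold(decisions)
print(threshold)
```

For these decisions the chosen threshold is 0.6, the lowest level at which every rejected message is caught and no accepted message is filtered.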

CONCLUSION

Users rely on online social networks for many purposes, but one drawback is that they receive unwanted data. To avoid this we have proposed the filtered wall. The filtered wall architecture consists of a Short Text Classifier (STC), Content Based Message Filtering (CBMF), and Filtering and Black List Rules. Whenever a user receives a message, it is intercepted by the filtered wall; the STC extracts the metadata and classifies the message, and CBMF assigns a category to the message based on its content. Based on the results of STC and CBMF, the filtered wall applies the filtering and black list rules. Finally, the message is displayed on the user wall if it does not violate the filtering and black list rules. The proposed system thus gives OSN users direct control over the messages posted on their walls, and we expect the filtered wall architecture to improve filtering performance. As future work, we plan to extend the system to filter data in videos.

REFERENCES

  1. Marco Vanetti, Elisabetta Binaghi, Elena Ferrari, Barbara Carminati, and Moreno Carullo, "A System to Filter Unwanted Messages from OSN User Walls," IEEE Trans. Knowledge and Data Eng., vol. 25, no. 2, pp. 285-297, February 2013.

  2. Mayuri Uttarwar and Yogesh Bhute, "A Review on Customizable Content-Based Message Filtering from OSN User Wall," IJCSMC, vol. 2, issue 10, pp. 198-202, October 2013.

  3. Robin van Meteren and Maarten van Someren, "Using Content-Based Filtering for Recommendation."

  4. B.S. Harish, D.S. Guru, and S. Manjunath, "Representation and Classification of Text Documents: A Brief Review," IJCA Special Issue on Recent Trends in Image Processing and Pattern Recognition (RTIPPR), 2010.

  5. M. Ikonomakis, S. Kotsiantis, and V. Tampakas, "Text Classification Using Machine Learning Techniques," Transactions on Computers, vol. 4, issue 8, pp. 966-974, August 2005.

  6. R.J. Mooney and L. Roy, "Content-Based Book Recommending Using Learning for Text Categorization," Proc. Fifth ACM Conf. Digital Libraries, pp. 195-204, 2000.

  7. N.J. Belkin and W.B. Croft, "Information Filtering and Information Retrieval: Two Sides of the Same Coin?" Comm. ACM, vol. 35, no. 12, pp. 29-38, 1992.

  8. M. Vanetti, E. Binaghi, B. Carminati, M. Carullo, and E. Ferrari, "Content-Based Filtering in On-Line Social Networks," Proc. ECML/PKDD Workshop Privacy and Security Issues in Data Mining and Machine Learning (PSDML '10), 2010.

  9. M. Carullo, E. Binaghi, and I. Gallo, "An Online Document Clustering Technique for Short Web Contents," Pattern Recognition Letters, vol. 30, pp. 870-876, July 2009.

  10. M. Carullo, E. Binaghi, I. Gallo, and N. Lamberti, "Clustering of Short Commercial Documents for the Web," Proc. 19th Int'l Conf. Pattern Recognition (ICPR '08), 2008.

  11. T. Joachims, "Text Categorization with Support Vector Machines: Learning with Many Relevant Features," Proc. European Conf. Machine Learning, pp. 137-142, 1998.

  12. J. Moody and C. Darken, "Fast Learning in Networks of Locally-Tuned Processing Units," Neural Computation, vol. 1, no. 2, pp. 281-294, 1989.

  13. M.J.D. Powell, "Radial Basis Functions for Multivariable Interpolation: A Review," Algorithms for Approximation, pp. 143-167, Clarendon Press, 1987.

  14. J. Park and I.W. Sandberg, "Approximation and Radial-Basis-Function Networks," Neural Computation, vol. 5, pp. 305-316, 1993.

  15. F. Sebastiani, "Machine Learning in Automated Text Categorization," ACM Computing Surveys, vol. 34, no. 1, pp. 1-47, 2002.

  16. C.D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval. Cambridge Univ. Press, 2008.

  17. Michael G. Christel, Neema Moraveji, and Chang Huang, "Evaluating Content-Based Filters for Image and Video Retrieval."

  18. Michael J. Pazzani, "A Framework for Collaborative, Content-Based and Demographic Filtering."

  19. Battista Biggio, Giorgio Fumera, Ignazio Pillai, and Fabio Roli, "Improving Image Spam Filtering Using Image Text Features."

  20. Nadia Bianchi-Berthouze, "K-DIME: An Affective Image Filtering System."

  21. Kavi Narayana Murthi, "Advances in Automatic Text Categorization."

  22. Kavi Narayana Murthi, "Automatic Categorization of Telugu News Articles."

  23. G. Vishnu Murthy, B. Vishnu Vardhan, K. Sarangam, and P. Vijay Pal Reddy, "A Comparative Study on Term Weighting Methods for Automated Telugu Text Categorization with Effective Classifiers."
