Recommendation System using Web Usage Mining for users of E-commerce site

DOI : 10.17577/IJERTV3IS071381

Prajyoti Lopes

Department of Computer Engineering St. Francis Institute of Technology Mumbai, India

Bidisha Roy

Department of Computer Engineering St. Francis Institute of Technology Mumbai, India

Abstract: The World Wide Web (WWW) provides abundant information to Internet users. Users' access behavior is recorded in web logs. This information is very helpful in an E-commerce environment for applications such as personalization and recommendation. This paper focuses on providing real-time recommendations to online users, who can be either registered or unregistered. The technique uses the traditional web usage mining steps of data acquisition and data cleaning to construct useful sessions. Two different approaches are proposed to provide effective recommendations. In the product based technique, recommendations are provided to an unregistered user based on the IP address obtained from the log file. This method has an edge over traditional techniques, which provide recommendations to unregistered users based on cache memory. The user based technique provides recommendations to a registered customer based on sessions constructed for each unique user from that user's navigation. The proposed system is evaluated by performing experiments on real E-commerce data. Results show that both techniques provide better recommendation quality and accuracy.

Keywords: Collaborative filtering, common log file, e-commerce, personalized recommendation, web usage mining.

  1. INTRODUCTION

    Online shopping has gained huge popularity in recent years and has changed the traditional way of doing business. The rapid growth of e-commerce poses new challenges to both customers and companies. A customer is offered many choices for a specific product, which leads to product overload and leaves the customer unable to choose effectively among the offered products [1]. As a result, the need for new marketing strategies such as one-to-one marketing and customer relationship management (CRM) has been stressed by both researchers and practitioners [2]. One effective solution is a personalized recommendation system that provides each customer with a list of products that he or she is likely to be interested in. Recommendation systems can be broadly classified into content based systems and collaborative filtering systems. A content based system examines the properties of the products recommended. A collaborative filtering system uses product-consumer interaction data and ignores other facts to provide recommendations [3].

    Collaborative filtering (CF) has been reported as one of the most successful recommendation techniques and has been widely used in applications such as recommending movies, articles, products and web pages [3]. CF identifies customers (neighbors) whose interests are similar to those of a given customer and recommends items preferred by those neighbors to that customer [2]. Despite its popularity and widespread use, it suffers from two major limitations [4] [5]. The first is sparsity. The number of ratings already obtained is very small compared with the number of ratings that need to be predicted, since collaborative filtering requires explicit non-binary user ratings for similar products. Therefore, collaborative filtering based recommendation systems are unable to accurately compute the neighborhood and identify the right item to recommend. The second is scalability. As the number of customers and products on an E-commerce site increases, the computation time to locate the neighborhood grows linearly, resulting in poor scalability [1] [4]. Studies in [6] show that web usage mining can be used to overcome the issues associated with collaborative filtering systems.

    Web usage mining is the application of data mining techniques to discover interesting and useful patterns from web data. Users' clickstream data (navigational data) can act as a very rich source of information for providing effective recommendations. Clickstream data is defined as a customer's path through a website. It provides information about customers' shopping patterns and behavior, such as the products they view, the products they buy and the items they add to their shopping cart. This information is captured in web log files. Analysis of this usage data helps identify customers' preferences and interests. Furthermore, this data can be used to discover interesting relationships, correlations and rules. In our proposed system we try to provide the customer with better quality recommendations. A good quality recommendation has a significant impact on customers' future shopping behavior. Poor quality recommendations can lead to two distinct types of error: false negatives, items that are not recommended even though the customer likes them, and false positives, items that are recommended even though the customer does not like them. In an E-commerce domain, the most important errors to avoid are false positives, as they can result in irritated, unsatisfied customers and reduce the probability that they revisit the site. Therefore it is highly important to provide the customer with the type of product he or she is interested in.

    In this paper we propose two different techniques, namely product based and user based recommendation, that make use of web usage data, product purchase data and customer related data. Recommendations are provided not only to registered users but also to unregistered users of the site. To implement the proposed system, a recommendation system is developed using open source tools.

    The remainder of this paper is arranged as follows. Section II reviews the data preprocessing and data mining techniques available for use. Section III gives an overview of the proposed system, followed by the implementation details in Section IV and the conclusion in Section V.

  2. REVIEW OF LITERATURE

    The term web usage mining was first coined by Cooley et al., and it focuses on predicting and learning users' preferences on the Internet [7]. The process of web usage mining is generally divided into two important tasks: data preparation and pattern discovery [2]. Web servers, proxy servers and web clients hold the data required for web usage mining. Estimates in [8] [9] show that 80% of data mining time is spent on preprocessing the web log data. The preprocessing task can follow either of two techniques. In the first technique, web logs are mapped into corresponding relational databases, and appropriate mining algorithms are then adapted to analyze them [10]. The second technique uses a special preprocessing process to convert the log data to fit specific mining algorithms. The technique proposed in this study uses the first approach to perform data preprocessing. The data preparation task constructs a server session file in which each session is a sequence of requests of different types made by a single user during a single visit to a site [2]. A set of preprocessing tasks is applied to the web log data. A detailed description of data preparation methods for mining web browsing patterns is given in [11]. Different methods to discover usage patterns, namely Apriori [12], Naïve Bayesian [13] and agglomerative clustering [14], have been discussed and implemented. The pattern discovery task involves the discovery of association rules, sequential patterns, user classifications, etc. [2]. Usage patterns extracted from web data can be applied to a wide range of applications such as customized personalization, system improvement, site modification, business intelligence discovery and usage characterization [2]. The work in [2] provides product recommendations based on web log data, sales data and customer related data.

    Over the years, a wide variety of recommendation techniques have been developed. The most commonly used technique for recommendation is collaborative filtering. The authors in [7] discuss a navigation pattern approach that constructs a tree to store web access information using the NP-Miner algorithm. Based on this information, real-time recommendations are provided to online users. The authors report that this algorithm performs online dynamic recommendation efficiently and in a stable manner. In [1] [2] a personalized recommendation system for an Internet shopping mall is described. This system makes use of web usage data, association rules, product taxonomy and decision tree induction to provide better quality recommendations. Our research tries to provide effective recommendations to all visitors of an E-commerce site, whether they are registered or unregistered. In particular, it tries to improve the quality of recommendations to unregistered users so that they also receive a personalized experience. This helps to retain existing customers and attract one-time visitors to the site.

  3. PROPOSED WORK

    In the proposed system, users visit the web portal and their entire clickstream data is collected and maintained in a log file. The log file is then processed to remove irrelevant data. Different techniques are then applied to the cleaned log file to provide effective personalized recommendations. The proposed system is depicted in Figure 3.1.

    Figure 3.1. Proposed system

    The system has the following phases: data acquisition, preprocessing, recommendation generation and finally pattern analysis.

    1. Data Acquisition

      In this phase the entire clickstream data of all customers, which consists of all the web pages visited, is collected and maintained in a log file. This work uses the common log file format to record the data. The following attributes are considered for analysis: IP address, time, date, status code, URL, method (GET and POST), user agent and referrer URL.

    2. Data Preprocessing

      Effective data analysis requires good quality input data. The collected web log data contains a lot of irrelevant and inconsistent data and needs to be cleaned for effective mining. The preprocessing steps are depicted in Figure 3.2 and are mostly the same for any web usage mining problem, as discussed in [11].

      Figure 3.2. Steps for Pre-processing

      Field separation: Individual fields of each log entry are separated using a separator character such as a space.

      Data cleaning: Data cleaning is the process of filtering out irrelevant data and outliers [9]. It eliminates irrelevant items by checking the suffix of the URL name. All log entries with filename suffixes such as gif, jpeg, GIF, JPEG and JPG are removed. All records with a failed HTTP status code, i.e. a status code less than 200 or greater than 299, are eliminated. For the present study, we consider only the GET and POST methods. Data cleaning reduces the total number of records and the size of the log file.
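The field separation and data cleaning steps can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: the regular expression assumes a log layout in the style of the Apache combined format, the names `parse_line`, `is_relevant` and `clean_log` are hypothetical, and the sample log lines are made up for the example.

```python
import re

# Assumed layout: ip ident user [time] "METHOD url protocol" status size "referrer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

IGNORED_SUFFIXES = (".gif", ".jpeg", ".jpg")   # image requests are dropped
KEPT_METHODS = {"GET", "POST"}


def parse_line(line):
    """Field separation: split one raw log line into named fields."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def is_relevant(entry):
    """Data cleaning: keep only successful GET/POST requests for pages."""
    path = entry["url"].split("?", 1)[0].lower()   # lower() covers GIF, JPEG, JPG
    if path.endswith(IGNORED_SUFFIXES):
        return False
    if not 200 <= int(entry["status"]) <= 299:
        return False
    return entry["method"] in KEPT_METHODS


def clean_log(lines):
    """Parse every line and drop unparsable or irrelevant entries."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and is_relevant(e)]


# Made-up sample lines, for illustration only.
sample = [
    '103.245.66.43 - - [04/Jan/2014:10:15:32 +0530] "GET /index.php?route=product/product&product_id=43 HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '103.245.66.43 - - [04/Jan/2014:10:15:33 +0530] "GET /image/logo.GIF HTTP/1.1" 200 900 "-" "Mozilla/5.0"',
    '103.245.66.43 - - [04/Jan/2014:10:15:40 +0530] "GET /missing HTTP/1.1" 404 300 "-" "Mozilla/5.0"',
]
cleaned = clean_log(sample)
```

Of the three sample lines, only the product page request survives: the image request is removed by the suffix check and the 404 response by the status check.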

      User differentiation: It is important to distinguish between different users in order to analyse their access behaviour patterns. A different user ID is assigned to each different IP address. When the IP address is the same, referrer information and browser details are used to distinguish among different web users.

      Session identification: A session is defined as an ordered sequence of web pages visited by a user. A new session is constructed for each new IP address, and each new IP address is taken to correspond to a unique user. The maximum session time limit is taken to be 30 minutes.

      Data formatting: Finally, the data is formatted into an appropriate tabular format for further analysis.
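The user differentiation and session identification steps above can be sketched as follows. This is an illustrative sketch under stated assumptions: entries are assumed to be already cleaned and to carry a parsed `datetime` in the `time` field, the browser string stands in for the "referrer information and browser details" (referrer handling is omitted for brevity), and the 30-minute limit is interpreted as the maximum time elapsed since the first request of a session. The function names are hypothetical.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)   # maximum session time limit


def user_key(entry):
    """User differentiation: same IP with a different browser is a different user."""
    return (entry["ip"], entry["agent"])


def build_sessions(entries):
    """Session identification: group each user's requests into ordered sessions."""
    sessions = {}   # user key -> list of sessions, each a list of entries
    for entry in sorted(entries, key=lambda e: e["time"]):
        user_sessions = sessions.setdefault(user_key(entry), [])
        # Continue the current session only while it is within the time limit,
        # measured from the first request of that session (an assumption).
        if user_sessions and entry["time"] - user_sessions[-1][0]["time"] <= SESSION_TIMEOUT:
            user_sessions[-1].append(entry)
        else:
            user_sessions.append([entry])
    return sessions


# Made-up entries, for illustration only.
t0 = datetime(2014, 1, 4, 10, 0)
entries = [
    {"ip": "10.0.0.1", "agent": "Firefox", "time": t0, "url": "/a"},
    {"ip": "10.0.0.1", "agent": "Firefox", "time": t0 + timedelta(minutes=10), "url": "/b"},
    {"ip": "10.0.0.1", "agent": "Firefox", "time": t0 + timedelta(minutes=45), "url": "/c"},
    {"ip": "10.0.0.1", "agent": "Chrome", "time": t0 + timedelta(minutes=5), "url": "/d"},
]
sessions = build_sessions(entries)
```

In this example the Firefox user ends up with two sessions (the request at 45 minutes starts a new one), while the Chrome request from the same IP address is treated as a separate user with a single session.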

    3. Recommendation techniques

      This paper suggests two different methodologies, namely product based and user based recommendation. Both techniques generate a list of recommended products for individual users in order to provide personalized recommendations in an E-commerce environment. Effective recommendations are provided not only to privileged users but to all visitors of an E-commerce site.

      1. Product based recommendation technique:

        This technique is more suitable for providing recommendations to unregistered users. Its main advantage is that it does not depend on cache memory or cookies, as many E-commerce sites do. Effective recommendations are provided even when the user clears the browser cache. When different users access the site from the same system and browser, it also tries to provide better recommendations by combining recommendations based on the most recent sessions and timestamps. The following assumptions are made for the recommendation:

        1. For a website, a session S is a sequence of web pages {url1, url2, …, urln}.

        2. A session identifier is associated with each session: {s_id1, s_id2, …, s_idn}.

        3. Each url carries important information, namely IP address, timestamp and product identifier, url1 = {ip_addr, ts, p_id}, which is considered for analysis.

        4. Every product P is associated with a product identifier {p_id1, p_id2, …, p_idn}, a manufacturer identifier {m_id1, m_id2, m_id3, …, m_idn} and a category identifier {c_id1, c_id2, …, c_idn}.

        In this approach we fetch the last three sessions based on the most recent timestamp. For each session we extract the products in descending order of timestamp and place the last two products viewed in the recommendation list. If the recommendation list has fewer than ten products, we fetch related products based on category and manufacturer details. If the recommendation list contains a duplicate product, we filter it out and add a related product to the recommended set instead.

        This approach helps to reduce the false positive errors that normally occur in traditional recommendation techniques.
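The product based procedure can be sketched as follows. This is one possible reading of the description above, not the implementation used in this work: the session and catalog data structures, the helper `related_products`, and the choice to treat any product sharing a category or a manufacturer as "related" are all assumptions made for illustration.

```python
LIST_SIZE = 10             # size of the recommendation list
SESSIONS_USED = 3          # last three sessions
PRODUCTS_PER_SESSION = 2   # last two products of each session


def related_products(p_id, catalog):
    """Hypothetical helper: products sharing the category or manufacturer of p_id."""
    m_id, c_id = catalog[p_id]
    return [other for other, (m, c) in catalog.items()
            if other != p_id and (c == c_id or m == m_id)]


def product_based(ip_addr, sessions, catalog):
    """Recommend products to an unregistered user identified by IP address."""
    recent = sorted((s for s in sessions if s["ip_addr"] == ip_addr),
                    key=lambda s: s["ts"], reverse=True)[:SESSIONS_USED]
    recommended = []
    for session in recent:
        # Products are stored in visit order, so the last two are the most recent.
        for p_id in reversed(session["products"][-PRODUCTS_PER_SESSION:]):
            if p_id not in recommended:        # filter out duplicates
                recommended.append(p_id)
    # Fill the remaining slots with related products.
    for p_id in list(recommended):
        for other in related_products(p_id, catalog):
            if len(recommended) >= LIST_SIZE:
                return recommended
            if other not in recommended:
                recommended.append(other)
    return recommended


# Made-up data: p_id -> (m_id, c_id), and sessions with products in visit order.
catalog = {1: ("m1", "c1"), 2: ("m1", "c2"), 3: ("m2", "c1"),
           4: ("m3", "c3"), 5: ("m2", "c2")}
sessions = [
    {"s_id": 1, "ip_addr": "103.245.66.43", "ts": 100, "products": [1, 2]},
    {"s_id": 2, "ip_addr": "103.245.66.43", "ts": 200, "products": [2, 4]},
    {"s_id": 3, "ip_addr": "10.0.0.9", "ts": 300, "products": [5]},
]
result = product_based("103.245.66.43", sessions, catalog)
```

Because the list is built from the visitor's own recent sessions before any related products are added, the most recently viewed products always appear first.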

      2. User based recommendation technique

        This technique provides recommendations to the registered users of the web portal.

        The following assumptions are made for the recommendation:

        1. We have m registered users and n transactions in the processed log file.

        2. Each user is associated with a unique identifier (UID).

        3. We assume a minimum of three sessions for each user, since effective recommendation relies on mining the user's sessions.

      A list of ten products is shown as the recommendation. The other terminology used for the user based technique is the same as for the product based technique. For each unique user in the log file, we fetch sessions in descending order of timestamp, most recent first. In each session we retrieve all visited products. If a product has already been ordered, it is discarded from the recommendation list and a related product is added to the recommendation set. If a product has been added to the cart or wishlist, it is shown as a top recommendation. If the recommendation list has fewer than ten products, we move to the next most recent session and repeat the procedure.
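The user based procedure can be sketched as follows. As with the product based sketch, this is an illustrative reading of the description rather than the implementation used in this work: the function signature, the `related` lookup table and the sample data are hypothetical.

```python
LIST_SIZE = 10   # size of the recommendation list


def user_based(uid, sessions, ordered, saved, related):
    """Recommend products to a registered user.

    sessions: list of dicts with keys uid, ts and products (visit order)
    ordered:  set of p_ids the user has already ordered
    saved:    set of p_ids in the user's cart or wishlist
    related:  hypothetical lookup, p_id -> list of related p_ids
    """
    top, rest = [], []   # cart/wishlist items go first, everything else after

    def add(bucket, p_id):
        if p_id not in top and p_id not in rest and p_id not in ordered:
            bucket.append(p_id)

    user_sessions = sorted((s for s in sessions if s["uid"] == uid),
                           key=lambda s: s["ts"], reverse=True)
    for session in user_sessions:
        for p_id in session["products"]:
            if p_id in ordered:
                # Already bought: recommend related products instead.
                for other in related.get(p_id, []):
                    add(rest, other)
            elif p_id in saved:
                add(top, p_id)
            else:
                add(rest, p_id)
        if len(top) + len(rest) >= LIST_SIZE:
            break   # list is full, older sessions are not needed
    return (top + rest)[:LIST_SIZE]


# Made-up data, for illustration only.
sessions = [
    {"uid": "u1", "ts": 1, "products": [10, 11]},
    {"uid": "u1", "ts": 2, "products": [12, 13, 14]},
]
result = user_based("u1", sessions, ordered={12}, saved={14},
                    related={12: [20, 13]})
```

Here product 14 is in the cart or wishlist and is therefore listed first, product 12 has already been ordered and is replaced by its related products, and the older session is consulted only because the list is still short of ten items.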

      Both techniques provide recommended products to the end user. Pattern analysis is then performed on this recommendation list.

    4. Pattern analysis

    In this module the proposed techniques are evaluated for accuracy based on the recommendation lists generated. This shows how efficient the proposed system is.

  4. RESULTS AND DISCUSSION

    The proposed system is implemented using the XAMPP server, phpMyAdmin and the Sublime Text 3 editor. The test program was written in OpenCart [15], an open source shopping cart system based on the model-view-controller (MVC) framework. It is a rich tool with an intuitive admin interface that gives control over the entire store. All experiments are performed on a computer system with a CPU clock rate of 2 GHz and 4 GB of main memory. Figure 4.1 shows the front end of the system, which offers electronic products divided into categories. The proposed approach has several modules. The first and most important step is acquiring and preprocessing the clickstream data. Figure 4.2 shows the unprocessed log data. Applying the preprocessing steps yields a cleaned log file. Figure 4.3 shows the cleaned log file in tabular format.

    A. Quality Evaluation metrics

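    We calculate three parameters to evaluate the proposed techniques: recall, precision and accuracy [16]. Recall is the fraction of all relevant items that were recommended. Precision is the fraction of all recommended products that are relevant. Accuracy is the fraction of all items that are classified correctly.

    $$\text{Recall} = \frac{TP}{TP + FN} \tag{1}$$

    $$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$

    $$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3}$$

    Precision and recall typically trade off against each other, so we also calculate the F1 measure, which combines the two.

    $$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$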

    Where,

    True positive (TP) indicates a relevant item that is recommended correctly.

    True negative (TN) indicates an irrelevant item that is not included in the recommendation list.

    False positive (FP) indicates an irrelevant item that is added to the recommendation list.

    False negative (FN) indicates a relevant item that is expected but does not appear in the recommendation list. We construct a matrix as shown in Table 1.

    Table 1: Matrix for Recommendation

    |            | Recommended items   | Not recommended items |
    |------------|---------------------|-----------------------|
    | Relevant   | True positive (TP)  | False negative (FN)   |
    | Irrelevant | False positive (FP) | True negative (TN)    |
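As a worked illustration of these metrics, the sketch below computes them from a recommendation list. It uses the standard definitions, in which a relevant item that is not recommended counts as a false negative (matching the description of false negatives in the Introduction). The function name `evaluate` and the sample item sets are hypothetical and are not results from this study.

```python
def evaluate(recommended, relevant, catalog):
    """Compute recall, precision, accuracy and F1 for one recommendation list."""
    recommended, relevant, catalog = set(recommended), set(relevant), set(catalog)
    tp = len(recommended & relevant)             # relevant and recommended
    fp = len(recommended - relevant)             # recommended but irrelevant
    fn = len(relevant - recommended)             # relevant but not recommended
    tn = len(catalog - recommended - relevant)   # irrelevant and not recommended

    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"recall": recall, "precision": precision,
            "accuracy": accuracy, "f1": f1}


# Made-up example: ten products, four recommended, three actually relevant.
scores = evaluate(recommended=[1, 2, 3, 4], relevant=[3, 4, 5],
                  catalog=range(1, 11))
```

In this example TP = 2, FP = 2, FN = 1 and TN = 5, giving a precision of 2/4, a recall of 2/3, an accuracy of 7/10 and an F1 measure of 4/7.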

    For an unregistered user with IP address 103.245.66.43, the list of recommendations is shown in Figure 4.4, highlighted in red. For the registered user prajyoti, the recommendations generated from previous sessions are shown in Figure 4.5, highlighted in red.

        Figure 4.1. Front end of the system

        Figure 4.2. Unprocessed log File

        Figure 4.3. Processed log file

        Figure 4.4. Product based recommendation

        Figure 4.5. User based recommendation

  5. CONCLUSION

    The rapid expansion and rising popularity of E-commerce has forced existing recommendation systems to handle a large number of customers and to provide them with high quality recommendations. In this paper, we focused on the issues faced by recommendation systems and proposed a methodology that uses web usage mining to minimize them. The work presented in this paper provides effective recommendations not only to registered users but also to unregistered users. The main benefit of the proposed system is that it helps retain existing customers and attract new ones. The technique aims to minimize false positive errors, which can lead to unsatisfied customers. It can greatly benefit E-commerce organizations in forecasting demand and sales, targeting advertisements, attracting and retaining potential customers and gaining a competitive edge in the market. Both of the suggested techniques provide effective and efficient product recommendations for E-commerce sites. However, it remains important to evaluate our methodology against existing collaborative filtering techniques and check its effectiveness.

REFERENCES

      1. Y. Cho, and J. Kim, "Application of Web usage mining and product taxonomy to collaborative recommendations in e-commerce", Expert Systems with Applications, vol. 26, no. 2, pp. 233-246, February 2004.

      2. Y. Cho, J. Kim, and S. Kim, "A personalized recommender system based on web usage mining and decision tree induction", Expert Systems with Applications, vol. 23, no. 3, pp. 329-342, October 2002.

      3. Z. Huang, D. Zeng, and H. Chen, "A comparative study of recommendation algorithms in e-commerce applications", IEEE Intelligent Systems, vol. 22, pp. 68-78, 2007.

      4. B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, "Analysis of recommendation algorithms for e-commerce", Proceedings of the 2nd ACM conference on Electronic commerce. ACM, pp. 158-167, 2000.

      5. M. Claypool, A. Gokhale, T. Miranda, P. Murnikov, D. Netes, and M. Sartin, "Combining content-based and collaborative filters in an online newspaper", Proceedings of ACM SIGIR workshop on recommender systems, vol. 60, 1999.

      6. B. Mobasher, R. Cooley, J. Srivastava, "Automatic personalization based on Web usage mining", Communications of the ACM, vol. 43, no. 8, pp. 142-151, 2000.

      7. Y.M. Huang, Y.H. Kuo, J.N. Chen, and Y.L. Jeng, "NP-miner: A real-time recommendation algorithm by using web usage mining", Knowledge-Based Systems, vol. 19, no. 4, pp. 272-286, 2006.

      8. C.R. Varnagar, N.N. Madhak, T. M. Kodinariya, and J. N. Rathod, "Web usage mining: A review on process, methods and techniques", Information Communication and Embedded Systems (ICICES), International Conference on. IEEE, pp. 40-46, 2013.

      9. P. Nithya, and P. Sumathi, "Novel pre-processing technique for web log mining by removing global noise and web robots." In Computing and Communication Systems (NCCCS) IEEE, pp. 1-5, 2012.

      10. J. Borges, and M. Levene, "Data mining of user navigation patterns", Web usage analysis and user profiling, Springer Berlin Heidelberg, pp. 92-112, 2000.

      11. R. Cooley, B. Mobasher, J. Srivastava, "Data preparation for mining world wide web browsing patterns", Knowledge and Information Systems, vol. 1, pp. 5-32, 1999.

      12. W. Bin and L. Zhijing, "Web Mining Research", Fifth International Conference on Computational Intelligence and Multimedia Applications, pp. 84-89, 2003.

      13. M. Khosravi and M.J. Tarokh, "Dynamic Mining of Users Interest Navigation Patterns Using Naive Bayesian Method", Intelligent Computer Communication and Processing (ICCP), IEEE, pp. 119-122, 2010.

      14. B. Devi, Y. Devi, B. Rani and R. Rao, "Design and Implementation of Web Usage Mining Intelligent System in the Field of e-commerce." International Conference on Communication Technology and System Design, Procedia Engineering, vol. 30, pp. 20-27, 2012.

      15. OpenCart, (2014, January 4). [Online]. Available: http://www.opencart.com/

      16. F. Olmo, and E. Gaudioso, "Evaluation of recommender systems: A new approach." Expert Systems with Applications, vol. 35, pp. 790-804, 2008.
