Applications of data mining in knowledge management


S. Venkata Lakshmi, M.Tech, Assistant Professor, MCA Dept., KMM Colleges, Tirupati

K. Hema, M.Tech, Assistant Professor, MCA Dept., KMM Colleges, Tirupati

Abstract: Data mining is one of the most important steps of the knowledge discovery in databases process and is considered a significant part of knowledge management. The use of data mining continues to grow in business and in learning organizations. This review paper explores the applications of data mining techniques that have been developed to support the knowledge management process. Journal articles indexed in the Science Direct database from 2007 to 2012 are analyzed. The discussion of the findings is divided into four topics: (i) knowledge resources; (ii) knowledge types and/or knowledge datasets; (iii) data mining tasks; and (iv) data mining techniques and applications used in knowledge management. The article first briefly describes the definition of data mining and data mining functionality. Then the knowledge management rationale and the major knowledge management tools integrated in the knowledge management cycle are described. Finally, the applications of data mining techniques in the knowledge management process are summarized and discussed.

Keywords: Data mining, data mining application, knowledge management, data mining functionalities.

  1. INTRODUCTION

    In the information era, knowledge is becoming a crucial organizational resource that provides competitive advantage, giving rise to knowledge management (KM) initiatives. Many organizations have collected and stored vast amounts of data. However, they are often unable to discover the valuable information hidden in the data and transform it into useful knowledge. Managing knowledge resources can be a challenge. Many organizations are employing information technology in knowledge management to aid the creation, sharing, integration, and distribution of knowledge. Data mining is, at its basis, a process of using tools to extract useful knowledge from large datasets, which makes it an essential part of knowledge management. Data mining can be useful for KM in two main ways: (i) to share common knowledge of the business intelligence (BI) context among data miners and (ii) to use data mining as a tool to extend human knowledge. Thus, data mining tools can help organizations discover the knowledge hidden in enormous amounts of data.

    As part of data mining research, this paper surveys data mining applications in knowledge management through a review of literature published from 2007 to 2012. This period was chosen because data mining has emerged as a KM research theme since 2006 and plays an important role as a link between business intelligence and knowledge management. To filter articles, we searched for the keywords "data mining" and "knowledge management" in the title, abstract, and keywords fields of the Science Direct database. We limited the search to documents published from 2007 to 2012 and included only documents of type research article. The total number of documents returned by this query, by year, is shown in Figure 1.

    Figure 1 Number of articles by year of publication, 2007 to 2012 (x-axis: year of publication; y-axis: number of articles)

    The full text of each article was carefully reviewed to eliminate articles that were not related to the application of data mining in KM or did not describe how data mining could be employed in, or could help, KM. We surveyed the data mining applications for knowledge management and classified them according to six categories of data mining techniques: classification, regression, clustering, dependency modeling, deviation detection, and summarization.

    The purpose of this paper is to review the literature on the application of data mining techniques for KM published in academic journals between 2007 and 2012. The paper is organized as follows: first, the definition of data mining and the data mining tasks used in this study are described; second, the definition of knowledge management and the knowledge capture and creation tools are presented; third, articles about data mining in KM are analyzed and the results of the classification are reported; and last, the conclusions of the study are discussed.

  2. DATA MINING

    1. Definition of Data Mining

      Data mining is an essential step in the knowledge discovery in databases (KDD) process that produces useful patterns or models from data (Figure 2). The terms KDD and data mining are not the same. KDD refers to the overall process of discovering useful knowledge from data. Data mining refers to discovering new patterns from a wealth of data in databases by focusing on the algorithms that extract useful knowledge.

      Figure 2 Data Mining and the KDD Process (Source: Fayyad et al., 1996)

      As Figure 2 shows, the KDD process consists of an iterative sequence of steps:

      1. Selection: Selecting data relevant to the analysis task from the database

      2. Preprocessing: Removing noise and inconsistent data; combining multiple data sources

      3. Transformation: Transforming data into appropriate forms to perform data mining

      4. Data mining: Choosing a data mining algorithm appropriate to the patterns in the data; extracting data patterns.

      5. Interpretation/Evaluation: Interpreting the patterns into knowledge by removing redundant or irrelevant patterns; translating the useful patterns into human-understandable terms.
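The five steps above can be sketched as a chain of small functions. This is only an illustrative toy, not the procedure of Fayyad et al.: the records, the field names (`dept`, `age`, `spend`), and the "pattern" (mean spend per age band) are all hypothetical and chosen to keep each stage visible.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing (noisy) value.
RAW = [
    {"id": 1, "dept": "sales", "age": 34, "spend": 120.0},
    {"id": 2, "dept": "sales", "age": None, "spend": 80.0},
    {"id": 3, "dept": "hr", "age": 45, "spend": 60.0},
    {"id": 4, "dept": "sales", "age": 29, "spend": 200.0},
    {"id": 5, "dept": "sales", "age": 51, "spend": 40.0},
]

def select(records, dept):
    """Selection: keep only the records relevant to the analysis task."""
    return [r for r in records if r["dept"] == dept]

def preprocess(records):
    """Preprocessing: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records):
    """Transformation: reduce each record to an (age_band, spend) pair."""
    return [("under_40" if r["age"] < 40 else "40_plus", r["spend"])
            for r in records]

def mine(pairs):
    """Data mining: a deliberately trivial pattern, mean spend per age band."""
    groups = {}
    for band, spend in pairs:
        groups.setdefault(band, []).append(spend)
    return {band: mean(values) for band, values in groups.items()}

def interpret(pattern):
    """Interpretation: phrase the strongest pattern in readable terms."""
    top = max(pattern, key=pattern.get)
    return f"Highest mean spend is in the {top} band ({pattern[top]:.1f})"

def kdd(records, dept):
    """Run the five stages in sequence."""
    return interpret(mine(transform(preprocess(select(records, dept)))))
```

Calling `kdd(RAW, "sales")` on this toy data returns `"Highest mean spend is in the under_40 band (160.0)"`. In practice the process is iterative: an unconvincing result at the interpretation stage usually sends the analyst back to an earlier step.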

    2. Data Mining Tasks

      Fayyad et al. (1996) define six main functions of data mining:

      1. Classification is finding models that analyze and classify a data item into one of several predefined classes.

      2. Regression is mapping a data item to a real-valued prediction variable.

      3. Clustering is identifying a finite set of categories or clusters to describe the data.

      4. Dependency Modeling (Association Rule Learning) is finding a model that describes significant dependencies between variables.

      5. Deviation Detection (Anomaly Detection) is discovering the most significant changes in the data.

      6. Summarization is finding a compact description for a subset of data.

      Data mining has two primary objectives: prediction and description. Prediction involves using some variables in a data set to predict unknown values of other relevant variables (e.g. classification, regression, and anomaly detection). Description involves finding human-understandable patterns and trends in the data (e.g. clustering, association rule learning, and summarization).
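To make the contrast between prediction and description concrete, the sketch below implements one very small instance of three of the tasks listed above: regression (a least-squares line), deviation detection (a z-score rule), and summarization (basic statistics). The function names, the z-score threshold of 2.0, and the sample values are assumptions made for illustration; none of them come from the reviewed articles.

```python
from statistics import mean, pstdev

def fit_line(xs, ys):
    """Regression: least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def deviations(values, threshold=2.0):
    """Deviation detection: values more than `threshold` standard
    deviations away from the mean (assumed rule, for illustration)."""
    mu, sigma = mean(values), pstdev(values)
    return [v for v in values if sigma and abs(v - mu) / sigma > threshold]

def summarize(values):
    """Summarization: a compact description of a set of values."""
    return {"count": len(values), "mean": mean(values),
            "min": min(values), "max": max(values)}

# Prediction: use the fitted line to estimate an unseen value.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
predicted_at_5 = slope * 5 + intercept
```

Here `fit_line` serves prediction (the fitted line estimates a value that was never observed), while `summarize` serves description (it only condenses what is already in the data).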

  3. KNOWLEDGE MANAGEMENT

    1. Definition of Knowledge Management

      There are various concepts of knowledge management. In this paper we use the definition of knowledge management by McInerney (2002): "Knowledge management (KM) is an effort to increase useful knowledge within the organization. Ways to do this include encouraging communication, offering opportunities to learn, and promoting the sharing of appropriate knowledge artifacts." This definition emphasizes the interaction aspect of knowledge management and organizational learning.

      The knowledge management process focuses on knowledge flows and on the creation, sharing, and distribution of knowledge (Figure 3). Each of the knowledge units, namely capture and creation, sharing and dissemination, and acquisition and application, can be facilitated by information technology.

      Because technologies play an important role in KM, they are a necessary tool for KM usage. KM therefore requires technologies that facilitate communication, collaboration, and content for better knowledge capture, sharing, dissemination, and application.

      Figure 3 KM Technologies Integrated KM Cycle (Source: Dalkir, 2005)

    2. Knowledge Management: Capture and Creation Tools

      This section provides an overview of a classification of KM technologies as tools and focuses on tools for capturing and creating knowledge. Liao (2003) classifies KM technologies using seven categories:

      1. KM Framework

      2. Knowledge-Based Systems (KBS)

      3. Data Mining

      4. Information and Communication Technology

      5. Artificial Intelligence (AI)/Expert Systems (ES)

      6. Database Technology (DT)

      7. Modeling

      Ruggles et al. (1997) classify KM technologies as tools that generate knowledge (e.g. data mining), code knowledge, and transfer knowledge. Dalkir (2005) classifies KM tools according to the phase of the KM cycle (Figure 4). In this classification, data mining belongs to the knowledge creation and capture phase.

  4. THE APPLICATIONS OF DATA MINING IN KNOWLEDGE MANAGEMENT

    The ten articles reviewed discuss the applications of data mining to organizational knowledge management for effectively capturing, storing, retrieving, and transferring knowledge. We divided the findings from the reviewed articles into four main groups: (i) knowledge resources; (ii) knowledge types and/or knowledge datasets; (iii) data mining tasks; and (iv) data mining techniques and applications used in KM. A detailed distribution of the ten articles by category is shown in Table 1.

    Figure 4 Major KM Techniques, Tools, and Technologies (Source: Dalkir, 2005)

    1. Knowledge Resources

      In this study, we divided knowledge resources into eight groups, according to which knowledge object is stored and manipulated in KM and how data mining aids it.

      1. Health Care Organization: the knowledge resource was the disease knowledge management system (KMS) of a hospital case study. A data mining tool was used to explore the relationships among diseases, operations, and tumors, and to build a KMS that supports clinical medicine and improves treatment quality.

      2. Retailing: the resource was customer knowledge from household customers, applied to product line and brand extension issues. By extracting market knowledge about customers, brands, products, and purchase data, data mining can propose suggestions and solutions to the firm for product line and brand extensions that meet customer demand.

      3. Financial/Banking: the domain knowledge covered financial and economic data. Data mining can assist banking institutions with decision support and knowledge sharing processes for enterprise bond classification.

      4. Small and Middle Businesses (food company and food supply chain): there were two methods for obtaining knowledge resources: knowledge seeding, which gathers the knowledge relevant to a problem, and knowledge cultivating, which finds the key knowledge within the seeded knowledge. Integrating data mining and knowledge management can help managers make better decisions. To address the Death-On-Arrival (DOA) problem encountered in food supply chain networks (FSCN), Li et al. (2010) aimed to build Early Warning and Proactive Control (EW&PC) systems. The knowledge base was an important part of these systems: it contained data analyses made by managers, organized in a form suitable for other managers. Data mining methods were helpful for the EW&PC systems.

      5. Entrepreneurial Science: the knowledge resource was the research assets of a knowledge institution. There were three types of research assets: research products, intellectual capital, and research programs. Data mining facilitated knowledge extraction and helped guide managers in determining strategies for competition among knowledge-oriented organizations.

      6. Business: the data were collected from a questionnaire, an intensive literature review, and discussions with four KM experts. Data mining can discover hidden patterns between KM and its performance for better KM implementations.

      7. Collaboration and Teamwork: workers' logs and documents were analyzed to determine each worker's referencing behavior and to construct each worker's knowledge flow. Data mining techniques can mine and construct a prototype of group-based knowledge flows (GKFs) for task-based groups.

      8. Construction Industry: a large part of the enterprise information was available in textual formats. This led to the use of text mining techniques to handle textual information sources for industrial knowledge discovery and management solutions.

      Table 1 Distribution of articles according to data mining and its applications

      | Authors | Knowledge Resources | Knowledge Types | DM Tasks | DM Techniques/Applications |
      |---|---|---|---|---|
      | Lavrac et al. (2007) | Healthcare | Public health data: the health-care providers database; the out-patient health-care statistics database; the medical status database | Classification; Clustering | Clustering methods: agglomerative classification, principal component analysis, the Kolmogorov-Smirnov test, the quantile range test and polar ordination. Classification: C4.5. MediMap: visualization and detection of outliers |
      | Hwang et al. (2008) | Healthcare (Clinical Diagnosis) | Knowledge conversion and transfer; knowledge measurement | Dependency Modeling | Data mining tool: IBM Intelligent Miner. Data mining techniques: association analysis, sequential patterns analysis. Knowledge Management System (KMS) for disease classification |
      | Liao, Chen & Wu (2008) | Retailing | Knowledge extraction; customer knowledge for product line and brand extension | Dependency Modeling; Clustering | Apriori algorithms (association analysis); K-means (cluster analysis) |
      | Cheng, Lu & Sheu (2009) | Financial | Knowledge sets: strings of data, models, parameters, and reports | Classification; Clustering | Knowledge sharing processes applied to corporate bond classification |
      | Li, Zhu & Pan (2010) | Small & Middle Businesses (SMBs): Food | Knowledge seeding and knowledge cultivating | Classification | Extension Theory: Extenics; Extension Data Mining (EDM) |

    2. Knowledge Types

      This section describes the knowledge types in the eight organizational domains in which data mining contributed to knowledge creation.

      • Health-care System domain: The dataset was composed of three databases: the health-care providers database, the out-patient health-care statistics database, and the medical status database. Another data source was hospital inpatient medical records.

      • Construction Industry domain: A sample data set took the form of Post Project Reviews (PPRs) labeled as good or bad information. Multiple Key Term Phrasal Knowledge Sequences (MKTPKS) were generated through text mining and used as an essential part of the text analysis for classifying text documents.

      • Retailing domain: Customer data and records of the products purchased were collected and stored in databases, then mined to determine whether customers' purchase habits and behavior affect product line and brand extensions.

      • Financial domain: Two datasets were used in the financial domain: (i) to identify bond ratings, knowledge sets containing strings of data, models, parameters, and reports for each analytical study; and (ii) to predict rating changes of bonds, cluster data of bond features together with the model parameters, which were stored, classified, and applied to rating predictions.

      • Small and Middle Businesses (SMBs) domain: In the food company case, the knowledge types related to the corporate conditions or goals of the problem across all departments. They were used to develop a decision system platform and then to form a knowledge tree that finds relations through a human-computer interaction method and optimizes the decision-making process. To solve food supply chain network problems, Li et al. (2010) developed an EW&PC prototype composed of five major components: (i) a knowledge base, (ii) a task classifier and template approaches, (iii) a DM methods library with an expert system for method selection, (iv) an explorer and predictor, and (v) a user interface. This system built decision support models and helped managers with decision making.

      • Research Assets domain: Cantú & Ceballos (2010) focused on managing knowledge assets by applying a knowledge and information network (KIN) approach. This platform contained three types of components: research products, human resources or intellectual capital, and research programs. The various types of research assets were handled through domain ontologies and databases.

      • Business domain: Two types of knowledge attributes were used: condition attributes and a decision attribute. The condition attributes comprised four independent attributes: the KM purpose, the explicit-oriented degree, the tacit-oriented degree, and the success factor. The decision attribute was one dependent attribute: the KM performance.

      • Collaboration and Teamwork domain: The dataset came from a research laboratory in a research institute. It contained 14 knowledge workers, 424 research documents, and a workers' log that recorded when documents were accessed and which documents each worker needed. From the workers' log, two levels of knowledge flow were generated: codified-level knowledge flow and topic-level knowledge flow. These two types of knowledge flow were used to describe a worker's needs. To collect the knowledge flow, the documents in the dataset were categorized into eight clusters by a data mining clustering approach.

    3. Data Mining Techniques/Applications Used in Knowledge Management

      In the articles reviewed, applications of data mining have been used in enterprises ranging from public health care, the construction industry, and food companies to retailing and finance. Each field can be supported by different data mining techniques, which generally include classification, clustering, and dependency modeling. We provide a brief description of these three most used data mining techniques, including their common tools and some references, as follows:

      • Classification: Classification is one of the most common learning tasks in data mining. It aims at mapping a data item into one of several predefined classes. Examples of classification methods used as part of knowledge management include classifying patients referred from primary health-care centers to specialists; combining data mining and decision support approaches in planning the regional health-care system; and implementing a visualization method to facilitate KM and decision-making processes. In the financial domain, Cheng, Lu & Sheu (2009) implemented an ontology-based approach to KM and knowledge sharing in a financial knowledge management system (FKMS) and applied a hybrid SOFM/LVQ classifier, which combines clustering and classification techniques, to classify corporate bonds. In the small and middle businesses domain (food company), data mining can improve decision making through a knowledge cultivating method based on Extenics and Extension Data Mining (EDM). This method integrates data mining and knowledge management to develop a decision support system platform for better decisions. To address death-on-arrival (DOA) in food supply chain networks, corporate managers selected variables that might influence DOA using a decision tree, and used a neural network to monitor and predict potential DOA. As knowledge assets play an important role in knowledge economies, Cantú & Ceballos (2010) employed data mining agents to extract useful patterns that assist decision makers in generating benefits from knowledge assets, and used a knowledge and information network (KIN) platform to manage those assets. Business organizations with a large volume of work wanted to better understand the hidden patterns between KM and its performance; combining a Bayesian Network (BN) classifier with Rough Set Theory (RST) could help such companies implement KM effectively and achieve higher efficacy. Common tools used for classification are decision trees, neural networks, Bayesian networks, and rough set theory.
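As a minimal illustration of mapping a data item into one of several predefined classes, the sketch below trains a one-level decision tree (a "decision stump") on a single numeric feature. It is not the method of any reviewed article: the severity scores, the two class labels, and the function names are hypothetical, and the sketch assumes exactly two classes and at least two distinct feature values.

```python
def train_stump(samples):
    """Train a one-level decision tree on (feature_value, label) pairs.

    Tries every midpoint between consecutive distinct feature values and
    both label orientations, and keeps the split with the fewest training
    errors. Assumes exactly two class labels are present.
    """
    labels = sorted({label for _, label in samples})
    values = sorted({x for x, _ in samples})
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        for below, above in ((labels[0], labels[1]), (labels[1], labels[0])):
            errors = sum((below if x <= threshold else above) != y
                         for x, y in samples)
            if best is None or errors < best[0]:
                best = (errors, threshold, below, above)
    return best[1:]  # (threshold, label_at_or_below, label_above)

def predict(stump, x):
    """Classify a new item with the trained stump."""
    threshold, below, above = stump
    return below if x <= threshold else above

# Hypothetical referral data: (severity score, where the patient is treated).
TRAINING = [(1, "primary"), (2, "primary"), (3, "primary"),
            (6, "specialist"), (7, "specialist"), (8, "specialist")]
stump = train_stump(TRAINING)
```

On this toy data the stump splits at a score of 4.5, so `predict(stump, 2)` gives `"primary"` and `predict(stump, 7)` gives `"specialist"`. A full decision tree applies the same split search recursively to each side of the split.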

      • Clustering: Clustering seeks to identify a finite set of categories by grouping together objects that are similar to each other and dissimilar to the objects belonging to other clusters. This technique has been applied in many fields, for example:

        • Healthcare: Clustering categories and attributes to analyze the similarities between community health centers.

        • Retailing: Clustering market segments into customer clusters to identify candidates for product line and brand extension.

        • Financial/Banking: Identifying groups of corporate bond clusters according to the industry and a specific segment within an industry, then tuning the cluster data for each industry as a template for predicting rating changes.

        • Construction Industry: Clustering textual data to discover groups of similar access patterns.

        • Collaboration and Teamwork: Identifying groups of workers with similar task-related information needs based on the similarities of workers' knowledge flows.

        Common tools used for clustering include k-means, principal component analysis, the Kolmogorov-Smirnov test, and the quantile range test and polar ordination.
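Of the clustering tools named above, k-means is the simplest to show in code. The sketch below is a plain-Python version of Lloyd's algorithm on two-dimensional points. For reproducibility it seeds the centroids with the first `k` points, which is an assumption of this sketch rather than a recommended initialization, and the sample points are hypothetical.

```python
def kmeans(points, k, iterations=20):
    """Cluster 2-D points with Lloyd's algorithm.

    Returns (centroids, clusters). The first k points seed the centroids
    (an assumption made here so that the result is deterministic).
    """
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                              + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        updated = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:  # converged
            break
        centroids = updated
    return centroids, clusters

# Hypothetical feature vectors forming two visibly separate groups.
POINTS = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 10.0)]
centroids, clusters = kmeans(POINTS, k=2)
```

With these points, the three low-valued points end up in one cluster and the three high-valued points in the other. Real applications usually choose `k` by inspection or a validity index and use a randomized or k-means++ style initialization.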

      • Dependency Modeling: This is concerned with finding a model that describes significant relationships between attribute sets. For example, it is widely used in healthcare to develop clinical pathway guidelines and provide an evidence-based medicine platform. In medical records management, it is helpful for clinical decision making. Applying this technique to the construction industry dataset could give better results in knowledge refinement, and it was also used to mine customer knowledge from household customers. Common tools for dependency modeling are Apriori association rules and sequential pattern analysis.

      As shown above, the data mining techniques and applications in the literature offer different solutions to different KM problems in practice.
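The Apriori approach to dependency modeling can be sketched in a few lines: frequent itemsets are grown level by level, and a candidate is kept only if all of its smaller subsets are already frequent. The version below is a compact illustration, not the implementation used in the reviewed retailing study; the basket contents, the minimum support count of 3, and the function names are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support_count} for every frequent itemset.

    `min_support` is an absolute count of transactions.
    """
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(c <= t for t in transactions) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        size += 1
        # Join step: unions of frequent itemsets with exactly `size` items.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every (size - 1)-subset must itself be frequent.
        current = [
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        ]
    return frequent

def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]

# Hypothetical household purchase baskets.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "butter", "milk"},
]
frequent = apriori(BASKETS, min_support=3)
```

On these baskets the rule butter -> bread has confidence 1.0 (every basket with butter also contains bread), while bread -> milk has confidence 0.75. The pair {milk, butter} appears in only two baskets, so it is not frequent and the three-item candidate is pruned.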

  5. CONCLUSIONS

    Knowledge is an important resource in an organization, and the management of knowledge resources has become a strong demand for development. Discovering useful knowledge is also a significant aid to management and decision making. As data mining is a main part of KM, this review identified ten articles related to data mining applications in KM, published between 2007 and 2012. It aims to give a research summary of the application of data mining in the KM technologies domain. The results presented in this paper lead to the following observations:

    • On the basis of the publication rates, research on the application of data mining in KM is expected to increase in the future and to cover interests in different areas.

    • Classification is the data mining task most commonly employed in organizations for description and prediction. However, hybrid techniques, e.g. association rules with clustering, or classification with clustering, are also used to solve different KM problems. This trend is expected to grow in the future.

    • In the context of healthcare, one article used a visualization technique as a supplement to other data mining tasks. This visualization system could lead to better performance in decision making.

    • KM is an interdisciplinary research area. Thus, in the future, KM development may need integration with different technologies and demand more methodologies to solve KM problems.

    • KM application development tends to support expert decision making and will be oriented toward problem-specific domains.

    In this paper, we have shown that data mining can be integrated into the KM framework and can enhance the KM process with better knowledge. Data mining techniques are likely to have a major impact on the practice of KM and to present significant challenges for future knowledge and information systems research.

    REFERENCES

      1. An, X. & Wang, W. (2010). Knowledge management technologies and applications: A literature review. IEEE, 138-141. doi:10.1109/ICAMS.2010.5553046

      2. Berson, A., Smith, S.J. & Thearling, K. (1999). Building Data Mining Applications for CRM. New York: McGraw-Hill.

      3. Cantú, F.J. & Ceballos, H.G. (2010). A multi agent knowledge and information network approach for managing research assets. Expert Systems with Applications, 37(7), 5272-5284.

      4. Cheng, H., Lu, Y. & Sheu, C. (2009). An ontology-based business intelligence application in a financial knowledge management.

      5. Dalkir, K. (2005). Knowledge Management in Theory and Practice. Boston: Butterworth-Heinemann.

      6. Dawei, J. (2011). The Application of Date Mining in Knowledge Management. 2011 International Conference on Management of e-Commerce and e-Government, IEEE Computer Society, 7-9.

      7. Fayyad, U., Piatetsky-Shapiro, G. & Smyth, P. (1996). From Data Mining to Knowledge Discovery in Databases. AI Magazine, 17(3), 37-54.

      8. Gorunescu, F. (2011). Data Mining: Concepts, Models, and Techniques. India: Springer.

      9. Han, J. & Kamber, M. (2012). Data Mining: Concepts and Techniques. 3rd ed. Boston: Morgan Kaufmann Publishers.

      10. Hwang, H.G., Chang, I.C., Chen, F.J. & Wu, S.Y. (2008). Investigation of the application of KMS for diseases classifications: A study in a Taiwanese hospital. Expert Systems with Applications, 34(1), 725-733.

      11. Lavrac, N., Bohanec, M., Pur, A., Cestnik, B., Debeljak, M. & Kobler, A. (2007). Data mining and visualization for decision support and modeling of public health-care resources. Journal of Biomedical Informatics, 40, 438-447.

      12. Li, X., Zhu, Z. & Pan, X. (2010). Knowledge cultivating for intelligent decision making in small & middle businesses. Procedia Computer Science, 1(1), 2479-2488. doi:10.1016/j.procs.2010.04.280

      13. Li, Y., Kramer, M.R., Beulens, A.J.M. & Van Der Vorst, J.G.A.J. (2010). A framework for early warning and proactive control systems in food supply chain networks. Computers in Industry, 61, 852-862.

      14. Liao, S.H., Chen, C.M. & Wu, C.H. (2008). Mining customer knowledge for product line and brand extension in retailing. Expert Systems with Applications, 34(3), 1763-1776.

      15. Liao, S. (2003). Knowledge management technologies and applications: literature review from 1995 to 2002. Expert Systems with Applications, 25, 155-164.
