Medical Domain Based Feature Selection Using Rough Set Reduct Algorithm


T. Keerthika,

Assistant Professor, Department of Information


Sri Krishna College of Engineering and Technology.

Dr. K. Premalatha,

Professor, Department of Computer Science and Engineering,

Bannari Amman Institute of Technology, Sathyamangalam, Erode.

Abstract – Real-time data dynamically increase in size. To handle such data effectively and efficiently, an incremental technique has to be proposed. Feature selection refers to the problem of selecting those input features that are most predictive of a given outcome. In particular, it has found successful application in tasks that involve datasets containing huge numbers of features, which would otherwise be impractical to process further. Recent examples include text processing and web content classification. Rough set theory has been used as such a dataset preprocessor with much success, but current methods are inadequate at finding minimal reductions.

Index Terms – Dynamic data sets, incremental algorithm, feature selection, rough set theory


    The problem of reducing dimensionality has been investigated for a long time in a wide range of fields, e.g., statistics, pattern recognition, machine learning, and knowledge discovery. In order to reduce the input dimensionality, there exist two main approaches, i.e., feature extraction and feature selection (FS). Feature extraction maps the primitive feature space into a new space with a lower dimensionality. Two of the most popular feature extraction approaches are Principal Components Analysis and Partial Least Squares. There are numerous applications of feature extraction in the literature, such as image processing, visualization, and signal processing. In contrast, the FS approach chooses the most informative features from the original features according to a selection measure, e.g., the t-statistic, F-statistic, correlation, separability correlation measure, or information gain. The irrelevant and redundant features in a dataset lead to slow learning and low accuracy. Finding the subset of features that is sufficiently informative is NP-complete, so heuristic algorithms have been proposed to search through the feature space. The selected subset can be evaluated with respect to several criteria, such as the complexity of the learning algorithm and the resulting accuracy.

    The Rough Set (RS) theory can be used as a tool to reduce the input dimensionality and to deal with vagueness and uncertainty in datasets. The reduction of attributes is based on data dependencies. The RS theory partitions a dataset into some equivalent (indiscernibility) classes, and approximates uncertain and vague concepts based on the partitions. The measure of dependency is calculated by a function of the approximations. The dependency measure is employed as a heuristic to guide the FS process. In order to obtain a significant measure, proper approximations of the concepts are required. Hence, the initial partitions play an important role. Given a discrete dataset, it is possible to find the indiscernibility classes; however, in case of datasets with real-valued attributes, it is impossible to say whether two objects are the same, or to what extent they are the same, using the indiscernibility relation. A number of research groups have extended the RS theory using the tolerance or similarity relation (termed tolerance-based Rough Set).

    The similarity measure between two objects is delineated by a distance function over all attributes. Two objects are considered to be similar when their similarity measure exceeds a similarity threshold value. Finding the best threshold boundary is both important and challenging. Some approaches have used genetic algorithms to find the best similarity threshold, while others have used fuzzy similarity to cope with real-valued attributes.


        Rough set theory is a mathematical approach to imprecision, vagueness and uncertainty. In an information system, every object of the universe is associated with some information. Objects characterized by the same information are indiscernible with respect to the available information about them. Any set of indiscernible objects is called an elementary set. Any union of elementary sets is referred to as a crisp set; otherwise a set is rough (imprecise, vague). Vague concepts cannot be characterized in terms of information about their elements. A rough set is the approximation of a vague concept by a pair of precise concepts, called lower and upper approximations. The lower approximation is a description of the domain objects which are known with certainty to belong to the subset of interest, whereas the upper approximation is a description of the objects which possibly belong to the subset. Relative to a given set of attributes, a set is rough if its lower and upper approximations are not equal.

        The reduction of attributes is achieved by comparing equivalence relations generated by sets of attributes. Attributes are removed so that the reduced set provides the same quality of classification as the original. A reduct is defined as a subset R of the conditional attribute set C such that γ_R(D) = γ_C(D). A given dataset may have many attribute reduct sets, so the set R of all reducts is defined as:

        R = { X : X ⊆ C, γ_X(D) = γ_C(D) }

        The intersection of all the sets in R is called the core, the elements of which are those attributes that cannot be eliminated without introducing more contradictions to the dataset. In RSAR, a reduct with minimum cardinality is searched for; in other words, an attempt is made to locate a single element of the minimal reduct set R.

        The reduct and minimal reduct sets for the example are:

        The problem of finding a minimal reduct of an information system has been the subject of much research. The most basic solution to locating such a reduct is to simply generate all possible reducts and choose any with minimal cardinality. Obviously, this is an expensive solution to the problem and is only practical for very simple datasets. Most of the time only one minimal reduct is required, so all the calculations involved in discovering the rest are pointless. To improve the performance of the above method, an element of pruning can be introduced: by noting the cardinality of any prediscovered reducts, the current candidate can be ignored if it contains more elements. However, a better approach is needed, one that will avoid wasted computational effort.
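The exhaustive strategy with size-based pruning can be sketched as follows. This is only an illustration, not the paper's implementation; the toy dataset, attribute names, and the `dependency` helper are all assumptions made for the example.

```python
from itertools import combinations

def dependency(table, attrs, dec):
    """Rough set dependency degree: fraction of objects whose
    equivalence class (w.r.t. attrs) has a single decision value."""
    if not attrs:
        return 0.0
    classes = {}
    for i, row in enumerate(table):
        classes.setdefault(tuple(row[a] for a in attrs), []).append(i)
    pos = sum(len(ix) for ix in classes.values()
              if len({table[i][dec] for i in ix}) == 1)
    return pos / len(table)

def minimal_reducts(table, cond_attrs, dec):
    """Enumerate subsets by increasing size; the smallest size at which
    the full dependency is preserved yields all minimal reducts, and
    every larger subset is pruned without being evaluated further."""
    full = dependency(table, cond_attrs, dec)
    for size in range(1, len(cond_attrs) + 1):
        found = [set(s) for s in combinations(cond_attrs, size)
                 if dependency(table, list(s), dec) == full]
        if found:
            return found
    return []

# Hypothetical toy dataset: the decision e coincides with attribute b.
table = [
    {"a": 0, "b": 0, "c": 0, "e": 0},
    {"a": 0, "b": 1, "c": 0, "e": 1},
    {"a": 1, "b": 0, "c": 1, "e": 0},
    {"a": 1, "b": 1, "c": 1, "e": 1},
]
print(minimal_reducts(table, ["a", "b", "c"], "e"))  # → [{'b'}]
```

Even with this pruning the worst case remains exponential in the number of attributes, which is why heuristic methods are considered next.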

        The QUICKREDUCT algorithm attempts to calculate a reduct without exhaustively generating all possible subsets. It starts off with an empty set and adds, one at a time, those attributes that result in the greatest increase in the rough set dependency metric, until this produces its maximum possible value for the dataset. According to the algorithm, the dependency of each candidate attribute subset is calculated and the best candidate is chosen:
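A minimal sketch of this greedy strategy follows; the helper names and the tiny dataset are illustrative assumptions, not the authors' code.

```python
def gamma(table, attrs, dec):
    """Rough set dependency degree of dec on attrs."""
    if not attrs:
        return 0.0
    classes = {}
    for i, row in enumerate(table):
        classes.setdefault(tuple(row[a] for a in attrs), []).append(i)
    consistent = sum(len(ix) for ix in classes.values()
                     if len({table[i][dec] for i in ix}) == 1)
    return consistent / len(table)

def quick_reduct(table, cond_attrs, dec):
    """Start from the empty set and greedily add the attribute that
    gives the greatest increase in gamma, until gamma reaches the
    value obtained with the full conditional attribute set."""
    reduct = []
    target = gamma(table, cond_attrs, dec)
    while gamma(table, reduct, dec) < target:
        best = max((a for a in cond_attrs if a not in reduct),
                   key=lambda a: gamma(table, reduct + [a], dec))
        reduct.append(best)
    return reduct

# Hypothetical toy dataset: the decision e equals attribute b.
table = [
    {"a": 0, "b": 0, "c": 1, "e": 0},
    {"a": 0, "b": 1, "c": 0, "e": 1},
    {"a": 1, "b": 0, "c": 0, "e": 0},
    {"a": 1, "b": 1, "c": 1, "e": 1},
]
print(quick_reduct(table, ["a", "b", "c"], "e"))  # → ['b']
```

Note that the greedy choice is not guaranteed to return a reduct of minimal cardinality, which is precisely the weakness the swarm-based methods below aim to address.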


        Particle swarm optimization (PSO) is an evolutionary computation technique whose original intent was to graphically simulate the graceful but unpredictable movements of a flock of birds. Initial simulations were modified to form the original version of PSO. Later, Shi introduced an inertia weight into the particle swarm optimizer to produce the standard PSO. PSO is initialized with a population of random solutions, called particles. Each particle is treated as a point in an S-dimensional space. The ith particle is represented as X_i = (x_i1, x_i2, …, x_iS). The best previous position (pbest, the position giving the best fitness value) of any particle is recorded and represented as P_i = (p_i1, p_i2, …, p_iS). The index of the best particle among all the particles in the population is represented by the symbol gbest. The rate of the position change (velocity) for particle i is represented as V_i = (v_i1, v_i2, …, v_iS). The particles are manipulated according to the following equations:

        v_id = w · v_id + c1 · rand() · (p_id − x_id) + c2 · Rand() · (p_gd − x_id)
        x_id = x_id + v_id

        where d = 1, 2, …, S; w is the inertia weight, a positive linear function of time changing according to the generation iteration. Suitable selection of the inertia weight provides a balance between global and local exploration, and results in fewer iterations on average to find a sufficiently optimal solution. The acceleration constants c1 and c2 represent the weighting of the stochastic acceleration terms that pull each particle toward the pbest and gbest positions. Low values allow particles to roam far from target regions before being tugged back, while high values result in abrupt movement toward, or past, target regions. rand() and Rand() are two random functions in the range [0, 1].

        Particles' velocities on each dimension are limited to a maximum velocity, Vmax, which determines how large a step through the solution space each particle is allowed to take. If Vmax is too small, particles may not explore sufficiently beyond locally good regions and could become trapped in local optima. On the other hand, if Vmax is too high, particles might fly past good solutions.

        The first part of the velocity equation provides the flying particles with a degree of memory capability, allowing the exploration of new search space areas. The second part is the cognition part, which represents the private thinking of the particle itself. The third part is the social part, which represents the collaboration among the particles. PSO is used to calculate the particle's new velocity according to its previous velocity and the distances of its current position from its own best experience (position) and the group's best experience. Then the particle flies toward a new position according to the position-update equation.
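The update rules above can be sketched as follows. The parameter values, and the assumption that pbest and gbest are maintained by surrounding code, are illustrative only.

```python
import random

def pso_step(positions, velocities, pbest, gbest,
             w=0.7, c1=1.5, c2=1.5, vmax=4.0):
    """One iteration of the standard PSO update, in place:
    v_id = w*v_id + c1*rand()*(p_id - x_id) + c2*Rand()*(g_d - x_id)
    x_id = x_id + v_id, with each velocity clamped to [-vmax, vmax]."""
    for i in range(len(positions)):
        x, v = positions[i], velocities[i]
        for d in range(len(x)):
            v[d] = (w * v[d]
                    + c1 * random.random() * (pbest[i][d] - x[d])   # cognition part
                    + c2 * random.random() * (gbest[d] - x[d]))     # social part
            v[d] = max(-vmax, min(vmax, v[d]))  # limit the step size
            x[d] += v[d]

# Two particles in a 2-dimensional search space (hypothetical values).
positions = [[0.0, 0.0], [2.0, 2.0]]
velocities = [[0.0, 0.0], [0.0, 0.0]]
pbest = [[1.0, 1.0], [1.0, 1.0]]
gbest = [1.0, 1.0]
pso_step(positions, velocities, pbest, gbest)
```

In a full optimizer this step would be followed by re-evaluating fitness and updating pbest and gbest before the next iteration.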


        Swarm Intelligence (SI) is the property of a system whereby the collective behaviors of simple agents interacting locally with their environment cause coherent functional global patterns to emerge. SI provides a basis with which it is possible to explore collective (or distributed) problem solving without centralized control or the provision of a global model. One area of interest in SI is Particle Swarm Optimization, a population-based stochastic optimization technique. Here, the system is initialized with a population of random solutions, called particles. Optima are searched for by updating generations, with particles moving through the parameter space towards the current local and global optimum particles. At each time step, the velocities of all particles are changed depending on the current optima.

        Ant Colony Optimization (ACO) is another area of interest within SI. In nature, it can be observed that real ants are capable of finding the shortest route between a food source and their nest without the use of visual information, and hence possess no global world model, adapting to changes in the environment. The deposition of pheromone is the main factor in enabling real ants to find the shortest routes over a period of time. Each ant probabilistically prefers to follow a direction rich in this chemical. The pheromone decays over time, resulting in much less pheromone on less popular paths. Given that over time the shortest route will have the higher rate of ant traversal, this path will be reinforced and the others diminished until all ants follow the same, shortest path (the "system" has converged to a single solution). It is also possible that there are many equally short paths. In this situation, the rates of ant traversal over the short paths will be roughly the same, resulting in these paths being maintained while others are ignored. Additionally, if a sudden change to the environment occurs (e.g. a large obstacle appears on the shortest path), the ACO system can respond to this and will eventually converge to a new solution. Based on this idea, artificial ants can be deployed to solve complex optimization problems via the use of artificial pheromone deposition.

        ACO is particularly attractive for feature selection as there seems to be no heuristic that can guide the search to the optimal minimal subset every time. Additionally, it can be the case that ants discover the best feature combinations as they proceed through the search space. This section discusses how ACO may be applied to the difficult problem of finding optimal feature subsets and, in particular, fuzzy-rough set-based reducts.

        The feature selection task may be reformulated into an ACO-suitable problem. ACO requires a problem to be represented as a graph: here, nodes represent features, with the edges between them denoting the choice of the next feature. The search for the optimal feature subset is then an ant traversal through the graph where a minimum number of nodes are visited that satisfies the traversal stopping criterion. This setup can be illustrated as follows: the ant is currently at node a and has a choice of which feature to add next to its path (dotted lines). It chooses feature b next based on the transition rule, then c and then d. Upon arrival at d, the current subset {a, b, c, d} is determined to satisfy the traversal stopping criterion (e.g. a suitably high classification accuracy has been achieved with this subset, assuming that the selected features are used to classify certain objects). The ant terminates its traversal and outputs this feature subset as a candidate for data reduction.
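The probabilistic transition rule by which an ant picks its next feature can be sketched as below; the pheromone and heuristic tables, and the α/β parameter values, are hypothetical placeholders.

```python
import random

def choose_next_feature(current, unvisited, pheromone, heuristic,
                        alpha=1.0, beta=2.0):
    """ACO transition rule: pick the next feature j with probability
    proportional to pheromone[current][j]**alpha * heuristic[current][j]**beta."""
    weights = [pheromone[current][j] ** alpha * heuristic[current][j] ** beta
               for j in unvisited]
    total = sum(weights)
    r = random.random() * total   # roulette-wheel selection
    acc = 0.0
    for j, wgt in zip(unvisited, weights):
        acc += wgt
        if r < acc:
            return j
    return unvisited[-1]

# Hypothetical pheromone and heuristic values on edges leaving feature 'a'.
pheromone = {"a": {"b": 0.9, "c": 0.1}}
heuristic = {"a": {"b": 1.0, "c": 1.0}}
next_feature = choose_next_feature("a", ["b", "c"], pheromone, heuristic)
```

After all ants complete their traversals, pheromone would be evaporated and then reinforced along the best subsets found, closing the loop described above.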


        Genetic algorithm (GA) is a search heuristic used to generate solutions to optimization problems, following techniques inspired by natural evolution such as inheritance, mutation, selection, and crossover. In the genetic algorithm,

        • A population of strings (called chromosomes), which encode candidate solutions to an optimization problem, is taken.

        • A proper fitness function is constructed, and fitness of the current population is evaluated.

        • The two fittest chromosomes are chosen as the parents, and (a) crossover between them or (b) mutation of a parent is performed to produce new children and a new population.

        • Again, the fitness of the new population is evaluated.

        • The process recurs as long as the fitness function keeps improving or until a termination condition is attained.

    The genetic programming algorithm begins with a population, that is, a set of randomly created individuals. Each individual represents a potential solution, expressed as a binary tree. Each binary tree is constructed from all possible compositions of the sets of functions and terminals. A fitness value for each tree is calculated by a suitable fitness function. According to the fitness value, a set of individuals having better fitness is selected. These individuals are used to generate the new population in the next generation with genetic operators. Genetic operators generally include reproduction, crossover, mutation and others that are used to evolve functional expressions. After the evolution of a number of generations, we can obtain an individual with a good fitness value. If the fitness value of such an individual still does not satisfy the specified conditions of the solution, the process of evolution is repeated until the specified conditions are satisfied.
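The GA loop described above can be sketched as a minimal GA over binary chromosomes encoding feature subsets. The fitness function, parameter values and replacement scheme below are illustrative assumptions, not the paper's configuration.

```python
import random

def evolve(fitness, n_features, pop_size=20, generations=50, p_mut=0.1):
    """Minimal GA: chromosomes are bit-strings encoding feature subsets.
    Each generation, the two fittest parents produce one child by
    one-point crossover plus bit-flip mutation, replacing the weakest."""
    pop = [[random.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        p1, p2 = pop[0], pop[1]                    # two fittest parents
        cut = random.randrange(1, n_features)      # one-point crossover
        child = p1[:cut] + p2[cut:]
        child = [1 - g if random.random() < p_mut else g
                 for g in child]                   # bit-flip mutation
        pop[-1] = child                            # replace the weakest
    return max(pop, key=fitness)

# Hypothetical fitness: reward chromosomes close to a target subset.
target = [1, 0, 1, 0, 1]
score = lambda c: -sum(abs(g - t) for g, t in zip(c, target))
best = evolve(score, len(target))
```

For reduct search, the fitness would typically combine the rough set dependency degree of the encoded subset with a penalty on subset size.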


      1. A Rough-Set Based Incremental Approach for Updating Approximations under Dynamic Maintenance Environments

        Approximations of a concept by a variable precision rough-set model (VPRS) usually vary under a dynamic information system environment. It is thus effective to carry out incremental updating of approximations by utilizing previous data structures. This paper focuses on a new incremental method for updating approximations of VPRS while objects in the information system dynamically alter. It discusses properties of information granulation and approximations under the dynamic environment while objects in the universe evolve over time. The variation of an attribute's domain is also considered to perform incremental updating of approximations under VPRS. Finally, an extensive experimental evaluation validates the efficiency of the proposed method for dynamic maintenance of VPRS approximations.

      2. A Novel Dynamic Incremental Rules Extraction Algorithm Based on Rough Set Theory

        Incremental rule extraction is a focal problem of KDD. In this paper, a novel incremental rule extraction algorithm called "RDBRST" (Rule Derivation Based on Rough Set and Search Tree) is proposed. It is a kind of breadth-first heuristic search algorithm. The incremental rules are extracted and the existing rule set is updated based on this algorithm. We present an example to illustrate the characteristics of this new incremental algorithm.

      3. Incremental Induction of Decision Rules from Dominance-Based Rough Approximations

        The rule induction algorithm is extended to handle preference-ordered domains of attributes (called criteria) within the Variable Consistency Dominance-based Rough Set Approach. It deals, moreover, with the problem of missing values in the data set. The algorithm has been designed for medical applications which require: (i) a careful selection of the set of decision rules representing medical experience, (ii) an easy update of these decision rules because the data set evolves in time, and (iii) not only a high predictive capacity of the set of decision rules but also a thorough explanation of a proposed decision. To satisfy all these requirements, we propose an incremental algorithm for induction of a satisfactory set of decision rules and a post-processing technique on the generated set of rules.

      4. A Distance Measure Approach to Exploring the Rough Set Boundary Region for Attribute Reduction

        This paper examines a rough set FS technique which uses the information gathered from both the lower approximation dependency value and a distance metric which considers the number of objects in the boundary region and the distance of those objects from the lower approximation. The use of this measure in rough set feature selection can result in smaller subset sizes than those obtained using the dependency function alone. This demonstrates that there is much valuable information to be extracted from the boundary region.

      5. Incremental Learning of Decision Rules Based on Rough Set Theory

    In this paper, based on the rough set theory, the concept of -indiscernibility relation is put forward in order to transform an inconsistent decision table to one that is consistent, called -decision table, as an initial preprocessing step. Then, the -decision matrix is constructed. On the basis of this, by means of a decision function, an algorithm for incremental learning of rules is presented. The algorithm can also incrementally modify some numerical measures of a rule.


    The main aim of feature selection (FS) is to determine a minimal feature subset from a problem domain while retaining a suitably high accuracy in representing the original features. In real-world problems FS is a must due to the abundance of noisy, irrelevant or misleading features; by removing these, techniques that learn from data can benefit greatly. Given a feature set of size n, the task of FS can be seen as a search for an "optimal" feature subset among the 2^n candidate subsets. The definition of an optimal subset may vary depending on the problem to be solved. Although an exhaustive method may be used for this purpose, it is quite impractical for most datasets.

    Usually FS algorithms involve heuristic or random search strategies in an attempt to avoid this prohibitive complexity. However, the degree of optimality of the final feature subset is often reduced.

    The usefulness of a feature or feature subset is determined by both its relevancy and redundancy. A feature is said to be relevant if it is predictive of the decision feature(s), otherwise it is irrelevant. A feature is considered to be redundant if it is highly correlated with other features. Hence, the search for a good feature subset involves finding those features that are highly correlated with the decision feature(s), but are uncorrelated with each other.

    Figure 1.1 Aspects of feature selection

    Determining subset optimality is a challenging problem. There is always a trade-off in non-exhaustive techniques between subset minimality and subset suitability – the task is to decide which of these must suffer in order to benefit the other. For some domains (particularly where it is costly or impractical to monitor many features), it is much more desirable to have a smaller, less accurate feature subset. In other areas it may be the case that the modeling accuracy (e.g. the classification rate) using the selected features must be extremely high, at the expense of a non-minimal set of features.

    Figure 1.2 Filter and wrapper methods

    Feature selection algorithms may be classified into two categories based on their evaluation procedure (see Figure 1.2). If an algorithm performs FS independently of any learning algorithm (i.e. it is a completely separate preprocessor), then it is a filter approach. In effect, irrelevant attributes are filtered out before induction. Filters tend to be applicable to most domains as they are not tied to any particular induction algorithm.

    If the evaluation procedure is tied to the task (e.g. classification) of the learning algorithm, the FS algorithm employs the wrapper approach. This method searches through the feature subset space using the estimated accuracy from an induction algorithm as a measure of subset suitability. Although wrappers may produce better results, they are expensive to run and can break down with very large numbers of features. This is due to the use of learning algorithms in the evaluation of subsets, some of which can encounter problems when dealing with large datasets.

  4. ROUGH SET-BASED FEATURE SELECTION

    Rough set theory (RST) can be used as a tool to discover data dependencies and to reduce the number of attributes contained in a dataset using the data alone, requiring no additional information. Over the past ten years, RST has become a topic of great interest to researchers and has been applied to many domains. Given a dataset with discretized attribute values, it is possible to find a subset (termed a reduct) of the original attributes using RST that are the most informative; all other attributes can be removed from the dataset with minimal information loss. From the dimensionality reduction perspective, informative features are those that are most predictive of the class attribute.

    There are two main approaches to finding rough set reducts: those that consider the degree of dependency and those that are concerned with the discernibility matrix. This section describes the fundamental ideas behind both approaches. To illustrate the operation of these, an example dataset (Table 1.1) will be used.

    Table 1.1 An example dataset

    x | a  b  c  d | e
    0 | 1  0  2  2 | 0
    1 | 0  1  1  1 | 2
    2 | 2  0  0  1 | 1
    3 | 1  1  0  2 | 2
    4 | 1  0  2  0 | 1
    5 | 2  2  0  1 | 1
    6 | 2  1  1  1 | 2
    7 | 0  1  1  0 | 1

    1. Rough Set Attribute Reduction

      Central to Rough Set Attribute Reduction (RSAR) is the concept of indiscernibility. Let I = (U, A) be an information system, where U is a non-empty set of finite objects (the universe) and A is a non-empty finite set of attributes such that a : U → V_a for every a ∈ A. V_a is the set of values that attribute a may take. With any P ⊆ A there is an associated equivalence relation IND(P):

      IND(P) = { (x, y) ∈ U² | ∀ a ∈ P, a(x) = a(y) }

      Figure 1.3 A Rough Set

    2. Information and Decision Systems

      An information system can be viewed as a table of data, consisting of objects (rows in the table) and attributes (columns). In medical datasets, for example, patients might be represented as objects and measurements such as blood pressure, form attributes. The attribute value for a particular patient is their specific reading for that measurement. Throughout this paper, the terms attribute, feature and variable are used interchangeably.

      An information system may be extended by the inclusion of decision attributes. Such a system is termed a decision system. For example, the medical information system mentioned previously could be extended to include patient classification information, such as whether a patient is ill or healthy. A more abstract example of a decision system can be found in Table 1.1. Here, the table consists of four conditional features (a, b, c, d), a decision feature (e) and eight objects. A decision system is consistent if for every set of objects whose attribute values are the same, the corresponding decision attributes are identical.

      The partition of U generated by IND(P) is denoted U/IND(P) (or U/P). If (x, y) ∈ IND(P), then x and y are indiscernible by attributes from P. The equivalence classes of the P-indiscernibility relation are denoted [x]_P. For the illustrative example, if P = {b, c}, then objects 1, 6 and 7 are indiscernible, as are objects 0 and 4. IND(P) creates the following partition of U:

      U/IND(P) = { {0, 4}, {1, 6, 7}, {2}, {3}, {5} }

    3. Lower and Upper Approximations

      Let X ⊆ U. X can be approximated using only the information contained within P by constructing the P-lower and P-upper approximations of X:

      P-lower(X) = { x | [x]_P ⊆ X }
      P-upper(X) = { x | [x]_P ∩ X ≠ ∅ }

    4. Positive, Negative and Boundary Regions

      Let P and Q be equivalence relations over U; then the positive region can be defined as:

      POS_P(Q) = ∪ { P-lower(X) : X ∈ U/Q }

      The positive region contains all objects of U that can be classified to classes of U/Q using the information in attributes P. For example, let P = {b, c} and Q = {e}; then

      POS_P(Q) = {2, 3, 5}

      Using this definition of the positive region, the rough set degree of dependency of a set of attributes Q on a set of attributes P is defined in the following way:

      γ_P(Q) = |POS_P(Q)| / |U|

      For P, Q ⊆ A, it is said that Q depends on P in a degree k (0 ≤ k ≤ 1), denoted P ⇒_k Q, if k = γ_P(Q).

      In the example, the degree of dependency of attribute {e} on the attributes {b, c} is:

      γ_{b,c}({e}) = |POS_{b,c}({e})| / |U| = |{2, 3, 5}| / |{0, 1, 2, 3, 4, 5, 6, 7}| = 3/8
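These definitions translate directly into code. The small decision table below is a hypothetical stand-in chosen so the positive region is easy to verify by hand; it is not Table 1.1.

```python
def equivalence_classes(table, attrs):
    """U/IND(attrs): partition object indices by their values on attrs."""
    classes = {}
    for i, row in enumerate(table):
        classes.setdefault(tuple(row[a] for a in attrs), set()).add(i)
    return list(classes.values())

def positive_region(table, p_attrs, q_attrs):
    """POS_P(Q): union of the P-classes wholly contained in a Q-class,
    i.e. the union of the P-lower approximations of the Q-classes."""
    q_classes = equivalence_classes(table, q_attrs)
    pos = set()
    for p_class in equivalence_classes(table, p_attrs):
        if any(p_class <= q_class for q_class in q_classes):
            pos |= p_class
    return pos

def gamma(table, p_attrs, q_attrs):
    """Degree of dependency k = |POS_P(Q)| / |U|."""
    return len(positive_region(table, p_attrs, q_attrs)) / len(table)

# Hypothetical decision table: objects 0 and 1 clash on the decision e,
# so their equivalence class falls outside the positive region.
table = [
    {"b": 0, "c": 0, "e": 0},
    {"b": 0, "c": 0, "e": 1},
    {"b": 1, "c": 0, "e": 1},
    {"b": 1, "c": 1, "e": 1},
]
print(positive_region(table, ["b", "c"], ["e"]))  # → {2, 3}
print(gamma(table, ["b", "c"], ["e"]))            # → 0.5
```

Running the same functions on Table 1.1 with P = {b, c} and Q = {e} would reproduce the 3/8 dependency computed above.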

      The reduction of attributes is achieved by comparing equivalence relations generated by sets of attributes. Attributes are removed so that the reduced set provides the same predictive capability of the decision feature as the original. A reduct R is defined as a subset of minimal cardinality of the conditional attribute set C such that γ_R(D) = γ_C(D).



Feature selection is an important research direction of rough set application. However, this technique often fails to find minimal reducts. This project starts with the fundamental concepts of rough set theory and explains a basic technique, QUICKREDUCT, which can produce reduct sets close to the minimal. Swarm intelligence methods have been used to guide this technique toward minimal reducts; here, three different computational intelligence based approaches are used: genetic algorithm, ant colony optimization and PSO. Although these methods perform well, their results are not consistent, since they depend on many random parameters. All these methods are analyzed using medical datasets. As shown in the results, our proposed method exhibits more consistent and better performance than the other methods.


  1. H.M. Chen, T.R. Li, D. Ruan, J.H. Lin, and C.X. Hu, A Rough-Set Based Incremental Approach for Updating Approximations under Dynamic Maintenance Environments, IEEE Trans. Knowledge and Data Eng., vol. 25, no. 2, pp. 274-284, Feb. 2013.

  2. J.Y. Liang, F. Wang, C.Y. Dang, and Y.H. Qian, An Efficient Rough Feature Selection Algorithm with a Multi-Granulation View, Intl J. Approximate Reasoning, vol. 53, pp. 912-926, 2012.

  3. J.F. Pang and J.Y. Liang, Evaluation of the Results of Multi- Attribute Group Decision-Making with Linguistic Information, Omega, vol. 40, pp. 294-301, 2012.

  4. Q.H. Hu, D.R. Yu, W. Pedrycz, and D.G. Chen, Kernelized Fuzzy Rough Sets and Their Applications, IEEE Trans. Knowledge and Data Eng., vol. 23, no. 11, pp. 1649-1667, Nov. 2011.

  5. N. Parthalain, Q. Shen, and R. Jensen, A Distance Measure Approach to Exploring the Rough Set Boundary Region for Attribute Reduction, IEEE Trans. Knowledge and Data Eng., vol. 22, no. 3, pp. 305-317, Mar. 2010.

  6. Y.H. Qian, J.Y. Liang, W. Pedrycz, and C.Y. Dang, Positive Approximation: An Accelerator for Attribute Reduction in Rough Set Theory, Artificial Intelligence, vol. 174, pp. 597-618, 2010.

  7. W. Wei, J.Y. Liang, Y.H. Qian, F. Wang, and C.Y. Dang, Comparative Study of Decision Performance of Decision Tables Induced by Attribute Reductions, Intl J. General Systems, vol. 39, no. 8, pp. 813-838, 2010.

  8. S.Y. Zhao, E.C.C. Tsang, D.G Chen, and X.Z. Wang, Building a Rule- Based Classifier-a Fuzzy-Rough Set Approach, IEEE Trans. Knowledge and Data Eng., vol. 22, no. 5, pp. 624-638, May 2010.

  9. M. Kryszkiewicz and P. Lasek, FUN: Fast Discovery of Minimal Sets of Attributes Functionally Determining a Decision Attribute, Trans. Rough Sets, vol. 9, pp. 76-95, 2008.

  10. M.Z. Li, B. Yu, O. Rana, and Z.D. Wang, Grid Service Discovery with Rough Sets, IEEE Trans. Knowledge and Data Eng., vol. 20, no. 6, pp. 851-862, June 2008.
