 Open Access
 Authors : Chithra C Sekhar, Jomy George, Meera Krishna G. H
 Paper ID : IJERTCONV3IS13030
 Volume & Issue : NCICN – 2015 (Volume 3 – Issue 13)
Published (First Online) : 30-07-2018
ISSN (Online) : 2278-0181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
Information Theoretic Approach on Supervised Feature Extraction for Tensor Objects
Chithra C Sekhar1*, Jomy George2, Meera Krishna G. H3
1* PG Scholar, Dept. of Computer Science and Engineering
2 Asst. Professor, Dept. of Computer Science and Engineering
3 Asst. Professor, Dept. of Computer Science and Engineering
TKM Institute of Technology, Kollam, India
Abstract—Image classification is an important problem in several areas, such as the recognition of faces, handwritten digits, and objects. In existing methods for feature extraction and classification, the objects are considered as vectors, but in modern applications the input data are usually treated as tensors. Extracting maximally discriminating features from tensor objects and classifying them are challenging problems in machine learning and pattern recognition. This work proposes a novel scheme for supervised feature extraction of tensor objects based on maximization of Tsallis mutual entropy. Several experiments show that the proposed approach results in superior accuracy in both feature extraction and classification.
Index Terms—Image Classification, Feature Extraction, Tensor Decomposition, Tsallis Mutual Information, KNN Classifier.

INTRODUCTION
Classifying face images, handwritten digits, and images of objects finds immense application in several fields. Classical methods for feature extraction treat inputs as vectors, but this may lead to several problems, such as increased dimensionality, small sample size, and computational burden. In most modern applications, image data are usually represented by multiway arrays (tensors) [1]–[3]. In many applications the input image data may be too large and may contain redundant information. In order to design optimal classifiers, we need to extract discriminating features from the input data. Several supervised feature extraction algorithms have recently been proposed for tensors [4]–[7]. These algorithms are generalizations of Linear Discriminant Analysis (LDA) to tensor objects and use only the second-order statistics of the data. Extracting features by maximizing the mutual information (MMI) overcomes this problem and provides highly discriminating features [8, 9].
Shannon's definition of mutual information is used in [8], but it has some inherent limitations: the traditional use of Shannon entropy in information theory may not apply well in every situation. Tsallis proposed a new concept of entropy which extends the traditional Shannon theory. This new concept, called non-extensive entropy, has recently been used for image segmentation and other related areas [10]–[12]. In this paper, our primary goal is to study the usefulness of the Tsallis entropy by comparing it to the classic Shannon entropy in the context of image classification. MMI based on Shannon entropy, as discussed in [8], provides a single discrimination measure for optimization. In order to provide a wide class of measures, we propose a method based on maximization of parametric Tsallis mutual entropy for extracting the most discriminative features from tensor objects. Shannon entropy is a particular case of Tsallis entropy, and varying its parameter results in different objective functions for optimization.

A series of experiments was carried out for the problem of classifying image patterns under different values of the entropy parameter. For classification we use the KNN classifier for the sake of simplicity. Our goal is to assess how well the different entropies can be used for feature extraction and hence to determine the class of a new test sample. The experiments show that the Tsallis entropy has great advantages over the Shannon entropy for pattern classification.

The rest of the paper is organized as follows. Section II provides some notation and basic concepts of feature extraction by MMI. Section III describes the proposed method. Section IV provides a performance analysis of the proposed method. Section V contains conclusions.

BASIC CONCEPTS AND RELATED WORKS

In this section, we provide some basic notation for tensor objects and introduce methods such as maximization of mutual information for feature extraction using Shannon's entropy.

Notations

Tensors are geometric objects; a tensor is a multiway generalization of vectors and matrices. The order of a tensor is its dimensionality, i.e., the number of indices needed to represent it. For example, a tensor \mathcal{X} \in \mathbb{R}^{l_1 \times l_2 \times \cdots \times l_N} is an N-way tensor. A tensor can be decomposed by Tucker decomposition and expressed as \mathcal{X} = \mathcal{F} \times \{A\}, where the A are factor matrices and \mathcal{F} is the core tensor [8].

Maximization of mutual information

Maximization of mutual information is considered the more general criterion for extracting the most discriminative features from tensor objects [8]. Let \mathcal{X} denote a three-way random tensor and y denote its corresponding class label. Then \mathcal{X} can be represented through Tucker decomposition in terms of projection matrices and a core tensor as

\mathcal{F} = \mathcal{X} \times_1 U^{(1)T} \times_2 U^{(2)T} \times_3 U^{(3)T}    (1)
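In practice the Tucker projection of Eq. (1) is evaluated as a sequence of mode-n products. A minimal numpy sketch (the helper name and the shapes are illustrative, not from the paper):

```python
import numpy as np

def mode_n_product(T, M, mode):
    """Multiply tensor T by matrix M along the given mode (0-indexed)."""
    T = np.moveaxis(T, mode, 0)              # bring target mode to the front
    shape = T.shape
    out = M @ T.reshape(shape[0], -1)        # multiply the mode-n unfolding
    out = out.reshape((M.shape[0],) + shape[1:])
    return np.moveaxis(out, 0, mode)         # restore the original mode order

rng = np.random.default_rng(0)
X = rng.standard_normal((32, 32, 3))         # a 3-way tensor
U = [rng.standard_normal((r, p)) for r, p in [(32, 5), (32, 5), (3, 2)]]

# F = X x_1 U1^T x_2 U2^T x_3 U3^T, as in Eq. (1)
F = X
for n in range(3):
    F = mode_n_product(F, U[n].T, n)
print(F.shape)  # (5, 5, 2)
```

The core tensor F is smaller than X in every mode; its entries are the candidate features.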
The elements of the core tensor \mathcal{F} give the features that can be used for classification. Our aim, however, is to find the most discriminative features for classification. In order to obtain such elements of the core tensor, we need to find the projection matrices that maximize the mutual information measure. Jukic and Filipovic [8] proposed an iterative method for obtaining the mode-n projection matrix by solving the following optimization problem


U^{(n)} = \arg\max_{U^{(n)T} U^{(n)} = I} \tilde{I}_n(f, y)    (2)

where \tilde{I}_n is the mutual information based on the Shannon measure of entropy.

Estimation of Mutual Entropy via Shannon's Entropy and Tsallis Entropy

For a continuous random variable X with probability density function f(x) with finite or infinite support \mathcal{X}, the Shannon entropy H(X) of X is defined by

H(X) = -\int_{x \in \mathcal{X}} f(x) \log(f(x)) \, dx    (3)

The entropy measure H(X) quantifies the average uncertainty associated with the random variable X. The conditional entropy measures the average uncertainty associated with X if we know the outcome of Y, and is defined as

H(X|Y) = -\int_{x \in \mathcal{X}} \int_{y \in \mathcal{Y}} f(x, y) \log(f(x|y)) \, dx \, dy    (4)

where f(x, y) is the joint probability density and f(x|y) is the conditional density. The mutual information (MI) between X and Y is defined by

I(X; Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)    (5)

Mutual information quantifies the information gain, or shared information, between X and Y.

A generalization of the Shannon entropy was given by Tsallis and can be expressed as [13]

H_T(X) = \frac{1}{\alpha - 1} \left( 1 - \int_{x \in \mathcal{X}} f(x)^{\alpha} \, dx \right)    (6)

where \alpha is the entropy parameter; as \alpha \to 1, the Tsallis entropy reduces to the Shannon entropy. Mutual information can be generalized by the Tsallis mutual entropy, which is defined for \alpha \neq 1 as [14]

I_T(X; Y) = H_T(X) - H_T(X|Y) = H_T(Y) - H_T(Y|X) = H_T(X) + H_T(Y) - H_T(X, Y)    (7)

PROPOSED METHOD

In order to extract the most discriminating features from tensor objects, we use a generalized mutual information criterion. This provides a range of measures depending on the entropy parameter. In this paper we propose a supervised feature extraction algorithm for tensor objects based on maximizing the Tsallis mutual information. The approach is similar to [8], but Tsallis entropy is used instead of Shannon entropy. The performance of the classification algorithm using the extracted features is examined by varying the entropy parameter, including the Shannon counterpart.

Now we discuss the estimation of the Tsallis mutual information and its gradient for scalar random variables. The negentropy of a random variable f is defined as

I_T(f) = H_T^{Gauss}(f) - H_T(f)    (8)

The Tsallis mutual information between the scalar random variable f and the class label y can be expressed as

I_T(f, y) = \frac{1 - (2\pi e \sigma_f^2)^{(1-\alpha)/2}}{\alpha - 1} - I_T(f) - \sum_{k=1}^{c} P(y = k) \left[ \frac{1 - (2\pi e \sigma_{f|y=k}^2)^{(1-\alpha)/2}}{\alpha - 1} - I_T(f|y = k) \right]    (9)

where I_T(f) is the negentropy, \sigma_f^2 is the variance of f, and P(y = k) is the probability of y belonging to class k. For f = W^T X, the gradient of I_T(f, y) with respect to W is given by

\nabla_W I_T(f, y) = (2\pi e)^{(1-\alpha)/2} \sigma_f^{-(1+\alpha)} C W - \nabla_W I_T(f) - \sum_{k=1}^{c} P(y = k) \left[ (2\pi e)^{(1-\alpha)/2} \sigma_{f|y=k}^{-(1+\alpha)} C_{y=k} W - \nabla_W I_T(f|y = k) \right]    (10)

where C is the covariance matrix estimated using the training data set, and C_{y=k} its class-conditional counterpart. We now derive expressions for the negentropy and its gradient based on the nonpolynomial approximation discussed in [15]. Following steps similar to those of [15], we have

H_T(x) \approx \frac{1}{\alpha - 1} \left( 1 - \int \hat{p}(u)^{\alpha} \, du \right)    (11)

The cumulants in Equation (5.30) of [15] are very small, so we can use the approximation

(1 + \epsilon)^{\alpha} \approx 1 + \alpha \epsilon + \frac{\alpha(\alpha - 1)}{2} \epsilon^2    (12)

Following Equations (5.33) and (5.34) of [15] with the Tsallis entropy, we obtain

I_T(f) \approx I(f)    (13)

\nabla_W I_T(f) \approx \nabla_W I(f)    (14)
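As a quick numerical check of the Tsallis definition in Eq. (6) and its Shannon limit, the following sketch evaluates both for a discrete distribution (the function name and the discrete setting are illustrative; the paper works with differential entropies):

```python
import numpy as np

def tsallis_entropy(p, alpha):
    """Tsallis entropy H_T = (1 - sum p^alpha) / (alpha - 1) of a discrete pmf."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if abs(alpha - 1.0) < 1e-12:              # alpha -> 1: Shannon limit
        return float(-np.sum(p * np.log(p)))
    return float((1.0 - np.sum(p ** alpha)) / (alpha - 1.0))

p = [0.5, 0.25, 0.25]
shannon = tsallis_entropy(p, 1.0)
near_one = tsallis_entropy(p, 1.0001)
assert abs(shannon - near_one) < 1e-3         # Tsallis -> Shannon as alpha -> 1
print(shannon)  # ≈ 1.0397 nats
```

Varying alpha reweights how strongly rare versus common outcomes contribute, which is what produces the family of objective functions used in the experiments.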
A. System Architecture

Fig. 1. Image classification

Algorithm 1: Feature Extraction
Input: 1. Set of K training samples \mathcal{X}_k \in \mathbb{R}^{r_1 \times \cdots \times r_N}, k \in \{1, \ldots, K\}
       2. Class labels y_k \in \{1, \ldots, C\}
Parameters: feature tensor size for each mode (p_1, \ldots, p_N); entropy parameter \alpha with different values
1. Initialize U^{(n)} \in \mathbb{R}^{r_n \times p_n}, n \in \{1, \ldots, N\}
2. Repeat
     For n = 1 to N
       Compute Z_k^{(-n)} = \mathcal{X}_k \times_{-n} \{U\}^T
       Find the mode-n matrix using the optimization procedure
         U^{(n)} = \arg\max_{U^{(n)T} U^{(n)} = I} \tilde{I}_T(f, y)
     End
   Until convergence
Output: Projection matrices U^{(n)} \in \mathbb{R}^{r_n \times p_n}, n \in \{1, \ldots, N\}

Optimization procedure
Input: Feasible initial projection matrix U^{(n)}; k = 0
Repeat
  Calculate the gradient G = \nabla_{U^{(n)}} \tilde{I}_T(f, y)
  Calculate A, with A := G U^{(n)T} - U^{(n)} G^T
  Select the step size \tau_k using a curvilinear search
  Update with U^{(n)} \leftarrow Q(\tau_k) U^{(n)}, where Q(\tau) := (I + \frac{\tau}{2} A)^{-1} (I - \frac{\tau}{2} A)
Until \| \nabla_{U^{(n)}} \tilde{I}_T(f, y) \|_F \leq tolerance
Output: New projection matrix U^{(n)}

Algorithm 2: Classification
Input: D = \{(x_1, c_1), \ldots, (x_N, c_N)\}; x_p = (x_{p1}, \ldots, x_{pm}), the new instance to be classified
1. Start
2. For each (x_i, c_i) calculate the Euclidean distance d(x_i, x_p)
3. Order d(x_i, x_p) from lowest to highest, i = 1, \ldots, N
4. Select the k nearest instances to x_p
5. Assign x_p to the most frequent class among them
6. Stop
Output: Class label of the most frequent class

PERFORMANCE EVALUATION

This work mainly focuses on supervised feature extraction from tensor objects. Here we take images of objects and faces as inputs. Optimal features are extracted from these input images by maximizing the mutual information criterion using Tsallis entropy. A comparative performance analysis of the feature extraction method is carried out in the context of classification by varying the entropy parameter from 1.25 to 3 with an increment of 0.25; Shannon entropy is the special case obtained when the parameter tends to 1. A simple, well-known classifier, KNN, is used for classification. Performance evaluation under different image and feature dimensions in object recognition and face recognition applications using KNN is given in Tables I to IV and Figures 4 and 5.

In order to assess the performance of the proposed work, several experiments were performed on standard datasets of object images and face images. The Columbia University Image Library (COIL-20) dataset consists of gray-scale images of 20 objects; five of the 20 objects are used in the present study. Each object is represented by 72 gray-scale images obtained by rotating the object in steps of five degrees. Each image was downsampled to 32×32 pixels and 16×16 pixels, and ten samples per class were randomly selected for the training set, with the remaining samples forming the test set. The number of components in each mode was set to (R1, R2) ∈ {(5, 5), (10, 10)}, and no feature selection was performed on the extracted features.
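The update step in the optimization procedure is a Cayley-transform step, which keeps the projection matrix orthonormal at every iteration. A small sketch, with a random matrix standing in for the mutual-information gradient and a fixed step size in place of the curvilinear search (both are stand-ins, not the paper's quantities):

```python
import numpy as np

def cayley_update(U, G, tau):
    """One curvilinear step: U <- Q(tau) U, with Q(tau) = (I + tau/2 A)^-1 (I - tau/2 A)
    and A = G U^T - U G^T. Since A is skew-symmetric, Q is orthogonal and U^T U = I
    is preserved."""
    A = G @ U.T - U @ G.T
    I = np.eye(A.shape[0])
    return np.linalg.solve(I + 0.5 * tau * A, (I - 0.5 * tau * A) @ U)

rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((8, 3)))   # feasible (orthonormal) start
G = rng.standard_normal((8, 3))                     # stand-in for the MI gradient
U_new = cayley_update(U, G, tau=0.1)
print(np.allclose(U_new.T @ U_new, np.eye(3)))  # True
```

Because the constraint is enforced by construction, no re-orthogonalization step is needed after each update.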
Fig. 2. Object images from COIL-20
Fig. 3. Face images from Sheffield face database
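Algorithm 2 above is the standard k-nearest-neighbour rule; a minimal sketch on toy data (the helper name and the data are illustrative):

```python
import numpy as np
from collections import Counter

def knn_classify(train_X, train_y, x_p, k=3):
    """Assign x_p to the most frequent class among its k nearest training samples."""
    d = np.linalg.norm(train_X - x_p, axis=1)    # Euclidean distances to x_p
    nearest = np.argsort(d)[:k]                  # indices of the k smallest distances
    return Counter(train_y[nearest]).most_common(1)[0][0]

train_X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
train_y = np.array([0, 0, 1, 1])
print(knn_classify(train_X, train_y, np.array([4.8, 5.1])))  # 1
```

In the experiments below, `train_X` would hold the vectorized core-tensor features produced by Algorithm 1.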
TABLE I. ACCURACY ESTIMATION OF OBJECT RECOGNITION USING KNN. IMAGE DIMENSION: 32×32; FEATURE DIMENSIONS: 5×5 AND 10×10

Alpha    Accuracy (5×5)    Accuracy (10×10)
1        84.17             81.67
1.25     86.67             92.5
1.5      86.67             92.5
1.75     86.67             92.5
2        86.67             92.5
2.25     86.67             92.5
2.5      86.67             92.5
2.75     86.67             92.5
3        86.67             92.5
TABLE II. ACCURACY ESTIMATION OF OBJECT RECOGNITION USING KNN. IMAGE DIMENSION: 16×16; FEATURE DIMENSIONS: 5×5 AND 10×10

Alpha    Accuracy (5×5)    Accuracy (10×10)
1        84.17             91.67
1.25     90                89.17
1.5      90                89.17
1.75     90                89.17
2        90                89.17
2.25     90                89.17
2.5      90                89.17
2.75     90                89.17
3        90                89.17
The Sheffield Face Database (SFD) consists of 575 images of 20 individuals of mixed race, gender, and appearance. Four individuals with mixed combinations are considered in the present study. Each individual is shown in a range of poses from profile to frontal views, with each image cropped to 112×92 pixels with 8-bit gray levels per pixel. Prior to feature extraction, all images were downsampled to 28×23 pixels, and the raw images were used as input for feature extraction. The training set was formed by randomly selecting six samples per class, with the remaining images forming the test set. The number of components in each mode was set to (R1, R2) ∈ {(5, 5), (10, 10)}, and no feature selection was performed on the extracted features.
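Both experiments use the same per-class random split; a minimal sketch, shown with the COIL-20 numbers from above (five classes, 72 views each, ten training samples per class), with random placeholder feature vectors in place of the extracted features:

```python
import numpy as np

def split_per_class(X, y, n_train, rng):
    """Randomly pick n_train samples per class for training; the rest form the test set."""
    train_idx, test_idx = [], []
    for c in np.unique(y):
        idx = rng.permutation(np.where(y == c)[0])
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)

rng = np.random.default_rng(0)
y = np.repeat(np.arange(5), 72)               # 5 COIL-20 classes, 72 views each
X = rng.standard_normal((len(y), 16))         # placeholder feature vectors
tr, te = split_per_class(X, y, 10, rng)
print(len(tr), len(te))  # 50 310
```

For the face experiment the same routine would be called with four classes and `n_train=6`.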
Fig. 4. Classification accuracy of face recognition under different values of the entropy parameter α.
Fig. 5. Classification accuracy of object recognition under different values of the entropy parameter α.
TABLE III. ACCURACY ESTIMATION OF FACE RECOGNITION USING KNN. IMAGE DIMENSION: 16×16; FEATURE DIMENSIONS: 5×5 AND 10×10

Alpha    Accuracy (5×5)    Accuracy (10×10)
1        71.16             80.77
1.25     86.55             94.23
1.5      82.69             94.23
1.75     80.76             94.23
2        75                92.31
2.25     80.76             94.23
2.5      82.69             94.23
2.75     80.77             94.23
3        80.77             94.23
TABLE IV. ACCURACY ESTIMATION OF FACE RECOGNITION USING KNN. IMAGE DIMENSION: 28×23; FEATURE DIMENSIONS: 5×5 AND 10×10

Alpha    Accuracy (5×5)    Accuracy (10×10)
1        75                78.85
1.25     88.46             90.38
1.5      88.46             90.38
1.75     86.54             90.38
2        86.54             90.38
2.25     86.54             90.38
2.5      86.54             90.38
2.75     86.54             90.38
3        80.77             90.38

CONCLUSION
This work proposes a novel approach for supervised feature extraction from tensor objects using the MMI criterion with Tsallis entropy. The projection matrices are obtained by maximizing an approximation of the mutual information between the extracted features and the class labels. More discriminative features can be obtained by using higher-order statistics of the data rather than only second-order statistics. Several experiments show that the proposed approach can significantly improve the discriminative ability of the features extracted from tensor objects. In future work, various linear and nonlinear tensor-based classifiers can be used to analyze the performance of the proposed method, and an effective comparative study can be carried out.
REFERENCES

[1] Nie, F., Xiang, S., Song, Y., Zhang, C., 2009. Extracting the optimal dimensionality for local tensor discriminant analysis. Pattern Recogn. 42, 105–114.
[2] Wang, S. J., Chen, H. L., Yan, W. J., Chen, Y. H., Fu, X., 2014. Face recognition and micro-expression recognition based on discriminant tensor subspace analysis plus extreme learning machine. Neural Processing Letters 39 (1), 25–43.
[3] Lu, G., Halig, L., Wang, D., Chen, Z. G., Fei, B., 2014. Spectral-spatial classification using tensor modeling for cancer detection of hyperspectral imaging. In: SPIE Medical Imaging, p. 903413. International Society for Optics and Photonics.
[4] Yan, S., Xu, D., Yang, Q., Zhang, L., Tang, X., Zhang, H.-J., 2005. Discriminant analysis with tensor representation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, USA, pp. 526–532.
[5] Tao, D., Li, X., Wu, X., Maybank, S. J., 2007. General tensor discriminant analysis and Gabor features for gait recognition. IEEE Trans. Pattern Anal. Mach. Intell. 29 (10), 1700–1715.
[6] Zhang, W., Lin, Z., Tang, X., 2009. Tensor linear Laplacian discrimination (TLLD) for feature extraction. Pattern Recogn. 42, 1941–1948.
[7] Phan, A. H., Cichocki, A., 2010. Tensor decompositions for feature extraction and classification of high dimensional datasets. IEICE Nonlinear Theory Appl. 1, 37–68.
[8] Jukic, A., Filipovic, M., 2013. Supervised feature extraction for tensor objects based on maximization of mutual information. Pattern Recognition Letters 34, 1476–1484.
[9] Vergara, J. R., Estévez, P. A., 2014. A review of feature selection methods based on mutual information. Neural Computing and Applications 24 (1), 175–186.
[10] Vila, M., Bardera, A., Feixas, M., Sbert, M., 2011. Tsallis mutual information for document classification. Entropy 13, 1694–1707.
[11] Fabbri, R., Goncalves, W. N., Lopes, F. J. P., Bruno, O. M., 2012. Multi-q analysis of image patterns. Physica A, pp. 1–10.
[12] Sluga, D., Lotric, U., 2013. Generalized information-theoretic measures for feature selection. In: Adaptive and Natural Computing Algorithms, pp. 189–197. Springer, Berlin Heidelberg.
[13] Tsallis, C., 1988. Possible generalization of Boltzmann–Gibbs statistics. J. Stat. Phys. 52, 479–487.
[14] Furuichi, S., 2006. Information theoretical properties of Tsallis entropies. J. Math. Phys. 47, 023302.
[15] Hyvärinen, A., Karhunen, J., Oja, E., 2001. Independent Component Analysis. Wiley, New York, USA.