

- Open Access
- Authors : Shi-Jay Chen
- Paper ID : IJERTV13IS120027
- Volume & Issue : Volume 13, Issue 12 (December 2024)
- Published (First Online): 17-12-2024
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License:
This work is licensed under a Creative Commons Attribution 4.0 International License
New Fuzzy Query Processing Method for Document Retrieval Based on Extended Fuzzy Concept Networks
Shi-Jay Chen
Department of Information Management, National United University,
Miaoli, Taiwan
Abstract: This study proposes a new mechanism for fuzzy query processing in document retrieval based on extended fuzzy concept networks, which are modelled with a relevance matrix and a relation matrix. The mechanism combines the document descriptor relevance matrix defined by the expert with the user's query descriptor, using different weights for each, to obtain a matrix called a satisfaction matrix. It then uses the AND operator of the quadratic-mean averaging operators to calculate the AND operation of all components in each row of the satisfaction matrix. Finally, ranking the resulting degrees of satisfaction identifies the documents that best suit the user's needs.
Keywords: Document retrieval, Fuzzy query processing, Quadratic-Mean Averaging (QMA) operators, Extended fuzzy concept networks, Relevance matrix, Relation matrix.
-
INTRODUCTION
Several researchers (Chen and Wang 1995; Her and Ke 1983; Horng and Chen 1999; Kamel et al. 1990; Lucarella and Morara 1991; Miyamoto 1990; Moradi et al. 2008; Murai et al. 1989; Radecki 1977; Radecki 1979; Tahani 1976; Zemankova 1989) have dealt with document retrieval problems based on the fuzzy set theory presented by Zadeh (1965). Lucarella et al. (1991) presented an information retrieval method based on fuzzy concept networks. However, there is only one kind of fuzzy relationship between concepts in those concept networks (i.e., a fuzzy positive association relation). Kracker (1992) presented an extended fuzzy concept network model and its applications to database queries, with four kinds of fuzzy relationships between concepts (i.e., fuzzy positive association, fuzzy negative association, fuzzy generalization, and fuzzy specialization). Furthermore, Horng and Chen (1999) and Moradi et al. (2008) presented information retrieval systems that deal with document retrieval based on extended fuzzy concept networks. However, these methods remain limited in efficiency and effectiveness. For example, a general user cannot define the degrees of relevance and the fuzzy relationships between concepts and documents as precisely as an expert can.
This paper proposes a new mechanism for dealing with document retrieval based on extended fuzzy concept networks. The rest of the paper is organized as follows. Section 2 briefly reviews the principles of concept networks presented by Lucarella et al. (1991) and extended fuzzy concept networks presented by Kracker (1992). Section 3 proposes a new mechanism for fuzzy query processing for document retrieval based on extended fuzzy concept networks, which are modelled with a relevance matrix and a relation matrix. Section 4 presents the conclusions.
-
PRELIMINARY
-
Concept networks
Lucarella et al. (1991) presented a fuzzy information retrieval method based on concept networks. A concept network consists of nodes and directed links, where each node represents a concept or a document; each directed link connects two concepts, or points from a concept to a document, and is labeled with a real value between zero and one. If $c_i \xrightarrow{\mu} c_j$, the degree of relevance from concept $c_i$ to concept $c_j$ is $\mu$, where $\mu \in [0,1]$. If $c_i \xrightarrow{\mu} d_j$, the degree of relevance of concept $c_i$ with respect to document $d_j$ is $\mu$, where $\mu \in [0,1]$. For example, Fig. 1 presents a concept network where $c_1, c_2, \ldots, c_7$ are concepts and $d_1, d_2, d_3, d_4$ are documents. Fig. 1 expresses documents $d_1, d_2, d_3, d_4$ as fuzzy subsets of concepts,
$d_1 = \{(c_1, 0.9)\}$, $d_2 = \{(c_1, 0.6), (c_2, 1), (c_5, 0.8)\}$, $d_3 = \{(c_7, 0.9)\}$, $d_4 = \{(c_6, 0.8)\}$,
where 0.6 is the relevance value of document $d_2$ with respect to concept $c_1$.
Fig. 1. A concept network
-
Extended Fuzzy Concept Networks
There is only one kind of fuzzy relationship between concepts in the concept networks presented by Lucarella et al. (1991) (i.e., a fuzzy positive association relation). Kracker (1992) presented an extended fuzzy concept network model and its applications for database queries that have four kinds of fuzzy relationships between concepts in a concept network (i.e., fuzzy positive association, fuzzy negative association, fuzzy generalization, and fuzzy specialization). Horng and Chen (1999) and Moradi et al. (2008) presented information retrieval systems for dealing with document retrieval based on extended fuzzy concept networks. The fuzzy relationships between concepts are described by Kracker (1992) as follows:
- Fuzzy Positive Association: relates concepts that have a fuzzy similar meaning in some contexts (e.g., person and individual).
- Fuzzy Negative Association: relates concepts that are fuzzy complementary (e.g., male and female), fuzzy incompatible (e.g., unemployed and freelance), or fuzzy antonyms (e.g., large and small).
- Fuzzy Generalization: one concept is regarded as a fuzzy generalization of another concept if it includes that concept (e.g., vehicle with respect to car).
- Fuzzy Specialization: the inverse of the fuzzy generalization relationship. That is, one concept is regarded as a fuzzy specialization of another concept if it is part of that concept (e.g., car with respect to vehicle).
These fuzzy relationships between concepts are formally described by Kracker (1992) as follows:
Definition 2.1: Let $C$ be the universal set of all concepts. Then
- Fuzzy Positive Association (P) is a fuzzy relation $P: C \times C \to [0,1]$ that is reflexive, symmetric, and max-*-transitive.
- Fuzzy Negative Association (N) is a fuzzy relation $N: C \times C \to [0,1]$ that is anti-reflexive, symmetric, and max-*-nontransitive.
- Fuzzy Generalization (G) is a fuzzy relation $G: C \times C \to [0,1]$ that is anti-reflexive, anti-symmetric, and max-*-transitive.
- Fuzzy Specialization (S) is a fuzzy relation $S: C \times C \to [0,1]$ that is anti-reflexive, anti-symmetric, and max-*-transitive.
Definition 2.2: An extended fuzzy concept network consists of nodes and directed links. Each node represents a concept or a document. Each directed link connects two concepts or points from a concept to a document (Kracker 1992; Horng and Chen 1999). If
(1) $c_i \xrightarrow{(\mu, P)} c_j$, then there is a positive association relationship between concept $c_i$ and concept $c_j$, and the relevance degree is $\mu$, where $\mu \in [0,1]$.
(2) $c_i \xrightarrow{(\mu, N)} c_j$, then there is a negative association relationship between concept $c_i$ and concept $c_j$, and the relevance degree is $\mu$, where $\mu \in [0,1]$.
(3) $c_i \xrightarrow{(\mu, G)} c_j$, then concept $c_i$ is more general than concept $c_j$, and the degree of generalization is $\mu$, where $\mu \in [0,1]$.
(4) $c_i \xrightarrow{(\mu, S)} c_j$, then concept $c_i$ is more special than concept $c_j$, and the degree of specialization is $\mu$, where $\mu \in [0,1]$.
(5) $c_i \xrightarrow{(0, Z)} c_j$, then the relationship between concept $c_i$ and concept $c_j$ is not defined by the expert explicitly.
(6) $c_i \xrightarrow{(\mu, P)} d_j$, then there is a positive association relationship between concept $c_i$ and document $d_j$, and the relevance degree is $\mu$, where $\mu \in [0,1]$.
(7) $c_i \xrightarrow{(\mu, N)} d_j$, then there is a negative association relationship between concept $c_i$ and document $d_j$, and the relevance degree is $\mu$, where $\mu \in [0,1]$.
(8) $c_i \xrightarrow{(0, Z)} d_j$, then the relationship between concept $c_i$ and document $d_j$ is not defined by the expert explicitly.
For example, Fig. 2 presents an extended fuzzy concept network where $c_1, c_2, \ldots, c_7$ are concepts and $d_1, d_2, d_3, d_4$ are documents. Fig. 2 expresses documents $d_1, d_2, d_3, d_4$ as fuzzy subsets of concepts as follows:
$d_1 = \{(c_1, 0.9, P)\}$, $d_2 = \{(c_1, 0.5, P), (c_2, 1, N), (c_5, 0.8, P)\}$, $d_3 = \{(c_7, 0.9, P)\}$, $d_4 = \{(c_6, 0.8, P)\}$,
where 0.5 is the relevance value of document $d_2$ with respect to concept $c_1$, and P indicates that the fuzzy relationship of document $d_2$ with respect to concept $c_1$ is a fuzzy positive association.
Fig. 2. An extended fuzzy concept network.
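The fuzzy subsets of Fig. 2 translate directly into a small data structure. The following Python sketch is only an illustration: the names `documents` and `to_rows` are hypothetical, and filling unlisted concepts with relevance 0 and label Z follows the convention that these values mark a relationship not defined by the expert. It turns each document's triples into one relevance row and one relation row over the seven concepts.

```python
from typing import Dict, List, Tuple

# Each descriptor entry is (concept index, relevance degree, relationship label).
Descriptor = List[Tuple[int, float, str]]

# Documents of Fig. 2 as fuzzy subsets of concepts.
documents: Dict[str, Descriptor] = {
    "d1": [(1, 0.9, "P")],
    "d2": [(1, 0.5, "P"), (2, 1.0, "N"), (5, 0.8, "P")],
    "d3": [(7, 0.9, "P")],
    "d4": [(6, 0.8, "P")],
}


def to_rows(descriptor: Descriptor, n_concepts: int) -> Tuple[List[float], List[str]]:
    """Expand a descriptor into a relevance row and a relation row.

    Concepts that the descriptor does not mention get relevance 0.0 and
    label "Z" (not defined by the expert explicitly).
    """
    relevance = [0.0] * n_concepts
    relation = ["Z"] * n_concepts
    for concept, degree, label in descriptor:
        relevance[concept - 1] = degree  # concepts are numbered from 1
        relation[concept - 1] = label
    return relevance, relation


relevance_d2, relation_d2 = to_rows(documents["d2"], n_concepts=7)
print(relevance_d2)
print(relation_d2)
```

Stacking such rows for all documents gives one matrix of relevance degrees and one matrix of relationship labels, which is the layout used by the document descriptor matrices of Definitions 3.3 and 3.4.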
-
A PROPOSED MECHANISM FOR A DOCUMENT RETRIEVAL METHOD BASED ON EXTENDED FUZZY CONCEPT NETWORKS
This section proposes a new mechanism of fuzzy query processing for document retrieval based on extended fuzzy concept networks. Fig. 3 shows the new mechanism. The first step models the matrices (i.e., the relevance matrix and the relation matrix) between concepts and concepts, and the second step models the matrices (i.e., the relevance matrix and the relation matrix) between concepts and documents. The third step presents the user's query descriptor as vectors. The fourth step combines the document descriptor relevance matrix defined by the expert with the user's query descriptor, using different weights, to obtain a satisfaction matrix. The last step ranks the degrees of satisfaction to which each document satisfies the user's query descriptor.
[Figure 3: flowchart. Expert's viewpoint: modelling the matrices between concepts and concepts, and between concepts and documents, in an extended fuzzy concept network. General user's viewpoint: presenting the user's query descriptor by vectors. Both lead to combining the document descriptor relevance matrix defined by the expert with the user's query descriptor using different weights to obtain a satisfaction matrix, followed by ranking the degrees of satisfaction to which each document satisfies the user's query descriptor.]
Fig. 3. A new mechanism of fuzzy query processing for document retrieval based on extended fuzzy concept networks

3.1. Modelling the matrices between concepts and concepts in an extended fuzzy concept network
Definition 3.1: A relevance matrix $V$ is a fuzzy matrix (Kandel 1986) in which the element $v_{ij} \in [0,1]$ represents the relevance degree between concept $c_i$ and concept $c_j$, with rows and columns both indexed by $c_1, c_2, \ldots, c_n$:

$$V = \begin{bmatrix} v_{11} & v_{12} & \cdots & v_{1n} \\ v_{21} & v_{22} & \cdots & v_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ v_{n1} & v_{n2} & \cdots & v_{nn} \end{bmatrix},$$

where $n$ is the number of concepts, $v_{ij} \in [0,1]$, $1 \le i \le n$ and $1 \le j \le n$. If $v_{ij} = 0$, the relevance degree between concept $c_i$ and concept $c_j$ is not defined by the expert explicitly. A positive integer $p$ exists, where $p \le n - 1$, such that $V^p = V^{p+1} = V^{p+2} = \cdots$. Let $V^* = V^p$; then $V^*$ is called the transitive closure of the relevance matrix $V$, where

$$V^2 = V \otimes V = \begin{bmatrix} \bigvee_{i=1,\ldots,n}(v_{1i} \wedge v_{i1}) & \bigvee_{i=1,\ldots,n}(v_{1i} \wedge v_{i2}) & \cdots & \bigvee_{i=1,\ldots,n}(v_{1i} \wedge v_{in}) \\ \bigvee_{i=1,\ldots,n}(v_{2i} \wedge v_{i1}) & \bigvee_{i=1,\ldots,n}(v_{2i} \wedge v_{i2}) & \cdots & \bigvee_{i=1,\ldots,n}(v_{2i} \wedge v_{in}) \\ \vdots & \vdots & \ddots & \vdots \\ \bigvee_{i=1,\ldots,n}(v_{ni} \wedge v_{i1}) & \bigvee_{i=1,\ldots,n}(v_{ni} \wedge v_{i2}) & \cdots & \bigvee_{i=1,\ldots,n}(v_{ni} \wedge v_{in}) \end{bmatrix}, \quad (1)$$

where $\vee$ is the maximum operator and $\wedge$ is the minimum operator.

Definition 3.2: A relation matrix $R$ is a fuzzy matrix in which the element $r_{ij} \in \{P, N, G, S, Z\}$ represents the fuzzy relationship between concept $c_i$ and concept $c_j$, where P, N, G, and S denote fuzzy positive association, fuzzy negative association, fuzzy generalization, and fuzzy specialization, respectively:

$$R = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ r_{21} & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{nn} \end{bmatrix},$$

where $n$ is the number of concepts, $r_{ij} \in \{P, N, G, S, Z\}$, $1 \le i \le n$ and $1 \le j \le n$. If $r_{ij} = Z$, the fuzzy relationship between concept $c_i$ and concept $c_j$ is not defined by the expert explicitly. A positive integer $p$ exists, where $p \le n - 1$, such that $R^p = R^{p+1} = R^{p+2} = \cdots$. Let $R^* = R^p$; then $R^*$ is called the transitive closure of the relation matrix $R$, where

$$R^2 = R \otimes R = \begin{bmatrix} \bigvee_{i=1,\ldots,n}(r_{1i} \wedge r_{i1}) & \bigvee_{i=1,\ldots,n}(r_{1i} \wedge r_{i2}) & \cdots & \bigvee_{i=1,\ldots,n}(r_{1i} \wedge r_{in}) \\ \bigvee_{i=1,\ldots,n}(r_{2i} \wedge r_{i1}) & \bigvee_{i=1,\ldots,n}(r_{2i} \wedge r_{i2}) & \cdots & \bigvee_{i=1,\ldots,n}(r_{2i} \wedge r_{in}) \\ \vdots & \vdots & \ddots & \vdots \\ \bigvee_{i=1,\ldots,n}(r_{ni} \wedge r_{i1}) & \bigvee_{i=1,\ldots,n}(r_{ni} \wedge r_{i2}) & \cdots & \bigvee_{i=1,\ldots,n}(r_{ni} \wedge r_{in}) \end{bmatrix}, \quad (2)$$

where $\vee$ is the operation of choosing the highest-priority fuzzy relationship and $\wedge$ is the operation of combining two relationships according to Table I (Kracker 1992; Horng and Chen 1999). In Table I, the five fuzzy relationships have different priorities: the negative association (N) has the highest priority, the positive association (P) has the second highest priority, the relationship not defined by the expert explicitly (Z) is lower, and the generalization (G) and the specialization (S) have the lowest priority. The combination of a high-priority relationship and a low-priority relationship results in the high-priority relationship, with two exceptions: the combination of the generalization (G) and the specialization (S) is a positive association (P), and the combination of the negative association (N) with itself is a positive association (P).

TABLE I. The combination of fuzzy relationships in a relation matrix

|   | P | N | G | S | Z |
|---|---|---|---|---|---|
| P | P | N | P | P | P |
| N | N | P | N | N | N |
| G | P | N | G | P | Z |
| S | P | N | P | S | Z |
| Z | P | N | Z | Z | Z |

3.2. Modelling the matrices between concepts and documents in an extended fuzzy concept network
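To make the matrix operations concrete, the following Python sketch computes the max-min transitive closure of Definition 3.1 and then composes a document descriptor relevance matrix with it. The matrices are the ones given in Example 3.1. The function names are illustrative, the closure loop assumes a reflexive matrix (ones on the diagonal) so that repeated squaring stabilizes, and reading the implicit document relevance as the max-min composition of the document matrix with the closure is an assumption, chosen because it reproduces the satisfaction values reported in that example.

```python
from typing import List

Matrix = List[List[float]]


def max_min_compose(a: Matrix, b: Matrix) -> Matrix:
    """Max-min composition: result[i][j] = max over k of min(a[i][k], b[k][j])."""
    return [
        [max(min(a[i][k], b[k][j]) for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transitive_closure(v: Matrix) -> Matrix:
    """Square V repeatedly until it stops changing (assumes V is reflexive)."""
    current = v
    while True:
        squared = max_min_compose(current, current)
        if squared == current:
            return current
        current = squared


# Concept-to-concept relevance matrix V of Example 3.1.
V: Matrix = [
    [1.0, 0.7, 0.5, 0.0, 0.8],
    [0.7, 1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 1.0, 0.6, 0.0],
    [0.0, 0.0, 0.6, 1.0, 0.0],
    [0.8, 0.0, 0.0, 0.0, 1.0],
]

# Document descriptor relevance matrix of Example 3.1 (five documents).
D: Matrix = [
    [1.0, 1.0, 1.0, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.7, 0.0],
    [0.0, 0.0, 0.0, 0.6, 0.0],
    [0.8, 1.0, 1.0, 1.0, 0.0],
    [0.4, 0.9, 0.0, 0.0, 1.0],
]

V_star = transitive_closure(V)
D_star = max_min_compose(D, V_star)  # implicit document relevance (assumed form)

for row in V_star:
    print(row)
```

Because the maximum and minimum operators only ever select existing entries, the equality test in the loop is exact and no floating-point tolerance is needed.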
Definition 3.3: Let $\{d_1, d_2, \ldots, d_m\}$ be a set of documents, and let $C = \{c_1, c_2, \ldots, c_n\}$ be a set of concepts. The document descriptor relevance matrix $D$, with rows indexed by $d_1, \ldots, d_m$ and columns indexed by $c_1, \ldots, c_n$, is as follows:

$$D = \begin{bmatrix} d_{11} & d_{12} & \cdots & d_{1n} \\ d_{21} & d_{22} & \cdots & d_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ d_{m1} & d_{m2} & \cdots & d_{mn} \end{bmatrix},$$

where $m$ is the number of documents, $n$ is the number of concepts, $d_{ij}$ is the relevance degree between document $d_i$ and concept $c_j$, $d_{ij} \in [0,1]$, $1 \le i \le m$ and $1 \le j \le n$.

Definition 3.4: Let $\{d_1, d_2, \ldots, d_m\}$ be a set of documents, and let $C = \{c_1, c_2, \ldots, c_n\}$ be a set of concepts. The document descriptor relation matrix $E$ is as follows:

$$E = \begin{bmatrix} e_{11} & e_{12} & \cdots & e_{1n} \\ e_{21} & e_{22} & \cdots & e_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ e_{m1} & e_{m2} & \cdots & e_{mn} \end{bmatrix},$$

where $m$ is the number of documents, $n$ is the number of concepts, $e_{ij}$ is the fuzzy relationship between concept $c_j$ and document $d_i$, $e_{ij} \in \{P, N, Z\}$, $1 \le i \le m$ and $1 \le j \le n$.

Horng and Chen (1999) indicated that the document descriptor relevance matrix and the document descriptor relation matrix are given subjectively by the expert. However, the expert may forget to set some relevance degrees and fuzzy relationships between concepts and documents. We can therefore obtain the implicit relevance degrees between concepts and documents by calculating the document descriptor relevance matrix $D^* = D \otimes V^*$, and the implicit fuzzy relationships between concepts and documents by calculating the document descriptor relation matrix $E^* = E \otimes R^*$.

3.3. Presenting the user's query descriptor by vectors
The user's query descriptor $Q$ can be expressed as follows:

$$Q = \{(c_1, (x_1, y_1)), (c_2, (x_2, y_2)), \ldots, (c_j, (x_j, y_j)), \ldots, (c_n, (x_n, y_n))\},$$

where $x_j$ is the desired relevance degree of concept $c_j$ with respect to a document, $x_j \in [0,1]$, and $y_j$ is the desired fuzzy relationship of concept $c_j$ with respect to a document, $y_j \in \{P, N, \text{-}\}$, $1 \le j \le n$.
The user's query descriptor can also be expressed as a query descriptor relevance vector qv and a query descriptor relation vector qr as follows:

qv = $(x_1, x_2, \ldots, x_j, \ldots, x_n)$,
qr = $(y_1, y_2, \ldots, y_j, \ldots, y_n)$.

In a query descriptor relevance vector qv, if $x_j = 0$, the document desired by the general user does not possess concept $c_j$. If $x_j$ = "-", the relevance degree of concept $c_j$ with respect to the desired document can be neglected.

3.4. Combining the document descriptor relevance matrix defined by the expert with the user's query descriptor using different weights to obtain a satisfaction matrix
In the following, we use a formula based on the weighted power mean to calculate the degree of weighting between the expert and the general user, and use the document descriptor relevance matrix defined by the expert and the user's query descriptor in extended fuzzy concept networks as follows:

$$W = \left[ \frac{1}{t^2} \sum_{i=1}^{t} (2t - 2i + 1)\, u_i \right]^{1/2}, \quad (3)$$

where $t$ is the number of experts and general users, $i$ is the priority of the expert or general user, and $u_i$ is the relevance value: $u_{\text{expert}}$ is the relevance value from the document descriptor relevance matrix defined by the expert, and $u_{\text{user}}$ is the relevance value from the user's query descriptor, $u_i \in \{u_{\text{expert}}, u_{\text{user}}\}$. For example, assume that there are one expert and one general user, and that the expert is first in priority and the general user is second in priority in the document retrieval system. The degree of weighting between the expert and the general user based on formula (3) is then:

$$W = \left[ \frac{1}{2^2}(2 \times 2 - 2 \times 1 + 1) \times u_{\text{expert}} + \frac{1}{2^2}(2 \times 2 - 2 \times 2 + 1) \times u_{\text{user}} \right]^{1/2} = \left[ \frac{3}{4} u_{\text{expert}} + \frac{1}{4} u_{\text{user}} \right]^{1/2}.$$

The expert thus receives a weight of 0.75 and the general user a weight of 0.25.

Let $(a, r)$ and $(b, s)$ be two pairs of values, where $a \in [0,1]$, $b \in [0,1]$, $r \in \{P, N, \text{-}\}$, and $s \in \{P, N, \text{-}\}$. Assume that the document descriptor relevance vector $dv_i$ (i.e., the $i$th row of the document descriptor relevance matrix $D^*$), the document descriptor relation vector $dr_i$ (i.e., the $i$th row of the document descriptor relation matrix $E^*$), the query descriptor relevance vector qv and the query descriptor relation vector qr are as follows:

$dv_i = (d_{i1}, d_{i2}, \ldots, d_{in})$,
$dr_i = (e_{i1}, e_{i2}, \ldots, e_{in})$,
qv = $(x_1, x_2, \ldots, x_n)$,
qr = $(y_1, y_2, \ldots, y_n)$,

where $d_{ij} \in [0,1]$, $x_j \in [0,1]$, $e_{ij} \in \{P, N, Z\}$, and $y_j \in \{P, N, \text{-}\}$, $1 \le i \le m$ and $1 \le j \le n$, $m$ is the number of documents, and $n$ is the number of concepts. qv and qr are defined by the general user. The degree of weighting $W((a, r), (b, s))$ between $(a, r)$ and $(b, s)$ is as follows:

$$W((a, r), (b, s)) = \begin{cases} 0 & \text{if } r \ne s, \\ \left[ \dfrac{1}{t^2} \displaystyle\sum_{i=1}^{t} (2t - 2i + 1)\, u_i \right]^{1/2} & \text{if } r = s, \end{cases} \quad (4)$$

where $W((a, r), (b, s)) \in [0,1]$, $t$ is the number of experts and general users, $i$ is the priority of the expert or general user, and $u_i$ is the relevance value, with $u_{\text{expert}}$ taken from the document descriptor relevance matrix defined by the expert and $u_{\text{user}}$ taken from the user's query descriptor, $u_i \in \{u_{\text{expert}}, u_{\text{user}}\}$. If $r$ = "-" or $s$ = "-", the concept is neglected by the user's query descriptor. Applying formula (4) to every document-concept pair yields a matrix called the satisfaction matrix SM, which combines the document descriptor relevance matrix defined by the expert with the user's query descriptor in extended fuzzy concept networks.
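Formulas (3) and (4) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the author's implementation: the function names are hypothetical, values are passed in priority order (highest priority first), a neglected concept is represented by returning `None`, and the relationship labels in the calls are placeholders.

```python
import math
from typing import List, Optional, Sequence


def priority_weights(t: int) -> List[float]:
    """Weights (2t - 2i + 1) / t^2 for priorities i = 1..t; they sum to 1."""
    return [(2 * t - 2 * i + 1) / t**2 for i in range(1, t + 1)]


def weighted_value(values_by_priority: Sequence[float]) -> float:
    """Formula (3): square root of the priority-weighted sum of the values."""
    weights = priority_weights(len(values_by_priority))
    return math.sqrt(sum(w * u for w, u in zip(weights, values_by_priority)))


def satisfaction(
    expert_value: float, expert_relation: str, user_value: float, user_relation: str
) -> Optional[float]:
    """Formula (4) for one expert (priority 1) and one general user (priority 2)."""
    if "-" in (expert_relation, user_relation):
        return None  # concept neglected by the query (representation is an assumption)
    if expert_relation != user_relation:
        return 0.0  # relationships disagree
    return weighted_value([expert_value, user_value])


print(priority_weights(2))  # [0.75, 0.25]
# Expert relevance 1.0 and user relevance 0.6 with matching labels give
# sqrt(0.75 * 1.0 + 0.25 * 0.6) = sqrt(0.9), about 0.94868.
print(round(satisfaction(1.0, "P", 0.6, "P"), 5))
```

With an expert value of 1.0 and a user value of 0.6 the result is about 0.94868, which matches the first entry of the satisfaction matrix reported in Example 3.1.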
The satisfaction matrix SM, with rows indexed by the documents $d_1, \ldots, d_m$ and columns indexed by the concepts $c_1, \ldots, c_n$, has the following form:

$$SM = \begin{bmatrix} s_{11} & s_{12} & \cdots & s_{1n} \\ s_{21} & s_{22} & \cdots & s_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ s_{m1} & s_{m2} & \cdots & s_{mn} \end{bmatrix},$$

where $s_{ij}$ is the degree of satisfaction between concept $c_j$ and document $d_i$, obtained from the document descriptor relevance matrix defined by the expert and the user's query descriptor using formula (4) (e.g., $W((d_{ij}, e_{ij}), (x_j, y_j)) = s_{ij}$), $s_{ij} \in [0,1]$, $1 \le i \le m$ and $1 \le j \le n$, $m$ is the number of documents, and $n$ is the number of concepts.

3.5. Ranking the degrees of satisfaction to which each document satisfies the user's query descriptor
In the following, we use the quadratic-mean averaging (QMA) operators presented by Chen and Chu (2010) to deal with the AND operation in document retrieval based on extended fuzzy concept networks. Furthermore, we calculate the degree of satisfaction to which each document satisfies the user's query descriptor in order to rank the desired documents according to the general user's needs. According to the satisfaction matrix SM, formula (5) gives the degree of satisfaction $RS_{AND}(d_i, qv)$ to which document $d_i$ satisfies the user's query descriptor: it applies the AND operator of the QMA operators to the satisfaction degrees $s_{ij}$, $j = 1, \ldots, n$, in row $i$ of SM, (5)
where $RS_{AND}(d_i, qv) \in [0,1]$, $1 \le i \le m$ and $1 \le j \le n$, $RS_{AND}(d_i, qv)$ is the degree of satisfaction to which document $d_i$ satisfies the user's query descriptor under the AND operation, $s_{ij}$ is the degree of satisfaction between concept $c_j$ and document $d_i$ obtained from the document descriptor relevance matrix defined by the expert and the user's query descriptor, $s_{ij} \in [0,1]$, $k$ is the number of concepts neglected by the user's query descriptor, $m$ is the number of documents, and $n$ is the number of concepts. The larger the value of $RS_{AND}(d_i, qv)$, the greater the degree of satisfaction to which document $d_i$ satisfies the user's query descriptor.

Example 3.1: Assume that there is an extended fuzzy concept network as shown in Fig. 4. We can model the extended fuzzy concept network with a relevance matrix $V$ and a relation matrix $R$, where

Fig. 4. An extended fuzzy concept network of Example 3.1.

$$V = \begin{bmatrix} 1 & 0.7 & 0.5 & 0 & 0.8 \\ 0.7 & 1 & 0 & 0 & 0 \\ 0.5 & 0 & 1 & 0.6 & 0 \\ 0 & 0 & 0.6 & 1 & 0 \\ 0.8 & 0 & 0 & 0 & 1 \end{bmatrix}.$$

Based on the previous discussion, we can calculate the transitive closure of the relevance matrix and the transitive closure of the relation matrix based on formulas (1) and (2), where

$$V^* = \begin{bmatrix} 1 & 0.7 & 0.5 & 0.5 & 0.8 \\ 0.7 & 1 & 0.5 & 0.5 & 0.7 \\ 0.5 & 0.5 & 1 & 0.6 & 0.5 \\ 0.5 & 0.5 & 0.6 & 1 & 0.5 \\ 0.8 & 0.7 & 0.5 & 0.5 & 1 \end{bmatrix}.$$

Assume that there are five documents in a fuzzy information retrieval system, with a document descriptor relevance matrix $D$ and a document descriptor relation matrix $E$, where

$$D = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 0.5 & 1 & 0 & 0.7 & 0 \\ 0 & 0 & 0 & 0.6 & 0 \\ 0.8 & 1 & 1 & 1 & 0 \\ 0.4 & 0.9 & 0 & 0 & 1 \end{bmatrix}.$$

Then, based on the previous discussion, we can calculate the document descriptor relevance matrix by $D^* = D \otimes V^*$ and the document descriptor relation matrix by $E^* = E \otimes R^*$.
Assume that the user's query descriptor $Q_1$ is represented by the query descriptor relevance vector $qv_1$ and the query descriptor relation vector $qr_1$ as follows:

$qv_1$ = {0.6, 1.0, 0.8, -, 0.7},
$qr_1$ = {, , , , },

Based on formula (4), the satisfaction matrix $SM_1$ that combines the document descriptor relevance matrix defined by the expert with the user's query descriptor $Q_1$ is as follows:

$$SM_1 = \begin{bmatrix} 0.94868 & 1 & 0.97468 & \text{-} & 0.88034 \\ 0.82158 & 1 & 0.80623 & \text{-} & 0.83666 \\ 0.72457 & 0.79057 & 0.80623 & \text{-} & 0.74162 \\ 0.86603 & 1 & 0.97468 & \text{-} & 0.88034 \\ 0 & 0 & 0 & \text{-} & 0 \end{bmatrix}.$$

Furthermore, based on formula (5), the degree of satisfaction of each document $d_i$ with respect to the user's query descriptor $Q_1$ can be calculated. Hence, the documents that satisfy the user's query descriptor are ranked $d_1$, $d_4$, $d_2$, $d_3$, $d_5$. In this case, document $d_1$ is the best choice for the user's query descriptor $Q_1$, because it has the largest retrieval status value.
-
CONCLUSIONS
This paper presents the concepts of extended fuzzy concept networks, in which four kinds of fuzzy relationships exist between concepts (i.e., fuzzy positive association, fuzzy negative association, fuzzy generalization, and fuzzy specialization). We also propose a new mechanism for dealing with document retrieval based on extended fuzzy concept networks. The proposed method is a more useful fuzzy information retrieval method for document retrieval because it provides different weights for experts and general users, and coincides with human intuition.
ACKNOWLEDGMENT
This research was funded by the 2024 Annual Key Development Program of National United University (Project No. LC113005). The authors gratefully acknowledge this financial support.
REFERENCES
- Chen, S. J. and Chu, H. C. A new method for fuzzy information retrieval based on quadratic-mean averaging operators, in Proceedings of the e-CASE & e-Tech International Conference (2010): 2487-2513.
- Chen, S. M. and Horng, Y. J. Finding inheritance hierarchies in interval-valued fuzzy concept-networks, Fuzzy Sets Syst., vol. 84, no. 1 (1996): 75-83.
- Chen, S. M. and Wang, J. Y. Document retrieval using knowledge-based fuzzy information retrieval techniques, IEEE Trans. Syst., Man, Cybern., vol. 25 (1995): 793-803.
- Her, G. T. and Ke, J. S. A fuzzy information retrieval system model, in Proc. 1983 National Computer Symp., Taiwan, R.O.C. (1983): 147-151.
- Horng, Y. J. and Chen, S. M. Document retrieval based on extended fuzzy concept networks, in Proc. 4th Nat. Conf. Defense Management, Taipei, Taiwan, R.O.C., vol. 2 (1996): 1039-1050.
- Horng, Y. J. and Chen, S. M. Fuzzy query processing for document retrieval based on extended fuzzy concept networks, IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics, vol. 29, no. 1 (1999): 96-104.
- Horng, Y. J., Chen, S. M. and Lee, C. H. A fuzzy information retrieval method using fuzzy-valued concept networks, in Proceedings of the IEEE 10th International Conference on Tools with Artificial Intelligence, Taipei, Taiwan, Republic of China (November 1998): 104-111.
- Horng, Y. J., Chen, S. M. and Lee, C. H. Automatically constructing multi-relationship fuzzy concept networks for document retrieval, Applied Artificial Intelligence: An International Journal, vol. 17, no. 4 (2003): 303-328.
- Kandel, S. Fuzzy Mathematical Techniques with Applications. Reading, MA: Addison-Wesley, 1986.
- Kamel, M., Hadfield, B. and Ismail, M. Fuzzy query processing using clustering techniques, Inf. Process. Manage., vol. 26, no. 2 (1990): 279-293.
- Kim, M. H., Lee, J. H. and Lee, Y. J. Analysis of fuzzy operators for high quality information retrieval, Information Processing Letters, vol. 46, no. 6 (1993): 251-256.
- Kracker, M. A fuzzy concept network model and its applications, in Proc. 1st IEEE Int. Conf. Fuzzy Systems (1992): 761-768.
- Kraft, D. H. and Buell, D. A. Fuzzy sets and generalized Boolean retrieval systems, Int. J. Man-Mach. Stud., vol. 19, no. 1 (1983): 45-56.
- Lucarella, D. and Morara, R. FIRST: Fuzzy information retrieval system, J. Inf. Sci., vol. 17 (1991): 81-91.
- Miyamoto, S. Information retrieval based on fuzzy associations, Fuzzy Sets Syst., vol. 38 (1990): 191-205.
- Moradi, P., Ebrahim, M. and Ebadzadeh, M. M. Personalizing results of information retrieval systems using extended fuzzy concept networks, Iran., 2008.
- Murai, T., Miyakoshi, M. and Shimbo, M. A fuzzy document retrieval method based on two-valued indexing, Fuzzy Sets Syst., vol. 30 (1989): 103-120.
- Radecki, T. Mathematical model of time effective information retrieval system based on the theory of fuzzy set, Inf. Process. Manage., vol. 13 (1977): 109-116.
- Radecki, T. Fuzzy set theoretical approach to document retrieval, Inf. Process. Manage., vol. 15 (1979): 247-259.
- Salton, G. and McGill, M. J. Introduction to Modern Information Retrieval. New York: McGraw-Hill, 1983.
- Tahani, V. A fuzzy model of document retrieval system, Inf. Process. Manage., vol. 12 (1976): 177-187.
- Wang, J. Y. and Chen, S. M. A knowledge-based method for fuzzy information retrieval, in Proc. 1st Asian Fuzzy Systems Symp., Singapore, 1993.
- Zadeh, L. A. Fuzzy sets, Inf. Contr., vol. 8 (1965): 338-353.
- Zemankova, M. FIIS: A fuzzy intelligent information system, Data Eng., vol. 12, no. 2 (1989): 11-20.