 Open Access
 Total Downloads : 594
 Authors : S Rao Chintalapudi, Katikireddy Srinivas
 Paper ID : IJERTV1IS4188
 Volume & Issue : Volume 01, Issue 04 (June 2012)
 Published (First Online): 01072012
 ISSN (Online) : 22780181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
An Efficient Algorithm for processing Topk Spatial Preference Queries
S Rao chintalapudi katikireddy srinivas
Asst.Professor Associate Professor CMR Technical Campus B.V.C Engineering College
Abstract
A spatial preference query ranks objects based on the qualities of features in their spatial neighborhood. For example, using a real estate agency database of flats for sale, a customer may want to rank the flats with respect to the appropriateness of their location, defined after aggregating the qualities of other features (e.g., restaurants, market, hospital, railway station, etc.) within their spatial neighborhood. Such a neighborhood concept can be specified by the user via different functions. In this paper, we formally define spatial preference queries and propose appropriate indexing techniques and search algorithms for them. Extensive evaluation of our methods on both real and synthetic data reveals that an optimized branchandbound solution is efficient and robust with respect to different parameters.
Index TermsQuery processing, spatial preference query, spatial databases.

INTRODUCTION
Spatial database systems manage large collections of geographic entities, which apart from spatial attributes contain nonspatial information (e.g., name, size, type, price, etc.). In this paper, we study an interesting type of preference queries, which select the best spatial location with respect to the quality of facilities in its spatial neighborhood. Given a set D of interesting objects (e.g., candidate locations), a topk spatial preference query retrieves the k objects in D with the highest scores. The score of an object is defined by the quality of features (e.g., facilities or services) in its spatial neighborhood. As a motivating example, consider a real estate agency office that holds a database with available flats for sale. Here feature refers to a class of objects in a spatial map such as specific facilities or services. A customer may want to rank the contents of this database with respect to the quality of their locations, quantified by aggregating nonspatial characteristics of other features (e.g., restaurants, super market, hospital, railway station, etc.) in the spatial neighborhood of the flat (defined by a spatial range around it). Quality may be subjective and query
parametric. For example, the user (e.g., a tourist) wishes to find a hotel p that is close to a railway station and a highquality restaurant. Fig. 1a illustrates the locations of an object dataset D (hotels) in white, and two feature data sets: the set F1 (restaurants) in gray, and the set F2 (railway stations) in black. For the ease of discussion, the qualities are normalized to values in [0, 1].
Fig. 1.Example of topk spatial preference query

Range Score b)Influence Score
The score T(p) of a hotel p is defined in terms of:

the maximum quality for each feature in the neighborhood region of p

the aggregation of those qualities.

The Range score, binds the neighborhood region to a circular region at p with radius (shown as a circle), and the aggregate function to SUM. For instance, the maximum quality of gray and black points within the circle of p1 are 0.9 and 0.6 respectively, so the score of p1 is T(p1)= 0.9+0.6=1.5. Similarly, we obtain T(p2)=1.0+0.1=1.1 and T(p3)=0.7+0.7=1.4. Hence,the hotel p1 is returned as the top result.
In fact, the semantics of the aggregate function is relevant to the users query. The SUM function attempts to balance the overall qualities of all features. The neighborhood region in the above spatial preference query can also be defined by other score functions. A meaningful score function is the influence score (see Section 4).As opposed to the crisp radius constraint in the range score, the influence score smoothens the effect of and assigns higher weights to railway stations that are closer to the hotel. Fig. 1b shows a hotel p5 and three railway stations s1, s2, s3 (with their quality values). The circles have their radii as multiples of .Now, the
score of a railway station si is computed by multiplying its quality with the weight 2j, where j is the order of the smallest circle containing si. For example, the scores of s1, s2, and s3 are 0.3*21=0.15, 0.9*22=0:225, and 1.0*23=0.125,respectively. The influence score of p5 is taken as the highest value (0.225).
Traditionally, there are two basic ways for ranking objects:

Spatial ranking, which orders the objects according to their distance from a reference
point.

Nonspatial ranking, which orders the objects by an aggregate function on their non. spatial values
Our topk spatial preference query integrates these two types of ranking in an intuitive way. As indicated by our examples, this new query has a wide range of applications in service recommendation and decision support systems.
To our knowledge, there is no existing efficient solution for processing the topk spatial reference query. A brute force approach (to be elaborated in Section 3.2) for evaluating it is to compute the scores of all objects in D and select the topk ones. This method, however, is expected to be very expensive for large input data sets. In this paper, we propose alternative techniques that aim at minimizing the I/O accesses to the object and feature datasets, while being also computationally efficient. Specifically, we contribute the branchandbound (BB)algorithm for efficiently processing the topk spatial preference query.
Furthermore, this paper studies one relevant extension that have not been investigated in our preliminary work [1].The extension (Section 3.4) is an optimized version of BB that exploits a more efficient technique for computing the scores of the objects. The second extension (Section 3.6) studies adaptations of the proposed algorithms for aggregate functions other than SUM, e.g., the functions MIN and MAX. The third extension (Section 4) develops solutions for the topk spatial preference query based on the influence score. The rest of this paper is structured as follows: Section 2 provides background on basic and advanced queries on spatial databases, as well as topk query evaluation in relational databases. Section 3 defines the topk spatial preference query and presents our solutions. Section 4 studies the query extension for the influence score. In Section 5, our query algorithms are experimentally evaluated with real and synthetic data. Finally, Section 6 concludes the paper with future research directions.


BACKGROUND AND RELATED WORK
Object ranking is a popular retrieval task in various applications. In relational databases, we rank tuples using an aggregate score function on their attribute values [2]. For example, a real estate agency maintains a database that contains information of flats available for sale. A potential customer wishes to view the top 10 flats with the largest sizes(area) and lowest prices. In this case, the score of each flat is expressed by the sum of two qualities: size and price, after normalization to the domain [0, 1] (e.g., 1 means the largest size and the lowest price and 0 means the smallest size and the highest price). In spatial databases, ranking is often associated to nearest neighbor (NN) retrieval. Given a query location, we are interested in retrieving the set of nearest objects to it that qualify a condition (e.g., restaurants).Assuming that the set of interesting objects is indexed by an Rtree [3], we can apply distance bounds and traverse the inex in a branch andbound fashion to obtain the answer [4].

Spatial Query Evaluation on RTrees
The most popular spatial access method is the Rtree [3], which indexes minimum bounding rectangles (MBRs) of objects. Fig. 2 shows a set D=(p1, . . . , p8) of spatial objects(e.g., points) and an Rtree that indexes them. Rtrees can efficiently process main spatial query types, including spatial range queries, nearest neighbor queries, and spatial joins.
Given a spatial region W, a spatial range query retrieves from D the objects that intersect W. For instance, consider a range query that asks for all objects within the shaded area in Fig. 2. Starting from the root of the tree, the query is processed by recursively following entries, having MBRs that intersect the query region.

(b)


Fig. 2. Spatial queries on Rtrees. a) MBRs b)Rtree representation
For instance, e1 does not intersect the query region, thus the subtree pointed by e1 cannot contain any query result. In contrast, e2 is followed by the algorithm and the points in the corresponding node
are examined recursively to find the query result p7.A nearest neighbor query takes as input a query object q and returns the closest object in D to q. For instance, the nearest neighbor of q in Fig. 2 is p7. Its generalization is the kNN query, which returns the k closest objects to q, given a positive integer k. NN (and kNN) queries can be efficiently processed using the bestfirst (BF) algorithm of [4], provided that D is indexed by an Rtree. A minheap H, which organizes Rtree entries based on the (minimum) distance of their MBRs to q is initialized with the root entries. In order to find the NN of q in Fig. 2, BF first inserts to H entries e1,e2, e3, and their distances to q. Then, the nearest entry e2 is retrieved from H and objects p1, p7, p8 are inserted to H. The next nearest entry in H is p7, which is the nearest neighbor of q. In terms of I/O, the BF algorithm is shown to be no worse than any NN algorithm on the same Rtree [4].
The aggregate Rtree (aRtree) [10] is a variant of the Rtree, where each nonleaf entry augments an aggregate measure for some attribute value (measure) of all points in its subtree. As an example, the tree shown in Fig. 2 can be upgraded to a MAX aRtree over the point set, if entries e1,e2,e3 contain the maximum measure values of sets (p2, p3), (p1,p8, p7), (p4, p5, p6), respectively. Assume that the measure values of p4,p5, p6 are 0.2,0.1, 0.4, respectively. In this case, the aggregate measure augmented in e3 would be max(0.2, 0.1,0.4) = 0.4. In this paper, we employ MAX aRtrees for indexing the feature data sets (e.g., restaurants),in order to accelerate the processing of topk spatial preference queries.
Given a feature data set F and a multidimensional region R, the range topk query selects the tuples (from F) within the region R and returns only those with the k highest qualities. Hong et al. [11] indexed the data set by a MAX aRtree and developed an efficient tree traversal algorithm to answer the query. Instead of finding the best k qualities from F in a specified region, our (range score)query considers multiple spatial regions based on the points from the object data set D, and attempts to find out the best k regions (based on scores derived from multiple feature data sets Fc).
3 SPATIAL PREFERENCE QUERIES
Section 3.1 formally defines the topk spatial preference query problem and describes the index structures for the data sets. Section 3.2 studies two baseline algorithms for processing the query. Section
3.3 presents an efficient branchandbound algorithm for the query, and its further
optimization is proposed in Section 3.4. Section 3.5 develops a specialized spatial join algorithm for evaluating the query. Finally, Section 3.6 extends the above algorithms for answering topk spatial preference queries involving other aggregate functions.

Definitions and Index Structures
Given an object data set D and m feature data sets F1,F2 . . . Fm, the topk spatial preference query retrieves the k points in D with the highest score. Here, the overall score of an object point p D is defined as
T(p)=AGG{Tc(p)c [1,m]} (1)
where AGG is an aggregate function (e.g: SUM,MIN,MAX etc)
Tc(p) is the cth component score of p with respect to the neighborhood condition and
m is the number of feature data sets.
The cth component score of p i.e Tc(p) can be computed as follows
Tc(p)=max({w(s)s Fc ^ dist(p,s) }U
{0}). (2)

Algorithms
we develop various algorithms for processing topk spatial preference queries. We first introduce a bruteforce solution that computes the score of every point p D in order to obtain the query results. Then, we propose a group evaluation technique that computes the scores of multiple points concurrently.

Simple Probing Algorithm
For a point p D, where not all its component scores are known, its upper bound score Tu(p) defined as
Tu(p)= Tc(p), if Tc(p) is known (3)
1, otherwise
It is guaranteed that the upper bound Tu(p) is greater than or equals to the actual score T(p).
Algorithm 1 is a pseudocode of the simple probing (SP)algorithm, which retrieves the query results by computing the score of every object point. The algorithm uses two global variables: Wk is a min heap for managing the topk results and represents the topk score so far (i.e., lowest score in Wk). Initially, the algorithm is invoked at the root node of
the object tree (i.e., N =D.root). The procedure is recursively applied (at Line 4) on tree nodes until a leaf node is accessed. When a leaf node is reached, the component score Tc(e) (at Line 8) is computed by executing a range search on the feature tree Fc for range score queries. Lines 68 describe an incremental computation technique, for reducing unnecessary component score computations. In particular, the point e is ignored as soon as its upper bound score Tu(e) (see (3)) cannot be greater than the bestk score . The variables Wk and are updated when the actual score T(e) is greater than .
Algorithm 1. Simple Probing Algorithm algorithm SP(Node N)
1: for each entry e N do 2: If N is nonleaf then
3: read the child node N' pointed by e; 4: SP(N');
5: else
6: for c = 1 to m do 7: If Tu(e) > then
//if upper bound is greater than
8: compute Tc(p) using tree Fc; update Tu(e); 9: If T(e) > then
10: update Wk and by e;
Drawbacks

it is very expensive because it comutes score for all objects.

No concurrency

it is not efficient method for larger input data sets.


Group Probing Algorithm
Due to separate score computations for different objects, SP is inefficient for largeobject data sets. In view of this, we propose the group probing (GP) algorithm, a variant of SP,that reduces I/O cost by computing scores of objects in the same leaf node of the Rtree concurrently. In GP, when a leaf node is visited, its points are first stored in a set V and then their component scores are computed concurrently at a single traversal of the Fc tree.
We now introduce some distance notations for MBRs. Given a point p and an MBR e, the value mindist(p,e) [4] denotes the minimum possible distance between p and any point in e. Similarly, given two MBRs ea and eb, the value mindist(ea, eb) denotes the minimum possible distance between any point in ea and any point in eb.
Algorithm 2 shows the procedure for computing the cth component score for a group of points. Consider a subset Vof D for which we want to compute their component score at feature tree Fc. Initially, the procedure is called with N being the root node of Fc. If e is a nonleaf entry and its mindist
from some point p V is within the range , then the procedure is applied ecursively on the child node of e, since the subtree of Fc rooted at e may contribute to the component score of p. In case e is a leaf entry (i.e., a feature point), the scores of points in V are updated if they are within distance from e.
Algorithm 2. Group Probing Algorithm algorithm GP(Node N, Set V , Value c,Value ) 1: for each entry e N do
2: If N is nonleaf then
3: If p V , mindist(p,e) then
4: read the child node N' pointed by e; 5: GP(N',V ,c, );
6: else
7: for each p V such that dist(p,e) do 8: Tc(p)=max{Tc(p),w(e)};
Drawbacks
1.it is also expensive because it computes score for all objects but concurrently .


BranchandBound Algorithm
GP is still expensive as it examines all objects in D and computes their component scores. We now propose an algorithm that can significantly reduce the number of objects to be examined. The key idea is to compute, for nonleaf entries e in the object tree D, an upper bound Tu(p) of the score T(p) for any point p in the subtree of e. If Tu(e) then we need not access the subtree of e, thus we can save numerous score computations.
Algorithm 3 is a pseudocode of our BB algorithm, based on this idea. BB is called with N being the root node of D. If N is a nonleaf node, Lines 35 compute the scores T(e) for nonleaf entries e concurrently. Recall that Tu(e) is an upperbound score for any point in the subtree of e.. If Tu(e)
,then the subtree of e cannot contain better results
than those in Wk and it is removed from V . In order to obtain points with high scores early, we sort the entries in descending order of T(e) before invoking the above procedure recursively on the child nodes pointed by the entries in V .If N is a leaf node, we compute the scores for all points of N concurrently and then update the set Wk of the topk results. Since both Wk and are global variables, their values are updated during recursive call of BB.
Algorithm 3. BranchandBound Algorithm Wk= new minheap of size k (initially empty); =0;
algorithm BB(Node N) 1: V ={e e N};
2: If N is nonleaf then
3: for c= 1 to m do
4: compute Tc(e) for all e V concurrently; 5: remove entries e in V such that Tu(e) ;
6: sort entries e V in descending order of T(e); 7: for each entry e V such that Tu(e) > do
8: read the child node N' pointed by e; 9: BB(N');
10: else
11: for c = 1 to m do
12: compute Tc(e) for all e V concurrently; 13: remove entries e in V such that Tu(e) ; 14: update Wk and by entries in V ;
Advantages
1.it reduces number of objects to be examined. 2.it is efficient than SP and GP algorithms.
3.3.1 Upper Bound Score Computation
It remains to clarify how the (upper bound) scores Tc(p) of nonleaf entries (within the same node

can be computed concurrently (at Line 4). Our goal is to compute these upperbound scores such that, 1).the bounds are computed with low I/O
cost, and.
2).the bounds are reasonably tight, in order to facilitate effective pruning.
To achieve this, we utilize only level1 entries (i.e., lowest level nonleaf entries) in Fc for deriving upper bound scores because:

there are much fewer level1 entries than leaf entries (i.e., points)

highlevel entries in Fc cannot provide tight bounds.

In our experimental study, we will also verify the effectiveness and the cost of using level 1entries for upper bound score computation. Algorithm 2 can be modified for the above upper bound computation task (where input V corresponds to a set of nonleaf entries), after changing Line 2 to check whether child nodes of N are above the leaf level. The following example illustrates how upper bound range scores are derived. In Fig. 4a, v1 and v2 are nonleaf
entries in the object tree D and the others are level1 entries in the feature tree Fc. For the entry v1, we first define its Minkowski region [21] (i.e., gray region around v1), the area whose mindist from v1 is within . Observe that only entries ei intersecting the Minkowski region of v1 can contribute to the score of some point in v1. Thus, the upper bound score Tc(p) is simply the maximum quality of entries e1,e5, e6, e7, i.e., 0.9. Similarly, Tc(p) is computed as the maximum quality of entries e2, e3, e4, e8, i.e., 0.7. Assuming that v1 and v2 are entries in the same tree
node of D, their upper bounds are computed concurrently to reduce I/O cost.
Fig. 4. Examples of deriving scores. (a) Upper bound scores. (b) Optimized computation.

Optimized BranchandBound Algorithm
This section develops a more efficient score computation technique to reduce the cost of the BB algorithm.

Problem with BB Algorithm
Recall that Lines 1113 of the BB algorithm are used to compute the scores of object points (i.e., leaf entries of the Rtree on D). A leaf entry e is pruned if its upper bound score Tu(e) is not greater than the best score found so far . However, the upper bound score Tu(e) (see (3)) is not tight because any unknown component score is replaced by 1.
1
1
2
Let us examine the computation of Tu(p1) for the point p1 in Fig. 4b. The entry e F1 is a nonleaf entry from the feature tree F1. Its augmented quality value is w(e F1)=0.8. The entry points to a leaf node containing two feature points, whose qualities values are 0.6 and 0.8, respectively. Similarly, e F2 is a nonleaf entry from the tree F2 and it points to a leaf node of feature points.
1
Suppose that the best score found so far in BB is =1.4(not shown in the figure). We need to check whether the score of p1 can be higher than . For this, we compute the first component score T1(p1)= 0.6 by accessing the child node of e F1 . Now, we have the upper bound score of p1 as Tu(p1)
2
= 0.6 + 1.0= 1.6. Such a bound is above =1.4 so we need to compute the second component score T2(p1)= 0.5 by accessing the child node of e F2 . The exact score of p1 is T(p1)=0.6+0.5=1.1 the point p1 is then pruned because T(p1) . In summary, two leaf nodes are accessed during the computation of T(p1) .
2
2
Our observation here is that the point p1 can be pruned earlier, without accessing the child node of e F2 . By taking the maximum quality of level1 entries (from F2) that intersect the range of p1, we derive: T2(p1) w(e F2 )=0.7.With the first component score T1(p1)=0.6, we infer that:T(p1)=
0.6 + 0.7= 1.3. Such a value is below so p1 can be pruned.

Optimized Computation of Scores
Based on our observation, we propose a tighter derivation for the upper bound score of p than the one shown in (3).Let p be an object point in D. Suppose that we have traversed some paths of the feature trees on F1,F2, . . . Fm.Let Âµ be an upper bound of the quality value for any unvisited entry (leaf or nonleaf) of the feature tree Fc. We then define the function T*(p) as
T*(p)=
(4)
In the max function, the first set denotes the upper bound quality of any visited feature point within distance from p.According to (4), the value T*(p) is tight only when every c value is low. In order to achieve this, we access the feature trees in a roundrobin fashion, and traverse the entries in each feature tree in descending order of quality values.
Roundrobin is a popular and effective strategy used for efficient merging of rankings [7], [9].
Algorithm 4 is the pseudocode for computing the scores of objects efficiently from the feature trees F1,F2,. . . ,Fm. The set V contains objects whose scores need to be computed. Here, refers to the distance threshold of the range score, and represents the best score found so far. Foreach feature tree Fc, we employ a maxheap Hc to traverse the entries of Fc in descending order of their qulity values. The root of Fc is first inserted ino Hc. The variable Âµ maintains the upper bound quality of entries in the tree that will be visited. We then initialize each component score Tc(p) of every object p V to 0.
Algorithm 4. Optimized Group Range Score Algorithm
algorithm Optimized_Group_Range(Trees F1;F2;
. . . ;Fm, Set V , Value _, Value _) 1: for c := 1 to m do
2: Hc := new maxheap (with quality score as key);
3: insert Fc.root into Hc; 4: Âµ := 1;
5: for each entry p V do 6: Tc(p) := 0;
7: := 1;
//ID of the current feature tree
8: while V > 0 and there exists a nonempty heap Hc do
9: deheap an entry e from H; 10: Âµ =w(e);
//update threshold
11: if p V , mindist(p,e) > then 12: continue at Line 8;
13: for each p V do
// prune unqualified points
14: if ) then
15: remove p from V ;
16: read the child node CN pointed to by e; 17: for each entry e' of CN do
18: if CN is a nonleaf node then
19: if p V , mindist(p,e') then 20: insert e' into H;
21: else
// update component scores
22: for each p V such that dist(p,e') do 23: T (p)=max{ T (p),w(e')};
24: = next (roundrobin) value where H is not empty;
25: for each entry p V do 26: ;
At Line 7, the variable keeps track of the ID of the current feature tree being processed. The loop at Line 8 is used to compute the scores for the points in the set V. We then deheap an entry e from the current heap H. The property of the maxheap guarantees that the quality value of any future entry deheaped from H is at most w(e). Thus, the bound Âµ is updated to w(e). At Lines 1112, we prune the entry e if its distance from each object point p V is larger than . In case e is not pruned, we compute the tight upper bound score T(p) for each p V (by (4)); the object p is removed from V if T(p) (Lines 1315).
Next, we access the child node pointed to by e, and examine each entry e' in the node (Lines 16 17). A nonleaf entry e' is inserted into the heap H if its minimum distance from some p V is within (Lines 1820); whereas a leaf entry e' is used to update the component score T(p) for any p V within distance from e' (Lines 2223). At Line 24, we apply the roundrobin strategy to find the next value such that the heap H is not empty. The loop at Line 8 repeats while V is not empty and there exists a nonempty heap Hc. At the end, the algorithm derives the exact scores for the remaining points of V .

The BB* Algorithm

Based on the above, we extend BB (Algorithm 3) to an optimized BB* algorithm as follows: First, Lines 1113 of BB are replaced by a call to Algorithm 4, for computing the exact scores for object points in the set V . Second, Lines 35of BB are replaced by a call to a modified algorithm 4, for deriving the upper bound scores for nonleaf
entries (in V ).Such a modified Algorithm 4 is obtained after replacing Line 18 by checking whether the node CN is a nonleaf node above the level1.
4. EXPERIMENTAL EVALUATION
In this section, we compare the efficiency of the proposed algorithms using real and synthetic data sets. Each data set is indexed by an aRtree with 4 K bytes page size. We used an LRU memory buffer whose default size is set to 0.5 percent of the sum of tree sizes (for the object and feature trees used).Our algorithms were implemented in C++ and experiments were run on a Pentium D 2.8 GHz PC with 1 GB of RAM. In all experiments, we measure both the I/O cost (in number of page faults) and the total execution time (in seconds) of our algorithms.

RESULTS
In this section, we conduct experiments on real object and feature data sets in order to demonstrate the application of topk spatial preference queries. We obtained three real spatial data sets from a travel portal,http://www.allstays.com/. Locations in these data sets correspond to (longitude and latitude)coordinates in US. We cleaned the data sets by discarding records without longitude and latitude. In summary, the relative performance between the algorithms in all experiments is consistent to the results on synthetic data.
Fig. Comparision of I/O cost for SP,GP,BB,BB*.
Fig.Comparision of Execution Times for SP,GP,BB,BB*

CONCLUSION
In this paper, we studied topk spatial preference queries, which provide a novel type of ranking for spatial objects based on qualities of features in their neighborhood. The neighborhood of an object p is captured by the scoring function: 1) the range score restricts the neighborhood to a crisp region centered at p, whereas 2) the influence score relaxes the neighborhood to the whole space and assigns higher weights to locations closer to p.We presented four algorithms for processing topk spatial preference queries. The baseline algorithm SP computes the scores of every object by querying on feature data sets. The algorithm GP is a variant of SP that reduces I/O cost by computing scores of objects in the same leaf node concurrently. The algorithm BB derives upper bound scores for non leaf entries in the object tree, and prunes those that cannot lead to better results. The algorithm BB*is a variant of BB that utilizes an optimized method for computing the scores of objects (and upper bound scores of non leaf entries). Based on our experimental findings, BB* is scalable to large data sets and it is the most robust algorithm with respect to various parameters. In the future, we will study the topk spatial preference query on a road network, in which the distance between two points is defined by their shortest path distance rather than their euclidean distance. The challenge is to develop alternative methods for computing the upper bound scores for a group of points on a road network.
.7.REFERENCES

M.L. Yiu, X. Dai, N. Mamoulis, and M. Vaitis, Topk Spatial Preference Queries, Proc. IEEE Intl Conf. Data Eng. (ICDE),2007.

N. Bruno, L. Gravano, and A. Marian, Evaluating Topk Queriesover WebAccessible Databases, Proc. IEEE Intl Conf. Data Eng.(ICDE), 2002.

A. Guttman, RTrees: A Dynamic Index Structure for SpatialSearching, Proc. ACM SIGMOD, 1984.

G.R. Hjaltason and H. Samet, Distance Browsing in SpatialDatabases, ACM Trans. Database Systems, vol. 24, no. 2, pp. 265318, 1999.

R. Weber, H.J. Schek, and S. Blott, A Quantitative Analysis andPerformance Study for SimilaritySearch Methods in HighDimensional Spaces, Proc. Intl Conf. Very Large Data Bases(VLDB), 1998.

K.S. Beyer, J. Goldstein, R. Ramakrishnan, and
U. Shaft, When isNearest Neighbor Meaningful? Proc. Seventh Intl Conf. DatabaseTheory (ICDT), 1999.

R. Fagin, A. Lotem, and M. Naor, Optimal AggregationAlgorithms for Middleware, Proc. Intl Symp. Principles ofDatabase Systems (PODS), 2001.

I.F. Ilyas, W.G. Aref, and A. Elmagarmid, Supporting Topk JoinQueries in Relational Databases, Proc. 29th Intl Conf. Very LargeData Bases (VLDB), 2003.

N. Mamoulis, M.L. Yiu, K.H. Cheng, and D.W. Cheung, EfficientTopk Aggregation of Ranked Inputs, ACM Trans. DatabaseSystems, vol. 32, no. 3, p. 19, 2007.

D. Papadias, P. Kalnis, J. Zhang, and Y. Tao, Efficient OLAPOperations in Spatial Data Warehouses, Proc. Intl Symp. Spatialand Temporal Databases (SSTD), 2001.

S. Hong, B. Moon, and S. Lee, Efficient Execution of Range TopkQueries in Aggregate R Trees, IEICE Trans. Information andSystems, vol. 88D, no. 11, pp. 25442554, 2005.

T. Xia, D. Zhang, E. Kanoulas, and Y. Du, On Computing ToptMost Influential Spatial Sites, Proc. 31st Intl Conf. Very Large DataBases (VLDB), 2005.

Y. Du, D. Zhang, and T. Xia, The Optimal Location Query, Proc.Intl Symp. Spatal and Temporal Databases (SSTD), 2005.

D. Zhang, Y. Du, T. Xia, and Y. Tao, Progessive Computation ofThe MinDist Optimal Location Query, Proc. 32nd Intl Conf. VeryLarge Data Bases (VLDB), 2006.

Y. Chen and J.M. Patel, Efficient Evaluation of AllNearestNeighbor Queries, Proc. IEEE Intl Conf. Data Eng. (ICDE), 2007.

P.G.Y. Kumar and R. Janardan, Efficient Algorithms for ReverseProximity Query Problems, Proc. 16th ACM Intl Conf. Advances inGeographic Information Systems (GIS), 2008.

M.L. Yiu, P. Karras, and N. Mamoulis, Ring Constrained Join:Deriving Fair Middleman Locations from Pointsets via aGeometric Constraint, Proc. 11th Intl Conf. Extending Database
Technology (EDBT), 2008.

M.L. Yiu, N. Mamoulis, and P. Karras, Common Influence Join:A Natural Join Operation for Spatial Pointsets, Proc. IEEE IntlConf. Data Eng. (ICDE), 2008.

Y.Y. Chen, T. Suel, and A. Markowetz, Efficient QueryProcessing in Geographic Web Search Engines, Proc. ACMSIGMOD, 2006.

V.S. Sengar, T. Joshi, J. Joy, S. Prakash, and K. Toyama, RobustLocation Search from Text Queries, Proc. 15th Ann. ACM IntlSymp. Advances in Geographic Information Systems (GIS), 2007.

S. Berchtold, C. Boehm, D. Keim, and H. Kriegel, A Cost Modelfor Nearest Neighbor Search in HighDimensional Data Space,Proc. ACM Symp. Principles of Database Systems (PODS), 1997.

E. Dellis, B. Seeger, and A. Vlachou, Nearest Neighbor Search on Vertically Partitioned High Dimensional Data, Proc. Seventh Intl Conf. Data Warehousing and Knowledge Discovery (DaWaK), pp. 243253, 2005.

N. Mamoulis and D. Papadias, Multiway Spatial Joins, ACMTrans. Database Systems, vol. 26, no. 4, pp. 424475, 2001.

A. Hinneburg and D.A. Keim, An Efficient Approach to Clustering in Large Multimedia Databases with Noise, Proc.Fourth Intl Conf. Knowledge Discovery and Data Mining (KDD), 1998.