A Survey: Different Approaches for Building Ontology and Algorithms used in Cache Optimization

DOI : 10.17577/IJERTV3IS111384


B. Jaibarathi, ME (Computer Science Engg.), Kumaraguru College of Technology, India.

L. Sowmya Devi, ME (Computer Science Engg.), Kumaraguru College of Technology, India.

M. S. Hema, Assistant Professor, Dept. of Computer Science Engg., Kumaraguru College of Technology, India.

Dr. S. Chandramathi, Dean, Department of Electrical Science, Sri Krishna College of Technology, India.

Abstract: Data integration is an effective approach for combining data that reside in different sources for the purpose of analysis and decision making. Data federation is a data integration strategy used to create an integrated virtual data store. Ontology is used to resolve semantic heterogeneity in data integration. This paper surveys different existing systems that use ontology-based data integration and compares them. To improve access latency, a cache is added to the ontology-based data integration process. Different cache optimization algorithms are also explained and compared. The paper thus summarizes the existing systems and the major page replacement algorithms.

Keywords: Data integration, semantic heterogeneity, ontology, existing systems, KRAFT, cache optimization algorithms, LRU.

  1. INTRODUCTION

    Data integration provides the ability to manipulate data transparently across multiple data sources. The main advantage of data integration is to obtain a resultant data source that gives a complete and concise overview of all existing data without the need to access each source separately. The data source is complete because no object is missed in the result, and concise because no object is repeated in the result. The three types of data integration are data consolidation, data propagation and data federation. The creation of a virtual view over different data sources is a challenging task due to the heterogeneities involved. Among these, semantic heterogeneity is caused by different meanings or interpretations of data and can be resolved by using ontology. An ontology is a formal, explicit specification of a shared conceptualization, where the conceptualization corresponds to an abstract model of a domain that identifies the relevant concepts and the relationships between them [1]. The ontology is used to represent domain knowledge and thereby resolve semantic heterogeneities in data federation.
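    To make the idea concrete, the short sketch below (not taken from any of the surveyed systems; every schema and term in it is invented for illustration) shows how a shared vocabulary can map differently named source attributes onto one concept so that records from two sources can be combined under a common schema.

    # Minimal sketch: resolving semantic heterogeneity with a shared vocabulary.
    # The ontology terms and source schemas below are hypothetical.
    SHARED_ONTOLOGY = {
        # shared concept  ->  attribute name used by each source
        "customer_name": {"source_a": "cust_nm", "source_b": "client"},
        "annual_income": {"source_a": "income",  "source_b": "salary_per_year"},
    }

    def to_shared_terms(record: dict, source: str) -> dict:
        """Rewrite a source record into the shared ontology vocabulary."""
        mapped = {}
        for concept, local_names in SHARED_ONTOLOGY.items():
            local = local_names.get(source)
            if local in record:
                mapped[concept] = record[local]
        return mapped

    if __name__ == "__main__":
        a = {"cust_nm": "Asha", "income": 52000}
        b = {"client": "Ravi", "salary_per_year": 61000}
        # Both records now share one schema and can be integrated directly.
        print(to_shared_terms(a, "source_a"))
        print(to_shared_terms(b, "source_b"))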

  2. DATA INTEGRATION

    Data integration provides access to large amounts of data from alternative sources, and as a result data quality is becoming a first-class property increasingly required by end users.

    1. Types of data integration

      The different types of data integration are listed in Table 1.

      1. Manual Integration: the integration process is done manually by the user.
      2. Virtual Data Integration: provides a unified view over the sources.
      3. Application-Based Integration: particular applications implement the integration logic.
      4. Middleware Data Integration: the integration logic is moved from the applications to a separate middleware layer.

      Table 1: Different types of data integration

    2. Data integration Strategies

    New strategies to integrate data evolve from time to time, owing to the active research work being done in the field. A few popular data integration strategies are compared in Table 2; the Global As View and Local As View approaches are shown in Figure 1 and Figure 2.

    Figure 1: Global As View

    Figure 2: Local As View

    Table 2: Classification of strategies

  3. DEVELOPMENT METHODOLOGY

    This section reviews the use of ontologies for the integration of heterogeneous information sources. Based on an in-depth evaluation of existing approaches to this problem, we discuss how ontologies are used to support the integration task [3]. We also evaluate and compare the different existing ontology-based projects. Some of these projects are discussed below.

      1. KRAFT (Knowledge Reuse and Fusion / Transformation)

        KRAFT is a multi-site research project conducted at the Universities of Aberdeen, Cardiff and Liverpool in collaboration with BT in the UK. It is an extension of the Carnot project aimed at legacy database systems. The main objective is to locate and extract knowledge from multiple heterogeneous online sources and transform it into a union of knowledge [4]. Local ontologies allow communication between heterogeneous resources while maintaining their intrinsic heterogeneity. It follows two methods for execution: building shared ontologies and extracting source ontologies. It also detects sets of ontology mismatches and establishes mappings to the shared ontology.

      2. INFOSLEUTH

        InfoSleuth is a system for the integration of heterogeneous sources developed by the Microelectronics and Computer Technology Corporation (MCC), USA. It is an agent-based system for information integration and analysis in distributed and dynamic environments with heterogeneous information sources. The main objective [5] is to retrieve and process information from heterogeneous information sources. TQML is used to create mappings between natural language query fragments; it is not human readable, but is constructed to reflect the terminological expectations of prospective users. InfoSleuth views an information source at the level of its relevant semantic concepts, and these information requests are independent of structure. The system processes incoming documents, extracting phrases that involve seed words and generating the corresponding concept terms.

      3. SIMS (Search in Multiple Sources)

        The SIMS project was developed by a research group at the University of Southern California. It aims to provide intelligent access to distributed and heterogeneous data sources. A hierarchical terminological knowledge base is used, with nodes representing objects, actions and the relations between nodes. It does not focus on building an ontology; instead, it provides links between different ontologies. It uses simple techniques such as query rewriting and query translation, which creates problems when a new source is added. It handles the vocabulary-sharing problem efficiently with no information loss.

      4. COIN (Context Interchange)

        COIN was initiated in 1991 with the goal of achieving semantic interoperability among heterogeneous information sources. The framework uses temporal contextual knowledge representation and reasoning capabilities to allow retrieval of data from multiple sources. Ontology is used as formal knowledge for representing context knowledge, and a mediation service dynamically detects and reconciles semantic conflicts. It offers more flexibility and better scaling when compared to other systems.

      5. CARNOT

        The CARNOT project was initiated in 1990 by MCC with the goal of logically unifying physically distributed, heterogeneous information sources. A variety of techniques are used to address a wide range of problems in achieving interoperation in heterogeneous environments. Query processing involves heterogeneous query processing, relaxed transactions, workflow management, knowledge discovery and heterogeneous modes of integration. It is organized as a five-layer model; the layers are the semantic service, distribution service, support service, communication service and access service. Workflow and coordinated transactions make use of intelligent agents.

      6. MOMIS (Mediator Environment for Multiple Information Source)

        MOMIS was a joint project among the Università di Modena e Reggio Emilia, the Università di Milano and the Università di Brescia within the national research project INTERDATA. The MOMIS project [7] is closely related to SIMS and is based on description logic. A semantic approach is followed to integrate information based on the conceptual schema, or metadata, of the information sources. The system supports a read-only view of data residing in multiple databases, making use of both virtual and real-world views.

      7. OBSERVER (Ontology Based System Enhanced with Relationships for Vocabulary Heterogeneity Resolution)

        OBSERVER was conducted at the University of Zaragoza, Spain. The project is mainly concerned with resource discovery, structural/format heterogeneity, modeling of information content, querying of information content and the vocabulary problem. Pre-existing ontologies are used for evaluation, and the system provides broad knowledge about the vocabulary used in the different sources. The major query processing tasks are: connection between systems, query meta-construction, resource discovery and information focusing, information access and correlation, and iteration.

      8. GARLIC

    The Garlic system is built on a complex wrapper architecture that describes local sources with an object-oriented language. These complex objects manually unify the local sources to define a global schema. The system provides an object-oriented schema, interprets object queries and creates execution plans that send pieces of the queries to the data servers. Queries are expressed in an SQL-like query language, and Garlic handles objects such as image, video, audio and text files.

  4. COMPARISON OF EXISTING METHODOLOGIES

    The features of the different projects are summarized in Table 3.

    1. SIMS. Source: HTML pages; Architecture: wrapper/mediator; Ontology type: single ontology; Language: Loom; Query processing: query transformation; Optimization: semantic query optimization.
    2. MOMIS. Source: structured and unstructured; Architecture: global schema builder; Ontology type: global virtual schema; Language: description logic; Query processing: SIM*, SLIM* and ARTEMIS; Optimization: semantic optimization.
    3. OBSERVER. Source: repository; Architecture: wrapper/ontology server; Ontology type: multiple ontologies; Language: description logic on CLASSIC or Loom; Query processing: query construction; Optimization: division into sub-queries, with or without loss of information.
    4. TSIMMIS. Source: structured and semi-structured; Architecture: wrapper/mediator; Ontology type: global ontology; Language: OEM-QL*; Query processing: wrapper/mediator; Optimization: translation of query.
    5. DOME. Source: structured and semi-structured; Architecture: ontology server/mapping server; Ontology type: multiple/shared ontology; Language: description logic on CLASSIC; Query processing: query engine; Optimization: query optimization.
    6. KRAFT. Source: Norman database; Architecture: wrapper/mediator/facilitator user agent; Ontology type: hybrid; Language: description logic on CLASSIC; Query processing: query facilitation; Optimization: no mechanism.
    7. Carnot and InfoSleuth. Source: repository; Architecture: broker agent/resource agent; Ontology type: InfoSleuth: multiple, Carnot: single; Language: description logic on CLASSIC; Query processing: query graph; Optimization: data-flow model execution.
    8. COIN. Source: semi-structured; Architecture: wrapper/context mediator; Ontology type: hybrid; Language: object-oriented data model, F-logic; Query processing: query rewriting; Optimization: semantic query optimizer.

    Table 3: Comparison of existing approaches

    *OEM: Object Exchange Model; *SIM: Source Integrator Module; *SLIM: Schemata Lexical Integrator Module

  5. ONTOLOGY BASED DATA INTEGRATION USING CACHE

    Ontology-based data integration combined with a cache provides improved access latency. The resultant data are cached and the necessary cache optimization techniques are applied to enhance the access speed. There are various cache optimization techniques, each of which works efficiently in different situations.
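    As a rough illustration of the idea, assuming a hypothetical fetch_from_sources() function standing in for the ontology-based federation step, a result cache placed in front of the integration layer could look like the following sketch.

    # Minimal sketch: answering repeated integration queries from a cache.
    # fetch_from_sources() and its behaviour are illustrative assumptions,
    # not part of any surveyed system.
    def fetch_from_sources(query: str) -> list:
        # Placeholder: query each source, map terms via the ontology, merge.
        return [f"integrated result for: {query}"]

    class QueryResultCache:
        def __init__(self):
            self._store = {}                     # query string -> cached result

        def run(self, query: str) -> list:
            if query in self._store:
                return self._store[query]        # hit: skip the federation step
            result = fetch_from_sources(query)   # miss: integrate, then cache
            self._store[query] = result          # a replacement policy (Section 7)
            return result                        # keeps this store bounded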

  6. CACHE OPTIMIZATION ALGORITHMS

    Caching and prefetching are the most popular techniques that play a key role in improving performance by keeping objects that are likely to be visited in the near future [9]. As the cache size is limited, a cache replacement policy is needed to manage the cache content. Cache replacement policies are classified based on certain factors. The important factors are listed below, and Table 4 classifies the replacement strategies built on them.

    1. Recency: time since the last reference to the object.

    2. Frequency: number of requests to an object.

    3. Size: size of the object.

    4. Cost of fetching the object: the cost to fetch an object from its source.

    5. Function based: the access latency of the object.

      1. Recency based: recency of use is the primary factor (e.g. LRU).
      2. Frequency based: the frequency count is the primary factor (e.g. LFU).
      3. Size based: the object size is the primary factor (e.g. SIZE).
      4. Function based: each object in the cache is associated with a utility value (e.g. GD-SIZE).

      Table 4: Types of cache replacement strategies

  7. CACHE REPLACEMENT ALGORITHMS

    Cache replacement algorithms play an extremely important role in caching. The following algorithms are based on the factors listed above.

    7.1. BELADY'S ALGORITHM

    Belady's algorithm is the most efficient policy: it always discards the information that will not be needed for the longest time in the future, and it therefore has the lowest page fault rate. It is popularly known as the optimal page replacement algorithm, as it always yields the optimal result. However, it is impossible to predict when a piece of information will next be needed, so the algorithm is generally not implemented in practice.
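    Because Belady's policy needs the future reference string, it is normally only simulated offline as a benchmark. A minimal sketch of such a simulation is given below; the reference string and cache size in the example are arbitrary.

    def belady_faults(references: list, capacity: int) -> int:
        """Simulate the optimal (Belady) policy over a known reference string."""
        cache, faults = set(), 0
        for i, page in enumerate(references):
            if page in cache:
                continue
            faults += 1
            if len(cache) < capacity:
                cache.add(page)
                continue
            # Evict the cached page whose next use is farthest in the future
            # (or that is never used again).
            def next_use(p):
                try:
                    return references.index(p, i + 1)
                except ValueError:
                    return float("inf")
            cache.remove(max(cache, key=next_use))
            cache.add(page)
        return faults

    if __name__ == "__main__":
        print(belady_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], capacity=3))  # 7 faults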

      7.2. LRU (LEAST RECENTLY USED)

        The least recently used page replacement algorithm is simple and easy to use. LRU follows the strategy of evicting the least recently used object first, and it keeps track of when each object was used, typically with a time stamp. When the cache exceeds its maximum size, it removes the object that has not been used for the longest time and puts the new object in place of the evicted one. If the cache is not full, it simply inserts the object into the cache.
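        A minimal LRU cache can be sketched with an ordered dictionary as follows; the capacity and the key/value types are illustrative assumptions.

        from collections import OrderedDict

        class LRUCache:
            """Evicts the least recently used entry once capacity is exceeded."""
            def __init__(self, capacity: int):
                self.capacity = capacity
                self._data = OrderedDict()       # order of keys tracks recency

            def get(self, key):
                if key not in self._data:
                    return None
                self._data.move_to_end(key)      # mark as most recently used
                return self._data[key]

            def put(self, key, value):
                if key in self._data:
                    self._data.move_to_end(key)
                self._data[key] = value
                if len(self._data) > self.capacity:
                    self._data.popitem(last=False)   # drop least recently used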

      7.3. LFU (LEAST FREQUENTLY USED)

    The LFU algorithm removes the objects that are least frequently used and puts the newly requested page in the freed space. It maintains a counter for each object that counts how often the object has been requested; based on the counter value, the frequency of each object is determined. When the cache exceeds its maximum size, the page with the lowest frequency value is the first to be replaced by the new incoming page.
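    A correspondingly small LFU sketch is shown below; breaking ties between equally frequent objects arbitrarily is an implementation choice here, not part of the policy itself.

    class LFUCache:
        """Evicts the entry with the smallest access count."""
        def __init__(self, capacity: int):
            self.capacity = capacity
            self._data = {}    # key -> value
            self._count = {}   # key -> access frequency

        def get(self, key):
            if key not in self._data:
                return None
            self._count[key] += 1
            return self._data[key]

        def put(self, key, value):
            if key not in self._data and len(self._data) >= self.capacity:
                victim = min(self._count, key=self._count.get)  # least frequently used
                del self._data[victim]
                del self._count[victim]
            self._data[key] = value
            self._count[key] = self._count.get(key, 0) + 1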

    7.4. MRU (MOST RECENTLY USED)

    MRU removes the most recently used object from the cache. This algorithm is useful where access to historical information takes place. It works similarly to LRU, but it evicts the object that was used most recently instead of the one used least recently.
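    MRU only changes which end of the recency order is evicted, as the sketch below (under the same assumptions as the LRU example) shows.

    from collections import OrderedDict

    class MRUCache:
        """Evicts the most recently used entry once capacity is exceeded."""
        def __init__(self, capacity: int):
            self.capacity = capacity
            self._data = OrderedDict()

        def get(self, key):
            if key not in self._data:
                return None
            self._data.move_to_end(key)
            return self._data[key]

        def put(self, key, value):
            if key not in self._data and len(self._data) >= self.capacity:
                self._data.popitem(last=True)    # drop the most recently used entry
            self._data[key] = value
            self._data.move_to_end(key)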

      7.5. LVC (LOWEST RELATIVE VALUE)

        As the name suggests, the object with the lowest relative value is removed from the cache first. A relative cost value is calculated for each cached object, and the object with the lowest relative value is evicted first.

      7.6. GREEDY DUAL-SIZE (GD-SIZE)

    The Greedy-Dual algorithm was introduced by Young. It is a generalization of LRU that deals with the case in which objects have the same size but different costs to fetch. The algorithm associates a value H with each cached object p. Initially, when an object is brought into the cache, H is set to the cost of bringing the object into the cache (the cost is non-negative). When a replacement needs to be made, the object with the lowest H value, MIN_H, is replaced, and the H values of all remaining objects are reduced by MIN_H. If an object p is accessed again, its current H value is restored to the original cost of bringing the object into the cache.
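    The following sketch follows the literal description above (subtracting MIN_H from every remaining object) rather than the usual inflation-value optimization; the fetch costs passed in are assumed to come from the caller.

    class GreedyDualCache:
        """Greedy-Dual: evict the object with the lowest retained value H."""
        def __init__(self, capacity: int):
            self.capacity = capacity
            self._value = {}   # key -> (current H, original fetch cost)

        def access(self, key, fetch_cost: float):
            if key in self._value:
                _, cost = self._value[key]
                self._value[key] = (cost, cost)   # hit: restore H to original cost
                return
            if len(self._value) >= self.capacity:
                victim = min(self._value, key=lambda k: self._value[k][0])
                min_h = self._value[victim][0]
                del self._value[victim]
                # Reduce every remaining H by the evicted object's H value.
                self._value = {k: (h - min_h, c) for k, (h, c) in self._value.items()}
            self._value[key] = (fetch_cost, fetch_cost)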

    7.7. SIZE

    This strategy tries to minimize the miss ratio by replacing one large object rather than a group of small ones. However, some of the small objects brought into the cache may never be accessed again. The SIZE strategy does not provide any mechanism to evict such objects, which leads to pollution of the cache.

    7.8. ARC (ADAPTIVE REPLACEMENT CACHE)

    Megiddo and Modha introduced ARC, which has some similarity to the 2Q algorithm [11]. ARC uses ghost lists to dynamically adapt the list sizes depending on the workload. It keeps track of both frequently used and recently used pages, along with some history of recently evicted pages, and maintains two lists that roughly follow the LRU and LFU algorithms.

    7.9. LRU-MIN

    The LRU-MIN algorithm minimizes the number of replacements made by LRU and prefers to keep smaller objects in the cache. If an incoming object has size S and there are cached objects of size at least S, LRU-MIN evicts the least recently used of those objects. If there is no object of size S or larger in the cache, the algorithm considers objects of size at least S/2 and evicts them in LRU order, and so on, until enough space is freed.
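    A rough sketch of that eviction loop is given below; the byte-based size accounting and threshold handling are simplifying assumptions.

    from collections import OrderedDict

    class LRUMinCache:
        """LRU-MIN: prefer evicting large objects, in LRU order, over small ones."""
        def __init__(self, capacity_bytes: int):
            self.capacity = capacity_bytes
            self.used = 0
            self._data = OrderedDict()   # key -> size, ordered by recency

        def access(self, key, size: int):
            if key in self._data:
                self._data.move_to_end(key)
                return
            threshold = size
            while self.used + size > self.capacity and self._data:
                # Evict, in LRU order, objects no smaller than the threshold;
                # if none qualify, halve the threshold and try again.
                victims = [k for k, s in self._data.items() if s >= threshold]
                if not victims:
                    threshold = max(1, threshold // 2)
                    continue
                victim = victims[0]          # least recently used qualifying object
                self.used -= self._data.pop(victim)
            self._data[key] = size
            self.used += size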

    7.10. CLOCK

    In CLOCK, all page frames are arranged in a circular list that resembles a clock, and the hand of the clock points to the oldest page in the cache. Each page has a reference bit that is set when the page is used; on a miss, the hand sweeps forward, clearing set bits and evicting the first page whose bit is already clear. The CLOCK algorithm decreases the overhead cost of maintaining a strict recency order.
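    A compact sketch of the clock sweep is shown below; the number of frames and the page identifiers are illustrative.

    class ClockCache:
        """CLOCK: approximate LRU using a circular buffer and reference bits."""
        def __init__(self, frames: int):
            self.frames = frames
            self.pages = [None] * frames    # page stored in each frame
            self.ref = [False] * frames     # reference bit per frame
            self.hand = 0

        def access(self, page) -> bool:
            if page in self.pages:
                self.ref[self.pages.index(page)] = True   # hit: set reference bit
                return False                               # no fault
            # Miss: advance the hand, giving referenced pages a second chance.
            while self.ref[self.hand]:
                self.ref[self.hand] = False
                self.hand = (self.hand + 1) % self.frames
            self.pages[self.hand] = page                   # evict and replace
            self.ref[self.hand] = True
            self.hand = (self.hand + 1) % self.frames
            return True                                    # page fault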

    7.11. CAR (CLOCK WITH ADAPTIVE REPLACEMENT)

    CAR attempts to merge the adaptive policy of ARC with the implementation efficiency of CLOCK [10]. The algorithm maintains four doubly linked lists C1, C2, L1 and L2. C1 and C2 are CLOCKs, while L1 and L2 are simple LRU lists. The size of C1 roughly equals that of L2, and the size of C2 roughly equals that of L1. It takes both the recency and the frequency of objects into account.

  8. CLASSIFICATION OF DIFFERENT CACHE REPLACEMENT ALGORITHMS

    The features of the different cache replacement algorithms are summarized in Table 5.

    1. Belady's Algorithm. Description: discards the information that will not be needed in the future. Advantage: removes information that is not needed for the longest time. Disadvantage: not suitable for practical implementation.
    2. LRU. Description: the least recently used objects are removed first. Advantage: the logic is easy to understand and it performs better as the cache size increases. Disadvantage: needs a larger cache to increase efficiency.
    3. LFU. Description: the least frequently used objects are removed first. Advantage: lower runtime complexity. Disadvantage: large storage of objects.
    4. MRU. Description: the most recently used objects are removed first. Advantage: simple to implement and suitable for particular applications. Disadvantage: low access latency.
    5. GD-SIZE. Description: objects of equal size, each with a different cost to fetch. Advantage: removes objects that are no longer requested. Disadvantage: does not take the previous frequency of the object into account.
    6. SIZE. Description: replaces one large object rather than many smaller ones. Advantage: keeps small objects in the cache. Disadvantage: small objects that are never used again are also stored.
    7. ARC. Description: maintains two lists, one for recency (LRU) and one for frequency (LFU). Advantage: lower overhead and self-tuning. Disadvantage: two lists need to be maintained.
    8. CLOCK. Description: a clock hand points to the oldest page in the cache. Advantage: very low overhead on the cache and removes the serialization problem. Disadvantage: does not take care of temporal filtering.
    9. CAR. Description: combines CLOCK with adaptive replacement. Advantage: adaptively and dynamically captures the "recency" and "frequency" features of a workload. Disadvantage: difficult to implement.

    Table 5: Classification of different cache replacement algorithms

  9. DATA STRUCTURES USED IN CACHE REPLACEMENT ALGORITHMS

    Different data structures are used to represent the nodes in a trie search. The choice of data structure depends on the cache characteristics as well as the fan-out of the node, and different implementations of the trie nodes use different structures. The commonly used data structures are:

    1. Partitioned array

    2. B-tree

    3. Hash table

    4. Queue

    9.1. HASH TABLE

    A hash table (hash map) is a data structure used to implement an associative array, a structure that can map keys to values. A hash table uses a hash function to compute an index into an array of buckets or slots, from which the correct value can be found. A good hash function and a good implementation are essential for hash table performance; the basic requirement is that the function should provide a uniform distribution of hash values. A perfect hash function can be used to create a perfect hash table that has no collisions. The running time complexity of a hash table is given in Table 6.

    Table 6 Running time of hash table
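    As an illustration of the structure described above, a minimal hash table with separate chaining is sketched below; the bucket count and the reliance on Python's built-in hash() are arbitrary choices.

    class ChainedHashTable:
        """Associative array using an array of buckets with separate chaining."""
        def __init__(self, buckets: int = 64):
            self._buckets = [[] for _ in range(buckets)]

        def _index(self, key) -> int:
            return hash(key) % len(self._buckets)   # hash function -> bucket slot

        def put(self, key, value):
            bucket = self._buckets[self._index(key)]
            for i, (k, _) in enumerate(bucket):
                if k == key:
                    bucket[i] = (key, value)         # overwrite existing key
                    return
            bucket.append((key, value))

        def get(self, key, default=None):
            for k, v in self._buckets[self._index(key)]:
                if k == key:
                    return v                         # average O(1), worst O(n)
            return default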

  10. CONCLUSION

Integrating existing databases is certainly not an easy task. Still, it is something that enterprises probably cannot avoid if they want to launch new applications or reorganize an existing information system. In this paper we presented a comparison of several systems that use ontologies to solve the problems involved in data integration. We also discussed several well-known cache replacement policies such as LRU, LFU, FIFO, ARC and CAR, and explained the characteristics of the different algorithms, which helps to characterize their behavior and to guide the development of new cache page replacement techniques in the future.

REFERENCES

  1. Rodrigo Fernandes Calhau, Ricardo de Almeida Falbo (2010), An Ontology-based Approach for Semantic Integration, IEEE International Enterprise Distributed Object Computing Conference, EDOC 2010.

  2. Hajmoosaei, A., & Abdul-Kareem, S. (2007, December), An ontology-based approach for resolving semantic schema conflicts in the extraction and integration of query-based information from heterogeneous web data sources, In Proceedings of the Third Australasian Workshop on Advances in Ontologies ,Volume 85 (pp. 35-43).

  3. Wache, H., Voegele, T., Visser, U., Stuckenschmidt, H., Schuster, G., Neumann, H., & Hübner, S. (2001, August), Ontology-based integration of information: a survey of existing approaches, In IJCAI-01 Workshop: Ontologies and Information Sharing (Vol. 2001, pp. 108-117).

  4. Preece, A., Hui, K. Y., Gray, A., Marti, P., Bench-Capon, T., Jones, D., & Cui, Z. (2000). The KRAFT architecture for knowledge fusion and transformation. Knowledge-Based Systems, 13(2), 113-120.

  5. Bayardo Jr, R. J., Bohrer, W., Brice, R., Cichocki, A., Fowler, J., Helal, A., & Woelk, D. (1997, June). InfoSleuth: agent-based semantic integration of information in open and dynamic environments. In ACM SIGMOD Record (Vol. 26, No. 2, pp. 195-206). ACM.

  6. Arens, Y., Chee, C. Y., Hsu, C. N., & Knoblock, C. A. (1993). Retrieving and integrating data from multiple information sources. International Journal of Intelligent and Cooperative Information Systems, 2(02), 127-158.

  7. Beneventano, D., Bergamaschi, S., Castano, S., Corni, A., Guidetti, R., Malvezzi, G., & Vincini, M. (2000, September). Information integration: the MOMIS project demonstration. In VLDB (pp. 611-614).

  8. Mena, E., Illarramendi, A., Kashyap, V., & Sheth, A. P. (2000). OBSERVER: An approach for query processing in global information systems based on interoperation across pre-existing ontologies. Distributed and parallel Databases, 8(2), 223-271.

  9. Amit S. Chavan, Kartik R. Nayak, Keval D. Vora, Manish D. Purohit and Pramila M. Chawan, A Comparison of Page Replacement Algorithms, IACSIT, Vol. 3, No. 2, April 2011, pp. 171-174.

  10. Bansal, S., & Modha, D. S. (2004, March). CAR: Clock with Adaptive Replacement. In FAST, Vol. 4, pp. 187-200.

  11. Megiddo, N., & Modha, D. S. (2004). Outperforming LRU with an adaptive replacement cache algorithm. Computer, 37(4), 58- 65.

  12. Kavar, C. C., & Parmar, S. S. (2013), Performance Analysis of LRU Page Replacement Algorithm with Reference to Different Data Structures, International Journal of Engineering Research and Applications (IJERA), Vol. 3, Issue 1, January-February 2013, pp. 2070-2076.
