
Using JSON and SPARQL to map Insurance Domain Data from Spreadsheets to OWL

DOI : https://doi.org/10.5281/zenodo.18659684

Sakshi Gupta, Lalit Sen Sharma, Neha Jain

Department of Computer Science & IT, University of Jammu, Jammu and Kashmir, India

Abstract – Semantic Web tools support the smooth transfer, access and interpretation of information in the traditionally complex insurance industry, which is characterised by local standards, varied metadata elements and isolated data models. The Auto ontology introduces a new data model for digital automobile insurance. Standardised products and relationships with well-defined properties ensure a common understanding of information and make domain assumptions explicit, allowing organisations to make better sense of their data and establishing interoperability between insurers, partners and customers.

The ontology is built in the Protégé ontology editor, implemented in OWL and queried using SPARQL.

Keywords: Semantic Web, knowledge representation, Protégé, OWL (Web Ontology Language), RDF, Cellfie, SPARQL (SPARQL Protocol and RDF Query Language).

  1. INTRODUCTION

    The Auto ontology introduces a new data model for digital automobile insurance. Standardised products and relationships with well-defined properties ensure a common understanding of information and make domain assumptions explicit, allowing organisations to make better sense of their data and establishing interoperability between insurers, partners and customers. Although the automobile industry needs standardised insurance data, there is still no widely accepted ontology for this domain. Different companies use different policy terms, which can lead to ambiguity and conflicting interpretations across the industry. We therefore see a need for a universal ontology that can serve as a unified platform for all companies. Auto is such a unified ontology: it describes a set of concepts within the automobile insurance domain and the relationships among them, providing a uniform way for the associated stakeholders to communicate.

    Knowledge differs from data and information: data is raw and unprocessed, information is processed data, and knowledge is what is gained through experience and skills [1]. Around the world, the emphasis is shifting from data to knowledge, just as the web is shifting from a document model to a data model. The Semantic Web, an extension of the current web [2], aims to enrich the Web with a layer of machine-interpretable metadata so that computer programs can predictably derive new information and knowledge, providing better data integration, interoperation and more intelligent support for end users.

    Since ontology development in the insurance domain is underexplored, we created the ontology from scratch. What distinguishes the Auto ontology is that spreadsheet-based insurance data is mapped into it systematically with the help of JSON transformation rules. The ontology is evaluated using SPARQL queries, and it aims to improve interoperability, reusability and intelligent querying in the insurance sector.

    Ontology creation in any domain requires a syntax for representing metadata and vocabularies for expressing it. The W3C (World Wide Web Consortium) has defined open standards for this purpose, such as RDF (Resource Description Framework) and OWL (Web Ontology Language), together with the SPARQL query language; other languages, such as SQWRL, build on these standards.

    In this paper, a Knowledge Representation (KR) of the automobile insurance domain is presented using Semantic Web tools. Ontology development is an evolving process, so there is scope for further enhancement. Section 2 discusses related work in multiple domains, and Section 3 gives a brief overview of the tools and techniques used for knowledge representation. Section 4 evaluates the ontology using SPARQL queries, and Section 5 concludes with the future scope of our research.

  2. RELATED WORK

    An extensive amount of work has been done on knowledge representation in various domains using Semantic Web (SW) technologies. We conducted a thorough literature review to study the different Knowledge Representation (KR) methodologies that use SW technologies.

    Research in the Semantic Web has created a robust framework for knowledge representation and interoperability that allows concepts, relations and constraints to be machine-interpreted across numerous domains [2, 12]. Ontology-based frameworks have become part of various knowledge-based industrial business applications. The oil and gas industry has used Semantic Web technology to detect the cause of external corrosion: data on the state of corroding assets was collected in CSV files and transformed into RDF, an ontology model was developed to address the heterogeneity of the data, and knowledge was inferred from the ontological data [3]. This technology supported the design of a system that could detect corrosion efficiently and precisely. City data in the urban space domain has also been represented using OWL 2: a knowledge base was created covering public and private buildings and open spaces along a street, the street knowledge was represented using the Urban Morphology Ontology (UMO) and the UrbanGen tool, and alternative street patterns were generated for street designers [4].

    An overview of SW and KR was given in [5]. Researchers have also extended ontologies to represent temporal knowledge, both quantitative and qualitative; the extension for quantitative temporal knowledge was implemented using the Semantic Web Rule Language (SWRL) and OWL axioms [6]. To reason about temporal knowledge, an interval-based temporal ontology language, TL-OWL, was created [7]. Ontology-based knowledge representation and reasoning techniques have also been used to provide robots with knowledge about their environment [8]. In the geo-visualisation domain, multiple geometric knowledge representations were produced using Turtle, the Apache Jena RDF API and ontologies [9]. Finally, a knowledge base (KB) was created using the SPIN (SPARQL Inferencing Notation) rule engine, in which knowledge from multiple rescue missions was stored in different formats, such as OWL and RDF triples, using JSON rules [10].

    Before creating the Auto ontology, we searched for existing ontologies of automobile insurance data. However, we found none with the characteristics required for extension and reuse, so the next step was to create an automobile insurance ontology from scratch, using rules and data from the automobile insurance domain. The aim is to provide a framework for representing shareable and reusable knowledge across the insurance domain, with interconnected concepts and suitable relationships that lead to interoperable, linked and coherent data.

  3. TOOLS AND TECHNIQUES USED FOR REPRESENTATION

    An ontology is a formal description of knowledge as a collection of concepts within a domain and the connections between them. It guarantees a shared interpretation of the data and explicitly states the domain assumptions, enabling companies to better understand their data.

    Ontologies make it easier to connect individuals, groups and application systems within a domain. A number of formal languages have evolved for ontology development. These languages make it easier to encode knowledge about specific domains, and they usually provide reasoning features for processing the stored knowledge. Typically, they are declarative and based on description logic or first-order logic [11]. Auto was created in the W3C Web Ontology Language (OWL) [12] using Protégé 5.5.0, an ontology development environment. OWL is a Semantic Web language designed to represent rich and complex knowledge about objects, groups of objects and the relationships among them.
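    As a minimal illustration of the reasoning features mentioned above, the Python sketch below computes one basic entailment that OWL inherits from RDFS: rdfs:subClassOf is transitive, so an individual asserted to belong to one class is also an instance of all of its superclasses. It is not a description-logic reasoner, and apart from Person (which appears in the transformation rule of Table 1) the class names are hypothetical placeholders, not the actual classes of Figure 1.

```python
# Minimal sketch: transitive rdfs:subClassOf entailment over a toy hierarchy.
# Class names other than "Person" are hypothetical placeholders.

# Direct subclass assertions as (subclass, superclass) pairs.
SUBCLASS_OF: list[tuple[str, str]] = [
    ("PolicyHolder", "Person"),
    ("Person", "Agent"),
    ("MotorPolicy", "InsurancePolicy"),
]


def superclasses(cls: str, edges: list[tuple[str, str]]) -> set[str]:
    """Return every class that `cls` is directly or indirectly a subclass of."""
    parents: dict[str, set[str]] = {}
    for sub, sup in edges:
        parents.setdefault(sub, set()).add(sup)
    found: set[str] = set()
    stack = [cls]
    while stack:  # depth-first walk up the hierarchy
        for sup in parents.get(stack.pop(), set()):
            if sup not in found:  # also guards against cycles
                found.add(sup)
                stack.append(sup)
    return found


def inferred_types(asserted: dict[str, str],
                   edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Map each individual to its asserted class plus all inferred superclasses."""
    return {ind: {cls} | superclasses(cls, edges) for ind, cls in asserted.items()}


if __name__ == "__main__":
    print(sorted(inferred_types({"Aadhya": "PolicyHolder"}, SUBCLASS_OF)["Aadhya"]))
    # ['Agent', 'Person', 'PolicyHolder']
```

    A real OWL reasoner would derive the same types, along with many other entailments this sketch ignores.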

    Data was entered in Protégé (version 5.5.0) to create an ontology called Auto, containing the automobile insurance domain classes and their associated object properties. The class hierarchy is shown in Figure 1. The Cellfie plugin works with transformation rules written in JavaScript Object Notation (JSON). These rules were created while maintaining the integrity of the data, and they generate instances from the dataset, as shown in Figure 2.

    Figure 1: Hierarchical representation of the automobile domain

    The instances are created using the Cellfie plugin, which imports the spreadsheet data into the ontology. The transformation rules are written in the editor bundled with Cellfie and stored as JSON files, and axioms are generated according to these rules.
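    Since the rule files are plain JSON, they can be inspected with any JSON parser. The sketch below assumes the file layout shown in Table 1 (a top-level "Collections" array whose entries carry sheetName, startColumn/endColumn, startRow/endRow, comment, rule and active) and lists the active rules and the rows each one covers. Treating endRow "+" as "up to the last populated row" is our assumption about Cellfie's semantics, and the helper names are hypothetical.

```python
import json

# A rule file in the layout of Table 1; the rule text is shortened here.
RULE_FILE = r"""
{"Collections": [{"sheetName": "Sheet1", "startColumn": "A", "endColumn": "A",
  "startRow": "2", "endRow": "+",
  "comment": "Creating individuals for the policy holders",
  "rule": "Individual: @A*\nTypes: Person", "active": true}]}
"""


def active_collections(text: str) -> list[dict]:
    """Return the rule collections marked active in a Cellfie-style JSON rule file."""
    data = json.loads(text)
    return [c for c in data.get("Collections", []) if c.get("active", False)]


def row_range(collection: dict, last_populated_row: int) -> range:
    """Spreadsheet rows (1-based, inclusive) covered by a collection.

    Assumption: endRow "+" means "up to the last populated row".
    """
    start = int(collection["startRow"])
    end = collection["endRow"]
    stop = last_populated_row if end == "+" else int(end)
    return range(start, stop + 1)


if __name__ == "__main__":
    for c in active_collections(RULE_FILE):
        rows = row_range(c, last_populated_row=501)
        print(c["sheetName"], c["comment"], f"rows {rows.start}-{rows.stop - 1}", len(rows))
```

    With a header in row 1 and 500 data rows, row_range(rule, 501) covers rows 2 to 501, that is, 500 rows.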

    Figure 2: Instances created using JSON rules

    These instances are further represented as RDF triples. The table below shows two of the triples generated for one policy holder.

    Subject    Predicate          Object
    Aadhya     hasAge             41
    Aadhya     hasPolicyNumber    227811
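    For readers who want to see these triples in a standard serialisation, the sketch below writes them as N-Triples lines with typed literals. The namespace is an assumption, reconstructed from the PREFIX declaration used in the SPARQL queries of Table 2 with the stray spaces removed; the datatypes follow the transformation rule in Table 1 (xsd:integer for hasAge, xsd:double for hasPolicyNumber). The sketch also assumes individual names are valid IRI local names (no spaces).

```python
# Assumed namespace, reconstructed from the PREFIX in Table 2 (spaces removed).
AUTO = "http://www.semanticweb.org/saksh/ontologies/2023/7/untitled-ontology-13#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def escape_literal(value: str) -> str:
    """Escape characters that may not appear raw inside an N-Triples string literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def ntriple(subject: str, predicate: str, value: str, datatype: str) -> str:
    """Serialise one data property assertion as an N-Triples line with a typed literal.

    Assumes `subject` and `predicate` are valid IRI local names (no spaces).
    """
    return (f'<{AUTO}{subject}> <{AUTO}{predicate}> '
            f'"{escape_literal(value)}"^^<{XSD}{datatype}> .')


if __name__ == "__main__":
    # The two example triples above; datatypes follow the rule in Table 1.
    print(ntriple("Aadhya", "hasAge", "41", "integer"))
    print(ntriple("Aadhya", "hasPolicyNumber", "227811", "double"))
```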

    A customer dataset from the insurance sector was taken from the online platform Kaggle.com and mapped to the ontology using the Cellfie Protégé plugin [13]. The dataset was cleaned by removing unnecessary values, outliers and duplicate records. It contains 500 records, which are represented using the Cellfie plugin of Protégé. Each record holds values such as customer name, gender, age and policy number, and the knowledge in these records is represented using JSON transformation rules. A snapshot of the dataset is shown in Figure 3.
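    The paper does not specify how outliers were identified, so the sketch below covers only the other two cleaning steps, assuming the data is exported as CSV: dropping records with a missing required value and removing duplicate rows (compared after trimming surrounding whitespace). Which columns count as required is left to the caller; the function name is hypothetical.

```python
import csv
import io


def clean_rows(csv_text: str, required: list[str]) -> list[dict[str, str]]:
    """Drop rows with a missing required value, then drop duplicate rows.

    Outlier removal is not shown: the criterion used in the paper is not specified.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames or []
    seen: set[tuple] = set()
    cleaned: list[dict[str, str]] = []
    for row in reader:
        if any(not (row.get(col) or "").strip() for col in required):
            continue  # incomplete record
        key = tuple((row.get(f) or "").strip() for f in fields)
        if key in seen:
            continue  # duplicate of an earlier record
        seen.add(key)
        cleaned.append(row)
    return cleaned
```

    The first occurrence of each record is kept, so the original row order is preserved.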

    Figure 3: Glimpse of the dataset

    The resulting mapping of the dataset in Protégé forms the knowledge representation of the domain. An example JSON transformation rule is shown in Table 1. The mapping through the Cellfie 2.1.1 plugin represented the knowledge in the form of RDF, RDFS and OWL axioms. These rules were created for the knowledge representation of 500 individuals.

    Using the dataset and knowledge gathered from domain experts and other sources, 4595 axioms and 500 instances were generated, covering 34 classes and 30 data properties.

    {
      "Collections": [
        {
          "sheetName": "Sheet1",
          "startColumn": "A",
          "endColumn": "A",
          "startRow": "2",
          "endRow": "+",
          "comment": "Creating individuals for the policy holders",
          "rule": "Individual: @A*\nTypes: Person\nFacts: hasName @A*(xsd:string),\n hasAge @D*(xsd:integer),\n hasGender @B*(xsd:string),\n hasMonthsAsCustomer @C*(xsd:decimal),\n hasPolicyNumber @E*(xsd:double),\n hasEducation @G*(xsd:string),\n hasZip @F*(xsd:double)",
          "active": true
        }
      ]
    }

    Table 1: Transformation rule example
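    To make the effect of this rule concrete, the sketch below re-implements its Facts clause in plain Python for a single spreadsheet row keyed by column letter: the value in column A names the individual, which is typed as Person, and each other filled column becomes a data property assertion with the XSD type given in the rule. This only imitates what Cellfie does; in particular, skipping empty cells is our assumption, not documented Cellfie behaviour, and apply_rule is a hypothetical name.

```python
from decimal import Decimal

# Column -> (data property, XSD type), copied from the Facts clause of Table 1.
FACTS: dict[str, tuple[str, str]] = {
    "A": ("hasName", "string"),
    "B": ("hasGender", "string"),
    "C": ("hasMonthsAsCustomer", "decimal"),
    "D": ("hasAge", "integer"),
    "E": ("hasPolicyNumber", "double"),
    "F": ("hasZip", "double"),
    "G": ("hasEducation", "string"),
}

# Python stand-ins for the XSD datatypes used in the rule.
CASTS = {"string": str, "integer": int, "decimal": Decimal, "double": float}


def apply_rule(row: dict[str, str]) -> list[tuple[str, str, object]]:
    """Imitate the rule for one row: name the individual after column A,
    type it as Person and add one data property assertion per filled cell."""
    subject = row["A"]  # corresponds to "Individual: @A*"
    triples: list[tuple[str, str, object]] = [(subject, "rdf:type", "Person")]
    for column, (prop, xsd_type) in FACTS.items():
        cell = row.get(column, "").strip()
        if cell:  # assumption: empty cells produce no assertion
            triples.append((subject, prop, CASTS[xsd_type](cell)))
    return triples
```

    For example, apply_rule({"A": "Aadhya", "D": "41", "E": "227811"}) yields the type assertion plus hasName, hasAge and hasPolicyNumber triples, matching the example triples given earlier.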

  4. EVALUATION USING SPARQL QUERY

    RDF is a data model for describing resources on the World Wide Web and how they relate to each other; it stores data in the form of subject-predicate-object triples. RDF presents data as a graph with no hierarchy and no root node: resources are interrelated, and no resource takes priority over another. This structure is flexible, and relationships are not fixed in advance by a schema, so a correspondingly flexible query language is needed to query such data efficiently. SPARQL (SPARQL Protocol and RDF Query Language) [14] is the Semantic Web query language for accessing data stored in RDF; its later version, SPARQL 1.1, also supports updating that data.
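    Because the graph has no root, a SPARQL query is essentially a set of triple patterns that may match anywhere in the graph, with shared variables acting as joins. The sketch below implements this basic graph pattern matching over an in-memory list of triples. It is a simplification (no OPTIONAL, UNION, datatype handling or indexing) meant only to show the idea, not how Fuseki evaluates queries; the toy triples are illustrative.

```python
from typing import Optional

Triple = tuple[object, object, object]
Binding = dict[str, object]


def is_var(term: object) -> bool:
    """SPARQL-style variables are strings starting with '?'."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`; return None on conflict."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in extended and extended[p] != t:
                return None  # variable already bound to a different term
            extended[p] = t
        elif p != t:
            return None  # constant term does not match
    return extended


def evaluate(patterns: list[Triple], triples: list[Triple]) -> list[Binding]:
    """Evaluate a basic graph pattern: all patterns must match, joined on shared variables."""
    solutions: list[Binding] = [{}]
    for pattern in patterns:
        next_solutions: list[Binding] = []
        for binding in solutions:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


if __name__ == "__main__":
    # Toy data in the style of the Auto ontology (values are illustrative only).
    graph = [("p1", "hasName", "Aadhya"), ("p1", "hasAge", 41),
             ("p2", "hasName", "B"), ("p2", "hasAge", 39)]
    people = evaluate([("?Person", "hasName", "?Name"), ("?Person", "hasAge", "?Age")], graph)
    print([s["?Name"] for s in people if s["?Age"] >= 40])  # ['Aadhya']
```

    The final list comprehension plays the role of a FILTER: it is applied to the solutions after the patterns have been joined.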

    The Auto ontology was evaluated using SPARQL. To run the SPARQL queries, Apache Jena Fuseki version 4.5.0 was installed, and the queries were executed from the command prompt. Figure 4 shows a snapshot of the Fuseki server.
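    Besides the command prompt and the web interface, Fuseki accepts queries over HTTP following the SPARQL 1.1 Protocol. The sketch below builds such a request with Python's standard library. The endpoint URL is an assumption: Fuseki listens on port 3030 by default, but the dataset name auto and the /sparql service path depend on how the server was configured.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: Fuseki's default port is 3030; the dataset name "auto"
# and the "/sparql" service path depend on the server configuration.
ENDPOINT = "http://localhost:3030/auto/sparql"


def build_query_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol query request (URL-encoded POST, JSON results)."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
    )


def run_query(endpoint: str, query: str) -> dict:
    """Send the query and decode the JSON results (requires a running server)."""
    with urllib.request.urlopen(build_query_request(endpoint, query)) as response:
        return json.load(response)
```

    With a server running, run_query(ENDPOINT, query) would return a dictionary in the SPARQL 1.1 JSON results format, with the variable names under "head" and the solutions under "results".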

    Figure 4: Snapshot of the Fuseki server

    The SPARQL queries were executed against the Auto ontology: the ontology, containing 4596 triples, was uploaded to the server, and we developed different query scenarios to validate it. Table 2 shows two of these queries: the first extracts the names, policy numbers and ages of the policy holders, and the second finds the persons aged 40 or above.

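    PREFIX auto: <http://www.semanticweb.org/saksh/ontologies/2023/7/untitled-ontology-13#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    # Query 1: names, policy numbers and ages of all policy holders
    SELECT ?Name ?PolicyNumber ?Age
    WHERE {
      ?Person auto:hasName ?Name .
      ?Person auto:hasPolicyNumber ?PolicyNumber .
      ?Person auto:hasAge ?Age .
    }

    PREFIX auto: <http://www.semanticweb.org/saksh/ontologies/2023/7/untitled-ontology-13#>

    # Query 2: persons aged 40 or above
    SELECT ?Person
    WHERE {
      ?Person auto:hasAge ?Age .
      FILTER (?Age >= 40)
    }

    Table 2: SPARQL queries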

    The results of the two queries, the extraction of the dataset fields and the query with the FILTER condition, are shown in the snapshots below:

    Figure 5: Snapshots of the result of SPARQL query

    The queries were executed over the dataset of 500 policy holders. The first query returned 500 results, one per policy holder, and the second returned 225 results.
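    One way to cross-check these counts is to request the results of the first query in the SPARQL 1.1 JSON results format and apply the FILTER condition of the second query on the client side; if every policy holder has a name, policy number and age, the client-side count should equal the second query's result count. The sketch below assumes the variable names used in Table 2 (Name, PolicyNumber, Age) and that every Age value is an integer literal; the function names are hypothetical.

```python
import json


def rows(results_json: str) -> list[dict[str, str]]:
    """Flatten a SPARQL 1.1 JSON results document into {variable: value} rows.

    Unbound variables are simply absent from a row, as in the format itself.
    """
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    return [{v: b[v]["value"] for v in variables if v in b}
            for b in doc["results"]["bindings"]]


def count_at_least(result_rows: list[dict[str, str]], var: str, threshold: int) -> int:
    """Client-side equivalent of FILTER (?var >= threshold) for integer values."""
    return sum(1 for r in result_rows if var in r and int(r[var]) >= threshold)
```

    Note that the condition is >= rather than >, so policy holders aged exactly 40 are counted, as in the second query.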

  5. CONCLUSION AND FUTURE SCOPE

The sheer volume of data being generated and ever-growing data requirements have driven Semantic Technology research to progress and to meet industrial expectations by improving query access, performance and inferencing capability. An insurance ontology can serve as a common, standardised vocabulary that supports communication and knowledge exchange between insurance partners in the form of Linked Data. Using the Auto ontology, we have shown that query formulation and execution can be made more comprehensible and that reasoning-based queries can be enabled, helping industries facing ever-increasing datasets to make meaningful use of them and to use semantic analytics to their advantage.

Protégé 5.5.0 served as a reliable tool for this work because of its suitability for creating interoperable, reusable and unambiguous ontologies.

REFERENCES:

  1. S. Baškarada and A. Koronios, "A Semiotic Theoretical and Empirical Exploration of the Hierarchy and its Quality Dimension." [Online]. Available: https://ssrn.com/abstract=2304010
  2. T. Berners-Lee, J. Hendler, and O. Lassila, "The semantic web," Scientific American, vol. 284, no. 5, 2001, doi: 10.1038/scientificamerican0501-34.
  3. M. Saeed, C. Chelmis, V. Prasanna, R. House, and J. Blouin, "Semantic Web Technologies for External Corrosion Detection in Smart Oil Fields," SPE-174042-MS, 2015.
  4. M. Berta, L. Caneparo, A. Montuori, and D. Rolfo, "Semantic urban modelling: Knowledge representation of urban space," Environ Plann B Plann Des, vol. 43, no. 4, pp. 610-639, Jul. 2016, doi: 10.1177/0265813515609820.
  5. N. Ayari, A. Chibani, Y. Amirat, and E. Matson, "A semantic approach for enhancing assistive services in ubiquitous robotics," Rob Auton Syst, vol. 75, pp. 17-27, Jan. 2016, doi: 10.1016/j.robot.2014.10.022.
  6. S. Batsakis, E. G. M. Petrakis, I. Tachmazidis, and G. Antoniou, "Temporal representation and reasoning in OWL 2," Semant Web, vol. 8, no. 6, pp. 981-1000, 2017, doi: 10.3233/SW-160248.
  7. S.-K. Kim, M.-Y. Song, C. Kim, S.-J. Yea, H. C. Jang, and K.-C. Lee, "Temporal Ontology Language for Representing and Reasoning Interval-Based Temporal Knowledge," LNCS 5367.
  8. R. Gayathri and V. Uma, "Ontology based knowledge representation technique, domain modeling languages and planners for robotic path planning: A survey," ICT Express, vol. 4, no. 2, pp. 69-74, Jun. 2018, doi: 10.1016/j.icte.2018.04.008.
  9. W. Huang and L. Harrie, "Towards knowledge-based geovisualisation using Semantic Web technologies: a knowledge representation approach coupling ontologies and rules," Int J Digit Earth, vol. 13, no. 9, pp. 976-997, Sep. 2020, doi: 10.1080/17538947.2019.1604835.
  10. F. Yazdani, S. Blumenthal, N. Huebel, A. K. Bozcuoğlu, M. Beetz, and H. Bruyninckx, "Query-based integration of heterogeneous knowledge bases for search and rescue tasks," Rob Auton Syst, vol. 117, pp. 80-91, Jul. 2019, doi: 10.1016/j.robot.2019.03.013.
  11. F. Baader, I. Horrocks, and U. Sattler, "Description Logics as Ontology Languages for the Semantic Web," LNAI 2605.
  12. P. F. Patel-Schneider, P. Hayes, and I. Horrocks, "OWL Web Ontology Language semantics and abstract syntax," W3C Recommendation, 2004.
  13. M. J. O'Connor, C. Halaschek-Wiener, and M. A. Musen, "Mapping master: A flexible approach for mapping spreadsheets to OWL," in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer Verlag, 2010, pp. 194-208, doi: 10.1007/978-3-642-17749-1_13.
  14. E. Prud'hommeaux and A. Seaborne, "SPARQL Query Language for RDF," W3C Recommendation, Jan. 2008. [Online]. Available: http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/

Declarations:

Ethical Approval: This study did not require formal ethical approval as it involved the use of publicly available data and did not involve human participants or animals. All procedures adhered to ethical research standards, including respect for privacy and confidentiality. The research was conducted in accordance with applicable institutional guidelines and regulations. No sensitive or personal data were collected, and the study did not pose any risks to individuals or groups. Ethical considerations were carefully observed throughout the research process to ensure the integrity and reliability of the findings.

Consent to publish: I, Sakshi Gupta, hereby give my consent for the publication of the research paper titled ‘Using JSON and SPARQL to map Insurance Domain Data from Spreadsheets to OWL’ in Multimedia Tools and Applications. I confirm that the work is original, and all necessary approvals, including ethical and institutional, have been obtained. I have contributed significantly to the research and the writing process, and all co-authors, if any, have given their consent as well. I understand that the paper will be publicly available and may be subject to peer review. I agree to abide by the publication's terms and conditions, including copyright policies.

Consent to participate: I, Sakshi Gupta, give my consent to participate in the research work titled ‘Using JSON and SPARQL to map Insurance Domain Data from Spreadsheets to OWL’ conducted by me. I understand the objectives of the study, the procedures involved, and any possible risks or benefits. I am aware that my participation is entirely voluntary, and I can withdraw at any time without any consequence. I also understand that my data will be kept confidential and used only for the purposes of this research.

Author Contributions: The present study, by Sakshi Gupta, Lalit Sen Sharma and Neha Jain, introduces an ontological framework for representing insurance domain data. The current insurance sector is quite messy and unreliable because it is driven by company agents. Our work brings a semantic view to the knowledge of the insurance domain. The ontology is an explicit representation of the concepts used, and it offers a reliable way to incorporate the conceptual knowledge of this domain.

Funding Information: No funding was received.

Availability of data and materials: The data was taken from the online platform Kaggle.com and can be accessed from https://1drv.ms/x/s!AvDStW2wttluoAu6HmB8iRLoDDdr?e=1ZIcX9.

Competing Interests Statement: The authors declare that they have no competing interests. There are no financial, professional, or personal conflicts that could have influenced the research, analysis, or conclusions presented in this manuscript.