Semantic Web Related Work

Related Work

            In the early stages of online web-based information systems, the Semantic Web was defined in terms of the body of web documents. Researchers of that period were familiar with the parts of the web where the relevant Semantic Web ontologies resided and could locate them through explicit URLs.

            Traditional text-based search can be improved with Semantic Web inference, and text-based search can in turn be used to facilitate and augment Semantic Web inference. Implementing this vision runs into several difficulties. Straightforward web search techniques cannot be used to index semantic markup. When an SGML-based document is indexed, search engines ignore the markup. Because Semantic Web documents consist largely of markup, they are effectively invisible to search engines. Even when embedded markup is detected and indexed, it is not processed as markup, because the search engine would have to distinguish between the text and the markup. Standard web search techniques are therefore not sufficient for retrieving documents by their semantic markup.

            Traditional search engines judge the relevance of documents on the basis of term statistics. Other techniques, such as thesaurus expansion and simple relevance calculation, are also used in the information retrieval process. This process looks only at the occurrence of terms and their instances and computes a similarity score from them; the size of the document and its irrelevant parts are not taken into account. That is why these techniques are less effective.
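
            The limitation described above can be illustrated with a minimal sketch. It assumes the simplest possible term statistic, a sum of raw occurrence counts, which is not necessarily what any particular search engine uses. The function names and both sample documents are hypothetical.

```python
from collections import Counter
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def raw_term_score(query: str, document: str) -> int:
    """Sum the raw occurrence counts of each distinct query term in the document."""
    counts = Counter(tokenize(document))
    return sum(counts[term] for term in set(tokenize(query)))


# Hypothetical documents: one short and focused, one long and mostly off-topic.
focused = "Ontology search with semantic markup."
padded = (
    "Our holiday blog. We mention search once, then search again, "
    "and a search box, plus markup, markup, and more markup. "
    + "Unrelated travel notes follow. " * 20
)

query = "semantic markup search"
scores = {
    "focused": raw_term_score(query, focused),
    "padded": raw_term_score(query, padded),
}
```

            Under this assumption the long, mostly irrelevant document scores higher than the short, focused one, because neither document length nor the off-topic portion enters the calculation.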

            Similarly, simple text-based searches are of limited use for inference. It remains to be determined whether plain text can be converted into a semantic representation automatically and well enough for the result to be used for inference. Optimizing the semantic interpretation is considered very difficult.

            No standardized, generally accepted process has yet been developed for manipulating documents that combine HTML text and semantic markup. There are two candidate approaches for handling such hybrid documents. In the first, the semantic markup is embedded directly in the HTML page. However, semantic markup languages such as RDF and OWL are used as standalone languages for representing knowledge and are not directly tied to text.

            The second approach creates document pairs: one document contains the HTML and the other contains the semantic markup. The two files are bound by a pointer from the first document to the URI of the second. This method raises other difficulties, such as associating semantic markup with isolated portions of the HTML page and with other components of the page, although it can be implemented with currently available standards.
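
            A minimal sketch of the document-pair approach follows. The text does not say which pointer mechanism is used, so the sketch assumes an HTML link element whose type is application/rdf+xml. The class name, function name, sample page, and URLs are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MarkupLinkFinder(HTMLParser):
    """Collect href values of <link> elements that advertise RDF/XML."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Assumption: the pointer is a <link> whose type is application/rdf+xml.
        if tag == "link" and attributes.get("type") == "application/rdf+xml":
            href = attributes.get("href")
            if href:
                self.hrefs.append(href)


def find_markup_uris(html: str, page_url: str) -> list[str]:
    """Return absolute URIs of the semantic markup documents bound to a page."""
    finder = MarkupLinkFinder()
    finder.feed(html)
    return [urljoin(page_url, href) for href in finder.hrefs]


# Hypothetical HTML half of a document pair.
page = """<html><head>
<title>Example page</title>
<link rel="meta" type="application/rdf+xml" href="page.rdf">
</head><body><p>Readable text.</p></body></html>"""

uris = find_markup_uris(page, "http://example.org/docs/page.html")
```

            A pointer of this kind binds the whole page to the markup document, which matches the difficulty noted above: it does not by itself say which portion of the HTML a given statement describes.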

            Because there is no standardized binding process, it is very difficult to develop and maintain the relationship between the two documents. People need to be brought into the process of finding the required information and the answers to queries; alternatively, this work can be done by software agents. The Semantic Web pursues the same theme. To integrate the two processes, search and inference, a framework is proposed that should meet the following criteria.

-    The framework should support both retrieval-driven and inference-driven processing.

-    The retrieval process should be able to use words, semantic markup, or both as index terms.

-    Text-based retrieval engines may be used during the web search process.

-    Inference-driven and retrieval-driven processing may be tightly coupled, since improvements in retrieval will also help to improve inference.

 Query Processing and Result Ranking

            The ontology-based information retrieval approach takes the classic keyword-based technique as its starting point, with a semantic knowledge base replacing the keyword-based index.

            The proposed system takes an RDQL query as input. How the query is produced (from keywords, from natural language, through a form-based interface, or through a more sophisticated UI) is not considered in this paper. When executed, the RDQL query returns the rows that satisfy it. The documents annotated with the instances that occur in those rows are then retrieved, ranked, and presented to the user.

            RDQL can express conditions on document properties that involve instances of the domain ontologies. Executing the query produces a record set, which is returned to the system. If the result set consists only of tuples formed from domain concepts, the retriever follows the outgoing annotation links of those instances and collects all the annotated documents from the repository.
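
            The retrieval step described above can be sketched as follows. The result tuples, the annotation links, and the function name are hypothetical, and plain dictionaries stand in for the knowledge base and the document repository.

```python
# Hypothetical result set: each tuple holds ontology instances bound by the query.
result_tuples = [
    ("kb:London", "kb:UnitedKingdom"),
    ("kb:Paris", "kb:France"),
]

# Hypothetical annotation links: instance -> documents annotated with it.
annotations = {
    "kb:London": ["doc1", "doc3"],
    "kb:Paris": ["doc2"],
    "kb:France": ["doc2", "doc4"],
}


def collect_annotated_documents(tuples, annotation_links):
    """Follow annotation links from every instance in the result tuples."""
    documents = []
    seen = set()
    for row in tuples:
        for instance in row:
            for doc_id in annotation_links.get(instance, []):
                # Keep each document once, in first-seen order.
                if doc_id not in seen:
                    seen.add(doc_id)
                    documents.append(doc_id)
    return documents


documents = collect_annotated_documents(result_tuples, annotations)
```

            The resulting list is only the candidate set; ranking is a separate step.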

            Implicit inference is supported through query expansion based on class hierarchies. Once the list of annotated documents has been prepared, the search engine computes the semantic similarity between the query and every document in the result set.
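
            The sketch below illustrates both ideas under stated assumptions. The text does not specify the similarity measure, so cosine similarity over class-weight vectors is used here purely as a stand-in. The class hierarchy, the document vectors, and the function names are hypothetical.

```python
import math

# Hypothetical class hierarchy: class -> direct subclasses.
subclasses = {
    "Location": ["City", "Country"],
    "City": ["Capital"],
}


def expand_class(cls: str, hierarchy: dict[str, list[str]]) -> set[str]:
    """Return the class together with all of its transitive subclasses."""
    expanded = {cls}
    stack = [cls]
    while stack:
        for child in hierarchy.get(stack.pop(), []):
            if child not in expanded:
                expanded.add(child)
                stack.append(child)
    return expanded


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Query vector: weight 1.0 for every class matched after expansion.
query_vector = {cls: 1.0 for cls in expand_class("City", subclasses)}

# Hypothetical document vectors: annotation counts per class.
document_vectors = {
    "doc1": {"Capital": 2.0, "Person": 1.0},
    "doc2": {"Country": 3.0},
    "doc3": {"City": 1.0, "Capital": 1.0},
}

# Rank documents by decreasing similarity to the expanded query.
ranking = sorted(
    document_vectors,
    key=lambda doc_id: cosine(query_vector, document_vectors[doc_id]),
    reverse=True,
)
```

            Without the expansion step, a query for City would not match documents annotated only with Capital instances.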

Experimental Testing         

            The proposed system was tested on a medium-sized corpus of 145,316 documents. The KIM domain ontology and knowledge base (KB), which are publicly available with the KIM platform, were used. That implementation is considered compatible with RDF and OWL. The complete version of the KB consists of 281 classes, 138 properties, 35,689 instances, and 465,848 sentences stored in a database. The KIM KB also provides concept-keyword mappings.

            The retrieval algorithm was tested on various examples and compared with keyword-only search, using the Jakarta Lucene library.

 Using Text

            In the previous section, text-based search was not used; only a few terms were marked up with swangling. There is no reason not to incorporate text in the web query. Text supplied as input to the system can influence the ordering of the search results. Because the markup is swangled, a query that also contains text can be sent directly to search engines. An extractor can then separate the text and the markup of the retrieved pages.
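
            The text does not define swangling. The sketch below assumes the usual reading of the term: an RDF triple is turned into opaque, word-like tokens that an ordinary text search engine can index and match. The hashing scheme, the wildcard patterns, the function names, and the sample URIs are illustrative assumptions, not the scheme used by the system described here.

```python
import base64
import hashlib


def swangle_term(subject: str, predicate: str, obj: str) -> str:
    """Hash one triple pattern into an opaque, word-like token."""
    digest = hashlib.sha1(f"{subject} {predicate} {obj}".encode("utf-8")).digest()
    # Base32 yields only letters and digits, so the token survives tokenizers.
    return base64.b32encode(digest).decode("ascii").rstrip("=").lower()


def swangle_triple(subject: str, predicate: str, obj: str) -> list[str]:
    """Return tokens for the full triple and for patterns with one wildcard."""
    wildcard = "*"
    patterns = [
        (subject, predicate, obj),
        (wildcard, predicate, obj),
        (subject, wildcard, obj),
        (subject, predicate, wildcard),
    ]
    return [swangle_term(*pattern) for pattern in patterns]


# Hypothetical triple and free-text part of the query.
tokens = swangle_triple(
    "http://example.org/doc1",
    "http://purl.org/dc/elements/1.1/creator",
    "http://example.org/people/alice",
)

# Plain words and a swangled token travel together in one text query.
web_query = " ".join(["semantic", "search"] + tokens[:1])
```

            Because the swangled tokens look like ordinary words, the same query string can carry both free text and markup terms.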
