Related Work
In the early stages of web-based information systems, Semantic Web markup was defined within the body of the web document itself. Researchers at that stage were familiar with the parts of the web where the appropriate Semantic Web ontologies resided and could track them through explicit URLs.
The second approach creates pairs of documents bound together through semantic markup: one document contains the HTML and the other contains the semantic markup. The two files are bound by a pointer in the HTML document that points to the URI of the second document. This method introduces other difficulties, such as associating semantic markup with individual portions of the HTML page and with its other components; with advances in technology, however, this can now be implemented using currently available standards.
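The pointer that binds an HTML page to its companion semantic-markup document can be sketched as follows. This is a minimal illustration, assuming the pointer is written as a `<link rel="meta" type="application/rdf+xml" href="...">` element (one common convention, not necessarily the one used by the systems described here); the class `MarkupLinkFinder`, the function `find_markup_documents`, and the sample URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MarkupLinkFinder(HTMLParser):
    """Collects URIs of companion semantic-markup documents linked from an HTML page.

    Assumption: the binding is expressed as
    <link rel="meta" type="application/rdf+xml" href="...">.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.markup_uris = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        mime = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        # Keep only pointers to RDF documents; resolve relative URIs against the page URL.
        if "meta" in rel and mime == "application/rdf+xml" and href:
            self.markup_uris.append(urljoin(self.base_url, href))


def find_markup_documents(html_text, base_url):
    """Return the absolute URIs of the semantic-markup documents an HTML page points to."""
    finder = MarkupLinkFinder(base_url)
    finder.feed(html_text)
    finder.close()
    return finder.markup_uris


if __name__ == "__main__":
    page = """<html><head>
    <link rel="stylesheet" href="style.css">
    <link rel="meta" type="application/rdf+xml" href="page.rdf">
    </head><body><p>Hello</p></body></html>"""
    print(find_markup_documents(page, "http://example.org/docs/page.html"))
    # ['http://example.org/docs/page.rdf']
```

Keeping the markup in a separate file means the HTML page only carries a pointer; the difficulty noted above is that nothing in this link says which portion of the page a given statement describes.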
Because there is no standardized binding process, it is very difficult to develop and maintain the relationship between these two documents. Today, people must be brought into the process of finding the required information and the answers to their queries; alternatively, this can be done by software agents, which is the central theme of the Semantic Web. To integrate the two processes, search and inference, a framework is proposed that should meet the following criteria:
- The framework should support both retrieval-driven and inference-driven processing.
- The retrieval process should be able to index terms by words, by semantic markup, or by both.
- Text-based retrieval engines may be used during the web search process.
The ontology-based information retrieval approach builds on the classic keyword-based technique, replacing the keyword-based index with a semantic knowledge base.
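The criterion that terms may be indexed by words, by semantic markup, or by both can be illustrated with a small inverted index. This sketch assumes a swangling-style encoding in which each RDF triple is hashed into a word-like token so that a text-based engine can index it next to ordinary words; the `swangle` scheme (SHA-1 digest, base-32, `sw` prefix), the `HybridIndex` class, and all sample URIs are hypothetical, not the exact encoding of any published system.

```python
import base64
import hashlib
import re
from collections import defaultdict


def swangle(subject, predicate, obj):
    """Encode an RDF triple as one word-like token (hypothetical scheme)."""
    digest = hashlib.sha1(f"{subject} {predicate} {obj}".encode("utf-8")).digest()
    # Base-32 keeps the token alphanumeric; lowercasing lets it pass through the tokenizer unchanged.
    return "sw" + base64.b32encode(digest).decode("ascii").rstrip("=").lower()


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class HybridIndex:
    """Inverted index whose terms are plain words, swangled triples, or both."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text="", triples=()):
        for word in tokenize(text):
            self.postings[word].add(doc_id)
        for triple in triples:
            self.postings[swangle(*triple)].add(doc_id)

    def search(self, words=(), triples=()):
        """Return the ids of documents containing every query word and every query triple."""
        terms = [t for w in words for t in tokenize(w)] + [swangle(*t) for t in triples]
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


if __name__ == "__main__":
    index = HybridIndex()
    index.add("d1", "Semantic web search", [("ex:Alice", "ex:worksFor", "ex:Acme")])
    index.add("d2", "Semantic web inference")
    print(sorted(index.search(words=["semantic"])))  # ['d1', 'd2']
    print(index.search(words=["semantic"], triples=[("ex:Alice", "ex:worksFor", "ex:Acme")]))  # {'d1'}
```

Because swangled terms look like ordinary words, the same posting lists serve word-only, markup-only, and mixed queries.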
The proposed system takes an RDQL query as input. The input query may be generated from keywords, from natural language, or through form-based input; sophisticated UI methods for generating it are not considered in this paper. When executed, the RDQL query returns the rows that satisfy it. The documents annotated with the matching instances are then retrieved, ranked, and presented to the user.
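The step in which the query returns the rows that satisfy it can be mimicked with a toy triple-pattern matcher. This is not an RDQL engine; it is a minimal sketch, assuming the knowledge base is an in-memory list of triples and the query is a conjunction of patterns whose `?`-prefixed terms are variables, in the spirit of an RDQL `SELECT ... WHERE (...)` clause. The KB contents and names such as `ex:Company` and `run_query` are hypothetical.

```python
def is_variable(term):
    """Terms beginning with '?' are query variables, as in RDQL."""
    return term.startswith("?")


def match_pattern(pattern, triple, bindings):
    """Return bindings extended so that pattern matches triple, or None if it cannot."""
    extended = dict(bindings)
    for p_term, t_term in zip(pattern, triple):
        if is_variable(p_term):
            if p_term in extended and extended[p_term] != t_term:
                return None  # variable already bound to a different value
            extended[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return extended


def run_query(kb, patterns, select):
    """Evaluate a conjunction of triple patterns and return the distinct selected rows."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in kb:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return sorted({tuple(s[v] for v in select) for s in solutions})


# Hypothetical knowledge base of (subject, predicate, object) triples.
KB = [
    ("ex:Acme", "rdf:type", "ex:Company"),
    ("ex:Acme", "ex:locatedIn", "ex:Spain"),
    ("ex:Globex", "rdf:type", "ex:Company"),
    ("ex:Globex", "ex:locatedIn", "ex:France"),
    ("ex:Madrid", "rdf:type", "ex:City"),
]

if __name__ == "__main__":
    # Roughly: SELECT ?c WHERE (?c, <rdf:type>, <ex:Company>), (?c, <ex:locatedIn>, <ex:Spain>)
    # (prefix declarations omitted)
    patterns = [("?c", "rdf:type", "ex:Company"), ("?c", "ex:locatedIn", "ex:Spain")]
    print(run_query(KB, patterns, ["?c"]))  # [('ex:Acme',)]
```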
RDQL can express properties of documents that involve instances of the domain ontologies. After the query is executed, a result set is generated and returned to the system. If the result set consists only of tuples formed by domain concepts, the retriever follows the outgoing annotation links of those instances and collects all the annotated documents from the repository.
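Following the annotation links from the instances in the result set, and ranking the documents collected, might look like the sketch below. The annotation store and its weights are hypothetical, and scoring a document by the sum of its annotation weights is an assumed simplification for illustration, not the ranking model of the system described here.

```python
from collections import defaultdict

# Hypothetical annotation store: instance URI -> {document id: annotation weight}.
ANNOTATIONS = {
    "ex:Acme": {"doc1": 0.9, "doc3": 0.3},
    "ex:Globex": {"doc2": 0.7, "doc3": 0.5},
}


def retrieve_documents(rows, annotations):
    """Follow annotation links from the instances in the result rows and rank documents.

    Assumption: a document's score is the sum of its annotation weights for the
    distinct instances in the result set. Terms with no annotation links
    (literals, for example) are skipped.
    """
    instances = {term for row in rows for term in row}
    scores = defaultdict(float)
    for instance in instances:
        for doc_id, weight in annotations.get(instance, {}).items():
            scores[doc_id] += weight
    # Highest score first; ties are broken by document id so the order is stable.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    ranked = retrieve_documents([("ex:Acme",), ("ex:Globex",)], ANNOTATIONS)
    print([doc_id for doc_id, _ in ranked])  # ['doc1', 'doc3', 'doc2']
```

Deduplicating instances first keeps an instance that appears in several rows from being counted more than once.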
Experimental Testing
The proposed system has been tested on a medium-sized corpus of 145,316 documents, using the KIM domain ontology and knowledge base (KB), which are publicly available with the KIM platform. This implementation is considered compatible with RDF and OWL. The complete KB consists of 281 classes, 138 properties, 35,689 instances, and 465,848 sentences stored in a database. The KIM KB also provides concept-keyword mappings.
The retrieval algorithm was tested on various example queries and compared with keyword-only search implemented using the Jakarta Lucene library.
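A keyword-only baseline of the kind used for comparison can be approximated with plain TF-IDF scoring. The evaluation itself used Jakarta Lucene (a Java library); the sketch below is a simplified scorer written for illustration, not Lucene's scoring function, and `keyword_search` and the sample documents are hypothetical. It makes the baseline's limitation concrete: a document is found only if it literally shares words with the query.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_search(docs, query):
    """Rank documents by a simple TF-IDF score over the query keywords.

    docs maps document id -> text. Returns (doc_id, score) pairs, best first,
    omitting documents that share no keyword with the query.
    """
    n_docs = len(docs)
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    # Document frequency: number of documents containing each term at least once.
    doc_freq = Counter(term for counts in term_counts.values() for term in counts)
    query_terms = set(tokenize(query))
    results = []
    for doc_id, counts in term_counts.items():
        score = 0.0
        for term in query_terms:
            if counts[term]:
                idf = math.log(1 + n_docs / doc_freq[term])
                score += counts[term] * idf
        if score > 0:
            results.append((doc_id, score))
    return sorted(results, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    docs = {
        "a": "semantic web search engines",
        "b": "web pages and web links",
        "c": "cooking recipes",
    }
    print([doc_id for doc_id, _ in keyword_search(docs, "Semantic Web")])  # ['a', 'b']
```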
In the previous section, text-based search is not used; only a few terms are marked up with swangling. There is, however, no reason not to include text in the web query, and text supplied as input to the system can influence the ordering of search results. Because swangled markup is encoded as word-like tokens, a query that combines text with swangled terms can be sent directly to search engines. The extractor can then separate the text from the markup of the retrieved pages.
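How text might be folded into a swangled query, and then used to influence result order, can be sketched as follows. The token encoding is a hypothetical SHA-1/base-32 scheme, not necessarily the one used by swangling implementations; `build_query` and `rerank_by_text` are illustrative names, and ranking by word overlap is an assumed, deliberately simple criterion.

```python
import base64
import hashlib
import re


def swangle(subject, predicate, obj):
    """Encode an RDF triple as one word-like token (hypothetical scheme)."""
    digest = hashlib.sha1(f"{subject} {predicate} {obj}".encode("utf-8")).digest()
    return "sw" + base64.b32encode(digest).decode("ascii").rstrip("=").lower()


def words_of(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_query(text, triples):
    """Combine free-text words and swangled triples into a single query string."""
    return " ".join(words_of(text) + [swangle(*t) for t in triples])


def rerank_by_text(results, query_text):
    """Reorder results so that pages sharing more query words come first.

    results: (doc_id, page_text) pairs in the engine's original order.
    sorted() is stable, so pages with equal overlap keep that order.
    """
    query_words = set(words_of(query_text))

    def overlap(item):
        return len(query_words & set(words_of(item[1])))

    return [doc_id for doc_id, _ in sorted(results, key=overlap, reverse=True)]


if __name__ == "__main__":
    print(build_query("Semantic web", [("ex:Alice", "ex:worksFor", "ex:Acme")]))
    hits = [("d1", "cooking recipes"), ("d2", "semantic web agents"), ("d3", "the web")]
    print(rerank_by_text(hits, "semantic web"))  # ['d2', 'd3', 'd1']
```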