THE QUEST FOR INFORMATION RETRIEVAL ON THE SEMANTIC WEB

 Abstract

            Since semantic search has been well received, it is now being used to develop Semantic Web applications. This paper presents a literature review of an ontology-based knowledge base (KB) model proposed to enhance search over large document corpora. The proposed retrieval model is based on an adapted version of the classical vector space model, combined with an annotation weighting algorithm and a ranking algorithm. To achieve tolerance to KB incompleteness, semantic search is combined with keyword-based search. The model has been tested on corpora of reasonable size, with favorable results, particularly for keyword-based searches, paving the way for further research and analysis aimed at reaching the targeted results in the desired time.

 Introduction

            The Semantic Web attracted considerable interest in the late 1990s, and semantic search became one of its most evident benefits. A semantic search engine is usually understood as a tool that accepts ontology-based queries (for example in RDQL, RQL or SPARQL) from a client, executes them against a knowledge base, and returns tuples of ontology values as the answer. Such engines normally use a Boolean search model, which relies on a strict representation of the information space as unambiguous, non-redundant, formal pieces of ontological knowledge.
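
            As a minimal sketch of this Boolean style of retrieval, the example below matches a single triple pattern against a toy knowledge base. It does not use a real SPARQL engine; the triples, the `match` function and the `?x` variable syntax are assumptions made for illustration only.

```python
# Toy knowledge base of (subject, predicate, object) triples.
# All names here are hypothetical and exist only for this illustration.
TRIPLES = {
    ("Paris", "type", "City"),
    ("Paris", "capitalOf", "France"),
    ("Lyon", "type", "City"),
    ("France", "type", "Country"),
}


def match(pattern, triples):
    """Return one variable binding per triple that satisfies the pattern.

    Terms starting with '?' are variables; every other term must match
    exactly. A triple either satisfies the pattern or it does not, so
    there is no notion of partial relevance or ranking.
    """
    results = []
    for triple in sorted(triples):  # sorted only to make the output stable
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            results.append(binding)
    return results


cities = match(("?x", "type", "City"), TRIPLES)
print(cities)
```

            The answer is a set of exact values (here, the bindings for `?x`), not a ranked list of documents.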

            Each piece of knowledge is either a true or a false answer to the request issued by the system or the user. It is assumed that the search results are always exact and that the information retrieved as an answer cannot be inconsistent.

            In the Information Retrieval view, by contrast, when a user submits a request to a search engine, the engine collects the documents that contain the requested values and returns those documents in the result set instead of the exact values themselves. To improve the presentation of the results, the search engine ranks the documents by their similarity to the query. The most relevant documents are placed at the top of the result set for the user's convenience.
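
            A minimal sketch of similarity-based ranking is shown below. It assumes raw term counts and cosine similarity, which is one common choice and not necessarily the measure used by any particular engine; the documents and function names are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words vector of raw term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return document ids ordered from most to least similar to the query."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(text)), doc_id) for doc_id, text in docs.items()]
    # Sort by descending score; ties are broken by document id.
    return [doc_id for score, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical three-document corpus.
DOCS = {
    "d1": "semantic web search with ontologies",
    "d2": "cooking recipes for pasta",
    "d3": "keyword search on the web",
}

ranking = rank("semantic search", DOCS)
print(ranking)
```

            Every document receives a score, so even a document that matches only part of the query appears in the result set, just lower in the ranking.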

            A retrieval model based purely on Boolean ontology queries returns the whole matching set as the result, without any ranking. This becomes a problem because much of the information available to information systems comes in the form of unstructured text and documents.

             In any information system, each document has its own id and value, which is not equivalent to the sum of the pieces of information it contains. This holds whether those pieces are formalized or not, and whether they are interlinked or not. It would be useful to decompose documents into smaller pieces of information that can be reused and reassembled as and when required.

            For values that carry free text, Boolean semantic search systems perform a text search within the available string values. These values may contain long pieces of text.

Literature Review

            During the review of the literature, I found that preference is given to knowledge-based models for information retrieval, rather than to search based directly on domain ontologies, in order to support semantic search in document repositories.

            It is observed that the amount of information that can be searched, and the quality of knowledge-based search, depend directly on the ontology and KB behind the proposed model. Technology for automatically populating ontologies and for semi-automatic text interpretation is progressing. However, ontologies and metadata are managed by different organizations, so complete and strictly valid information cannot be guaranteed. Consequently, tolerance to incomplete KBs is considered an important requirement.

 Discussion

            For documents to be found by keywords, the body of each document must contain text and annotations, or some other facility that makes it searchable. Consequently, new information retrieval models pay close attention to interleaving document retrieval with inference. At present, this interleaving is split between the person and the computer. A traditional information retrieval session typically proceeds as follows.

 

-          The person seeking information formulates the semantic query mentally.

-          The person encodes the query as a combination of keywords that are important and relevant to the documents being searched, and the IR system uses them to produce an answer set.

-          The IR system retrieves documents by measuring the similarity between the documents and the query, and then applies a ranking function.

-          The user reviews the top-ranked documents in the result set. The user judges these documents to be relevant only after reading them and extracting their meaning.

-          The process succeeds if the retrieved results answer the semantic query and satisfy the user's needs.

-          Otherwise, the user reformulates the query text with the help of the newly extracted knowledge and facts, and the process repeats.

             The objective is to automate this process completely, which brings two major benefits. First, the system starts from a query supplied by the user, and the user is no longer required to read the retrieved documents and extract their meaning in order to obtain the answer to the semantic query, nor to reformulate the query text. Second, no human intervention is required at any stage to answer semantic queries while extracting knowledge from the web. This task is handed over to software agents that are specially designed to automate the process.

Knowledge base and Document Base

            For semantic information retrieval, we assume that a knowledge-based IR system has been built and that a document base is associated with it. The proposed system is intended to work with arbitrary domain ontologies, with essentially no restrictions. The only minimal requirement is conformance to a set of root ontology classes.

             The documents are explicitly linked to instances of the KB through annotations that are not embedded in the documents themselves. Extracting knowledge from text is a hard problem that is not addressed in this section. Instead, a simple vocabulary-based method is used to assist the semi-automatic annotation of documents.

            The procedure maps the concepts and instances of the domain KB to string keywords, which are then used for automatic annotation, as in the KIM and TAP systems. An automatic annotator uses this mapping to detect occurrences of instances and concepts in the documents. Automatic annotation may still involve complexities that have not been identified, and further techniques are required to handle them.
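
            The label-based mapping described above can be sketched as follows. This is a simplified illustration, not the annotator used by KIM or TAP; the instance identifiers, labels and documents are hypothetical, and matching is plain case-insensitive whole-word search.

```python
import re
from collections import defaultdict

# Hypothetical mapping from KB instances to the keywords (labels) that denote them.
LABELS = {
    "kb:Paris": ["paris"],
    "kb:EiffelTower": ["eiffel tower"],
}


def annotate(docs, labels):
    """Count label occurrences per document.

    Returns a dict: doc_id -> {instance: number of occurrences}.
    Documents without any match are omitted. The annotations are kept
    outside the documents, which are left unmodified.
    """
    annotations = defaultdict(dict)
    for doc_id, text in docs.items():
        lowered = text.lower()
        for instance, keywords in labels.items():
            count = sum(
                len(re.findall(r"\b" + re.escape(k) + r"\b", lowered))
                for k in keywords
            )
            if count:
                annotations[doc_id][instance] = count
    return dict(annotations)


DOCS = {
    "d1": "Paris is home to the Eiffel Tower. Paris is busy.",
    "d2": "Lyon is quieter.",
}

annotations = annotate(DOCS, LABELS)
print(annotations)
```

            A purely string-based annotator like this one cannot resolve ambiguous labels, which is one of the complexities that further techniques would have to address.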

            The ranking module relies on these annotations, and the ranking algorithm uses an adapted version of the classic vector space model. In the classic vector space model, the keywords that appear in a document are assigned weights, and a keyword's weight reflects how well it distinguishes that document from the others. In the same way, annotations are assigned weights that reflect how relevant an instance is to the meaning of the document.
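
            One way to assign such weights is to carry the TF-IDF scheme over from keywords to annotations. The sketch below assumes that scheme (normalized occurrence count multiplied by the logarithm of the inverse document frequency); the text above does not state the exact formula, so this is an assumption, and the occurrence counts are hypothetical.

```python
import math


def annotation_weights(occurrences):
    """Compute a TF-IDF-style weight for every (document, instance) annotation.

    occurrences: doc_id -> {instance: number of occurrences in that document}
    weight = (count / max count in the document) * log(N / n_instance),
    where N is the number of documents and n_instance is the number of
    documents annotated with that instance.
    """
    n_docs = len(occurrences)
    doc_freq = {}
    for counts in occurrences.values():
        for instance in counts:
            doc_freq[instance] = doc_freq.get(instance, 0) + 1

    weights = {}
    for doc_id, counts in occurrences.items():
        max_count = max(counts.values(), default=0)
        weights[doc_id] = {
            instance: (count / max_count) * math.log(n_docs / doc_freq[instance])
            for instance, count in counts.items()
        }
    return weights


# Hypothetical annotation counts for a three-document corpus.
OCCURRENCES = {
    "d1": {"kb:Paris": 2, "kb:EiffelTower": 1},
    "d2": {"kb:Paris": 1},
    "d3": {"kb:Lyon": 3},
}

weights = annotation_weights(OCCURRENCES)
for doc_id, doc_weights in weights.items():
    print(doc_id, doc_weights)
```

            In this example `kb:EiffelTower` outweighs `kb:Paris` in `d1` even though it occurs less often, because it annotates only one document and therefore distinguishes `d1` better.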
