Abstract
Since semantic search has drawn considerable interest, it is increasingly used to build Semantic Web applications. This paper presents a literature review in which an ontology-based knowledge base (KB) model is proposed to enhance search over large document corpora. The proposed retrieval model is based on an adapted version of the classical vector space model, combined with an annotation weighting algorithm and a ranking algorithm. To tolerate KB incompleteness, semantic search is combined with keyword-based search. The proposed model has been tested on corpora of reasonable size, with favorable results, particularly for keyword-based searches, paving the way for further research and analysis aimed at attaining the targeted results in the desired time.
The Semantic Web drew considerable interest in the late 1990s, and semantic search emerged as one of its evident benefits. A semantic search engine is regarded as a tool that accepts ontology-based queries (for example in RDQL, RQL, or SPARQL) from a client and executes them against a knowledge base, returning tuples of ontology values in response. It normally uses a Boolean search model, which relies on a strict representation of the information space as unambiguous, non-redundant, formal chunks of ontology knowledge.
Each chunk of information provides knowledge that is either a true or a false answer to the request issued by the system or the user. It is assumed that search results are always specific and that the information retrieved as an answer cannot be inconsistent.
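The Boolean behavior described above can be illustrated with a minimal sketch. It assumes the knowledge base is a set of (subject, predicate, object) tuples and that a query is a single triple pattern in which `None` acts as a variable. The `KB` contents and the `match` helper are hypothetical, and this is not a SPARQL implementation.

```python
# Hypothetical toy knowledge base: a set of (subject, predicate, object) tuples.
KB = {
    ("Paris", "capitalOf", "France"),
    ("Berlin", "capitalOf", "Germany"),
    ("France", "type", "Country"),
}


def match(pattern, kb=KB):
    """Return every tuple that satisfies the pattern; None is a wildcard.

    The answer is Boolean per tuple: a tuple either matches exactly or it
    does not. There is no notion of partial relevance or ranking.
    """
    return sorted(
        triple
        for triple in kb
        if all(p is None or p == t for p, t in zip(pattern, triple))
    )


# All tuples whose predicate is "capitalOf", whatever the subject and object.
capitals = match((None, "capitalOf", None))
```

A pattern with no exact match, such as `match(("Rome", "capitalOf", None))`, returns an empty list rather than an approximate answer.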
When a user submits a request for information through a search engine, then according to Information Retrieval theory as applied to the Semantic Web, the search engine collects the documents that contain the requested values and returns these documents in the result set, rather than the exact values themselves. To improve how the results are presented, the search engine ranks the documents by how similar or dissimilar they are to the query. The most relevant documents are placed at the top of the result set for the user's convenience.
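As a rough illustration of similarity-based ranking, the sketch below scores each document against the query with cosine similarity over raw term counts. The tokenization (lowercasing and whitespace splitting), the sample `docs`, and the function names are assumptions made for this example. The source does not specify a particular similarity measure.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank(query, docs):
    """Return (doc_id, score) pairs, most similar document first."""
    q = Counter(query.lower().split())
    scored = [
        (doc_id, cosine(q, Counter(text.lower().split())))
        for doc_id, text in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical three-document corpus.
docs = {
    "d1": "semantic search over ontology annotations",
    "d2": "keyword search in plain text",
    "d3": "cooking recipes",
}
ranking = rank("semantic search", docs)
```

Here `d1` shares both query terms, `d2` shares one, and `d3` shares none, so they are returned in that order.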
A retrieval model based purely on Boolean ontology matching can return the whole information corpus as the result set. This happens because a large amount of the information available to information systems is in the form of unstructured text documents.
For values that carry free text, Boolean semantic search systems perform a text search within the available string values, which may contain long pieces of text.
Literature Review
While reviewing the literature, I found that preference is given to knowledge-based models for information retrieval, rather than to search based on domain ontologies, in order to support semantic search in warehouses.
It is observed that the amount of information being searched and the quality of knowledge-based search are directly related to the proposed ontology-based model. With advances in technology, data can be retrieved and presented in the result set on the basis of automatically built ontologies, and semi-automatic text interpretation is progressing. Even so, ontologies and metadata are maintained by different organizations, so the information they hold cannot be guaranteed to be complete and strictly valid. Consequently, tolerance to incomplete KBs is considered an important requirement.
For documents to be explored by keyword, the body of each document must contain text and annotations, or offer some other facility that makes it searchable. Consequently, newer information retrieval models pay close attention to interleaving document retrieval and inference. In practice, this interleaving is currently split between the person and the computer, as the steps of a typical traditional information retrieval session show:
- The person seeking information formulates the semantic query as a mental effort.
- The person encodes the query as a combination of keywords that are important and relevant to the documents being searched, and the IR system uses them to develop an answer set.
- The IR system retrieves documents by finding similarities between the documents and the query, and then applies a ranking function.
- The user reviews the documents ranked at the top of the result set. The user judges these documents relevant if he or she can extract the needed meaning after reading them.
- The process succeeds if the results meet the requirements of the semantic query and so satisfy the needs of the user.
- Otherwise, the query text is reformulated with the help of the newly extracted knowledge and facts, and the process continues.
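The steps above amount to a loop of query, review, and reformulation. The sketch below is one way to express that loop. Every callable (`encode`, `search`, `is_satisfied`, `reformulate`) and the `max_rounds` limit is a hypothetical stand-in, since in a real session the encoding, judging, and reformulating are done by the person rather than by code.

```python
def retrieval_session(need, encode, search, is_satisfied, reformulate,
                      max_rounds=5):
    """Run the query / review / reformulate loop.

    Hypothetical callables:
      encode(need)                -> keyword query string
      search(query)               -> ranked list of documents
      is_satisfied(results)       -> True when the user's need is met
      reformulate(query, results) -> new query string
    Returns (final query, final results, rounds used).
    """
    query = encode(need)
    results = []
    rounds = 0
    while rounds < max_rounds:
        rounds += 1
        results = search(query)
        if is_satisfied(results):
            break
        if rounds < max_rounds:
            query = reformulate(query, results)
    return query, results, rounds


# Toy stand-ins so the loop can be exercised end to end.
corpus = ["ontology based ranking", "keyword search basics"]

final_query, found, rounds_used = retrieval_session(
    "ranking with ontologies",
    encode=lambda need: " ".join(w for w in need.split() if w != "with"),
    search=lambda q: [d for d in corpus
                      if all(w in d.split() for w in q.split())],
    is_satisfied=lambda results: bool(results),
    reformulate=lambda q, results: q.replace("ontologies", "ontology"),
)
```

In this toy run the first query finds nothing because the corpus uses the singular "ontology", and the reformulated query succeeds on the second round.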
Knowledge Base and Document Base
In semantic information retrieval, we assume that a knowledge base has been built for the IR system and that a document base is associated with it. The proposed system is meant to work with arbitrary domain ontologies, essentially without restrictions, apart from the minimal requirement that they conform to the classes of a root ontology.
Concepts and instances of the domain KB are mapped to string keywords through automatic annotation, as in the KIM and TAP systems. An automatic annotator uses this mapping to find occurrences of instances and concepts in the documents. Automatic annotation may still involve complexities that have not yet been identified, and further techniques are required to handle them.
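A minimal sketch of such an annotator follows, assuming each KB entity carries a list of keyword labels and that an occurrence is a case-insensitive whole-word match. The `LABELS` table, the `kb:` identifiers, and the `annotate` function are invented for illustration and do not reproduce how KIM or TAP work.

```python
import re

# Hypothetical mapping from KB instances and concepts to keyword labels.
LABELS = {
    "kb:Madrid": ["madrid"],
    "kb:City": ["city", "cities"],
    "kb:River": ["river", "rivers"],
}


def annotate(text, labels=LABELS):
    """Count whole-word label occurrences per KB entity in one document."""
    lowered = text.lower()
    counts = {}
    for entity, keywords in labels.items():
        total = 0
        for keyword in keywords:
            # \b anchors keep "city" from matching inside longer words.
            pattern = r"\b" + re.escape(keyword) + r"\b"
            total += len(re.findall(pattern, lowered))
        if total:
            counts[entity] = total
    return counts


annotations = annotate("Madrid is a city. Many cities are larger than Madrid.")
```

Plain label matching like this cannot tell apart entities that share a label, which is one example of the complexity that further techniques would need to address.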
Almost every ranking module relies on the annotation process, and the ranking algorithm uses an adapted version of the classical vector space model. In the classical vector space model, keywords are assigned weights according to their appearance in a document. These weights reflect how well a word distinguishes one document from the others. Likewise, the weights assigned to annotations reflect how relevant an instance is considered to be to the meaning of the document.
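The text does not give the weighting formula, so the sketch below assumes a TF-IDF-style weight carried over from keywords to annotations: the frequency of an entity in a document, normalized by the most frequent entity in that document, multiplied by log(N / n), where N is the number of documents and n is the number of documents annotated with the entity. The function name and the sample counts are hypothetical.

```python
import math


def annotation_weights(annotation_counts):
    """Compute assumed TF-IDF-style weights for annotations.

    annotation_counts maps doc_id -> {entity: occurrences in that document}.
    Weight = (freq / max freq in the document) * log(N / n_entity).
    """
    n_docs = len(annotation_counts)

    # Number of documents in which each entity is annotated.
    doc_freq = {}
    for counts in annotation_counts.values():
        for entity in counts:
            doc_freq[entity] = doc_freq.get(entity, 0) + 1

    weights = {}
    for doc_id, counts in annotation_counts.items():
        max_freq = max(counts.values(), default=0)
        weights[doc_id] = {
            entity: (freq / max_freq) * math.log(n_docs / doc_freq[entity])
            for entity, freq in counts.items()
        }
    return weights


# Hypothetical annotation counts for two documents.
counts = {
    "d1": {"kb:Madrid": 2, "kb:City": 1},
    "d2": {"kb:City": 3},
}
weights = annotation_weights(counts)
```

Under this assumption an entity annotated in every document, such as `kb:City` here, gets weight zero because it does not help distinguish one document from another, while `kb:Madrid` gets a positive weight in `d1`.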