Multidimensional term indexing for efficient processing of complex queries

Authors

Krátký, Michal
Skopal, Tomáš
Snášel, Václav

Publisher

Akademie věd České republiky. Ústav teorie informace a automatizace

Location

In the ÚK collection

Abstract

Information Retrieval (IR) deals with the storage and retrieval of documents within huge text collections. In IR models, the semantics of a document is usually characterized by a set of terms. A need common to various IR models is efficient term retrieval, provided via a term index. Existing approaches to term indexing, e.g. the inverted list, efficiently support only simple queries asking for a term occurrence. In practice, we would like to exploit more sophisticated querying mechanisms, in particular queries based on regular expressions. In this article we propose a multidimensional approach to term indexing that provides efficient term retrieval and supports regular expression queries. Since term lengths usually differ, we also introduce an improvement based on a new data structure, called the BUB-forest, which provides even more efficient term retrieval.
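
The abstract does not spell out how terms are embedded in a multidimensional space, so the following is only a minimal sketch of one possible reading, not the authors' method. It assumes each character position becomes one coordinate, shorter terms are padded with a zero code, and a pattern using `?` for "any single character" is translated into a box (range) query. The names `term_to_point`, `pattern_to_box` and `range_query` are hypothetical, and a linear scan stands in for a real multidimensional index.

```python
from typing import List, Tuple

Point = Tuple[int, ...]

MAX_CODE = 255  # assumption: character codes fit in one byte
PAD = 0         # assumption: padding coordinate for unused positions


def term_to_point(term: str, dims: int) -> Point:
    """Map a term to a point, one coordinate per character, padded with PAD."""
    if len(term) > dims:
        raise ValueError("term is longer than the indexed dimension")
    codes = [ord(c) for c in term]
    return tuple(codes + [PAD] * (dims - len(codes)))


def pattern_to_box(pattern: str, dims: int) -> Tuple[Point, Point]:
    """Translate a pattern with '?' wildcards into a (low, high) query box."""
    if len(pattern) > dims:
        raise ValueError("pattern is longer than the indexed dimension")
    low: List[int] = []
    high: List[int] = []
    for c in pattern:
        if c == "?":
            # Any real character: the interval excludes the padding code.
            low.append(1)
            high.append(MAX_CODE)
        else:
            # A fixed character: the interval degenerates to a single value.
            low.append(ord(c))
            high.append(ord(c))
    pad = dims - len(pattern)
    return tuple(low + [PAD] * pad), tuple(high + [PAD] * pad)


def range_query(index: List[Tuple[str, Point]], box: Tuple[Point, Point]) -> List[str]:
    """Return terms whose points lie inside the box (linear scan placeholder)."""
    low, high = box
    return [
        term
        for term, point in index
        if all(lo <= x <= hi for lo, x, hi in zip(low, point, high))
    ]


terms = ["cat", "cut", "cot", "cart", "dog"]
index = [(t, term_to_point(t, 4)) for t in terms]
matches = range_query(index, pattern_to_box("c?t", 4))
```

In this sketch every term occupies the same number of dimensions regardless of its length. The abstract names differing term lengths as the motivation for the BUB-forest; the padding used here does not model that structure.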

Subject(s)

term indexing, complex queries, multidimensional data structures, BUB-forest

Citation

Kybernetika. 2004, vol. 40, no. 3, p. 381-396.