Data compression using grammar transformations

Abstract

Statistical algorithms underlie many of the data compression methods used in practice. They are typically applied in the last step of a compression scheme, after various transformation algorithms have been applied. One concept of transformation that has recently been formalized is data transformation based on grammar transformations. These transformations aim to capture the internal structure of the data and so allow more efficient compression. The basic approach is to find repeated substrings and encode them using an alphabet of nonterminal symbols. In this work I deal with the analysis and design of algorithms based on grammar transformations. I distinguish two groups of algorithms: transformation algorithms, and algorithms based on context-free and context-sensitive grammars. A group of novel text transformation algorithms is presented. These algorithms do not require the introduction of an alphabet of nonterminal symbols, which facilitates the storage and subsequent statistical compression of the production rules but also requires a different definition of the inverse transformation. The theoretical part of the thesis deals with the consequences that applying transformations and grammar-based algorithms has for zero-order entropy and, above all, presents relations that describe these consequences exactly. This theoretical analysis results in several modifications of the popular Re-Pair algorithm and in the design of the MinEnt algorithm. A secondary result of this work is the proposal of the DBC algorithm, which selects production rules based on delimiter symbols.
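
To make the basic approach concrete, the following is a minimal sketch of pair replacement in the style of Re-Pair, together with a zero-order entropy estimate. It is an illustration under stated assumptions, not the thesis's implementation: the function names (`zero_order_entropy`, `repair_sketch`, `expand`) and the `<k>` naming of nonterminals are hypothetical, pairs are counted naively (overlapping occurrences such as those in "aaa" are counted individually), and none of the modifications proposed in the thesis, nor MinEnt or DBC, are reproduced here.

```python
from collections import Counter
from math import log2


def zero_order_entropy(seq):
    """Shannon entropy in bits per symbol, using symbol frequencies only."""
    counts = Counter(seq)
    n = len(seq)
    return -sum(c / n * log2(c / n) for c in counts.values())


def repair_sketch(text):
    """Repeatedly replace the most frequent adjacent pair with a new nonterminal.

    Returns the reduced sequence and the production rules.
    Nonterminals are named "<0>", "<1>", ... (an assumption of this sketch);
    terminals are single characters, so the two cannot collide.
    """
    seq = list(text)
    rules = {}
    while len(seq) >= 2:
        # Naive count of adjacent pairs (overlaps are counted individually).
        pair, count = Counter(zip(seq, seq[1:])).most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so a new rule would not pay off
        nonterminal = f"<{len(rules)}>"
        rules[nonterminal] = pair
        # Left-to-right, non-overlapping replacement of the chosen pair.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nonterminal)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(seq, rules):
    """Inverse transformation: expand nonterminals back into terminals."""
    out = []
    stack = list(reversed(seq))
    while stack:
        symbol = stack.pop()
        if symbol in rules:
            left, right = rules[symbol]
            stack.append(right)  # pushed first so that `left` is popped first
            stack.append(left)
        else:
            out.append(symbol)
    return "".join(out)
```

One way to use the sketch is to compare `len(s) * zero_order_entropy(s)` for the original text and for the reduced sequence. Such a comparison ignores the cost of storing the production rules, so it only approximates the effect on compressed size.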

Description

Subject(s)

grammars, context-free grammars, data compression, entropy, Re-Pair, MinEnt, DBC

Citation