Různé algoritmy pro porovnávání textových dokumentů

Loading...
Thumbnail Image

Downloads

1

Date issued

Authors

Sýkora, Jiří

Journal Title

Journal ISSN

Volume Title

Publisher

Vysoká škola báňská - Technická univerzita Ostrava

Location

Signature

Abstract

Nowadays, when information technology is being quickly developed, we are forced to deal with questions about similarity of documents. As a result, a lot of algorithms which handle these problems have been created. They have a large use, especially in verification of plagiarism. These are the reasons why a lot of algorithms for comparing files have been created. Each type is based on other system, e. g. it verifies the number of word occurences or it sets a similarity using vectors. These algorithms are used not only for verifying plagiarisms, but e. g. the Boolean and the vector models are used in the search systems. In this thesis, the methods which can be used for file comparison are described. In the theoretic part the algorithms using these methods are shown. In the practical part an implementation of the chosen methods is showed – these include the signature method, the Normalized Compression Distance and the Fast Compression Distance. At the end of this thesis the implemented programs are compared and valorised.

Description

Import 05/08/2014

Subject(s)

Java, plagiarisms, signature methods, signature files, signatures, bool model, vector model, Levenshtein distance, Hamming distance, Normalized Compression Distance

Citation