Various algorithms for comparing text documents (Různé algoritmy pro porovnávání textových dokumentů)
Authors
Sýkora, Jiří
Publisher
Vysoká škola báňská - Technická univerzita Ostrava
Abstract
With the rapid development of information technology, the question of how similar two documents are arises ever more often. Many algorithms have been created to answer it, and they are widely used, especially in plagiarism detection.
Each of these file comparison algorithms is based on a different principle: for example, one counts word occurrences, while another measures similarity using vectors. They are used not only for detecting plagiarism; the Boolean and vector models, for example, are also used in search systems.
This thesis describes methods that can be used for file comparison. The theoretical part presents the algorithms that use these methods. The practical part presents an implementation of selected methods: the signature method, the Normalized Compression Distance, and the Fast Compression Distance. Finally, the implemented programs are compared and evaluated.
Description
Import 05/08/2014
Subject(s)
Java, plagiarism, signature methods, signature files, signatures, Boolean model, vector model, Levenshtein distance, Hamming distance, Normalized Compression Distance