Identification of Authors in Text Documents
Publisher
Vysoká škola báňská – Technická univerzita Ostrava
Abstract
The thesis deals with the identification of authors in text documents. The aim of the work is to test machine learning, neural network, and deep neural network models and to evaluate their suitability for natural language processing and authorship identification in text documents.
The theoretical part covers how data are prepared before being used as input to the models: the techniques used for preprocessing, as well as forms of vectorization and word embedding. It then describes individual models from the fields of machine learning, neural networks, and transformer architectures.
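To make the preprocessing and vectorization steps concrete, here is a minimal sketch in plain Python: lowercase tokenization with stop-word removal, followed by TF-IDF weighting and cosine similarity between document vectors. The function names, the tiny stop-word list, and the unsmoothed IDF formula are assumptions for illustration; the abstract does not specify which preprocessing or vectorization settings the thesis used.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}


def preprocess(text: str) -> list[str]:
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per tokenized document.

    Uses relative term frequency and unsmoothed IDF = log(N / df), so a term
    that occurs in every document gets weight 0.
    """
    n = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append(
            {term: (c / total) * math.log(n / df[term]) for term, c in counts.items()}
        )
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

A simple attribution baseline along these lines would vectorize known texts of each candidate author together with the disputed text and pick the author whose vector is most similar. The machine learning, neural network, and transformer models described in the thesis replace this similarity step with learned classifiers.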
The practical part presents several experiments, which are discussed and evaluated using four metrics: accuracy, recall, precision, and F1 score. Balanced and unbalanced datasets were compared, several types of vectorization and their parameter settings were tested, and the models were tuned to achieve the highest accuracy.
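The following sketch illustrates how the four metrics relate, and why accuracy alone can be misleading on an unbalanced dataset. The function name `evaluate`, the macro-averaging scheme, and the toy labels are assumptions for illustration. The abstract does not state how per-author scores were averaged.

```python
from collections import Counter


def evaluate(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1 over all labels.

    Macro averaging (an assumption here) weights every author equally,
    regardless of how many documents each author has.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted author p, but the text is not p's
            fn[t] += 1  # the true author t was missed

    def safe_div(a: float, b: float) -> float:
        return a / b if b else 0.0

    precisions, recalls, f1s = [], [], []
    for label in labels:
        prec = safe_div(tp[label], tp[label] + fp[label])
        rec = safe_div(tp[label], tp[label] + fn[label])
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(safe_div(2 * prec * rec, prec + rec))

    k = len(labels)
    return {
        "accuracy": safe_div(sum(tp.values()), len(y_true)),
        "precision": safe_div(sum(precisions), k),
        "recall": safe_div(sum(recalls), k),
        "f1": safe_div(sum(f1s), k),
    }
```

For example, with true labels `["A", "A", "A", "B"]` and a model that always predicts `"A"`, accuracy is 0.75 while macro F1 is only 3/7 (about 0.43), because author B is never recognized. This gap is one reason results on balanced and unbalanced datasets are worth comparing.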
Subject(s)
authorship identification, natural language processing, text processing, machine learning, neural networks, convolutional neural networks, recurrent neural networks, transformer architecture, BERT, DistilBERT, ELECTRA