Transcription of Audio Recordings into Text (Přepis audiozáznamů do textové podoby)

This master's thesis examines methods for transcribing audio recordings into text, with particular emphasis on transcription accuracy. It summarizes the principles of automatic speech recognition, from traditional approaches based on Hidden Markov Models (HMM) and Gaussian Mixture Models (GMM) to modern methods built on deep neural networks and end-to-end architectures. Special attention is given to the Whisper model, which was implemented and experimentally evaluated. To validate the system's performance, experiments were conducted with data processing techniques, model modifications, and adjustments to training parameters. The results show that fine-tuning the model, including audio augmentation and the addition of dense or adapter layers, significantly improves transcription accuracy as measured by word error rate (WER) and character error rate (CER). The thesis contributes a practical implementation of an efficient Czech speech transcription system and an analysis of how the individual experimental methods affect transcription quality.
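
For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how WER and CER are commonly defined: the Levenshtein (edit) distance between reference and hypothesis, divided by the reference length, computed over words for WER and over characters for CER. This is not the thesis's evaluation code. The function names `edit_distance`, `wer`, and `cer` are hypothetical, and the sketch assumes whitespace tokenization with no text normalization (casing, punctuation), which a real evaluation would likely apply.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference token missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis token)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()  # assumption: whitespace tokenization
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Both values are 0.0 for a perfect transcript and can exceed 1.0 when the hypothesis contains many insertions. CER is often reported alongside WER for inflected languages such as Czech, because a single wrong word ending counts as a full word error under WER but only as one or two character errors under CER.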

Subject(s)

automatic speech recognition, audio-to-text transcription, Whisper model, deep learning, WER and CER metrics, audio augmentation, neural networks, Czech language

Item identifier

http://hdl.handle.net/10084/157035

Collections

Vysokoškolské kvalifikační práce Fakulty elektrotechniky a informatiky / Theses and dissertations of Faculty of Electrical Engineering and Computer Science (FEI)
