Prepis zvukových nahrávok do textovej podoby

This thesis deals with methods for transcribing spoken speech into text and training the Whisper speech recognition model for Slovak. The goal was to develop a model capable of efficiently processing natural spoken Slovak with varying sentence length and speech rate. Publicly available data from the Common Voice project and our own collection of recordings were used for training. The data was properly preprocessed for training purposes.The training was performed using the Transformers library. The resulting model was evaluated on the basis of recognition accuracy (WER and CER) and shows improvement in the Slovak domain compared to existing pre-trained models.

Subject(s)

speech recognition, Slovak language, Whisper, machine learning, audio processing, Hugging Face Transformers, training, Common Voice

Item identifier

http://hdl.handle.net/10084/156733

Collections

Vysokoškolské kvalifikační práce Fakulty elektrotechniky a informatiky / Theses and dissertations of Faculty of Electrical Engineering and Computer Science (FEI)

Show full item record

Prepis zvukových nahrávok do textovej podoby

Files

Downloads

Date issued

Authors

Journal Title

Journal ISSN

Volume Title

Publisher

Location

Signature

Abstract

Description

Delayed publication

Available after

Subject(s)

Citation

Item identifier

Collections