Transcription of audio recordings into text (Prepis zvukových nahrávok do textovej podoby)

Abstract

This thesis deals with methods for transcribing speech into text and with training the Whisper speech recognition model for Slovak. The goal was to develop a model capable of efficiently processing natural spoken Slovak with varying sentence length and speech rate. Publicly available data from the Common Voice project and our own collection of recordings were used for training, and the data was preprocessed beforehand. Training was performed using the Transformers library. The resulting model was evaluated on recognition accuracy, measured as word error rate (WER) and character error rate (CER), and shows an improvement on Slovak compared to existing pre-trained models.
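For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how WER and CER are commonly defined: the edit distance between the reference and the recognized text, divided by the reference length, counted in words or in characters. The function names and the sample sentences are illustrative assumptions. The abstract does not say which tool or text normalization the thesis used for evaluation, so this is not a reproduction of its evaluation code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (minimum number of substitutions, insertions and deletions)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference items
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length.
    Spaces are counted as characters here (an assumption; conventions vary)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical example: the recognizer drops one diacritic mark.
REFERENCE = "dnes je pekný deň"
HYPOTHESIS = "dnes je pekny deň"

print("WER:", wer(REFERENCE, HYPOTHESIS))
print("CER:", cer(REFERENCE, HYPOTHESIS))
```

In this example a single wrong character makes one of four words incorrect, so WER is 0.25, while CER is only 1/17. This gap is one reason both metrics tend to be reported for languages with rich diacritics such as Slovak.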

Description

Subject(s)

speech recognition, Slovak language, Whisper, machine learning, audio processing, Hugging Face Transformers, training, Common Voice

Citation