DeepVoCoder: A CNN model for compression and coding of narrow band speech

Keles, Hacer Yalim

doi:10.1109/ACCESS.2019.2920663

dc.contributor.author	Keles, Hacer Yalim
dc.contributor.author	Rozhon, Jan
dc.contributor.author	Ilk, Hakki Gokhan
dc.contributor.author	Vozňák, Miroslav
dc.date.accessioned	2019-09-11T09:40:03Z
dc.date.available	2019-09-11T09:40:03Z
dc.date.issued	2019
dc.description.abstract	This paper proposes a convolutional neural network (CNN)-based encoder model to compress and code speech signals directly from raw input speech. Although the model can synthesize wideband speech by implicit bandwidth extension, narrowband is preferred for IP telephony and telecommunications purposes. The model takes time-domain speech samples as inputs and encodes them using a cascade of convolutional filters in multiple layers, where pooling is applied after some layers to downsample the encoded speech by half. The final bottleneck layer of the CNN encoder provides an abstract and compact representation of the speech signal. In this paper, it is demonstrated that this compact representation is sufficient to reconstruct the original speech signal with high quality using the CNN decoder. This paper also discusses the theoretical background of why and how a CNN may be used for end-to-end speech compression and coding. The complexity, delay, memory requirements, and bit rate versus quality are discussed in the experimental results.	cs
dc.description.firstpage	75081	cs
dc.description.lastpage	75089	cs
dc.description.source	Web of Science	cs
dc.description.volume	7	cs
dc.identifier.citation	IEEE Access. 2019, vol. 7, p. 75081-75089.	cs
dc.identifier.doi	10.1109/ACCESS.2019.2920663
dc.identifier.issn	2169-3536
dc.identifier.uri	http://hdl.handle.net/10084/138509
dc.identifier.wos	000473188800001
dc.language.iso	en	cs
dc.publisher	IEEE	cs
dc.relation.ispartofseries	IEEE Access	cs
dc.relation.uri	http://doi.org/10.1109/ACCESS.2019.2920663	cs
dc.rights	Copyright © 2019, IEEE	cs
dc.rights.access	openAccess	cs
dc.subject	convolutional neural network	cs
dc.subject	deep learning	cs
dc.subject	source coding	cs
dc.subject	speech codecs	cs
dc.title	DeepVoCoder: A CNN model for compression and coding of narrow band speech	cs
dc.type	article	cs
dc.type.status	Peer-reviewed	cs
dc.type.version	publishedVersion	cs

Files

Original bundle

Name: 2169-3536-2019v7p75081.pdf
Size: 3.55 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission

Collections

Publications of VŠB-TUO in Web of Science
OpenAIRE
Publications of IT4Innovations
Publications of Department of Telecommunications
Articles from Impact Factor Journals