DeepVoCoder: A CNN model for compression and coding of narrow band speech

dc.contributor.author: Keles, Hacer Yalim
dc.contributor.author: Rozhon, Jan
dc.contributor.author: Ilk, Hakki Gokhan
dc.contributor.author: Vozňák, Miroslav
dc.date.accessioned: 2019-09-11T09:40:03Z
dc.date.available: 2019-09-11T09:40:03Z
dc.date.issued: 2019
dc.description.abstract: This paper proposes a convolutional neural network (CNN)-based encoder model to compress and code speech signals directly from raw input speech. Although the model can synthesize wideband speech by implicit bandwidth extension, narrowband is preferred for IP telephony and telecommunications purposes. The model takes time-domain speech samples as inputs and encodes them using a cascade of convolutional filters in multiple layers, where pooling is applied after some layers to downsample the encoded speech by half. The final bottleneck layer of the CNN encoder provides an abstract and compact representation of the speech signal. In this paper, it is demonstrated that this compact representation is sufficient to reconstruct the original speech signal with high quality using the CNN decoder. This paper also discusses the theoretical background of why and how a CNN may be used for end-to-end speech compression and coding. The complexity, delay, memory requirements, and bit rate versus quality are discussed in the experimental results.
dc.description.firstpage: 75081
dc.description.lastpage: 75089
dc.description.source: Web of Science
dc.description.volume: 7
dc.identifier.citation: IEEE Access. 2019, vol. 7, p. 75081-75089.
dc.identifier.doi: 10.1109/ACCESS.2019.2920663
dc.identifier.issn: 2169-3536
dc.identifier.uri: http://hdl.handle.net/10084/138509
dc.identifier.wos: 000473188800001
dc.language.iso: en
dc.publisher: IEEE
dc.relation.ispartofseries: IEEE Access
dc.relation.uri: http://doi.org/10.1109/ACCESS.2019.2920663
dc.rights: Copyright © 2019, IEEE
dc.rights.access: openAccess
dc.subject: convolutional neural network
dc.subject: deep learning
dc.subject: source coding
dc.subject: speech codecs
dc.title: DeepVoCoder: A CNN model for compression and coding of narrow band speech
dc.type: article
dc.type.status: Peer-reviewed
dc.type.version: publishedVersion
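
The abstract describes an encoder in which pooling after some convolutional layers halves the length of the encoded speech. The following is a minimal sketch of that length arithmetic only, in plain Python. It is not the paper's model: the frame length, the smoothing kernel, the ReLU activation, the choice of max pooling, and the number of stages are all assumptions made for illustration, and the function names are hypothetical.

```python
def conv1d_same(signal, kernel):
    """Zero-padded 1-D convolution that keeps the input length (odd kernel assumed)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def max_pool2(signal):
    """Pooling with window 2 and stride 2: the output has half the input length."""
    return [max(signal[i], signal[i + 1]) for i in range(0, len(signal) - 1, 2)]


def encode(frame, kernel, num_stages):
    """Apply (convolution, ReLU, pooling) num_stages times; each stage halves the length."""
    x = list(frame)
    for _ in range(num_stages):
        x = [max(0.0, v) for v in conv1d_same(x, kernel)]  # assumed ReLU activation
        x = max_pool2(x)
    return x


# Hypothetical 64-sample frame and a fixed 3-tap kernel (a real model learns its filters).
frame = [float((i % 8) - 4) for i in range(64)]
kernel = [0.25, 0.5, 0.25]
code = encode(frame, kernel, num_stages=3)
print(len(frame), "->", len(code))
```

With three pooling stages, the bottleneck holds 64 / 2^3 = 8 values per filter for the 64-sample frame. The actual layer count, filter sizes, and resulting bit rate are given in the paper itself, not here.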

Files

Original bundle

Name: 2169-3536-2019v7p75081.pdf
Size: 3.55 MB
Format: Adobe Portable Document Format
License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission