DeepVoCoder: A CNN model for compression and coding of narrow band speech
| dc.contributor.author | Keles, Hacer Yalim | |
| dc.contributor.author | Rozhon, Jan | |
| dc.contributor.author | Ilk, Hakki Gokhan | |
| dc.contributor.author | Vozňák, Miroslav | |
| dc.date.accessioned | 2019-09-11T09:40:03Z | |
| dc.date.available | 2019-09-11T09:40:03Z | |
| dc.date.issued | 2019 | |
| dc.description.abstract | This paper proposes a convolutional neural network (CNN)-based encoder model to compress and code speech signal directly from raw input speech. Although the model can synthesize wideband speech by implicit bandwidth extension, narrowband is preferred for IP telephony and telecommunications purposes. The model takes time domain speech samples as inputs and encodes them using a cascade of convolutional filters in multiple layers, where pooling is applied after some layers to downsample the encoded speech by half. The final bottleneck layer of the CNN encoder provides an abstract and compact representation of the speech signal. In this paper, it is demonstrated that this compact representation is sufficient to reconstruct the original speech signal in high quality using the CNN decoder. This paper also discusses the theoretical background of why and how CNN may be used for end-to-end speech compression and coding. The complexity, delay, memory requirements, and bit rate versus quality are discussed in the experimental results. | cs |
| dc.description.firstpage | 75081 | cs |
| dc.description.lastpage | 75089 | cs |
| dc.description.source | Web of Science | cs |
| dc.description.volume | 7 | cs |
| dc.identifier.citation | IEEE Access. 2019, vol. 7, p. 75081-75089. | cs |
| dc.identifier.doi | 10.1109/ACCESS.2019.2920663 | |
| dc.identifier.issn | 2169-3536 | |
| dc.identifier.uri | http://hdl.handle.net/10084/138509 | |
| dc.identifier.wos | 000473188800001 | |
| dc.language.iso | en | cs |
| dc.publisher | IEEE | cs |
| dc.relation.ispartofseries | IEEE Access | cs |
| dc.relation.uri | http://doi.org/10.1109/ACCESS.2019.2920663 | cs |
| dc.rights | Copyright © 2019, IEEE | cs |
| dc.rights.access | openAccess | cs |
| dc.subject | convolutional neural network | cs |
| dc.subject | deep learning | cs |
| dc.subject | source coding | cs |
| dc.subject | speech codecs | cs |
| dc.title | DeepVoCoder: A CNN model for compression and coding of narrow band speech | cs |
| dc.type | article | cs |
| dc.type.status | Peer-reviewed | cs |
| dc.type.version | publishedVersion | cs |
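
The abstract describes an encoder that passes time-domain samples through a cascade of convolutional filters, with pooling that halves the length of the encoded signal. The following is a minimal sketch of that idea only, not the architecture from the paper. The kernel values, the ReLU nonlinearity, the number of layers, the frame length, and pooling after every layer are all illustrative assumptions (the abstract states that pooling follows only some layers). The names `conv1d_same`, `max_pool2`, and `encode` are hypothetical.

```python
def conv1d_same(signal, kernel):
    """Slide an odd-length kernel over a zero-padded signal; output length equals input length."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def relu(xs):
    """Assumed nonlinearity; the abstract does not name one."""
    return [max(0.0, x) for x in xs]


def max_pool2(xs):
    """Non-overlapping max pooling with window 2, which halves the length."""
    return [max(xs[i], xs[i + 1]) for i in range(0, len(xs) - 1, 2)]


def encode(frame, kernels):
    """Apply conv -> ReLU -> pool once per kernel (a simplification: pooling after every layer)."""
    x = list(frame)
    for kernel in kernels:
        x = max_pool2(relu(conv1d_same(x, kernel)))
    return x


# Toy 64-sample frame and three identical smoothing kernels (assumed values).
frame = [float(i % 8) for i in range(64)]
kernels = [[0.25, 0.5, 0.25]] * 3
code = encode(frame, kernels)
print(len(frame), len(code))  # 64 8
```

Each pooling stage halves the sequence, so three stages reduce a 64-sample frame to 8 values. This is the sense in which the bottleneck is a compact representation; a trained model would learn its filters and would typically use many channels per layer.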
Collections
Publikační činnost VŠB-TUO ve Web of Science / Publications of VŠB-TUO in Web of Science
OpenAIRE
Publikační činnost IT4Innovations / Publications of IT4Innovations (9600)
Publikační činnost Katedry telekomunikačních technologií / Publications of Department of Telecommunications (440)
Články z časopisů s impakt faktorem / Articles from Impact Factor Journals