Single-Channel Speech Enhancement Based on Sparse Regressive Deep Neural Network

Software Engineering and Applications
Vol.06 No.01(2017), Article ID:19861,12 pages
10.12677/SEA.2017.61002


Haixia Sun, Sikun Li

National University of Defense Technology (NUDT), Changsha, Hunan

Received: Feb. 7th, 2017; accepted: Feb. 25th, 2017; published: Feb. 28th, 2017

ABSTRACT

Speech enhancement improves the quality and intelligibility of speech by suppressing noise and raising the signal-to-noise ratio (SNR), and it is widely applied in voice communication equipment. In recent years, the deep neural network (DNN) has become a research hotspot because of its powerful ability to escape poor local optima, in which it is superior to traditional shallow neural networks. However, existing DNN models are expensive to store and generalize poorly. This paper proposes a sparse regressive DNN model to address these problems. First, two regularization techniques, Dropout and a sparsity constraint, are applied to strengthen the generalization ability of the model; in this way the model also achieves consistency between the pre-training stage and the fine-tuning stage. Second, network compression by weight sharing and quantization is used to reduce the storage cost. Finally, spectral subtraction is applied as post-processing to remove residual stationary noise. The results show that the improved framework is effective and meets the requirements of speech processing.

Keywords: Speech Enhancement, DNN, Regularization Technique, Network Compression, Spectral Subtraction

1. Introduction

2. Principles of Speech Enhancement Based on Deep Neural Network Learning

2.1. The DBN-DNN Network

2.1.1. Pre-Training and Fine-Tuning

1) Pre-training

(1)

(2)

Figure 1. The principle of speech enhancement based on a regressive DNN

The conditional probabilities of the visible and hidden layers of the GRBM are as follows:

p(v_i \mid h) = \mathcal{N}\!\left(v_i;\ a_i + \sigma_i \sum_j w_{ij} h_j,\ \sigma_i^2\right) (3)

P(h_j = 1 \mid v) = \sigma\!\left(b_j + \sum_i \frac{v_i}{\sigma_i}\, w_{ij}\right) (4)

The conditional probabilities of the visible and hidden layers of the BBRBM are as follows:

P(v_i = 1 \mid h) = \sigma\!\left(a_i + \sum_j w_{ij} h_j\right) (5)

P(h_j = 1 \mid v) = \sigma\!\left(b_j + \sum_i w_{ij} v_i\right) (6)

The gradient formula for the BBRBM model parameters is:

\Delta w_{ij} = \varepsilon\left(\langle v_i h_j \rangle_{\mathrm{data}} - \langle v_i h_j \rangle_{\mathrm{recon}}\right) (7)

Figure 2. The process of Gibbs sampling
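As a concrete illustration of the Gibbs sampling in Figure 2, here is a minimal NumPy sketch of one contrastive-divergence (CD-1) update for a Bernoulli-Bernoulli RBM; the layer sizes, learning rate, and variable names are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, a, b, lr=0.01):
    """One CD-1 update for a Bernoulli-Bernoulli RBM.
    v0: (batch, n_vis) binary data; W: (n_vis, n_hid); a, b: visible/hidden biases."""
    ph0 = sigmoid(v0 @ W + b)                         # P(h=1|v0), positive phase
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample hidden states
    pv1 = sigmoid(h0 @ W.T + a)                       # one Gibbs step back to v
    ph1 = sigmoid(pv1 @ W + b)                        # negative phase
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n          # <vh>_data - <vh>_recon
    a += lr * (v0 - pv1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)
    return float(((v0 - pv1) ** 2).mean())            # reconstruction error

# toy usage: 6 visible units, 4 hidden units, batch of 8 binary vectors
W = rng.normal(0.0, 0.01, (6, 4))
a = np.zeros(6)
b = np.zeros(4)
v = (rng.random((8, 6)) < 0.5).astype(float)
err = cd1_step(v, W, a, b)
```

In CD-1 the chain v → h → v' → h' is truncated after a single Gibbs step, which is what makes pre-training cheap.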

2) Fine-tuning:

① Forward pass: feed a mini-batch of speech features into the neural network and propagate the activations of each layer forward to the output layer, obtaining a cost function based on the minimum mean-square-error criterion:

E = \frac{1}{2N} \sum_{n=1}^{N} \left\lVert \hat{y}_n - y_n \right\rVert_2^2 (8)

② Backward pass: first compute the residual of every output-layer node, then propagate backward to obtain the residuals of the hidden layers; from these residuals, compute the partial derivative of the cost with respect to the weights of each layer.

③ Weight update: the stochastic gradient descent algorithm is used to adjust the network weights:

W^{(l)} \leftarrow W^{(l)} - \eta\, \frac{\partial E}{\partial W^{(l)}} (9)

b^{(l)} \leftarrow b^{(l)} - \eta\, \frac{\partial E}{\partial b^{(l)}} (10)
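The three fine-tuning steps above (forward pass, back-propagation of residuals, SGD weight update) can be sketched as a single mini-batch step. A sigmoid hidden layer with a linear output layer and a 1/2-mean-squared-error cost is assumed here; the paper's exact layer configuration is not reproduced in this section.

```python
import numpy as np

rng = np.random.default_rng(1)

def sgd_step(x, y, Ws, bs, lr=0.1):
    """One mini-batch SGD step for an MLP with sigmoid hidden layers
    and a linear output layer, trained under a 1/2 mean-squared-error cost."""
    # (1) forward pass, caching each layer's activations
    acts = [x]
    for W, b in zip(Ws[:-1], bs[:-1]):
        acts.append(1.0 / (1.0 + np.exp(-(acts[-1] @ W + b))))
    out = acts[-1] @ Ws[-1] + bs[-1]                 # linear output layer
    n = x.shape[0]
    cost = 0.5 * np.mean(np.sum((out - y) ** 2, axis=1))
    # (2) backward pass: output residual, then hidden residuals
    delta = (out - y) / n
    for l in range(len(Ws) - 1, -1, -1):
        gW = acts[l].T @ delta
        gb = delta.sum(axis=0)
        if l > 0:  # propagate residual through the sigmoid derivative
            delta = (delta @ Ws[l].T) * acts[l] * (1.0 - acts[l])
        # (3) stochastic-gradient weight update
        Ws[l] -= lr * gW
        bs[l] -= lr * gb
    return cost

# toy regression: 4 inputs -> 8 hidden -> 2 outputs
x = rng.normal(size=(16, 4))
y = np.tanh(x @ rng.normal(size=(4, 2)))
Ws = [rng.normal(0.0, 0.5, (4, 8)), rng.normal(0.0, 0.5, (8, 2))]
bs = [np.zeros(8), np.zeros(2)]
costs = [sgd_step(x, y, Ws, bs, lr=0.2) for _ in range(200)]
```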

2.1.2. Enhancement Stage

(11)

(12)
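Although the equations of the enhancement stage are not reproduced above, the standard pipeline — map the noisy log-power spectrum through the trained DNN and resynthesize with the noisy phase (cf. Xu et al. [7]) — can be sketched as follows. The frame length, hop size, and the `dnn` callable are placeholders, not the paper's settings.

```python
import numpy as np

def enhance(noisy, dnn, frame=256, hop=128):
    """Enhancement-stage sketch: map each frame's noisy log-power spectrum
    through `dnn` (any callable), keep the noisy phase, and overlap-add."""
    win = np.hanning(frame)
    n_frames = 1 + (len(noisy) - frame) // hop
    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))
    for t in range(n_frames):
        seg = noisy[t * hop:t * hop + frame] * win
        spec = np.fft.rfft(seg)
        logpow = np.log(np.abs(spec) ** 2 + 1e-12)      # log-power feature
        enh_logpow = dnn(logpow)                        # trained mapping
        mag = np.sqrt(np.exp(enh_logpow))               # back to magnitude
        rec = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), frame)
        out[t * hop:t * hop + frame] += rec * win       # weighted overlap-add
        norm[t * hop:t * hop + frame] += win ** 2
    return out / np.maximum(norm, 1e-8)

# identity "DNN" as a sanity check of the analysis-synthesis chain
t = np.arange(1024)
noisy = np.sin(0.05 * t)
out = enhance(noisy, lambda feat: feat)
```

With an identity mapping in place of the trained network, the chain reconstructs the interior of the signal almost exactly, which is a useful sanity check before plugging in a real model.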

2.2. Regularization Techniques for Improving Model Generalization

(13)

(14)
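The two regularizers of this section — Dropout and a sparsity constraint — can be sketched for a single sigmoid hidden layer as follows. The keep probability, target sparsity `rho`, and penalty weight `beta` are illustrative values, and the KL-divergence form of the sparsity penalty is one common choice rather than necessarily the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(2)

def hidden_with_dropout(x, W, b, keep=0.8, rho=0.1, beta=0.01, train=True):
    """Sigmoid hidden layer with inverted Dropout and a KL-divergence
    sparsity penalty that pushes the mean activation toward rho."""
    h = 1.0 / (1.0 + np.exp(-(x @ W + b)))
    rho_hat = np.clip(h.mean(axis=0), 1e-6, 1 - 1e-6)   # mean activation per unit
    # KL(rho || rho_hat) summed over hidden units, added to the training cost
    penalty = beta * np.sum(rho * np.log(rho / rho_hat)
                            + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    if train:
        mask = (rng.random(h.shape) < keep) / keep       # inverted dropout
        h = h * mask
    return h, penalty

# toy usage: batch of 32, 10 inputs, 20 hidden units
x = rng.normal(size=(32, 10))
W = rng.normal(0.0, 0.3, (10, 20))
b = np.zeros(20)
h_tr, pen = hidden_with_dropout(x, W, b)                 # training mode
h_ev, pen_ev = hidden_with_dropout(x, W, b, train=False) # evaluation mode
```

Inverted Dropout rescales the surviving activations by 1/keep during training, so no rescaling is needed at test time.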

3. Improving the Generalization Ability of DNN-Based Speech Enhancement Training

The generalization ability of a model obtained by DNN training is usually assessed by evaluating the trained DNN speech-enhancement model on mismatched test sets. The evaluation results are reported in Section 6; they show that the improved model achieves better speech-quality scores.

4. Reducing the Storage Cost of the Deep Neural Network


Figure 3. (a) RBM training model; (b) Fine-tuning training model


Figure 4. (a) Reconstruction error of original DNN training process; (b) Reconstruction error of improved DNN training process

5. Post-Processing to Remove Residual Stationary Noise

Wei [14] used spectral subtraction as a pre-processing step for a deep-neural-network speech-enhancement algorithm, training the DNN to remove the "musical noise" left behind by spectral subtraction, so that good enhancement is still obtained with a small amount of training data. That algorithm concentrates on removing the residual "musical noise" of spectral subtraction, but it builds only a shallow network and does not fully exploit a deep network's ability to model the nonlinear relationship between noisy and clean speech, so its trained model generalizes poorly and is not robust to noise. In addition, the enhanced speech reconstructed by the original regressive-DNN framework retains some stationary noise, which that algorithm cannot remove.
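Unlike Wei's pre-processing arrangement, this paper applies spectral subtraction after the network. The classic power-domain form with over-subtraction and a spectral floor (Berouti-style; the `alpha` and `beta` values here are illustrative, not the paper's parameters) can be sketched as:

```python
import numpy as np

def spectral_subtract(mag, noise_mag, alpha=2.0, beta=0.01):
    """Power-domain spectral subtraction: subtract an over-estimated noise
    spectrum (factor alpha) and floor the result (factor beta); the floor
    suppresses the isolated peaks that are heard as musical noise."""
    sub = mag ** 2 - alpha * noise_mag ** 2
    floor = beta * noise_mag ** 2
    return np.sqrt(np.maximum(sub, floor))

# toy spectra: a flat noise estimate against weak and strong speech frames
noise_mag = np.full(129, 0.1)
weak = spectral_subtract(np.full(129, 0.05), noise_mag)   # noise-dominated bins
strong = spectral_subtract(np.full(129, 1.0), noise_mag)  # speech-dominated bins
```

In noise-dominated bins the output falls to the spectral floor sqrt(beta)·|N|, while speech-dominated bins are only slightly attenuated.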

Figure 5. Weight sharing, scalar quantization and centroid fine-tuning [9]
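The weight-sharing and scalar-quantization step of Figure 5 can be sketched with a 1-D k-means over the trained weights, as in Han et al. [9]: after clustering, each weight is stored as a small cluster index plus a shared codebook of centroids (16 clusters, i.e. a 4-bit code, is an illustrative choice, not the paper's setting).

```python
import numpy as np

rng = np.random.default_rng(3)

def quantize_weights(W, n_clusters=16, iters=20):
    """Scalar quantization with weight sharing: cluster the weights with
    1-D k-means, then replace each weight by its cluster centroid.
    Storage drops to one small index per weight plus the codebook."""
    flat = W.ravel()
    # linear initialization of centroids over the weight range
    centroids = np.linspace(flat.min(), flat.max(), n_clusters)
    for _ in range(iters):
        idx = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        for k in range(n_clusters):
            if np.any(idx == k):                    # skip empty clusters
                centroids[k] = flat[idx == k].mean()
    idx = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
    W_q = centroids[idx].reshape(W.shape)           # shared-weight reconstruction
    return W_q, idx.reshape(W.shape), centroids

# toy layer: 30x30 weights quantized to a 4-bit (16-entry) codebook
W = rng.normal(size=(30, 30))
W_q, idx, codebook = quantize_weights(W)
```

In [9] the centroids are additionally fine-tuned by accumulating the gradients of all weights that share each centroid; that step is omitted here.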

Figure 6. Reconstruction error of the compressed DNN

Figure 7. Speech enhancement model based on regressive DNN

6. Experiments

6.1. Experimental Setup

6.1.1. Sample Data Processing

6.1.2. Parameter Settings

6.2. Experimental Results

6.2.1. Speech Quality Evaluation

6.2.2. Time-Domain Waveforms

Table 1. SegSNR of the original noisy speech and of the speech enhanced by the improved model

Table 2. LSD of the original noisy speech and of the speech enhanced by the improved model

Table 3. PESQ of the original noisy speech and of the speech enhanced by the improved model
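Of the three objective scores in Tables 1-3, segmental SNR is simple enough to sketch. A common definition (frame-wise SNR in dB, clipped to [−10, 35] dB and averaged over frames, cf. Hu and Loizou [15]) is:

```python
import numpy as np

def segsnr(clean, enhanced, frame=256, lo=-10.0, hi=35.0):
    """Segmental SNR: per-frame SNR in dB, clipped to [lo, hi] dB and
    averaged over frames; clipping limits the influence of silent frames."""
    n = (len(clean) // frame) * frame
    c = clean[:n].reshape(-1, frame)
    e = enhanced[:n].reshape(-1, frame)
    num = np.sum(c ** 2, axis=1)
    den = np.sum((c - e) ** 2, axis=1) + 1e-12
    snr = 10.0 * np.log10(num / den + 1e-12)
    return float(np.mean(np.clip(snr, lo, hi)))

# sanity checks: perfect enhancement saturates at the upper clip;
# an all-zero "enhancement" scores about 0 dB
t = np.arange(2048)
clean = np.sin(0.03 * t)
perfect = segsnr(clean, clean)
silent = segsnr(clean, np.zeros_like(clean))
```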

Figure 8. Time-domain waveforms of noisy speech (top left), enhanced speech (top right) and clean speech (bottom) at −5, 0 and 5 dB SNR

7. Conclusion

Sun, H.X. and Li, S.K. (2017) Single-Channel Speech Enhancement Based on Sparse Regressive Deep Neural Network. Software Engineering and Applications, 6, 8-19. http://dx.doi.org/10.12677/SEA.2017.61002

1. Le, T.T. and Mason, J.S. (1996) Artificial Neural Network for Nonlinear Time-Domain Filtering of Speech. IEE Proceedings - Vision, Image and Signal Processing, 3, 433-438.

2. Mohammadiha, N., Smaragdis, P. and Leijon, A. (2013) Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization. IEEE Transactions on Audio, Speech, and Language Processing, 21, 2140-2151. https://doi.org/10.1109/TASL.2013.2270369

3. Shi, W.H., Zhang, X.W., Zhang, R.X. and Han, W. (2016) Lectures on Deep Learning Theory and Its Applications (4). Journal of Military Communications Technology, 37, 98-104. (In Chinese)

4. Hinton, G.E., Osindero, S. and Teh, Y.W. (2006) A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 18, 1527-1554. https://doi.org/10.1162/neco.2006.18.7.1527

5. Dahl, G.E., Yu, D., Deng, L., et al. (2012) Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition. IEEE Transactions on Audio, Speech, and Language Processing, 20, 30-42. https://doi.org/10.1109/TASL.2011.2134090

6. Cireşan, D., Meier, U., Gambardella, L., et al. (2010) Deep, Big, Simple Neural Nets for Handwritten Digit Recognition. Neural Computation, 22, 3207-3220. https://doi.org/10.1162/NECO_a_00052

7. Xu, Y., Du, J., Dai, L.R., et al. (2014) An Experimental Study on Speech Enhancement Based on Deep Neural Networks. IEEE Signal Processing Letters, 21, 65-68. https://doi.org/10.1109/LSP.2013.2291240

8. Vu, T.T., Bigot, B. and Chng, E.S. (2016) Combining Non-Negative Matrix Factorization and Deep Neural Networks for Speech Enhancement and Automatic Speech Recognition. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, 499-503.

9. Han, S., Mao, H.Z. and Dally, W.J. (2015) Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. arXiv:1510.00149.

10. Hinton, G.E. (2010) A Practical Guide to Training Restricted Boltzmann Machines. Technical Report UTML TR 2010-003, University of Toronto.

11. Srivastava, N., Hinton, G., Krizhevsky, A., et al. (2014) Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research, 15, 1929-1958.

12. Nair, V. and Hinton, G.E. (2009) 3D Object Recognition with Deep Belief Nets. Advances in Neural Information Processing Systems 22, Vancouver, 7-10 December 2009, 1527-1554.

13. Phan, K.T., Maul, T.H. and Vu, T.T. (2015) A Parallel Circuit Approach for Improving the Speed and Generalization Properties of Neural Networks. International Conference on Natural Computation, 45, 1-7.

14. Wei, Q.S. (2016) Research on Speech Enhancement Algorithms Based on Deep Neural Networks. Master's Thesis, Nanjing University, Nanjing. (In Chinese)

15. Hu, Y. and Loizou, P.C. (2006) Evaluation of Objective Quality Measures for Speech Enhancement. INTERSPEECH 2006 - ICSLP, Ninth International Conference on Spoken Language Processing, Pittsburgh, PA, September 2006, 229-238.