research-article
Authors: Thomas Haubner, Andreas Brendel, and Walter Kellermann
IEEE/ACM Transactions on Audio, Speech, and Language Processing, Volume 32
Pages 227 - 238
Published: 19 October 2023
Abstract
The attenuation of acoustic loudspeaker echoes remains one of the open challenges in achieving pleasant full-duplex hands-free speech communication. In many modern signal enhancement interfaces, this problem is addressed by a linear acoustic echo canceler, which subtracts a loudspeaker echo estimate from the recorded microphone signal. To obtain precise echo estimates, the parameters of the echo canceler, i.e., the filter coefficients, need to be estimated quickly and precisely from the observed loudspeaker and microphone signals. For this, a sophisticated adaptation control is required to deal with high-power double-talk and to rapidly track time-varying acoustic environments, which are often encountered with portable devices. In this paper, we address this problem by end-to-end deep learning. In particular, we propose inferring the step-size for a least mean squares frequency-domain adaptive filter update by a Deep Neural Network (DNN). Two different step-size inference approaches are investigated: broadband approaches, which use a single DNN to jointly infer step-sizes for all frequency bands, and narrowband methods, which exploit individual DNNs per frequency band. The discussion of the benefits and disadvantages of both approaches leads to a novel hybrid approach which shows improved echo cancellation while requiring only small DNN architectures. Furthermore, we investigate the effect of different loss functions, signal feature vectors, and DNN output layer architectures on the echo cancellation performance, from which we obtain valuable insights into the general design and functionality of DNN-based adaptation control algorithms.
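To make the role of the step-size concrete, the following is a minimal sketch of a one-tap normalized LMS-type update in a single frequency band, where the step-size is supplied by an external function. It is not the authors' algorithm: the one-tap band model, the names `placeholder_step_size` and `run_band_canceler`, and the hand-written step-size rule (standing in for a trained DNN) are all assumptions made for illustration, and the test signal is noise-free synthetic data.

```python
import random


def placeholder_step_size(x_pow, e_pow):
    """Hypothetical stand-in for a DNN: maps two power features to a step-size in [0, 1]."""
    # Illustrative rule only: shrink the step when the error power is large
    # relative to the loudspeaker power (a crude hint of interference).
    return x_pow / (x_pow + e_pow + 1e-12)


def run_band_canceler(x_frames, y_frames, step_size_fn, eps=1e-8):
    """One-tap echo canceler for one frequency band (assumed simplified model)."""
    h = 0j  # current estimate of the band's filter coefficient
    errors = []
    for x, y in zip(x_frames, y_frames):
        # Subtract the echo estimate from the microphone signal.
        e = y - h * x
        # Step-size inferred from signal features (here: two powers).
        mu = step_size_fn(abs(x) ** 2, abs(e) ** 2)
        # Normalized LMS-type coefficient update scaled by the step-size.
        h += mu * x.conjugate() * e / (abs(x) ** 2 + eps)
        errors.append(e)
    return h, errors


# Synthetic, noise-free example: the microphone observes only the echo.
random.seed(0)
true_h = 0.5 - 0.3j  # assumed ground-truth coefficient for the demo
x_frames = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(500)]
y_frames = [true_h * x for x in x_frames]

h_est, errors = run_band_canceler(x_frames, y_frames, placeholder_step_size)
print(abs(h_est - true_h))
```

In this sketch, a narrowband scheme would correspond to calling one such step-size function per band, whereas a broadband scheme would compute the step-sizes of all bands jointly from a shared feature vector; the abstract does not specify either at this level of detail.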
Index Terms
End-to-End Deep Learning-Based Adaptation Control for Linear Acoustic Echo Cancellation
Applied computing
Arts and humanities
Sound and music computing
Computing methodologies
Machine learning
Machine learning approaches
Neural networks
Hardware
Communication hardware, interfaces and storage
Signal processing systems
Information systems
Information retrieval
Specialized information retrieval
Multimedia and multimodal retrieval
Index terms have been assigned to the content through auto-classification.
Published In
IEEE/ACM Transactions on Audio, Speech and Language Processing Volume 32, Issue
2024
2883 pages
ISSN:2329-9290
EISSN:2329-9304
2329-9290 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
Publisher
IEEE Press