Abstract
With the advancement of Industry 4.0 and intelligent manufacturing, there is a growing demand for safe and reliable equipment operation. Abnormal sounds often precede mechanical failures, so their accurate detection is critical for accident prevention and operational efficiency. To address the limitations of existing methods, including heavy reliance on handcrafted acoustic features and model structures, insufficient representation of fine detail, and poor cross-device robustness, this paper proposes an end-to-end detection framework based on Pre-trained Representation-driven and Multi-domain Feature Fusion (PReMFF). Specifically, we fine-tune the large-scale pre-trained model Wav2vec 2.0 to extract generalized acoustic features. To further improve performance, we introduce two specialized modules: an adaptive frequency band enhancement module that highlights key frequency components, and a multi-scale dilated causal temporal modeling module that captures long-range dependencies in the time domain. The three feature streams are then fused through a gating mechanism and jointly supervised by the classifier and loss function, achieving 94.29% AUC and 88.88% pAUC on the DCASE 2020 Task 2 dataset. Experiments on the MIMII dataset verify the framework's ability to adapt quickly and generalize robustly to new equipment and complex noise, indicating that it offers an efficient and practical solution for intelligent monitoring of industrial sites.
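The gated fusion step described above can be sketched as follows. This is a minimal illustrative example, not the authors' implementation: the class name, feature dimensions, and the softmax-normalized per-stream gating are all assumptions made for the sake of a runnable sketch of fusing three pooled feature streams (pre-trained, frequency-enhanced, and temporal).

```python
import torch
import torch.nn as nn


class GatedTriFusion(nn.Module):
    """Illustrative gated fusion of three equally sized feature streams.

    All names and shapes here are hypothetical; the sketch only shows the
    general pattern of learning per-stream gate weights and classifying
    the weighted sum.
    """

    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        # One gate logit per stream, computed from the concatenated features.
        self.gate = nn.Linear(3 * dim, 3)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, f_pre, f_freq, f_time):
        # Each input: (batch, dim) pooled features from one branch.
        stacked = torch.stack([f_pre, f_freq, f_time], dim=1)   # (B, 3, D)
        weights = torch.softmax(self.gate(stacked.flatten(1)), dim=-1)  # (B, 3)
        fused = (weights.unsqueeze(-1) * stacked).sum(dim=1)    # (B, D)
        return self.classifier(fused)


# Toy usage: three 256-dim feature streams for a batch of 4 clips.
streams = [torch.randn(4, 256) for _ in range(3)]
logits = GatedTriFusion(256, 6)(*streams)
print(logits.shape)  # torch.Size([4, 6])
```

In practice the gate lets the network down-weight a branch whose features are uninformative for a given clip, which is one common way to combine heterogeneous representations before a shared classifier.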
Data availability
No datasets were generated or analysed during the current study.
References
Baevski, A., Zhou, Y., Mohamed, A., Auli, M.: wav2vec 2.0: A framework for self-supervised learning of speech representations. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 12449–12460 (2020)
Wang, C., Liu, C., Liao, M., Yang, Q.: An enhanced diagnosis method for weak fault features of bearing acoustic emission signal based on compressed sensing. Math. Biosci. Eng. 18, 1670–1688 (2021)
Suefusa, K., Nishida, T., Purohit, H., Tanabe, R., Endo, T., Kawaguchi, Y.: Anomalous sound detection based on interpolation deep neural network. In: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 271–275 (2020). https://doi.org/10.1109/ICASSP40776.2020.9054344
Jiang, A., Zhang, W.-Q., Deng, Y., Fan, P., Liu, J.: Unsupervised anomaly detection and localization of machine audio: A gan-based approach. In: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5 (2023). https://doi.org/10.1109/ICASSP49357.2023.10096813
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 9992–10002 (2021). https://doi.org/10.1109/ICCV48922.2021.00986
Giri, R., Tenneti, S.V., Cheng, F., Helwani, K., Isik, U., Krishnaswamy, A.: Self-supervised classification for detecting anomalous sounds (2020)
Dohi, K., Endo, T., Purohit, H., Tanabe, R., Kawaguchi, Y.: Flow-based self-supervised density estimation for anomalous sound detection. In: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 336–340 (2021). https://doi.org/10.1109/ICASSP39728.2021.9414662
Liu, Y., Guan, J., Zhu, Q., Wang, W.: Anomalous sound detection using spectral-temporal information fusion. In: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 816–820 (2022). https://doi.org/10.1109/ICASSP43922.2022.9747868
Guan, J., Xiao, F., Liu, Y., Zhu, Q., Wang, W.: Anomalous sound detection using audio representation with machine id based contrastive learning pretraining. In: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5 (2023). https://doi.org/10.1109/ICASSP49357.2023.10096054
Zhang, Y., Liu, J., Tian, Y., Liu, H., Li, M.: A dual-path framework with frequency-and-time excited network for anomalous sound detection. In: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1266–1270 (2024). https://doi.org/10.1109/ICASSP48485.2024.10448126
Han, B., Lv, Z., Jiang, A., Huang, W., Chen, Z., Deng, Y., Ding, J., Lu, C., Zhang, W.-Q., Fan, P., Liu, J., Qian, Y.: Exploring large scale pre-trained models for robust machine anomalous sound detection. In: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1326–1330 (2024). https://doi.org/10.1109/ICASSP48485.2024.10447183
Jiang, A., Han, B., Lv, Z., Deng, Y., Zhang, W.-Q., Chen, X., Qian, Y., Liu, J., Fan, P.: Anopatch: Towards better consistency in machine anomalous sound detection. arXiv preprint arXiv:2406.11364 (2024)
Yang, H., Liu, Z., Ma, N., Wang, X., Liu, W., Wang, H., Zhan, D., Hu, Z.: Csrm-mim: A self-supervised pre-training method for detecting catenary support components in electrified railways. IEEE Trans. Transp. Electrif. (2025)
Yan, J., Cheng, Y., Zhang, F., Li, M., Zhou, N., Jin, B., Wang, H., Yang, H., Zhang, W.: Research on multimodal techniques for arc detection in railway systems with limited data. Struct. Health Monit. 14759217251336797 (2025)
Chen, S., Liu, Y., Gao, X., Han, Z.: Mobilefacenets: Efficient cnns for accurate real-time face verification on mobile devices. In: Zhou, J., Wang, Y., Sun, Z., Jia, Z., Feng, J., Shan, S., Ubul, K., Guo, Z. (eds.) Biometric Recognition, pp. 428–438. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-97909-0_46
Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen, W., et al.: LoRA: Low-rank adaptation of large language models. In: International Conference on Learning Representations (ICLR) (2022)
Koizumi, Y., Kawaguchi, Y., Imoto, K., Nakamura, T., Nikaido, Y., Tanabe, R., Purohit, H., Suefusa, K., Endo, T., Yasuda, M., et al.: Description and discussion on DCASE2020 challenge task2: Unsupervised anomalous sound detection for machine condition monitoring. arXiv preprint arXiv:2006.05822 (2020). https://doi.org/10.48550/arXiv.2006.05822
Neri, M., Carli, M.: Low-complexity attention-based unsupervised anomalous sound detection exploiting separable convolutions and angular loss. IEEE Sensors Letters 8(11), 1–4 (2024). https://doi.org/10.1109/LSENS.2024.3480450
Wang, Y., Zhang, Q., Zhang, W., Zhang, Y.: A lightweight framework for unsupervised anomalous sound detection based on selective learning of time-frequency domain features. Appl. Acoust. 228, 110308 (2025)
Purohit, H., Tanabe, R., Ichige, K., Endo, T., Nikaido, Y., Suefusa, K., Kawaguchi, Y.: MIMII dataset: Sound dataset for malfunctioning industrial machine investigation and inspection. arXiv preprint arXiv:1909.09347 (2019). https://doi.org/10.48550/arXiv.1909.09347
Chandrakala, S., Pidikiti, A., Sai Mahathi, P.: Spectro temporal fusion with clstm-autoencoder based approach for anomalous sound detection. Neural Process. Lett. 56(1), 39 (2024). https://doi.org/10.1007/s11063-024-11485-4
Author information
Contributions
J.W., H.S., and Q.W. conducted research conceptualization and data collection. J.W., C.L., and K.X. performed formal analysis and investigation. J.W. wrote the main manuscript text. J.W., H.S., and C.L. reviewed and edited the manuscript. All authors reviewed the final version.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wei, J., Sun, H., Li, C. et al. Pre-trained representation-driven and multi-domain feature fusion method for anomalous sound detection. SIViP 19, 1063 (2025). https://doi.org/10.1007/s11760-025-04666-8

