Client's motivation for launching the project: denoising systems, including those based on neural networks, are widely used in audio and video communication services. Most of these systems suppress noise well mainly when the desired signal level is high and the noise level is low. The goal was to build a real-time system capable of removing noise from an audio recording within the specified limits.
What we had initially:
- most existing speech enhancement models perform well only at high SNRs;
- there are few generally accepted datasets for speech enhancement;
- in addition to well-known metrics such as SDR and PESQ, subjective assessment of the sound quality of the resulting audio recording is also important;
- for the results to be applicable in real time, it is important to minimize the lookahead, that is, the amount of future signal used to predict the current value.
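The lookahead constraint in the last point can be illustrated with a toy example. The sketch below is not the project's model: it uses a simple moving average as a stand-in for a denoiser, and the names `smooth_with_lookahead` and `lookahead_delay_ms` are hypothetical. It only shows that an output depending on future samples cannot be emitted until those samples arrive, which adds delay.

```python
import numpy as np

def smooth_with_lookahead(x, past, lookahead):
    """Average each sample with `past` earlier and `lookahead` later samples.

    Output sample t depends on x[t + lookahead], so in a streaming setting
    it cannot be emitted until `lookahead` more samples have arrived.
    """
    x = np.asarray(x, dtype=float)
    y = np.empty(len(x), dtype=float)
    for t in range(len(x)):
        start = max(0, t - past)
        stop = min(len(x), t + lookahead + 1)
        y[t] = x[start:stop].mean()
    return y

def lookahead_delay_ms(lookahead, sample_rate):
    """Delay (in ms) added by waiting for `lookahead` future samples."""
    return 1000.0 * lookahead / sample_rate

# Assumed example values: 320 samples of lookahead at a 16 kHz sample rate.
print(lookahead_delay_ms(320, 16000))
```

With zero lookahead the filter is causal: changing future input samples does not change outputs that were already produced.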
Project goal: improve the quality of noise reduction models at extremely low SNR (signal-to-noise ratio).
MIL Team's solution: we improved existing solutions and built our own models, which show high gains in generally accepted audio quality metrics (PESQ, SDR) and in speech recognition error (WER) on recordings where noise is strong relative to speech (SNR from -10 dB).
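As an illustration of one of the metrics named above, here is a minimal sketch of SDR in its plain energy-ratio form, 10 * log10(||s||^2 / ||s - s_hat||^2). The source does not say which SDR variant was used in the project, so this form is an assumption, and `sdr_db` is a hypothetical helper name. PESQ and WER need dedicated tooling and are not shown.

```python
import numpy as np

def sdr_db(reference, estimate, eps=1e-12):
    """Signal-to-distortion ratio in dB (plain energy-ratio form).

    reference: clean target signal
    estimate:  denoised output, same length as reference
    eps:       small constant to avoid division by zero and log(0)
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    distortion = reference - estimate
    signal_energy = np.sum(reference ** 2)
    distortion_energy = np.sum(distortion ** 2)
    return 10.0 * np.log10((signal_energy + eps) / (distortion_energy + eps))

# Synthetic stand-ins for a clean signal and an imperfect estimate of it.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
estimate = clean + 0.1 * rng.standard_normal(16000)
print(sdr_db(clean, estimate))
```

A higher value means the estimate is closer to the clean reference.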
Tools for building the model:
- open speech datasets: Voicebank and Librispeech;
- open noise datasets: DEMAND and MUSAN.
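Speech and noise datasets like these are commonly combined by scaling the noise so that the mixture has a chosen SNR. The sketch below shows that general technique, not the project's actual data pipeline (which is under NDA). The names `mix_at_snr` and `measure_snr_db` are hypothetical, and random arrays stand in for real speech and noise recordings.

```python
import numpy as np

def measure_snr_db(signal, noise):
    """SNR in dB as the ratio of mean signal power to mean noise power."""
    return 10.0 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise ratio equals `snr_db`, then add it.

    Returns the noisy mixture and the scaled noise.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    # Repeat or crop the noise so it matches the speech length.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[:len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # From snr_db = 10 * log10(speech_power / scaled_noise_power).
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    scaled_noise = noise * np.sqrt(target_noise_power / noise_power)
    return speech + scaled_noise, scaled_noise

# Synthetic stand-ins: 1 s of "speech" and 0.25 s of "noise" at 16 kHz.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
noise = 0.3 * rng.standard_normal(4000)
mixture, scaled_noise = mix_at_snr(speech, noise, snr_db=-10.0)
print(measure_snr_db(speech, scaled_noise))
```

At -10 dB the scaled noise carries ten times the power of the speech, which is the regime the project targets.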
The model results: under NDA
Client: under NDA
Technology stack: Python (PyTorch, scipy, librosa)