7a.026.TUT - Improving Speech Recognition Robustness

Project - Summary

General-purpose speech recognition is widely available in many languages from providers such as Google and Microsoft. However, current technology has not yet produced a general speech recognizer of consistently good quality, so a high-quality system requires training on case-specific datasets for the task at hand. This is not possible with, for example, the Google Speech API. Customization also makes it easier to take additional aspects of the audio into account in subsequent models.

Cloud services raise two further concerns. The first is confidentiality: in some use cases the data cannot be allowed to leave the organization in question. The second is cost: continued use of cloud services can be considerable.

The aim of this project is to help create an in-house solution for speech recognition. A general deep-learning-based speech recognizer is trained on open data and other available sources. This general model is then used to create better case-specific models, which are trained on client data. The resulting models are portable and can be set up either in cloud environments or on local servers. A model can easily be combined with, or serve as input to, additional ML models, such as sentence classification.

Robustness to noise and other interference in the environment is a crucial feature of a speech recognition system. The system should also be robust to different speakers, especially in public environments, instead of being adapted specifically to each user. The purpose of this research project is to investigate different approaches to make speech recognition systems robust to noisy environments and different speakers. This project will study speech enhancement techniques with next-generation, data-driven approaches.

Project - Team

Team Member | Role | Email | Phone Number | Academic Site/IAB
Moncef Gabbouj | PI | | +358 (400) 736613 | Tampere University
Okko Rasanen | Co-PI | | Not available | Tampere University
Ali Senhaji | Student/Researcher | Not available | Not available | Tampere University
Filip Ginter | Project Mentors | Not available | Not available |

Project - Novelty of Approach

Existing work has approached speech recognition with several machine learning techniques, such as Generative Adversarial Networks. In this project, we wish to make these algorithms less sensitive to noisy environments and ensure their robustness to different speakers (other than the user). To this end, we aim to develop advanced machine learning methods based on our recently developed GOP and POP neural networks.

Project - Deliverables

1. A new database, built from public-domain resources, for the Automatic Speech Recognition task in the Finnish language
2. A baseline for the task, together with appropriate metrics for robustness
3. An augmented version of the merged database that expands the engine's training domain and tests its ability to generalize to unseen environments, speakers and samples
4. A novel data augmentation technique to enhance the deep learning approach for Automatic Speech Recognition systems in general and for Finnish specifically
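
As an illustration of the kind of augmentation that deliverables 3 and 4 refer to, the sketch below mixes a noise recording into a speech signal at a chosen signal-to-noise ratio (SNR). This is a common, generic augmentation and is not the novel technique proposed by the project, which this page does not describe. The function names `mix_at_snr` and `measured_snr_db`, and the representation of audio as plain lists of float samples, are assumptions made for the example.

```python
import math
import random


def _mean_power(samples):
    """Average power (mean of squared samples) of a signal."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the mixture has the requested SNR in dB.

    Hypothetical helper: both inputs are lists of float samples at the
    same sampling rate, and the noise is at least as long as the speech.
    """
    if not speech:
        raise ValueError("speech must not be empty")
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = _mean_power(speech)
    p_noise = _mean_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both have non-zero power")
    # Target noise power is p_speech / 10^(snr_db / 10); scale amplitudes accordingly.
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixture):
    """SNR in dB of a mixture, given the clean speech it was built from."""
    residual = [m - s for s, m in zip(speech, mixture)]
    return 10 * math.log10(_mean_power(speech) / _mean_power(residual))


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-ins for real recordings: a 440 Hz tone and Gaussian noise.
    speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(1600)]
    for snr in (20.0, 10.0, 0.0):
        mixed = mix_at_snr(speech, noise, snr)
        print(snr, round(measured_snr_db(speech, mixed), 3))
```

Generating several noisy copies of each utterance at different SNRs is one way to widen the training domain and to build test sets for unseen noise conditions.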

Project - Benefits to IAB

The proposed approach can be used to train automatic speech recognition systems for a specific use case with limited datasets and computational resources. We have also defined a way to assess the robustness of an ASR model.
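
This page does not spell out the robustness assessment itself. As a minimal sketch of one common way to quantify robustness, the example below computes the word error rate (WER) per test condition and reports how much it increases relative to a clean condition. The function names, the condition labels and the choice of WER increase as the robustness figure are all assumptions for illustration, not the project's defined metric.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, 1):
        curr = [i]
        for j, h in enumerate(hyp_words, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(pairs):
    """Corpus-level WER over (reference, hypothesis) transcript pairs."""
    errors = 0
    ref_len = 0
    for reference, hypothesis in pairs:
        ref_words = reference.split()
        errors += edit_distance(ref_words, hypothesis.split())
        ref_len += len(ref_words)
    if ref_len == 0:
        raise ValueError("references must contain at least one word")
    return errors / ref_len


def wer_increase(results, clean_key="clean"):
    """WER increase of each condition over the clean condition.

    `results` maps a condition label (hypothetical, e.g. "snr_10db") to a
    list of (reference, hypothesis) pairs decoded under that condition.
    """
    clean = word_error_rate(results[clean_key])
    return {name: word_error_rate(pairs) - clean
            for name, pairs in results.items() if name != clean_key}


if __name__ == "__main__":
    # Toy transcripts, invented for the example.
    results = {
        "clean": [("hyvää huomenta kaikille", "hyvää huomenta kaikille")],
        "snr_10db": [("hyvää huomenta kaikille", "hyvää huomenta kaikki")],
    }
    print(wer_increase(results))
```

A smaller increase across noise levels or unseen speakers would indicate a more robust model under this reading.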

Project - Documents

Filename | Filesize | Last modified
7a.026.tut_ip_info_sheet.docx | 112.2 KiB | 2019/08/20 11:47
7a.026.tut_final_report.pdf | 754.7 KiB | 2019/08/20 09:14
7a.026.tut_confluence_project_page.pdf | 137.4 KiB | 2019/08/13 15:07
7a.026.tut_cvdi-mid-year-report-asr.pdf | 328.1 KiB | 2019/08/13 15:07
projects/year7/7a.026.tut.txt · Last modified: 2019/08/20 10:33 by sally.johnson