Still, before this can be exploited on a larger scale, richly annotated data sets will have to be created: at present, databases provide only one or a few speaker characteristics in parallel. Self-learning and self-improvement in the iHEARu project will not be limited to iterative data collection.
Rather, iHEARu will consider self-optimising feature extraction and self-organising classifiers: the whole process of learning and analysing speaker characteristics shall be self-optimising, as depicted in the flow chart above. To realise these ambitious goals, deep learning combined with neuroevolutionary methods and nonparametric Bayesian learning will play an essential role. This combination provides a promising means of creating self-optimising statistical models and hierarchical input representations with very little supervision.
The iHEARu project approaches the issue of acoustic feature generation and selection by trying to understand human reasoning in challenging conditions, ranging from very low SNR, voice conversion algorithms, and speech compression all the way to deliberate faking of voice or speaker states by the subjects. As a consequence, the iHEARu project will address not only environmental and technical robustness but, more importantly, also robustness against fraud.
To automatically obtain robust speech detection and segmentation into meaningful units, the iHEARu project aims to improve all of the pre-processing algorithms, including speech separation, noise reduction, voice activity detection, and segmentation, in a loop with the subsequent analysis algorithms and the confidence scores these provide.
Further, dealing with real-life data also means coping with various transmission channels. The iHEARu project addresses the automatic recognition of speaker attributes and speaking styles that can be clearly identified by humans. However, the iHEARu approach to universal analysis is not simply to define more and more new recognition tasks chosen 'ad hoc'; rather, it aims to develop data-driven methods for a framework that can automatically identify characteristics of interest by looking at crowd-sourced resources, such as tag collections, opinions in textual comments, or explicitly collected annotations from paid click-workers.
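As a rough illustration of what identifying characteristics of interest from a tag collection could look like, the following minimal sketch counts how often each free-text tag occurs across clips and keeps the frequent ones. It is not the iHEARu method: the function name `candidate_characteristics`, the `min_support` parameter, and the example tags are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def candidate_characteristics(tags_per_clip, min_support=0.5):
    """Return (tag, support) pairs for tags used on at least
    `min_support` of the clips, most frequent first.

    Hypothetical helper: a simple frequency filter standing in for
    a real data-driven discovery method.
    """
    if not tags_per_clip:
        return []
    counts = Counter()
    for tags in tags_per_clip:
        # Normalise case and whitespace; count each tag once per clip.
        counts.update({t.strip().lower() for t in tags if t.strip()})
    n_clips = len(tags_per_clip)
    frequent = [
        (tag, count / n_clips)
        for tag, count in counts.items()
        if count / n_clips >= min_support
    ]
    # Sort by descending support, then alphabetically for stable output.
    return sorted(frequent, key=lambda item: (-item[1], item[0]))


# Made-up crowd-sourced tags for four audio clips (illustrative only).
clips = [
    ["Tired", "male"],
    ["tired ", "angry"],
    ["male", "tired"],
    ["angry", "female"],
]
result = candidate_characteristics(clips, min_support=0.5)
print(result)
```

A frequency threshold is the simplest possible criterion; a realistic system would also have to merge synonyms and weigh annotator reliability, which this sketch leaves out.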