Director, Speech Technology and Research (STAR) Laboratory, SRI International
20 Dec 2018
Dimitra Vergyri is the Director of the Speech Technology and Research (STAR) Laboratory at SRI International. Her group leads research projects and transfers technology to clients, addressing needs for speech processing in noisy environments, speech recognition, speaker and language identification, speech translation, spoken dialog systems, language education, speech analytics, and more. She received her diploma in Electrical and Computer Engineering from the National Technical University of Athens, Greece, in 1993, and Master's and Ph.D. degrees from Johns Hopkins University in 1995 and 2000, respectively. In 2000 she joined SRI International, where she has worked on multiple research projects and published over 50 papers in the areas of information extraction from speech, voice analysis for emotional and cognitive assessment, speech recognition with sparse training data, multilingual audio search, and machine translation. She held a visiting position at LIMSI/CNRS in 2010 and has served as the Director of the STAR Laboratory since 2015.
From 2009 to 2012 she also served as an associate editor for the IEEE Transactions on Audio, Speech, and Language Processing. She has participated in multiple reviewing panels and in conference organizing and technical committees.
Personal Speaker webpage: https://www.sri.com/about/people/dimitra-vergyri
Speech-In-The-Wild Analytics in the Era of Deep Learning: Recent Advancements and Remaining Challenges
During the last decade, deep learning has spread to all areas of speech and language processing and has led to large improvements in the usability of spoken language technology applications. Nevertheless, unseen and mismatched data conditions remain a challenge for this technology.
In this talk we focus on speech analytics tasks that include speech detection, speaker identification, and keyword spotting. More specifically, we focus on processing speech “in the wild”: data recorded in natural environments, often mismatched to training conditions, exhibiting intrinsic (speaker) and extrinsic (environmental) variability as well as degradation due, for example, to distance from the microphone, transmission noise, channel effects, lossy encodings, and other environmental artifacts. We highlight improvements on such data as the technology moved from Gaussian Mixture Models (GMMs) to Deep Neural Networks (DNNs) and beyond, and from noise-robust signal processing and i-vectors to embeddings.
For all of the speech classification tasks described, it is important to know when to trust the system output, especially under varying, unexpected input conditions and for short-duration speech signals. Successful model score calibration is a key challenge in using these systems on speech-in-the-wild data. We will review the impact of calibration and discuss recent advancements in calibration algorithms.
The talk covers research contributions from current and past members of the STAR laboratory at SRI International, including Mitchell McLaren, Horacio Franco, Martin Graciarena, Aaron Lawson, Diego Castan, Mahesh Nandwana, Julien Van Hout, Colleen Richey, Luciana Ferrer (UBA-CONICET) and Vikramjit Mitra (Apple/UMD).