Keynote Speakers


Barbara Shinn-Cunningham, Boston University

How humans communicate in a noisy world

Download the Presentation Slides [PDF]

Humans can ignore whatever is irrelevant in a given acoustic scene, including competing voices and speech. Yet current technologies, whether assistive listening devices or speech recognition systems, are often confounded by interference from nonstationary, amplitude-modulated distracting signals. This talk reviews what we know about why humans succeed in cases where machine systems stumble, and discusses how future technologies might incorporate more human-like abilities for processing speech in noisy settings.

Barbara Shinn-Cunningham trained as an electrical engineer (Brown University, Sc.B.; MIT, M.S. and Ph.D.). She is the Director of the Boston University Center for Research in Sensory Communications and Neural Technology, a research center uniting faculty members across four colleges. Her work has been recognized by the Alfred P. Sloan Foundation, the Whitaker Foundation, and the National Security Science and Engineering Faculty Fellows program. She is a Fellow of the Acoustical Society of America (ASA), a Fellow of the American Institute for Medical and Biological Engineering, and a lifetime National Associate of the National Research Council "in recognition of extraordinary service to the National Academies in its role as advisor to the Nation in matters of science." Her research uses behavioral, neuroimaging, and computational methods to understand auditory attention and learning, a topic on which she lectures at conferences and symposia around the world.



Christine Evers, Imperial College London

Bayesian Learning for Robot Audition

Download the Presentation Slides [PDF]

Recent advances in robotics and autonomous systems are rapidly leading to machines that assist humans across the industrial, healthcare, and social sectors. For intuitive interaction between humans and robots, spoken language is a fundamental prerequisite. However, in realistic environments, speech signals are typically distorted by reverberation, noise, and interference from competing sound sources. Acoustic signal processing is therefore necessary to give robots the ability to learn, adapt, and react to stimuli in the acoustic environment. The processed, anechoic speech signals are themselves naturally time varying due to fluctuations of air flow in the vocal tract. Furthermore, motion of a human talker's head and body leads to spatio-temporal variations in source position and orientation, and hence to time-varying source-sensor geometries. To listen in realistic, dynamic multi-talker environments, robots therefore need signal processing algorithms that recognize and constructively exploit the spatial, spectral, and temporal variations in the recorded signals. Bayesian inference provides a principled framework for incorporating temporal models that capture prior knowledge of physical quantities, such as the acoustic channel or the vocal tract. This keynote explores the theory and application of Bayesian learning for robot audition, addressing novel advances in acoustic Simultaneous Localization and Mapping (aSLAM), sound source localization and tracking, and blind speech dereverberation.
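
The Bayesian tracking ingredient can be made concrete with a very small example. The sketch below is not taken from the talk; it simply runs a Kalman filter over noisy direction-of-arrival (DOA) measurements of a moving talker, and the constant-velocity motion model, frame rate, and noise levels are illustrative assumptions.

```python
# Minimal sketch (illustrative, not from the talk): Kalman filtering of noisy
# azimuth (DOA) measurements of a moving talker under an assumed
# constant-velocity motion model.
import numpy as np

dt = 0.1                                  # frame period in seconds (assumed)
F = np.array([[1.0, dt], [0.0, 1.0]])     # state transition: [azimuth, azimuth rate]
H = np.array([[1.0, 0.0]])                # only the azimuth is measured
Q = 0.01 * np.eye(2)                      # process noise (motion uncertainty)
R = np.array([[4.0]])                     # measurement noise (DOA estimator variance)

x = np.array([[0.0], [0.0]])              # initial state estimate
P = 10.0 * np.eye(2)                      # initial state covariance

def kalman_step(x, P, z):
    """One predict/update cycle for a new azimuth measurement z (1x1 array, degrees)."""
    # Predict: propagate the state with the motion model.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update: correct the prediction with the measurement.
    y = z - H @ x_pred                    # innovation
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(2) - K @ H) @ P_pred
    return x_new, P_new

# Simulated noisy DOA track of a talker walking past the array (illustrative).
true_azimuth = np.linspace(-30.0, 30.0, 100)
measurements = true_azimuth + 2.0 * np.random.randn(100)
for z in measurements:
    x, P = kalman_step(x, P, np.array([[z]]))
print("final azimuth estimate (deg):", x[0, 0])
```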

Christine Evers is an EPSRC Fellow at Imperial College London. She received her PhD from the University of Edinburgh, UK, in 2010, having completed her MSc in Signal Processing and Communications at the University of Edinburgh in 2006 and her BSc in Electrical Engineering and Computer Science at Jacobs University Bremen, Germany, in 2005. After a position as a research associate at the University of Edinburgh between 2009 and 2010, she worked until 2014 as a senior systems engineer on radar tracking systems at Selex ES, Edinburgh, UK. She returned to academia in 2014 as a research associate in the Department of Electrical and Electronic Engineering at Imperial College London, focusing on acoustic scene mapping for robot audition. In 2017, she was awarded a fellowship by the UK Engineering and Physical Sciences Research Council (EPSRC) to advance her research on acoustic signal processing and scene mapping for socially assistive robots. Her research focuses on Bayesian inference for speech and audio applications in dynamic environments, including acoustic simultaneous localization and mapping, sound source localization and tracking, blind speech dereverberation, and sensor fusion. She is an IEEE Senior Member and a member of the IEEE Signal Processing Society Technical Committee on Audio and Acoustic Signal Processing.



Paris Smaragdis, University of Illinois at Urbana-Champaign

Striving for computational and physical efficiency in speech enhancement

Download the Presentation Slides [PDF]

As commonplace speech-enabled devices get smaller and lighter, we are faced with a need for simpler processing and simpler hardware. In this talk I will present some alternative ways to approach multi-channel and single-channel speech enhancement under these constraints. More specifically, I will present new formulations of beamforming that are numerically more lightweight and work best with physically compact arrays. I will then discuss single-channel approaches using deep networks that, in addition to imposing a light computational load, are amenable to aggressive hardware optimizations that can yield large power savings and a reduced hardware footprint.
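
For orientation, the sketch below shows a standard frequency-domain delay-and-sum beamformer for a small, physically compact array; it is not the lightweight formulation presented in the talk, and the array geometry, sample rate, and look direction are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the formulation from the talk): a standard
# frequency-domain delay-and-sum beamformer for a compact uniform linear array.
import numpy as np

fs = 16000                 # sample rate in Hz (assumed)
c = 343.0                  # speed of sound in m/s
n_mics = 4
spacing = 0.03             # 3 cm spacing: a physically compact array (assumed)
theta = np.deg2rad(20.0)   # look direction relative to broadside (assumed)

def delay_and_sum(frames):
    """frames: (n_mics, n_fft) array of time-domain samples, one row per microphone."""
    n_fft = frames.shape[1]
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)             # (n_bins,)
    mic_pos = np.arange(n_mics) * spacing                  # linear array positions
    delays = mic_pos * np.sin(theta) / c                   # per-mic propagation delay
    # Steering vector: phase shifts that align the look direction across mics.
    steer = np.exp(-2j * np.pi * np.outer(delays, freqs))  # (n_mics, n_bins)
    X = np.fft.rfft(frames, axis=1)                        # per-mic spectra
    Y = np.mean(np.conj(steer) * X, axis=0)                # align and average
    return np.fft.irfft(Y, n=n_fft)

# Usage: enhance one 512-sample frame of a 4-channel recording (random data here).
frame = np.random.randn(n_mics, 512)
enhanced = delay_and_sum(frame)
```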

Paris Smaragdis is an associate professor in the Computer Science and Electrical and Computer Engineering departments of the University of Illinois at Urbana-Champaign, as well as a senior research scientist at Adobe Research. He completed his master's, PhD, and postdoctoral studies at MIT, performing research on computational audition. In 2006 he was selected by MIT's Technology Review as one of the year's top young technology innovators for his work on machine listening; in 2015 he was elevated to IEEE Fellow for contributions to audio source separation and audio processing; and during 2016-2017 he is serving as an IEEE Signal Processing Society Distinguished Lecturer. He has authored more than 100 papers on various aspects of audio signal processing, holds more than 40 patents worldwide, and his research has been productized by multiple companies.