FitTech Insider

AI and Audio (3): Is voice the new blood, Dagmar Schuller?

The algorithm only needs to listen for a few seconds and it already knows how I'm feeling. Audio intelligence can help in the early detection of diseases. But the software still sometimes lacks crucial learning material: the voices of thousands of test subjects.


Klaus W. was in his mid-60s when he received the bad news. Over the years he had lost his glasses and his car keys, and had even forgotten his grandson's name many times. The diagnosis: Alzheimer's. Klaus was now one of almost 50 million people worldwide, and more than 1.6 million people in Germany, living with dementia. Every year, about 200,000 new cases are added in the country. "The risk of developing dementia increases up to the highest age. Among the over 90-year-olds, one in two is already suffering from dementia," says medical doctor and scientist Steffi Riedel-Heller.


Series "Earware": How does audio tech help in fitness and health?
This text is part three of our series on the question: How can technology help us live healthier lives via our sense of hearing or voice?

Part 1 - Audio bio-feedback: How audio interfaces make us more aware of the body
Part 2 - Audio tech as therapy: AI-composed music helps you sleep, relax or focus
Part 3 - Audio diagnostics: As I speak, health AI makes a diagnosis


Because there are drugs for Alzheimer's that slow the dementia process, detecting the disease at an early stage is vital. Technology can help with this. Artificial intelligence (AI) can detect diseases such as Alzheimer's or Parkinson's at an early stage by analysing speech.

Listening AI: human speech becomes a biomarker

Dagmar Schuller co-founded audEERING in Gilching, Bavaria, at the end of 2012. The TU Munich spin-off is the only European company driving innovation in the field of intelligent audio analysis, voice biomarkers and emotional artificial intelligence. The technology uses machine intelligence and deep learning techniques to recognise human emotions, personality traits and health conditions from the audio signal. One field of applied AI here is Emotion AI, a specialised discipline within Affective Computing that deals with the analysis and development of methods, systems and devices that recognise and interpret human affects and physical characteristics.

It is an AI speciality that detects emotional fluctuations, which can be among the symptoms of neurodegenerative diseases such as Alzheimer's and which generally influence the treatment of diseases. For mental illnesses such as burnout or depression, they are a vital aspect to focus on. Overall, voice biomarkers detected in human audio signals can be mapped to different diseases, as they often indicate the state of health. "Particularly in the area of neurocognitive and neurodegenerative diseases such as Parkinson's, Alzheimer's, autism or mental illnesses such as burnout or depression, essential insights can be gained quickly and without invasive interventions," as Dagmar Schuller wrote in a report for Bitkom. Speech analysis with AI methods makes speech disorders, mood swings and stress states in the voice visible, in some cases already at an early stage. "This makes voice an important biomarker in medicine and, in combination with other features or sensor data, can lead to new insights and improved diagnostic and therapeutic options."

About Dagmar Schuller
Dagmar Schuller on AI diagnostics: "Especially in the medical field, we expect big leaps." (@IHK; Goran Gajanin)

Dagmar Schuller is CEO and co-founder of audEERING and responsible for the strategy, business development and operations. Schuller studied Economics and Business Administration at Wirtschaftsuniversität Vienna and New York University (L. Stern School of Business) focusing on International Management & Marketing, Finance and Information Technology as well as Law at Ludwig-Maximilians-Universität Munich with emphasis on IP/IT Law.

Back at the Technical University of Munich, the audEERING founders developed an open source feature extraction software called openSMILE, which is now considered the standard tool for emotion recognition from the audio signal. This toolkit enables the automatic extraction of features from audio signals, such as pitch or rhythm, and the classification of speech and music signals. Based on a few seconds of audio material, the technology can recognise over 6,000 features and about 50 emotion classes even in small data sets. "Intelligent speech analysis can automatically detect motoric deficits in the articulation muscles with good accuracy and also cognitive issues, impacting the control of those muscles," explains Schuller. This accumulated knowledge enables the algorithm "to determine the probability of the patient being affected by a certain disease based on previous experience."
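To make the idea of "extracting a feature such as pitch" more concrete, here is a minimal sketch in plain Python. It is not openSMILE and does not use its API; it assumes a clean, synthetic tone instead of a real voice recording, and the function names (synth_tone, estimate_pitch) are made up for this illustration. It estimates pitch with a simple autocorrelation: the signal is compared with shifted copies of itself, and the shift at which it matches best is taken as the length of one period.

```python
import math


def synth_tone(freq_hz, sample_rate=8000, duration_s=0.05):
    """Create a pure sine tone as a stand-in for a short voiced sound (assumption)."""
    n = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def estimate_pitch(samples, sample_rate=8000, fmin=50.0, fmax=500.0):
    """Estimate the fundamental frequency in Hz via autocorrelation.

    Only lags that correspond to pitches between fmin and fmax are searched.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Similarity between the signal and a copy of itself shifted by `lag`.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


if __name__ == "__main__":
    tone = synth_tone(200.0)
    print(f"Estimated pitch: {estimate_pitch(tone):.1f} Hz")
```

A real toolkit computes thousands of such descriptors per recording and is far more robust to noise; the sketch only shows what a single feature of this kind looks like.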

Audio AI: needs voice samples to learn to listen

Innovative? Very. But is it sufficiently simple? To make good diagnoses, the algorithm needs learning material in the form of new audio data sets. It's a challenge that has experts all over the world buzzing.

  • Scientists such as Isabel Trancoso of the University of Lisbon are counting on people's acceptance: "My vision is that collecting speech samples will become as common as a blood test."
  • "If you get 10,000 samples and a computer, you can be much more accurate," says Reza Ghomi, a neuropsychiatrist at US-based EvergreenHealth who conducts research on the same topic.
  • Israeli start-up Vocalis Health recently partnered with Israel's Defence Ministry to call on the public to send in speech samples.
  • "We don't want to validate a speech model with just 300 patients," explains Jim Schwoebel of Boston-based speech analysis company Sonde Health. "We need 10,000 or more."
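Why the jump from 300 to 10,000 samples matters can be illustrated with a rough back-of-the-envelope calculation. The sketch below is an assumption-laden illustration, not a method used by any of the researchers quoted: it assumes a model that is right 80 percent of the time (a made-up figure) and uses the standard normal approximation for a 95 percent confidence interval to show how uncertain that measured accuracy is at different sample sizes.

```python
import math


def margin_of_error(n, accuracy=0.8, z=1.96):
    """Half-width of an approximate 95% confidence interval for a measured accuracy.

    Uses the normal approximation z * sqrt(p * (1 - p) / n).
    The default accuracy of 0.8 is a hypothetical value for illustration only.
    """
    return z * math.sqrt(accuracy * (1 - accuracy) / n)


if __name__ == "__main__":
    for n in (300, 10_000):
        print(f"{n:>6} samples: accuracy known to within "
              f"+/- {margin_of_error(n) * 100:.1f} percentage points")
```

Under these assumptions, the uncertainty shrinks from several percentage points with 300 test subjects to well under one percentage point with 10,000.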


Early detection via voice analysis: these companies are also working on it
Winterlight Labs: “Monitoring cognitive impairment through speech”
Aural Analytics: “Clinical-grade speech analytics”
Peak Profiling: “Understanding mental and physical needs using advanced sound analytics”


Audio AI in the pandemic: the software should also be able to recognise Covid-19

So, an important part of the data-collection work is integrating the technology into people's everyday lives. Before the Covid-19 pandemic, Israeli company Vocalis Health developed a smartphone app that detects chronic lung disease based on voice analysis. This found a new use in the pandemic: people with mild symptoms speak into their smartphones, counting from 50 to 70. The app translates the speech into a spectrogram, compares this image with those of positively tested patients and then estimates the probability that the speaker has Covid-19 - at least that's the company's promise.
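The two steps described here, turning sound into a spectrogram and comparing it with a reference, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions and not Vocalis Health's actual method: it uses synthetic tones instead of speech, a naive Fourier transform, and cosine similarity as a stand-in for the trained model a real app would need. All function names are hypothetical, and the similarity score is not a probability of disease.

```python
import cmath
import math


def tone(freq_hz, sample_rate=8000, n=256):
    """Synthetic sine tone used in place of a recorded voice (assumption)."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def spectrogram(samples, frame_len=64, hop=32):
    """Magnitude spectrogram: one list of frequency-bin magnitudes per frame."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # A Hann window reduces leakage between neighbouring frequency bins.
        windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1)))
                    for n, x in enumerate(frame)]
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(windowed))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def cosine_similarity(spec_a, spec_b):
    """Similarity between two equally sized spectrograms (1.0 means same shape)."""
    a = [v for frame in spec_a for v in frame]
    b = [v for frame in spec_b for v in frame]
    if len(a) != len(b):
        raise ValueError("spectrograms must have the same shape")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    reference = spectrogram(tone(200.0))
    print("similar sound:  ", round(cosine_similarity(reference, spectrogram(tone(220.0))), 3))
    print("different sound:", round(cosine_similarity(reference, spectrogram(tone(1500.0))), 3))
```

A sound close to the reference scores high and a very different one scores low. Turning such comparisons into a reliable probability is exactly where the large, labelled data sets mentioned above come in.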



Audio AI: AI-based listening diagnostics soon to be standard

Where is AI-based listening diagnostics heading? Although some experts still believe that we are only at the beginning, as many studies are still small and preliminary, Dagmar Schuller predicts: "Especially in the medical field, we expect big leaps." Her vision of tomorrow: "Intelligent audio analysis will be part of the standard repertoire of medical diagnostics in the not-too-distant future. I am convinced that voice is the new blood." Schuller's latest project supports her thesis: With the AI Soundlab app, audEERING has developed an application that uses intelligent speech and sound analysis to indicate anomalies in breathing sounds as well as changes in speech/voice production, contributing to early detection and risk assessment of COVID-19. Right at the beginning of the coronavirus pandemic, the company published a scientific paper together with the University of Tokyo, the University of Augsburg and doctors from Wuhan describing which symptoms are reliably detected by AI-based audio analysis. The work then served as the basis for the app development.

The advantage of the app: By combining COVID detection based on voice production with the detection of sound events, it achieves improved detection performance and validation. The app can thus infer, with high probability, the presence of COVID-19 from the voice and from certain sounds such as coughing or wheezing.

"The first step now is to use the platform for extended data collection to validate the models and increase recognition performance. Therefore, audEERING is actively calling for data donors." Once all the necessary data points are collected, the models will be re-trained, go into a clinical trial with an already selected clinical partner and then be medically certified. "Once we have reached the necessary number of data points, we will be able to make the diagnostic app available to the general public 6-8 weeks later." Surely everyone will be eager to hear progress on this.

