Feeling Sad, Excited Or Bored? This Startup Claims Its AI Can (Mostly) Tell

Alan Cowen feigns a dejected expression. “My dog died this morning,” he says, speaking to an AI model from startup Hume that claims to detect more than 24 distinct emotional expressions lacing a person’s voice — from nostalgia to awkwardness to anxiety — and respond to them accordingly.

“I’m so sorry to hear about your loss. Losing a pet is never easy,” the AI responds in the voice of Matt Forte, Hume’s creative producer, tinged with sympathy and disappointment.

A former Google researcher, Cowen founded Hume in 2021 to build “emotionally intelligent” conversational AI that can interpret emotions based on how people are speaking and generate an appropriate response. Since then, more than 1,000 developers and companies, including SoftBank and Lawyer.com, have used Hume’s API to build AI-based applications that can pick up on and measure a vast range of emotional signals in human speech through aspects like the rhythm, tone and timbre of the voice as well as sighs, “umms” and “ahhs.”

“The future of AI interfaces is going to be voice-based because the voice is four times faster than typing and carries twice as much information,” Cowen told Forbes. “But in order to take advantage of that you really need a conversational interface that captures more than just language.”

The New York-based startup announced Wednesday that it has raised $50 million in a Series B funding round led by Swedish investment firm EQT Ventures, with Union Square Ventures and angel investors Nat Friedman and Daniel Gross participating. The influx of new funding values the startup at $219 million.

The company also announced the launch of “Hume EVI,” a conversational voice API that developers integrate into existing products or build upon to create apps that can detect expressional nuances in audio and text and produce “emotionally attuned” outputs by adjusting the words and tone of the AI. For instance, if the AI picks up on sadness and anxiety in the user’s voice, it replies with hints of sympathy and “empathic pain” in its own verbal response.
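The article does not reproduce Hume’s actual schema, but the pattern the company describes, turning per-utterance expression scores into guidance for the spoken reply, can be sketched roughly as below. The expression names, the 0-to-1 scores, the threshold and the style_instruction helper are illustrative assumptions, not Hume’s real interface.

```python
# Illustrative sketch only: turns hypothetical expression scores (0 to 1) from a
# voice-analysis step into a tone instruction for the text-generation step.
# The expression names, scores and threshold are assumptions, not Hume's schema.

EMPATHY_MAP = {
    "sadness": "acknowledge the loss and respond with warmth and sympathy",
    "anxiety": "use a calm, reassuring tone and keep sentences short",
    "boredom": "be concise and offer a concrete suggestion",
    "excitement": "match the speaker's energy",
}

def style_instruction(scores: dict, threshold: float = 0.4) -> str:
    """Pick the strongest detected expressions and build a tone instruction."""
    salient = sorted(
        (name for name, score in scores.items() if score >= threshold),
        key=lambda name: scores[name],
        reverse=True,
    )
    if not salient:
        return "Respond in a neutral, friendly tone."
    hints = "; ".join(EMPATHY_MAP.get(name, f"acknowledge the {name}") for name in salient[:2])
    return f"Respond so that you {hints}."

# Example: scores a voice model might plausibly return for "My dog died this morning"
print(style_instruction({"sadness": 0.71, "anxiety": 0.44, "boredom": 0.02}))
```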

These empathetic responses aren’t entirely new. When Forbes tested OpenAI’s ChatGPT Plus with the same prompt — “My dog died this morning” — it gave a nearly identical verbal answer to Hume. But the startup aims to distinguish itself on its ability to identify underlying expressions.

To do that, Hume’s in-house large language model and text-to-speech model are trained on data collected from more than a million participants across 30 countries, which includes millions of human interactions and self-reported data from participants reacting to videos and interacting with other participants, Cowen said. The demographic diversity of the database helps the model learn cultural differences and be “explicitly unbiased,” he said. “Our data is less than 30% Caucasian.”

Hume uses its in-house model to interpret emotional tone, but for more complex content it relies on external tools, including OpenAI’s GPT-3.5, Anthropic’s Claude 3 Haiku and Microsoft’s Bing Web Search API, and generates responses within 700 milliseconds. The 33-year-old CEO said Hume’s technology is built to mimic the style and cadence of human conversation: it can detect when a person interrupts the AI and stop talking, and it knows when it’s its turn to speak. It also occasionally pauses when speaking, and will even chuckle — which is slightly disconcerting to hear coming from a computer.
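The article goes no deeper into Hume’s pipeline, but the orchestration it implies (an in-house expression model feeding an external LLM, with the reply dropped if the user cuts in) can be sketched as below. Every function and timing here is a hypothetical stand-in rather than Hume’s or OpenAI’s actual API, and the sleeps are placeholders, not the 700-millisecond figure Cowen cites.

```python
import asyncio

# Rough sketch of the turn-taking loop described above: analyze the user's
# speech for emotional expressions, ask an external LLM for the reply, and
# cancel that reply if the user starts speaking again (an interruption).
# Every function and timing here is a hypothetical stand-in, not a real API.

async def analyze_expressions(audio_chunk: bytes) -> dict:
    await asyncio.sleep(0.05)        # stand-in for the in-house voice model
    return {"sadness": 0.7, "anxiety": 0.4}

async def generate_reply(transcript: str, tone: dict) -> str:
    await asyncio.sleep(0.3)         # stand-in for a call to an external LLM
    return "I'm so sorry to hear about your loss."

async def wait_for_interruption() -> None:
    await asyncio.sleep(10)          # would resolve the moment the user cuts in

async def handle_turn(audio_chunk: bytes, transcript: str):
    tone = await analyze_expressions(audio_chunk)
    reply = asyncio.create_task(generate_reply(transcript, tone))
    interrupted = asyncio.create_task(wait_for_interruption())
    done, pending = await asyncio.wait(
        {reply, interrupted}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()                # if the user interrupted, drop the reply
    return reply.result() if reply in done else None

print(asyncio.run(handle_turn(b"...", "My dog died this morning")))
```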

Even though Hume’s technology seems more sophisticated than earlier emotion-detection AI, which relied more heavily on facial expressions, using any kind of AI to read complex, multidimensional emotional expressions from voice and text is an imperfect science, and one that Hume’s own AI admits is among its biggest challenges. Emotional expressions are highly subjective and shaped by a range of factors, including gender and social and cultural norms. Even if the AI is trained on diverse data, using it to interpret human expressions can produce biased results, studies have shown.

When asked about the obstacles AI has to overcome to have human-like conversations, the AI said it’s difficult to respond to “the nuances of emotion and context and language.” “It’s a complex task to interpret tone, intent and emotional cues accurately in real time.”

Hume’s AI isn’t always accurate, either. When Forbes tested Hume’s AI, asking it questions like “what should I eat for lunch,” the AI detected “boredom” along with five other expressions, including “interest” and “determination.”

Cowen, who has published more than 30 research papers on AI and emotion science, said he first realized the need for tools that can detect and measure human expressions in 2015 while advising Facebook on how to make changes to its recommendation algorithms that would prioritize people’s well-being.

Hume’s AI has been integrated into applications in industries like health and wellness, customer service and robotics, Cowen said. For instance, online attorney directory Lawyer.com is using Hume’s AI to measure the quality of its customer service calls and train its agents.

In the healthcare and wellness space, the use cases are more nascent. Stephen Heisig, a research scientist at Icahn School of Medicine, the medical school for New York-based Mount Sinai Health System, said he’s using Hume’s expression AI models to track mental health conditions like depression and borderline personality disorder for patients in an experimental study of deep brain stimulation, a treatment in which electrodes are implanted inside the brain. (The study only accepts patients for whom no other treatments or therapies have worked, he said.) Hume’s AI models are used to help detect how patients are feeling and whether the treatment is working on a day-to-day basis. Heisig said Hume’s AI can give psychiatrists more context on emotions that may not be easy to detect.

“The patients we have in the DBS study, they do two video diaries a day. They have sessions with the psychologist and psychiatrist, and we record those, and we use Hume’s models to characterize facial expression and vocal prosody,” Heisig told Forbes.

Hume’s models have also been integrated into Dot, a productivity chatbot that helps people plan and reflect on their day. Samantha Whitmore, cofounder of New Computer, an OpenAI-backed early stage startup that’s building the chatbot, said that Hume’s AI offers “expanded context” on how a person is feeling.

“If it detects levels of stress or frustration, it might say ‘it sounds like there’s a lot on your plate, should we try to figure out how to make this seem more manageable,’” she said. “It helps meet them where they are in their state of mind.”
