Technology · 8 min read

Speech-to-text name accuracy - solving AI's biggest weakness

Published on November 1, 2025 by Vincent Theeten

When AI gets your name wrong

You've probably been there.

You spell out your name on a call: slowly, clearly, the way you've practised over the years.

"Stephan. That's S-T-E-P-H-A-N."

The bot thanks you politely.

Then it stores your name as Stefan.

That's the moment everything breaks.

The follow-up email never arrives.

Your support ticket can't be found.

The sales team never calls you back – because they never saw your name at all.

And it happens more than you'd think.

The problem with names in speech

Most voice systems today use general-purpose Speech-to-Text (STT) tools. They're fine for natural conversations, but they struggle when someone starts spelling. Especially with names, where even one wrong letter changes everything.

Here's where they break down:

- Tiny variations that completely change identity (Stephan vs Stefan)

- Long or compound last names (Van Den Broeck vs Vanden Brouck)

- Common confusions like B vs P, M vs N, V vs F

- Email addresses where "at" is skipped or "dot" becomes "dog"
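Some of these confusions can be flagged mechanically. Here's a minimal Python sketch, assuming a hand-picked confusion table that contains only the pairs named above (a real table would be larger and language-specific); the function name is our own illustration. It spots two spellings that differ only in easily confused letters, which is exactly the case where silently picking one is risky.

```python
# Illustrative confusion table: only the pairs named above (B/P, M/N, V/F).
CONFUSABLE = {frozenset("BP"), frozenset("MN"), frozenset("VF")}

def differs_only_by_confusables(a: str, b: str) -> bool:
    """True if two equal-length spellings differ, but only at positions
    where the two letters form a commonly confused pair."""
    a, b = a.upper(), b.upper()
    if len(a) != len(b) or a == b:
        return False
    # Every position must either match or be a known confusable pair.
    return all(x == y or frozenset((x, y)) in CONFUSABLE for x, y in zip(a, b))
```

Note that Stephan vs Stefan isn't caught here, because "PH" and "F" differ in length. That kind of variation needs sound-alike rules rather than a letter-by-letter comparison.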

A common workaround in other countries is the NATO phonetic alphabet: "B as in Bravo," "F as in Foxtrot."

My last name is Desmet. That's D as in Delta, E as in Echo, S as in Sierra, M as in Mike, E as in Echo, T as in Tango.
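For callers who do use it, decoding this style of spelling is mechanical. A minimal sketch, assuming the transcript keeps the "X as in Word" phrasing intact; `decode_as_in` is a hypothetical helper, not part of any product or library.

```python
import re

# NATO/ICAO spelling alphabet (code word -> letter), plus common variant spellings.
NATO = {
    "alfa": "A", "alpha": "A", "bravo": "B", "charlie": "C", "delta": "D",
    "echo": "E", "foxtrot": "F", "golf": "G", "hotel": "H", "india": "I",
    "juliett": "J", "juliet": "J", "kilo": "K", "lima": "L", "mike": "M",
    "november": "N", "oscar": "O", "papa": "P", "quebec": "Q", "romeo": "R",
    "sierra": "S", "tango": "T", "uniform": "U", "victor": "V",
    "whiskey": "W", "x-ray": "X", "xray": "X", "yankee": "Y", "zulu": "Z",
}

# A single spoken letter, then "as in", then the code word.
AS_IN = re.compile(r"\b([a-z])\s+as\s+in\s+([a-z-]+)", re.IGNORECASE)

def decode_as_in(utterance: str) -> str:
    """Extract letters from phrases like 'D as in Delta'. The code word
    wins; the spoken letter is only a fallback for unknown words."""
    letters = []
    for spoken, word in AS_IN.findall(utterance):
        letters.append(NATO.get(word.lower(), spoken.upper()))
    return "".join(letters)
```

The code word is trusted over the spoken letter because the letter itself is the part most likely to be misheard.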

But here in Belgium, almost no one does that. It's long-winded, unfamiliar and unnatural, and in practice it simply wouldn't work. So we don't force it.

Instead, when our voice agent isn't sure, it might just ask:

"Is that Stephan with P-H, or just an F?"

Because sometimes, spelling everything out isn't the best path – asking the right follow-up is.
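That kind of targeted follow-up can be driven by a small table of spellings that sound alike. A minimal sketch, assuming a hypothetical two-entry table; this version asks whenever a pattern appears, while a real agent would only ask when the recognizer is actually unsure.

```python
from typing import Optional

# Hypothetical sound-alike spellings; "ph"/"f" is the example from the text.
SOUND_ALIKE = [("ph", "f"), ("ck", "k")]

def clarifying_question(name: str) -> Optional[str]:
    """Return a targeted follow-up question if the name contains a
    spelling with a sound-alike alternative, else None."""
    lower = name.lower()
    for a, b in SOUND_ALIKE:
        # Check both directions: "Stephan" suggests F, "Stefan" suggests P-H.
        for present, other in ((a, b), (b, a)):
            if present in lower:
                return (f"Is that {name} with {'-'.join(present.upper())}, "
                        f"or with {'-'.join(other.upper())}?")
    return None
```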

Why we're doing this at Ringtime

At Ringtime, we're building voice agents that feel human. Not just in tone, but in how well they listen. That means being able to get your name and email right on the first try, or asking for clarification in a way that feels natural, not robotic.

Because when names or emails are wrong:

- Leads disappear

- Tickets get stuck

- Automation fails

- Users lose patience

It's a small but critical part of the flow, and solving it fits right into our mission of building voice tech that actually works in the real world.

Introducing our simple demo voice agent

Throughout this series, we'll use a real-world example to illustrate exactly why names and emails matter so much.

We're building a simple voice agent for a solar panel installation company. Its only job is to capture accurate customer contact information from inbound phone leads:

"Hi there! Thanks for calling SolarTech. Our team is currently unavailable, but we'd love to follow up. Can I have your full name and email address?"

Sounds easy? It's not.


If the system mishears or misinterprets even slightly, the lead disappears. We'll test this scenario repeatedly against standard speech-to-text platforms to show where they fall short and how our approach makes a difference.

Where things usually go wrong

Here's what we see again and again in real-world voice inputs:

- "Eveline" can also be "Evelien" or "Evelyne"

- "Stephane" is interpreted as "Stefan"

- "Van Den Broeck" gets compacted or respelled as "Vanden Brouck" or "Vandenbrook"

- "vincent dot theeten at gmail dot com" turns into vincent.theeten.gmail.com – no "at" in sight

- Letters like B and P, M and N, F and V are often mistaken for each other
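The "at" and "dot" problem, at least, has a straightforward first step: map the spoken words back to symbols, then check that the result is shaped like an address before saving it. A minimal sketch, assuming English spoken tokens only and a deliberately loose shape check rather than full address validation; both helpers are hypothetical.

```python
import re

# Spoken words mapped to email punctuation. English-only and illustrative;
# Dutch callers, for example, may say "punt" instead of "dot".
SPOKEN_SYMBOLS = {"at": "@", "dot": ".", "dash": "-", "hyphen": "-",
                  "underscore": "_"}

def spoken_to_email(transcript: str) -> str:
    """Turn 'vincent dot theeten at gmail dot com' into an address string."""
    tokens = re.findall(r"[a-z0-9]+", transcript.lower())
    return "".join(SPOKEN_SYMBOLS.get(token, token) for token in tokens)

def looks_like_email(candidate: str) -> bool:
    """Loose shape check: one '@', something before it, a dotted domain after."""
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[a-z]{2,}", candidate) is not None
```

The obvious weakness: a name that happens to be one of the spoken tokens gets mangled. That's where context, or a follow-up question, has to take over.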

These aren't always "mistakes" in the traditional sense. Evelien and Evelyne are perfectly valid names too. The issue is that spoken names are often phonetically ambiguous. Without the right context, or a clarifying question, the system just makes a best guess. And often, it guesses wrong.
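One way to stop a system from silently guessing is to make the ambiguity explicit: enumerate every plausible spelling, and treat more than one survivor as a reason to ask. A minimal sketch, assuming a small hand-written table of interchangeable spellings (illustrative, not a linguistic resource); candidates come back in lowercase.

```python
import re
from itertools import product

# Hypothetical groups of spellings that can sound the same in names.
VARIANT_GROUPS = [("ine", "ien", "yne"), ("ph", "f")]

def spelling_candidates(name: str) -> list[str]:
    """All spellings reachable by swapping within a variant group."""
    members = sorted((v for g in VARIANT_GROUPS for v in g), key=len, reverse=True)
    pattern = "(" + "|".join(re.escape(m) for m in members) + ")"
    # The capturing group keeps matched variants as separate pieces.
    pieces = re.split(pattern, name.lower())
    options = []
    for piece in pieces:
        group = next((g for g in VARIANT_GROUPS if piece in g), None)
        options.append(group if group else (piece,))
    return sorted({"".join(combo) for combo in product(*options)})
```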

For high-stakes inputs like names and emails, "close enough" isn't good enough.

Try this sample input:

"My name is Eveline Vanden Brouck – E-V-E-L-I-N-E V-A-N-D-E-N B-R-O-U-C-K at gmail dot com"

- Raw STT result: evelimevandenbrook.gmail.com

- Corrected version: eveline.vandenbrook@gmail.com

Only a few letters off. But the result? Completely unusable.
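Part of the fix is trusting the caller's letter-by-letter spelling over the recognizer's word-level guess. A minimal sketch, assuming the transcript preserves hyphenated spellings like "E-V-E-L-I-N-E" (many STT engines won't format them this way, so treat this as an assumption about the input); both helpers are hypothetical.

```python
import re

# A single letter followed by one or more "-letter" groups, e.g. "B-R-O-U-C-K".
SPELLED = re.compile(r"\b[A-Za-z](?:-[A-Za-z])+\b")

def collapse_spelled(transcript: str) -> list[str]:
    """Return each letter-by-letter spelling in the transcript as one word."""
    return [m.group().replace("-", "").upper() for m in SPELLED.finditer(transcript)]

def spelled_matches(heard: str, spelled: list[str]) -> bool:
    """Compare the recognizer's word guess with what the caller spelled,
    ignoring case and spacing."""
    return heard.replace(" ", "").upper() == "".join(spelled)
```

A mismatch such as "Brook" against a spelled "BROUCK" is exactly the signal to correct the word or ask.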

Not replacing speech-to-text, just making it work

We're not rebuilding STT from scratch. The models out there are powerful – but they're general-purpose. What we've built is a layer that comes after.

It does what no off-the-shelf tool does today:

- Understands the context behind names and emails.

- Handles the mistakes that matter.

- Decides when to ask the user for clarification instead of guessing.
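The "ask instead of guess" decision can be expressed as a simple rule over scored candidates. A minimal sketch, assuming each candidate spelling already carries a confidence score in [0, 1] from an earlier step; the thresholds are placeholders, not tuned values, and this is not a description of how Ringtime's layer actually works.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    score: float  # assumed confidence in [0, 1] from an earlier step

def decide(candidates: list[Candidate], accept_at: float = 0.9,
           margin: float = 0.2) -> tuple[str, str]:
    """Return ("accept", value) or ("ask", question)."""
    if not candidates:
        return ("ask", "Could you spell that for me?")
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    best = ranked[0]
    runner_up = ranked[1].score if len(ranked) > 1 else 0.0
    # Accept only when the top guess is both confident and clearly ahead.
    if best.score >= accept_at and best.score - runner_up >= margin:
        return ("accept", best.text)
    # Otherwise ask a targeted question instead of guessing.
    if len(ranked) > 1:
        return ("ask", f"Is that {best.text} or {ranked[1].text}?")
    return ("ask", f"Just to confirm, is that {best.text}?")
```

Requiring a margin as well as a threshold matters for names: two spellings scored 0.55 and 0.45 are a coin flip, and that's when the agent should ask.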

We're not trying to reinvent the wheel. We're just making it roll properly.


Why start with names and emails?

They're short. They're simple (on paper). But they're often the first thing a user says, and the easiest to get wrong. They're also high-stakes: you can't afford a misspelling in an email, and you can't just assume a name.

We're starting here – because this is where trust is won or lost.

This goes way beyond 🇧🇪 Belgium

Belgium is a great place to start because it's messy. You've got Dutch, French and English, plus hybrid names, inconsistent spacing, silent letters and pronunciation chaos.

But this problem is global. It shows up in every country where people spell out their names over the phone, where emails get dictated, where letters sound just a little too close.

We're starting here because it's the hardest test case. But we're building with scale in mind.

What's coming in the series

In the next posts, we'll share:

- How we're testing a mix of STT engines and voice agent platforms, from open models to commercial tools.

- How well they handle names and emails out of the box.

- What we measure (correct name identification, email accuracy, spelling consistency). 
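These metrics aren't defined in detail yet, so treat the following as one common way to score them, not as the series' actual methodology: exact match (after lowercasing) for whether a name or email was captured correctly, and character error rate (edit distance divided by reference length) for how far off a miss was. A minimal sketch:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = cur
    return prev[-1]

def char_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the reference length."""
    return levenshtein(reference, hypothesis) / max(len(reference), 1)

def exact_match_rate(pairs: list[tuple[str, str]]) -> float:
    """Share of (reference, hypothesis) pairs that match, ignoring case."""
    if not pairs:
        return 0.0
    return sum(r.lower() == h.lower() for r, h in pairs) / len(pairs)
```

Exact match is the one that matters for a lead: a character error rate of 2/7 on "Stephan" vs "Stefan" looks small, but the follow-up email still never arrives.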

Final thoughts

Speech-to-Text is impressive. But it still stumbles on something as basic as understanding a name or recording an email address accurately. That's what we're focused on.

We're getting the first step right, because it sets the tone for everything that follows.