I cloned myself as an AI voice agent - here's what I learned

Published on December 17, 2025 by Diederik Syoen

While building Ringtime, I created an AI version of myself that you could actually call and talk to.

Not as a scripted demo or chatbot, but something that could have real conversations about Ringtime the way I do. The inspiration came from the Shell Game podcast, which explores how identity, voice and technology blur in uncomfortable but fascinating ways. If this interests you at all, you should check it out.

What started as a weekend experiment became one of the most concrete ways I've experienced what AI agents really are. A phone call that made the technology uncomfortably real.

In this article, I'll walk through how I built my own AI voice agent step by step. This is written for curious beginners, not hardcore engineers. By the end, you'll see that building one is very doable and more confronting than you might expect.

Let's dive in.

Step 1: Record & clone your voice

The first step was creating a usable clone of my own voice.

Here's the most important thing I learned: talking works better than reading. Voice models pick up natural rhythm, hesitation and flow much better when you speak freely instead of reading text out loud. Cloning is possible with as little as 30 seconds of audio, and the clones from those short samples were pretty good, but not really 'me'. If you care about subtle accents, pacing and filler sounds, going longer really pays off.

This matters especially for local accents. Flemish versus Dutch is a great example. If you want the subtle "euhs", pauses and regional cadence to come through naturally, you need more material.

I recorded almost two hours of audio, not in one go, but in short snippets. Talking continuously for long stretches is surprisingly tiring for your voice. To keep things flowing, I used ChatGPT to generate prompts and topics as inspiration to talk about.

The failure: I recorded those first two hours, uploaded everything to ElevenLabs and tried the voice clone. It wasn't good enough. The cadence felt off, the accent too generic. I realized I'd been reading too much instead of actually talking. So I went back and recorded another session, this time purely conversational. That made all the difference.

After recording, the voice cloning tool asks you for validation by reading specific sentences. It checks whether your live voice matches the recorded samples. Interestingly, even though it was my voice, validation often failed on the first try. It took a few attempts to get approved, a good reminder of how precise these systems are.

For recording, I used the RODE NT in my improvised studio.

One unexpected upside: I published my cloned voice to the ElevenLabs voice library, where other users can license it for their projects. So far I've made €12 without doing anything, in about 5 weeks. Not life-changing money, but a funny reminder that once you've cloned yourself, you can literally earn money while you sleep. Your voice can be working even when you're not.

Key takeaway: short recordings work, but longer, more natural speech gives you much better control over nuance and accent. And don't read. Talk.
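
When you record in short snippets, it is easy to lose track of how much material you actually have. The sketch below is an illustration, not part of the original build: it assumes the snippets are saved as WAV files in a single folder and adds up their durations using only Python's standard library. The function names are made up for this example.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path: Path) -> float:
    """Return the duration of one WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def total_duration_seconds(folder: Path) -> float:
    """Sum the durations of all .wav files directly inside `folder`."""
    return sum(wav_duration_seconds(p) for p in sorted(folder.glob("*.wav")))


def format_duration(seconds: float) -> str:
    """Format seconds as H:MM:SS for a quick progress check."""
    total = int(round(seconds))
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"
```

Calling `format_duration(total_duration_seconds(Path("recordings")))` on a hypothetical `recordings` folder tells you how close you are to the two-hour mark.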

Step 2: Build a structured knowledge base

Next, I gave the agent a brain.

My knowledge base was intentionally extensive. It runs to roughly 40 A4 pages of information about myself and Ringtime, structured into clear sections:

  • Personal identity: Name, language preferences, personality traits, communication quirks
  • Communication style: Language sensitivity, leadership style, management philosophy
  • Personal interests & influences: Books, hobbies, beliefs and life philosophy
  • Meta & memory logic: What I don't know, what should stay private
  • About Ringtime: How it started, product details, market positioning, competitive landscape
  • Industry-specific pain points: Real estate agencies, recruiting companies, property management, SaaS support, utilities, installation companies
  • Customers: Early customers & market response

This is how my knowledge base is structured.

I even included sections on regulatory nuances like "Is it allowed to call people with AI?" with separate notes for Belgium versus the United States. These details matter when your clone might field questions about legality or ethics.

An important part here is structure. I used Markdown-style formatting, similar to what works well in tools like Notion. Clear headings, sections, and hierarchy make it much easier for an AI agent to retrieve the right context during a conversation.
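
To see why headings help, here is a minimal sketch of how a Markdown-style document can be split into sections keyed by their heading path. It is an assumption about how retrieval might work in general, not a description of what ElevenLabs does internally. The sample text reuses section names from the list above, and the body under "How it started" is a placeholder.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")

SAMPLE_KB = """# Personal identity
Name: Diederik Syoen

# About Ringtime
## How it started
(placeholder text)
"""


def split_sections(markdown: str) -> dict[str, str]:
    """Map each heading path (e.g. 'About Ringtime > How it started') to its body."""
    sections: dict[str, str] = {}
    path: list[str] = []  # current heading hierarchy
    body: list[str] = []  # lines collected under the current heading

    def flush() -> None:
        # Store the text gathered so far under the current heading path.
        if path:
            sections[" > ".join(path)] = "\n".join(body).strip()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            path = path[: level - 1] + [match.group(2)]
            body = []
        else:
            body.append(line)
    flush()
    return sections
```

With clear headings, a question about how Ringtime started can be matched to one small section instead of the full 40 pages.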

This is also where you define the first guardrails. What should the agent not talk about? Which topics are out of bounds? Where should it deflect or stay vague?

In my case:

  • No customer details. The agent can acknowledge that we have paying pilots and name some customers, but it must avoid mentioning others.
  • No political views unless explicitly documented in the knowledge base.
  • No personal details beyond what's relevant to understanding Ringtime or my professional background.

If you don't define those boundaries here, the agent could improvise later and that's usually not what you want.
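
One way to back up these boundaries is to check the knowledge base itself before uploading it, since the agent cannot leak a name it never saw. The sketch below is a hypothetical helper, not something from the original build: it scans the text for terms on an off-limits list and reports the line numbers where they appear. "ExampleCorp" is an invented placeholder name.

```python
import re


def find_off_limits(text: str, off_limits: list[str]) -> dict[str, list[int]]:
    """Return {term: [line numbers]} for every off-limits term found in `text`."""
    hits: dict[str, list[int]] = {}
    for term in off_limits:
        # Whole-word, case-insensitive match so 'examplecorp' is caught too.
        pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.setdefault(term, []).append(number)
    return hits
```

An empty result means none of the listed terms appear. It does not replace the guardrails in the prompt, it only removes one way for them to fail.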

At Ringtime, this layer is critical. Knowledge quality and structure matter far more than raw model intelligence.

Step 3: Define personality and behavior

This is where things really come alive.

The personality prompt is where you teach the agent how to be you, not just what to say. Here are the key building blocks I used:

Identity & Language

Name: Diederik Syoen. Nickname Didi for friends.
Speak Dutch by default. Switch languages only if explicitly asked.

The agent knows who it is and which language to default to. Small detail, massive impact on feeling natural.

Traits & Role

Rational, upbeat, loyal, authentic, humor-driven, direct and emotionally steady.
Humor is dry and understated.

Role: Virtual voice version of Diederik Syoen—founder, tech enthusiast,
marketeer hating to be called a marketeer.

This mirrors how I actually communicate. No fake enthusiasm. No corporate speak.

Tone & Speech Patterns

Responses should be short, 1 to 2 sentences max—unless the user explicitly asks for more.
Include natural pauses using commas, dashes and ellipses...
Code-switch to Dutch or West-Flemish when contextually appropriate.

This is crucial. Without explicit instructions to be concise, LLMs ramble. The code-switching instruction lets the agent feel authentically Flemish.

Goals

Be a digital extension of Diederik's brain and voice.
Explain complex ideas simply and practically.
Guide conversations toward clarity and next steps.
Make users feel they've spoken to someone real, witty and thoughtful, even if it's AI.

Guardrails

Stay Grounded: Don't exaggerate or misrepresent views.
If uncertain, acknowledge it with humor ("Might be bullshit, but here's my best guess...").

No Fakery: If you don't know something, say so.

Customers: If asked about customers, say we have a few paying pilots
but it's too soon to share names.

This is part of my personality prompt. Ask ChatGPT or Claude for help writing yours.

This step makes a massive difference. Without it, you get a generic assistant. With it, you get something that actually feels like a person.

The full prompt is about 500 words. Every sentence serves a purpose: defining boundaries, setting tone or preventing generic AI behavior.
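
If you keep the building blocks as separate sections, you can assemble them into one prompt and keep an eye on its length. This is a minimal sketch of that idea, assuming a plain "title, then body" layout. The two sections are quoted from the blocks above, and the function names are made up for this example.

```python
SECTIONS = {
    "Identity & Language": (
        "Name: Diederik Syoen. Nickname Didi for friends.\n"
        "Speak Dutch by default. Switch languages only if explicitly asked."
    ),
    "Guardrails": "No Fakery: If you don't know something, say so.",
}


def build_prompt(sections: dict[str, str]) -> str:
    """Join the sections into one prompt, each title on its own line above its body."""
    return "\n\n".join(f"{title}\n{body.strip()}" for title, body in sections.items())


def word_count(text: str) -> int:
    """Count whitespace-separated words, a rough proxy for prompt length."""
    return len(text.split())
```

Running `word_count(build_prompt(SECTIONS))` after each change is a quick check that the prompt stays around the length you intend.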

Step 4: Connect the agent to a phone number

Once the voice, knowledge and personality were in place, I connected the agent to a real phone number.

You can buy mobile numbers through providers like Twilio or Aircall. Inside ElevenLabs, linking an agent to that number is literally clicking "import phone number" and connecting it.

That's it. In essence, this step is trivial. The hard work isn't the plumbing. The hard work is everything that came before: the voice quality, the knowledge structure, the personality definition. Once those pieces are right, the technical connection is straightforward.

From that point on, calling the number meant talking directly to the AI version of me. Speech gets transcribed, processed and spoken back in my cloned voice with only a short delay.
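
Conceptually, each turn of the call goes through three stages: transcribe, process, speak. The sketch below shows only that shape. ElevenLabs handles all of this for you, and none of these names come from its API. The three functions are stand-ins you would replace with real speech-to-text, LLM and text-to-speech calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurn:
    """One conversational turn: caller audio in, cloned-voice audio out."""

    transcribe: Callable[[bytes], str]  # caller audio -> text
    respond: Callable[[str], str]       # text -> reply text (the LLM step)
    synthesize: Callable[[str], bytes]  # reply text -> audio in the cloned voice

    def handle(self, caller_audio: bytes) -> bytes:
        text = self.transcribe(caller_audio)
        reply = self.respond(text)
        return self.synthesize(reply)


# Stand-in stages so the sketch runs without any external service.
demo = VoiceTurn(
    transcribe=lambda audio: audio.decode("utf-8"),
    respond=lambda text: f"You said: {text}",
    synthesize=lambda reply: reply.encode("utf-8"),
)
```

The delay you hear on the call is roughly the sum of these three stages, which is why the latency of each one matters.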

This is the moment where AI stops being abstract. A phone call feels real in a way text never does.

Step 5: Fine-tuning the conversation experience

Once the basics work, there are several technical settings that affect how natural conversations feel.

Audio format
I set the input format to μ-law 8000 Hz (telephony standard). This ensures the audio quality matches what people expect from phone calls. Not too crisp, not too compressed.
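
For the curious, μ-law is a companding scheme: it compresses loud samples more than quiet ones, so that 8 bits per sample are enough for intelligible speech. The sketch below implements the continuous μ-law curve with μ = 255. It is a simplification for illustration, not the exact segment-based encoding used by telephony codecs.

```python
import math

MU = 255  # companding parameter for μ-law

SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 8
BITRATE_BPS = SAMPLE_RATE_HZ * BITS_PER_SAMPLE  # 64000 bits per second


def mu_law_compress(x: float) -> float:
    """Map a sample in [-1, 1] through the continuous μ-law curve."""
    x = max(-1.0, min(1.0, x))
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress: recover the original sample value."""
    y = max(-1.0, min(1.0, y))
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
```

A quiet sample at 1% of full scale is mapped to more than 20% of the output range, which is how the format keeps soft speech audible at 64 kbit/s.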

Keywords
I defined a list of keywords the speech recognition should prioritize: Ringtime, Teamleader, Cheqroom, Ieper, SaaS, AI. These are terms that might otherwise get misheard or autocorrected. Small detail, but it prevents awkward moments where the agent mishears your company name.

Eagerness
This controls how eager the agent is to respond. High eagerness means the agent jumps in quickly, low eagerness means it waits longer to ensure you've finished speaking. I set mine to "Normal", responsive enough to feel natural, but not so eager it interrupts.

Take turn after silence
This is the maximum number of seconds of silence after you last spoke before the agent takes a turn on its own. I set mine to 7 seconds. A value of -1 means the agent waits indefinitely for input, which creates awkward dead air.
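
The rule behind this setting is simple enough to write down. This is a sketch of the behavior as described here, not ElevenLabs' implementation. The function name is hypothetical, and treating exactly 7 seconds as "time to respond" is an assumption.

```python
WAIT_INDEFINITELY = -1  # special value: never force a turn


def should_take_turn(seconds_since_user_spoke: float, timeout_seconds: float) -> bool:
    """Return True when the agent should speak because the caller has gone quiet."""
    if timeout_seconds == WAIT_INDEFINITELY:
        return False
    return seconds_since_user_spoke >= timeout_seconds
```

With a timeout of 7, the agent stays quiet at 6.9 seconds of silence and speaks at 7. With -1, it never speaks first.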

Languages
I could suddenly speak Spanish and Italian. Same voice, different languages. It's a strange but fascinating experience.

LLM choice
You can choose which language model powers the agent. There are many options, each with different latency and intelligence tradeoffs. For this use case, low latency matters more than deep reasoning. The agent doesn't need to be brilliant - it needs to feel responsive.

These settings seem minor, but they're the difference between a conversation that flows and one that feels robotic.

What this experiment taught me

Building a virtual version of myself made a few things very clear.

This technology is accessible. You don't need to be an engineer.

Structure and personality matter more than raw intelligence.

Voice changes how people relate to AI.

And once you've talked to an AI like this a few times, it stops feeling futuristic. It just feels normal.

At Ringtime, we take this same foundation and harden it for real business use: qualifying leads, handling objections, booking meetings and doing it all by voice.

Because talking to AI feels strange right up until the moment you do it. Then it's just a conversation.

A weekend project is not a business system

Anyone can build a voice agent by now. This guide should get you there in a few hours. And that is exactly the point.

The agent I described above is fun, impressive and a great way to understand what is possible. It will definitely impress your friends, colleagues or partners. But it is not very smart. It doesn't run a business. It doesn't replace real operational work.

That distinction is at the heart of Ringtime.

Weekend clone vs. Ringtime agent:

  • Weekend clone: Answers questions about me
  • Ringtime agent: Qualifies leads, books appointments, integrates with CRM, routes to human when needed, learns from outcomes

At Ringtime, we start from the same foundations: LLMs, voice technology and voice cloning. But we don't stop at a talking agent. We build end-to-end logic around real business flows.

Our agents are trained to execute tasks your team handles today and in many cases to do them better. For example, our real estate qualification agents can handle incoming property inquiries 24/7, ask the right qualifying questions, check calendar availability and book viewings. All in Dutch, French, or English. Our outbound qualification agents learn from more than 100,000 real conversations. They know what a strong sales call sounds like, when to call, when not to call, which persona to use on the phone and which script fits the situation.

On top of that, Ringtime agents are deeply integrated into your existing processes, both front and back. From lead intake to qualification, from advanced planning to follow-up for your team. All connected. All running continuously. Out of office hours and during the weekend.

Here's a concrete example: for one of our customers, we're already seeing that 53% of meetings booked through Ringtime happen outside of business hours. That's business you might not have had otherwise. Leads that would have gone to voicemail or waited until Monday morning, potentially cooling off or going to a competitor.

The magic isn't the voice. The magic is the system behind it.