Voice cloning

Voice cloning is the use of AI to reproduce a specific person’s voice. The term is ambiguous: it can mean cloning a speaker’s spoken voice (for narration or dubbing) or cloning a writer’s written voice (for content).

Audio voice cloning uses neural speech synthesis trained on a sample of someone’s spoken audio to produce new audio in that voice. Tools like ElevenLabs and Resemble fall in this category. The output is sound.

Written voice cloning, sometimes called voice modeling or brand voice AI, uses a language model conditioned on a writer’s past content to produce new text in that writer’s style. The output is text. Marqeting and similar tools fall in this category. Despite the shared "voice cloning" label, the two differ in technical approach, training data, and use cases.
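
As a rough illustration of what "conditioned on a writer’s past content" can mean, the sketch below assembles a few-shot prompt from past writing samples. It is a minimal example under stated assumptions, not a description of how Marqeting or any other tool works: the function name `build_style_prompt`, its parameters, and the prompt wording are all hypothetical, and the model call itself is left out.

```python
from typing import List


def build_style_prompt(samples: List[str], brief: str, max_samples: int = 3) -> str:
    """Assemble a few-shot prompt that shows a model examples of a writer's past text.

    Hypothetical helper for illustration only; real tools may instead use
    fine-tuning, retrieval, or other techniques.
    """
    if not samples:
        raise ValueError("at least one writing sample is required")
    if max_samples < 1:
        raise ValueError("max_samples must be at least 1")

    # Assumes samples are ordered oldest to newest; keep only the last few
    # so the prompt stays short.
    chosen = samples[-max_samples:]

    parts = ["Write in the same voice as the examples below."]
    for i, text in enumerate(chosen, start=1):
        parts.append(f"Example {i}:\n{text.strip()}")
    parts.append(f"New piece to write:\n{brief.strip()}")

    # Blank lines separate the instruction, each example, and the brief.
    return "\n\n".join(parts)
```

The resulting string would be sent to a language model of your choice. The input here is past text and the output is new text, whereas audio cloning takes recorded speech as input and produces sound.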

Why it matters

The label collision creates real confusion in buying decisions. A founder asking for "voice cloning" sometimes means "make a video reel narrated in my voice" (audio) and sometimes means "make captions that sound like me in writing" (written). Establishing which one is meant is the first sourcing decision.
