Lexical diversity

Lexical diversity is the ratio of unique words to total words in a piece of writing. Higher diversity signals a wider vocabulary and a more idiosyncratic voice; lower diversity signals repetition and a flatter register.

The metric is computed as types divided by tokens: the number of distinct words a writer uses over the total number of words they wrote. A writer with diversity around 0.45 reuses common words; a writer at 0.70 reaches for less common alternatives. Generic AI generation defaults to mid-range diversity (around 0.50) regardless of the brand.
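
As a rough illustration, the types-over-tokens calculation can be sketched in a few lines of Python. The function name lexical_diversity and the tokenization rule (lowercase the text, then treat runs of ASCII letters with optional straight apostrophes as words) are assumptions made for this example, not a description of how any particular tool tokenizes.

```python
import re


def lexical_diversity(text: str) -> float:
    """Return distinct words (types) divided by total words (tokens)."""
    # Assumed tokenization: lowercase, then runs of ASCII letters,
    # optionally joined by straight apostrophes (e.g. "isn't").
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    if not tokens:
        # No words at all: report 0.0 rather than dividing by zero.
        return 0.0
    return len(set(tokens)) / len(tokens)


sample = "the cat sat on the mat and the dog sat too"
# 8 distinct words out of 11 total
print(round(lexical_diversity(sample), 2))  # prints 0.73
```

The ratio generally falls as a text gets longer, because common words keep recurring, so scores are most comparable between samples of similar length.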

Voice fingerprints capture lexical diversity as one of several quantitative signals about a writer’s voice. The number alone isn’t enough: high diversity that comes from jargon-stuffing reads worse than moderate diversity grounded in clear language. Combined with the other signals, though, it helps the generator avoid the flat-vocabulary default of generic AI output.

Why it matters

Vocabulary range is a voice signal readers register without naming. A brand whose AI output collapses to a smaller vocabulary than the founder normally uses will feel "off" even when individual sentences are fine.