AI video creation tips

Here are some tips to get your avatar's voice and pronunciation exactly as you want them.

The AI system that produces our voices is great, but not always perfect. Occasionally it mispronounces words or adds strange pauses. Below are some quick tips to get your avatar's voice just right.

Quick Tips

  • 🔠 Use short paragraphs: separate long scripts into smaller paragraphs to avoid errors when generating videos.
  • 📝 Spell words correctly: make sure you have used correct spelling in your script.
  • 🚫 Don't mix languages: e.g. don't use English words in a Spanish script.
  • 💬 Insert breaks if needed: you can add additional breaks to the script by inserting break tags, for example <break time="2s" />. Please see the section below for more info.
  • ✍️ Use punctuation marks: a script without proper commas and periods sounds too fast and is hard to listen to. Use periods, commas, hyphens, and question marks to help our AI system sound the way you want. More on this below.
  • 🗣️ Fix pronunciation of words, acronyms and numbers if needed: it is sometimes useful to split a word with a hyphen to help our AI system pronounce it correctly. For example, write "con-tent" instead of "content". More tips on this below.
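
The "use short paragraphs" tip can be automated before pasting a script into the editor. The following is a minimal Python sketch, not part of the product: the function name `split_into_paragraphs` and the 300-character limit are assumptions chosen for illustration, so adjust the limit to whatever works for your scripts.

```python
import re


def split_into_paragraphs(script: str, max_chars: int = 300) -> list[str]:
    """Group whole sentences into paragraphs of at most max_chars characters.

    max_chars is an illustrative assumption, not a documented limit.
    A single sentence longer than max_chars is kept whole.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    paragraphs: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so start a new paragraph.
            paragraphs.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        paragraphs.append(current)
    return paragraphs


if __name__ == "__main__":
    for paragraph in split_into_paragraphs("One. Two. Three.", max_chars=9):
        print(paragraph)
```

Sentences are never cut in the middle, so each paragraph still ends on a natural pause.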

Add additional breaks to the video

Our voices support SSML (Speech Synthesis Markup Language). SSML has quite a few different tags but, for now, the most important one is the break tag, which instructs the voice to pause.

Wherever you want an additional break in your text, insert the tag below. The 2s is only an example: you can specify the time in seconds (s) or milliseconds (ms).

<break time="2s" />

The break can be up to 5 minutes long.

For example, I have the following text:

Hey John! How are you doing today?

Let's say I'm not happy with the default break after "John!". Breaks are especially useful to better separate sentences. I can now simply input the following markup to add a break:

Hey John!<break time="50ms"/>How are you doing today?
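
If you generate scripts programmatically, the break tag can be built by a small helper. This is a minimal Python sketch under stated assumptions: the names `break_tag` and `join_with_breaks` are hypothetical, and the only facts taken from this page are the tag format, the seconds/milliseconds units, and the 5-minute maximum.

```python
MAX_BREAK_MS = 5 * 60 * 1000  # the 5-minute maximum mentioned above


def break_tag(milliseconds: int) -> str:
    """Return a break tag for the given pause length (hypothetical helper)."""
    if not 0 < milliseconds <= MAX_BREAK_MS:
        raise ValueError("break must be longer than 0 ms and at most 5 minutes")
    if milliseconds % 1000 == 0:
        # Whole seconds read more naturally with the "s" unit.
        return f'<break time="{milliseconds // 1000}s" />'
    return f'<break time="{milliseconds}ms" />'


def join_with_breaks(sentences: list[str], milliseconds: int) -> str:
    """Insert the same break between consecutive sentences."""
    return break_tag(milliseconds).join(sentences)


if __name__ == "__main__":
    print(join_with_breaks(["Hey John!", "How are you doing today?"], 50))
```

Validating the duration before building the tag catches out-of-range values early, instead of at video generation time.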

Correcting pronunciation

Pronouncing company names, acronyms, business terms or slang can sometimes be difficult for the AI because they are ambiguous. Getting the pronunciation right is a matter of inserting hyphens or spelling the word phonetically.


For words, try inserting hyphens to make the word sound the way you want. Example:

Content → con-tent

Alternatively, you can also help the system by using the phonetic spelling of the words. You can read more on this below.


If you want acronyms to be pronounced like words, try spelling them the way they sound. Examples:

AI → a-eye
AWS → a-"double you"-s

If you want the acronym to be pronounced letter by letter, put spaces between the letters:

NYC → N Y C


For numbers, change how you write them depending on how you want them to sound. Examples:

Ten eighty-nine → 10 89
Two five eight six → 2 5 8 6
One hundred and forty-eight → 148
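
When the same terms recur across many scripts, the substitutions above can be applied from one table instead of by hand. The sketch below is one possible approach in Python, not a product feature: `REPLACEMENTS`, `apply_pronunciations`, and `spell_out` are hypothetical names, and the table only contains examples from this section.

```python
import re

# Hypothetical replacement table built from the examples in this section.
REPLACEMENTS = {
    "content": "con-tent",
    "AI": "a-eye",
    "NYC": "N Y C",
}


def spell_out(acronym: str) -> str:
    """Put spaces between letters so the acronym is read letter by letter."""
    return " ".join(acronym)


def apply_pronunciations(script: str, replacements: dict[str, str] = REPLACEMENTS) -> str:
    """Replace whole words only; matching is case-sensitive."""
    if not replacements:
        return script
    # Longest keys first, so a longer term wins over a shorter one it contains.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(key) for key in keys) + r")\b")
    return pattern.sub(lambda match: replacements[match.group(1)], script)


if __name__ == "__main__":
    print(apply_pronunciations("Our content about AI in NYC."))
    print(spell_out("AWS"))
```

Matching on word boundaries keeps words such as "AIM" untouched, and case-sensitive matching avoids rewriting a lowercase word that happens to share letters with an acronym.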

Using punctuation marks

If you are having issues with the rhythm of the sentence, try adding commas/periods, quotes, or re-arranging the sentence:

  • Commas will add shorter breaks than a period
  • Periods will add a longer break and downwards inflection
  • "Quotes" will add emphasis to that part of the sentence

For example, these two versions of the same script will result in different rhythms and pauses:

Here’s a demonstration of how a sentence without any breaks or commas at all compares to a sentence that has as you can see in the video without can be difficult to follow because there are no breaks or pauses in it.


Here’s a demonstration of how a sentence, without any breaks or commas at all, compares to a sentence that has. As you can see, the video can be difficult to follow, because there are no breaks or pauses in it.
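
To spot passages like the first version before generating a video, you can scan a script for long stretches without punctuation. This is a rough Python sketch: the function name `long_unpunctuated_runs` and the 20-word threshold are assumptions for illustration, not values recommended by the product.

```python
import re


def long_unpunctuated_runs(script: str, max_words: int = 20) -> list[str]:
    """Return stretches of text with more than max_words words and no
    comma, period, semicolon, colon, question mark, or exclamation mark.

    max_words is an illustrative assumption; tune it by listening to results.
    """
    runs = re.split(r"[.,;:!?]", script)
    return [run.strip() for run in runs if len(run.split()) > max_words]


if __name__ == "__main__":
    sample = "one two three four, five six."
    print(long_unpunctuated_runs(sample, max_words=3))
```

Any stretch the function returns is a candidate for an extra comma, a period, or a break tag.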

Advanced: fix pronunciation by using phonetic spelling

You can sometimes fix the pronunciation of a word by spelling it phonetically. Below we've included a handy table to help you replace letters with phonetic alternatives. Example:

Desert → de-zert