Voice cloning lets you create a synthetic voice modeled on real audio recordings. The output is a custom voice that lives in your workspace and behaves like any catalog voice — you can assign it to agents, set speed and stability, and reference it by its Nexa voice ID. Common use cases: replicating a brand voice recorded by a professional voice actor, creating a voice that matches existing IVR recordings, or building a voice for a specific persona.

Prerequisites

  • An ElevenLabs-backed workspace (voice cloning is ElevenLabs only)
  • Audio samples that meet the quality requirements below
  • Workspace owner or admin role

Audio requirements

The quality of the cloned voice depends entirely on the quality of the input audio. Poor samples produce a voice that sounds thin, robotic, or inconsistent.

| Requirement | Specification |
| --- | --- |
| Minimum duration | 10 seconds of clean speech |
| Recommended duration | 1 to 3 minutes across multiple files |
| Format | MP3, WAV, M4A, FLAC |
| Sample rate | 16 kHz or higher |
| Channels | Mono or stereo (mono preferred) |
| Background noise | None or near-zero |
| Music or effects | Not allowed; speech only |
| Multiple speakers | Not allowed; single speaker per clone |

Background noise, music, reverb, or multiple speakers in the audio significantly degrade clone quality. Record in a quiet room with a close-proximity microphone, or clean the noise out of the recording before uploading.
What good source audio sounds like: natural speech at a conversational pace, covering a range of sentences (not just one repeated phrase), with no long silences, no clipping, and consistent microphone distance throughout.
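
The measurable rows of the requirements table can be checked locally before uploading. The following is a minimal sketch using only Python's standard `wave` module, so it handles WAV files only. The function names are hypothetical, and it assumes the 10-second minimum applies to the combined duration of all files, which the table does not state explicitly. It cannot detect noise, music, reverb, or multiple speakers; those still need a listening pass.

```python
import wave

MIN_TOTAL_SECONDS = 10.0      # "Minimum duration" row (assumed to apply to the total)
MIN_SAMPLE_RATE_HZ = 16_000   # "Sample rate" row: 16 kHz or higher


def inspect_wav(path):
    """Return (duration_seconds, problems) for one WAV file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        seconds = wav.getnframes() / rate
    problems = []
    if rate < MIN_SAMPLE_RATE_HZ:
        problems.append(f"sample rate {rate} Hz is below 16 kHz")
    if channels not in (1, 2):
        problems.append(f"{channels} channels; expected mono or stereo")
    return seconds, problems


def check_samples(paths):
    """Check a set of WAV files and return (total_seconds, problems)."""
    total = 0.0
    problems = []
    for path in paths:
        seconds, issues = inspect_wav(path)
        total += seconds
        problems.extend(f"{path}: {issue}" for issue in issues)
    if total < MIN_TOTAL_SECONDS:
        problems.append(f"only {total:.1f} s of audio in total; at least 10 s is required")
    return total, problems
```

An empty problem list means the files pass these mechanical checks only. A total between 60 and 180 seconds matches the recommended duration.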

Clone steps

1. Open Voice Cloning

In your workspace, go to Settings > Voices > Clone Voice. If you do not see this option, your workspace plan does not include voice cloning; contact support.

2. Name the voice

Enter a name for the cloned voice. This name appears in the voice selector and in the API. Choose something descriptive, such as “Brand Voice - Female EN” rather than “Clone 1”.

3. Upload audio samples

Drag and drop your audio files into the upload area, or click Choose Files. You can upload multiple files. The uploader accepts MP3, WAV, M4A, and FLAC up to 25 MB per file.

4. Submit for processing

Click Create Voice. Processing typically takes 30 seconds to 3 minutes depending on the total duration of uploaded audio. You do not need to stay on the page. You will receive a workspace notification when the clone is ready.

5. Preview the clone

Once processing completes, the cloned voice appears in your workspace voice list. Click the preview button to synthesize a test phrase and evaluate quality before assigning it to an agent.
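
The upload step rejects unsupported formats and oversized files, so it can save a round trip to screen a batch locally first. The sketch below is illustrative: `screen_for_upload` is a hypothetical helper, and it assumes "25 MB" means 25,000,000 bytes (the stricter reading). It looks only at the file extension and size, not at the audio content.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac"}
MAX_UPLOAD_BYTES = 25 * 1000 * 1000  # assumption: decimal megabytes


def screen_for_upload(paths):
    """Split paths into (accepted, rejected); rejected holds (path, reason) pairs."""
    accepted, rejected = [], []
    for raw in paths:
        path = Path(raw)
        if path.suffix.lower() not in ALLOWED_EXTENSIONS:
            rejected.append((path, "unsupported format"))
        elif path.stat().st_size > MAX_UPLOAD_BYTES:
            rejected.append((path, "larger than 25 MB"))
        else:
            accepted.append(path)
    return accepted, rejected
```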

Using the cloned voice in an agent

After cloning, the voice appears in the voice selector under My Voices (or your workspace’s custom voice section). Select it the same way you would any catalog voice. The cloned voice has a Nexa voice ID (vel_...) that you can use in API-configured agents:
{
  "voice": {
    "provider": "elevenlabs",
    "voice_id": "vel_xxxxxxxxxxxxxxxx"
  }
}
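
If agent configurations are generated by a script, the same fragment can be built programmatically. This is a sketch, not a documented client: `voice_config` is a hypothetical helper, and the only format check it applies is the `vel_` prefix shown in the example above, since the rest of the ID format is not specified here.

```python
import json


def voice_config(voice_id):
    """Build the "voice" fragment for a cloned voice, checking the ID prefix."""
    # Assumption: workspace voice IDs always start with "vel_", as in the example.
    if not voice_id.startswith("vel_") or len(voice_id) <= len("vel_"):
        raise ValueError(f"expected a voice ID starting with 'vel_', got {voice_id!r}")
    return {"voice": {"provider": "elevenlabs", "voice_id": voice_id}}


if __name__ == "__main__":
    # Placeholder ID copied from the example; substitute your clone's ID.
    print(json.dumps(voice_config("vel_xxxxxxxxxxxxxxxx"), indent=2))
```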

Processing time

| Audio duration uploaded | Typical processing time |
| --- | --- |
| Under 1 minute | 30 to 60 seconds |
| 1 to 3 minutes | 1 to 3 minutes |
| Over 3 minutes | 3 to 5 minutes |

If processing takes longer than 10 minutes, the job has likely failed. Check the voice list for an error state and retry with different audio.
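
For scripts that track clone jobs, the table and the 10-minute rule above can be expressed as a small lookup. This is a sketch with hypothetical function names; the ranges are copied from the table, and treating exactly 3 minutes of audio as part of the middle row is an assumption.

```python
FAILURE_THRESHOLD_SECONDS = 10 * 60  # beyond this, the job has likely failed


def expected_processing_seconds(audio_seconds):
    """Return the typical (low, high) processing time in seconds."""
    if audio_seconds < 60:
        return (30, 60)
    if audio_seconds <= 180:
        return (60, 180)
    return (180, 300)


def likely_failed(elapsed_seconds):
    """True once a job has run longer than the 10-minute threshold."""
    return elapsed_seconds > FAILURE_THRESHOLD_SECONDS
```

A job that exceeds the typical range but not the threshold is slow rather than failed, so it is reasonable to keep waiting until `likely_failed` returns `True`.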

Limitations

  • Language support: cloned voices inherit language capability from the ElevenLabs cloning system. A voice cloned from English audio primarily performs well in English. For non-English synthesis, use the Multilingual v2 model, but expect some accent bleed from the source recordings.
  • Quality expectations: cloning does not produce a perfect replica. The output is a synthetic approximation. Longer and more varied source audio produces better results. A 30-second sample will produce noticeably lower quality than 2 minutes of varied speech.
  • Usage rights: you are responsible for ensuring you have the rights to clone the voice in the audio you upload. DialNexa does not verify consent or ownership of uploaded recordings.
  • Editing or re-cloning: you cannot modify a clone after creation. To improve quality, delete the existing clone and create a new one with better audio.
  • Number of clones: the number of cloned voices per workspace is subject to your plan limits. Check Settings > Billing for your current usage.

Troubleshooting

  • If the clone sounds thin, robotic, or inconsistent, the source audio was usually too short or lacked variety. Upload at least 1 minute of natural, conversational speech covering multiple sentence types.
  • If upload or processing fails, the most common cause is an unsupported audio format or a corrupt file. Convert your audio to WAV at 16 kHz and retry.
  • If the clone carries noise or room tone, background noise in the source audio caused the model to capture ambient characteristics rather than the speaker. Use a noise-cleaned version of the recording.
  • If the clone sounds off in a non-English language, switch the agent’s Voice Model to Multilingual v2. Expect some accent from the source language if the speaker in the source audio was a non-native speaker.