Clone a Voice - DialNexa Documentation

Voice cloning lets you create a synthetic voice modeled on real audio recordings. The output is a custom voice that lives in your workspace and behaves like any catalog voice — you can assign it to agents, set speed and stability, and reference it by its Nexa voice ID. Common use cases: replicating a brand voice recorded by a professional voice actor, creating a voice that matches existing IVR recordings, or building a voice for a specific persona.

Prerequisites

An ElevenLabs-backed workspace (voice cloning is ElevenLabs only)
Audio samples that meet the quality requirements below
Workspace owner or admin role

Audio requirements

The quality of the cloned voice depends entirely on the quality of the input audio. Poor samples produce a voice that sounds thin, robotic, or inconsistent.

Requirement	Specification
Minimum duration	10 seconds of clean speech
Recommended duration	1 to 3 minutes across multiple files
Format	MP3, WAV, M4A, FLAC
Sample rate	16 kHz or higher
Channels	Mono or stereo (mono preferred)
Background noise	None or near-zero
Music or effects	Not allowed — speech only
Multiple speakers	Not allowed — single speaker per clone

Audio with background noise, music, reverb, or multiple speakers significantly degrades clone quality. Record in a quiet room with a close-proximity microphone, or remove noise from the audio before uploading.

What good source audio sounds like: natural speech at a conversational pace, covering a range of sentences (not just one repeated phrase), with no long silences, no clipping, and consistent microphone distance throughout.
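As a rough pre-flight check, the measurable rows of the table (duration, sample rate, channels) can be verified locally before uploading. The sketch below is only an illustration: it handles WAV files through Python's standard `wave` module, and the function name `check_wav` is hypothetical, not part of any DialNexa tooling. It cannot detect noise, music, or multiple speakers, so listening to the samples is still necessary.

```python
import wave

MIN_SECONDS = 10.0        # minimum duration from the requirements table
MIN_SAMPLE_RATE = 16_000  # 16 kHz or higher


def check_wav(path):
    """Return a list of problems found in a WAV file (empty list means it passes)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        seconds = wav.getnframes() / rate

    problems = []
    if seconds < MIN_SECONDS:
        problems.append(f"too short: {seconds:.1f}s (need at least {MIN_SECONDS:.0f}s)")
    if rate < MIN_SAMPLE_RATE:
        problems.append(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE} Hz")
    if channels not in (1, 2):
        problems.append(f"{channels} channels (mono or stereo expected)")
    return problems
```

An empty list only means the file meets the numeric minimums. The recommended 1 to 3 minutes of varied speech is a separate, higher bar.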

Clone steps

Open Voice Cloning

In your workspace, go to Settings > Voices > Clone Voice. If you do not see this option, your workspace plan does not include voice cloning — contact support.

Name the voice

Enter a name for the cloned voice. This name appears in the voice selector and in the API. Choose something descriptive, such as “Brand Voice - Female EN” rather than “Clone 1”.

Upload audio samples

Drag and drop your audio files into the upload area, or click Choose Files. You can upload multiple files. The uploader accepts MP3, WAV, M4A, and FLAC up to 25 MB per file.
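When preparing many files, it can help to screen them against the uploader's stated limits first. This is a minimal sketch, assuming that "25 MB" means 25 × 1024 × 1024 bytes (the exact cutoff is not documented here) and that the uploader decides by file extension. `upload_problem` is a hypothetical helper name.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac"}
MAX_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" is interpreted as 25 MiB


def upload_problem(path):
    """Return a reason the uploader would likely reject the file, or None."""
    p = Path(path)
    if p.suffix.lower() not in ALLOWED_EXTENSIONS:
        return f"unsupported format: {p.suffix or '(no extension)'}"
    if p.stat().st_size > MAX_BYTES:
        return "larger than 25 MB"
    return None
```

For example, `[f for f in Path("samples").iterdir() if upload_problem(f)]` lists the files in a (hypothetical) `samples` folder that need attention.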

Submit for processing

Click Create Voice. Processing typically takes 30 seconds to 3 minutes for the recommended amount of audio, and up to 5 minutes for longer uploads. You do not need to stay on the page; you will receive a workspace notification when the clone is ready.

Preview the clone

Once processing completes, the cloned voice appears in your workspace voice list. Click the preview button to synthesize a test phrase and evaluate quality before assigning it to an agent.

Using the cloned voice in an agent

After cloning, the voice appears in the voice selector under My Voices (or your workspace’s custom voice section). Select it the same way you would any catalog voice. The cloned voice has a Nexa voice ID (vel_...) that you can use in API-configured agents:

{
  "voice": {
    "provider": "elevenlabs",
    "voice_id": "vel_xxxxxxxxxxxxxxxx"
  }
}
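If agent configurations are generated from code, the same fragment can be built programmatically. The sketch below assumes only what is shown above: the `provider` and `voice_id` fields and the `vel_` prefix. The helper name `voice_config` is hypothetical, and where this fragment sits inside a full agent payload is not covered here.

```python
import json


def voice_config(voice_id):
    """Build the voice fragment for an API-configured agent."""
    # The vel_ prefix check is based on the ID format shown in this section.
    if not voice_id.startswith("vel_"):
        raise ValueError(f"expected a Nexa voice ID starting with 'vel_', got {voice_id!r}")
    return {"voice": {"provider": "elevenlabs", "voice_id": voice_id}}


# Placeholder ID copied from the example above; substitute your clone's ID.
print(json.dumps(voice_config("vel_xxxxxxxxxxxxxxxx"), indent=2))
```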

Processing time

Audio duration uploaded	Typical processing time
Under 1 minute	30 to 60 seconds
1 to 3 minutes	1 to 3 minutes
Over 3 minutes	3 to 5 minutes

If processing takes longer than 10 minutes, the job has likely failed. Check the voice list for an error state and retry with different audio.
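For scripted workflows, the 10-minute rule of thumb can be expressed as a polling loop with a deadline. This is a sketch under stated assumptions: `get_status` is a hypothetical callable that you supply, and the status strings `"processing"`, `"ready"`, and `"failed"` are illustrative values, not documented DialNexa states.

```python
import time


def wait_for_clone(get_status, timeout_s=600, interval_s=10,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until the clone is ready, fails, or the timeout passes.

    get_status is a caller-supplied (hypothetical) function returning
    "processing", "ready", or "failed". Returns "ready", "failed",
    or "timed_out".
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in ("ready", "failed"):
            return status
        if clock() >= deadline:
            return "timed_out"  # past 10 minutes by default: treat as failed
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist only so the loop can be tested without real waiting. In normal use the defaults are fine.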

Limitations

Language support: cloned voices inherit language capability from the ElevenLabs cloning system. A voice cloned from English audio primarily performs well in English. For non-English synthesis, use the Multilingual v2 model, but expect some accent bleed from the source recordings.

Quality expectations: cloning does not produce a perfect replica. The output is a synthetic approximation. Longer and more varied source audio produces better results. A 30-second sample will produce noticeably lower quality than 2 minutes of varied speech.

Usage rights: you are responsible for ensuring you have the rights to clone the voice in the audio you upload. DialNexa does not verify consent or ownership of uploaded recordings.

Editing or re-cloning: you cannot modify a clone after creation. To improve quality, delete the existing clone and create a new one with better audio.

Number of clones: the number of cloned voices per workspace is subject to your plan limits. Check Settings > Billing for your current usage.

Troubleshooting

The clone sounds robotic or flat

This usually means the source audio was too short or lacked variety. Upload at least 1 minute of natural, conversational speech covering multiple sentence types.

Processing failed with no error message

The most common cause is an unsupported audio format or a corrupt file. Convert your audio to WAV at 16 kHz and retry.
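One way to do the conversion is with ffmpeg, which is a separate tool and an assumption here, not something this page requires. The sketch below builds the command with ffmpeg's standard `-ar` (sample rate) and `-ac` (channel count) options and writes mono output, since mono is preferred. The helper names and the `_16k` output suffix are hypothetical.

```python
import subprocess
from pathlib import Path


def ffmpeg_command(src, dst=None):
    """Build an ffmpeg command that converts src to 16 kHz mono WAV."""
    src = Path(src)
    # Default output name: same folder, "_16k.wav" suffix (illustrative choice).
    dst = Path(dst) if dst else src.with_name(src.stem + "_16k.wav")
    return ["ffmpeg", "-y", "-i", str(src), "-ar", "16000", "-ac", "1", str(dst)]


def convert(src, dst=None):
    """Run the conversion; requires ffmpeg to be installed and on PATH."""
    subprocess.run(ffmpeg_command(src, dst), check=True)
```

Note that `-y` overwrites an existing output file without asking. If ffmpeg itself reports an error on the input, the source file is probably corrupt and should be re-exported or re-recorded.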

The voice sounds like a different person

Background noise in the source audio causes the model to capture ambient characteristics rather than the speaker. Use a noise-cleaned version of the recording.

Non-English synthesis sounds heavily accented

Switch the agent’s Voice Model to Multilingual v2. Expect some accent from the source language to remain if the speaker in the source audio is not a native speaker of the target language.
