In Prepare data for Custom Voice, we described the different data types you can use to train a custom voice and the format requirements for each. Once you have prepared your data, you can upload it through the Custom Voice portal or through the Custom Voice training API. Here we describe the steps for training a custom voice through the portal.
Note
This page assumes you have read Get started with Custom Voice and Prepare data for Custom Voice, and have created a Custom Voice project.
Check the languages supported for custom voice: language for customization.
Upload your datasets
When you're ready to upload your data, go to the Custom Voice portal. Create or select a Custom Voice project. The project must share the same language/locale and gender properties as the data you intend to use for your voice training. For example, select
en-GB
if your audio recordings are in English with a UK accent. Go to the Data tab and click Upload data. In the wizard, select the correct data type that matches what you have prepared.
Each dataset you upload must meet the requirements for the data type that you choose. It is important to correctly format your data before it's uploaded; this ensures the data will be accurately processed by the Custom Voice service. Go to Prepare data for Custom Voice and make sure your data has been properly formatted.
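Before uploading, you can catch the most common audio-format problems locally. The sketch below, using only Python's standard library, checks every .wav file inside a dataset .zip; the thresholds (mono, 16-bit PCM, a minimum sampling rate of 16 kHz) are illustrative assumptions here, so consult Prepare data for Custom Voice for the exact requirements:

```python
import io
import wave
import zipfile

def check_wav(data: bytes, min_rate: int = 16000) -> list:
    """Return a list of problems found in one WAV file (empty = OK).

    The thresholds (mono, 16-bit PCM, >= 16 kHz) are illustrative;
    check the Custom Voice data requirements for the exact values.
    """
    problems = []
    try:
        with wave.open(io.BytesIO(data)) as w:
            if w.getnchannels() != 1:
                problems.append("not mono")
            if w.getsampwidth() != 2:
                problems.append("not 16-bit PCM")
            if w.getframerate() < min_rate:
                problems.append(
                    "sampling rate %d below %d" % (w.getframerate(), min_rate))
    except (wave.Error, EOFError) as exc:
        problems.append("unreadable WAV: %s" % exc)
    return problems

def check_dataset(zip_path: str) -> dict:
    """Check every .wav inside a dataset .zip; return {name: problems}."""
    report = {}
    with zipfile.ZipFile(zip_path) as z:
        for name in z.namelist():
            if name.lower().endswith(".wav"):
                report[name] = check_wav(z.read(name))
    return report
```

Running `check_dataset("mydata.zip")` before you upload lets you fix format errors in one pass instead of waiting for the portal's validation to reject the dataset.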
Note
Free subscription (F0) users can upload two datasets simultaneously. Standard subscription (S0) users can upload five datasets simultaneously. If you reach the limit, wait until at least one of your datasets finishes importing. Then try again.
Note
The maximum number of datasets allowed to be imported per subscription is 10 .zip files for free subscription (F0) users and 500 for standard subscription (S0) users.
Datasets are automatically validated once you hit the upload button. Data validation includes a series of checks on the audio files to verify their file format, size, and sampling rate. Fix the errors, if any, and submit again.
Train your voice model
Once your datasets have been validated, you can use them to build your custom voice model. Give the model a recognizable name; use different names for different voice models.
A common use of the Description field is to record the names of the datasets that were used to create the model.
From the Select training data page, choose one or multiple datasets that you would like to use for training. Check the number of utterances before you submit. You can start with any number of utterances for en-US and zh-CN voice models; for other locales, you must select more than 2,000 utterances to be able to train a voice.
Note
Duplicate audio names will be removed from the training. Make sure the datasets you select do not contain the same audio names across multiple .zip files.
Tip
Using datasets from the same speaker is required for quality results. If the datasets you submit for training contain fewer than 6,000 distinct utterances in total, your voice model is trained with the statistical parametric synthesis technique. If your training data exceeds 6,000 distinct utterances, training uses the concatenation synthesis technique, which normally produces more natural, higher-fidelity results. Contact the Custom Voice team if you want to train a model with the latest neural TTS technology, which can produce a digital voice equivalent to the publicly available neural voices.
Click Train to begin creating your voice model.
The Training table displays a new entry that corresponds to this newly created model. The table also displays the status: Processing, Succeeded, Failed.
The status that's shown reflects the process of converting your dataset to a voice model, as shown here.
State | Meaning |
---|---|
Processing | Your voice model is being created. |
Succeeded | Your voice model has been created and can be deployed. |
Failed | Training of your voice model failed, for example because of data problems or network issues. |
Training time varies depending on the volume of audio data processed. Typical times range from about 30 minutes for hundreds of utterances to 40 hours for 20,000 utterances. Once your model training has succeeded, you can start to test it.
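Because training can run for many hours, a script driving the Custom Voice training API will usually poll for the terminal states shown in the table above. The helper below is a generic sketch: `get_status` stands in for whatever call your client makes to fetch the model's status (a hypothetical wrapper, not a documented function), and the default interval reflects the long training times just mentioned:

```python
import time

# Terminal states, mirroring the portal's Training table.
TERMINAL_STATES = {"Succeeded", "Failed"}

def wait_for_training(get_status, poll_seconds=600, max_polls=288):
    """Poll get_status() until the model reaches a terminal state.

    get_status is any zero-argument callable returning 'Processing',
    'Succeeded' or 'Failed' -- e.g. a wrapper around the Custom Voice
    training API.  Defaults poll every 10 minutes for up to 48 hours.
    """
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("training did not finish within the polling window")
```

Passing a callable keeps the polling logic independent of how your client authenticates and fetches status.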
Note
Free subscription (F0) users can train one voice font simultaneously. Standard subscription (S0) users can train three voices simultaneously. If you reach the limit, wait until at least one of your voice fonts finishes training, and then try again.
Note
The maximum number of voice models allowed to be trained per subscription is 10 models for free subscription (F0) users and 100 for standard subscription (S0) users.
If you are using the neural voice training capability, you can choose to train a model optimized for real-time streaming scenarios, or an HD neural model optimized for asynchronous long-audio synthesis.
Test your voice model
After your voice font is successfully built, you can test it before deploying it for use.
- Navigate to Text-to-Speech > Custom Voice > Testing.
- Click Add test.
- Select one or multiple models that you would like to test.
- Provide the text you want the voice(s) to speak. If you have selected multiple models to test at one time, the same text is used to test each model. Note that the language of your text must be the same as the language of your voice font, only successfully trained models can be tested, and only plain text is supported in this step.
- Click Create.
Once you have submitted your test request, you will return to the test page. The table now includes an entry that corresponds to your new request, along with its status. It can take a few minutes to synthesize speech. When the status column says Succeeded, you can play the audio, or download the text input (a .txt file) and audio output (a .wav file), and further audition the latter for quality.
You can also find the test results on the detail page of each model you selected for testing. Go to the Training tab, and click the model name to open the model detail page.
Create and use a custom voice endpoint
After you've successfully created and tested your voice model, you deploy it in a custom Text-to-Speech endpoint. You then use this endpoint in place of the usual endpoint when making Text-to-Speech requests through the REST API. Your custom endpoint can be called only by the subscription that you have used to deploy the font.
To create a new custom voice endpoint, go to Text-to-Speech > Custom Voice > Deployment. Select Add endpoint and enter a Name and Description for your custom endpoint. Then select the custom voice model you would like to associate with this endpoint.
After you have clicked the Add button, in the endpoint table, you will see an entry for your new endpoint. It may take a few minutes to instantiate a new endpoint. When the status of the deployment is Succeeded, the endpoint is ready for use.
Note
Free subscription (F0) users can have only one model deployed. Standard subscription (S0) users can create up to 50 endpoints, each with its own custom voice.
Note
To use your custom voice, you must specify the voice model name, use the custom URI directly in an HTTP request, and use the same subscription to authenticate with the TTS service.
After your endpoint is deployed, the endpoint name appears as a link. Click the link to display information specific to your endpoint, such as the endpoint key, endpoint URL, and sample code.
Online testing of the endpoint is also available via the Custom Voice portal. To test your endpoint, choose Check endpoint from the Endpoint detail page. The endpoint testing page appears. Enter the text to be spoken (in either plain text or SSML format) in the text box. To hear the text spoken in your custom voice font, select Play. This testing feature is charged against your custom speech synthesis usage.
The custom endpoint is functionally identical to the standard endpoint that's used for text-to-speech requests. See REST API for more information.
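A request to the custom endpoint has the same shape as a standard text-to-speech REST request: an SSML body plus subscription-key, content-type, and output-format headers. The sketch below uses only Python's standard library; the endpoint URL, key, voice name, and language are placeholders you must replace with the values from your own Endpoint detail page:

```python
import urllib.request

def build_ssml(text, voice_name, lang="en-US"):
    """Wrap plain text in the SSML envelope the TTS endpoint expects."""
    return (
        "<speak version='1.0' xml:lang='%s'>"
        "<voice name='%s'>%s</voice>"
        "</speak>" % (lang, voice_name, text)
    )

def synthesize(endpoint_url, subscription_key, ssml,
               output_format="riff-24khz-16bit-mono-pcm"):
    """POST SSML to a TTS endpoint and return the audio bytes.

    endpoint_url and subscription_key come from the Endpoint detail
    page; the output format must be one the service supports.
    """
    req = urllib.request.Request(
        endpoint_url,
        data=ssml.encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Content-Type": "application/ssml+xml",
            "X-Microsoft-OutputFormat": output_format,
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()  # WAV bytes, ready to save or play

# Example (placeholder values -- substitute your own):
# audio = synthesize("https://<your-endpoint-url>", "<your-key>",
#                    build_ssml("Hello!", "<your-voice-model-name>"))
```

Remember that the voice name in the SSML must match the deployed model, and the subscription key must belong to the subscription that created the deployment.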
Next steps
Balabolka is a Text-To-Speech (TTS) program. All computer voices installed on your system are available to Balabolka. The on-screen text can be saved as a WAV, MP3, MP4, OGG or WMA file. The program can read the clipboard content, view text from documents, customize the font and background colour, and control reading from the system tray or with global hotkeys. Balabolka supports the following text file formats: AZW, AZW3, CHM, DjVu, DOC, DOCX, EML, EPUB, FB2, FB3, HTML, LIT, MOBI, ODP, ODS, ODT, PDB, PRC, PDF, PPT, PPTX, RTF, TCR, WPD, XLS, XLSX.
The program uses various versions of the Microsoft Speech API (SAPI); it allows you to alter a voice's parameters, including rate and pitch. The user can apply a special substitution list to improve the quality of the voice's articulation. This feature is useful when you want to change the spelling of words. The rules for pronunciation correction use regular-expression syntax. Balabolka can save the synchronized text in external LRC files or in MP3 tags inside the audio files. When an audio file is played on a computer or on a modern digital audio player, the text is displayed synchronously (in the same way as lyrics for songs).
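The idea behind a regex-based substitution list can be sketched in a few lines of Python. The rules below are invented examples in the spirit of a pronunciation-correction list (each rule is a pattern plus a replacement, applied in order), not Balabolka's actual rule file format:

```python
import re

# Illustrative substitution rules (pattern -> replacement), applied in
# order.  A real Balabolka substitution list follows the same idea:
# rewrite text the engine mispronounces into a phonetic spelling.
SUBSTITUTIONS = [
    (r"\bDr\.", "Doctor"),             # expand an abbreviation
    (r"\b(\d+)km\b", r"\1 kilometres"),  # read units aloud
]

def apply_substitutions(text):
    """Rewrite text with every rule before sending it to the TTS voice."""
    for pattern, replacement in SUBSTITUTIONS:
        text = re.sub(pattern, replacement, text)
    return text
```

For example, `apply_substitutions("Dr. Smith ran 5km")` yields "Doctor Smith ran 5 kilometres", which a voice will read far more naturally than the raw text.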
Size: MB
Version: Changelog
Licence: Freeware
Operating System:
Portable Version: Download ( MB)
Portable Balabolka does not require an installation and can be run from a USB drive.
A computer must have at least one voice installed.
Command Line Utility: Download ( KB)
The utility contains no graphical user interface and works only from the command line.
The application handles various command line parameters to be able to read text aloud or save as an audio file.
Text Extract Utility: Download ( MB)
The program allows you to extract text from various types of files. The extracted text can be combined into one file and/or split into several files. The utility works from the command line, without displaying any user interface.
*Balabolka is a Russian word, it can be translated as 'chatterer'.
The program allows you to use skins to customize the window appearance.
Download Skin Pack (6.7 MB, 107 skins)
Balabolka can use the Microsoft Speech API 4.x/5.x voices and the Microsoft Speech Platform text-to-speech engines.
SAPI 4
All voices for SAPI 4 are obsolete; installing them is no longer recommended.
Use current versions of speech engines instead.
RHVoice - free and open source speech synthesizer (it supports English, Esperanto, Georgian, Kyrgyz, Portuguese, Russian, Tatar and Ukrainian):
UkrVox - free Ukrainian voice created by Yaroslav Kozak (Lviv, Ukraine):
Ekho - free TTS engine (it supports Cantonese, Mandarin and Zhaoan Hakka):
To get better voice quality, you can purchase commercial TTS engines.
High Quality Commercial Voices:
- Acapela Group (demo)
- Alfanum (demo)
- Cepstral (demo)
- IVONA (demo)
- Nuance (demo)
- ReadSpeaker (demo)
Microsoft Speech Platform
The Microsoft Speech Platform allows developers to build and deploy Text-to-Speech applications. The Microsoft Speech Platform consists of a Runtime, and Runtime Languages (engines for speech recognition and text-to-speech). There are separate Runtime Languages for speech recognition and speech synthesis. The version of the Runtime Languages must match the version of the Speech Platform that you installed.
Use the following steps to install the Microsoft Speech Platform (version 11.0):
- Download and install the Speech Platform Runtime (the file 'SpeechPlatformRuntime.msi' from the x86 or x64 package, matching your system).
- Download and install the Runtime Languages for use with the Speech Platform (files with names starting with 'MSSpeech_TTS_').
The Microsoft Speech Platform currently provides support for 26 languages for speech synthesis. XML tags can be used both for SAPI 5 and the Speech Platform.
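As an illustration, the snippet below builds a string of SAPI 5 XML markup (the `<volume>`, `<rate>`, `<pitch>`, `<silence>` and `<spell>` tags are part of the SAPI 5 TTS markup) and speaks it through the SAPI automation interface. This is a Windows-only sketch and assumes the third-party pywin32 package is installed; the same markup can simply be pasted into Balabolka with a SAPI 5 voice selected:

```python
# SAPI 5 XML markup demo.  Speaking it requires Windows and the
# pywin32 package; building and inspecting the markup does not.

SAPI_XML = (
    "<volume level='80'>"
    "This sentence is spoken at 80 percent volume. "
    "<rate speed='-3'>This part is spoken more slowly.</rate> "
    "<pitch middle='5'>This part is spoken at a higher pitch.</pitch> "
    "<silence msec='500'/> "
    "<spell>WAV</spell>"
    "</volume>"
)

def speak(xml_markup):
    """Speak SAPI 5 XML markup with the default voice (Windows only)."""
    import win32com.client  # provided by the pywin32 package
    voice = win32com.client.Dispatch("SAPI.SpVoice")
    SVSF_IS_XML = 8  # SpeechVoiceSpeakFlags: treat the string as XML
    voice.Speak(xml_markup, SVSF_IS_XML)

if __name__ == "__main__":
    speak(SAPI_XML)
```

The Speech Platform voices accept SSML-style markup in the same way, so the technique carries over with the corresponding SSML tags.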
Balabolka can use Hunspell (hunspell.github.io). Hunspell is the default spell checker of OpenOffice.org, LibreOffice and Mozilla Firefox.
Spell checker dictionaries for Windows:
Balabolka can use language modules from Microsoft Office 97/2000 for spell checking. If Microsoft Office is not installed on your computer, or you use a different version of Microsoft Office, you can download the spell checking components from my web-site:
Balabolka can also use the spell checking built into the operating system. The Spell Checking API is available beginning with Windows 8.