This commit is contained in:
yaozengwei 2023-11-28 10:50:07 +08:00
parent a983dcd469
commit 5ab142842e
3 changed files with 115 additions and 1 deletion

View File

@@ -0,0 +1,7 @@
TTS
======

.. toctree::
   :maxdepth: 2

   ljspeech/vits

View File

@@ -0,0 +1,106 @@
VITS
===============
This tutorial shows you how to train a VITS model
with the `LJSpeech <https://keithito.com/LJ-Speech-Dataset/>`_ dataset.

.. note::

   The VITS paper: `Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech <https://arxiv.org/pdf/2106.06103.pdf>`_
Data preparation
----------------

.. code-block:: bash

   $ cd egs/ljspeech/TTS
   $ ./prepare.sh
To run stage 1 to stage 5, use:

.. code-block:: bash

   $ ./prepare.sh --stage 1 --stop_stage 5
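
The preparation script should produce, among other things, the token table that
the training and inference commands below read via ``--tokens``. A quick sanity
check (the path is taken from those commands):

.. code-block:: bash

   # data/tokens.txt is passed to train.py and infer.py later in this tutorial.
   $ ls -l data/tokens.txt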
Build Monotonic Alignment Search
--------------------------------

.. code-block:: bash

   $ cd vits/monotonic_align
   $ python setup.py build_ext --inplace
   $ cd ../../
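
``build_ext --inplace`` compiles the extension next to its sources, so one way
to confirm the build succeeded is to look for the generated shared object (the
exact file name depends on your platform and Python version):

.. code-block:: bash

   # e.g. core.cpython-310-x86_64-linux-gnu.so; the name varies per platform.
   $ ls vits/monotonic_align/*.so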
Training
--------

.. code-block:: bash

   $ export CUDA_VISIBLE_DEVICES="0,1,2,3"
   $ ./vits/train.py \
       --world-size 4 \
       --num-epochs 1000 \
       --start-epoch 1 \
       --use-fp16 1 \
       --exp-dir vits/exp \
       --tokens data/tokens.txt \
       --max-duration 500

.. note::

   You can adjust the hyper-parameters to control the size of the VITS model and
   the training configurations. For more details, please run ``./vits/train.py --help``.
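
If training is interrupted, it can be resumed from a saved checkpoint by
setting ``--start-epoch`` to the epoch after the last completed one. A sketch,
assuming icefall's usual ``epoch-<n>.pt`` checkpoint naming in ``vits/exp``:

.. code-block:: bash

   # Resume from the checkpoint saved at the end of epoch 500
   # (vits/exp/epoch-500.pt, assuming the usual naming); other options unchanged.
   $ ./vits/train.py \
       --world-size 4 \
       --num-epochs 1000 \
       --start-epoch 501 \
       --use-fp16 1 \
       --exp-dir vits/exp \
       --tokens data/tokens.txt \
       --max-duration 500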

.. note::

   The training can take a long time (usually a couple of days).

Training logs, checkpoints and tensorboard logs are saved in ``vits/exp``.
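
Since the TensorBoard logs live under ``vits/exp``, training can be monitored
with something like:

.. code-block:: bash

   # TensorBoard scans the directory recursively, so pointing it at the
   # experiment directory is enough to find the event files.
   $ tensorboard --logdir vits/exp --port 6006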
Inference
---------
Inference uses the checkpoints saved during training, so you have to run the
training stage first. It saves the ground-truth and generated wavs to the directory
``vits/exp/infer/epoch-*/wav``, e.g., ``vits/exp/infer/epoch-1000/wav``.

.. code-block:: bash

   $ export CUDA_VISIBLE_DEVICES="0"
   $ ./vits/infer.py \
       --epoch 1000 \
       --exp-dir vits/exp \
       --tokens data/tokens.txt \
       --max-duration 500

.. note::

   For more details, please run ``./vits/infer.py --help``.
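
Once inference finishes, the ground-truth and generated wavs can be inspected
directly, e.g.:

.. code-block:: bash

   # Paths follow the layout described above.
   $ ls vits/exp/infer/epoch-1000/wav | head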
Export models
-------------
Currently, only exporting to ONNX is supported. It generates two files in the given ``exp-dir``:
the float32 model ``vits-epoch-*.onnx`` and its int8-quantized counterpart ``vits-epoch-*.int8.onnx``.

.. code-block:: bash

   $ ./vits/export-onnx.py \
       --epoch 1000 \
       --exp-dir vits/exp \
       --tokens data/tokens.txt
You can test the exported ONNX model with:

.. code-block:: bash

   $ ./vits/test_onnx.py \
       --model-filename vits/exp/vits-epoch-1000.onnx \
       --tokens data/tokens.txt
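
The quantized model can be checked the same way by pointing
``--model-filename`` at the ``int8`` file produced by the export step:

.. code-block:: bash

   $ ./vits/test_onnx.py \
       --model-filename vits/exp/vits-epoch-1000.int8.onnx \
       --tokens data/tokens.txt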

View File

@@ -2,7 +2,7 @@ Recipes
 =======
 This page contains various recipes in ``icefall``.
-Currently, only speech recognition recipes are provided.
+Currently, we provide recipes for speech recognition, language model, and speech synthesis.
 We may add recipes for other tasks as well in the future.
@@ -16,3 +16,4 @@ We may add recipes for other tasks as well in the future.
 Non-streaming-ASR/index
 Streaming-ASR/index
 RNN-LM/index
+TTS/index