Update doc

Fangjun Kuang 2024-03-12 22:25:23 +08:00
parent a92c6df76a
commit 4abfdc7f57
2 changed files with 26 additions and 8 deletions

@@ -56,7 +56,8 @@ Training
--start-epoch 1 \ --start-epoch 1 \
--use-fp16 1 \ --use-fp16 1 \
--exp-dir vits/exp \ --exp-dir vits/exp \
--tokens data/tokens.txt --tokens data/tokens.txt \
--model-type high \
--max-duration 500 --max-duration 500
.. note:: .. note::
@@ -64,6 +65,11 @@ Training
You can adjust the hyper-parameters to control the size of the VITS model and You can adjust the hyper-parameters to control the size of the VITS model and
the training configurations. For more details, please run ``./vits/train.py --help``. the training configurations. For more details, please run ``./vits/train.py --help``.
.. warning::
If you want a model that runs faster on CPU, please use ``--model-type low``
or ``--model-type medium``.
.. note:: .. note::
The training can take a long time (usually a couple of days). The training can take a long time (usually a couple of days).
@@ -95,8 +101,8 @@ training part first. It will save the ground-truth and generated wavs to the dir
Export models Export models
------------- -------------
Currently we only support ONNX model exporting. It will generate two files in the given ``exp-dir``: Currently we only support ONNX model exporting. It will generate one file in the given ``exp-dir``:
``vits-epoch-*.onnx`` and ``vits-epoch-*.int8.onnx``. ``vits-epoch-*.onnx``.
.. code-block:: bash .. code-block:: bash
@@ -120,4 +126,7 @@ Download pretrained models
If you don't want to train from scratch, you can download the pretrained models If you don't want to train from scratch, you can download the pretrained models
by visiting the following link: by visiting the following link:
- `<https://huggingface.co/Zengwei/icefall-tts-ljspeech-vits-2024-02-28>`_ - ``--model-type=high``: `<https://huggingface.co/Zengwei/icefall-tts-ljspeech-vits-2024-02-28>`_
- ``--model-type=medium``: `<https://huggingface.co/csukuangfj/icefall-tts-ljspeech-vits-medium-2024-03-12>`_
- ``--model-type=low``: `<https://huggingface.co/csukuangfj/icefall-tts-ljspeech-vits-low-2024-03-12>`_

@@ -43,7 +43,7 @@ If you feel that the trained model is slow at runtime, you can specify the
argument `--model-type` during training. Possible values are: argument `--model-type` during training. Possible values are:
- `low`, means **low** quality. The resulting model is very small in file size - `low`, means **low** quality. The resulting model is very small in file size
and runs very fast. The following is a wave file generated by a `low` model and runs very fast. The following is a wave file generated by a `low` quality model
https://github.com/k2-fsa/icefall/assets/5284924/d5758c24-470d-40ee-b089-e57fcba81633 https://github.com/k2-fsa/icefall/assets/5284924/d5758c24-470d-40ee-b089-e57fcba81633
@@ -52,15 +52,24 @@ argument `--model-type` during training. Possible values are:
The exported onnx model has a file size of ``26.8 MB`` (float32). The exported onnx model has a file size of ``26.8 MB`` (float32).
- `medium`, means **medium** quality. - `medium`, means **medium** quality.
The following is a wave file generated by a `medium` model The following is a wave file generated by a `medium` quality model
https://github.com/k2-fsa/icefall/assets/5284924/b199d960-3665-4d0d-9ae9-a1bb69cbc8ac https://github.com/k2-fsa/icefall/assets/5284924/b199d960-3665-4d0d-9ae9-a1bb69cbc8ac
The text is `Ask not what your country can do for you; ask what you can do for your country.` The text is `Ask not what your country can do for you; ask what you can do for your country.`
The exported onnx model has file size of ``70.9 MB`` (float32). The exported onnx model has a file size of ``70.9 MB`` (float32).
- `high`, means **high** quality. This is the default value.
The following is a wave file generated by a `high` quality model
https://github.com/k2-fsa/icefall/assets/5284924/b39f3048-73a6-4267-bf95-df5abfdb28fc
The text is `Ask not what your country can do for you; ask what you can do for your country.`
The exported onnx model has a file size of ``113 MB`` (float32).
- `high`, means **high** quality
A pre-trained `low` model trained using 4xV100 32GB GPU with the following command can be found at A pre-trained `low` model trained using 4xV100 32GB GPU with the following command can be found at
<https://huggingface.co/csukuangfj/icefall-tts-ljspeech-vits-low-2024-03-12> <https://huggingface.co/csukuangfj/icefall-tts-ljspeech-vits-low-2024-03-12>