mirror of
https://github.com/k2-fsa/icefall.git
synced 2025-08-26 18:24:18 +00:00
Update doc
parent a92c6df76a
commit 4abfdc7f57
@@ -56,7 +56,8 @@ Training
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir vits/exp \
- --tokens data/tokens.txt
+ --tokens data/tokens.txt \
+ --model-type high \
  --max-duration 500

  .. note::
@@ -64,6 +65,11 @@ Training
  You can adjust the hyper-parameters to control the size of the VITS model and
  the training configurations. For more details, please run ``./vits/train.py --help``.

+ .. warning::
+
+ If you want a model that runs faster on CPU, please use ``--model-type low``
+ or ``--model-type medium``.
+
  .. note::

  The training can take a long time (usually a couple of days).
@@ -95,8 +101,8 @@ training part first. It will save the ground-truth and generated wavs to the dir
  Export models
  -------------

- Currently we only support ONNX model exporting. It will generate two files in the given ``exp-dir``:
- ``vits-epoch-*.onnx`` and ``vits-epoch-*.int8.onnx``.
+ Currently we only support ONNX model exporting. It will generate one file in the given ``exp-dir``:
+ ``vits-epoch-*.onnx``.

  .. code-block:: bash

@@ -120,4 +126,7 @@ Download pretrained models
  If you don't want to train from scratch, you can download the pretrained models
  by visiting the following link:

- - `<https://huggingface.co/Zengwei/icefall-tts-ljspeech-vits-2024-02-28>`_
+ - ``--model-type=high``: `<https://huggingface.co/Zengwei/icefall-tts-ljspeech-vits-2024-02-28>`_
+ - ``--model-type=medium``: `<https://huggingface.co/csukuangfj/icefall-tts-ljspeech-vits-medium-2024-03-12>`_
+ - ``--model-type=low``: `<https://huggingface.co/csukuangfj/icefall-tts-ljspeech-vits-low-2024-03-12>`_
+
@@ -43,7 +43,7 @@ If you feel that the trained model is slow at runtime, you can specify the
  argument `--model-type` during training. Possible values are:

  - `low`, means **low** quality. The resulting model is very small in file size
- and runs very fast. The following is a wave file generated by a `low` model
+ and runs very fast. The following is a wave file generated by a `low` quality model

  https://github.com/k2-fsa/icefall/assets/5284924/d5758c24-470d-40ee-b089-e57fcba81633

@@ -52,15 +52,24 @@ argument `--model-type` during training. Possible values are:
  The exported onnx model has a file size of ``26.8 MB`` (float32).

  - `medium`, means **medium** quality.
- The following is a wave file generated by a `medium` model
+ The following is a wave file generated by a `medium` quality model

  https://github.com/k2-fsa/icefall/assets/5284924/b199d960-3665-4d0d-9ae9-a1bb69cbc8ac

  The text is `Ask not what your country can do for you; ask what you can do for your country.`

- The exported onnx model has file size of ``70.9 MB`` (float32).
+ The exported onnx model has a file size of ``70.9 MB`` (float32).

+ - `high`, means **high** quality. This is the default value.
+
+ The following is a wave file generated by a `high` quality model
+
+ https://github.com/k2-fsa/icefall/assets/5284924/b39f3048-73a6-4267-bf95-df5abfdb28fc
+
+ The text is `Ask not what your country can do for you; ask what you can do for your country.`
+
+ The exported onnx model has a file size of ``113 MB`` (float32).
+
- - `high`, means **high** quality

  A pre-trained `low` model trained using 4xV100 32GB GPU with the following command can be found at
  <https://huggingface.co/csukuangfj/icefall-tts-ljspeech-vits-low-2024-03-12>
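For readers comparing the three `--model-type` values touched by this commit, the sizes and pretrained links quoted in the diff can be gathered into a small lookup table. The following is only an illustrative sketch: `MODEL_TYPES`, `QUALITY_ORDER`, and `pick_model_type` are hypothetical names that do not exist in icefall, and the ``26.8 MB`` figure is assumed to belong to the `low` model because it directly precedes the `medium` entry in the README.

```python
from typing import Optional

# Sizes (float32 ONNX export) and URLs are copied from the diff above.
# The table layout and every name here are hypothetical, not icefall APIs.
MODEL_TYPES = {
    "low": {
        "onnx_size_mb": 26.8,  # assumed to describe the `low` model
        "pretrained": "https://huggingface.co/csukuangfj/icefall-tts-ljspeech-vits-low-2024-03-12",
    },
    "medium": {
        "onnx_size_mb": 70.9,
        "pretrained": "https://huggingface.co/csukuangfj/icefall-tts-ljspeech-vits-medium-2024-03-12",
    },
    "high": {
        "onnx_size_mb": 113.0,
        "pretrained": "https://huggingface.co/Zengwei/icefall-tts-ljspeech-vits-2024-02-28",
    },
}

# Ordered from lowest to highest quality, as described in the README.
QUALITY_ORDER = ["low", "medium", "high"]


def pick_model_type(max_size_mb: float) -> Optional[str]:
    """Return the highest-quality type whose export fits in max_size_mb.

    Returns None when even the `low` model exceeds the budget.
    """
    best = None
    for name in QUALITY_ORDER:
        if MODEL_TYPES[name]["onnx_size_mb"] <= max_size_mb:
            best = name
    return best


if __name__ == "__main__":
    choice = pick_model_type(100)
    if choice is not None:
        print(f"--model-type {choice}: {MODEL_TYPES[choice]['pretrained']}")
```

A size budget is used here only as a stand-in for runtime speed; the documentation states that `low` and `medium` run faster on CPU but gives no timing figures, so the sketch does not model speed directly.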