diff --git a/docs/source/recipes/TTS/ljspeech/vits.rst b/docs/source/recipes/TTS/ljspeech/vits.rst index 323d0adfc..d31bf6302 100644 --- a/docs/source/recipes/TTS/ljspeech/vits.rst +++ b/docs/source/recipes/TTS/ljspeech/vits.rst @@ -56,7 +56,8 @@ Training --start-epoch 1 \ --use-fp16 1 \ --exp-dir vits/exp \ - --tokens data/tokens.txt + --tokens data/tokens.txt \ + --model-type high \ --max-duration 500 .. note:: @@ -64,6 +65,11 @@ Training You can adjust the hyper-parameters to control the size of the VITS model and the training configurations. For more details, please run ``./vits/train.py --help``. +.. warning:: + + If you want a model that runs faster on CPU, please use ``--model-type low`` + or ``--model-type medium``. + .. note:: The training can take a long time (usually a couple of days). @@ -95,8 +101,8 @@ training part first. It will save the ground-truth and generated wavs to the dir Export models ------------- -Currently we only support ONNX model exporting. It will generate two files in the given ``exp-dir``: -``vits-epoch-*.onnx`` and ``vits-epoch-*.int8.onnx``. +Currently we only support ONNX model exporting. It will generate one file in the given ``exp-dir``: +``vits-epoch-*.onnx``. .. code-block:: bash @@ -120,4 +126,7 @@ Download pretrained models If you don't want to train from scratch, you can download the pretrained models by visiting the following link: - - ``_ + - ``--model-type=high``: ``_ + - ``--model-type=medium``: ``_ + - ``--model-type=low``: ``_ + diff --git a/egs/ljspeech/TTS/README.md b/egs/ljspeech/TTS/README.md index 8b28193ca..7b112c12c 100644 --- a/egs/ljspeech/TTS/README.md +++ b/egs/ljspeech/TTS/README.md @@ -43,7 +43,7 @@ If you feel that the trained model is slow at runtime, you can specify the argument `--model-type` during training. Possible values are: - `low`, means **low** quality. The resulting model is very small in file size - and runs very fast. The following is a wave file generatd by a `low` model + and runs very fast. The following is a wave file generatd by a `low` quality model https://github.com/k2-fsa/icefall/assets/5284924/d5758c24-470d-40ee-b089-e57fcba81633 @@ -52,15 +52,24 @@ argument `--model-type` during training. Possible values are: The exported onnx model has a file size of ``26.8 MB`` (float32). - `medium`, means **medium** quality. - The following is a wave file generatd by a `medium` model + The following is a wave file generatd by a `medium` quality model https://github.com/k2-fsa/icefall/assets/5284924/b199d960-3665-4d0d-9ae9-a1bb69cbc8ac The text is `Ask not what your country can do for you; ask what you can do for your country.` - The exported onnx model has file size of ``70.9 MB`` (float32). + The exported onnx model has a file size of ``70.9 MB`` (float32). + + - `high`, means **high** quality. This is the default value. + + The following is a wave file generatd by a `high` quality model + + https://github.com/k2-fsa/icefall/assets/5284924/b39f3048-73a6-4267-bf95-df5abfdb28fc + + The text is `Ask not what your country can do for you; ask what you can do for your country.` + + The exported onnx model has a file size of ``113 MB`` (float32). - - `high`, means **high** quality A pre-trained `low` model trained using 4xV100 32GB GPU with the following command can be found at