Mirror of https://github.com/k2-fsa/icefall.git, synced 2025-08-26 10:16:14 +00:00
Add generated wave
parent 0db831910a
commit 74925e6538
@@ -107,7 +107,8 @@ export CUDA_VISIBLE_DEVICES=4,5,6,7
 
 This recipe provides a Matcha-TTS model trained on the LJSpeech dataset.
 
-Pretrained model can be found [here](https://huggingface.co/csukuangfj/icefall-tts-ljspeech-matcha-en-2024-10-28).
+Checkpoints and training logs can be found [here](https://huggingface.co/csukuangfj/icefall-tts-ljspeech-matcha-en-2024-10-28).
+The pull-request for this recipe can be found at <https://github.com/k2-fsa/icefall/pull/1773>
 
 The training command is given below:
 ```bash
@@ -197,21 +198,24 @@ To use the generated onnx files to generate speech from text, please run:
 ```bash
 python3 ./matcha/onnx_pretrained.py \
   --acoustic-model ./model-steps-6.onnx \
-  --vocoder ./hifigan_v2.onnx \
+  --vocoder ./hifigan_v1.onnx \
   --tokens ./data/tokens.txt \
-  --input-text "how are you doing?" \
-  --output-wav ./generated-2.wav
+  --input-text "Ask not what your country can do for you; ask what you can do for your country." \
+  --output-wav ./matcha-epoch-4000-step6-hfigian-v1.wav
 ```
 
 ```bash
-soxi ./generated-2.wav
+soxi ./matcha-epoch-4000-step6-hfigian-v1.wav
 
-Input File     : './generated-2.wav'
+Input File     : './matcha-epoch-4000-step6-hfigian-v1.wav'
 Channels       : 1
 Sample Rate    : 22050
 Precision      : 16-bit
-Duration       : 00:00:01.25 = 27648 samples ~ 94.0408 CDDA sectors
-File Size      : 55.3k
+Duration       : 00:00:05.46 = 120320 samples ~ 409.252 CDDA sectors
+File Size      : 241k
 Bit Rate       : 353k
 Sample Encoding: 16-bit Signed Integer PCM
 ```
+
+https://github.com/user-attachments/assets/b7c197a6-3870-49c6-90ca-db4d3776869b
+
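This commit switches the documented vocoder from `hifigan_v2.onnx` to `hifigan_v1.onnx`. One way to hear the difference is to render the same sentence with both vocoders. The sketch below reuses only the flags from the `onnx_pretrained.py` command in the diff. `compare_vocoders`, the `DRY_RUN` switch and the `vocoder-compare-*.wav` output names are hypothetical. It also assumes that both vocoder files, `model-steps-6.onnx` and `data/tokens.txt` are where that command expects them.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: synthesize one sentence with each HiFi-GAN vocoder.
# Only flags that appear in the README command are used; output names are made up.
compare_vocoders() {
  local text="Ask not what your country can do for you; ask what you can do for your country."
  local vocoder
  for vocoder in hifigan_v1 hifigan_v2; do
    local -a cmd=(
      python3 ./matcha/onnx_pretrained.py
      --acoustic-model ./model-steps-6.onnx
      --vocoder "./${vocoder}.onnx"
      --tokens ./data/tokens.txt
      --input-text "$text"
      --output-wav "./vocoder-compare-${vocoder}.wav"
    )
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      # Print the shell-quoted command instead of running it.
      printf '%q ' "${cmd[@]}"
      echo
    else
      "${cmd[@]}"
    fi
  done
}
```

Run `DRY_RUN=1 compare_vocoders` to print the two commands for a check, then run `compare_vocoders` to write one WAV file per vocoder.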
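Every derived field in the soxi output above can be computed from the sample count and the 22050 Hz sample rate. The duration is the sample count divided by the rate, and soxi counts CD-DA sectors at 75 per second. For 16-bit mono PCM, the bit rate is 22050 × 16 = 352,800 bit/s. The sketch below recomputes these fields for both files in the diff. `wav_stats` is a hypothetical helper, and it assumes uncompressed PCM with a canonical 44-byte WAV header, which may not match how the recipe actually writes its files.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: recompute soxi's derived fields for an uncompressed PCM WAV.
# Args: samples per channel, sample rate (Hz), channels, bits per sample.
# Assumes a canonical 44-byte WAV header; extra chunks would make the file larger.
wav_stats() {
  awk -v n="$1" -v rate="$2" -v ch="$3" -v bits="$4" 'BEGIN {
    secs    = n / rate                # duration in seconds
    sectors = secs * 75               # one CD-DA sector holds 1/75 s of audio
    bytes   = n * ch * bits / 8 + 44  # PCM payload plus the assumed header
    bps     = rate * ch * bits        # bit rate in bits per second
    # %.3g mirrors the three significant figures shown in the soxi output
    printf "%.2f s, %g CDDA sectors, %.3gk bytes, %.3gk bit/s\n", secs, sectors, bytes / 1000, bps / 1000
  }'
}

wav_stats 27648 22050 1 16    # → 1.25 s, 94.0408 CDDA sectors, 55.3k bytes, 353k bit/s
wav_stats 120320 22050 1 16   # → 5.46 s, 409.252 CDDA sectors, 241k bytes, 353k bit/s
```

Both results match the old and new soxi listings in the diff. This supports the 44-byte-header assumption for these two files.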