From 1653b76deb5bffec958d17cf5440ace4f776732f Mon Sep 17 00:00:00 2001
From: root
Date: Tue, 4 Mar 2025 01:23:43 +0000
Subject: [PATCH] update readme and requirements

---
 egs/emilia/TTS/README.md          | 18 ------------------
 egs/wenetspeech4tts/TTS/README.md | 14 --------------
 2 files changed, 32 deletions(-)

diff --git a/egs/emilia/TTS/README.md b/egs/emilia/TTS/README.md
index 367a41899..d55ff10c3 100644
--- a/egs/emilia/TTS/README.md
+++ b/egs/emilia/TTS/README.md
@@ -12,22 +12,6 @@ hours of speech across six languages, covering a wide range of speaking styles t
 
 See https://arxiv.org/pdf/2407.05361.
 
-> [!CAUTION]
-> The next-gen Kaldi framework provides tools and models for generating high-quality, synthetic speech (Text-to-Speech, TTS).
-> While these recipes has the potential to advance various fields such as accessibility, language education, and AI-driven solutions, it also carries certain ethical and legal responsibilities.
->
-> By using this framework, you agree to the following:
-> 1. Legal and Ethical Use: You shall not use this framework, or any models derived from it, for any unlawful or unethical purposes. This includes, but is not limited to: Creating voice clones without the explicit, informed consent of the individual whose voice is being cloned. Engaging in any form of identity theft, impersonation, or fraud using cloned voices. Violating any local, national, or international laws regarding privacy, intellectual property, or personal data.
->
-> 2. Responsibility of Use: The users of this framework are solely responsible for ensuring that their use of voice cloning technologies complies with all applicable laws and ethical guidelines. We explicitly disclaim any liability for misuse of the technology.
->
-> 3. Attribution and Use of Open-Source Components: This project is provided under the Apache 2.0 license. Users must adhere to the terms of this license and provide appropriate attribution when required.
->
-> 4. No Warranty: This framework is provided “as-is,” without warranty of any kind, either express or implied. We do not guarantee that the use of this software will comply with legal requirements or that it will not infringe the rights of third parties.
-
-
-
-
 # Llasa (cosyvoice2 token)
 
 ./llasa_cosyvoice2_token contains the code for training qwen2.5-0.5b models to predict cosyvoice2 semantic tokens.
@@ -48,9 +32,7 @@ The training command is given below:
 
 ```
 # docker: ghcr.io/swivid/f5-tts:main
-# pip install k2==1.24.4.dev20241030+cuda12.4.torch2.4.0 -f https://k2-fsa.github.io/k2/cuda.html
 # pip install -r llasa_cosyvoice2_token/requirements.txt
-# pip install -r icefall/egs/wenetspeech4tts/TTS/f5-tts/requirements.txt
 
 WANDB_KEY=$your_wandb_key
 wandb login ${WANDB_KEY}
diff --git a/egs/wenetspeech4tts/TTS/README.md b/egs/wenetspeech4tts/TTS/README.md
index 9a48bd196..f1c57d853 100644
--- a/egs/wenetspeech4tts/TTS/README.md
+++ b/egs/wenetspeech4tts/TTS/README.md
@@ -9,20 +9,6 @@
 
 [**WenetSpeech4TTS**](https://huggingface.co/datasets/Wenetspeech4TTS/WenetSpeech4TTS) is a multi-domain **Mandarin** corpus derived from the open-sourced [WenetSpeech](https://arxiv.org/abs/2110.03370) dataset.
 
-> [!CAUTION]
-> The next-gen Kaldi framework provides tools and models for generating high-quality, synthetic speech (Text-to-Speech, TTS).
-> While these recipes has the potential to advance various fields such as accessibility, language education, and AI-driven solutions, it also carries certain ethical and legal responsibilities.
->
-> By using this framework, you agree to the following:
-> 1. Legal and Ethical Use: You shall not use this framework, or any models derived from it, for any unlawful or unethical purposes. This includes, but is not limited to: Creating voice clones without the explicit, informed consent of the individual whose voice is being cloned. Engaging in any form of identity theft, impersonation, or fraud using cloned voices. Violating any local, national, or international laws regarding privacy, intellectual property, or personal data.
->
-> 2. Responsibility of Use: The users of this framework are solely responsible for ensuring that their use of voice cloning technologies complies with all applicable laws and ethical guidelines. We explicitly disclaim any liability for misuse of the technology.
->
-> 3. Attribution and Use of Open-Source Components: This project is provided under the Apache 2.0 license. Users must adhere to the terms of this license and provide appropriate attribution when required.
->
-> 4. No Warranty: This framework is provided “as-is,” without warranty of any kind, either express or implied. We do not guarantee that the use of this software will comply with legal requirements or that it will not infringe the rights of third parties.
-
-
 # [VALL-E](https://arxiv.org/abs/2301.02111)
 
 ./valle contains the code for training VALL-E TTS model.