mirror of
https://github.com/k2-fsa/icefall.git
synced 2025-08-09 10:02:22 +00:00
deploy: 13daf73468da70f5db49c928969fce6c6edc041a
This commit is contained in:
parent
e5fed5060b
commit
a90f45427e
@ -0,0 +1,140 @@
|
||||
Finetune from a supervised pre-trained Zipformer model
|
||||
======================================================
|
||||
|
||||
This tutorial shows you how to fine-tune a supervised pre-trained **Zipformer**
|
||||
transducer model on a new dataset.
|
||||
|
||||
.. HINT::
|
||||
|
||||
We assume you have read the page :ref:`install icefall` and have setup
|
||||
the environment for ``icefall``.
|
||||
|
||||
.. HINT::
|
||||
|
||||
We recommend you to use a GPU or several GPUs to run this recipe
|
||||
|
||||
|
||||
For illustration purpose, we fine-tune the Zipformer transducer model
|
||||
pre-trained on `LibriSpeech`_ on the small subset of `GigaSpeech`_. You could use your
|
||||
own data for fine-tuning if you create a manifest for your new dataset.
|
||||
|
||||
Data preparation
|
||||
----------------
|
||||
|
||||
Please follow the instructions in the `GigaSpeech recipe <https://github.com/k2-fsa/icefall/tree/master/egs/gigaspeech/ASR>`_
|
||||
to prepare the fine-tune data used in this tutorial. We only require the small subset in GigaSpeech for this tutorial.
|
||||
|
||||
|
||||
Model preparation
|
||||
-----------------
|
||||
|
||||
We are using the Zipformer model trained on full LibriSpeech (960 hours) as the intialization. The
|
||||
checkpoint of the model can be downloaded via the following command:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
$ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Zengwei/icefall-asr-librispeech-zipformer-2023-05-15
|
||||
$ cd icefall-asr-librispeech-zipformer-2023-05-15/exp
|
||||
$ git lfs pull --include "pretrained.pt"
|
||||
$ ln -s pretrained.pt epoch-99.pt
|
||||
$ cd ../data/lang_bpe_500
|
||||
$ git lfs pull --include bpe.model
|
||||
$ cd ../../..
|
||||
|
||||
Before fine-tuning, let's test the model's WER on the new domain. The following command performs
|
||||
decoding on the GigaSpeech test sets:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
./zipformer/decode_gigaspeech.py \
|
||||
--epoch 99 \
|
||||
--avg 1 \
|
||||
--exp-dir icefall-asr-librispeech-zipformer-2023-05-15/exp \
|
||||
--use-averaged-model 0 \
|
||||
--max-duration 1000 \
|
||||
--decoding-method greedy_search
|
||||
|
||||
You should see the following numbers:
|
||||
|
||||
.. code-block::
|
||||
|
||||
For dev, WER of different settings are:
|
||||
greedy_search 20.06 best for dev
|
||||
|
||||
For test, WER of different settings are:
|
||||
greedy_search 19.27 best for test
|
||||
|
||||
|
||||
Fine-tune
|
||||
---------
|
||||
|
||||
Since LibriSpeech and GigaSpeech are both English dataset, we can initialize the whole
|
||||
Zipformer model with the checkpoint downloaded in the previous step (otherwise we should consider
|
||||
initializing the stateless decoder and joiner from scratch due to the mismatch of the output
|
||||
vocabulary). The following command starts a fine-tuning experiment:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
$ use_mux=0
|
||||
$ do_finetune=1
|
||||
|
||||
$ ./zipformer/finetune.py \
|
||||
--world-size 2 \
|
||||
--num-epochs 20 \
|
||||
--start-epoch 1 \
|
||||
--exp-dir zipformer/exp_giga_finetune${do_finetune}_mux${use_mux} \
|
||||
--use-fp16 1 \
|
||||
--base-lr 0.0045 \
|
||||
--bpe-model data/lang_bpe_500/bpe.model \
|
||||
--do-finetune $do_finetune \
|
||||
--use-mux $use_mux \
|
||||
--master-port 13024 \
|
||||
--finetune-ckpt icefall-asr-librispeech-zipformer-2023-05-15/exp/pretrained.pt \
|
||||
--max-duration 1000
|
||||
|
||||
The following arguments are related to fine-tuning:
|
||||
|
||||
- ``--base-lr``
|
||||
The learning rate used for fine-tuning. We suggest to set a **small** learning rate for fine-tuning,
|
||||
otherwise the model may forget the initialization very quickly. A reasonable value should be around
|
||||
1/10 of the original lr, i.e 0.0045.
|
||||
|
||||
- ``--do-finetune``
|
||||
If True, do fine-tuning by initializing the model from a pre-trained checkpoint.
|
||||
**Note that if you want to resume your fine-tuning experiment from certain epochs, you
|
||||
need to set this to False.**
|
||||
|
||||
- ``--finetune-ckpt``
|
||||
The path to the pre-trained checkpoint (used for initialization).
|
||||
|
||||
- ``--use-mux``
|
||||
If True, mix the fine-tune data with the original training data by using `CutSet.mux <https://lhotse.readthedocs.io/en/latest/api.html#lhotse.supervision.SupervisionSet.mux>`_
|
||||
This helps maintain the model's performance on the original domain if the original training
|
||||
is available. **If you don't have the original training data, please set it to False.**
|
||||
|
||||
After fine-tuning, let's test the WERs. You can do this via the following command:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
$ use_mux=0
|
||||
$ do_finetune=1
|
||||
$ ./zipformer/decode_gigaspeech.py \
|
||||
--epoch 20 \
|
||||
--avg 10 \
|
||||
--exp-dir zipformer/exp_giga_finetune${do_finetune}_mux${use_mux} \
|
||||
--use-averaged-model 1 \
|
||||
--max-duration 1000 \
|
||||
--decoding-method greedy_search
|
||||
|
||||
You should see numbers similar to the ones below:
|
||||
|
||||
.. code-block:: text
|
||||
|
||||
For dev, WER of different settings are:
|
||||
greedy_search 13.47 best for dev
|
||||
|
||||
For test, WER of different settings are:
|
||||
greedy_search 13.66 best for test
|
||||
|
||||
Compared to the original checkpoint, the fine-tuned model achieves much lower WERs
|
||||
on the GigaSpeech test sets.
|
15
_sources/recipes/Finetune/index.rst.txt
Normal file
15
_sources/recipes/Finetune/index.rst.txt
Normal file
@ -0,0 +1,15 @@
|
||||
Fine-tune a pre-trained model
|
||||
=============================
|
||||
|
||||
After pre-training on public available datasets, the ASR model is already capable of
|
||||
performing general speech recognition with relatively high accuracy. However, the accuracy
|
||||
could be still low on certain domains that are quite different from the original training
|
||||
set. In this case, we can fine-tune the model with a small amount of additional labelled
|
||||
data to improve the performance on new domains.
|
||||
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
:caption: Table of Contents
|
||||
|
||||
from_supervised/finetune_zipformer
|
@ -17,3 +17,4 @@ We may add recipes for other tasks as well in the future.
|
||||
Streaming-ASR/index
|
||||
RNN-LM/index
|
||||
TTS/index
|
||||
Finetune/index
|
||||
|
@ -22,7 +22,7 @@
|
||||
<link rel="index" title="Index" href="../genindex.html" />
|
||||
<link rel="search" title="Search" href="../search.html" />
|
||||
<link rel="next" title="Contributing to Documentation" href="doc.html" />
|
||||
<link rel="prev" title="VITS" href="../recipes/TTS/vctk/vits.html" />
|
||||
<link rel="prev" title="Finetune from a supervised pre-trained Zipformer model" href="../recipes/Finetune/from_supervised/finetune_zipformer.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
@ -135,7 +135,7 @@ and code to <code class="docutils literal notranslate"><span class="pre">icefall
|
||||
</div>
|
||||
</div>
|
||||
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
|
||||
<a href="../recipes/TTS/vctk/vits.html" class="btn btn-neutral float-left" title="VITS" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
|
||||
<a href="../recipes/Finetune/from_supervised/finetune_zipformer.html" class="btn btn-neutral float-left" title="Finetune from a supervised pre-trained Zipformer model" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
|
||||
<a href="doc.html" class="btn btn-neutral float-right" title="Contributing to Documentation" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
|
||||
</div>
|
||||
|
||||
|
@ -163,6 +163,10 @@ speech recognition recipes using <a class="reference external" href="https://git
|
||||
<li class="toctree-l3"><a class="reference internal" href="recipes/TTS/vctk/vits.html">VITS</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="recipes/Finetune/index.html">Fine-tune a pre-trained model</a><ul>
|
||||
<li class="toctree-l3"><a class="reference internal" href="recipes/Finetune/from_supervised/finetune_zipformer.html">Finetune from a supervised pre-trained Zipformer model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
BIN
objects.inv
BIN
objects.inv
Binary file not shown.
270
recipes/Finetune/from_supervised/finetune_zipformer.html
Normal file
270
recipes/Finetune/from_supervised/finetune_zipformer.html
Normal file
@ -0,0 +1,270 @@
|
||||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" /><meta name="viewport" content="width=device-width, initial-scale=1" />
|
||||
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>Finetune from a supervised pre-trained Zipformer model — icefall 0.1 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css?v=fa44fd50" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/css/theme.css?v=19f00094" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script src="../../../_static/jquery.js?v=5d32c60e"></script>
|
||||
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
|
||||
<script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js?v=e031e9a9"></script>
|
||||
<script src="../../../_static/doctools.js?v=888ff710"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js?v=4825356b"></script>
|
||||
<script src="../../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../../search.html" />
|
||||
<link rel="next" title="Contributing" href="../../../contributing/index.html" />
|
||||
<link rel="prev" title="Fine-tune a pre-trained model" href="../index.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../../index.html" class="icon icon-home">
|
||||
icefall
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../for-dummies/index.html">Icefall for dummies tutorial</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../installation/index.html">Installation</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../docker/index.html">Docker</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../faqs.html">Frequently Asked Questions (FAQs)</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../model-export/index.html">Model export</a></li>
|
||||
</ul>
|
||||
<ul class="current">
|
||||
<li class="toctree-l1 current"><a class="reference internal" href="../../index.html">Recipes</a><ul class="current">
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Non-streaming-ASR/index.html">Non Streaming ASR</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../TTS/index.html">TTS</a></li>
|
||||
<li class="toctree-l2 current"><a class="reference internal" href="../index.html">Fine-tune a pre-trained model</a><ul class="current">
|
||||
<li class="toctree-l3 current"><a class="current reference internal" href="#">Finetune from a supervised pre-trained Zipformer model</a><ul>
|
||||
<li class="toctree-l4"><a class="reference internal" href="#data-preparation">Data preparation</a></li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="#model-preparation">Model preparation</a></li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="#fine-tune">Fine-tune</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
|
||||
</ul>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../../index.html">icefall</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../../index.html">Recipes</a></li>
|
||||
<li class="breadcrumb-item"><a href="../index.html">Fine-tune a pre-trained model</a></li>
|
||||
<li class="breadcrumb-item active">Finetune from a supervised pre-trained Zipformer model</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
<a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/Finetune/from_supervised/finetune_zipformer.rst" class="fa fa-github"> Edit on GitHub</a>
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<section id="finetune-from-a-supervised-pre-trained-zipformer-model">
|
||||
<h1>Finetune from a supervised pre-trained Zipformer model<a class="headerlink" href="#finetune-from-a-supervised-pre-trained-zipformer-model" title="Permalink to this heading"></a></h1>
|
||||
<p>This tutorial shows you how to fine-tune a supervised pre-trained <strong>Zipformer</strong>
|
||||
transducer model on a new dataset.</p>
|
||||
<div class="admonition hint">
|
||||
<p class="admonition-title">Hint</p>
|
||||
<p>We assume you have read the page <a class="reference internal" href="../../../installation/index.html#install-icefall"><span class="std std-ref">Installation</span></a> and have setup
|
||||
the environment for <code class="docutils literal notranslate"><span class="pre">icefall</span></code>.</p>
|
||||
</div>
|
||||
<div class="admonition hint">
|
||||
<p class="admonition-title">Hint</p>
|
||||
<p>We recommend you to use a GPU or several GPUs to run this recipe</p>
|
||||
</div>
|
||||
<p>For illustration purpose, we fine-tune the Zipformer transducer model
|
||||
pre-trained on <a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a> on the small subset of <a class="reference external" href="https://github.com/SpeechColab/GigaSpeech">GigaSpeech</a>. You could use your
|
||||
own data for fine-tuning if you create a manifest for your new dataset.</p>
|
||||
<section id="data-preparation">
|
||||
<h2>Data preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading"></a></h2>
|
||||
<p>Please follow the instructions in the <a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/gigaspeech/ASR">GigaSpeech recipe</a>
|
||||
to prepare the fine-tune data used in this tutorial. We only require the small subset in GigaSpeech for this tutorial.</p>
|
||||
</section>
|
||||
<section id="model-preparation">
|
||||
<h2>Model preparation<a class="headerlink" href="#model-preparation" title="Permalink to this heading"></a></h2>
|
||||
<p>We are using the Zipformer model trained on full LibriSpeech (960 hours) as the intialization. The
|
||||
checkpoint of the model can be downloaded via the following command:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nv">GIT_LFS_SKIP_SMUDGE</span><span class="o">=</span><span class="m">1</span><span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/Zengwei/icefall-asr-librispeech-zipformer-2023-05-15
|
||||
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>icefall-asr-librispeech-zipformer-2023-05-15/exp
|
||||
$<span class="w"> </span>git<span class="w"> </span>lfs<span class="w"> </span>pull<span class="w"> </span>--include<span class="w"> </span><span class="s2">"pretrained.pt"</span>
|
||||
$<span class="w"> </span>ln<span class="w"> </span>-s<span class="w"> </span>pretrained.pt<span class="w"> </span>epoch-99.pt
|
||||
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>../data/lang_bpe_500
|
||||
$<span class="w"> </span>git<span class="w"> </span>lfs<span class="w"> </span>pull<span class="w"> </span>--include<span class="w"> </span>bpe.model
|
||||
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>../../..
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>Before fine-tuning, let’s test the model’s WER on the new domain. The following command performs
|
||||
decoding on the GigaSpeech test sets:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./zipformer/decode_gigaspeech.py<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">99</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--avg<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--exp-dir<span class="w"> </span>icefall-asr-librispeech-zipformer-2023-05-15/exp<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--use-averaged-model<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">1000</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--decoding-method<span class="w"> </span>greedy_search
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>You should see the following numbers:</p>
|
||||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">For</span> <span class="n">dev</span><span class="p">,</span> <span class="n">WER</span> <span class="n">of</span> <span class="n">different</span> <span class="n">settings</span> <span class="n">are</span><span class="p">:</span>
|
||||
<span class="n">greedy_search</span> <span class="mf">20.06</span> <span class="n">best</span> <span class="k">for</span> <span class="n">dev</span>
|
||||
|
||||
<span class="n">For</span> <span class="n">test</span><span class="p">,</span> <span class="n">WER</span> <span class="n">of</span> <span class="n">different</span> <span class="n">settings</span> <span class="n">are</span><span class="p">:</span>
|
||||
<span class="n">greedy_search</span> <span class="mf">19.27</span> <span class="n">best</span> <span class="k">for</span> <span class="n">test</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</section>
|
||||
<section id="fine-tune">
|
||||
<h2>Fine-tune<a class="headerlink" href="#fine-tune" title="Permalink to this heading"></a></h2>
|
||||
<p>Since LibriSpeech and GigaSpeech are both English dataset, we can initialize the whole
|
||||
Zipformer model with the checkpoint downloaded in the previous step (otherwise we should consider
|
||||
initializing the stateless decoder and joiner from scratch due to the mismatch of the output
|
||||
vocabulary). The following command starts a fine-tuning experiment:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nv">use_mux</span><span class="o">=</span><span class="m">0</span>
|
||||
$<span class="w"> </span><span class="nv">do_finetune</span><span class="o">=</span><span class="m">1</span>
|
||||
|
||||
$<span class="w"> </span>./zipformer/finetune.py<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--world-size<span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--num-epochs<span class="w"> </span><span class="m">20</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--exp-dir<span class="w"> </span>zipformer/exp_giga_finetune<span class="si">${</span><span class="nv">do_finetune</span><span class="si">}</span>_mux<span class="si">${</span><span class="nv">use_mux</span><span class="si">}</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--use-fp16<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--base-lr<span class="w"> </span><span class="m">0</span>.0045<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--do-finetune<span class="w"> </span><span class="nv">$do_finetune</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--use-mux<span class="w"> </span><span class="nv">$use_mux</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--master-port<span class="w"> </span><span class="m">13024</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--finetune-ckpt<span class="w"> </span>icefall-asr-librispeech-zipformer-2023-05-15/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">1000</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>The following arguments are related to fine-tuning:</p>
|
||||
<ul class="simple">
|
||||
<li><dl class="simple">
|
||||
<dt><code class="docutils literal notranslate"><span class="pre">--base-lr</span></code></dt><dd><p>The learning rate used for fine-tuning. We suggest to set a <strong>small</strong> learning rate for fine-tuning,
|
||||
otherwise the model may forget the initialization very quickly. A reasonable value should be around
|
||||
1/10 of the original lr, i.e 0.0045.</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</li>
|
||||
<li><dl class="simple">
|
||||
<dt><code class="docutils literal notranslate"><span class="pre">--do-finetune</span></code></dt><dd><p>If True, do fine-tuning by initializing the model from a pre-trained checkpoint.
|
||||
<strong>Note that if you want to resume your fine-tuning experiment from certain epochs, you
|
||||
need to set this to False.</strong></p>
|
||||
</dd>
|
||||
</dl>
|
||||
</li>
|
||||
<li><dl class="simple">
|
||||
<dt><code class="docutils literal notranslate"><span class="pre">--finetune-ckpt</span></code></dt><dd><p>The path to the pre-trained checkpoint (used for initialization).</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</li>
|
||||
<li><dl class="simple">
|
||||
<dt><code class="docutils literal notranslate"><span class="pre">--use-mux</span></code></dt><dd><p>If True, mix the fine-tune data with the original training data by using <a class="reference external" href="https://lhotse.readthedocs.io/en/latest/api.html#lhotse.supervision.SupervisionSet.mux">CutSet.mux</a>
|
||||
This helps maintain the model’s performance on the original domain if the original training
|
||||
is available. <strong>If you don’t have the original training data, please set it to False.</strong></p>
|
||||
</dd>
|
||||
</dl>
|
||||
</li>
|
||||
</ul>
|
||||
<p>After fine-tuning, let’s test the WERs. You can do this via the following command:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nv">use_mux</span><span class="o">=</span><span class="m">0</span>
|
||||
$<span class="w"> </span><span class="nv">do_finetune</span><span class="o">=</span><span class="m">1</span>
|
||||
$<span class="w"> </span>./zipformer/decode_gigaspeech.py<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">20</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--avg<span class="w"> </span><span class="m">10</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--exp-dir<span class="w"> </span>zipformer/exp_giga_finetune<span class="si">${</span><span class="nv">do_finetune</span><span class="si">}</span>_mux<span class="si">${</span><span class="nv">use_mux</span><span class="si">}</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--use-averaged-model<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">1000</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--decoding-method<span class="w"> </span>greedy_search
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>You should see numbers similar to the ones below:</p>
|
||||
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>For dev, WER of different settings are:
|
||||
greedy_search 13.47 best for dev
|
||||
|
||||
For test, WER of different settings are:
|
||||
greedy_search 13.66 best for test
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>Compared to the original checkpoint, the fine-tuned model achieves much lower WERs
|
||||
on the GigaSpeech test sets.</p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
|
||||
<a href="../index.html" class="btn btn-neutral float-left" title="Fine-tune a pre-trained model" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
|
||||
<a href="../../../contributing/index.html" class="btn btn-neutral float-right" title="Contributing" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
|
||||
</div>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2021, icefall development team.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
152
recipes/Finetune/index.html
Normal file
152
recipes/Finetune/index.html
Normal file
@ -0,0 +1,152 @@
|
||||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8" /><meta name="viewport" content="width=device-width, initial-scale=1" />
|
||||
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>Fine-tune a pre-trained model — icefall 0.1 documentation</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/pygments.css?v=fa44fd50" />
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/css/theme.css?v=19f00094" />
|
||||
|
||||
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script src="../../_static/jquery.js?v=5d32c60e"></script>
|
||||
<script src="../../_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
|
||||
<script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js?v=e031e9a9"></script>
|
||||
<script src="../../_static/doctools.js?v=888ff710"></script>
|
||||
<script src="../../_static/sphinx_highlight.js?v=4825356b"></script>
|
||||
<script src="../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../search.html" />
|
||||
<link rel="next" title="Finetune from a supervised pre-trained Zipformer model" href="from_supervised/finetune_zipformer.html" />
|
||||
<link rel="prev" title="VITS" href="../TTS/vctk/vits.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../../index.html" class="icon icon-home">
|
||||
icefall
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../for-dummies/index.html">Icefall for dummies tutorial</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../installation/index.html">Installation</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../docker/index.html">Docker</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../faqs.html">Frequently Asked Questions (FAQs)</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../model-export/index.html">Model export</a></li>
|
||||
</ul>
|
||||
<ul class="current">
|
||||
<li class="toctree-l1 current"><a class="reference internal" href="../index.html">Recipes</a><ul class="current">
|
||||
<li class="toctree-l2"><a class="reference internal" href="../Non-streaming-ASR/index.html">Non Streaming ASR</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../Streaming-ASR/index.html">Streaming ASR</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../RNN-LM/index.html">RNN-LM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../TTS/index.html">TTS</a></li>
|
||||
<li class="toctree-l2 current"><a class="current reference internal" href="#">Fine-tune a pre-trained model</a><ul>
|
||||
<li class="toctree-l3"><a class="reference internal" href="from_supervised/finetune_zipformer.html">Finetune from a supervised pre-trained Zipformer model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../contributing/index.html">Contributing</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../huggingface/index.html">Huggingface</a></li>
|
||||
</ul>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../index.html">icefall</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="../index.html">Recipes</a></li>
|
||||
<li class="breadcrumb-item active">Fine-tune a pre-trained model</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
<a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/Finetune/index.rst" class="fa fa-github"> Edit on GitHub</a>
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<section id="fine-tune-a-pre-trained-model">
|
||||
<h1>Fine-tune a pre-trained model<a class="headerlink" href="#fine-tune-a-pre-trained-model" title="Permalink to this heading"></a></h1>
|
||||
<p>After pre-training on public available datasets, the ASR model is already capable of
|
||||
performing general speech recognition with relatively high accuracy. However, the accuracy
|
||||
could be still low on certain domains that are quite different from the original training
|
||||
set. In this case, we can fine-tune the model with a small amount of additional labelled
|
||||
data to improve the performance on new domains.</p>
|
||||
<div class="toctree-wrapper compound">
|
||||
<p class="caption" role="heading"><span class="caption-text">Table of Contents</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="from_supervised/finetune_zipformer.html">Finetune from a supervised pre-trained Zipformer model</a><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="from_supervised/finetune_zipformer.html#data-preparation">Data preparation</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="from_supervised/finetune_zipformer.html#model-preparation">Model preparation</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="from_supervised/finetune_zipformer.html#fine-tune">Fine-tune</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</div>
|
||||
</section>
|
||||
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
|
||||
<a href="../TTS/vctk/vits.html" class="btn btn-neutral float-left" title="VITS" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
|
||||
<a href="from_supervised/finetune_zipformer.html" class="btn btn-neutral float-right" title="Finetune from a supervised pre-trained Zipformer model" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
|
||||
</div>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2021, icefall development team.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
@ -69,6 +69,7 @@
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../TTS/index.html">TTS</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Finetune/index.html">Fine-tune a pre-trained model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
@ -69,6 +69,7 @@
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../TTS/index.html">TTS</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Finetune/index.html">Fine-tune a pre-trained model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
@ -69,6 +69,7 @@
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../TTS/index.html">TTS</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Finetune/index.html">Fine-tune a pre-trained model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
@ -69,6 +69,7 @@
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../TTS/index.html">TTS</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Finetune/index.html">Fine-tune a pre-trained model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
@ -64,6 +64,7 @@
|
||||
<li class="toctree-l2"><a class="reference internal" href="../Streaming-ASR/index.html">Streaming ASR</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../RNN-LM/index.html">RNN-LM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../TTS/index.html">TTS</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../Finetune/index.html">Fine-tune a pre-trained model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
@ -72,6 +72,7 @@
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../TTS/index.html">TTS</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Finetune/index.html">Fine-tune a pre-trained model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
@ -72,6 +72,7 @@
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../TTS/index.html">TTS</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Finetune/index.html">Fine-tune a pre-trained model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
@ -72,6 +72,7 @@
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../TTS/index.html">TTS</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Finetune/index.html">Fine-tune a pre-trained model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
@ -72,6 +72,7 @@
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../TTS/index.html">TTS</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Finetune/index.html">Fine-tune a pre-trained model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
@ -72,6 +72,7 @@
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../TTS/index.html">TTS</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Finetune/index.html">Fine-tune a pre-trained model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
@ -72,6 +72,7 @@
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../TTS/index.html">TTS</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Finetune/index.html">Fine-tune a pre-trained model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
@ -72,6 +72,7 @@
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../TTS/index.html">TTS</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Finetune/index.html">Fine-tune a pre-trained model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
@ -68,6 +68,7 @@
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../TTS/index.html">TTS</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Finetune/index.html">Fine-tune a pre-trained model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
@ -68,6 +68,7 @@
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../TTS/index.html">TTS</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Finetune/index.html">Fine-tune a pre-trained model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
@ -68,6 +68,7 @@
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../TTS/index.html">TTS</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Finetune/index.html">Fine-tune a pre-trained model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
@ -67,6 +67,7 @@
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../TTS/index.html">TTS</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Finetune/index.html">Fine-tune a pre-trained model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
@ -67,6 +67,7 @@
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../TTS/index.html">TTS</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Finetune/index.html">Fine-tune a pre-trained model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
@ -61,6 +61,7 @@
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../TTS/index.html">TTS</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../Finetune/index.html">Fine-tune a pre-trained model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
@ -61,6 +61,7 @@
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../TTS/index.html">TTS</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Finetune/index.html">Fine-tune a pre-trained model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
@ -62,6 +62,7 @@
|
||||
</li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../RNN-LM/index.html">RNN-LM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../TTS/index.html">TTS</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../Finetune/index.html">Fine-tune a pre-trained model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
@ -66,6 +66,7 @@
|
||||
</li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../RNN-LM/index.html">RNN-LM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../TTS/index.html">TTS</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../Finetune/index.html">Fine-tune a pre-trained model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
@ -67,6 +67,7 @@
|
||||
</li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../TTS/index.html">TTS</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Finetune/index.html">Fine-tune a pre-trained model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
@ -67,6 +67,7 @@
|
||||
</li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../TTS/index.html">TTS</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Finetune/index.html">Fine-tune a pre-trained model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
@ -67,6 +67,7 @@
|
||||
</li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../TTS/index.html">TTS</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Finetune/index.html">Fine-tune a pre-trained model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
@ -67,6 +67,7 @@
|
||||
</li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../TTS/index.html">TTS</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Finetune/index.html">Fine-tune a pre-trained model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
@ -62,6 +62,7 @@
|
||||
<li class="toctree-l3"><a class="reference internal" href="vctk/vits.html">VITS</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../Finetune/index.html">Fine-tune a pre-trained model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
@ -70,6 +70,7 @@
|
||||
<li class="toctree-l3"><a class="reference internal" href="../vctk/vits.html">VITS</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Finetune/index.html">Fine-tune a pre-trained model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
@ -21,7 +21,7 @@
|
||||
<script src="../../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../../search.html" />
|
||||
<link rel="next" title="Contributing" href="../../../contributing/index.html" />
|
||||
<link rel="next" title="Fine-tune a pre-trained model" href="../../Finetune/index.html" />
|
||||
<link rel="prev" title="VITS" href="../ljspeech/vits.html" />
|
||||
</head>
|
||||
|
||||
@ -70,6 +70,7 @@
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../Finetune/index.html">Fine-tune a pre-trained model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
@ -219,7 +220,7 @@ by visiting the following link:</p>
|
||||
</div>
|
||||
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
|
||||
<a href="../ljspeech/vits.html" class="btn btn-neutral float-left" title="VITS" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
|
||||
<a href="../../../contributing/index.html" class="btn btn-neutral float-right" title="Contributing" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
|
||||
<a href="../../Finetune/index.html" class="btn btn-neutral float-right" title="Fine-tune a pre-trained model" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
|
||||
</div>
|
||||
|
||||
<hr/>
|
||||
|
@ -58,6 +58,7 @@
|
||||
<li class="toctree-l2"><a class="reference internal" href="Streaming-ASR/index.html">Streaming ASR</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="RNN-LM/index.html">RNN-LM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="TTS/index.html">TTS</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="Finetune/index.html">Fine-tune a pre-trained model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
@ -122,6 +123,10 @@ Currently, we provide recipes for speech recognition, language model, and speech
|
||||
<li class="toctree-l2"><a class="reference internal" href="TTS/vctk/vits.html">VITS</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="Finetune/index.html">Fine-tune a pre-trained model</a><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="Finetune/from_supervised/finetune_zipformer.html">Finetune from a supervised pre-trained Zipformer model</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</div>
|
||||
</section>
|
||||
|
File diff suppressed because one or more lines are too long
Loading…
x
Reference in New Issue
Block a user