mirror of
https://github.com/k2-fsa/icefall.git
synced 2025-12-11 06:55:27 +00:00
deploy: 2fd970b6821d47dacb2e6513321520db21fff67b
This commit is contained in:
parent 67a922737c
commit 9289dab4d6
@@ -140,22 +140,22 @@ it should succeed this time:</p>
<p>If you want to check the style of your code before <code class="docutils literal notranslate"><span class="pre">git</span> <span class="pre">commit</span></code>, you
can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ pre-commit install
$ pre-commit run
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>pre-commit<span class="w"> </span>install
$<span class="w"> </span>pre-commit<span class="w"> </span>run
</pre></div>
</div>
</div></blockquote>
<p>Or without installing the pre-commit hooks:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> icefall
$ pip install <span class="nv">black</span><span class="o">==</span><span class="m">22</span>.3.0 <span class="nv">flake8</span><span class="o">==</span><span class="m">5</span>.0.4 <span class="nv">isort</span><span class="o">==</span><span class="m">5</span>.10.1
$ black --check your_changed_file.py
$ black your_changed_file.py <span class="c1"># modify it in-place</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>icefall
$<span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span><span class="nv">black</span><span class="o">==</span><span class="m">22</span>.3.0<span class="w"> </span><span class="nv">flake8</span><span class="o">==</span><span class="m">5</span>.0.4<span class="w"> </span><span class="nv">isort</span><span class="o">==</span><span class="m">5</span>.10.1
$<span class="w"> </span>black<span class="w"> </span>--check<span class="w"> </span>your_changed_file.py
$<span class="w"> </span>black<span class="w"> </span>your_changed_file.py<span class="w"> </span><span class="c1"># modify it in-place</span>
$
$ flake8 your_changed_file.py
$<span class="w"> </span>flake8<span class="w"> </span>your_changed_file.py
$
$ isort --check your_changed_file.py <span class="c1"># modify it in-place</span>
$ isort your_changed_file.py
$<span class="w"> </span>isort<span class="w"> </span>--check<span class="w"> </span>your_changed_file.py<span class="w"> </span><span class="c1"># modify it in-place</span>
$<span class="w"> </span>isort<span class="w"> </span>your_changed_file.py
</pre></div>
</div>
</div></blockquote>
@@ -88,8 +88,8 @@
for documentation.</p>
<p>Before writing documentation, you have to prepare the environment:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> docs
$ pip install -r requirements.txt
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>docs
$<span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span>-r<span class="w"> </span>requirements.txt
</pre></div>
</div>
</div></blockquote>
@@ -99,16 +99,16 @@ if you are not familiar with <code class="docutils literal notranslate"><span cl
<p>After writing some documentation, you can build the documentation <strong>locally</strong>
to preview what it will look like once published:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> docs
$ make html
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>docs
$<span class="w"> </span>make<span class="w"> </span>html
</pre></div>
</div>
</div></blockquote>
<p>The generated documentation is in <code class="docutils literal notranslate"><span class="pre">docs/build/html</span></code> and can be viewed
with the following commands:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> docs/build/html
$ python3 -m http.server
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>docs/build/html
$<span class="w"> </span>python3<span class="w"> </span>-m<span class="w"> </span>http.server
</pre></div>
</div>
</div></blockquote>
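The `python3 -m http.server` command above is a thin wrapper over the stdlib `http.server` module. If you prefer not to `cd` first, a minimal sketch that pins the served directory explicitly (assuming the build output is in `docs/build/html` as above):

```python
# Roughly what `python3 -m http.server` does, with the served directory
# given explicitly instead of using the current working directory.
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler

handler = partial(SimpleHTTPRequestHandler, directory="docs/build/html")
# HTTPServer(("", 8000), handler).serve_forever()  # browse http://localhost:8000
```

The `directory` parameter exists since Python 3.7; on older versions, `cd` into the folder first as shown above.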
@@ -140,12 +140,12 @@ $ touch README.md model.py train.py decode.py asr_datamodule.py pretrained.py
<p>For instance, the <code class="docutils literal notranslate"><span class="pre">yesno</span></code> recipe has a <code class="docutils literal notranslate"><span class="pre">tdnn</span></code> model and its directory structure
looks like the following:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>egs/yesno/ASR/tdnn/
<span class="p">|</span>-- README.md
<span class="p">|</span>-- asr_datamodule.py
<span class="p">|</span>-- decode.py
<span class="p">|</span>-- model.py
<span class="p">|</span>-- pretrained.py
<span class="sb">`</span>-- train.py
<span class="p">|</span>--<span class="w"> </span>README.md
<span class="p">|</span>--<span class="w"> </span>asr_datamodule.py
<span class="p">|</span>--<span class="w"> </span>decode.py
<span class="p">|</span>--<span class="w"> </span>model.py
<span class="p">|</span>--<span class="w"> </span>pretrained.py
<span class="sb">`</span>--<span class="w"> </span>train.py
</pre></div>
</div>
<p><strong>File description</strong>:</p>
@@ -166,11 +166,11 @@ to install <code class="docutils literal notranslate"><span class="pre">lhotse</
and set the environment variable <code class="docutils literal notranslate"><span class="pre">PYTHONPATH</span></code> to point to it.</p>
<p>Assume you want to place <code class="docutils literal notranslate"><span class="pre">icefall</span></code> in the folder <code class="docutils literal notranslate"><span class="pre">/tmp</span></code>. The
following commands show you how to set up <code class="docutils literal notranslate"><span class="pre">icefall</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> /tmp
git clone https://github.com/k2-fsa/icefall
<span class="nb">cd</span> icefall
pip install -r requirements.txt
<span class="nb">export</span> <span class="nv">PYTHONPATH</span><span class="o">=</span>/tmp/icefall:<span class="nv">$PYTHONPATH</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>/tmp
git<span class="w"> </span>clone<span class="w"> </span>https://github.com/k2-fsa/icefall
<span class="nb">cd</span><span class="w"> </span>icefall
pip<span class="w"> </span>install<span class="w"> </span>-r<span class="w"> </span>requirements.txt
<span class="nb">export</span><span class="w"> </span><span class="nv">PYTHONPATH</span><span class="o">=</span>/tmp/icefall:<span class="nv">$PYTHONPATH</span>
</pre></div>
</div>
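As a quick sanity check of the setup above, you can inspect the environment variable from Python. This is a minimal sketch (it only looks at `PYTHONPATH`, assuming the `/tmp/icefall` location from the example):

```python
# PYTHONPATH entries are prepended to sys.path when a new interpreter
# starts; that is how `import icefall` is resolved without `pip install`.
import os

os.environ["PYTHONPATH"] = "/tmp/icefall"  # as exported above
entries = os.environ["PYTHONPATH"].split(os.pathsep)
print("/tmp/icefall" in entries)  # True
```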
<div class="admonition hint">
@@ -185,39 +185,39 @@ to point to the version you want.</p>
<p>The following shows an example of setting up the environment.</p>
<section id="create-a-virtual-environment">
<h3>(1) Create a virtual environment<a class="headerlink" href="#create-a-virtual-environment" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ virtualenv -p python3.8 test-icefall
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>virtualenv<span class="w"> </span>-p<span class="w"> </span>python3.8<span class="w"> </span>test-icefall
created virtual environment CPython3.8.6.final.0-64 <span class="k">in</span> 1540ms
creator CPython3Posix<span class="o">(</span><span class="nv">dest</span><span class="o">=</span>/ceph-fj/fangjun/test-icefall, <span class="nv">clear</span><span class="o">=</span>False, <span class="nv">no_vcs_ignore</span><span class="o">=</span>False, <span class="nv">global</span><span class="o">=</span>False<span class="o">)</span>
seeder FromAppData<span class="o">(</span><span class="nv">download</span><span class="o">=</span>False, <span class="nv">pip</span><span class="o">=</span>bundle, <span class="nv">setuptools</span><span class="o">=</span>bundle, <span class="nv">wheel</span><span class="o">=</span>bundle, <span class="nv">via</span><span class="o">=</span>copy, <span class="nv">app_data_dir</span><span class="o">=</span>/root/fangjun/.local/share/v
created<span class="w"> </span>virtual<span class="w"> </span>environment<span class="w"> </span>CPython3.8.6.final.0-64<span class="w"> </span><span class="k">in</span><span class="w"> </span>1540ms
<span class="w"> </span>creator<span class="w"> </span>CPython3Posix<span class="o">(</span><span class="nv">dest</span><span class="o">=</span>/ceph-fj/fangjun/test-icefall,<span class="w"> </span><span class="nv">clear</span><span class="o">=</span>False,<span class="w"> </span><span class="nv">no_vcs_ignore</span><span class="o">=</span>False,<span class="w"> </span><span class="nv">global</span><span class="o">=</span>False<span class="o">)</span>
<span class="w"> </span>seeder<span class="w"> </span>FromAppData<span class="o">(</span><span class="nv">download</span><span class="o">=</span>False,<span class="w"> </span><span class="nv">pip</span><span class="o">=</span>bundle,<span class="w"> </span><span class="nv">setuptools</span><span class="o">=</span>bundle,<span class="w"> </span><span class="nv">wheel</span><span class="o">=</span>bundle,<span class="w"> </span><span class="nv">via</span><span class="o">=</span>copy,<span class="w"> </span><span class="nv">app_data_dir</span><span class="o">=</span>/root/fangjun/.local/share/v
irtualenv<span class="o">)</span>
added seed packages: <span class="nv">pip</span><span class="o">==</span><span class="m">21</span>.1.3, <span class="nv">setuptools</span><span class="o">==</span><span class="m">57</span>.4.0, <span class="nv">wheel</span><span class="o">==</span><span class="m">0</span>.36.2
activators BashActivator,CShellActivator,FishActivator,PowerShellActivator,PythonActivator,XonshActivator
<span class="w"> </span>added<span class="w"> </span>seed<span class="w"> </span>packages:<span class="w"> </span><span class="nv">pip</span><span class="o">==</span><span class="m">21</span>.1.3,<span class="w"> </span><span class="nv">setuptools</span><span class="o">==</span><span class="m">57</span>.4.0,<span class="w"> </span><span class="nv">wheel</span><span class="o">==</span><span class="m">0</span>.36.2
<span class="w"> </span>activators<span class="w"> </span>BashActivator,CShellActivator,FishActivator,PowerShellActivator,PythonActivator,XonshActivator
</pre></div>
</div>
</section>
<section id="activate-your-virtual-environment">
<h3>(2) Activate your virtual environment<a class="headerlink" href="#activate-your-virtual-environment" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">source</span> test-icefall/bin/activate
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">source</span><span class="w"> </span>test-icefall/bin/activate
</pre></div>
</div>
</section>
<section id="id1">
<h3>(3) Install k2<a class="headerlink" href="#id1" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ pip install <span class="nv">k2</span><span class="o">==</span><span class="m">1</span>.4.dev20210822+cpu.torch1.9.0 -f https://k2-fsa.org/nightly/index.html
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span><span class="nv">k2</span><span class="o">==</span><span class="m">1</span>.4.dev20210822+cpu.torch1.9.0<span class="w"> </span>-f<span class="w"> </span>https://k2-fsa.org/nightly/index.html
Looking <span class="k">in</span> links: https://k2-fsa.org/nightly/index.html
Collecting <span class="nv">k2</span><span class="o">==</span><span class="m">1</span>.4.dev20210822+cpu.torch1.9.0
Downloading https://k2-fsa.org/nightly/whl/k2-1.4.dev20210822%2Bcpu.torch1.9.0-cp38-cp38-linux_x86_64.whl <span class="o">(</span><span class="m">1</span>.6 MB<span class="o">)</span>
<span class="p">|</span>________________________________<span class="p">|</span> <span class="m">1</span>.6 MB <span class="m">185</span> kB/s
Collecting graphviz
Downloading graphviz-0.17-py3-none-any.whl <span class="o">(</span><span class="m">18</span> kB<span class="o">)</span>
Collecting <span class="nv">torch</span><span class="o">==</span><span class="m">1</span>.9.0
Using cached torch-1.9.0-cp38-cp38-manylinux1_x86_64.whl <span class="o">(</span><span class="m">831</span>.4 MB<span class="o">)</span>
Collecting typing-extensions
Using cached typing_extensions-3.10.0.0-py3-none-any.whl <span class="o">(</span><span class="m">26</span> kB<span class="o">)</span>
Installing collected packages: typing-extensions, torch, graphviz, k2
Successfully installed graphviz-0.17 k2-1.4.dev20210822+cpu.torch1.9.0 torch-1.9.0 typing-extensions-3.10.0.0
Looking<span class="w"> </span><span class="k">in</span><span class="w"> </span>links:<span class="w"> </span>https://k2-fsa.org/nightly/index.html
Collecting<span class="w"> </span><span class="nv">k2</span><span class="o">==</span><span class="m">1</span>.4.dev20210822+cpu.torch1.9.0
<span class="w"> </span>Downloading<span class="w"> </span>https://k2-fsa.org/nightly/whl/k2-1.4.dev20210822%2Bcpu.torch1.9.0-cp38-cp38-linux_x86_64.whl<span class="w"> </span><span class="o">(</span><span class="m">1</span>.6<span class="w"> </span>MB<span class="o">)</span>
<span class="w"> </span><span class="p">|</span>________________________________<span class="p">|</span><span class="w"> </span><span class="m">1</span>.6<span class="w"> </span>MB<span class="w"> </span><span class="m">185</span><span class="w"> </span>kB/s
Collecting<span class="w"> </span>graphviz
<span class="w"> </span>Downloading<span class="w"> </span>graphviz-0.17-py3-none-any.whl<span class="w"> </span><span class="o">(</span><span class="m">18</span><span class="w"> </span>kB<span class="o">)</span>
Collecting<span class="w"> </span><span class="nv">torch</span><span class="o">==</span><span class="m">1</span>.9.0
<span class="w"> </span>Using<span class="w"> </span>cached<span class="w"> </span>torch-1.9.0-cp38-cp38-manylinux1_x86_64.whl<span class="w"> </span><span class="o">(</span><span class="m">831</span>.4<span class="w"> </span>MB<span class="o">)</span>
Collecting<span class="w"> </span>typing-extensions
<span class="w"> </span>Using<span class="w"> </span>cached<span class="w"> </span>typing_extensions-3.10.0.0-py3-none-any.whl<span class="w"> </span><span class="o">(</span><span class="m">26</span><span class="w"> </span>kB<span class="o">)</span>
Installing<span class="w"> </span>collected<span class="w"> </span>packages:<span class="w"> </span>typing-extensions,<span class="w"> </span>torch,<span class="w"> </span>graphviz,<span class="w"> </span>k2
Successfully<span class="w"> </span>installed<span class="w"> </span>graphviz-0.17<span class="w"> </span>k2-1.4.dev20210822+cpu.torch1.9.0<span class="w"> </span>torch-1.9.0<span class="w"> </span>typing-extensions-3.10.0.0
</pre></div>
</div>
<div class="admonition warning">
@@ -393,10 +393,10 @@ the <a class="reference external" href="https://github.com/k2-fsa/icefall/tree/m
on CPU.</p>
<section id="data-preparation">
<h3>Data preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">export</span> <span class="nv">PYTHONPATH</span><span class="o">=</span>/tmp/icefall:<span class="nv">$PYTHONPATH</span>
$ <span class="nb">cd</span> /tmp/icefall
$ <span class="nb">cd</span> egs/yesno/ASR
$ ./prepare.sh
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">PYTHONPATH</span><span class="o">=</span>/tmp/icefall:<span class="nv">$PYTHONPATH</span>
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>/tmp/icefall
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/yesno/ASR
$<span class="w"> </span>./prepare.sh
</pre></div>
</div>
<p>The log of running <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> is:</p>
@@ -457,7 +457,7 @@ even if there are GPUs available.</p>
<p class="admonition-title">Hint</p>
<p>In case you get a <code class="docutils literal notranslate"><span class="pre">Segmentation</span> <span class="pre">fault</span> <span class="pre">(core</span> <span class="pre">dump)</span></code> error, please use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span> <span class="nv">PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION</span><span class="o">=</span>python
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span><span class="w"> </span><span class="nv">PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION</span><span class="o">=</span>python
</pre></div>
</div>
</div></blockquote>
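The variable must be set before protobuf is first imported. If you drive training from a Python script rather than the shell, the equivalent of the `export` above is (a sketch; place it before any imports that pull in protobuf):

```python
# Equivalent of `export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python`.
# It selects protobuf's pure-Python backend and must run before the
# protobuf package is first imported.
import os

os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"
print(os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"])  # python
```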
@@ -123,13 +123,13 @@ as an example.</p>
<p class="admonition-title">Note</p>
<p>The steps for other recipes are almost the same.</p>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> egs/librispeech/ASR
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
./pruned_transducer_stateless3/export.py <span class="se">\</span>
--exp-dir ./pruned_transducer_stateless3/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--epoch <span class="m">20</span> <span class="se">\</span>
--avg <span class="m">10</span>
./pruned_transducer_stateless3/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./pruned_transducer_stateless3/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">20</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="m">10</span>
</pre></div>
</div>
<p>will generate a file <code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless3/exp/pretrained.pt</span></code>, which
@@ -141,10 +141,10 @@ is a dict containing <code class="docutils literal notranslate"><span class="pre
You can find links to pretrained models in <code class="docutils literal notranslate"><span class="pre">RESULTS.md</span></code> of each dataset.</p>
<p>In the following, we demonstrate how to use the pretrained model from
<a class="reference external" href="https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13">https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13</a>.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> egs/librispeech/ASR
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
git lfs install
git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13
git<span class="w"> </span>lfs<span class="w"> </span>install
git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13
</pre></div>
</div>
<p>After cloning the repo with <code class="docutils literal notranslate"><span class="pre">git</span> <span class="pre">lfs</span></code>, you will find several files in the folder
@@ -153,15 +153,15 @@ that have a prefix <code class="docutils literal notranslate"><span class="pre">
exported by the above <code class="docutils literal notranslate"><span class="pre">export.py</span></code>.</p>
<p>In each recipe, there is also a file <code class="docutils literal notranslate"><span class="pre">pretrained.py</span></code>, which can use
<code class="docutils literal notranslate"><span class="pre">pretrained-xxx.pt</span></code> to decode waves. The following is an example:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> egs/librispeech/ASR
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
./pruned_transducer_stateless3/pretrained.py <span class="se">\</span>
--checkpoint ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp/pretrained-iter-1224000-avg-14.pt <span class="se">\</span>
--bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/data/lang_bpe_500/bpe.model <span class="se">\</span>
--method greedy_search <span class="se">\</span>
./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1089-134686-0001.wav <span class="se">\</span>
./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0001.wav <span class="se">\</span>
./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0002.wav
./pruned_transducer_stateless3/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp/pretrained-iter-1224000-avg-14.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>greedy_search<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1089-134686-0001.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0001.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0002.wav
</pre></div>
</div>
<p>The above commands show how to use the exported model with <code class="docutils literal notranslate"><span class="pre">pretrained.py</span></code> to
@@ -195,25 +195,25 @@ decode multiple sound files. Its output is given as follows for reference:</p>
<p>When we publish the model, we always note down its WERs on some test
dataset in <code class="docutils literal notranslate"><span class="pre">RESULTS.md</span></code>. This section describes how to use the
pretrained model to reproduce the WER.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> egs/librispeech/ASR
git lfs install
git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
git<span class="w"> </span>lfs<span class="w"> </span>install
git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13
<span class="nb">cd</span> icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp
ln -s pretrained-iter-1224000-avg-14.pt epoch-9999.pt
<span class="nb">cd</span> ../..
<span class="nb">cd</span><span class="w"> </span>icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp
ln<span class="w"> </span>-s<span class="w"> </span>pretrained-iter-1224000-avg-14.pt<span class="w"> </span>epoch-9999.pt
<span class="nb">cd</span><span class="w"> </span>../..
</pre></div>
</div>
<p>We create a symlink with name <code class="docutils literal notranslate"><span class="pre">epoch-9999.pt</span></code> to <code class="docutils literal notranslate"><span class="pre">pretrained-iter-1224000-avg-14.pt</span></code>,
so that we can pass <code class="docutils literal notranslate"><span class="pre">--epoch</span> <span class="pre">9999</span> <span class="pre">--avg</span> <span class="pre">1</span></code> to <code class="docutils literal notranslate"><span class="pre">decode.py</span></code> in the following
command:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless3/decode.py <span class="se">\</span>
--epoch <span class="m">9999</span> <span class="se">\</span>
--avg <span class="m">1</span> <span class="se">\</span>
--exp-dir ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp <span class="se">\</span>
--lang-dir ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/data/lang_bpe_500 <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--decoding-method greedy_search
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless3/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">9999</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--lang-dir<span class="w"> </span>./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/data/lang_bpe_500<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoding-method<span class="w"> </span>greedy_search
</pre></div>
</div>
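The symlink trick works because `--epoch N --avg M` selects the checkpoints `epoch-(N-M+1).pt` through `epoch-N.pt` and averages them, so `--epoch 9999 --avg 1` reads only `epoch-9999.pt`, i.e. our symlink. A conceptual sketch of that filename selection (not icefall's actual implementation):

```python
# Sketch of the --epoch/--avg checkpoint-selection convention:
# average the last `avg` epoch checkpoints ending at `epoch`.
def checkpoint_filenames(epoch, avg):
    return [f"epoch-{e}.pt" for e in range(epoch - avg + 1, epoch + 1)]

print(checkpoint_filenames(9999, 1))  # ['epoch-9999.pt']
```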
<p>You will find the decoding results in
@@ -106,16 +106,16 @@ to run the pretrained model.</p>
<p>We use
<a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless3">https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless3</a>
as an example in the following.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> egs/librispeech/ASR
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
<span class="nv">epoch</span><span class="o">=</span><span class="m">14</span>
<span class="nv">avg</span><span class="o">=</span><span class="m">2</span>
./pruned_transducer_stateless3/export.py <span class="se">\</span>
--exp-dir ./pruned_transducer_stateless3/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--epoch <span class="nv">$epoch</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--onnx <span class="m">1</span>
./pruned_transducer_stateless3/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./pruned_transducer_stateless3/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="nv">$epoch</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--onnx<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
<p>It will generate the following files inside <code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless3/exp</span></code>:</p>
@@ -130,16 +130,16 @@ as an example in the following.</p>
</div></blockquote>
<p>You can use <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless3/exp/onnx_pretrained.py</span></code> to decode
waves with the generated files:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless3/onnx_pretrained.py <span class="se">\</span>
--bpe-model ./data/lang_bpe_500/bpe.model <span class="se">\</span>
--encoder-model-filename ./pruned_transducer_stateless3/exp/encoder.onnx <span class="se">\</span>
--decoder-model-filename ./pruned_transducer_stateless3/exp/decoder.onnx <span class="se">\</span>
|
||||
--joiner-model-filename ./pruned_transducer_stateless3/exp/joiner.onnx <span class="se">\</span>
|
||||
--joiner-encoder-proj-model-filename ./pruned_transducer_stateless3/exp/joiner_encoder_proj.onnx <span class="se">\</span>
|
||||
--joiner-decoder-proj-model-filename ./pruned_transducer_stateless3/exp/joiner_decoder_proj.onnx <span class="se">\</span>
|
||||
/path/to/foo.wav <span class="se">\</span>
|
||||
/path/to/bar.wav <span class="se">\</span>
|
||||
/path/to/baz.wav
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless3/onnx_pretrained.py<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--bpe-model<span class="w"> </span>./data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--encoder-model-filename<span class="w"> </span>./pruned_transducer_stateless3/exp/encoder.onnx<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--decoder-model-filename<span class="w"> </span>./pruned_transducer_stateless3/exp/decoder.onnx<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--joiner-model-filename<span class="w"> </span>./pruned_transducer_stateless3/exp/joiner.onnx<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--joiner-encoder-proj-model-filename<span class="w"> </span>./pruned_transducer_stateless3/exp/joiner_encoder_proj.onnx<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--joiner-decoder-proj-model-filename<span class="w"> </span>./pruned_transducer_stateless3/exp/joiner_decoder_proj.onnx<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>/path/to/foo.wav<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>/path/to/bar.wav<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>/path/to/baz.wav
|
||||
</pre></div>
|
||||
</div>
|
||||
</section>

@@ -108,16 +108,16 @@ if you want to use <code class="docutils literal notranslate"><span class="pre">
<p>We use
<a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless3">https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless3</a>
as an example in the following.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> egs/librispeech/ASR
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
<span class="nv">epoch</span><span class="o">=</span><span class="m">14</span>
<span class="nv">avg</span><span class="o">=</span><span class="m">1</span>

./pruned_transducer_stateless3/export.py <span class="se">\</span>
--exp-dir ./pruned_transducer_stateless3/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--epoch <span class="nv">$epoch</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--jit <span class="m">1</span>
./pruned_transducer_stateless3/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./pruned_transducer_stateless3/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="nv">$epoch</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--jit<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
<p>It will generate a file <code class="docutils literal notranslate"><span class="pre">cpu_jit.pt</span></code> in <code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless3/exp</span></code>.</p>

@@ -111,14 +111,14 @@ as an example in the following.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nv">iter</span><span class="o">=</span><span class="m">468000</span>
<span class="nv">avg</span><span class="o">=</span><span class="m">16</span>

<span class="nb">cd</span> egs/librispeech/ASR
<span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR

./lstm_transducer_stateless2/export.py <span class="se">\</span>
--exp-dir ./lstm_transducer_stateless2/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--iter <span class="nv">$iter</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--jit-trace <span class="m">1</span>
./lstm_transducer_stateless2/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./lstm_transducer_stateless2/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--iter<span class="w"> </span><span class="nv">$iter</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--jit-trace<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
<p>It will generate three files inside <code class="docutils literal notranslate"><span class="pre">lstm_transducer_stateless2/exp</span></code>:</p>
@@ -132,15 +132,15 @@ as an example in the following.</p>
<p>You can use
<a class="reference external" href="https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/lstm_transducer_stateless2/jit_pretrained.py">https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/lstm_transducer_stateless2/jit_pretrained.py</a>
to decode sound files with the following commands:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> egs/librispeech/ASR
./lstm_transducer_stateless2/jit_pretrained.py <span class="se">\</span>
--bpe-model ./data/lang_bpe_500/bpe.model <span class="se">\</span>
--encoder-model-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace.pt <span class="se">\</span>
--decoder-model-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace.pt <span class="se">\</span>
--joiner-model-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace.pt <span class="se">\</span>
/path/to/foo.wav <span class="se">\</span>
/path/to/bar.wav <span class="se">\</span>
/path/to/baz.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
./lstm_transducer_stateless2/jit_pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>./data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--encoder-model-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/encoder_jit_trace.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoder-model-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/decoder_jit_trace.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--joiner-model-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/joiner_jit_trace.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/foo.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/bar.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/baz.wav
</pre></div>
</div>
</section>
@@ -130,8 +130,8 @@ the environment for <code class="docutils literal notranslate"><span class="pre"
</div></blockquote>
<section id="data-preparation">
<h2>Data preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading"></a></h2>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./prepare.sh
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./prepare.sh
</pre></div>
</div>
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
@@ -146,13 +146,13 @@ options:</p>
</div></blockquote>
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
<p>For example,</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">0</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>means to run only stage 0.</p>
<p>To run stage 2 to stage 5, use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage <span class="m">2</span> --stop-stage <span class="m">5</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">2</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">5</span>
</pre></div>
</div>
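The `--stage`/`--stop-stage` options rely on simple numeric guards around each stage. A minimal sketch of that pattern (the stage numbers and descriptions below are illustrative, not the actual stages of `./prepare.sh`):

```shell
# Illustrative sketch of the --stage/--stop-stage guard pattern;
# stage numbers and descriptions here are made up for demonstration.
stage=2
stop_stage=5

run_stage() {
  # Run only when: stage <= n <= stop_stage
  if [ "$stage" -le "$1" ] && [ "$1" -le "$stop_stage" ]; then
    echo "Running stage $1: $2"
  fi
}

run_stage 0 "download data"
run_stage 3 "compute fbank features"
run_stage 5 "prepare lang dir"
```

With `stage=2` and `stop_stage=5`, stage 0 is skipped and only stages 3 and 5 run, which is exactly how `--stage 2 --stop-stage 5` selects a sub-range of the pipeline.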
<div class="admonition hint">
@@ -167,8 +167,8 @@ the <code class="docutils literal notranslate"><span class="pre">dl_dir</span></
<p class="admonition-title">Hint</p>
<p>A 3-gram language model will be downloaded from huggingface; we assume you have
installed and initialized <code class="docutils literal notranslate"><span class="pre">git-lfs</span></code>. If not, you can install <code class="docutils literal notranslate"><span class="pre">git-lfs</span></code> by</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ sudo apt-get install git-lfs
$ git-lfs install
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>sudo<span class="w"> </span>apt-get<span class="w"> </span>install<span class="w"> </span>git-lfs
$<span class="w"> </span>git-lfs<span class="w"> </span>install
</pre></div>
</div>
<p>If you don’t have the <code class="docutils literal notranslate"><span class="pre">sudo</span></code> permission, you could download the
@@ -184,8 +184,8 @@ are saved in <code class="docutils literal notranslate"><span class="pre">./data
<h2>Training<a class="headerlink" href="#training" title="Permalink to this heading"></a></h2>
<section id="configurable-options">
<h3>Configurable options<a class="headerlink" href="#configurable-options" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./conformer_ctc/train.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows you the training options that can be passed from the command line.
@@ -227,26 +227,26 @@ training from epoch 10, based on the state from epoch 9.</p>
<div><p><strong>Use case 1</strong>: You have 4 GPUs, but you only want to use GPU 0 and
GPU 2 for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,2"</span>
$ ./conformer_ctc/train.py --world-size <span class="m">2</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,2"</span>
$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">2</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 2</strong>: You have 4 GPUs and you want to use all of them
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./conformer_ctc/train.py --world-size <span class="m">4</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 3</strong>: You have 4 GPUs but you only want to use GPU 3
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"3"</span>
$ ./conformer_ctc/train.py --world-size <span class="m">1</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"3"</span>
$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
</div></blockquote>
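The reason `--world-size` is 2 (not 3) in use case 1 is that `CUDA_VISIBLE_DEVICES` renumbers the visible GPUs from 0: with `CUDA_VISIBLE_DEVICES="0,2"`, the script sees two logical devices, 0 and 1. A small sketch of the rule (the comma-counting trick here is illustrative, not part of icefall):

```shell
# CUDA_VISIBLE_DEVICES="0,2" exposes physical GPUs 0 and 2 as
# logical devices 0 and 1; --world-size must equal their count.
export CUDA_VISIBLE_DEVICES="0,2"

# Count the visible devices by splitting the list on commas.
ngpu=$(echo "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | wc -l | tr -d ' ')
echo "pass --world-size $ngpu"
```

In short, `--world-size` always matches the number of entries in `CUDA_VISIBLE_DEVICES`, never the highest physical GPU index.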
@@ -299,7 +299,7 @@ Each epoch actually processes <code class="docutils literal notranslate"><span c
<p>These are checkpoint files, containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./conformer_ctc/train.py --start-epoch <span class="m">11</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">11</span>
</pre></div>
</div>
</div></blockquote>
@@ -308,8 +308,8 @@ To resume training from some checkpoint, say <code class="docutils literal notra
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc., are recorded in these logs. You can visualize them by:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> conformer_ctc/exp/tensorboard
$ tensorboard dev upload --logdir . --name <span class="s2">"Aishell conformer ctc training with icefall"</span> --description <span class="s2">"Training with new LabelSmoothing loss, see https://github.com/k2-fsa/icefall/pull/109"</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>conformer_ctc/exp/tensorboard
$<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w"> </span>upload<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--name<span class="w"> </span><span class="s2">"Aishell conformer ctc training with icefall"</span><span class="w"> </span>--description<span class="w"> </span><span class="s2">"Training with new LabelSmoothing loss, see https://github.com/k2-fsa/icefall/pull/109"</span>
</pre></div>
</div>
</div></blockquote>
@@ -351,25 +351,25 @@ you saw printed to the console during training.</p>
<p>The following shows typical use cases:</p>
<section id="case-1">
<h4><strong>Case 1</strong><a class="headerlink" href="#case-1" title="Permalink to this heading"></a></h4>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./conformer_ctc/train.py --max-duration <span class="m">200</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">200</span>
</pre></div>
</div>
<p>It uses <code class="docutils literal notranslate"><span class="pre">--max-duration</span></code> of 200 to avoid OOM.</p>
</section>
<section id="case-2">
<h4><strong>Case 2</strong><a class="headerlink" href="#case-2" title="Permalink to this heading"></a></h4>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,3"</span>
$ ./conformer_ctc/train.py --world-size <span class="m">2</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,3"</span>
$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">2</span>
</pre></div>
</div>
<p>It uses GPU 0 and GPU 3 for DDP training.</p>
</section>
<section id="case-3">
<h4><strong>Case 3</strong><a class="headerlink" href="#case-3" title="Permalink to this heading"></a></h4>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./conformer_ctc/train.py --num-epochs <span class="m">10</span> --start-epoch <span class="m">3</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--num-epochs<span class="w"> </span><span class="m">10</span><span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">3</span>
</pre></div>
</div>
<p>It loads checkpoint <code class="docutils literal notranslate"><span class="pre">./conformer_ctc/exp/epoch-2.pt</span></code> and starts
@@ -381,8 +381,8 @@ training from epoch 3. Also, it trains for 10 epochs.</p>
<h2>Decoding<a class="headerlink" href="#decoding" title="Permalink to this heading"></a></h2>
<p>The decoding part uses checkpoints saved by the training part, so you have
to run the training part first.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./conformer_ctc/decode.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./conformer_ctc/decode.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the options for decoding.</p>
@@ -440,27 +440,27 @@ $ git clone https://huggingface.co/pkufool/icefall_asr_aishell_conformer_ctc
<p>In order to use this pre-trained model, your k2 version has to be v1.7 or later.</p>
</div>
<p>After downloading, you will have the following files:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ tree tmp
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>tree<span class="w"> </span>tmp
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>tmp/
<span class="sb">`</span>-- icefall_asr_aishell_conformer_ctc
<span class="p">|</span>-- README.md
<span class="p">|</span>-- data
<span class="p">|</span> <span class="sb">`</span>-- lang_char
<span class="p">|</span> <span class="p">|</span>-- HLG.pt
<span class="p">|</span> <span class="p">|</span>-- tokens.txt
<span class="p">|</span> <span class="sb">`</span>-- words.txt
<span class="p">|</span>-- exp
<span class="p">|</span> <span class="sb">`</span>-- pretrained.pt
<span class="sb">`</span>-- test_waves
<span class="p">|</span>-- BAC009S0764W0121.wav
<span class="p">|</span>-- BAC009S0764W0122.wav
<span class="p">|</span>-- BAC009S0764W0123.wav
<span class="sb">`</span>-- trans.txt
<span class="sb">`</span>--<span class="w"> </span>icefall_asr_aishell_conformer_ctc
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>README.md
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>data
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>lang_char
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>HLG.pt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>tokens.txt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>words.txt
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>exp
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>pretrained.pt
<span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>test_waves
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>BAC009S0764W0121.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>BAC009S0764W0122.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>BAC009S0764W0123.wav
<span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>trans.txt

<span class="m">5</span> directories, <span class="m">9</span> files
<span class="m">5</span><span class="w"> </span>directories,<span class="w"> </span><span class="m">9</span><span class="w"> </span>files
</pre></div>
</div>
<p><strong>File descriptions</strong>:</p>
@@ -502,38 +502,38 @@ Note: We have removed optimizer <code class="docutils literal notranslate"><span
</ul>
</div></blockquote>
<p>The information of the test sound files is listed below:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ soxi tmp/icefall_asr_aishell_conformer_ctc/test_waves/*.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>soxi<span class="w"> </span>tmp/icefall_asr_aishell_conformer_ctc/test_waves/*.wav

Input File : <span class="s1">'tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0121.wav'</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">16000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:04.20 <span class="o">=</span> <span class="m">67263</span> samples ~ <span class="m">315</span>.295 CDDA sectors
File Size : 135k
Bit Rate : 256k
Sample Encoding: <span class="m">16</span>-bit Signed Integer PCM
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">'tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0121.wav'</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">16000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:04.20<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">67263</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">315</span>.295<span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>135k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>256k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>Signed<span class="w"> </span>Integer<span class="w"> </span>PCM


Input File : <span class="s1">'tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0122.wav'</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">16000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:04.12 <span class="o">=</span> <span class="m">65840</span> samples ~ <span class="m">308</span>.625 CDDA sectors
File Size : 132k
Bit Rate : 256k
Sample Encoding: <span class="m">16</span>-bit Signed Integer PCM
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">'tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0122.wav'</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">16000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:04.12<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">65840</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">308</span>.625<span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>132k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>256k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>Signed<span class="w"> </span>Integer<span class="w"> </span>PCM


Input File : <span class="s1">'tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0123.wav'</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">16000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:04.00 <span class="o">=</span> <span class="m">64000</span> samples ~ <span class="m">300</span> CDDA sectors
File Size : 128k
Bit Rate : 256k
Sample Encoding: <span class="m">16</span>-bit Signed Integer PCM
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">'tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0123.wav'</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">16000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:04.00<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">64000</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">300</span><span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>128k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>256k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>Signed<span class="w"> </span>Integer<span class="w"> </span>PCM

Total Duration of <span class="m">3</span> files: <span class="m">00</span>:00:12.32
Total<span class="w"> </span>Duration<span class="w"> </span>of<span class="w"> </span><span class="m">3</span><span class="w"> </span>files:<span class="w"> </span><span class="m">00</span>:00:12.32
</pre></div>
</div>
</section>
@ -556,14 +556,14 @@ $ ./conformer_ctc/pretrained.py --help
|
||||
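The soxi figures above are internally consistent for a 16 kHz, 16-bit, mono PCM file; a quick sketch of the arithmetic (the small WAV header is ignored here):

```python
# Check the soxi numbers for a 16 kHz, 16-bit, mono PCM file of 4 seconds.
sample_rate = 16000       # Hz
bits_per_sample = 16
channels = 1
duration_s = 4.0

num_samples = int(sample_rate * duration_s)                 # samples per channel
bit_rate = sample_rate * bits_per_sample * channels         # bits per second
data_bytes = num_samples * channels * bits_per_sample // 8  # raw PCM payload

print(num_samples)  # 64000
print(bit_rate)     # 256000 -> reported as "256k"
print(data_bytes)   # 128000 -> reported as "128k" (plus a small WAV header)
```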
<h4>CTC decoding<a class="headerlink" href="#ctc-decoding" title="Permalink to this heading"></a></h4>
<p>CTC decoding uses only the CTC topology, without a lexicon or a language model.</p>
<p>The command to run CTC decoding is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./conformer_ctc/pretrained.py <span class="se">\</span>
--checkpoint ./tmp/icefall_asr_aishell_conformer_ctc/exp/pretrained.pt <span class="se">\</span>
--tokens-file ./tmp/icefall_asr_aishell_conformer_ctc/data/lang_char/tokens.txt <span class="se">\</span>
--method ctc-decoding <span class="se">\</span>
./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0121.wav <span class="se">\</span>
./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0122.wav <span class="se">\</span>
./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0123.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./conformer_ctc/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--tokens-file<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/data/lang_char/tokens.txt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>ctc-decoding<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0121.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0122.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0123.wav
</pre></div>
</div>
<p>The output is given below:</p>
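CTC decoding as used here is best-path decoding: take the most likely token at every frame, collapse consecutive repeats, and drop blanks. The real implementation runs on lattices with k2; the collapsing rule itself can be sketched in a few lines (the token inventory and frame scores below are made up for illustration):

```python
def ctc_greedy_decode(frames, blank_id=0):
    """Best-path CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    tokens = []
    prev = blank_id
    for frame in frames:
        t = max(range(len(frame)), key=frame.__getitem__)  # per-frame argmax
        if t != blank_id and t != prev:
            tokens.append(t)
        prev = t
    return tokens

# Toy per-frame scores over {0: blank, 1, 2, 3}; real systems use
# log-probabilities from the network, but the argmax is the same either way.
frames = [
    [0.9, 0.05, 0.03, 0.02],  # blank
    [0.1, 0.1, 0.7, 0.1],     # token 2
    [0.1, 0.1, 0.7, 0.1],     # token 2 (repeat, collapsed)
    [0.9, 0.05, 0.03, 0.02],  # blank
    [0.1, 0.1, 0.1, 0.7],     # token 3
    [0.1, 0.1, 0.1, 0.7],     # token 3 (repeat, collapsed)
]
print(ctc_greedy_decode(frames))  # [2, 3]
```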
@@ -593,15 +593,15 @@ $ ./conformer_ctc/pretrained.py <span class="se">\</span>
<h4>HLG decoding<a class="headerlink" href="#hlg-decoding" title="Permalink to this heading"></a></h4>
<p>HLG decoding uses the best path of the decoding lattice as the decoding result.</p>
<p>The command to run HLG decoding is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./conformer_ctc/pretrained.py <span class="se">\</span>
--checkpoint ./tmp/icefall_asr_aishell_conformer_ctc/exp/pretrained.pt <span class="se">\</span>
--words-file ./tmp/icefall_asr_aishell_conformer_ctc/data/lang_char/words.txt <span class="se">\</span>
--HLG ./tmp/icefall_asr_aishell_conformer_ctc/data/lang_char/HLG.pt <span class="se">\</span>
--method 1best <span class="se">\</span>
./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0121.wav <span class="se">\</span>
./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0122.wav <span class="se">\</span>
./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0123.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./conformer_ctc/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--words-file<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/data/lang_char/words.txt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--HLG<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/data/lang_char/HLG.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>1best<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0121.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0122.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0123.wav
</pre></div>
</div>
<p>The output is given below:</p>
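Taking "the best path of the decoding lattice" is conceptually a highest-score path search over a weighted DAG. A toy sketch of that idea on a hand-built lattice (nothing here is k2's API; the states, words, and scores are invented):

```python
from functools import lru_cache

def best_path(lattice, start, final):
    """Return (score, words) of the highest-scoring path through a DAG lattice.

    lattice maps a state to a list of (next_state, word, log_score) arcs.
    """
    @lru_cache(maxsize=None)
    def best_from(state):
        if state == final:
            return (0.0, ())
        arcs = lattice.get(state, [])
        if not arcs:
            return (float("-inf"), ())  # dead end
        return max(
            (score + best_from(nxt)[0], (word,) + best_from(nxt)[1])
            for nxt, word, score in arcs
        )
    return best_from(start)

# Hypothetical 3-state lattice with two competing word sequences.
lattice = {
    0: [(1, "trade", -1.0)],
    1: [(2, "stalled", -0.5), (2, "stopped", -0.9)],
}
score, words = best_path(lattice, 0, 2)
print(words)  # ('trade', 'stalled')
```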
@@ -633,15 +633,15 @@ $ ./conformer_ctc/pretrained.py <span class="se">\</span>
<p>It extracts n paths from the lattice and rescores the extracted paths with
an attention decoder. The path with the highest score is the decoding result.</p>
<p>The command to run HLG decoding + attention decoder rescoring is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./conformer_ctc/pretrained.py <span class="se">\</span>
--checkpoint ./tmp/icefall_asr_aishell_conformer_ctc/exp/pretrained.pt <span class="se">\</span>
--words-file ./tmp/icefall_asr_aishell_conformer_ctc/data/lang_char/words.txt <span class="se">\</span>
--HLG ./tmp/icefall_asr_aishell_conformer_ctc/data/lang_char/HLG.pt <span class="se">\</span>
--method attention-decoder <span class="se">\</span>
./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0121.wav <span class="se">\</span>
./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0122.wav <span class="se">\</span>
./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0123.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./conformer_ctc/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--words-file<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/data/lang_char/words.txt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--HLG<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/data/lang_char/HLG.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>attention-decoder<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0121.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0122.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0123.wav
</pre></div>
</div>
<p>The output is below:</p>
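The rescoring step amounts to: score each of the n extracted paths with the attention decoder, combine that with the lattice score, and keep the best. A toy sketch with made-up hypotheses, made-up scores, and an illustrative combination weight (icefall's actual score combination differs):

```python
def rescore_nbest(hyps, lattice_scores, attn_scores, attn_weight=1.3):
    """Pick the hypothesis maximizing lattice score + weighted attention score.

    All scores are log-scores; attn_weight is an illustrative tuning knob.
    """
    best = max(
        range(len(hyps)),
        key=lambda i: lattice_scores[i] + attn_weight * attn_scores[i],
    )
    return hyps[best]

# Two made-up n-best hypotheses with made-up scores.
hyps = ["path A", "path B"]
lattice_scores = [-12.3, -12.1]  # the HLG lattice slightly prefers path B
attn_scores = [-3.0, -4.5]       # the attention decoder clearly prefers path A
print(rescore_nbest(hyps, lattice_scores, attn_scores))  # path A
```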
@@ -693,20 +693,20 @@ Python dependencies.</p>
<p>At present, it does NOT support streaming decoding.</p>
</div>
<p>First, let us compile k2 from source:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> <span class="nv">$HOME</span>
$ git clone https://github.com/k2-fsa/k2
$ <span class="nb">cd</span> k2
$ git checkout v2.0-pre
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span><span class="nv">$HOME</span>
$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/k2-fsa/k2
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>k2
$<span class="w"> </span>git<span class="w"> </span>checkout<span class="w"> </span>v2.0-pre
</pre></div>
</div>
<div class="admonition caution">
<p class="admonition-title">Caution</p>
<p>You have to switch to the branch <code class="docutils literal notranslate"><span class="pre">v2.0-pre</span></code>!</p>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ mkdir build-release
$ <span class="nb">cd</span> build-release
$ cmake -DCMAKE_BUILD_TYPE<span class="o">=</span>Release ..
$ make -j hlg_decode
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>mkdir<span class="w"> </span>build-release
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>build-release
$<span class="w"> </span>cmake<span class="w"> </span>-DCMAKE_BUILD_TYPE<span class="o">=</span>Release<span class="w"> </span>..
$<span class="w"> </span>make<span class="w"> </span>-j<span class="w"> </span>hlg_decode

<span class="c1"># You will find four binaries in `./bin`, i.e. ./bin/hlg_decode,</span>
</pre></div>
@@ -714,8 +714,8 @@ $ make -j hlg_decode
<p>Now you are ready to go!</p>
<p>Assume you have run:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> k2/build-release
$ ln -s /path/to/icefall-asr-aishell-conformer-ctc ./
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>k2/build-release
$<span class="w"> </span>ln<span class="w"> </span>-s<span class="w"> </span>/path/to/icefall-asr-aishell-conformer-ctc<span class="w"> </span>./
</pre></div>
</div>
</div></blockquote>
@@ -724,40 +724,40 @@ $ ln -s /path/to/icefall-asr-aishell-conformer-ctc ./
</pre></div>
</div>
<p>It will show you the following message:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>Please provide --nn_model
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>Please<span class="w"> </span>provide<span class="w"> </span>--nn_model

This file implements decoding with an HLG decoding graph.
This<span class="w"> </span>file<span class="w"> </span>implements<span class="w"> </span>decoding<span class="w"> </span>with<span class="w"> </span>an<span class="w"> </span>HLG<span class="w"> </span>decoding<span class="w"> </span>graph.

Usage:
./bin/hlg_decode <span class="se">\</span>
--use_gpu <span class="nb">true</span> <span class="se">\</span>
--nn_model <path to torch scripted pt file> <span class="se">\</span>
--hlg <path to HLG.pt> <span class="se">\</span>
--word_table <path to words.txt> <span class="se">\</span>
<path to foo.wav> <span class="se">\</span>
<path to bar.wav> <span class="se">\</span>
<more waves <span class="k">if</span> any>
<span class="w"> </span>./bin/hlg_decode<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--use_gpu<span class="w"> </span><span class="nb">true</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--nn_model<span class="w"> </span><path<span class="w"> </span>to<span class="w"> </span>torch<span class="w"> </span>scripted<span class="w"> </span>pt<span class="w"> </span>file><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--hlg<span class="w"> </span><path<span class="w"> </span>to<span class="w"> </span>HLG.pt><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--word_table<span class="w"> </span><path<span class="w"> </span>to<span class="w"> </span>words.txt><span class="w"> </span><span class="se">\</span>
<span class="w"> </span><path<span class="w"> </span>to<span class="w"> </span>foo.wav><span class="w"> </span><span class="se">\</span>
<span class="w"> </span><path<span class="w"> </span>to<span class="w"> </span>bar.wav><span class="w"> </span><span class="se">\</span>
<span class="w"> </span><more<span class="w"> </span>waves<span class="w"> </span><span class="k">if</span><span class="w"> </span>any>

To see all possible options, use
./bin/hlg_decode --help
To<span class="w"> </span>see<span class="w"> </span>all<span class="w"> </span>possible<span class="w"> </span>options,<span class="w"> </span>use
<span class="w"> </span>./bin/hlg_decode<span class="w"> </span>--help

Caution:
- Only sound files <span class="o">(</span>*.wav<span class="o">)</span> with single channel are supported.
- It assumes the model is conformer_ctc/transformer.py from icefall.
If you use a different model, you have to change the code
related to <span class="sb">`</span>model.forward<span class="sb">`</span> <span class="k">in</span> this file.
<span class="w"> </span>-<span class="w"> </span>Only<span class="w"> </span>sound<span class="w"> </span>files<span class="w"> </span><span class="o">(</span>*.wav<span class="o">)</span><span class="w"> </span>with<span class="w"> </span>single<span class="w"> </span>channel<span class="w"> </span>are<span class="w"> </span>supported.
<span class="w"> </span>-<span class="w"> </span>It<span class="w"> </span>assumes<span class="w"> </span>the<span class="w"> </span>model<span class="w"> </span>is<span class="w"> </span>conformer_ctc/transformer.py<span class="w"> </span>from<span class="w"> </span>icefall.
<span class="w"> </span>If<span class="w"> </span>you<span class="w"> </span>use<span class="w"> </span>a<span class="w"> </span>different<span class="w"> </span>model,<span class="w"> </span>you<span class="w"> </span>have<span class="w"> </span>to<span class="w"> </span>change<span class="w"> </span>the<span class="w"> </span>code
<span class="w"> </span>related<span class="w"> </span>to<span class="w"> </span><span class="sb">`</span>model.forward<span class="sb">`</span><span class="w"> </span><span class="k">in</span><span class="w"> </span>this<span class="w"> </span>file.
</pre></div>
</div>
<section id="id2">
<h3>HLG decoding<a class="headerlink" href="#id2" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./bin/hlg_decode <span class="se">\</span>
--use_gpu <span class="nb">true</span> <span class="se">\</span>
--nn_model icefall_asr_aishell_conformer_ctc/exp/cpu_jit.pt <span class="se">\</span>
--hlg icefall_asr_aishell_conformer_ctc/data/lang_char/HLG.pt <span class="se">\</span>
--word_table icefall_asr_aishell_conformer_ctc/data/lang_char/words.txt <span class="se">\</span>
icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0121.wav <span class="se">\</span>
icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0122.wav <span class="se">\</span>
icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0123.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./bin/hlg_decode<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--use_gpu<span class="w"> </span><span class="nb">true</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--nn_model<span class="w"> </span>icefall_asr_aishell_conformer_ctc/exp/cpu_jit.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--hlg<span class="w"> </span>icefall_asr_aishell_conformer_ctc/data/lang_char/HLG.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--word_table<span class="w"> </span>icefall_asr_aishell_conformer_ctc/data/lang_char/words.txt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0121.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0122.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0123.wav
</pre></div>
</div>
<p>The output is:</p>

@@ -211,9 +211,9 @@ alternatives.</p>
<section id="data-preparation">
<h2>Data Preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading"></a></h2>
<p>To prepare the data for training, please use the following commands:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> egs/aishell/ASR
./prepare.sh --stop-stage <span class="m">4</span>
./prepare.sh --stage <span class="m">6</span> --stop-stage <span class="m">6</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
./prepare.sh<span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">4</span>
./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">6</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">6</span>
</pre></div>
</div>
<div class="admonition note">
@@ -231,8 +231,8 @@ are not used in transducer training.</p>
</section>
<section id="training">
<h2>Training<a class="headerlink" href="#training" title="Permalink to this heading"></a></h2>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> egs/aishell/ASR
./transducer_stateless_modified/train.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
./transducer_stateless_modified/train.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows you the training options that can be passed from the commandline.
@@ -274,26 +274,26 @@ training from epoch 10, based on the state from epoch 9.</p>
<div><p><strong>Use case 1</strong>: You have 4 GPUs, but you only want to use GPU 0 and
GPU 2 for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,2"</span>
$ ./transducer_stateless_modified/train.py --world-size <span class="m">2</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,2"</span>
$<span class="w"> </span>./transducer_stateless_modified/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">2</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 2</strong>: You have 4 GPUs and you want to use all of them
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./transducer_stateless_modified/train.py --world-size <span class="m">4</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./transducer_stateless_modified/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 3</strong>: You have 4 GPUs but you only want to use GPU 3
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"3"</span>
$ ./transducer_stateless_modified/train.py --world-size <span class="m">1</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"3"</span>
$<span class="w"> </span>./transducer_stateless_modified/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
</div></blockquote>
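In each use case above, the value passed to --world-size equals the number of devices listed in CUDA_VISIBLE_DEVICES (CUDA also renumbers the visible devices, so "0,2" become cuda:0 and cuda:1 inside the process). A small helper, not part of icefall, sketching the counting rule:

```python
import os

def visible_gpu_count():
    """Number of GPUs a process will see once CUDA_VISIBLE_DEVICES applies."""
    devs = os.environ.get("CUDA_VISIBLE_DEVICES")
    if devs is None:  # unset: all physical GPUs are visible
        return None
    return len([d for d in devs.split(",") if d.strip()])

os.environ["CUDA_VISIBLE_DEVICES"] = "0,2"
print(visible_gpu_count())  # 2 -> pass --world-size 2
```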
@@ -358,7 +358,7 @@ Each epoch actually processes <code class="docutils literal notranslate"><span c
<p>These are checkpoint files, containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./transducer_stateless_modified/train.py --start-epoch <span class="m">11</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./transducer_stateless_modified/train.py<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">11</span>
</pre></div>
</div>
</div></blockquote>
@@ -367,8 +367,8 @@ To resume training from some checkpoint, say <code class="docutils literal notra
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc., are recorded in these logs. You can visualize them by:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> transducer_stateless_modified/exp/tensorboard
$ tensorboard dev upload --logdir . --name <span class="s2">"Aishell transducer training with icefall"</span> --description <span class="s2">"Training modified transducer, see https://github.com/k2-fsa/icefall/pull/219"</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>transducer_stateless_modified/exp/tensorboard
$<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w"> </span>upload<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--name<span class="w"> </span><span class="s2">"Aishell transducer training with icefall"</span><span class="w"> </span>--description<span class="w"> </span><span class="s2">"Training modified transducer, see https://github.com/k2-fsa/icefall/pull/219"</span>
</pre></div>
</div>
</div></blockquote>
@@ -410,25 +410,25 @@ you saw printed to the console during training.</p>
<p>The following shows typical use cases:</p>
<section id="case-1">
<h4><strong>Case 1</strong><a class="headerlink" href="#case-1" title="Permalink to this heading"></a></h4>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./transducer_stateless_modified/train.py --max-duration <span class="m">250</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./transducer_stateless_modified/train.py<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">250</span>
</pre></div>
</div>
<p>It uses <code class="docutils literal notranslate"><span class="pre">--max-duration</span></code> of 250 to avoid OOM.</p>
</section>
<section id="case-2">
<h4><strong>Case 2</strong><a class="headerlink" href="#case-2" title="Permalink to this heading"></a></h4>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,3"</span>
$ ./transducer_stateless_modified/train.py --world-size <span class="m">2</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,3"</span>
$<span class="w"> </span>./transducer_stateless_modified/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">2</span>
</pre></div>
</div>
<p>It uses GPU 0 and GPU 3 for DDP training.</p>
</section>
<section id="case-3">
<h4><strong>Case 3</strong><a class="headerlink" href="#case-3" title="Permalink to this heading"></a></h4>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./transducer_stateless_modified/train.py --num-epochs <span class="m">10</span> --start-epoch <span class="m">3</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./transducer_stateless_modified/train.py<span class="w"> </span>--num-epochs<span class="w"> </span><span class="m">10</span><span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">3</span>
</pre></div>
</div>
<p>It loads checkpoint <code class="docutils literal notranslate"><span class="pre">./transducer_stateless_modified/exp/epoch-2.pt</span></code> and starts
@@ -440,8 +440,8 @@ training from epoch 3. Also, it trains for 10 epochs.</p>
<h2>Decoding<a class="headerlink" href="#decoding" title="Permalink to this heading"></a></h2>
<p>The decoding part uses checkpoints saved by the training part, so you have
to run the training part first.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./transducer_stateless_modified/decode.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./transducer_stateless_modified/decode.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the options for decoding.</p>
@@ -539,34 +539,34 @@ $ git clone https://huggingface.co/csukuangfj/icefall-aishell-transducer-statele
<p>You have to use <code class="docutils literal notranslate"><span class="pre">git</span> <span class="pre">lfs</span></code> to download the pre-trained model.</p>
|
||||
</div>
|
||||
<p>After downloading, you will have the following files:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
|
||||
$ tree tmp/icefall-aishell-transducer-stateless-modified-2022-03-01
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
|
||||
$<span class="w"> </span>tree<span class="w"> </span>tmp/icefall-aishell-transducer-stateless-modified-2022-03-01
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/
|
||||
<span class="p">|</span>-- README.md
|
||||
<span class="p">|</span>-- data
|
||||
<span class="p">|</span> <span class="sb">`</span>-- lang_char
|
||||
<span class="p">|</span> <span class="p">|</span>-- L.pt
|
||||
<span class="p">|</span> <span class="p">|</span>-- lexicon.txt
|
||||
<span class="p">|</span> <span class="p">|</span>-- tokens.txt
|
||||
<span class="p">|</span> <span class="sb">`</span>-- words.txt
|
||||
<span class="p">|</span>-- exp
|
||||
<span class="p">|</span> <span class="sb">`</span>-- pretrained.pt
|
||||
<span class="p">|</span>-- log
|
||||
<span class="p">|</span> <span class="p">|</span>-- errs-test-beam_4-epoch-64-avg-33-beam-4.txt
<span class="p">|</span> <span class="p">|</span>-- errs-test-greedy_search-epoch-64-avg-33-context-2-max-sym-per-frame-1.txt
<span class="p">|</span> <span class="p">|</span>-- log-decode-epoch-64-avg-33-beam-4-2022-03-02-12-05-03
<span class="p">|</span> <span class="p">|</span>-- log-decode-epoch-64-avg-33-context-2-max-sym-per-frame-1-2022-02-28-18-13-07
<span class="p">|</span> <span class="p">|</span>-- recogs-test-beam_4-epoch-64-avg-33-beam-4.txt
<span class="p">|</span> <span class="sb">`</span>-- recogs-test-greedy_search-epoch-64-avg-33-context-2-max-sym-per-frame-1.txt
<span class="sb">`</span>-- test_wavs
<span class="p">|</span>-- BAC009S0764W0121.wav
<span class="p">|</span>-- BAC009S0764W0122.wav
<span class="p">|</span>-- BAC009S0764W0123.wav
<span class="sb">`</span>-- transcript.txt
<span class="p">|</span>--<span class="w"> </span>README.md
<span class="p">|</span>--<span class="w"> </span>data
<span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>lang_char
<span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>L.pt
<span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>lexicon.txt
<span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>tokens.txt
<span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>words.txt
<span class="p">|</span>--<span class="w"> </span>exp
<span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>pretrained.pt
<span class="p">|</span>--<span class="w"> </span>log
<span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>errs-test-beam_4-epoch-64-avg-33-beam-4.txt
<span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>errs-test-greedy_search-epoch-64-avg-33-context-2-max-sym-per-frame-1.txt
<span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>log-decode-epoch-64-avg-33-beam-4-2022-03-02-12-05-03
<span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>log-decode-epoch-64-avg-33-context-2-max-sym-per-frame-1-2022-02-28-18-13-07
<span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>recogs-test-beam_4-epoch-64-avg-33-beam-4.txt
<span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>recogs-test-greedy_search-epoch-64-avg-33-context-2-max-sym-per-frame-1.txt
<span class="sb">`</span>--<span class="w"> </span>test_wavs
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>BAC009S0764W0121.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>BAC009S0764W0122.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>BAC009S0764W0123.wav
<span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>transcript.txt

<span class="m">5</span> directories, <span class="m">16</span> files
<span class="m">5</span><span class="w"> </span>directories,<span class="w"> </span><span class="m">16</span><span class="w"> </span>files
</pre></div>
</div>
<p><strong>File descriptions</strong>:</p>
@@ -595,38 +595,38 @@ Note: We have removed optimizer <code class="docutils literal notranslate"><span
</ul>
</div></blockquote>
<p>The information of the test sound files is listed below:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ soxi tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/*.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>soxi<span class="w"> </span>tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/*.wav

Input File : <span class="s1">'tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0121.wav'</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">16000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:04.20 <span class="o">=</span> <span class="m">67263</span> samples ~ <span class="m">315</span>.295 CDDA sectors
File Size : 135k
Bit Rate : 256k
Sample Encoding: <span class="m">16</span>-bit Signed Integer PCM
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">'tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0121.wav'</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">16000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:04.20<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">67263</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">315</span>.295<span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>135k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>256k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>Signed<span class="w"> </span>Integer<span class="w"> </span>PCM


Input File : <span class="s1">'tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0122.wav'</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">16000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:04.12 <span class="o">=</span> <span class="m">65840</span> samples ~ <span class="m">308</span>.625 CDDA sectors
File Size : 132k
Bit Rate : 256k
Sample Encoding: <span class="m">16</span>-bit Signed Integer PCM
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">'tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0122.wav'</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">16000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:04.12<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">65840</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">308</span>.625<span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>132k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>256k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>Signed<span class="w"> </span>Integer<span class="w"> </span>PCM


Input File : <span class="s1">'tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0123.wav'</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">16000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:04.00 <span class="o">=</span> <span class="m">64000</span> samples ~ <span class="m">300</span> CDDA sectors
File Size : 128k
Bit Rate : 256k
Sample Encoding: <span class="m">16</span>-bit Signed Integer PCM
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">'tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0123.wav'</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">16000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:04.00<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">64000</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">300</span><span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>128k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>256k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>Signed<span class="w"> </span>Integer<span class="w"> </span>PCM

Total Duration of <span class="m">3</span> files: <span class="m">00</span>:00:12.32
Total<span class="w"> </span>Duration<span class="w"> </span>of<span class="w"> </span><span class="m">3</span><span class="w"> </span>files:<span class="w"> </span><span class="m">00</span>:00:12.32
</pre></div>
</div>
</section>
@@ -655,14 +655,14 @@ it may give you poor results.</p>
<section id="greedy-search">
<h4>Greedy search<a class="headerlink" href="#greedy-search" title="Permalink to this heading"></a></h4>
<p>The command to run greedy search is given below:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./transducer_stateless_modified/pretrained.py <span class="se">\</span>
--checkpoint ./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/exp/pretrained.pt <span class="se">\</span>
--lang-dir ./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/data/lang_char <span class="se">\</span>
--method greedy_search <span class="se">\</span>
./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0121.wav <span class="se">\</span>
./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0122.wav <span class="se">\</span>
./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0123.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./transducer_stateless_modified/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--lang-dir<span class="w"> </span>./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/data/lang_char<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>greedy_search<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0121.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0122.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0123.wav
</pre></div>
</div>
<p>The output is as follows:</p>
@@ -692,16 +692,16 @@ $ ./transducer_stateless_modified/pretrained.py <span class="se">\</span>
<section id="beam-search">
<h4>Beam search<a class="headerlink" href="#beam-search" title="Permalink to this heading"></a></h4>
<p>The command to run beam search is given below:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR

$ ./transducer_stateless_modified/pretrained.py <span class="se">\</span>
--checkpoint ./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/exp/pretrained.pt <span class="se">\</span>
--lang-dir ./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/data/lang_char <span class="se">\</span>
--method beam_search <span class="se">\</span>
--beam-size <span class="m">4</span> <span class="se">\</span>
./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0121.wav <span class="se">\</span>
./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0122.wav <span class="se">\</span>
./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0123.wav
$<span class="w"> </span>./transducer_stateless_modified/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--lang-dir<span class="w"> </span>./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/data/lang_char<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>beam_search<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--beam-size<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0121.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0122.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0123.wav
</pre></div>
</div>
<p>The output is as follows:</p>
@@ -731,16 +731,16 @@ $ ./transducer_stateless_modified/pretrained.py <span class="se">\</span>
<section id="modified-beam-search">
<h4>Modified Beam search<a class="headerlink" href="#modified-beam-search" title="Permalink to this heading"></a></h4>
<p>The command to run modified beam search is given below:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR

$ ./transducer_stateless_modified/pretrained.py <span class="se">\</span>
--checkpoint ./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/exp/pretrained.pt <span class="se">\</span>
--lang-dir ./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/data/lang_char <span class="se">\</span>
--method modified_beam_search <span class="se">\</span>
--beam-size <span class="m">4</span> <span class="se">\</span>
./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0121.wav <span class="se">\</span>
./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0122.wav <span class="se">\</span>
./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0123.wav
$<span class="w"> </span>./transducer_stateless_modified/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--lang-dir<span class="w"> </span>./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/data/lang_char<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>modified_beam_search<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--beam-size<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0121.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0122.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0123.wav
</pre></div>
</div>
<p>The output is as follows:</p>

@@ -130,8 +130,8 @@ the environment for <code class="docutils literal notranslate"><span class="pre"
</div></blockquote>
<section id="data-preparation">
<h2>Data preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading"></a></h2>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./prepare.sh
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./prepare.sh
</pre></div>
</div>
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
@@ -146,13 +146,13 @@ options:</p>
</div></blockquote>
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
<p>For example,</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">0</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>means to run only stage 0.</p>
<p>To run stage 2 to stage 5, use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage <span class="m">2</span> --stop-stage <span class="m">5</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">2</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">5</span>
</pre></div>
</div>
<div class="admonition hint">
@@ -167,8 +167,8 @@ the <code class="docutils literal notranslate"><span class="pre">dl_dir</span></
<p class="admonition-title">Hint</p>
<p>A 3-gram language model will be downloaded from huggingface. We assume you have
installed and initialized <code class="docutils literal notranslate"><span class="pre">git-lfs</span></code>. If not, you can install <code class="docutils literal notranslate"><span class="pre">git-lfs</span></code> by</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ sudo apt-get install git-lfs
$ git-lfs install
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>sudo<span class="w"> </span>apt-get<span class="w"> </span>install<span class="w"> </span>git-lfs
$<span class="w"> </span>git-lfs<span class="w"> </span>install
</pre></div>
</div>
<p>If you don’t have the <code class="docutils literal notranslate"><span class="pre">sudo</span></code> permission, you could download the
@@ -184,8 +184,8 @@ are saved in <code class="docutils literal notranslate"><span class="pre">./data
<h2>Training<a class="headerlink" href="#training" title="Permalink to this heading"></a></h2>
<section id="configurable-options">
<h3>Configurable options<a class="headerlink" href="#configurable-options" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./tdnn_lstm_ctc/train.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./tdnn_lstm_ctc/train.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows you the training options that can be passed from the commandline.
@@ -223,26 +223,26 @@ training from epoch 10, based on the state from epoch 9.</p>
<div><p><strong>Use case 1</strong>: You have 4 GPUs, but you only want to use GPU 0 and
GPU 2 for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,2"</span>
$ ./tdnn_lstm_ctc/train.py --world-size <span class="m">2</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,2"</span>
$<span class="w"> </span>./tdnn_lstm_ctc/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">2</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 2</strong>: You have 4 GPUs and you want to use all of them
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./tdnn_lstm_ctc/train.py --world-size <span class="m">4</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./tdnn_lstm_ctc/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 3</strong>: You have 4 GPUs but you only want to use GPU 3
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"3"</span>
$ ./tdnn_lstm_ctc/train.py --world-size <span class="m">1</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"3"</span>
$<span class="w"> </span>./tdnn_lstm_ctc/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
</div></blockquote>
@@ -295,7 +295,7 @@ You will find the following files in that directory:</p>
<p>These are checkpoint files, containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn_lstm_ctc/train.py --start-epoch <span class="m">11</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn_lstm_ctc/train.py<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">11</span>
</pre></div>
</div>
</div></blockquote>
@@ -304,8 +304,8 @@ To resume training from some checkpoint, say <code class="docutils literal notra
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc., are recorded in these logs. You can visualize them by:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> tdnn_lstm_ctc/exp/tensorboard
$ tensorboard dev upload --logdir . --description <span class="s2">"TDNN-LSTM CTC training for Aishell with icefall"</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>tdnn_lstm_ctc/exp/tensorboard
$<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w"> </span>upload<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--description<span class="w"> </span><span class="s2">"TDNN-LSTM CTC training for Aishell with icefall"</span>
</pre></div>
</div>
</div></blockquote>
@@ -347,17 +347,17 @@ you saw printed to the console during training.</p>
<p>The following shows typical use cases:</p>
<section id="case-1">
<h4><strong>Case 1</strong><a class="headerlink" href="#case-1" title="Permalink to this heading"></a></h4>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,3"</span>
$ ./tdnn_lstm_ctc/train.py --world-size <span class="m">2</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,3"</span>
$<span class="w"> </span>./tdnn_lstm_ctc/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">2</span>
</pre></div>
</div>
<p>It uses GPU 0 and GPU 3 for DDP training.</p>
</section>
<section id="case-2">
<h4><strong>Case 2</strong><a class="headerlink" href="#case-2" title="Permalink to this heading"></a></h4>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./tdnn_lstm_ctc/train.py --num-epochs <span class="m">10</span> --start-epoch <span class="m">3</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./tdnn_lstm_ctc/train.py<span class="w"> </span>--num-epochs<span class="w"> </span><span class="m">10</span><span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">3</span>
</pre></div>
</div>
<p>It loads checkpoint <code class="docutils literal notranslate"><span class="pre">./tdnn_lstm_ctc/exp/epoch-2.pt</span></code> and starts
@@ -369,8 +369,8 @@ training from epoch 3. Also, it trains for 10 epochs.</p>
<h2>Decoding<a class="headerlink" href="#decoding" title="Permalink to this heading"></a></h2>
<p>The decoding part uses checkpoints saved by the training part, so you have
to run the training part first.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./tdnn_lstm_ctc/decode.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./tdnn_lstm_ctc/decode.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the options for decoding.</p>
@@ -424,27 +424,27 @@ $ git clone https://huggingface.co/pkufool/icefall_asr_aishell_tdnn_lstm_ctc
<p>In order to use this pre-trained model, your k2 version has to be v1.7 or later.</p>
</div>
<p>After downloading, you will have the following files:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ tree tmp
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>tree<span class="w"> </span>tmp
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>tmp/
<span class="sb">`</span>-- icefall_asr_aishell_tdnn_lstm_ctc
<span class="p">|</span>-- README.md
<span class="p">|</span>-- data
<span class="p">|</span> <span class="sb">`</span>-- lang_phone
<span class="p">|</span> <span class="p">|</span>-- HLG.pt
<span class="p">|</span> <span class="p">|</span>-- tokens.txt
<span class="p">|</span> <span class="sb">`</span>-- words.txt
<span class="p">|</span>-- exp
<span class="p">|</span> <span class="sb">`</span>-- pretrained.pt
<span class="sb">`</span>-- test_waves
<span class="p">|</span>-- BAC009S0764W0121.wav
<span class="p">|</span>-- BAC009S0764W0122.wav
<span class="p">|</span>-- BAC009S0764W0123.wav
<span class="sb">`</span>-- trans.txt
<span class="sb">`</span>--<span class="w"> </span>icefall_asr_aishell_tdnn_lstm_ctc
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>README.md
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>data
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>lang_phone
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>HLG.pt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>tokens.txt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>words.txt
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>exp
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>pretrained.pt
<span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>test_waves
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>BAC009S0764W0121.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>BAC009S0764W0122.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>BAC009S0764W0123.wav
<span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>trans.txt

<span class="m">5</span> directories, <span class="m">9</span> files
<span class="m">5</span><span class="w"> </span>directories,<span class="w"> </span><span class="m">9</span><span class="w"> </span>files
</pre></div>
</div>
<p><strong>File descriptions</strong>:</p>
@ -486,38 +486,38 @@ Note: We have removed optimizer <code class="docutils literal notranslate"><span
</ul>
</div></blockquote>
<p>The information of the test sound files is listed below:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ soxi tmp/icefall_asr_aishell_tdnn_lstm_ctc/test_waves/*.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>soxi<span class="w"> </span>tmp/icefall_asr_aishell_tdnn_lstm_ctc/test_waves/*.wav

Input File : <span class="s1">'tmp/icefall_asr_aishell_tdnn_lstm_ctc/test_waves/BAC009S0764W0121.wav'</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">16000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:04.20 <span class="o">=</span> <span class="m">67263</span> samples ~ <span class="m">315</span>.295 CDDA sectors
File Size : 135k
Bit Rate : 256k
Sample Encoding: <span class="m">16</span>-bit Signed Integer PCM
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">'tmp/icefall_asr_aishell_tdnn_lstm_ctc/test_waves/BAC009S0764W0121.wav'</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">16000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:04.20<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">67263</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">315</span>.295<span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>135k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>256k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>Signed<span class="w"> </span>Integer<span class="w"> </span>PCM


Input File : <span class="s1">'tmp/icefall_asr_aishell_tdnn_lstm_ctc/test_waves/BAC009S0764W0122.wav'</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">16000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:04.12 <span class="o">=</span> <span class="m">65840</span> samples ~ <span class="m">308</span>.625 CDDA sectors
File Size : 132k
Bit Rate : 256k
Sample Encoding: <span class="m">16</span>-bit Signed Integer PCM
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">'tmp/icefall_asr_aishell_tdnn_lstm_ctc/test_waves/BAC009S0764W0122.wav'</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">16000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:04.12<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">65840</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">308</span>.625<span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>132k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>256k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>Signed<span class="w"> </span>Integer<span class="w"> </span>PCM


Input File : <span class="s1">'tmp/icefall_asr_aishell_tdnn_lstm_ctc/test_waves/BAC009S0764W0123.wav'</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">16000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:04.00 <span class="o">=</span> <span class="m">64000</span> samples ~ <span class="m">300</span> CDDA sectors
File Size : 128k
Bit Rate : 256k
Sample Encoding: <span class="m">16</span>-bit Signed Integer PCM
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">'tmp/icefall_asr_aishell_tdnn_lstm_ctc/test_waves/BAC009S0764W0123.wav'</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">16000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:04.00<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">64000</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">300</span><span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>128k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>256k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>Signed<span class="w"> </span>Integer<span class="w"> </span>PCM

Total Duration of <span class="m">3</span> files: <span class="m">00</span>:00:12.32
Total<span class="w"> </span>Duration<span class="w"> </span>of<span class="w"> </span><span class="m">3</span><span class="w"> </span>files:<span class="w"> </span><span class="m">00</span>:00:12.32
</pre></div>
</div>
</section>
@ -532,15 +532,15 @@ $ ./tdnn_lstm_ctc/pretrained.py --help
<h4>HLG decoding<a class="headerlink" href="#hlg-decoding" title="Permalink to this heading"></a></h4>
<p>HLG decoding uses the best path of the decoding lattice as the decoding result.</p>
<p>The command to run HLG decoding is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./tdnn_lstm_ctc/pretrained.py <span class="se">\</span>
--checkpoint ./tmp/icefall_asr_aishell_tdnn_lstm_ctc/exp/pretrained.pt <span class="se">\</span>
--words-file ./tmp/icefall_asr_aishell_tdnn_lstm_ctc/data/lang_phone/words.txt <span class="se">\</span>
--HLG ./tmp/icefall_asr_aishell_tdnn_lstm_ctc/data/lang_phone/HLG.pt <span class="se">\</span>
--method 1best <span class="se">\</span>
./tmp/icefall_asr_aishell_tdnn_lstm_ctc/test_waves/BAC009S0764W0121.wav <span class="se">\</span>
./tmp/icefall_asr_aishell_tdnn_lstm_ctc/test_waves/BAC009S0764W0122.wav <span class="se">\</span>
./tmp/icefall_asr_aishell_tdnn_lstm_ctc/test_waves/BAC009S0764W0123.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./tdnn_lstm_ctc/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./tmp/icefall_asr_aishell_tdnn_lstm_ctc/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--words-file<span class="w"> </span>./tmp/icefall_asr_aishell_tdnn_lstm_ctc/data/lang_phone/words.txt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--HLG<span class="w"> </span>./tmp/icefall_asr_aishell_tdnn_lstm_ctc/data/lang_phone/HLG.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>1best<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_aishell_tdnn_lstm_ctc/test_waves/BAC009S0764W0121.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_aishell_tdnn_lstm_ctc/test_waves/BAC009S0764W0122.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_aishell_tdnn_lstm_ctc/test_waves/BAC009S0764W0123.wav
</pre></div>
</div>
<p>The output is given below:</p>

@ -136,8 +136,8 @@ the environment for <code class="docutils literal notranslate"><span class="pre"
</div></blockquote>
<section id="data-preparation">
<h2>Data preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading"></a></h2>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh
</pre></div>
</div>
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
@ -152,13 +152,13 @@ options:</p>
</div></blockquote>
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
<p>For example,</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">0</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>means to run only stage 0.</p>
<p>To run stage 2 to stage 5, use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage <span class="m">2</span> --stop-stage <span class="m">5</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">2</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">5</span>
</pre></div>
</div>
<div class="admonition hint">
@ -190,8 +190,8 @@ the following YouTube channel by <a class="reference external" href="https://www
<h2>Training<a class="headerlink" href="#training" title="Permalink to this heading"></a></h2>
<section id="configurable-options">
<h3>Configurable options<a class="headerlink" href="#configurable-options" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./conformer_ctc/train.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows you the training options that can be passed from the commandline.
@ -240,26 +240,26 @@ training from epoch 10, based on the state from epoch 9.</p>
<div><p><strong>Use case 1</strong>: You have 4 GPUs, but you only want to use GPU 0 and
GPU 2 for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,2"</span>
$ ./conformer_ctc/train.py --world-size <span class="m">2</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,2"</span>
$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">2</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 2</strong>: You have 4 GPUs and you want to use all of them
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./conformer_ctc/train.py --world-size <span class="m">4</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 3</strong>: You have 4 GPUs but you only want to use GPU 3
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"3"</span>
$ ./conformer_ctc/train.py --world-size <span class="m">1</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"3"</span>
$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
</div></blockquote>
@ -307,7 +307,7 @@ You will find the following files in that directory:</p>
<p>These are checkpoint files, containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./conformer_ctc/train.py --start-epoch <span class="m">11</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">11</span>
</pre></div>
</div>
</div></blockquote>
@ -316,8 +316,8 @@ To resume training from some checkpoint, say <code class="docutils literal notra
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc, are recorded in these logs. You can visualize them by:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> conformer_ctc/exp/tensorboard
$ tensorboard dev upload --logdir . --description <span class="s2">"Conformer CTC training for LibriSpeech with icefall"</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>conformer_ctc/exp/tensorboard
$<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w"> </span>upload<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--description<span class="w"> </span><span class="s2">"Conformer CTC training for LibriSpeech with icefall"</span>
</pre></div>
</div>
</div></blockquote>
@ -358,8 +358,8 @@ you saw printed to the console during training.</p>
<p>The following shows typical use cases:</p>
<section id="case-1">
<h4><strong>Case 1</strong><a class="headerlink" href="#case-1" title="Permalink to this heading"></a></h4>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./conformer_ctc/train.py --max-duration <span class="m">200</span> --full-libri <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">200</span><span class="w"> </span>--full-libri<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>It uses <code class="docutils literal notranslate"><span class="pre">--max-duration</span></code> of 200 to avoid OOM. Also, it uses only
@ -367,17 +367,17 @@ a subset of the LibriSpeech data for training.</p>
</section>
<section id="case-2">
<h4><strong>Case 2</strong><a class="headerlink" href="#case-2" title="Permalink to this heading"></a></h4>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,3"</span>
$ ./conformer_ctc/train.py --world-size <span class="m">2</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,3"</span>
$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">2</span>
</pre></div>
</div>
<p>It uses GPU 0 and GPU 3 for DDP training.</p>
</section>
<section id="case-3">
<h4><strong>Case 3</strong><a class="headerlink" href="#case-3" title="Permalink to this heading"></a></h4>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./conformer_ctc/train.py --num-epochs <span class="m">10</span> --start-epoch <span class="m">3</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--num-epochs<span class="w"> </span><span class="m">10</span><span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">3</span>
</pre></div>
</div>
<p>It loads checkpoint <code class="docutils literal notranslate"><span class="pre">./conformer_ctc/exp/epoch-2.pt</span></code> and starts
@ -389,8 +389,8 @@ training from epoch 3. Also, it trains for 10 epochs.</p>
<h2>Decoding<a class="headerlink" href="#decoding" title="Permalink to this heading"></a></h2>
<p>The decoding part uses checkpoints saved by the training part, so you have
to run the training part first.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./conformer_ctc/decode.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./conformer_ctc/decode.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the options for decoding.</p>
@ -425,54 +425,54 @@ value may cause OOM.</p>
</div></blockquote>
<p>Here are some results for CTC decoding with a vocab size of 500:</p>
<p>Usage:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
<span class="c1"># NOTE: Tested with a model with vocab size 500.</span>
<span class="c1"># It won't work for a model with vocab size 5000.</span>
$ ./conformer_ctc/decode.py <span class="se">\</span>
--epoch <span class="m">25</span> <span class="se">\</span>
--avg <span class="m">1</span> <span class="se">\</span>
--max-duration <span class="m">300</span> <span class="se">\</span>
--exp-dir conformer_ctc/exp <span class="se">\</span>
--lang-dir data/lang_bpe_500 <span class="se">\</span>
--method ctc-decoding
$<span class="w"> </span>./conformer_ctc/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">25</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">300</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>conformer_ctc/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--lang-dir<span class="w"> </span>data/lang_bpe_500<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>ctc-decoding
</pre></div>
</div>
<p>The output is given below:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="m">2021</span>-09-26 <span class="m">12</span>:44:31,033 INFO <span class="o">[</span>decode.py:537<span class="o">]</span> Decoding started
|
||||
<span class="m">2021</span>-09-26 <span class="m">12</span>:44:31,033 INFO <span class="o">[</span>decode.py:538<span class="o">]</span>
|
||||
<span class="o">{</span><span class="s1">'lm_dir'</span>: PosixPath<span class="o">(</span><span class="s1">'data/lm'</span><span class="o">)</span>, <span class="s1">'subsampling_factor'</span>: <span class="m">4</span>, <span class="s1">'vgg_frontend'</span>: False, <span class="s1">'use_feat_batchnorm'</span>: True,
|
||||
<span class="s1">'feature_dim'</span>: <span class="m">80</span>, <span class="s1">'nhead'</span>: <span class="m">8</span>, <span class="s1">'attention_dim'</span>: <span class="m">512</span>, <span class="s1">'num_decoder_layers'</span>: <span class="m">6</span>, <span class="s1">'search_beam'</span>: <span class="m">20</span>, <span class="s1">'output_beam'</span>: <span class="m">8</span>,
|
||||
<span class="s1">'min_active_states'</span>: <span class="m">30</span>, <span class="s1">'max_active_states'</span>: <span class="m">10000</span>, <span class="s1">'use_double_scores'</span>: True,
|
||||
<span class="s1">'epoch'</span>: <span class="m">25</span>, <span class="s1">'avg'</span>: <span class="m">1</span>, <span class="s1">'method'</span>: <span class="s1">'ctc-decoding'</span>, <span class="s1">'num_paths'</span>: <span class="m">100</span>, <span class="s1">'nbest_scale'</span>: <span class="m">0</span>.5,
|
||||
<span class="s1">'export'</span>: False, <span class="s1">'exp_dir'</span>: PosixPath<span class="o">(</span><span class="s1">'conformer_ctc/exp'</span><span class="o">)</span>, <span class="s1">'lang_dir'</span>: PosixPath<span class="o">(</span><span class="s1">'data/lang_bpe_500'</span><span class="o">)</span>, <span class="s1">'full_libri'</span>: False,
|
||||
<span class="s1">'feature_dir'</span>: PosixPath<span class="o">(</span><span class="s1">'data/fbank'</span><span class="o">)</span>, <span class="s1">'max_duration'</span>: <span class="m">100</span>, <span class="s1">'bucketing_sampler'</span>: False, <span class="s1">'num_buckets'</span>: <span class="m">30</span>,
|
||||
<span class="s1">'concatenate_cuts'</span>: False, <span class="s1">'duration_factor'</span>: <span class="m">1</span>.0, <span class="s1">'gap'</span>: <span class="m">1</span>.0, <span class="s1">'on_the_fly_feats'</span>: False,
|
||||
<span class="s1">'shuffle'</span>: True, <span class="s1">'return_cuts'</span>: True, <span class="s1">'num_workers'</span>: <span class="m">2</span><span class="o">}</span>
|
||||
<span class="m">2021</span>-09-26 <span class="m">12</span>:44:31,406 INFO <span class="o">[</span>lexicon.py:113<span class="o">]</span> Loading pre-compiled data/lang_bpe_500/Linv.pt
|
||||
<span class="m">2021</span>-09-26 <span class="m">12</span>:44:31,464 INFO <span class="o">[</span>decode.py:548<span class="o">]</span> device: cuda:0
|
||||
<span class="m">2021</span>-09-26 <span class="m">12</span>:44:36,171 INFO <span class="o">[</span>checkpoint.py:92<span class="o">]</span> Loading checkpoint from conformer_ctc/exp/epoch-25.pt
|
||||
<span class="m">2021</span>-09-26 <span class="m">12</span>:44:36,776 INFO <span class="o">[</span>decode.py:652<span class="o">]</span> Number of model parameters: <span class="m">109226120</span>
|
||||
<span class="m">2021</span>-09-26 <span class="m">12</span>:44:37,714 INFO <span class="o">[</span>decode.py:473<span class="o">]</span> batch <span class="m">0</span>/206, cuts processed <span class="k">until</span> now is <span class="m">12</span>
|
||||
<span class="m">2021</span>-09-26 <span class="m">12</span>:45:15,944 INFO <span class="o">[</span>decode.py:473<span class="o">]</span> batch <span class="m">100</span>/206, cuts processed <span class="k">until</span> now is <span class="m">1328</span>
|
||||
<span class="m">2021</span>-09-26 <span class="m">12</span>:45:54,443 INFO <span class="o">[</span>decode.py:473<span class="o">]</span> batch <span class="m">200</span>/206, cuts processed <span class="k">until</span> now is <span class="m">2563</span>
|
||||
<span class="m">2021</span>-09-26 <span class="m">12</span>:45:56,411 INFO <span class="o">[</span>decode.py:494<span class="o">]</span> The transcripts are stored <span class="k">in</span> conformer_ctc/exp/recogs-test-clean-ctc-decoding.txt
|
||||
<span class="m">2021</span>-09-26 <span class="m">12</span>:45:56,592 INFO <span class="o">[</span>utils.py:331<span class="o">]</span> <span class="o">[</span>test-clean-ctc-decoding<span class="o">]</span> %WER <span class="m">3</span>.26% <span class="o">[</span><span class="m">1715</span> / <span class="m">52576</span>, <span class="m">163</span> ins, <span class="m">128</span> del, <span class="m">1424</span> sub <span class="o">]</span>
|
||||
<span class="m">2021</span>-09-26 <span class="m">12</span>:45:56,807 INFO <span class="o">[</span>decode.py:506<span class="o">]</span> Wrote detailed error stats to conformer_ctc/exp/errs-test-clean-ctc-decoding.txt
|
||||
<span class="m">2021</span>-09-26 <span class="m">12</span>:45:56,808 INFO <span class="o">[</span>decode.py:522<span class="o">]</span>
|
||||
For test-clean, WER of different settings are:
|
||||
ctc-decoding <span class="m">3</span>.26 best <span class="k">for</span> test-clean
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:44:31,033<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:537<span class="o">]</span><span class="w"> </span>Decoding<span class="w"> </span>started
|
||||
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:44:31,033<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:538<span class="o">]</span>
|
||||
<span class="o">{</span><span class="s1">'lm_dir'</span>:<span class="w"> </span>PosixPath<span class="o">(</span><span class="s1">'data/lm'</span><span class="o">)</span>,<span class="w"> </span><span class="s1">'subsampling_factor'</span>:<span class="w"> </span><span class="m">4</span>,<span class="w"> </span><span class="s1">'vgg_frontend'</span>:<span class="w"> </span>False,<span class="w"> </span><span class="s1">'use_feat_batchnorm'</span>:<span class="w"> </span>True,
|
||||
<span class="s1">'feature_dim'</span>:<span class="w"> </span><span class="m">80</span>,<span class="w"> </span><span class="s1">'nhead'</span>:<span class="w"> </span><span class="m">8</span>,<span class="w"> </span><span class="s1">'attention_dim'</span>:<span class="w"> </span><span class="m">512</span>,<span class="w"> </span><span class="s1">'num_decoder_layers'</span>:<span class="w"> </span><span class="m">6</span>,<span class="w"> </span><span class="s1">'search_beam'</span>:<span class="w"> </span><span class="m">20</span>,<span class="w"> </span><span class="s1">'output_beam'</span>:<span class="w"> </span><span class="m">8</span>,
|
||||
<span class="s1">'min_active_states'</span>:<span class="w"> </span><span class="m">30</span>,<span class="w"> </span><span class="s1">'max_active_states'</span>:<span class="w"> </span><span class="m">10000</span>,<span class="w"> </span><span class="s1">'use_double_scores'</span>:<span class="w"> </span>True,
|
||||
<span class="s1">'epoch'</span>:<span class="w"> </span><span class="m">25</span>,<span class="w"> </span><span class="s1">'avg'</span>:<span class="w"> </span><span class="m">1</span>,<span class="w"> </span><span class="s1">'method'</span>:<span class="w"> </span><span class="s1">'ctc-decoding'</span>,<span class="w"> </span><span class="s1">'num_paths'</span>:<span class="w"> </span><span class="m">100</span>,<span class="w"> </span><span class="s1">'nbest_scale'</span>:<span class="w"> </span><span class="m">0</span>.5,
|
||||
<span class="s1">'export'</span>:<span class="w"> </span>False,<span class="w"> </span><span class="s1">'exp_dir'</span>:<span class="w"> </span>PosixPath<span class="o">(</span><span class="s1">'conformer_ctc/exp'</span><span class="o">)</span>,<span class="w"> </span><span class="s1">'lang_dir'</span>:<span class="w"> </span>PosixPath<span class="o">(</span><span class="s1">'data/lang_bpe_500'</span><span class="o">)</span>,<span class="w"> </span><span class="s1">'full_libri'</span>:<span class="w"> </span>False,
<span class="s1">'feature_dir'</span>:<span class="w"> </span>PosixPath<span class="o">(</span><span class="s1">'data/fbank'</span><span class="o">)</span>,<span class="w"> </span><span class="s1">'max_duration'</span>:<span class="w"> </span><span class="m">100</span>,<span class="w"> </span><span class="s1">'bucketing_sampler'</span>:<span class="w"> </span>False,<span class="w"> </span><span class="s1">'num_buckets'</span>:<span class="w"> </span><span class="m">30</span>,
<span class="s1">'concatenate_cuts'</span>:<span class="w"> </span>False,<span class="w"> </span><span class="s1">'duration_factor'</span>:<span class="w"> </span><span class="m">1</span>.0,<span class="w"> </span><span class="s1">'gap'</span>:<span class="w"> </span><span class="m">1</span>.0,<span class="w"> </span><span class="s1">'on_the_fly_feats'</span>:<span class="w"> </span>False,
<span class="s1">'shuffle'</span>:<span class="w"> </span>True,<span class="w"> </span><span class="s1">'return_cuts'</span>:<span class="w"> </span>True,<span class="w"> </span><span class="s1">'num_workers'</span>:<span class="w"> </span><span class="m">2</span><span class="o">}</span>
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:44:31,406<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>lexicon.py:113<span class="o">]</span><span class="w"> </span>Loading<span class="w"> </span>pre-compiled<span class="w"> </span>data/lang_bpe_500/Linv.pt
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:44:31,464<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:548<span class="o">]</span><span class="w"> </span>device:<span class="w"> </span>cuda:0
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:44:36,171<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>checkpoint.py:92<span class="o">]</span><span class="w"> </span>Loading<span class="w"> </span>checkpoint<span class="w"> </span>from<span class="w"> </span>conformer_ctc/exp/epoch-25.pt
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:44:36,776<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:652<span class="o">]</span><span class="w"> </span>Number<span class="w"> </span>of<span class="w"> </span>model<span class="w"> </span>parameters:<span class="w"> </span><span class="m">109226120</span>
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:44:37,714<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:473<span class="o">]</span><span class="w"> </span>batch<span class="w"> </span><span class="m">0</span>/206,<span class="w"> </span>cuts<span class="w"> </span>processed<span class="w"> </span><span class="k">until</span><span class="w"> </span>now<span class="w"> </span>is<span class="w"> </span><span class="m">12</span>
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:45:15,944<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:473<span class="o">]</span><span class="w"> </span>batch<span class="w"> </span><span class="m">100</span>/206,<span class="w"> </span>cuts<span class="w"> </span>processed<span class="w"> </span><span class="k">until</span><span class="w"> </span>now<span class="w"> </span>is<span class="w"> </span><span class="m">1328</span>
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:45:54,443<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:473<span class="o">]</span><span class="w"> </span>batch<span class="w"> </span><span class="m">200</span>/206,<span class="w"> </span>cuts<span class="w"> </span>processed<span class="w"> </span><span class="k">until</span><span class="w"> </span>now<span class="w"> </span>is<span class="w"> </span><span class="m">2563</span>
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:45:56,411<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:494<span class="o">]</span><span class="w"> </span>The<span class="w"> </span>transcripts<span class="w"> </span>are<span class="w"> </span>stored<span class="w"> </span><span class="k">in</span><span class="w"> </span>conformer_ctc/exp/recogs-test-clean-ctc-decoding.txt
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:45:56,592<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>utils.py:331<span class="o">]</span><span class="w"> </span><span class="o">[</span>test-clean-ctc-decoding<span class="o">]</span><span class="w"> </span>%WER<span class="w"> </span><span class="m">3</span>.26%<span class="w"> </span><span class="o">[</span><span class="m">1715</span><span class="w"> </span>/<span class="w"> </span><span class="m">52576</span>,<span class="w"> </span><span class="m">163</span><span class="w"> </span>ins,<span class="w"> </span><span class="m">128</span><span class="w"> </span>del,<span class="w"> </span><span class="m">1424</span><span class="w"> </span>sub<span class="w"> </span><span class="o">]</span>
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:45:56,807<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:506<span class="o">]</span><span class="w"> </span>Wrote<span class="w"> </span>detailed<span class="w"> </span>error<span class="w"> </span>stats<span class="w"> </span>to<span class="w"> </span>conformer_ctc/exp/errs-test-clean-ctc-decoding.txt
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:45:56,808<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:522<span class="o">]</span>
For<span class="w"> </span>test-clean,<span class="w"> </span>WER<span class="w"> </span>of<span class="w"> </span>different<span class="w"> </span>settings<span class="w"> </span>are:
ctc-decoding<span class="w"> </span><span class="m">3</span>.26<span class="w"> </span>best<span class="w"> </span><span class="k">for</span><span class="w"> </span>test-clean
<span class="m">2021</span>-09-26 <span class="m">12</span>:45:57,362 INFO <span class="o">[</span>decode.py:473<span class="o">]</span> batch <span class="m">0</span>/203, cuts processed <span class="k">until</span> now is <span class="m">15</span>
<span class="m">2021</span>-09-26 <span class="m">12</span>:46:35,565 INFO <span class="o">[</span>decode.py:473<span class="o">]</span> batch <span class="m">100</span>/203, cuts processed <span class="k">until</span> now is <span class="m">1477</span>
<span class="m">2021</span>-09-26 <span class="m">12</span>:47:15,106 INFO <span class="o">[</span>decode.py:473<span class="o">]</span> batch <span class="m">200</span>/203, cuts processed <span class="k">until</span> now is <span class="m">2922</span>
<span class="m">2021</span>-09-26 <span class="m">12</span>:47:16,131 INFO <span class="o">[</span>decode.py:494<span class="o">]</span> The transcripts are stored <span class="k">in</span> conformer_ctc/exp/recogs-test-other-ctc-decoding.txt
<span class="m">2021</span>-09-26 <span class="m">12</span>:47:16,208 INFO <span class="o">[</span>utils.py:331<span class="o">]</span> <span class="o">[</span>test-other-ctc-decoding<span class="o">]</span> %WER <span class="m">8</span>.21% <span class="o">[</span><span class="m">4295</span> / <span class="m">52343</span>, <span class="m">396</span> ins, <span class="m">315</span> del, <span class="m">3584</span> sub <span class="o">]</span>
<span class="m">2021</span>-09-26 <span class="m">12</span>:47:16,432 INFO <span class="o">[</span>decode.py:506<span class="o">]</span> Wrote detailed error stats to conformer_ctc/exp/errs-test-other-ctc-decoding.txt
<span class="m">2021</span>-09-26 <span class="m">12</span>:47:16,432 INFO <span class="o">[</span>decode.py:522<span class="o">]</span>
For test-other, WER of different settings are:
ctc-decoding <span class="m">8</span>.21 best <span class="k">for</span> test-other
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:45:57,362<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:473<span class="o">]</span><span class="w"> </span>batch<span class="w"> </span><span class="m">0</span>/203,<span class="w"> </span>cuts<span class="w"> </span>processed<span class="w"> </span><span class="k">until</span><span class="w"> </span>now<span class="w"> </span>is<span class="w"> </span><span class="m">15</span>
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:46:35,565<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:473<span class="o">]</span><span class="w"> </span>batch<span class="w"> </span><span class="m">100</span>/203,<span class="w"> </span>cuts<span class="w"> </span>processed<span class="w"> </span><span class="k">until</span><span class="w"> </span>now<span class="w"> </span>is<span class="w"> </span><span class="m">1477</span>
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:47:15,106<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:473<span class="o">]</span><span class="w"> </span>batch<span class="w"> </span><span class="m">200</span>/203,<span class="w"> </span>cuts<span class="w"> </span>processed<span class="w"> </span><span class="k">until</span><span class="w"> </span>now<span class="w"> </span>is<span class="w"> </span><span class="m">2922</span>
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:47:16,131<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:494<span class="o">]</span><span class="w"> </span>The<span class="w"> </span>transcripts<span class="w"> </span>are<span class="w"> </span>stored<span class="w"> </span><span class="k">in</span><span class="w"> </span>conformer_ctc/exp/recogs-test-other-ctc-decoding.txt
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:47:16,208<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>utils.py:331<span class="o">]</span><span class="w"> </span><span class="o">[</span>test-other-ctc-decoding<span class="o">]</span><span class="w"> </span>%WER<span class="w"> </span><span class="m">8</span>.21%<span class="w"> </span><span class="o">[</span><span class="m">4295</span><span class="w"> </span>/<span class="w"> </span><span class="m">52343</span>,<span class="w"> </span><span class="m">396</span><span class="w"> </span>ins,<span class="w"> </span><span class="m">315</span><span class="w"> </span>del,<span class="w"> </span><span class="m">3584</span><span class="w"> </span>sub<span class="w"> </span><span class="o">]</span>
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:47:16,432<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:506<span class="o">]</span><span class="w"> </span>Wrote<span class="w"> </span>detailed<span class="w"> </span>error<span class="w"> </span>stats<span class="w"> </span>to<span class="w"> </span>conformer_ctc/exp/errs-test-other-ctc-decoding.txt
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:47:16,432<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:522<span class="o">]</span>
For<span class="w"> </span>test-other,<span class="w"> </span>WER<span class="w"> </span>of<span class="w"> </span>different<span class="w"> </span>settings<span class="w"> </span>are:
ctc-decoding<span class="w"> </span><span class="m">8</span>.21<span class="w"> </span>best<span class="w"> </span><span class="k">for</span><span class="w"> </span>test-other
<span class="m">2021</span>-09-26 <span class="m">12</span>:47:16,433 INFO <span class="o">[</span>decode.py:680<span class="o">]</span> Done!
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:47:16,433<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:680<span class="o">]</span><span class="w"> </span>Done!
</pre></div>
</div>
</section>
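Each `%WER` line in the decoding logs above can be reproduced from its bracketed counts: WER is (insertions + deletions + substitutions) divided by the number of reference words. A minimal sketch; the helper name is ours, not part of icefall:

```python
# Recompute the WERs reported by decode.py from the raw error counts.
# WER = (ins + del + sub) / number of reference words, as a percentage.
def wer_percent(ins: int, dels: int, subs: int, ref_words: int) -> float:
    return round(100.0 * (ins + dels + subs) / ref_words, 2)

# test-clean log line: "%WER 3.26% [1715 / 52576, 163 ins, 128 del, 1424 sub]"
print(wer_percent(163, 128, 1424, 52576))  # 3.26
# test-other log line: "%WER 8.21% [4295 / 52343, 396 ins, 315 del, 3584 sub]"
print(wer_percent(396, 315, 3584, 52343))  # 8.21
```

Note that 163 + 128 + 1424 = 1715, the total error count shown in the brackets.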
@@ -492,10 +492,10 @@ at the same time.</p>
<section id="download-the-pre-trained-model">
<h3>Download the pre-trained model<a class="headerlink" href="#download-the-pre-trained-model" title="Permalink to this heading"></a></h3>
<p>The following commands describe how to download the pre-trained model:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
$ <span class="nb">cd</span> icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
$ git lfs pull
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/csukuangfj/icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
$<span class="w"> </span>git<span class="w"> </span>lfs<span class="w"> </span>pull
</pre></div>
</div>
<div class="admonition caution">
@@ -509,8 +509,8 @@ Otherwise, you will have the following issue when running <code class="docutils
</div></blockquote>
<p>To fix that issue, please use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
git lfs pull
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
git<span class="w"> </span>lfs<span class="w"> </span>pull
</pre></div>
</div>
</div></blockquote>
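The loading error described above usually means that `exp/pretrained.pt` is still a Git LFS pointer (a tiny text file beginning with the LFS spec line) rather than the actual checkpoint, because `git lfs pull` was skipped. A minimal sketch of how to check this; the helper name is ours, not part of icefall:

```python
from pathlib import Path

# A Git LFS pointer file is a short text file whose first line is the
# LFS spec URL; the real checkpoint is a large binary file instead.
LFS_MAGIC = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(path: Path) -> bool:
    # Read the file head and compare against the LFS pointer magic line.
    try:
        return path.read_bytes().startswith(LFS_MAGIC)
    except OSError:
        return False

# Demo with a fabricated pointer file (oid/size values are made up):
p = Path("pretrained.pt")
p.write_text("version https://git-lfs.github.com/spec/v1\n"
             "oid sha256:0123abcd\nsize 443\n")
print(is_lfs_pointer(p))  # True
```

If this returns True for a checkpoint, rerun `git lfs pull` inside the model directory before loading it.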
@@ -520,31 +520,31 @@ git lfs pull
<p>In order to use this pre-trained model, your k2 version has to be v1.9 or later.</p>
</div>
<p>After downloading, you will have the following files:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ tree icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>tree<span class="w"> </span>icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
<span class="p">|</span>-- README.md
<span class="p">|</span>-- data
<span class="p">|</span> <span class="p">|</span>-- lang_bpe_500
<span class="p">|</span> <span class="p">|</span> <span class="p">|</span>-- HLG.pt
<span class="p">|</span> <span class="p">|</span> <span class="p">|</span>-- HLG_modified.pt
<span class="p">|</span> <span class="p">|</span> <span class="p">|</span>-- bpe.model
<span class="p">|</span> <span class="p">|</span> <span class="p">|</span>-- tokens.txt
<span class="p">|</span> <span class="p">|</span> <span class="sb">`</span>-- words.txt
<span class="p">|</span> <span class="sb">`</span>-- lm
<span class="p">|</span> <span class="sb">`</span>-- G_4_gram.pt
<span class="p">|</span>-- exp
<span class="p">|</span> <span class="p">|</span>-- cpu_jit.pt
<span class="p">|</span> <span class="sb">`</span>-- pretrained.pt
<span class="p">|</span>-- log
<span class="p">|</span> <span class="sb">`</span>-- log-decode-2021-11-09-17-38-28
<span class="sb">`</span>-- test_wavs
<span class="p">|</span>-- <span class="m">1089</span>-134686-0001.wav
<span class="p">|</span>-- <span class="m">1221</span>-135766-0001.wav
<span class="p">|</span>-- <span class="m">1221</span>-135766-0002.wav
<span class="sb">`</span>-- trans.txt
<span class="p">|</span>--<span class="w"> </span>README.md
<span class="p">|</span>--<span class="w"> </span>data
<span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>lang_bpe_500
<span class="p">|</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>HLG.pt
<span class="p">|</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>HLG_modified.pt
<span class="p">|</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>bpe.model
<span class="p">|</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>tokens.txt
<span class="p">|</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>words.txt
<span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>lm
<span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>G_4_gram.pt
<span class="p">|</span>--<span class="w"> </span>exp
<span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>cpu_jit.pt
<span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>pretrained.pt
<span class="p">|</span>--<span class="w"> </span>log
<span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>log-decode-2021-11-09-17-38-28
<span class="sb">`</span>--<span class="w"> </span>test_wavs
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span><span class="m">1089</span>-134686-0001.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span><span class="m">1221</span>-135766-0001.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span><span class="m">1221</span>-135766-0002.wav
<span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>trans.txt
</pre></div>
</div>
<dl>
@@ -606,38 +606,38 @@ Note: We have removed optimizer <code class="docutils literal notranslate"><span
</dd>
</dl>
<p>The information of the test sound files is listed below:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ soxi icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/*.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>soxi<span class="w"> </span>icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/*.wav
Input File : <span class="s1">'icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav'</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">16000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:06.62 <span class="o">=</span> <span class="m">106000</span> samples ~ <span class="m">496</span>.875 CDDA sectors
File Size : 212k
Bit Rate : 256k
Sample Encoding: <span class="m">16</span>-bit Signed Integer PCM
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">'icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav'</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">16000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:06.62<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">106000</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">496</span>.875<span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>212k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>256k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>Signed<span class="w"> </span>Integer<span class="w"> </span>PCM
Input File : <span class="s1">'icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav'</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">16000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:16.71 <span class="o">=</span> <span class="m">267440</span> samples ~ <span class="m">1253</span>.62 CDDA sectors
File Size : 535k
Bit Rate : 256k
Sample Encoding: <span class="m">16</span>-bit Signed Integer PCM
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">'icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav'</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">16000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:16.71<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">267440</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">1253</span>.62<span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>535k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>256k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>Signed<span class="w"> </span>Integer<span class="w"> </span>PCM
Input File : <span class="s1">'icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav'</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">16000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:04.83 <span class="o">=</span> <span class="m">77200</span> samples ~ <span class="m">361</span>.875 CDDA sectors
File Size : 154k
Bit Rate : 256k
Sample Encoding: <span class="m">16</span>-bit Signed Integer PCM
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">'icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav'</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">16000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:04.83<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">77200</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">361</span>.875<span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>154k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>256k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>Signed<span class="w"> </span>Integer<span class="w"> </span>PCM
Total Duration of <span class="m">3</span> files: <span class="m">00</span>:00:28.16
Total<span class="w"> </span>Duration<span class="w"> </span>of<span class="w"> </span><span class="m">3</span><span class="w"> </span>files:<span class="w"> </span><span class="m">00</span>:00:28.16
</pre></div>
</div>
</section>
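The `soxi` fields shown above are related by simple arithmetic: duration is the sample count divided by the sample rate, and the bit rate is sample rate times bit depth times channel count. A quick check of the numbers from that output:

```python
# Durations and bit rate implied by the soxi output above.
rate = 16000            # Hz, the sample rate soxi reports
bits, channels = 16, 1  # 16-bit signed PCM, mono

samples = {
    "1089-134686-0001.wav": 106000,
    "1221-135766-0001.wav": 267440,
    "1221-135766-0002.wav": 77200,
}
for name, n in samples.items():
    # e.g. 106000 / 16000 = 6.625 s, shown by soxi as 00:00:06.62
    print(name, n / rate, "s")

total = sum(samples.values()) / rate
print(total)  # 28.165 s, which soxi reports as 00:00:28.16
print(rate * bits * channels)  # 256000 bit/s, i.e. the 256k bit rate
```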
@@ -662,15 +662,15 @@ $ ./conformer_ctc/pretrained.py --help
<p>CTC decoding uses the best path of the decoding lattice as the decoding result
without any LM or lexicon.</p>
<p>The command to run CTC decoding is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./conformer_ctc/pretrained.py <span class="se">\</span>
--checkpoint ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/pretrained.pt <span class="se">\</span>
--bpe-model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/bpe.model <span class="se">\</span>
--method ctc-decoding <span class="se">\</span>
--num-classes <span class="m">500</span> <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./conformer_ctc/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>ctc-decoding<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-classes<span class="w"> </span><span class="m">500</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
</pre></div>
</div>
<p>The output is given below:</p>
@@ -700,16 +700,16 @@ $ ./conformer_ctc/pretrained.py <span class="se">\</span>
<h4>HLG decoding<a class="headerlink" href="#hlg-decoding" title="Permalink to this heading"></a></h4>
<p>HLG decoding uses the best path of the decoding lattice as the decoding result.</p>
<p>The command to run HLG decoding is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./conformer_ctc/pretrained.py <span class="se">\</span>
--checkpoint ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/pretrained.pt <span class="se">\</span>
--words-file ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt <span class="se">\</span>
--method 1best <span class="se">\</span>
--num-classes <span class="m">500</span> <span class="se">\</span>
--HLG ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./conformer_ctc/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--words-file<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>1best<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-classes<span class="w"> </span><span class="m">500</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--HLG<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
</pre></div>
</div>
<p>The output is given below:</p>
@@ -740,18 +740,18 @@ $ ./conformer_ctc/pretrained.py <span class="se">\</span>
<p>It uses an n-gram LM to rescore the decoding lattice and the best
path of the rescored lattice is the decoding result.</p>
<p>The command to run HLG decoding + LM rescoring is:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
|
||||
./conformer_ctc/pretrained.py <span class="se">\</span>
|
||||
--checkpoint ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/pretrained.pt <span class="se">\</span>
|
||||
--words-file ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt <span class="se">\</span>
|
||||
--method whole-lattice-rescoring <span class="se">\</span>
|
||||
--num-classes <span class="m">500</span> <span class="se">\</span>
|
||||
--HLG ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt <span class="se">\</span>
|
||||
--G ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm/G_4_gram.pt <span class="se">\</span>
|
||||
--ngram-lm-scale <span class="m">1</span>.0 <span class="se">\</span>
|
||||
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav <span class="se">\</span>
|
||||
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav <span class="se">\</span>
|
||||
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
|
||||
./conformer_ctc/pretrained.py<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--checkpoint<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--words-file<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--method<span class="w"> </span>whole-lattice-rescoring<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--num-classes<span class="w"> </span><span class="m">500</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--HLG<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--G<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm/G_4_gram.pt<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--ngram-lm-scale<span class="w"> </span><span class="m">1</span>.0<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>Its output is:</p>
|
||||
@ -784,23 +784,23 @@ path of the rescored lattice is the decoding result.</p>
n paths from the rescored lattice, rescores the extracted paths with
an attention decoder. The path with the highest score is the decoding result.</p>
<p>The command to run HLG decoding + LM rescoring + attention decoder rescoring is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./conformer_ctc/pretrained.py <span class="se">\</span>
--checkpoint ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/pretrained.pt <span class="se">\</span>
--words-file ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt <span class="se">\</span>
--method attention-decoder <span class="se">\</span>
--num-classes <span class="m">500</span> <span class="se">\</span>
--HLG ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt <span class="se">\</span>
--G ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm/G_4_gram.pt <span class="se">\</span>
--ngram-lm-scale <span class="m">2</span>.0 <span class="se">\</span>
--attention-decoder-scale <span class="m">2</span>.0 <span class="se">\</span>
--nbest-scale <span class="m">0</span>.5 <span class="se">\</span>
--num-paths <span class="m">100</span> <span class="se">\</span>
--sos-id <span class="m">1</span> <span class="se">\</span>
--eos-id <span class="m">1</span> <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./conformer_ctc/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--words-file<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>attention-decoder<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-classes<span class="w"> </span><span class="m">500</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--HLG<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--G<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm/G_4_gram.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--ngram-lm-scale<span class="w"> </span><span class="m">2</span>.0<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--attention-decoder-scale<span class="w"> </span><span class="m">2</span>.0<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--nbest-scale<span class="w"> </span><span class="m">0</span>.5<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-paths<span class="w"> </span><span class="m">100</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--sos-id<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--eos-id<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
</pre></div>
</div>
<p>The output is below:</p>
@ -831,22 +831,22 @@ $ ./conformer_ctc/pretrained.py <span class="se">\</span>
<section id="compute-wer-with-the-pre-trained-model">
<h3>Compute WER with the pre-trained model<a class="headerlink" href="#compute-wer-with-the-pre-trained-model" title="Permalink to this heading"></a></h3>
<p>To check the WER of the pre-trained model on the test datasets, run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">cd</span> icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/
$ ln -s pretrained.pt epoch-999.pt
$ <span class="nb">cd</span> ../..
$ ./conformer_ctc/decode.py <span class="se">\</span>
--exp-dir ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp <span class="se">\</span>
--lang-dir ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500 <span class="se">\</span>
--lm-dir ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm <span class="se">\</span>
--epoch <span class="m">999</span> <span class="se">\</span>
--avg <span class="m">1</span> <span class="se">\</span>
--concatenate-cuts <span class="m">0</span> <span class="se">\</span>
--bucketing-sampler <span class="m">1</span> <span class="se">\</span>
--max-duration <span class="m">30</span> <span class="se">\</span>
--num-paths <span class="m">1000</span> <span class="se">\</span>
--method attention-decoder <span class="se">\</span>
--nbest-scale <span class="m">0</span>.5
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/
$<span class="w"> </span>ln<span class="w"> </span>-s<span class="w"> </span>pretrained.pt<span class="w"> </span>epoch-999.pt
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>../..
$<span class="w"> </span>./conformer_ctc/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--lang-dir<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--lm-dir<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">999</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--concatenate-cuts<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bucketing-sampler<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-paths<span class="w"> </span><span class="m">1000</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>attention-decoder<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--nbest-scale<span class="w"> </span><span class="m">0</span>.5
</pre></div>
</div>
</section>
@ -875,20 +875,20 @@ Python dependencies.</p>
<p>At present, it does NOT support streaming decoding.</p>
</div>
<p>First, let us compile k2 from source:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> <span class="nv">$HOME</span>
$ git clone https://github.com/k2-fsa/k2
$ <span class="nb">cd</span> k2
$ git checkout v2.0-pre
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span><span class="nv">$HOME</span>
$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/k2-fsa/k2
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>k2
$<span class="w"> </span>git<span class="w"> </span>checkout<span class="w"> </span>v2.0-pre
</pre></div>
</div>
<div class="admonition caution">
<p class="admonition-title">Caution</p>
<p>You have to switch to the branch <code class="docutils literal notranslate"><span class="pre">v2.0-pre</span></code>!</p>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ mkdir build-release
$ <span class="nb">cd</span> build-release
$ cmake -DCMAKE_BUILD_TYPE<span class="o">=</span>Release ..
$ make -j ctc_decode hlg_decode ngram_lm_rescore attention_rescore
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>mkdir<span class="w"> </span>build-release
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>build-release
$<span class="w"> </span>cmake<span class="w"> </span>-DCMAKE_BUILD_TYPE<span class="o">=</span>Release<span class="w"> </span>..
$<span class="w"> </span>make<span class="w"> </span>-j<span class="w"> </span>ctc_decode<span class="w"> </span>hlg_decode<span class="w"> </span>ngram_lm_rescore<span class="w"> </span>attention_rescore

<span class="c1"># You will find four binaries in `./bin`, i.e.,</span>
<span class="c1"># ./bin/ctc_decode, ./bin/hlg_decode,</span>
@ -898,8 +898,8 @@ $ make -j ctc_decode hlg_decode ngram_lm_rescore attention_rescore
<p>Now you are ready to go!</p>
<p>Assume you have run:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> k2/build-release
$ ln -s /path/to/icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09 ./
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>k2/build-release
$<span class="w"> </span>ln<span class="w"> </span>-s<span class="w"> </span>/path/to/icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09<span class="w"> </span>./
</pre></div>
</div>
</div></blockquote>
@ -908,39 +908,39 @@ $ ln -s /path/to/icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09 ./
</pre></div>
</div>
<p>It will show you the following message:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>Please provide --nn_model
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>Please<span class="w"> </span>provide<span class="w"> </span>--nn_model

This file implements decoding with a CTC topology, without any
kinds of LM or lexicons.
This<span class="w"> </span>file<span class="w"> </span>implements<span class="w"> </span>decoding<span class="w"> </span>with<span class="w"> </span>a<span class="w"> </span>CTC<span class="w"> </span>topology,<span class="w"> </span>without<span class="w"> </span>any
kinds<span class="w"> </span>of<span class="w"> </span>LM<span class="w"> </span>or<span class="w"> </span>lexicons.

Usage:
./bin/ctc_decode <span class="se">\</span>
--use_gpu <span class="nb">true</span> <span class="se">\</span>
--nn_model <path to torch scripted pt file> <span class="se">\</span>
--bpe_model <path to pre-trained BPE model> <span class="se">\</span>
<path to foo.wav> <span class="se">\</span>
<path to bar.wav> <span class="se">\</span>
<more waves <span class="k">if</span> any>
<span class="w"> </span>./bin/ctc_decode<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--use_gpu<span class="w"> </span><span class="nb">true</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--nn_model<span class="w"> </span><path<span class="w"> </span>to<span class="w"> </span>torch<span class="w"> </span>scripted<span class="w"> </span>pt<span class="w"> </span>file><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe_model<span class="w"> </span><path<span class="w"> </span>to<span class="w"> </span>pre-trained<span class="w"> </span>BPE<span class="w"> </span>model><span class="w"> </span><span class="se">\</span>
<span class="w"> </span><path<span class="w"> </span>to<span class="w"> </span>foo.wav><span class="w"> </span><span class="se">\</span>
<span class="w"> </span><path<span class="w"> </span>to<span class="w"> </span>bar.wav><span class="w"> </span><span class="se">\</span>
<span class="w"> </span><more<span class="w"> </span>waves<span class="w"> </span><span class="k">if</span><span class="w"> </span>any>

To see all possible options, use
./bin/ctc_decode --help
To<span class="w"> </span>see<span class="w"> </span>all<span class="w"> </span>possible<span class="w"> </span>options,<span class="w"> </span>use
<span class="w"> </span>./bin/ctc_decode<span class="w"> </span>--help

Caution:
- Only sound files <span class="o">(</span>*.wav<span class="o">)</span> with single channel are supported.
- It assumes the model is conformer_ctc/transformer.py from icefall.
If you use a different model, you have to change the code
related to <span class="sb">`</span>model.forward<span class="sb">`</span> <span class="k">in</span> this file.
<span class="w"> </span>-<span class="w"> </span>Only<span class="w"> </span>sound<span class="w"> </span>files<span class="w"> </span><span class="o">(</span>*.wav<span class="o">)</span><span class="w"> </span>with<span class="w"> </span>single<span class="w"> </span>channel<span class="w"> </span>are<span class="w"> </span>supported.
<span class="w"> </span>-<span class="w"> </span>It<span class="w"> </span>assumes<span class="w"> </span>the<span class="w"> </span>model<span class="w"> </span>is<span class="w"> </span>conformer_ctc/transformer.py<span class="w"> </span>from<span class="w"> </span>icefall.
<span class="w"> </span>If<span class="w"> </span>you<span class="w"> </span>use<span class="w"> </span>a<span class="w"> </span>different<span class="w"> </span>model,<span class="w"> </span>you<span class="w"> </span>have<span class="w"> </span>to<span class="w"> </span>change<span class="w"> </span>the<span class="w"> </span>code
<span class="w"> </span>related<span class="w"> </span>to<span class="w"> </span><span class="sb">`</span>model.forward<span class="sb">`</span><span class="w"> </span><span class="k">in</span><span class="w"> </span>this<span class="w"> </span>file.
</pre></div>
</div>
<section id="id2">
<h3>CTC decoding<a class="headerlink" href="#id2" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./bin/ctc_decode <span class="se">\</span>
--use_gpu <span class="nb">true</span> <span class="se">\</span>
--nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt <span class="se">\</span>
--bpe_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/bpe.model <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./bin/ctc_decode<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--use_gpu<span class="w"> </span><span class="nb">true</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--nn_model<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe_model<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
</pre></div>
</div>
<p>Its output is:</p>
@ -969,14 +969,14 @@ Caution:
</section>
<section id="id3">
<h3>HLG decoding<a class="headerlink" href="#id3" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./bin/hlg_decode <span class="se">\</span>
--use_gpu <span class="nb">true</span> <span class="se">\</span>
--nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt <span class="se">\</span>
--hlg ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt <span class="se">\</span>
--word_table ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./bin/hlg_decode<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--use_gpu<span class="w"> </span><span class="nb">true</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--nn_model<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--hlg<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--word_table<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
</pre></div>
</div>
<p>The output is:</p>
@ -1005,16 +1005,16 @@ Caution:
|
||||
</section>
|
||||
<section id="hlg-decoding-n-gram-lm-rescoring">
|
||||
<h3>HLG decoding + n-gram LM rescoring<a class="headerlink" href="#hlg-decoding-n-gram-lm-rescoring" title="Permalink to this heading"></a></h3>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./bin/ngram_lm_rescore <span class="se">\</span>
|
||||
--use_gpu <span class="nb">true</span> <span class="se">\</span>
|
||||
--nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt <span class="se">\</span>
|
||||
--hlg ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt <span class="se">\</span>
|
||||
--g ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm/G_4_gram.pt <span class="se">\</span>
|
||||
--ngram_lm_scale <span class="m">1</span>.0 <span class="se">\</span>
|
||||
--word_table ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt <span class="se">\</span>
|
||||
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav <span class="se">\</span>
|
||||
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav <span class="se">\</span>
|
||||
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./bin/ngram_lm_rescore<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--use_gpu<span class="w"> </span><span class="nb">true</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--nn_model<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--hlg<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--g<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm/G_4_gram.pt<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--ngram_lm_scale<span class="w"> </span><span class="m">1</span>.0<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--word_table<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>The output is:</p>
|
||||
@ -1047,21 +1047,21 @@ Caution:
|
||||
</section>
|
||||
<section id="hlg-decoding-n-gram-lm-rescoring-attention-decoder-rescoring">
|
||||
<h3>HLG decoding + n-gram LM rescoring + attention decoder rescoring<a class="headerlink" href="#hlg-decoding-n-gram-lm-rescoring-attention-decoder-rescoring" title="Permalink to this heading"></a></h3>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./bin/attention_rescore <span class="se">\</span>
|
||||
--use_gpu <span class="nb">true</span> <span class="se">\</span>
|
||||
--nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt <span class="se">\</span>
|
||||
--hlg ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt <span class="se">\</span>
|
||||
--g ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm/G_4_gram.pt <span class="se">\</span>
|
||||
--ngram_lm_scale <span class="m">2</span>.0 <span class="se">\</span>
|
||||
--attention_scale <span class="m">2</span>.0 <span class="se">\</span>
|
||||
--num_paths <span class="m">100</span> <span class="se">\</span>
|
||||
--nbest_scale <span class="m">0</span>.5 <span class="se">\</span>
|
||||
--word_table ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt <span class="se">\</span>
|
||||
--sos_id <span class="m">1</span> <span class="se">\</span>
|
||||
--eos_id <span class="m">1</span> <span class="se">\</span>
|
||||
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav <span class="se">\</span>
|
||||
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav <span class="se">\</span>
|
||||
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./bin/attention_rescore<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--use_gpu<span class="w"> </span><span class="nb">true</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--nn_model<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--hlg<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--g<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm/G_4_gram.pt<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--ngram_lm_scale<span class="w"> </span><span class="m">2</span>.0<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--attention_scale<span class="w"> </span><span class="m">2</span>.0<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--num_paths<span class="w"> </span><span class="m">100</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--nbest_scale<span class="w"> </span><span class="m">0</span>.5<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--word_table<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--sos_id<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--eos_id<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>The output is:</p>
@@ -149,8 +149,8 @@ That is, it has no recurrent connections.</p>
<p>The data preparation is the same as other recipes on LibriSpeech dataset,
if you have finished this step, you can skip to <code class="docutils literal notranslate"><span class="pre">Training</span></code> directly.</p>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh
</pre></div>
</div>
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
@@ -165,13 +165,13 @@ options:</p>
</div></blockquote>
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
<p>For example,</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">0</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>means to run only stage 0.</p>
<p>To run stage 2 to stage 5, use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage <span class="m">2</span> --stop-stage <span class="m">5</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">2</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">5</span>
</pre></div>
</div>
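The two examples above can be condensed into a minimal sketch of how `--stage` and `--stop-stage` select an inclusive range of pipeline stages. This is an illustration only, not the actual logic of `./prepare.sh`:

```shell
# Sketch (assumption, not prepare.sh itself): stages from --stage up to and
# including --stop-stage are executed, everything else is skipped.
stage=2
stop_stage=5
ran=""
for s in $(seq "$stage" "$stop_stage"); do
  ran="$ran $s"        # prepare.sh would execute stage $s here
done
echo "ran stages:$ran"   # → ran stages: 2 3 4 5
```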
<div class="admonition hint">
@@ -203,8 +203,8 @@ the following YouTube channel by <a class="reference external" href="https://www
<h2>Training<a class="headerlink" href="#training" title="Permalink to this heading"></a></h2>
<section id="configurable-options">
<h3>Configurable options<a class="headerlink" href="#configurable-options" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless4/train.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./pruned_transducer_stateless4/train.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows you the training options that can be passed from the commandline.
@@ -256,26 +256,26 @@ training from epoch 10, based on the state from epoch 9.</p>
<div><p><strong>Use case 1</strong>: You have 4 GPUs, but you only want to use GPU 0 and
GPU 2 for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,2"</span>
$ ./pruned_transducer_stateless4/train.py --world-size <span class="m">2</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,2"</span>
$<span class="w"> </span>./pruned_transducer_stateless4/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">2</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 2</strong>: You have 4 GPUs and you want to use all of them
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless4/train.py --world-size <span class="m">4</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./pruned_transducer_stateless4/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 3</strong>: You have 4 GPUs but you only want to use GPU 3
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"3"</span>
$ ./pruned_transducer_stateless4/train.py --world-size <span class="m">1</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"3"</span>
$<span class="w"> </span>./pruned_transducer_stateless4/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
</div></blockquote>
@@ -333,7 +333,7 @@ You will find the following files in that directory:</p>
<code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./pruned_transducer_stateless4/train.py --start-epoch <span class="m">11</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./pruned_transducer_stateless4/train.py<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">11</span>
</pre></div>
</div>
</div></blockquote>
@@ -343,7 +343,7 @@ To resume training from some checkpoint, say <code class="docutils literal notra
containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">checkpoint-436000</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./pruned_transducer_stateless4/train.py --start-batch <span class="m">436000</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./pruned_transducer_stateless4/train.py<span class="w"> </span>--start-batch<span class="w"> </span><span class="m">436000</span>
</pre></div>
</div>
</div></blockquote>
@@ -352,8 +352,8 @@ To resume training from some checkpoint, say <code class="docutils literal notra
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc, are recorded in these logs. You can visualize them by:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> pruned_transducer_stateless4/exp/tensorboard
$ tensorboard dev upload --logdir . --description <span class="s2">"pruned transducer training for LibriSpeech with icefall"</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>pruned_transducer_stateless4/exp/tensorboard
$<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w"> </span>upload<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--description<span class="w"> </span><span class="s2">"pruned transducer training for LibriSpeech with icefall"</span>
</pre></div>
</div>
</div></blockquote>
@@ -390,8 +390,8 @@ the following screenshot:</p>
<p>If you don’t have access to Google, you can use the following command
to view the tensorboard log locally:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> pruned_transducer_stateless4/exp/tensorboard
tensorboard --logdir . --port <span class="m">6008</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>pruned_transducer_stateless4/exp/tensorboard
tensorboard<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--port<span class="w"> </span><span class="m">6008</span>
</pre></div>
</div>
</div></blockquote>
@@ -416,14 +416,14 @@ you saw printed to the console during training.</p>
<section id="usage-example">
<h3>Usage example<a class="headerlink" href="#usage-example" title="Permalink to this heading"></a></h3>
<p>You can use the following command to start the training using 6 GPUs:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,1,2,3,4,5"</span>
./pruned_transducer_stateless4/train.py <span class="se">\</span>
--world-size <span class="m">6</span> <span class="se">\</span>
--num-epochs <span class="m">30</span> <span class="se">\</span>
--start-epoch <span class="m">1</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless4/exp <span class="se">\</span>
--full-libri <span class="m">1</span> <span class="se">\</span>
--max-duration <span class="m">300</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,1,2,3,4,5"</span>
./pruned_transducer_stateless4/train.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--world-size<span class="w"> </span><span class="m">6</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-epochs<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless4/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--full-libri<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">300</span>
</pre></div>
</div>
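In the command above, `--world-size 6` matches the six GPUs listed in `CUDA_VISIBLE_DEVICES`. A small sketch of keeping the two in sync (an illustration only; `train.py` does not derive the count this way):

```shell
# Assumption for illustration: count the comma-separated device IDs so that
# --world-size always equals the number of visible GPUs.
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5"
world_size=$(echo "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | wc -l | tr -d ' ')
echo "world_size=$world_size"   # → world_size=6
```

You could then pass `--world-size "$world_size"` instead of hard-coding the number.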
</section>
@@ -448,37 +448,37 @@ every <code class="docutils literal notranslate"><span class="pre">--save-every-
that produces the lowest WERs.</p>
</div></blockquote>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless4/decode.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./pruned_transducer_stateless4/decode.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the options for decoding.</p>
<p>The following shows two examples (for two types of checkpoints):</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> epoch <span class="k">in</span> <span class="m">25</span> <span class="m">20</span><span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> avg <span class="k">in</span> <span class="m">7</span> <span class="m">5</span> <span class="m">3</span> <span class="m">1</span><span class="p">;</span> <span class="k">do</span>
./pruned_transducer_stateless4/decode.py <span class="se">\</span>
--epoch <span class="nv">$epoch</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless4/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--decoding-method <span class="nv">$m</span>
<span class="k">done</span>
<span class="k">done</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span>m<span class="w"> </span><span class="k">in</span><span class="w"> </span>greedy_search<span class="w"> </span>fast_beam_search<span class="w"> </span>modified_beam_search<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>epoch<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">25</span><span class="w"> </span><span class="m">20</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>avg<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">7</span><span class="w"> </span><span class="m">5</span><span class="w"> </span><span class="m">3</span><span class="w"> </span><span class="m">1</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span>./pruned_transducer_stateless4/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="nv">$epoch</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless4/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoding-method<span class="w"> </span><span class="nv">$m</span>
<span class="w"> </span><span class="k">done</span>
<span class="w"> </span><span class="k">done</span>
<span class="k">done</span>
</pre></div>
</div>
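The nested loops above launch one `decode.py` run per (method, epoch, avg) combination. Counting the iterations shows how many decoding runs that is in total (the inner call to `decode.py` is replaced by a counter here, purely for illustration):

```shell
# Sketch: 3 decoding methods x 2 epochs x 4 averaging windows = 24 runs.
count=0
for m in greedy_search fast_beam_search modified_beam_search; do
  for epoch in 25 20; do
    for avg in 7 5 3 1; do
      count=$((count + 1))   # decode.py would run here
    done
  done
done
echo "$count decoding runs"   # → 24 decoding runs
```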
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> iter <span class="k">in</span> <span class="m">474000</span><span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> avg <span class="k">in</span> <span class="m">8</span> <span class="m">10</span> <span class="m">12</span> <span class="m">14</span> <span class="m">16</span> <span class="m">18</span><span class="p">;</span> <span class="k">do</span>
./pruned_transducer_stateless4/decode.py <span class="se">\</span>
--iter <span class="nv">$iter</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless4/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--decoding-method <span class="nv">$m</span>
<span class="k">done</span>
<span class="k">done</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span>m<span class="w"> </span><span class="k">in</span><span class="w"> </span>greedy_search<span class="w"> </span>fast_beam_search<span class="w"> </span>modified_beam_search<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>iter<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">474000</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>avg<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">8</span><span class="w"> </span><span class="m">10</span><span class="w"> </span><span class="m">12</span><span class="w"> </span><span class="m">14</span><span class="w"> </span><span class="m">16</span><span class="w"> </span><span class="m">18</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span>./pruned_transducer_stateless4/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--iter<span class="w"> </span><span class="nv">$iter</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless4/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoding-method<span class="w"> </span><span class="nv">$m</span>
<span class="w"> </span><span class="k">done</span>
<span class="w"> </span><span class="k">done</span>
<span class="k">done</span>
</pre></div>
</div>
@@ -547,11 +547,11 @@ command to extract <code class="docutils literal notranslate"><span class="pre">
<span class="nv">epoch</span><span class="o">=</span><span class="m">25</span>
<span class="nv">avg</span><span class="o">=</span><span class="m">3</span>

./pruned_transducer_stateless4/export.py <span class="se">\</span>
--exp-dir ./pruned_transducer_stateless4/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--epoch <span class="nv">$epoch</span> <span class="se">\</span>
--avg <span class="nv">$avg</span>
./pruned_transducer_stateless4/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./pruned_transducer_stateless4/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="nv">$epoch</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span>
</pre></div>
</div>
<p>It will generate a file <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless4/exp/pretrained.pt</span></code>.</p>
@@ -559,8 +559,8 @@ command to extract <code class="docutils literal notranslate"><span class="pre">
<p class="admonition-title">Hint</p>
<p>To use the generated <code class="docutils literal notranslate"><span class="pre">pretrained.pt</span></code> for <code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless4/decode.py</span></code>,
you can run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> pruned_transducer_stateless4/exp
ln -s pretrained.pt epoch-999.pt
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>pruned_transducer_stateless4/exp
ln<span class="w"> </span>-s<span class="w"> </span>pretrained.pt<span class="w"> </span>epoch-999.pt
</pre></div>
</div>
<p>And then pass <code class="docutils literal notranslate"><span class="pre">--epoch</span> <span class="pre">999</span> <span class="pre">--avg</span> <span class="pre">1</span> <span class="pre">--use-averaged-model</span> <span class="pre">0</span></code> to
@@ -568,23 +568,23 @@ ln -s pretrained.pt epoch-999.pt
</div>
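The symlink trick in the hint above works because `decode.py` resolves `epoch-999.pt` to `pretrained.pt` through the filesystem. A self-contained sketch in a throwaway directory (the file names mirror the hint; no real checkpoint is needed to see the mechanism):

```shell
# Sketch of the symlink trick: epoch-999.pt is just another name for
# pretrained.pt, so --epoch 999 --avg 1 loads the exported checkpoint.
exp=$(mktemp -d)               # stand-in for pruned_transducer_stateless4/exp
touch "$exp/pretrained.pt"     # stand-in for the exported checkpoint
ln -s pretrained.pt "$exp/epoch-999.pt"
readlink "$exp/epoch-999.pt"   # → pretrained.pt
```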
<p>To use the exported model with <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless4/pretrained.py</span></code>, you
can run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless4/pretrained.py <span class="se">\</span>
--checkpoint ./pruned_transducer_stateless4/exp/pretrained.pt <span class="se">\</span>
--bpe-model ./data/lang_bpe_500/bpe.model <span class="se">\</span>
--method greedy_search <span class="se">\</span>
/path/to/foo.wav <span class="se">\</span>
/path/to/bar.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless4/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./pruned_transducer_stateless4/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>./data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>greedy_search<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/foo.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/bar.wav
</pre></div>
</div>
</section>
<section id="export-model-using-torch-jit-script">
<h3>Export model using <code class="docutils literal notranslate"><span class="pre">torch.jit.script()</span></code><a class="headerlink" href="#export-model-using-torch-jit-script" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless4/export.py <span class="se">\</span>
--exp-dir ./pruned_transducer_stateless4/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--epoch <span class="m">25</span> <span class="se">\</span>
--avg <span class="m">3</span> <span class="se">\</span>
--jit <span class="m">1</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless4/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./pruned_transducer_stateless4/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">25</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="m">3</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--jit<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
<p>It will generate a file <code class="docutils literal notranslate"><span class="pre">cpu_jit.pt</span></code> in the given <code class="docutils literal notranslate"><span class="pre">exp_dir</span></code>. You can later
@@ -106,8 +106,8 @@ the environment for <code class="docutils literal notranslate"><span class="pre"
</div>
<section id="data-preparation">
<h2>Data preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading"></a></h2>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh
</pre></div>
</div>
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
@@ -122,13 +122,13 @@ options:</p>
</div></blockquote>
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
<p>For example,</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">0</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>means to run only stage 0.</p>
<p>To run stage 2 to stage 5, use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage <span class="m">2</span> --stop-stage <span class="m">5</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">2</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">5</span>
</pre></div>
</div>
<p>We provide the following YouTube video showing how to run <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code>.</p>
@@ -149,9 +149,9 @@ the following YouTube channel by <a class="reference external" href="https://www
the <a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/tdnn_lstm_ctc">tdnn_lstm_ctc</a>
folder.</p>
<p>The command to run the training part is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,1,2,3"</span>
$ ./tdnn_lstm_ctc/train.py --world-size <span class="m">4</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,1,2,3"</span>
$<span class="w"> </span>./tdnn_lstm_ctc/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span>
</pre></div>
</div>
<p>By default, it will run <code class="docutils literal notranslate"><span class="pre">20</span></code> epochs. Training logs and checkpoints are saved
@@ -163,7 +163,7 @@ in <code class="docutils literal notranslate"><span class="pre">tdnn_lstm_ctc/ex
<p>These are checkpoint files, containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn_lstm_ctc/train.py --start-epoch <span class="m">11</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn_lstm_ctc/train.py<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">11</span>
</pre></div>
</div>
</div></blockquote>
@@ -172,8 +172,8 @@ To resume training from some checkpoint, say <code class="docutils literal notra
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc, are recorded in these logs. You can visualize them by:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> tdnn_lstm_ctc/exp/tensorboard
$ tensorboard dev upload --logdir . --description <span class="s2">"TDNN LSTM training for librispeech with icefall"</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>tdnn_lstm_ctc/exp/tensorboard
$<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w"> </span>upload<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--description<span class="w"> </span><span class="s2">"TDNN LSTM training for librispeech with icefall"</span>
</pre></div>
</div>
</div></blockquote>
@@ -185,7 +185,7 @@ you saw printed to the console during training.</p>
</ul>
</div></blockquote>
|
||||
<p>To see available training options, you can use:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn_lstm_ctc/train.py --help
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn_lstm_ctc/train.py<span class="w"> </span>--help
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>Other training options, e.g., learning rate, results dir, etc., are
|
||||
@ -199,13 +199,13 @@ you want.</p>
|
||||
<p>The decoding part uses checkpoints saved by the training part, so you have
to run the training part first.</p>
<p>The command for decoding is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0"</span>
$ ./tdnn_lstm_ctc/decode.py
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0"</span>
$<span class="w"> </span>./tdnn_lstm_ctc/decode.py
</pre></div>
</div>
<p>You will see the WER in the output log.</p>
<p>Decoded results are saved in <code class="docutils literal notranslate"><span class="pre">tdnn_lstm_ctc/exp</span></code>.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn_lstm_ctc/decode.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn_lstm_ctc/decode.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows you the available decoding options.</p>
@@ -222,7 +222,7 @@ For instance, <code class="docutils literal notranslate"><span class="pre">./tdn
to be averaged. The averaged model is used for decoding.
For example, the following command:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn_lstm_ctc/decode.py --epoch <span class="m">10</span> --avg <span class="m">3</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn_lstm_ctc/decode.py<span class="w"> </span>--epoch<span class="w"> </span><span class="m">10</span><span class="w"> </span>--avg<span class="w"> </span><span class="m">3</span>
</pre></div>
</div>
</div></blockquote>
@@ -251,11 +251,11 @@ at the same time.</p>
</section>
<section id="download-the-pre-trained-model">
<h3>Download the pre-trained model<a class="headerlink" href="#download-the-pre-trained-model" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ mkdir tmp
$ <span class="nb">cd</span> tmp
$ git lfs install
$ git clone https://huggingface.co/pkufool/icefall_asr_librispeech_tdnn-lstm_ctc
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>mkdir<span class="w"> </span>tmp
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>tmp
$<span class="w"> </span>git<span class="w"> </span>lfs<span class="w"> </span>install
$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/pkufool/icefall_asr_librispeech_tdnn-lstm_ctc
</pre></div>
</div>
<div class="admonition caution">
@@ -267,29 +267,29 @@ $ git clone https://huggingface.co/pkufool/icefall_asr_librispeech_tdnn-lstm_ctc
<p>In order to use this pre-trained model, your k2 version has to be v1.7 or later.</p>
</div>
<p>After downloading, you will have the following files:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ tree tmp
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>tree<span class="w"> </span>tmp
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>tmp/
<span class="sb">`</span>-- icefall_asr_librispeech_tdnn-lstm_ctc
<span class="p">|</span>-- README.md
<span class="p">|</span>-- data
<span class="p">|</span> <span class="p">|</span>-- lang_phone
<span class="p">|</span> <span class="p">|</span> <span class="p">|</span>-- HLG.pt
<span class="p">|</span> <span class="p">|</span> <span class="p">|</span>-- tokens.txt
<span class="p">|</span> <span class="p">|</span> <span class="sb">`</span>-- words.txt
<span class="p">|</span> <span class="sb">`</span>-- lm
<span class="p">|</span> <span class="sb">`</span>-- G_4_gram.pt
<span class="p">|</span>-- exp
<span class="p">|</span> <span class="sb">`</span>-- pretrained.pt
<span class="sb">`</span>-- test_wavs
<span class="p">|</span>-- <span class="m">1089</span>-134686-0001.flac
<span class="p">|</span>-- <span class="m">1221</span>-135766-0001.flac
<span class="p">|</span>-- <span class="m">1221</span>-135766-0002.flac
<span class="sb">`</span>-- trans.txt
<span class="sb">`</span>--<span class="w"> </span>icefall_asr_librispeech_tdnn-lstm_ctc
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>README.md
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>data
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>lang_phone
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>HLG.pt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>tokens.txt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>words.txt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>lm
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>G_4_gram.pt
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>exp
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>pretrained.pt
<span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>test_wavs
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span><span class="m">1089</span>-134686-0001.flac
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span><span class="m">1221</span>-135766-0001.flac
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span><span class="m">1221</span>-135766-0002.flac
<span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>trans.txt

<span class="m">6</span> directories, <span class="m">10</span> files
<span class="m">6</span><span class="w"> </span>directories,<span class="w"> </span><span class="m">10</span><span class="w"> </span>files
</pre></div>
</div>
<p><strong>File descriptions</strong>:</p>
@@ -335,56 +335,56 @@ Note: We have removed optimizer <code class="docutils literal notranslate"><span
</ul>
</div></blockquote>
<p>Information about the test sound files is listed below:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ soxi tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/*.flac
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>soxi<span class="w"> </span>tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/*.flac

Input File : <span class="s1">'tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac'</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">16000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:06.62 <span class="o">=</span> <span class="m">106000</span> samples ~ <span class="m">496</span>.875 CDDA sectors
File Size : 116k
Bit Rate : 140k
Sample Encoding: <span class="m">16</span>-bit FLAC
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">'tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac'</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">16000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:06.62<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">106000</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">496</span>.875<span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>116k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>140k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>FLAC


Input File : <span class="s1">'tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac'</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">16000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:16.71 <span class="o">=</span> <span class="m">267440</span> samples ~ <span class="m">1253</span>.62 CDDA sectors
File Size : 343k
Bit Rate : 164k
Sample Encoding: <span class="m">16</span>-bit FLAC
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">'tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac'</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">16000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:16.71<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">267440</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">1253</span>.62<span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>343k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>164k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>FLAC


Input File : <span class="s1">'tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac'</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">16000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:04.83 <span class="o">=</span> <span class="m">77200</span> samples ~ <span class="m">361</span>.875 CDDA sectors
File Size : 105k
Bit Rate : 174k
Sample Encoding: <span class="m">16</span>-bit FLAC
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">'tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac'</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">16000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:04.83<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">77200</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">361</span>.875<span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>105k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>174k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>FLAC

Total Duration of <span class="m">3</span> files: <span class="m">00</span>:00:28.16
Total<span class="w"> </span>Duration<span class="w"> </span>of<span class="w"> </span><span class="m">3</span><span class="w"> </span>files:<span class="w"> </span><span class="m">00</span>:00:28.16
</pre></div>
</div>
</section>
<section id="inference-with-a-pre-trained-model">
<h3>Inference with a pre-trained model<a class="headerlink" href="#inference-with-a-pre-trained-model" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./tdnn_lstm_ctc/pretrained.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./tdnn_lstm_ctc/pretrained.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the usage information of <code class="docutils literal notranslate"><span class="pre">./tdnn_lstm_ctc/pretrained.py</span></code>.</p>
<p>To decode with <code class="docutils literal notranslate"><span class="pre">1best</span></code> method, we can use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./tdnn_lstm_ctc/pretrained.py <span class="se">\</span>
--checkpoint ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/exp/pretraind.pt <span class="se">\</span>
--words-file ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/words.txt <span class="se">\</span>
--HLG ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/HLG.pt <span class="se">\</span>
./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac <span class="se">\</span>
./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac <span class="se">\</span>
./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./tdnn_lstm_ctc/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/exp/pretraind.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--words-file<span class="w"> </span>./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/words.txt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--HLG<span class="w"> </span>./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/HLG.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac
</pre></div>
</div>
<p>The output is:</p>
@@ -410,16 +410,16 @@ $ ./tdnn_lstm_ctc/pretrained.py --help
</pre></div>
</div>
<p>To decode with <code class="docutils literal notranslate"><span class="pre">whole-lattice-rescoring</span></code> method, you can use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./tdnn_lstm_ctc/pretrained.py <span class="se">\</span>
--checkpoint ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/exp/pretraind.pt <span class="se">\</span>
--words-file ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/words.txt <span class="se">\</span>
--HLG ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/HLG.pt <span class="se">\</span>
--method whole-lattice-rescoring <span class="se">\</span>
--G ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lm/G_4_gram.pt <span class="se">\</span>
--ngram-lm-scale <span class="m">0</span>.8 <span class="se">\</span>
./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac <span class="se">\</span>
./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac <span class="se">\</span>
./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./tdnn_lstm_ctc/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/exp/pretraind.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--words-file<span class="w"> </span>./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/words.txt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--HLG<span class="w"> </span>./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/HLG.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>whole-lattice-rescoring<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--G<span class="w"> </span>./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lm/G_4_gram.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--ngram-lm-scale<span class="w"> </span><span class="m">0</span>.8<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac
</pre></div>
</div>
<p>The decoding output is:</p>
@@ -116,8 +116,8 @@ similar to the one used in conformer (referred to as “LConv”) before the fra
</div>
<section id="data-preparation">
<h2>Data preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading"></a></h2>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh
</pre></div>
</div>
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
@@ -136,13 +136,13 @@ options:</p>
</div></blockquote>
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
<p>For example,</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">0</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>means to run only stage 0.</p>
<p>To run stage 2 to stage 5, use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage <span class="m">2</span> --stop-stage <span class="m">5</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">2</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">5</span>
</pre></div>
</div>
<div class="admonition hint">
@@ -175,8 +175,8 @@ the following YouTube channel by <a class="reference external" href="https://www
<p>For stability, it doesn’t use the blank skip method until model warm-up.</p>
<section id="configurable-options">
<h3>Configurable options<a class="headerlink" href="#configurable-options" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless7_ctc_bs/train.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./pruned_transducer_stateless7_ctc_bs/train.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows you the training options that can be passed from the commandline.
@@ -225,26 +225,26 @@ training from epoch 10, based on the state from epoch 9.</p>
<div><p><strong>Use case 1</strong>: You have 4 GPUs, but you only want to use GPU 0 and
GPU 2 for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,2"</span>
$ ./pruned_transducer_stateless7_ctc_bs/train.py --world-size <span class="m">2</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,2"</span>
$<span class="w"> </span>./pruned_transducer_stateless7_ctc_bs/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">2</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 2</strong>: You have 4 GPUs and you want to use all of them
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless7_ctc_bs/train.py --world-size <span class="m">4</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./pruned_transducer_stateless7_ctc_bs/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 3</strong>: You have 4 GPUs but you only want to use GPU 3
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"3"</span>
$ ./pruned_transducer_stateless7_ctc_bs/train.py --world-size <span class="m">1</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"3"</span>
$<span class="w"> </span>./pruned_transducer_stateless7_ctc_bs/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
</div></blockquote>
@@ -292,7 +292,7 @@ You will find the following files in that directory:</p>
<code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./pruned_transducer_stateless7_ctc_bs/train.py --start-epoch <span class="m">11</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./pruned_transducer_stateless7_ctc_bs/train.py<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">11</span>
</pre></div>
</div>
</div></blockquote>
@@ -302,7 +302,7 @@ To resume training from some checkpoint, say <code class="docutils literal notra
containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">checkpoint-436000</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./pruned_transducer_stateless7_ctc_bs/train.py --start-batch <span class="m">436000</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./pruned_transducer_stateless7_ctc_bs/train.py<span class="w"> </span>--start-batch<span class="w"> </span><span class="m">436000</span>
</pre></div>
</div>
</div></blockquote>
@@ -311,8 +311,8 @@ To resume training from some checkpoint, say <code class="docutils literal notra
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc, are recorded in these logs. You can visualize them by:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> pruned_transducer_stateless7_ctc_bs/exp/tensorboard
$ tensorboard dev upload --logdir . --description <span class="s2">"Zipformer-CTC co-training using blank skip for LibriSpeech with icefall"</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>pruned_transducer_stateless7_ctc_bs/exp/tensorboard
$<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w"> </span>upload<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--description<span class="w"> </span><span class="s2">"Zipformer-CTC co-training using blank skip for LibriSpeech with icefall"</span>
</pre></div>
</div>
</div></blockquote>
@@ -336,8 +336,8 @@ tensorboard.</p>
<p>If you don’t have access to google, you can use the following command
to view the tensorboard log locally:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> pruned_transducer_stateless7_ctc_bs/exp/tensorboard
tensorboard --logdir . --port <span class="m">6008</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>pruned_transducer_stateless7_ctc_bs/exp/tensorboard
tensorboard<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--port<span class="w"> </span><span class="m">6008</span>
</pre></div>
</div>
</div></blockquote>
@@ -362,15 +362,15 @@ you saw printed to the console during training.</p>
<section id="usage-example">
<h3>Usage example<a class="headerlink" href="#usage-example" title="Permalink to this heading"></a></h3>
<p>You can use the following command to start the training using 4 GPUs:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,1,2,3"</span>
./pruned_transducer_stateless7_ctc_bs/train.py <span class="se">\</span>
--world-size <span class="m">4</span> <span class="se">\</span>
--num-epochs <span class="m">30</span> <span class="se">\</span>
--start-epoch <span class="m">1</span> <span class="se">\</span>
--full-libri <span class="m">1</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless7_ctc_bs/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--use-fp16 <span class="m">1</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,1,2,3"</span>
./pruned_transducer_stateless7_ctc_bs/train.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-epochs<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--full-libri<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless7_ctc_bs/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--use-fp16<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
</section>
@@ -395,30 +395,30 @@ every <code class="docutils literal notranslate"><span class="pre">--save-every-
that produces the lowest WERs.</p>
</div></blockquote>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless7_ctc_bs/ctc_guild_decode_bs.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./pruned_transducer_stateless7_ctc_bs/ctc_guild_decode_bs.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the options for decoding.</p>
<p>The following shows the example using <code class="docutils literal notranslate"><span class="pre">epoch-*.pt</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
./pruned_transducer_stateless7_ctc_bs/ctc_guild_decode_bs.py <span class="se">\</span>
--epoch <span class="m">30</span> <span class="se">\</span>
--avg <span class="m">13</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless7_ctc_bs/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--decoding-method <span class="nv">$m</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span>m<span class="w"> </span><span class="k">in</span><span class="w"> </span>greedy_search<span class="w"> </span>fast_beam_search<span class="w"> </span>modified_beam_search<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span>./pruned_transducer_stateless7_ctc_bs/ctc_guild_decode_bs.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="m">13</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless7_ctc_bs/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoding-method<span class="w"> </span><span class="nv">$m</span>
<span class="k">done</span>
</pre></div>
</div>
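Here `--avg 13` asks the decode script to average the parameters of the last 13 `epoch-*.pt` checkpoints before decoding. Conceptually, checkpoint averaging is just an element-wise mean over the saved state dicts; the following plain-Python sketch illustrates the idea (it is not icefall's implementation, which operates on PyTorch tensors, and the parameter name is made up):

```python
def average_checkpoints(state_dicts):
    """Element-wise mean of several model state dicts.

    Each state dict maps a parameter name to a list of floats
    (a stand-in for a tensor in this sketch).
    """
    n = len(state_dicts)
    return {
        k: [sum(vals) / n for vals in zip(*(sd[k] for sd in state_dicts))]
        for k in state_dicts[0]
    }

# Three toy "checkpoints" with a single parameter each.
ckpts = [
    {"encoder.weight": [1.0, 2.0]},
    {"encoder.weight": [3.0, 4.0]},
    {"encoder.weight": [5.0, 6.0]},
]
print(average_checkpoints(ckpts))  # {'encoder.weight': [3.0, 4.0]}
```

Averaging several nearby checkpoints typically gives a small WER improvement over decoding with a single checkpoint, which is why the decode scripts expose `--epoch`/`--avg` as a pair.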
<p>To test the CTC branch, you can use the following command:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> ctc-decoding 1best<span class="p">;</span> <span class="k">do</span>
./pruned_transducer_stateless7_ctc_bs/ctc_guild_decode_bs.py <span class="se">\</span>
--epoch <span class="m">30</span> <span class="se">\</span>
--avg <span class="m">13</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless7_ctc_bs/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--decoding-method <span class="nv">$m</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span>m<span class="w"> </span><span class="k">in</span><span class="w"> </span>ctc-decoding<span class="w"> </span>1best<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span>./pruned_transducer_stateless7_ctc_bs/ctc_guild_decode_bs.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="m">13</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless7_ctc_bs/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoding-method<span class="w"> </span><span class="nv">$m</span>
<span class="k">done</span>
</pre></div>
</div>
@@ -432,12 +432,12 @@ $ ./pruned_transducer_stateless7_ctc_bs/ctc_guild_decode_bs.py --help
<code class="docutils literal notranslate"><span class="pre">optimizer.state_dict()</span></code>. It is useful for resuming training. But after training,
we are interested only in <code class="docutils literal notranslate"><span class="pre">model.state_dict()</span></code>. You can use the following
command to extract <code class="docutils literal notranslate"><span class="pre">model.state_dict()</span></code>.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_ctc_bs/export.py <span class="se">\</span>
--exp-dir ./pruned_transducer_stateless7_ctc_bs/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--epoch <span class="m">30</span> <span class="se">\</span>
--avg <span class="m">13</span> <span class="se">\</span>
--jit <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_ctc_bs/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./pruned_transducer_stateless7_ctc_bs/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="m">13</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--jit<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>It will generate a file <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless7_ctc_bs/exp/pretrained.pt</span></code>.</p>
@@ -445,8 +445,8 @@ command to extract <code class="docutils literal notranslate"><span class="pre">
<p class="admonition-title">Hint</p>
<p>To use the generated <code class="docutils literal notranslate"><span class="pre">pretrained.pt</span></code> for <code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless7_ctc_bs/ctc_guild_decode_bs.py</span></code>,
you can run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> pruned_transducer_stateless7_ctc_bs/exp
ln -s pretrained.pt epoch-9999.pt
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>pruned_transducer_stateless7_ctc_bs/exp
ln<span class="w"> </span>-s<span class="w"> </span>pretrained.pt<span class="w"> </span>epoch-9999.pt
</pre></div>
</div>
<p>And then pass <code class="docutils literal notranslate"><span class="pre">--epoch</span> <span class="pre">9999</span> <span class="pre">--avg</span> <span class="pre">1</span> <span class="pre">--use-averaged-model</span> <span class="pre">0</span></code> to
@@ -454,33 +454,33 @@ ln -s pretrained epoch-9999.pt
</div>
<p>To use the exported model with <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless7_ctc_bs/pretrained.py</span></code>, you
can run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_ctc_bs/pretrained.py <span class="se">\</span>
--checkpoint ./pruned_transducer_stateless7_ctc_bs/exp/pretrained.pt <span class="se">\</span>
--bpe-model ./data/lang_bpe_500/bpe.model <span class="se">\</span>
--method greedy_search <span class="se">\</span>
/path/to/foo.wav <span class="se">\</span>
/path/to/bar.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_ctc_bs/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./pruned_transducer_stateless7_ctc_bs/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>./data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>greedy_search<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/foo.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/bar.wav
</pre></div>
</div>
<p>To test the CTC branch using the exported model with <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless7_ctc_bs/pretrained_ctc.py</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_ctc_bs/pretrained_ctc.py <span class="se">\</span>
--checkpoint ./pruned_transducer_stateless7_ctc_bs/exp/pretrained.pt <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--method ctc-decoding <span class="se">\</span>
--sample-rate <span class="m">16000</span> <span class="se">\</span>
/path/to/foo.wav <span class="se">\</span>
/path/to/bar.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_ctc_bs/pretrained_ctc.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./pruned_transducer_stateless7_ctc_bs/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>ctc-decoding<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--sample-rate<span class="w"> </span><span class="m">16000</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/foo.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/bar.wav
</pre></div>
</div>
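The `ctc-decoding` method ultimately relies on the standard CTC collapse rule: repeated frame-level token IDs are merged and blank symbols removed to obtain the token sequence. The toy sketch below shows only that collapse rule (token IDs and the blank ID are made up for illustration; icefall's actual decoding is more involved):

```python
def ctc_greedy_collapse(frame_ids, blank_id=0):
    """Collapse repeated frame-level token IDs and drop blanks,
    as in the CTC label-to-sequence mapping."""
    out = []
    prev = None
    for t in frame_ids:
        # Keep a token only when it differs from the previous frame
        # and is not the blank symbol.
        if t != prev and t != blank_id:
            out.append(t)
        prev = t
    return out

# Per-frame IDs "7 7 blank 7 3 3" collapse to [7, 7, 3]:
# the blank separates the two 7s, so both survive.
print(ctc_greedy_collapse([7, 7, 0, 7, 3, 3]))  # [7, 7, 3]
```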
</section>
<section id="export-model-using-torch-jit-script">
<h3>Export model using <code class="docutils literal notranslate"><span class="pre">torch.jit.script()</span></code><a class="headerlink" href="#export-model-using-torch-jit-script" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_ctc_bs/export.py <span class="se">\</span>
--exp-dir ./pruned_transducer_stateless7_ctc_bs/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--epoch <span class="m">30</span> <span class="se">\</span>
--avg <span class="m">13</span> <span class="se">\</span>
--jit <span class="m">1</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_ctc_bs/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./pruned_transducer_stateless7_ctc_bs/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="m">13</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--jit<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
<p>It will generate a file <code class="docutils literal notranslate"><span class="pre">cpu_jit.pt</span></code> in the given <code class="docutils literal notranslate"><span class="pre">exp_dir</span></code>. You can later
@@ -488,20 +488,20 @@ load it by <code class="docutils literal notranslate"><span class="pre">torch.ji
<p>Note <code class="docutils literal notranslate"><span class="pre">cpu</span></code> in the name <code class="docutils literal notranslate"><span class="pre">cpu_jit.pt</span></code> means the parameters when loaded into Python
are on CPU. You can use <code class="docutils literal notranslate"><span class="pre">to("cuda")</span></code> to move them to a CUDA device.</p>
<p>To use the generated files with <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless7_ctc_bs/jit_pretrained.py</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_ctc_bs/jit_pretrained.py <span class="se">\</span>
--nn-model-filename ./pruned_transducer_stateless7_ctc_bs/exp/cpu_jit.pt <span class="se">\</span>
/path/to/foo.wav <span class="se">\</span>
/path/to/bar.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_ctc_bs/jit_pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--nn-model-filename<span class="w"> </span>./pruned_transducer_stateless7_ctc_bs/exp/cpu_jit.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/foo.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/bar.wav
</pre></div>
</div>
<p>To test the CTC branch using the generated files with <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless7_ctc_bs/jit_pretrained_ctc.py</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_ctc_bs/jit_pretrained_ctc.py <span class="se">\</span>
--model-filename ./pruned_transducer_stateless7_ctc_bs/exp/cpu_jit.pt <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--method ctc-decoding <span class="se">\</span>
--sample-rate <span class="m">16000</span> <span class="se">\</span>
/path/to/foo.wav <span class="se">\</span>
/path/to/bar.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_ctc_bs/jit_pretrained_ctc.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--model-filename<span class="w"> </span>./pruned_transducer_stateless7_ctc_bs/exp/cpu_jit.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>ctc-decoding<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--sample-rate<span class="w"> </span><span class="m">16000</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/foo.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/bar.wav
</pre></div>
</div>
</section>
@@ -113,8 +113,8 @@ with the <a class="reference external" href="https://www.openslr.org/12">LibriSp
</div>
<section id="data-preparation">
<h2>Data preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading"></a></h2>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh
</pre></div>
</div>
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
@@ -133,13 +133,13 @@ options:</p>
</div></blockquote>
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
<p>For example,</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">0</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>means to run only stage 0.</p>
<p>To run stage 2 to stage 5, use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage <span class="m">2</span> --stop-stage <span class="m">5</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">2</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">5</span>
</pre></div>
</div>
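A stage `N` in `prepare.sh` runs exactly when `stage <= N <= stop_stage`, which is why `--stage 2 --stop-stage 5` executes stages 2 through 5 and nothing else. A small sketch of that gating logic (the stage numbers here are illustrative):

```python
def stages_to_run(stage, stop_stage, all_stages):
    """Mimic prepare.sh gating: stage N runs iff stage <= N <= stop_stage."""
    return [n for n in all_stages if stage <= n <= stop_stage]

# --stage 2 --stop-stage 5 over a script with stages 0..6
print(stages_to_run(2, 5, range(0, 7)))  # [2, 3, 4, 5]
# --stage 0 --stop-stage 0 runs only stage 0
print(stages_to_run(0, 0, range(0, 7)))  # [0]
```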
<div class="admonition hint">
@@ -172,8 +172,8 @@ the following YouTube channel by <a class="reference external" href="https://www
<p>For stability, it uses CTC loss for model warm-up and then switches to MMI loss.</p>
<section id="configurable-options">
<h3>Configurable options<a class="headerlink" href="#configurable-options" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./zipformer_mmi/train.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./zipformer_mmi/train.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows you the training options that can be passed from the command line.
@@ -222,26 +222,26 @@ training from epoch 10, based on the state from epoch 9.</p>
<div><p><strong>Use case 1</strong>: You have 4 GPUs, but you only want to use GPU 0 and
GPU 2 for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,2"</span>
$ ./zipformer_mmi/train.py --world-size <span class="m">2</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,2"</span>
$<span class="w"> </span>./zipformer_mmi/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">2</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 2</strong>: You have 4 GPUs and you want to use all of them
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./zipformer_mmi/train.py --world-size <span class="m">4</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./zipformer_mmi/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 3</strong>: You have 4 GPUs but you only want to use GPU 3
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"3"</span>
$ ./zipformer_mmi/train.py --world-size <span class="m">1</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"3"</span>
$<span class="w"> </span>./zipformer_mmi/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
</div></blockquote>
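In each use case, `CUDA_VISIBLE_DEVICES` renumbers the selected physical GPUs as local devices `0 .. world_size - 1`, so `--world-size` must match the number of IDs listed in the variable. A sketch of that mapping (the helper name is made up for illustration):

```python
def local_to_physical_gpus(cuda_visible_devices):
    """Map local device ordinals (what the process sees) to physical GPU IDs."""
    ids = [int(x) for x in cuda_visible_devices.split(",") if x.strip()]
    return {local: phys for local, phys in enumerate(ids)}

# Use case 1: CUDA_VISIBLE_DEVICES="0,2" with --world-size 2:
# local device 0 is physical GPU 0, local device 1 is physical GPU 2.
print(local_to_physical_gpus("0,2"))  # {0: 0, 1: 2}
```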
@@ -289,7 +289,7 @@ You will find the following files in that directory:</p>
<code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./zipformer_mmi/train.py --start-epoch <span class="m">11</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./zipformer_mmi/train.py<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">11</span>
</pre></div>
</div>
</div></blockquote>
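That is, `--start-epoch 11` loads the state saved at the end of epoch 10, i.e. `epoch-10.pt`, and continues from there. The filename convention can be sketched as follows (a hypothetical helper, not icefall's code):

```python
def resume_checkpoint(start_epoch):
    """Checkpoint file loaded when training restarts at start_epoch."""
    # start_epoch 1 trains from scratch; there is no epoch-0.pt to load.
    assert start_epoch >= 2, "start_epoch 1 starts from scratch"
    return f"epoch-{start_epoch - 1}.pt"

print(resume_checkpoint(11))  # epoch-10.pt
```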
@@ -299,7 +299,7 @@ To resume training from some checkpoint, say <code class="docutils literal notra
containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">checkpoint-436000</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./zipformer_mmi/train.py --start-batch <span class="m">436000</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./zipformer_mmi/train.py<span class="w"> </span>--start-batch<span class="w"> </span><span class="m">436000</span>
</pre></div>
</div>
</div></blockquote>
@@ -308,8 +308,8 @@ To resume training from some checkpoint, say <code class="docutils literal notra
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc., are recorded in these logs. You can visualize them by:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> zipformer_mmi/exp/tensorboard
$ tensorboard dev upload --logdir . --description <span class="s2">"Zipformer MMI training for LibriSpeech with icefall"</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>zipformer_mmi/exp/tensorboard
$<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w"> </span>upload<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--description<span class="w"> </span><span class="s2">"Zipformer MMI training for LibriSpeech with icefall"</span>
</pre></div>
</div>
</div></blockquote>
@@ -333,8 +333,8 @@ tensorboard.</p>
<p>If you don’t have access to Google, you can use the following command
to view the TensorBoard log locally:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> zipformer_mmi/exp/tensorboard
tensorboard --logdir . --port <span class="m">6008</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>zipformer_mmi/exp/tensorboard
tensorboard<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--port<span class="w"> </span><span class="m">6008</span>
</pre></div>
</div>
</div></blockquote>
@@ -359,16 +359,16 @@ you saw printed to the console during training.</p>
<section id="usage-example">
<h3>Usage example<a class="headerlink" href="#usage-example" title="Permalink to this heading"></a></h3>
<p>You can use the following command to start the training using 4 GPUs:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,1,2,3"</span>
./zipformer_mmi/train.py <span class="se">\</span>
--world-size <span class="m">4</span> <span class="se">\</span>
--num-epochs <span class="m">30</span> <span class="se">\</span>
--start-epoch <span class="m">1</span> <span class="se">\</span>
--full-libri <span class="m">1</span> <span class="se">\</span>
--exp-dir zipformer_mmi/exp <span class="se">\</span>
--max-duration <span class="m">500</span> <span class="se">\</span>
--use-fp16 <span class="m">1</span> <span class="se">\</span>
--num-workers <span class="m">2</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,1,2,3"</span>
./zipformer_mmi/train.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-epochs<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--full-libri<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>zipformer_mmi/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">500</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--use-fp16<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-workers<span class="w"> </span><span class="m">2</span>
</pre></div>
</div>
</section>
@@ -393,22 +393,22 @@ every <code class="docutils literal notranslate"><span class="pre">--save-every-
that produces the lowest WERs.</p>
</div></blockquote>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./zipformer_mmi/decode.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./zipformer_mmi/decode.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the options for decoding.</p>
<p>The following shows the example using <code class="docutils literal notranslate"><span class="pre">epoch-*.pt</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> nbest nbest-rescoring-LG nbest-rescoring-3-gram nbest-rescoring-4-gram<span class="p">;</span> <span class="k">do</span>
|
||||
./zipformer_mmi/decode.py <span class="se">\</span>
|
||||
--epoch <span class="m">30</span> <span class="se">\</span>
|
||||
--avg <span class="m">10</span> <span class="se">\</span>
|
||||
--exp-dir ./zipformer_mmi/exp/ <span class="se">\</span>
|
||||
--max-duration <span class="m">100</span> <span class="se">\</span>
|
||||
--lang-dir data/lang_bpe_500 <span class="se">\</span>
|
||||
--nbest-scale <span class="m">1</span>.2 <span class="se">\</span>
|
||||
--hp-scale <span class="m">1</span>.0 <span class="se">\</span>
|
||||
--decoding-method <span class="nv">$m</span>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span>m<span class="w"> </span><span class="k">in</span><span class="w"> </span>nbest<span class="w"> </span>nbest-rescoring-LG<span class="w"> </span>nbest-rescoring-3-gram<span class="w"> </span>nbest-rescoring-4-gram<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span>./zipformer_mmi/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="m">10</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./zipformer_mmi/exp/<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">100</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--lang-dir<span class="w"> </span>data/lang_bpe_500<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--nbest-scale<span class="w"> </span><span class="m">1</span>.2<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--hp-scale<span class="w"> </span><span class="m">1</span>.0<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoding-method<span class="w"> </span><span class="nv">$m</span>
<span class="k">done</span>
</pre></div>
</div>
@ -422,12 +422,12 @@ $ ./zipformer_mmi/decode.py --help
<code class="docutils literal notranslate"><span class="pre">optimizer.state_dict()</span></code>. It is useful for resuming training. But after training,
we are interested only in <code class="docutils literal notranslate"><span class="pre">model.state_dict()</span></code>. You can use the following
command to extract <code class="docutils literal notranslate"><span class="pre">model.state_dict()</span></code>.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./zipformer_mmi/export.py <span class="se">\</span>
--exp-dir ./zipformer_mmi/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--epoch <span class="m">30</span> <span class="se">\</span>
--avg <span class="m">9</span> <span class="se">\</span>
--jit <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./zipformer_mmi/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./zipformer_mmi/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="m">9</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--jit<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>It will generate a file <code class="docutils literal notranslate"><span class="pre">./zipformer_mmi/exp/pretrained.pt</span></code>.</p>
@ -435,8 +435,8 @@ command to extract <code class="docutils literal notranslate"><span class="pre">
<p class="admonition-title">Hint</p>
<p>To use the generated <code class="docutils literal notranslate"><span class="pre">pretrained.pt</span></code> for <code class="docutils literal notranslate"><span class="pre">zipformer_mmi/decode.py</span></code>,
you can run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> zipformer_mmi/exp
ln -s pretrained.pt epoch-9999.pt
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>zipformer_mmi/exp
ln<span class="w"> </span>-s<span class="w"> </span>pretrained.pt<span class="w"> </span>epoch-9999.pt
</pre></div>
</div>
<p>And then pass <code class="docutils literal notranslate"><span class="pre">--epoch</span> <span class="pre">9999</span> <span class="pre">--avg</span> <span class="pre">1</span> <span class="pre">--use-averaged-model</span> <span class="pre">0</span></code> to
@ -444,23 +444,23 @@ ln -s pretrained epoch-9999.pt
</div>
<p>To use the exported model with <code class="docutils literal notranslate"><span class="pre">./zipformer_mmi/pretrained.py</span></code>, you
can run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./zipformer_mmi/pretrained.py <span class="se">\</span>
--checkpoint ./zipformer_mmi/exp/pretrained.pt <span class="se">\</span>
--bpe-model ./data/lang_bpe_500/bpe.model <span class="se">\</span>
--method 1best <span class="se">\</span>
/path/to/foo.wav <span class="se">\</span>
/path/to/bar.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./zipformer_mmi/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./zipformer_mmi/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>./data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>1best<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/foo.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/bar.wav
</pre></div>
</div>
</section>
<section id="export-model-using-torch-jit-script">
<h3>Export model using <code class="docutils literal notranslate"><span class="pre">torch.jit.script()</span></code><a class="headerlink" href="#export-model-using-torch-jit-script" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./zipformer_mmi/export.py <span class="se">\</span>
--exp-dir ./zipformer_mmi/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--epoch <span class="m">30</span> <span class="se">\</span>
--avg <span class="m">9</span> <span class="se">\</span>
--jit <span class="m">1</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./zipformer_mmi/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./zipformer_mmi/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="m">9</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--jit<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
<p>It will generate a file <code class="docutils literal notranslate"><span class="pre">cpu_jit.pt</span></code> in the given <code class="docutils literal notranslate"><span class="pre">exp_dir</span></code>. You can later
@ -468,12 +468,12 @@ load it by <code class="docutils literal notranslate"><span class="pre">torch.ji
<p>Note that <code class="docutils literal notranslate"><span class="pre">cpu</span></code> in the name <code class="docutils literal notranslate"><span class="pre">cpu_jit.pt</span></code> means that the parameters, when loaded into Python,
are on CPU. You can use <code class="docutils literal notranslate"><span class="pre">to("cuda")</span></code> to move them to a CUDA device.</p>
<p>To use the generated files with <code class="docutils literal notranslate"><span class="pre">./zipformer_mmi/jit_pretrained.py</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./zipformer_mmi/jit_pretrained.py <span class="se">\</span>
--nn-model-filename ./zipformer_mmi/exp/cpu_jit.pt <span class="se">\</span>
--bpe-model ./data/lang_bpe_500/bpe.model <span class="se">\</span>
--method 1best <span class="se">\</span>
/path/to/foo.wav <span class="se">\</span>
/path/to/bar.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./zipformer_mmi/jit_pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--nn-model-filename<span class="w"> </span>./zipformer_mmi/exp/cpu_jit.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>./data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>1best<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/foo.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/bar.wav
</pre></div>
</div>
</section>
@ -103,8 +103,8 @@ the environment for <code class="docutils literal notranslate"><span class="pre"
</div>
<section id="data-preparation">
<h2>Data preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading"></a></h2>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/timit/ASR
$ ./prepare.sh
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/timit/ASR
$<span class="w"> </span>./prepare.sh
</pre></div>
</div>
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
@ -119,13 +119,13 @@ options:</p>
</div></blockquote>
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
<p>For example,</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/timit/ASR
$ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/timit/ASR
$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">0</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>means to run only stage 0.</p>
<p>To run stage 2 to stage 5, use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage <span class="m">2</span> --stop-stage <span class="m">5</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">2</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">5</span>
</pre></div>
</div>
</section>
@ -139,9 +139,9 @@ folder.</p>
<p>TIMIT is a very small dataset. So one GPU is enough.</p>
</div>
<p>The command to run the training part is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/timit/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0"</span>
$ ./tdnn_ligru_ctc/train.py
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/timit/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0"</span>
$<span class="w"> </span>./tdnn_ligru_ctc/train.py
</pre></div>
</div>
<p>By default, it will run <code class="docutils literal notranslate"><span class="pre">25</span></code> epochs. Training logs and checkpoints are saved
@ -153,7 +153,7 @@ in <code class="docutils literal notranslate"><span class="pre">tdnn_ligru_ctc/e
<p>These are checkpoint files, containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn_ligru_ctc/train.py --start-epoch <span class="m">11</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn_ligru_ctc/train.py<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">11</span>
</pre></div>
</div>
</div></blockquote>
@ -162,8 +162,8 @@ To resume training from some checkpoint, say <code class="docutils literal notra
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc., are recorded in these logs. You can visualize them by:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> tdnn_ligru_ctc/exp/tensorboard
$ tensorboard dev upload --logdir . --description <span class="s2">"TDNN ligru training for timit with icefall"</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>tdnn_ligru_ctc/exp/tensorboard
$<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w"> </span>upload<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--description<span class="w"> </span><span class="s2">"TDNN ligru training for timit with icefall"</span>
</pre></div>
</div>
</div></blockquote>
@ -175,7 +175,7 @@ you saw printed to the console during training.</p>
</ul>
</div></blockquote>
<p>To see available training options, you can use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn_ligru_ctc/train.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn_ligru_ctc/train.py<span class="w"> </span>--help
</pre></div>
</div>
<p>Other training options, e.g., learning rate, results dir, etc., are
@ -189,13 +189,13 @@ you want.</p>
<p>The decoding part uses checkpoints saved by the training part, so you have
to run the training part first.</p>
<p>The command for decoding is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0"</span>
$ ./tdnn_ligru_ctc/decode.py
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0"</span>
$<span class="w"> </span>./tdnn_ligru_ctc/decode.py
</pre></div>
</div>
<p>You will see the WER in the output log.</p>
<p>Decoded results are saved in <code class="docutils literal notranslate"><span class="pre">tdnn_ligru_ctc/exp</span></code>.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn_ligru_ctc/decode.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn_ligru_ctc/decode.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows you the available decoding options.</p>
@ -212,7 +212,7 @@ For instance, <code class="docutils literal notranslate"><span class="pre">./tdn
to be averaged. The averaged model is used for decoding.
For example, the following command:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn_ligru_ctc/decode.py --epoch <span class="m">25</span> --avg <span class="m">17</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn_ligru_ctc/decode.py<span class="w"> </span>--epoch<span class="w"> </span><span class="m">25</span><span class="w"> </span>--avg<span class="w"> </span><span class="m">17</span>
</pre></div>
</div>
</div></blockquote>
@ -245,11 +245,11 @@ at the same time.</p>
</section>
<section id="download-the-pre-trained-model">
<h3>Download the pre-trained model<a class="headerlink" href="#download-the-pre-trained-model" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/timit/ASR
$ mkdir tmp-ligru
$ <span class="nb">cd</span> tmp-ligru
$ git lfs install
$ git clone https://huggingface.co/luomingshuang/icefall_asr_timit_tdnn_ligru_ctc
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/timit/ASR
$<span class="w"> </span>mkdir<span class="w"> </span>tmp-ligru
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>tmp-ligru
$<span class="w"> </span>git<span class="w"> </span>lfs<span class="w"> </span>install
$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/luomingshuang/icefall_asr_timit_tdnn_ligru_ctc
</pre></div>
</div>
<div class="admonition caution">
@ -261,29 +261,29 @@ $ git clone https://huggingface.co/luomingshuang/icefall_asr_timit_tdnn_ligru_ct
<p>In order to use this pre-trained model, your k2 version has to be v1.7 or later.</p>
</div>
<p>After downloading, you will have the following files:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/timit/ASR
$ tree tmp-ligru
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/timit/ASR
$<span class="w"> </span>tree<span class="w"> </span>tmp-ligru
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>tmp-ligru/
<span class="sb">`</span>-- icefall_asr_timit_tdnn_ligru_ctc
<span class="p">|</span>-- README.md
<span class="p">|</span>-- data
<span class="p">|</span> <span class="p">|</span>-- lang_phone
<span class="p">|</span> <span class="p">|</span> <span class="p">|</span>-- HLG.pt
<span class="p">|</span> <span class="p">|</span> <span class="p">|</span>-- tokens.txt
<span class="p">|</span> <span class="p">|</span> <span class="sb">`</span>-- words.txt
<span class="p">|</span> <span class="sb">`</span>-- lm
<span class="p">|</span> <span class="sb">`</span>-- G_4_gram.pt
<span class="p">|</span>-- exp
<span class="p">|</span> <span class="sb">`</span>-- pretrained_average_9_25.pt
<span class="sb">`</span>-- test_wavs
<span class="p">|</span>-- FDHC0_SI1559.WAV
<span class="p">|</span>-- FELC0_SI756.WAV
<span class="p">|</span>-- FMGD0_SI1564.WAV
<span class="sb">`</span>-- trans.txt
<span class="sb">`</span>--<span class="w"> </span>icefall_asr_timit_tdnn_ligru_ctc
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>README.md
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>data
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>lang_phone
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>HLG.pt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>tokens.txt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>words.txt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>lm
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>G_4_gram.pt
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>exp
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>pretrained_average_9_25.pt
<span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>test_wavs
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>FDHC0_SI1559.WAV
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>FELC0_SI756.WAV
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>FMGD0_SI1564.WAV
<span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>trans.txt
<span class="m">6</span> directories, <span class="m">10</span> files
<span class="m">6</span><span class="w"> </span>directories,<span class="w"> </span><span class="m">10</span><span class="w"> </span>files
</pre></div>
</div>
<p><strong>File descriptions</strong>:</p>
@ -329,60 +329,60 @@ Note: We have removed optimizer <code class="docutils literal notranslate"><span
</ul>
</div></blockquote>
<p>Information about the test sound files is listed below:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ffprobe -show_format tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FDHC0_SI1559.WAV
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>ffprobe<span class="w"> </span>-show_format<span class="w"> </span>tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FDHC0_SI1559.WAV
Input <span class="c1">#0, nistsphere, from 'tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FDHC0_SI1559.WAV':</span>
Input<span class="w"> </span><span class="c1">#0, nistsphere, from 'tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FDHC0_SI1559.WAV':</span>
Metadata:
database_id : TIMIT
database_version: <span class="m">1</span>.0
utterance_id : dhc0_si1559
sample_min : -4176
sample_max : <span class="m">5984</span>
Duration: <span class="m">00</span>:00:03.40, bitrate: <span class="m">258</span> kb/s
Stream <span class="c1">#0:0: Audio: pcm_s16le, 16000 Hz, 1 channels, s16, 256 kb/s</span>
<span class="w"> </span>database_id<span class="w"> </span>:<span class="w"> </span>TIMIT
<span class="w"> </span>database_version:<span class="w"> </span><span class="m">1</span>.0
<span class="w"> </span>utterance_id<span class="w"> </span>:<span class="w"> </span>dhc0_si1559
<span class="w"> </span>sample_min<span class="w"> </span>:<span class="w"> </span>-4176
<span class="w"> </span>sample_max<span class="w"> </span>:<span class="w"> </span><span class="m">5984</span>
Duration:<span class="w"> </span><span class="m">00</span>:00:03.40,<span class="w"> </span>bitrate:<span class="w"> </span><span class="m">258</span><span class="w"> </span>kb/s
<span class="w"> </span>Stream<span class="w"> </span><span class="c1">#0:0: Audio: pcm_s16le, 16000 Hz, 1 channels, s16, 256 kb/s</span>
$ ffprobe -show_format tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FELC0_SI756.WAV
$<span class="w"> </span>ffprobe<span class="w"> </span>-show_format<span class="w"> </span>tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FELC0_SI756.WAV
Input <span class="c1">#0, nistsphere, from 'tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FELC0_SI756.WAV':</span>
Input<span class="w"> </span><span class="c1">#0, nistsphere, from 'tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FELC0_SI756.WAV':</span>
Metadata:
database_id : TIMIT
database_version: <span class="m">1</span>.0
utterance_id : elc0_si756
sample_min : -1546
sample_max : <span class="m">1989</span>
Duration: <span class="m">00</span>:00:04.19, bitrate: <span class="m">257</span> kb/s
Stream <span class="c1">#0:0: Audio: pcm_s16le, 16000 Hz, 1 channels, s16, 256 kb/s</span>
<span class="w"> </span>database_id<span class="w"> </span>:<span class="w"> </span>TIMIT
<span class="w"> </span>database_version:<span class="w"> </span><span class="m">1</span>.0
<span class="w"> </span>utterance_id<span class="w"> </span>:<span class="w"> </span>elc0_si756
<span class="w"> </span>sample_min<span class="w"> </span>:<span class="w"> </span>-1546
<span class="w"> </span>sample_max<span class="w"> </span>:<span class="w"> </span><span class="m">1989</span>
Duration:<span class="w"> </span><span class="m">00</span>:00:04.19,<span class="w"> </span>bitrate:<span class="w"> </span><span class="m">257</span><span class="w"> </span>kb/s
<span class="w"> </span>Stream<span class="w"> </span><span class="c1">#0:0: Audio: pcm_s16le, 16000 Hz, 1 channels, s16, 256 kb/s</span>
$ ffprobe -show_format tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FMGD0_SI1564.WAV
$<span class="w"> </span>ffprobe<span class="w"> </span>-show_format<span class="w"> </span>tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FMGD0_SI1564.WAV
Input <span class="c1">#0, nistsphere, from 'tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FMGD0_SI1564.WAV':</span>
Input<span class="w"> </span><span class="c1">#0, nistsphere, from 'tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FMGD0_SI1564.WAV':</span>
Metadata:
database_id : TIMIT
database_version: <span class="m">1</span>.0
utterance_id : mgd0_si1564
sample_min : -7626
sample_max : <span class="m">10573</span>
Duration: <span class="m">00</span>:00:04.44, bitrate: <span class="m">257</span> kb/s
Stream <span class="c1">#0:0: Audio: pcm_s16le, 16000 Hz, 1 channels, s16, 256 kb/s</span>
<span class="w"> </span>database_id<span class="w"> </span>:<span class="w"> </span>TIMIT
<span class="w"> </span>database_version:<span class="w"> </span><span class="m">1</span>.0
<span class="w"> </span>utterance_id<span class="w"> </span>:<span class="w"> </span>mgd0_si1564
<span class="w"> </span>sample_min<span class="w"> </span>:<span class="w"> </span>-7626
<span class="w"> </span>sample_max<span class="w"> </span>:<span class="w"> </span><span class="m">10573</span>
Duration:<span class="w"> </span><span class="m">00</span>:00:04.44,<span class="w"> </span>bitrate:<span class="w"> </span><span class="m">257</span><span class="w"> </span>kb/s
<span class="w"> </span>Stream<span class="w"> </span><span class="c1">#0:0: Audio: pcm_s16le, 16000 Hz, 1 channels, s16, 256 kb/s</span>
</pre></div>
</div>
</section>
<section id="inference-with-a-pre-trained-model">
<h3>Inference with a pre-trained model<a class="headerlink" href="#inference-with-a-pre-trained-model" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/timit/ASR
$ ./tdnn_ligru_ctc/pretrained.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/timit/ASR
$<span class="w"> </span>./tdnn_ligru_ctc/pretrained.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the usage information of <code class="docutils literal notranslate"><span class="pre">./tdnn_ligru_ctc/pretrained.py</span></code>.</p>
<p>To decode with the <code class="docutils literal notranslate"><span class="pre">1best</span></code> method, you can use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./tdnn_ligru_ctc/pretrained.py
--method 1best
--checkpoint ./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/exp/pretrained_average_9_25.pt
--words-file ./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/data/lang_phone/words.txt
--HLG ./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/data/lang_phone/HLG.pt
./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FDHC0_SI1559.WAV
./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FELC0_SI756.WAV
./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FMGD0_SI1564.WAV
<span class="w"> </span>--method<span class="w"> </span>1best
<span class="w"> </span>--checkpoint<span class="w"> </span>./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/exp/pretrained_average_9_25.pt
<span class="w"> </span>--words-file<span class="w"> </span>./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/data/lang_phone/words.txt
<span class="w"> </span>--HLG<span class="w"> </span>./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/data/lang_phone/HLG.pt
<span class="w"> </span>./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FDHC0_SI1559.WAV
<span class="w"> </span>./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FELC0_SI756.WAV
<span class="w"> </span>./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FMGD0_SI1564.WAV
</pre></div>
</div>
<p>The output is:</p>
@ -408,16 +408,16 @@ $ ./tdnn_ligru_ctc/pretrained.py --help
</pre></div>
</div>
<p>To decode with the <code class="docutils literal notranslate"><span class="pre">whole-lattice-rescoring</span></code> method, you can use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./tdnn_ligru_ctc/pretrained.py <span class="se">\</span>
--method whole-lattice-rescoring <span class="se">\</span>
--checkpoint ./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/exp/pretrained_average_9_25.pt <span class="se">\</span>
--words-file ./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/data/lang_phone/words.txt <span class="se">\</span>
|
||||
--HLG ./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/data/lang_phone/HLG.pt <span class="se">\</span>
|
||||
--G ./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/data/lm/G_4_gram.pt <span class="se">\</span>
|
||||
--ngram-lm-scale <span class="m">0</span>.1 <span class="se">\</span>
|
||||
./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FDHC0_SI1559.WAV
|
||||
./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FELC0_SI756.WAV
|
||||
./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FMGD0_SI1564.WAV
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./tdnn_ligru_ctc/pretrained.py<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--method<span class="w"> </span>whole-lattice-rescoring<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--checkpoint<span class="w"> </span>./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/exp/pretrained_average_9_25.pt<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--words-file<span class="w"> </span>./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/data/lang_phone/words.txt<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--HLG<span class="w"> </span>./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/data/lang_phone/HLG.pt<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--G<span class="w"> </span>./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/data/lm/G_4_gram.pt<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--ngram-lm-scale<span class="w"> </span><span class="m">0</span>.1<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FDHC0_SI1559.WAV
|
||||
<span class="w"> </span>./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FELC0_SI756.WAV
|
||||
<span class="w"> </span>./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FMGD0_SI1564.WAV
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>The decoding output is:</p>
|
||||
|
||||
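When tuning <code class="docutils literal notranslate"><span class="pre">--ngram-lm-scale</span></code>, it can help to try a few values and compare the transcripts. The sketch below is a hypothetical wrapper: <code class="docutils literal notranslate"><span class="pre">decode</span></code> stands in for the full <code class="docutils literal notranslate"><span class="pre">./tdnn_ligru_ctc/pretrained.py</span></code> command shown above, and the scale values are illustrative, not recommendations.

```shell
# Sketch: sweep a few n-gram LM scales. "decode" is a stand-in for the
# ./tdnn_ligru_ctc/pretrained.py invocation shown above.
decode() {
  echo "decoding with --ngram-lm-scale $1"
}

for scale in 0.05 0.1 0.2; do
  decode "${scale}"
done
```

In practice you would inspect the WER (or the transcripts) produced at each scale and keep the best one.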
@ -103,8 +103,8 @@ the environment for <code class="docutils literal notranslate"><span class="pre"
|
||||
</div>
|
||||
<section id="data-preparation">
<h2>Data preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading"></a></h2>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/timit/ASR
$ ./prepare.sh
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/timit/ASR
$<span class="w"> </span>./prepare.sh
</pre></div>
</div>
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
@ -119,13 +119,13 @@ options:</p>
</div></blockquote>
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
<p>For example,</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/timit/ASR
$ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/timit/ASR
$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">0</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>means to run only stage 0.</p>
<p>To run stage 2 to stage 5, use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage <span class="m">2</span> --stop-stage <span class="m">5</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">2</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">5</span>
</pre></div>
</div>
</section>
|
||||
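The <code class="docutils literal notranslate"><span class="pre">--stage</span></code>/<code class="docutils literal notranslate"><span class="pre">--stop-stage</span></code> pattern above can also be driven stage by stage from a loop, stopping at the first failure. This is only a sketch: <code class="docutils literal notranslate"><span class="pre">run_stage</span></code> is a stand-in for <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span> <span class="pre">--stage</span> <span class="pre">N</span> <span class="pre">--stop-stage</span> <span class="pre">N</span></code>.

```shell
# Sketch: run preparation stages one at a time, stopping at the first
# failure. "run_stage" stands in for "./prepare.sh --stage N --stop-stage N".
set -e

run_stage() {
  echo "stage $1 done"
}

for stage in 0 1 2; do
  run_stage "${stage}"
done
```

Running one stage at a time makes it easier to see which step of the preparation fails and to rerun only that step.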
@ -139,9 +139,9 @@ folder.</p>
|
||||
<p>TIMIT is a very small dataset. So one GPU for training is enough.</p>
|
||||
</div>
|
||||
<p>The command to run the training part is:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/timit/ASR
|
||||
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0"</span>
|
||||
$ ./tdnn_lstm_ctc/train.py
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/timit/ASR
|
||||
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0"</span>
|
||||
$<span class="w"> </span>./tdnn_lstm_ctc/train.py
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>By default, it will run <code class="docutils literal notranslate"><span class="pre">25</span></code> epochs. Training logs and checkpoints are saved
|
||||
@ -153,7 +153,7 @@ in <code class="docutils literal notranslate"><span class="pre">tdnn_lstm_ctc/ex
|
||||
<p>These are checkpoint files, containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
|
||||
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
|
||||
<blockquote>
|
||||
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn_lstm_ctc/train.py --start-epoch <span class="m">11</span>
|
||||
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn_lstm_ctc/train.py<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">11</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div></blockquote>
|
||||
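Since checkpoints are named <code class="docutils literal notranslate"><span class="pre">epoch-N.pt</span></code>, the right <code class="docutils literal notranslate"><span class="pre">--start-epoch</span></code> for resuming is simply the newest saved epoch plus one. A minimal sketch (the <code class="docutils literal notranslate"><span class="pre">exp</span></code> directory and file names here are created just for illustration):

```shell
# Sketch: derive --start-epoch from the newest "epoch-N.pt" checkpoint.
# The exp/ directory and checkpoints are created here only for illustration.
mkdir -p exp
touch exp/epoch-9.pt exp/epoch-10.pt

latest=$(ls exp/epoch-*.pt | sed 's/.*epoch-\([0-9]*\)\.pt/\1/' | sort -n | tail -n 1)
echo "resume with --start-epoch $((latest + 1))"
```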
@ -162,8 +162,8 @@ To resume training from some checkpoint, say <code class="docutils literal notra
|
||||
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
|
||||
rate, etc., are recorded in these logs. You can visualize them by:</p>
|
||||
<blockquote>
|
||||
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> tdnn_lstm_ctc/exp/tensorboard
|
||||
$ tensorboard dev upload --logdir . --description <span class="s2">"TDNN LSTM training for timit with icefall"</span>
|
||||
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>tdnn_lstm_ctc/exp/tensorboard
|
||||
$<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w"> </span>upload<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--description<span class="w"> </span><span class="s2">"TDNN LSTM training for timit with icefall"</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div></blockquote>
|
||||
@ -175,7 +175,7 @@ you saw printed to the console during training.</p>
|
||||
</ul>
|
||||
</div></blockquote>
|
||||
<p>To see available training options, you can use:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn_lstm_ctc/train.py --help
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn_lstm_ctc/train.py<span class="w"> </span>--help
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>Other training options, e.g., learning rate, results dir, etc., are
|
||||
@ -189,13 +189,13 @@ you want.</p>
|
||||
<p>The decoding part uses checkpoints saved by the training part, so you have
|
||||
to run the training part first.</p>
|
||||
<p>The command for decoding is:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0"</span>
|
||||
$ ./tdnn_lstm_ctc/decode.py
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0"</span>
|
||||
$<span class="w"> </span>./tdnn_lstm_ctc/decode.py
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>You will see the WER in the output log.</p>
|
||||
<p>Decoded results are saved in <code class="docutils literal notranslate"><span class="pre">tdnn_lstm_ctc/exp</span></code>.</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn_lstm_ctc/decode.py --help
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn_lstm_ctc/decode.py<span class="w"> </span>--help
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>shows you the available decoding options.</p>
|
||||
@ -212,7 +212,7 @@ For instance, <code class="docutils literal notranslate"><span class="pre">./tdn
|
||||
to be averaged. The averaged model is used for decoding.
|
||||
For example, the following command:</p>
|
||||
<blockquote>
|
||||
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn_lstm_ctc/decode.py --epoch <span class="m">25</span> --avg <span class="m">10</span>
|
||||
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn_lstm_ctc/decode.py<span class="w"> </span>--epoch<span class="w"> </span><span class="m">25</span><span class="w"> </span>--avg<span class="w"> </span><span class="m">10</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div></blockquote>
|
||||
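To make the averaging concrete: <code class="docutils literal notranslate"><span class="pre">--epoch</span> <span class="pre">25</span> <span class="pre">--avg</span> <span class="pre">10</span></code> selects the last 10 checkpoints up to and including epoch 25, i.e. <code class="docutils literal notranslate"><span class="pre">epoch-16.pt</span></code> through <code class="docutils literal notranslate"><span class="pre">epoch-25.pt</span></code>, whose parameters are then averaged. A small sketch of the selection:

```shell
# Sketch: list the checkpoints that "--epoch 25 --avg 10" selects for
# averaging (the last 10 epochs up to and including epoch 25).
epoch=25
avg=10
start=$((epoch - avg + 1))

for e in $(seq "${start}" "${epoch}"); do
  echo "epoch-${e}.pt"
done
```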
@ -243,11 +243,11 @@ at the same time.</p>
|
||||
</section>
|
||||
<section id="download-the-pre-trained-model">
|
||||
<h3>Download the pre-trained model<a class="headerlink" href="#download-the-pre-trained-model" title="Permalink to this heading"></a></h3>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/timit/ASR
|
||||
$ mkdir tmp-lstm
|
||||
$ <span class="nb">cd</span> tmp-lstm
|
||||
$ git lfs install
|
||||
$ git clone https://huggingface.co/luomingshuang/icefall_asr_timit_tdnn_lstm_ctc
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/timit/ASR
|
||||
$<span class="w"> </span>mkdir<span class="w"> </span>tmp-lstm
|
||||
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>tmp-lstm
|
||||
$<span class="w"> </span>git<span class="w"> </span>lfs<span class="w"> </span>install
|
||||
$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/luomingshuang/icefall_asr_timit_tdnn_lstm_ctc
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="admonition caution">
|
||||
@ -259,29 +259,29 @@ $ git clone https://huggingface.co/luomingshuang/icefall_asr_timit_tdnn_lstm_ctc
|
||||
<p>In order to use this pre-trained model, your k2 version has to be v1.7 or later.</p>
|
||||
</div>
|
||||
<p>After downloading, you will have the following files:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/timit/ASR
|
||||
$ tree tmp-lstm
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/timit/ASR
|
||||
$<span class="w"> </span>tree<span class="w"> </span>tmp-lstm
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>tmp-lstm/
|
||||
<span class="sb">`</span>-- icefall_asr_timit_tdnn_lstm_ctc
|
||||
<span class="p">|</span>-- README.md
|
||||
<span class="p">|</span>-- data
|
||||
<span class="p">|</span> <span class="p">|</span>-- lang_phone
|
||||
<span class="p">|</span> <span class="p">|</span> <span class="p">|</span>-- HLG.pt
|
||||
<span class="p">|</span> <span class="p">|</span> <span class="p">|</span>-- tokens.txt
|
||||
<span class="p">|</span> <span class="p">|</span> <span class="sb">`</span>-- words.txt
|
||||
<span class="p">|</span> <span class="sb">`</span>-- lm
|
||||
<span class="p">|</span> <span class="sb">`</span>-- G_4_gram.pt
|
||||
<span class="p">|</span>-- exp
|
||||
<span class="p">|</span> <span class="sb">`</span>-- pretrained_average_16_25.pt
|
||||
<span class="sb">`</span>-- test_wavs
|
||||
<span class="p">|</span>-- FDHC0_SI1559.WAV
|
||||
<span class="p">|</span>-- FELC0_SI756.WAV
|
||||
<span class="p">|</span>-- FMGD0_SI1564.WAV
|
||||
<span class="sb">`</span>-- trans.txt
|
||||
<span class="sb">`</span>--<span class="w"> </span>icefall_asr_timit_tdnn_lstm_ctc
|
||||
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>README.md
|
||||
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>data
|
||||
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>lang_phone
|
||||
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>HLG.pt
|
||||
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>tokens.txt
|
||||
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>words.txt
|
||||
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>lm
|
||||
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>G_4_gram.pt
|
||||
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>exp
|
||||
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>pretrained_average_16_25.pt
|
||||
<span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>test_wavs
|
||||
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>FDHC0_SI1559.WAV
|
||||
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>FELC0_SI756.WAV
|
||||
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>FMGD0_SI1564.WAV
|
||||
<span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>trans.txt
|
||||
|
||||
<span class="m">6</span> directories, <span class="m">10</span> files
|
||||
<span class="m">6</span><span class="w"> </span>directories,<span class="w"> </span><span class="m">10</span><span class="w"> </span>files
|
||||
</pre></div>
|
||||
</div>
|
||||
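The <code class="docutils literal notranslate"><span class="pre">6</span> <span class="pre">directories,</span> <span class="pre">10</span> <span class="pre">files</span></code> summary printed by <code class="docutils literal notranslate"><span class="pre">tree</span></code> can be reproduced with <code class="docutils literal notranslate"><span class="pre">find</span></code> if <code class="docutils literal notranslate"><span class="pre">tree</span></code> is not installed. The sketch below builds a small hypothetical layout (not the real download) just to show the counting:

```shell
# Sketch: count directories and files the way "tree" summarizes them.
# demo_tmp/ is a hypothetical layout created only for this example.
mkdir -p demo_tmp/data/lang_phone demo_tmp/exp
touch demo_tmp/data/lang_phone/words.txt demo_tmp/exp/model.pt

dirs=$(find demo_tmp -type d | wc -l | tr -d ' ')
files=$(find demo_tmp -type f | wc -l | tr -d ' ')
echo "${dirs} directories, ${files} files"
```

Against the real <code class="docutils literal notranslate"><span class="pre">tmp-lstm</span></code> download, the same two <code class="docutils literal notranslate"><span class="pre">find</span></code> commands should report the counts shown above.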
<p><strong>File descriptions</strong>:</p>
|
||||
@ -327,60 +327,60 @@ Note: We have removed optimizer <code class="docutils literal notranslate"><span
|
||||
</ul>
|
||||
</div></blockquote>
|
||||
<p>Information about the test sound files is listed below:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ffprobe -show_format tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FDHC0_SI1559.WAV
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>ffprobe<span class="w"> </span>-show_format<span class="w"> </span>tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FDHC0_SI1559.WAV
|
||||
|
||||
Input <span class="c1">#0, nistsphere, from 'tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FDHC0_SI1559.WAV':</span>
|
||||
Input<span class="w"> </span><span class="c1">#0, nistsphere, from 'tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FDHC0_SI1559.WAV':</span>
|
||||
Metadata:
|
||||
database_id : TIMIT
|
||||
database_version: <span class="m">1</span>.0
|
||||
utterance_id : dhc0_si1559
|
||||
sample_min : -4176
|
||||
sample_max : <span class="m">5984</span>
|
||||
Duration: <span class="m">00</span>:00:03.40, bitrate: <span class="m">258</span> kb/s
|
||||
Stream <span class="c1">#0:0: Audio: pcm_s16le, 16000 Hz, 1 channels, s16, 256 kb/s</span>
|
||||
<span class="w"> </span>database_id<span class="w"> </span>:<span class="w"> </span>TIMIT
|
||||
<span class="w"> </span>database_version:<span class="w"> </span><span class="m">1</span>.0
|
||||
<span class="w"> </span>utterance_id<span class="w"> </span>:<span class="w"> </span>dhc0_si1559
|
||||
<span class="w"> </span>sample_min<span class="w"> </span>:<span class="w"> </span>-4176
|
||||
<span class="w"> </span>sample_max<span class="w"> </span>:<span class="w"> </span><span class="m">5984</span>
|
||||
Duration:<span class="w"> </span><span class="m">00</span>:00:03.40,<span class="w"> </span>bitrate:<span class="w"> </span><span class="m">258</span><span class="w"> </span>kb/s
|
||||
<span class="w"> </span>Stream<span class="w"> </span><span class="c1">#0:0: Audio: pcm_s16le, 16000 Hz, 1 channels, s16, 256 kb/s</span>
|
||||
|
||||
$ ffprobe -show_format tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FELC0_SI756.WAV
|
||||
$<span class="w"> </span>ffprobe<span class="w"> </span>-show_format<span class="w"> </span>tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FELC0_SI756.WAV
|
||||
|
||||
Input <span class="c1">#0, nistsphere, from 'tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FELC0_SI756.WAV':</span>
|
||||
Input<span class="w"> </span><span class="c1">#0, nistsphere, from 'tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FELC0_SI756.WAV':</span>
|
||||
Metadata:
|
||||
database_id : TIMIT
|
||||
database_version: <span class="m">1</span>.0
|
||||
utterance_id : elc0_si756
|
||||
sample_min : -1546
|
||||
sample_max : <span class="m">1989</span>
|
||||
Duration: <span class="m">00</span>:00:04.19, bitrate: <span class="m">257</span> kb/s
|
||||
Stream <span class="c1">#0:0: Audio: pcm_s16le, 16000 Hz, 1 channels, s16, 256 kb/s</span>
|
||||
<span class="w"> </span>database_id<span class="w"> </span>:<span class="w"> </span>TIMIT
|
||||
<span class="w"> </span>database_version:<span class="w"> </span><span class="m">1</span>.0
|
||||
<span class="w"> </span>utterance_id<span class="w"> </span>:<span class="w"> </span>elc0_si756
|
||||
<span class="w"> </span>sample_min<span class="w"> </span>:<span class="w"> </span>-1546
|
||||
<span class="w"> </span>sample_max<span class="w"> </span>:<span class="w"> </span><span class="m">1989</span>
|
||||
Duration:<span class="w"> </span><span class="m">00</span>:00:04.19,<span class="w"> </span>bitrate:<span class="w"> </span><span class="m">257</span><span class="w"> </span>kb/s
|
||||
<span class="w"> </span>Stream<span class="w"> </span><span class="c1">#0:0: Audio: pcm_s16le, 16000 Hz, 1 channels, s16, 256 kb/s</span>
|
||||
|
||||
$ ffprobe -show_format tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FMGD0_SI1564.WAV
|
||||
$<span class="w"> </span>ffprobe<span class="w"> </span>-show_format<span class="w"> </span>tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FMGD0_SI1564.WAV
|
||||
|
||||
Input <span class="c1">#0, nistsphere, from 'tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FMGD0_SI1564.WAV':</span>
|
||||
Input<span class="w"> </span><span class="c1">#0, nistsphere, from 'tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FMGD0_SI1564.WAV':</span>
|
||||
Metadata:
|
||||
database_id : TIMIT
|
||||
database_version: <span class="m">1</span>.0
|
||||
utterance_id : mgd0_si1564
|
||||
sample_min : -7626
|
||||
sample_max : <span class="m">10573</span>
|
||||
Duration: <span class="m">00</span>:00:04.44, bitrate: <span class="m">257</span> kb/s
|
||||
Stream <span class="c1">#0:0: Audio: pcm_s16le, 16000 Hz, 1 channels, s16, 256 kb/s</span>
|
||||
<span class="w"> </span>database_id<span class="w"> </span>:<span class="w"> </span>TIMIT
|
||||
<span class="w"> </span>database_version:<span class="w"> </span><span class="m">1</span>.0
|
||||
<span class="w"> </span>utterance_id<span class="w"> </span>:<span class="w"> </span>mgd0_si1564
|
||||
<span class="w"> </span>sample_min<span class="w"> </span>:<span class="w"> </span>-7626
|
||||
<span class="w"> </span>sample_max<span class="w"> </span>:<span class="w"> </span><span class="m">10573</span>
|
||||
Duration:<span class="w"> </span><span class="m">00</span>:00:04.44,<span class="w"> </span>bitrate:<span class="w"> </span><span class="m">257</span><span class="w"> </span>kb/s
|
||||
<span class="w"> </span>Stream<span class="w"> </span><span class="c1">#0:0: Audio: pcm_s16le, 16000 Hz, 1 channels, s16, 256 kb/s</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</section>
|
||||
<section id="inference-with-a-pre-trained-model">
|
||||
<h3>Inference with a pre-trained model<a class="headerlink" href="#inference-with-a-pre-trained-model" title="Permalink to this heading"></a></h3>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/timit/ASR
|
||||
$ ./tdnn_lstm_ctc/pretrained.py --help
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/timit/ASR
|
||||
$<span class="w"> </span>./tdnn_lstm_ctc/pretrained.py<span class="w"> </span>--help
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>shows the usage information of <code class="docutils literal notranslate"><span class="pre">./tdnn_lstm_ctc/pretrained.py</span></code>.</p>
|
||||
<p>To decode with the <code class="docutils literal notranslate"><span class="pre">1best</span></code> method, you can use:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./tdnn_lstm_ctc/pretrained.py
|
||||
--method 1best
|
||||
--checkpoint ./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/exp/pretrained_average_16_25.pt
|
||||
--words-file ./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/data/lang_phone/words.txt
|
||||
--HLG ./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/data/lang_phone/HLG.pt
|
||||
./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FDHC0_SI1559.WAV
|
||||
./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FELC0_SI756.WAV
|
||||
./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FMGD0_SI1564.WAV
|
||||
<span class="w"> </span>--method<span class="w"> </span>1best
|
||||
<span class="w"> </span>--checkpoint<span class="w"> </span>./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/exp/pretrained_average_16_25.pt
|
||||
<span class="w"> </span>--words-file<span class="w"> </span>./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/data/lang_phone/words.txt
|
||||
<span class="w"> </span>--HLG<span class="w"> </span>./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/data/lang_phone/HLG.pt
|
||||
<span class="w"> </span>./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FDHC0_SI1559.WAV
|
||||
<span class="w"> </span>./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FELC0_SI756.WAV
|
||||
<span class="w"> </span>./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FMGD0_SI1564.WAV
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>The output is:</p>
|
||||
@ -406,16 +406,16 @@ $ ./tdnn_lstm_ctc/pretrained.py --help
|
||||
</pre></div>
|
||||
</div>
<p>To decode with the <code class="docutils literal notranslate"><span class="pre">whole-lattice-rescoring</span></code> method, you can use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./tdnn_lstm_ctc/pretrained.py <span class="se">\</span>
|
||||
--method whole-lattice-rescoring <span class="se">\</span>
|
||||
--checkpoint ./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/exp/pretrained_average_16_25.pt <span class="se">\</span>
|
||||
--words-file ./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/data/lang_phone/words.txt <span class="se">\</span>
|
||||
--HLG ./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/data/lang_phone/HLG.pt <span class="se">\</span>
|
||||
--G ./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/data/lm/G_4_gram.pt <span class="se">\</span>
|
||||
--ngram-lm-scale <span class="m">0</span>.08 <span class="se">\</span>
|
||||
./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FDHC0_SI1559.WAV
|
||||
./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FELC0_SI756.WAV
|
||||
./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FMGD0_SI1564.WAV
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./tdnn_lstm_ctc/pretrained.py<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--method<span class="w"> </span>whole-lattice-rescoring<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--checkpoint<span class="w"> </span>./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/exp/pretrained_average_16_25.pt<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--words-file<span class="w"> </span>./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/data/lang_phone/words.txt<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--HLG<span class="w"> </span>./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/data/lang_phone/HLG.pt<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--G<span class="w"> </span>./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/data/lm/G_4_gram.pt<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--ngram-lm-scale<span class="w"> </span><span class="m">0</span>.08<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FDHC0_SI1559.WAV
|
||||
<span class="w"> </span>./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FELC0_SI756.WAV
|
||||
<span class="w"> </span>./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FMGD0_SI1564.WAV
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>The decoding output is:</p>
|
||||
|
||||
@ -204,8 +204,8 @@ the following WER at the end:</p>
|
||||
</div>
|
||||
<section id="data-preparation">
|
||||
<h2>Data preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading"></a></h2>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/yesno/ASR
|
||||
$ ./prepare.sh
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/yesno/ASR
|
||||
$<span class="w"> </span>./prepare.sh
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
|
||||
@ -220,13 +220,13 @@ options:</p>
|
||||
</div></blockquote>
|
||||
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
|
||||
<p>For example,</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/yesno/ASR
|
||||
$ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</span>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/yesno/ASR
|
||||
$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">0</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">0</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>means to run only stage 0.</p>
|
||||
<p>To run stage 2 to stage 5, use:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage <span class="m">2</span> --stop-stage <span class="m">5</span>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">2</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">5</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</section>
|
||||
@ -236,9 +236,9 @@ $ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</
|
||||
the <a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/yesno/ASR/tdnn">tdnn</a>
|
||||
folder, for <code class="docutils literal notranslate"><span class="pre">yesno</span></code>.</p>
|
||||
<p>The command to run the training part is:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/yesno/ASR
|
||||
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">""</span>
|
||||
$ ./tdnn/train.py
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/yesno/ASR
|
||||
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">""</span>
|
||||
$<span class="w"> </span>./tdnn/train.py
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>By default, it will run <code class="docutils literal notranslate"><span class="pre">15</span></code> epochs. Training logs and checkpoints are saved
|
||||
@ -250,7 +250,7 @@ in <code class="docutils literal notranslate"><span class="pre">tdnn/exp</span><
|
||||
<p>These are checkpoint files, containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
|
||||
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
|
||||
<blockquote>
|
||||
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn/train.py --start-epoch <span class="m">11</span>
|
||||
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn/train.py<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">11</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div></blockquote>
|
||||
@ -259,8 +259,8 @@ To resume training from some checkpoint, say <code class="docutils literal notra
|
||||
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
|
||||
rate, etc., are recorded in these logs. You can visualize them by:</p>
|
||||
<blockquote>
|
||||
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> tdnn/exp/tensorboard
|
||||
$ tensorboard dev upload --logdir . --description <span class="s2">"TDNN training for yesno with icefall"</span>
|
||||
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>tdnn/exp/tensorboard
|
||||
$<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w"> </span>upload<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--description<span class="w"> </span><span class="s2">"TDNN training for yesno with icefall"</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div></blockquote>
|
||||
@ -302,15 +302,15 @@ you saw printed to the console during training.</p>
If you have two GPUs, say, GPU 0 and GPU 1, and you want to use GPU 1 for
training, you can run:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"1"</span>
$ ./tdnn/train.py
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"1"</span>
$<span class="w"> </span>./tdnn/train.py
</pre></div>
</div>
</div></blockquote>
<p>Since the <code class="docutils literal notranslate"><span class="pre">yesno</span></code> dataset is very small, containing only 30 sound files
for training, and the model in use is also very small, we use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">""</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">""</span>
</pre></div>
</div>
</div></blockquote>
@ -319,7 +319,7 @@ for training, and the model in use is also very small, we use:</p>
run <code class="docutils literal notranslate"><span class="pre">export</span> <span class="pre">CUDA_VISIBLE_DEVICES=""</span></code>.</p>
</div>
<p>To see available training options, you can use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn/train.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn/train.py<span class="w"> </span>--help
</pre></div>
</div>
<p>Other training options, e.g., learning rate, results dir, etc., are
@ -333,13 +333,13 @@ you want.</p>
<p>The decoding part uses checkpoints saved by the training part, so you have
to run the training part first.</p>
<p>The command for decoding is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">""</span>
$ ./tdnn/decode.py
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">""</span>
$<span class="w"> </span>./tdnn/decode.py
</pre></div>
</div>
<p>You will see the WER in the output log.</p>
<p>Decoded results are saved in <code class="docutils literal notranslate"><span class="pre">tdnn/exp</span></code>.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn/decode.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn/decode.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows you the available decoding options.</p>
@ -356,7 +356,7 @@ For instance, <code class="docutils literal notranslate"><span class="pre">./tdn
to be averaged. The averaged model is used for decoding.
For example, the following command:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn/decode.py --epoch <span class="m">10</span> --avg <span class="m">3</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn/decode.py<span class="w"> </span>--epoch<span class="w"> </span><span class="m">10</span><span class="w"> </span>--avg<span class="w"> </span><span class="m">3</span>
</pre></div>
</div>
</div></blockquote>
@ -378,11 +378,11 @@ See <a class="reference internal" href="#yesno-use-a-pre-trained-model"><span cl
<p>The following shows you how to use the pre-trained model.</p>
<section id="download-the-pre-trained-model">
<h3>Download the pre-trained model<a class="headerlink" href="#download-the-pre-trained-model" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/yesno/ASR
$ mkdir tmp
$ <span class="nb">cd</span> tmp
$ git lfs install
$ git clone https://huggingface.co/csukuangfj/icefall_asr_yesno_tdnn
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/yesno/ASR
$<span class="w"> </span>mkdir<span class="w"> </span>tmp
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>tmp
$<span class="w"> </span>git<span class="w"> </span>lfs<span class="w"> </span>install
$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/csukuangfj/icefall_asr_yesno_tdnn
</pre></div>
</div>
<div class="admonition caution">
@ -390,71 +390,71 @@ $ git clone https://huggingface.co/csukuangfj/icefall_asr_yesno_tdnn
<p>You have to use <code class="docutils literal notranslate"><span class="pre">git</span> <span class="pre">lfs</span></code> to download the pre-trained model.</p>
</div>
<p>After downloading, you will have the following files:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/yesno/ASR
$ tree tmp
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/yesno/ASR
$<span class="w"> </span>tree<span class="w"> </span>tmp
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>tmp/
<span class="sb">`</span>-- icefall_asr_yesno_tdnn
<span class="p">|</span>-- README.md
<span class="p">|</span>-- lang_phone
<span class="p">|</span> <span class="p">|</span>-- HLG.pt
<span class="p">|</span> <span class="p">|</span>-- L.pt
<span class="p">|</span> <span class="p">|</span>-- L_disambig.pt
<span class="p">|</span> <span class="p">|</span>-- Linv.pt
<span class="p">|</span> <span class="p">|</span>-- lexicon.txt
<span class="p">|</span> <span class="p">|</span>-- lexicon_disambig.txt
<span class="p">|</span> <span class="p">|</span>-- tokens.txt
<span class="p">|</span> <span class="sb">`</span>-- words.txt
<span class="p">|</span>-- lm
<span class="p">|</span> <span class="p">|</span>-- G.arpa
<span class="p">|</span> <span class="sb">`</span>-- G.fst.txt
<span class="p">|</span>-- pretrained.pt
<span class="sb">`</span>-- test_waves
<span class="p">|</span>-- 0_0_0_1_0_0_0_1.wav
<span class="p">|</span>-- 0_0_1_0_0_0_1_0.wav
<span class="p">|</span>-- 0_0_1_0_0_1_1_1.wav
<span class="p">|</span>-- 0_0_1_0_1_0_0_1.wav
<span class="p">|</span>-- 0_0_1_1_0_0_0_1.wav
<span class="p">|</span>-- 0_0_1_1_0_1_1_0.wav
<span class="p">|</span>-- 0_0_1_1_1_0_0_0.wav
<span class="p">|</span>-- 0_0_1_1_1_1_0_0.wav
<span class="p">|</span>-- 0_1_0_0_0_1_0_0.wav
<span class="p">|</span>-- 0_1_0_0_1_0_1_0.wav
<span class="p">|</span>-- 0_1_0_1_0_0_0_0.wav
<span class="p">|</span>-- 0_1_0_1_1_1_0_0.wav
<span class="p">|</span>-- 0_1_1_0_0_1_1_1.wav
<span class="p">|</span>-- 0_1_1_1_0_0_1_0.wav
<span class="p">|</span>-- 0_1_1_1_1_0_1_0.wav
<span class="p">|</span>-- 1_0_0_0_0_0_0_0.wav
<span class="p">|</span>-- 1_0_0_0_0_0_1_1.wav
<span class="p">|</span>-- 1_0_0_1_0_1_1_1.wav
<span class="p">|</span>-- 1_0_1_1_0_1_1_1.wav
<span class="p">|</span>-- 1_0_1_1_1_1_0_1.wav
<span class="p">|</span>-- 1_1_0_0_0_1_1_1.wav
<span class="p">|</span>-- 1_1_0_0_1_0_1_1.wav
<span class="p">|</span>-- 1_1_0_1_0_1_0_0.wav
<span class="p">|</span>-- 1_1_0_1_1_0_0_1.wav
<span class="p">|</span>-- 1_1_0_1_1_1_1_0.wav
<span class="p">|</span>-- 1_1_1_0_0_1_0_1.wav
<span class="p">|</span>-- 1_1_1_0_1_0_1_0.wav
<span class="p">|</span>-- 1_1_1_1_0_0_1_0.wav
<span class="p">|</span>-- 1_1_1_1_1_0_0_0.wav
<span class="sb">`</span>-- 1_1_1_1_1_1_1_1.wav
<span class="sb">`</span>--<span class="w"> </span>icefall_asr_yesno_tdnn
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>README.md
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>lang_phone
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>HLG.pt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>L.pt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>L_disambig.pt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>Linv.pt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>lexicon.txt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>lexicon_disambig.txt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>tokens.txt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>words.txt
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>lm
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>G.arpa
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>G.fst.txt
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>pretrained.pt
<span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>test_waves
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>0_0_0_1_0_0_0_1.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>0_0_1_0_0_0_1_0.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>0_0_1_0_0_1_1_1.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>0_0_1_0_1_0_0_1.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>0_0_1_1_0_0_0_1.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>0_0_1_1_0_1_1_0.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>0_0_1_1_1_0_0_0.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>0_0_1_1_1_1_0_0.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>0_1_0_0_0_1_0_0.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>0_1_0_0_1_0_1_0.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>0_1_0_1_0_0_0_0.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>0_1_0_1_1_1_0_0.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>0_1_1_0_0_1_1_1.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>0_1_1_1_0_0_1_0.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>0_1_1_1_1_0_1_0.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>1_0_0_0_0_0_0_0.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>1_0_0_0_0_0_1_1.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>1_0_0_1_0_1_1_1.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>1_0_1_1_0_1_1_1.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>1_0_1_1_1_1_0_1.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>1_1_0_0_0_1_1_1.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>1_1_0_0_1_0_1_1.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>1_1_0_1_0_1_0_0.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>1_1_0_1_1_0_0_1.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>1_1_0_1_1_1_1_0.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>1_1_1_0_0_1_0_1.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>1_1_1_0_1_0_1_0.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>1_1_1_1_0_0_1_0.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>1_1_1_1_1_0_0_0.wav
<span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>1_1_1_1_1_1_1_1.wav

<span class="m">4</span> directories, <span class="m">42</span> files
<span class="m">4</span><span class="w"> </span>directories,<span class="w"> </span><span class="m">42</span><span class="w"> </span>files
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ soxi tmp/icefall_asr_yesno_tdnn/test_waves/0_0_1_0_1_0_0_1.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>soxi<span class="w"> </span>tmp/icefall_asr_yesno_tdnn/test_waves/0_0_1_0_1_0_0_1.wav

Input File : <span class="s1">'tmp/icefall_asr_yesno_tdnn/test_waves/0_0_1_0_1_0_0_1.wav'</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">8000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:06.76 <span class="o">=</span> <span class="m">54080</span> samples ~ <span class="m">507</span> CDDA sectors
File Size : 108k
Bit Rate : 128k
Sample Encoding: <span class="m">16</span>-bit Signed Integer PCM
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">'tmp/icefall_asr_yesno_tdnn/test_waves/0_0_1_0_1_0_0_1.wav'</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">8000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:06.76<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">54080</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">507</span><span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>108k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>128k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>Signed<span class="w"> </span>Integer<span class="w"> </span>PCM
</pre></div>
</div>
<ul>
@ -475,17 +475,17 @@ features from a single or multiple sound files. Please refer to
</section>
<section id="inference-with-a-pre-trained-model">
<h3>Inference with a pre-trained model<a class="headerlink" href="#inference-with-a-pre-trained-model" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/yesno/ASR
$ ./tdnn/pretrained.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/yesno/ASR
$<span class="w"> </span>./tdnn/pretrained.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the usage information of <code class="docutils literal notranslate"><span class="pre">./tdnn/pretrained.py</span></code>.</p>
<p>To decode a single file, we can use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./tdnn/pretrained.py <span class="se">\</span>
--checkpoint ./tmp/icefall_asr_yesno_tdnn/pretrained.pt <span class="se">\</span>
--words-file ./tmp/icefall_asr_yesno_tdnn/lang_phone/words.txt <span class="se">\</span>
--HLG ./tmp/icefall_asr_yesno_tdnn/lang_phone/HLG.pt <span class="se">\</span>
./tmp/icefall_asr_yesno_tdnn/test_waves/0_0_1_0_1_0_0_1.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./tdnn/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./tmp/icefall_asr_yesno_tdnn/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--words-file<span class="w"> </span>./tmp/icefall_asr_yesno_tdnn/lang_phone/words.txt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--HLG<span class="w"> </span>./tmp/icefall_asr_yesno_tdnn/lang_phone/HLG.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_yesno_tdnn/test_waves/0_0_1_0_1_0_0_1.wav
</pre></div>
</div>
<p>The output is:</p>
@ -507,12 +507,12 @@ $ ./tdnn/pretrained.py --help
<p>You can see that for the sound file <code class="docutils literal notranslate"><span class="pre">0_0_1_0_1_0_0_1.wav</span></code>, the decoding result is
<code class="docutils literal notranslate"><span class="pre">NO</span> <span class="pre">NO</span> <span class="pre">YES</span> <span class="pre">NO</span> <span class="pre">YES</span> <span class="pre">NO</span> <span class="pre">NO</span> <span class="pre">YES</span></code>.</p>
<p>To decode <strong>multiple</strong> files at the same time, you can use</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./tdnn/pretrained.py <span class="se">\</span>
--checkpoint ./tmp/icefall_asr_yesno_tdnn/pretrained.pt <span class="se">\</span>
--words-file ./tmp/icefall_asr_yesno_tdnn/lang_phone/words.txt <span class="se">\</span>
--HLG ./tmp/icefall_asr_yesno_tdnn/lang_phone/HLG.pt <span class="se">\</span>
./tmp/icefall_asr_yesno_tdnn/test_waves/0_0_1_0_1_0_0_1.wav <span class="se">\</span>
./tmp/icefall_asr_yesno_tdnn/test_waves/1_0_1_1_0_1_1_1.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./tdnn/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./tmp/icefall_asr_yesno_tdnn/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--words-file<span class="w"> </span>./tmp/icefall_asr_yesno_tdnn/lang_phone/words.txt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--HLG<span class="w"> </span>./tmp/icefall_asr_yesno_tdnn/lang_phone/HLG.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_yesno_tdnn/test_waves/0_0_1_0_1_0_0_1.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_yesno_tdnn/test_waves/1_0_1_1_0_1_1_1.wav
</pre></div>
</div>
<p>The decoding output is:</p>

@ -151,11 +151,11 @@ to run <code class="docutils literal notranslate"><span class="pre">(2)</span></
</section>
<section id="data-preparation">
<h2>Data preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading"></a></h2>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh

<span class="c1"># If you use (1), you can **skip** the following command</span>
$ ./prepare_giga_speech.sh
$<span class="w"> </span>./prepare_giga_speech.sh
</pre></div>
</div>
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
@ -174,13 +174,13 @@ options:</p>
</div></blockquote>
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
<p>For example,</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">0</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>means to run only stage 0.</p>
<p>To run stage 2 to stage 5, use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage <span class="m">2</span> --stop-stage <span class="m">5</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">2</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">5</span>
</pre></div>
</div>
<div class="admonition hint">
@ -212,8 +212,8 @@ the following YouTube channel by <a class="reference external" href="https://www
<h2>Training<a class="headerlink" href="#training" title="Permalink to this heading"></a></h2>
<section id="configurable-options">
<h3>Configurable options<a class="headerlink" href="#configurable-options" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./lstm_transducer_stateless2/train.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./lstm_transducer_stateless2/train.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows you the training options that can be passed from the commandline.
@ -262,26 +262,26 @@ training from epoch 10, based on the state from epoch 9.</p>
<div><p><strong>Use case 1</strong>: You have 4 GPUs, but you only want to use GPU 0 and
GPU 2 for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,2"</span>
$ ./lstm_transducer_stateless2/train.py --world-size <span class="m">2</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,2"</span>
$<span class="w"> </span>./lstm_transducer_stateless2/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">2</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 2</strong>: You have 4 GPUs and you want to use all of them
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./lstm_transducer_stateless2/train.py --world-size <span class="m">4</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./lstm_transducer_stateless2/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 3</strong>: You have 4 GPUs but you only want to use GPU 3
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"3"</span>
$ ./lstm_transducer_stateless2/train.py --world-size <span class="m">1</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"3"</span>
$<span class="w"> </span>./lstm_transducer_stateless2/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
</div></blockquote>
@ -333,7 +333,7 @@ You will find the following files in that directory:</p>
<code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./lstm_transducer_stateless2/train.py --start-epoch <span class="m">11</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./lstm_transducer_stateless2/train.py<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">11</span>
</pre></div>
</div>
</div></blockquote>
@ -343,7 +343,7 @@ To resume training from some checkpoint, say <code class="docutils literal notra
containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">checkpoint-436000</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./lstm_transducer_stateless2/train.py --start-batch <span class="m">436000</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./lstm_transducer_stateless2/train.py<span class="w"> </span>--start-batch<span class="w"> </span><span class="m">436000</span>
</pre></div>
</div>
</div></blockquote>
@ -352,8 +352,8 @@ To resume training from some checkpoint, say <code class="docutils literal notra
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc, are recorded in these logs. You can visualize them by:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> lstm_transducer_stateless2/exp/tensorboard
$ tensorboard dev upload --logdir . --description <span class="s2">"LSTM transducer training for LibriSpeech with icefall"</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>lstm_transducer_stateless2/exp/tensorboard
$<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w"> </span>upload<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--description<span class="w"> </span><span class="s2">"LSTM transducer training for LibriSpeech with icefall"</span>
</pre></div>
</div>
</div></blockquote>
@ -390,8 +390,8 @@ the following screenshot:</p>
<p>If you don’t have access to Google, you can use the following command
to view the TensorBoard log locally:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> lstm_transducer_stateless2/exp/tensorboard
tensorboard --logdir . --port <span class="m">6008</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>lstm_transducer_stateless2/exp/tensorboard
tensorboard<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--port<span class="w"> </span><span class="m">6008</span>
</pre></div>
</div>
</div></blockquote>
@ -416,18 +416,18 @@ you saw printed to the console during training.</p>
<section id="usage-example">
<h3>Usage example<a class="headerlink" href="#usage-example" title="Permalink to this heading"></a></h3>
<p>You can use the following command to start the training using 8 GPUs:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,1,2,3,4,5,6,7"</span>
./lstm_transducer_stateless2/train.py <span class="se">\</span>
--world-size <span class="m">8</span> <span class="se">\</span>
--num-epochs <span class="m">35</span> <span class="se">\</span>
--start-epoch <span class="m">1</span> <span class="se">\</span>
--full-libri <span class="m">1</span> <span class="se">\</span>
--exp-dir lstm_transducer_stateless2/exp <span class="se">\</span>
--max-duration <span class="m">500</span> <span class="se">\</span>
--use-fp16 <span class="m">0</span> <span class="se">\</span>
--lr-epochs <span class="m">10</span> <span class="se">\</span>
--num-workers <span class="m">2</span> <span class="se">\</span>
--giga-prob <span class="m">0</span>.9
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,1,2,3,4,5,6,7"</span>
./lstm_transducer_stateless2/train.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--world-size<span class="w"> </span><span class="m">8</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-epochs<span class="w"> </span><span class="m">35</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--full-libri<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>lstm_transducer_stateless2/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">500</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--use-fp16<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--lr-epochs<span class="w"> </span><span class="m">10</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--num-workers<span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--giga-prob<span class="w"> </span><span class="m">0</span>.9
|
||||
</pre></div>
|
||||
</div>
|
||||
</section>
|
||||
@@ -452,51 +452,51 @@ every <code class="docutils literal notranslate"><span class="pre">--save-every-
that produces the lowest WERs.</p>
</div></blockquote>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./lstm_transducer_stateless2/decode.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./lstm_transducer_stateless2/decode.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the options for decoding.</p>
<p>The following shows two examples:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> epoch <span class="k">in</span> <span class="m">17</span><span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> avg <span class="k">in</span> <span class="m">1</span> <span class="m">2</span><span class="p">;</span> <span class="k">do</span>
./lstm_transducer_stateless2/decode.py <span class="se">\</span>
--epoch <span class="nv">$epoch</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--exp-dir lstm_transducer_stateless2/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--num-encoder-layers <span class="m">12</span> <span class="se">\</span>
--rnn-hidden-size <span class="m">1024</span> <span class="se">\</span>
--decoding-method <span class="nv">$m</span> <span class="se">\</span>
--use-averaged-model True <span class="se">\</span>
--beam <span class="m">4</span> <span class="se">\</span>
--max-contexts <span class="m">4</span> <span class="se">\</span>
--max-states <span class="m">8</span> <span class="se">\</span>
--beam-size <span class="m">4</span>
<span class="k">done</span>
<span class="k">done</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span>m<span class="w"> </span><span class="k">in</span><span class="w"> </span>greedy_search<span class="w"> </span>fast_beam_search<span class="w"> </span>modified_beam_search<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>epoch<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">17</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>avg<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="m">2</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span>./lstm_transducer_stateless2/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="nv">$epoch</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>lstm_transducer_stateless2/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-encoder-layers<span class="w"> </span><span class="m">12</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--rnn-hidden-size<span class="w"> </span><span class="m">1024</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoding-method<span class="w"> </span><span class="nv">$m</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--use-averaged-model<span class="w"> </span>True<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--beam<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-contexts<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-states<span class="w"> </span><span class="m">8</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--beam-size<span class="w"> </span><span class="m">4</span>
<span class="w"> </span><span class="k">done</span>
<span class="w"> </span><span class="k">done</span>
<span class="k">done</span>
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> iter <span class="k">in</span> <span class="m">474000</span><span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> avg <span class="k">in</span> <span class="m">8</span> <span class="m">10</span> <span class="m">12</span> <span class="m">14</span> <span class="m">16</span> <span class="m">18</span><span class="p">;</span> <span class="k">do</span>
./lstm_transducer_stateless2/decode.py <span class="se">\</span>
--iter <span class="nv">$iter</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--exp-dir lstm_transducer_stateless2/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--num-encoder-layers <span class="m">12</span> <span class="se">\</span>
--rnn-hidden-size <span class="m">1024</span> <span class="se">\</span>
--decoding-method <span class="nv">$m</span> <span class="se">\</span>
--use-averaged-model True <span class="se">\</span>
--beam <span class="m">4</span> <span class="se">\</span>
--max-contexts <span class="m">4</span> <span class="se">\</span>
--max-states <span class="m">8</span> <span class="se">\</span>
--beam-size <span class="m">4</span>
<span class="k">done</span>
<span class="k">done</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span>m<span class="w"> </span><span class="k">in</span><span class="w"> </span>greedy_search<span class="w"> </span>fast_beam_search<span class="w"> </span>modified_beam_search<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>iter<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">474000</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>avg<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">8</span><span class="w"> </span><span class="m">10</span><span class="w"> </span><span class="m">12</span><span class="w"> </span><span class="m">14</span><span class="w"> </span><span class="m">16</span><span class="w"> </span><span class="m">18</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span>./lstm_transducer_stateless2/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--iter<span class="w"> </span><span class="nv">$iter</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>lstm_transducer_stateless2/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-encoder-layers<span class="w"> </span><span class="m">12</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--rnn-hidden-size<span class="w"> </span><span class="m">1024</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoding-method<span class="w"> </span><span class="nv">$m</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--use-averaged-model<span class="w"> </span>True<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--beam<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-contexts<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-states<span class="w"> </span><span class="m">8</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--beam-size<span class="w"> </span><span class="m">4</span>
<span class="w"> </span><span class="k">done</span>
<span class="w"> </span><span class="k">done</span>
<span class="k">done</span>
</pre></div>
</div>
@@ -516,11 +516,11 @@ command to extract <code class="docutils literal notranslate"><span class="pre">
<span class="nv">iter</span><span class="o">=</span><span class="m">468000</span>
<span class="nv">avg</span><span class="o">=</span><span class="m">16</span>

./lstm_transducer_stateless2/export.py <span class="se">\</span>
--exp-dir ./lstm_transducer_stateless2/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--iter <span class="nv">$iter</span> <span class="se">\</span>
--avg <span class="nv">$avg</span>
./lstm_transducer_stateless2/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./lstm_transducer_stateless2/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--iter<span class="w"> </span><span class="nv">$iter</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span>
</pre></div>
</div>
<p>It will generate a file <code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/exp/pretrained.pt</span></code>.</p>
@@ -528,8 +528,8 @@ command to extract <code class="docutils literal notranslate"><span class="pre">
<p class="admonition-title">Hint</p>
<p>To use the generated <code class="docutils literal notranslate"><span class="pre">pretrained.pt</span></code> for <code class="docutils literal notranslate"><span class="pre">lstm_transducer_stateless2/decode.py</span></code>,
you can run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> lstm_transducer_stateless2/exp
ln -s pretrained epoch-9999.pt
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>lstm_transducer_stateless2/exp
ln<span class="w"> </span>-s<span class="w"> </span>pretrained<span class="w"> </span>epoch-9999.pt
</pre></div>
</div>
<p>And then pass <code class="docutils literal notranslate"><span class="pre">--epoch</span> <span class="pre">9999</span> <span class="pre">--avg</span> <span class="pre">1</span> <span class="pre">--use-averaged-model</span> <span class="pre">0</span></code> to
@@ -537,12 +537,12 @@ ln -s pretrained epoch-9999.pt
</div>
<p>To use the exported model with <code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/pretrained.py</span></code>, you
can run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./lstm_transducer_stateless2/pretrained.py <span class="se">\</span>
--checkpoint ./lstm_transducer_stateless2/exp/pretrained.pt <span class="se">\</span>
--bpe-model ./data/lang_bpe_500/bpe.model <span class="se">\</span>
--method greedy_search <span class="se">\</span>
/path/to/foo.wav <span class="se">\</span>
/path/to/bar.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./lstm_transducer_stateless2/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./lstm_transducer_stateless2/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>./data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>greedy_search<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/foo.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/bar.wav
</pre></div>
</div>
</section>
@@ -551,12 +551,12 @@ can run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nv">iter</span><span class="o">=</span><span class="m">468000</span>
<span class="nv">avg</span><span class="o">=</span><span class="m">16</span>

./lstm_transducer_stateless2/export.py <span class="se">\</span>
--exp-dir ./lstm_transducer_stateless2/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--iter <span class="nv">$iter</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--jit-trace <span class="m">1</span>
./lstm_transducer_stateless2/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./lstm_transducer_stateless2/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--iter<span class="w"> </span><span class="nv">$iter</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--jit-trace<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
<p>It will generate 3 files:</p>
@@ -568,13 +568,13 @@ can run:</p>
</ul>
</div></blockquote>
<p>To use the generated files with <code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/jit_pretrained</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./lstm_transducer_stateless2/jit_pretrained.py <span class="se">\</span>
--bpe-model ./data/lang_bpe_500/bpe.model <span class="se">\</span>
--encoder-model-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace.pt <span class="se">\</span>
--decoder-model-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace.pt <span class="se">\</span>
--joiner-model-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace.pt <span class="se">\</span>
/path/to/foo.wav <span class="se">\</span>
/path/to/bar.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./lstm_transducer_stateless2/jit_pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>./data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--encoder-model-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/encoder_jit_trace.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoder-model-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/decoder_jit_trace.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--joiner-model-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/joiner_jit_trace.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/foo.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/bar.wav
</pre></div>
</div>
<div class="admonition hint">
@@ -589,19 +589,19 @@ for how to use the exported models in <code class="docutils literal notranslate"
<a class="reference external" href="https://github.com/tencent/ncnn">ncnn</a> using
<a class="reference external" href="https://github.com/Tencent/ncnn/tree/master/tools/pnnx">pnnx</a>.</p>
<p>First, let us install a modified version of <code class="docutils literal notranslate"><span class="pre">ncnn</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>git clone https://github.com/csukuangfj/ncnn
<span class="nb">cd</span> ncnn
git submodule update --recursive --init
python3 setup.py bdist_wheel
ls -lh dist/
pip install ./dist/*.whl
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/csukuangfj/ncnn
<span class="nb">cd</span><span class="w"> </span>ncnn
git<span class="w"> </span>submodule<span class="w"> </span>update<span class="w"> </span>--recursive<span class="w"> </span>--init
python3<span class="w"> </span>setup.py<span class="w"> </span>bdist_wheel
ls<span class="w"> </span>-lh<span class="w"> </span>dist/
pip<span class="w"> </span>install<span class="w"> </span>./dist/*.whl

<span class="c1"># now build pnnx</span>
<span class="nb">cd</span> tools/pnnx
mkdir build
<span class="nb">cd</span> build
make -j4
<span class="nb">export</span> <span class="nv">PATH</span><span class="o">=</span><span class="nv">$PWD</span>/src:<span class="nv">$PATH</span>
<span class="nb">cd</span><span class="w"> </span>tools/pnnx
mkdir<span class="w"> </span>build
<span class="nb">cd</span><span class="w"> </span>build
make<span class="w"> </span>-j4
<span class="nb">export</span><span class="w"> </span><span class="nv">PATH</span><span class="o">=</span><span class="nv">$PWD</span>/src:<span class="nv">$PATH</span>

./src/pnnx
</pre></div>
@@ -616,12 +616,12 @@ for <code class="docutils literal notranslate"><span class="pre">pnnx</span></co
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nv">iter</span><span class="o">=</span><span class="m">468000</span>
<span class="nv">avg</span><span class="o">=</span><span class="m">16</span>

./lstm_transducer_stateless2/export.py <span class="se">\</span>
--exp-dir ./lstm_transducer_stateless2/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--iter <span class="nv">$iter</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--pnnx <span class="m">1</span>
./lstm_transducer_stateless2/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./lstm_transducer_stateless2/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--iter<span class="w"> </span><span class="nv">$iter</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--pnnx<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
<p>It will generate 3 files:</p>
@@ -650,26 +650,26 @@ for <code class="docutils literal notranslate"><span class="pre">pnnx</span></co
</ul>
</div></blockquote>
<p>To use the above generated files, run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./lstm_transducer_stateless2/ncnn-decode.py <span class="se">\</span>
--bpe-model-filename ./data/lang_bpe_500/bpe.model <span class="se">\</span>
--encoder-param-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.param <span class="se">\</span>
--encoder-bin-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.bin <span class="se">\</span>
--decoder-param-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.param <span class="se">\</span>
--decoder-bin-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.bin <span class="se">\</span>
--joiner-param-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.param <span class="se">\</span>
--joiner-bin-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.bin <span class="se">\</span>
/path/to/foo.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./lstm_transducer_stateless2/ncnn-decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model-filename<span class="w"> </span>./data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--encoder-param-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.param<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--encoder-bin-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.bin<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoder-param-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.param<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoder-bin-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.bin<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--joiner-param-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.param<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--joiner-bin-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.bin<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/foo.wav
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./lstm_transducer_stateless2/streaming-ncnn-decode.py <span class="se">\</span>
--bpe-model-filename ./data/lang_bpe_500/bpe.model <span class="se">\</span>
--encoder-param-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.param <span class="se">\</span>
--encoder-bin-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.bin <span class="se">\</span>
--decoder-param-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.param <span class="se">\</span>
--decoder-bin-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.bin <span class="se">\</span>
--joiner-param-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.param <span class="se">\</span>
--joiner-bin-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.bin <span class="se">\</span>
/path/to/foo.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./lstm_transducer_stateless2/streaming-ncnn-decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model-filename<span class="w"> </span>./data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--encoder-param-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.param<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--encoder-bin-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.bin<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoder-param-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.param<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoder-bin-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.bin<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--joiner-param-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.param<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--joiner-bin-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.bin<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/foo.wav
</pre></div>
</div>
<p>To use the above generated files in C++, please see

@@ -145,8 +145,8 @@ That is, it has no recurrent connections.</p>
<p>The data preparation is the same as other recipes on LibriSpeech dataset,
if you have finished this step, you can skip to <code class="docutils literal notranslate"><span class="pre">Training</span></code> directly.</p>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh
</pre></div>
</div>
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
@@ -161,13 +161,13 @@ options:</p>
</div></blockquote>
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
<p>For example,</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">0</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>means to run only stage 0.</p>
<p>To run stage 2 to stage 5, use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage <span class="m">2</span> --stop-stage <span class="m">5</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">2</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">5</span>
</pre></div>
</div>
<div class="admonition hint">
@@ -206,8 +206,8 @@ You can see the configurable options below for their meanings or read <a class="
</div>
<section id="configurable-options">
<h3>Configurable options<a class="headerlink" href="#configurable-options" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless4/train.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./pruned_transducer_stateless4/train.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows you the training options that can be passed from the commandline.
@ -259,26 +259,26 @@ training from epoch 10, based on the state from epoch 9.</p>
|
||||
<div><p><strong>Use case 1</strong>: You have 4 GPUs, but you only want to use GPU 0 and
GPU 2 for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,2"</span>
$ ./pruned_transducer_stateless4/train.py --world-size <span class="m">2</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,2"</span>
$<span class="w"> </span>./pruned_transducer_stateless4/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">2</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 2</strong>: You have 4 GPUs and you want to use all of them
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless4/train.py --world-size <span class="m">4</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./pruned_transducer_stateless4/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 3</strong>: You have 4 GPUs but you only want to use GPU 3
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"3"</span>
$ ./pruned_transducer_stateless4/train.py --world-size <span class="m">1</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"3"</span>
$<span class="w"> </span>./pruned_transducer_stateless4/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
</div></blockquote>
@ -353,7 +353,7 @@ You will find the following files in that directory:</p>
<code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./pruned_transducer_stateless4/train.py --start-epoch <span class="m">11</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./pruned_transducer_stateless4/train.py<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">11</span>
</pre></div>
</div>
</div></blockquote>
@ -363,7 +363,7 @@ To resume training from some checkpoint, say <code class="docutils literal notra
containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">checkpoint-436000</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./pruned_transducer_stateless4/train.py --start-batch <span class="m">436000</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./pruned_transducer_stateless4/train.py<span class="w"> </span>--start-batch<span class="w"> </span><span class="m">436000</span>
</pre></div>
</div>
</div></blockquote>
@ -372,8 +372,8 @@ To resume training from some checkpoint, say <code class="docutils literal notra
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc., are recorded in these logs. You can visualize them by:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> pruned_transducer_stateless4/exp/tensorboard
$ tensorboard dev upload --logdir . --description <span class="s2">"pruned transducer training for LibriSpeech with icefall"</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>pruned_transducer_stateless4/exp/tensorboard
$<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w"> </span>upload<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--description<span class="w"> </span><span class="s2">"pruned transducer training for LibriSpeech with icefall"</span>
</pre></div>
</div>
</div></blockquote>
@ -410,8 +410,8 @@ the following screenshot:</p>
<p>If you don’t have access to Google, you can use the following command
to view the TensorBoard log locally:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> pruned_transducer_stateless4/exp/tensorboard
tensorboard --logdir . --port <span class="m">6008</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>pruned_transducer_stateless4/exp/tensorboard
tensorboard<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--port<span class="w"> </span><span class="m">6008</span>
</pre></div>
</div>
</div></blockquote>
@ -436,16 +436,16 @@ you saw printed to the console during training.</p>
<section id="usage-example">
<h3>Usage example<a class="headerlink" href="#usage-example" title="Permalink to this heading"></a></h3>
<p>You can use the following command to start the training using 4 GPUs:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,1,2,3"</span>
./pruned_transducer_stateless4/train.py <span class="se">\</span>
  --world-size <span class="m">4</span> <span class="se">\</span>
  --dynamic-chunk-training <span class="m">1</span> <span class="se">\</span>
  --causal-convolution <span class="m">1</span> <span class="se">\</span>
  --num-epochs <span class="m">30</span> <span class="se">\</span>
  --start-epoch <span class="m">1</span> <span class="se">\</span>
  --exp-dir pruned_transducer_stateless4/exp <span class="se">\</span>
  --full-libri <span class="m">1</span> <span class="se">\</span>
  --max-duration <span class="m">300</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,1,2,3"</span>
./pruned_transducer_stateless4/train.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--dynamic-chunk-training<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--causal-convolution<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-epochs<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless4/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--full-libri<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">300</span>
</pre></div>
</div>
<div class="admonition note">
@ -489,8 +489,8 @@ produce almost the same results given the same <code class="docutils literal not
</div>
<section id="simulate-streaming-decoding">
<h3>Simulate streaming decoding<a class="headerlink" href="#simulate-streaming-decoding" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless4/decode.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./pruned_transducer_stateless4/decode.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the options for decoding.
@ -525,47 +525,47 @@ the attention mask.</p>
</div></blockquote>
</div></blockquote>
<p>The following shows two examples (for the two types of checkpoints):</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
  <span class="k">for</span> epoch <span class="k">in</span> <span class="m">25</span> <span class="m">20</span><span class="p">;</span> <span class="k">do</span>
    <span class="k">for</span> avg <span class="k">in</span> <span class="m">7</span> <span class="m">5</span> <span class="m">3</span> <span class="m">1</span><span class="p">;</span> <span class="k">do</span>
      ./pruned_transducer_stateless4/decode.py <span class="se">\</span>
        --epoch <span class="nv">$epoch</span> <span class="se">\</span>
        --avg <span class="nv">$avg</span> <span class="se">\</span>
        --simulate-streaming <span class="m">1</span> <span class="se">\</span>
        --causal-convolution <span class="m">1</span> <span class="se">\</span>
        --decode-chunk-size <span class="m">16</span> <span class="se">\</span>
        --left-context <span class="m">64</span> <span class="se">\</span>
        --exp-dir pruned_transducer_stateless4/exp <span class="se">\</span>
        --max-duration <span class="m">600</span> <span class="se">\</span>
        --decoding-method <span class="nv">$m</span>
    <span class="k">done</span>
  <span class="k">done</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span>m<span class="w"> </span><span class="k">in</span><span class="w"> </span>greedy_search<span class="w"> </span>fast_beam_search<span class="w"> </span>modified_beam_search<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w">  </span><span class="k">for</span><span class="w"> </span>epoch<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">25</span><span class="w"> </span><span class="m">20</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w">    </span><span class="k">for</span><span class="w"> </span>avg<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">7</span><span class="w"> </span><span class="m">5</span><span class="w"> </span><span class="m">3</span><span class="w"> </span><span class="m">1</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w">      </span>./pruned_transducer_stateless4/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--epoch<span class="w"> </span><span class="nv">$epoch</span><span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--simulate-streaming<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--causal-convolution<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--decode-chunk-size<span class="w"> </span><span class="m">16</span><span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--left-context<span class="w"> </span><span class="m">64</span><span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless4/exp<span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--decoding-method<span class="w"> </span><span class="nv">$m</span>
<span class="w">    </span><span class="k">done</span>
<span class="w">  </span><span class="k">done</span>
<span class="k">done</span>
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
  <span class="k">for</span> iter <span class="k">in</span> <span class="m">474000</span><span class="p">;</span> <span class="k">do</span>
    <span class="k">for</span> avg <span class="k">in</span> <span class="m">8</span> <span class="m">10</span> <span class="m">12</span> <span class="m">14</span> <span class="m">16</span> <span class="m">18</span><span class="p">;</span> <span class="k">do</span>
      ./pruned_transducer_stateless4/decode.py <span class="se">\</span>
        --iter <span class="nv">$iter</span> <span class="se">\</span>
        --avg <span class="nv">$avg</span> <span class="se">\</span>
        --simulate-streaming <span class="m">1</span> <span class="se">\</span>
        --causal-convolution <span class="m">1</span> <span class="se">\</span>
        --decode-chunk-size <span class="m">16</span> <span class="se">\</span>
        --left-context <span class="m">64</span> <span class="se">\</span>
        --exp-dir pruned_transducer_stateless4/exp <span class="se">\</span>
        --max-duration <span class="m">600</span> <span class="se">\</span>
        --decoding-method <span class="nv">$m</span>
    <span class="k">done</span>
  <span class="k">done</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span>m<span class="w"> </span><span class="k">in</span><span class="w"> </span>greedy_search<span class="w"> </span>fast_beam_search<span class="w"> </span>modified_beam_search<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w">  </span><span class="k">for</span><span class="w"> </span>iter<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">474000</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w">    </span><span class="k">for</span><span class="w"> </span>avg<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">8</span><span class="w"> </span><span class="m">10</span><span class="w"> </span><span class="m">12</span><span class="w"> </span><span class="m">14</span><span class="w"> </span><span class="m">16</span><span class="w"> </span><span class="m">18</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w">      </span>./pruned_transducer_stateless4/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--iter<span class="w"> </span><span class="nv">$iter</span><span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--simulate-streaming<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--causal-convolution<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--decode-chunk-size<span class="w"> </span><span class="m">16</span><span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--left-context<span class="w"> </span><span class="m">64</span><span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless4/exp<span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--decoding-method<span class="w"> </span><span class="nv">$m</span>
<span class="w">    </span><span class="k">done</span>
<span class="w">  </span><span class="k">done</span>
<span class="k">done</span>
</pre></div>
</div>
</section>
<section id="real-streaming-decoding">
<h3>Real streaming decoding<a class="headerlink" href="#real-streaming-decoding" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless4/streaming_decode.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./pruned_transducer_stateless4/streaming_decode.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the options for decoding.
@ -599,37 +599,37 @@ the performance for all the models, the reasons might be the training and decodi
can try decoding with <code class="docutils literal notranslate"><span class="pre">--right-context</span></code> to see if it helps. The default value is 0.</p>
</div>
<p>The following shows two examples (for the two types of checkpoints):</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
  <span class="k">for</span> epoch <span class="k">in</span> <span class="m">25</span> <span class="m">20</span><span class="p">;</span> <span class="k">do</span>
    <span class="k">for</span> avg <span class="k">in</span> <span class="m">7</span> <span class="m">5</span> <span class="m">3</span> <span class="m">1</span><span class="p">;</span> <span class="k">do</span>
      ./pruned_transducer_stateless4/decode.py <span class="se">\</span>
        --epoch <span class="nv">$epoch</span> <span class="se">\</span>
        --avg <span class="nv">$avg</span> <span class="se">\</span>
        --decode-chunk-size <span class="m">16</span> <span class="se">\</span>
        --left-context <span class="m">64</span> <span class="se">\</span>
        --num-decode-streams <span class="m">100</span> <span class="se">\</span>
        --exp-dir pruned_transducer_stateless4/exp <span class="se">\</span>
        --max-duration <span class="m">600</span> <span class="se">\</span>
        --decoding-method <span class="nv">$m</span>
    <span class="k">done</span>
  <span class="k">done</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span>m<span class="w"> </span><span class="k">in</span><span class="w"> </span>greedy_search<span class="w"> </span>fast_beam_search<span class="w"> </span>modified_beam_search<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w">  </span><span class="k">for</span><span class="w"> </span>epoch<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">25</span><span class="w"> </span><span class="m">20</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w">    </span><span class="k">for</span><span class="w"> </span>avg<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">7</span><span class="w"> </span><span class="m">5</span><span class="w"> </span><span class="m">3</span><span class="w"> </span><span class="m">1</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w">      </span>./pruned_transducer_stateless4/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--epoch<span class="w"> </span><span class="nv">$epoch</span><span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--decode-chunk-size<span class="w"> </span><span class="m">16</span><span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--left-context<span class="w"> </span><span class="m">64</span><span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--num-decode-streams<span class="w"> </span><span class="m">100</span><span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless4/exp<span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--decoding-method<span class="w"> </span><span class="nv">$m</span>
<span class="w">    </span><span class="k">done</span>
<span class="w">  </span><span class="k">done</span>
<span class="k">done</span>
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
  <span class="k">for</span> iter <span class="k">in</span> <span class="m">474000</span><span class="p">;</span> <span class="k">do</span>
    <span class="k">for</span> avg <span class="k">in</span> <span class="m">8</span> <span class="m">10</span> <span class="m">12</span> <span class="m">14</span> <span class="m">16</span> <span class="m">18</span><span class="p">;</span> <span class="k">do</span>
      ./pruned_transducer_stateless4/decode.py <span class="se">\</span>
        --iter <span class="nv">$iter</span> <span class="se">\</span>
        --avg <span class="nv">$avg</span> <span class="se">\</span>
        --decode-chunk-size <span class="m">16</span> <span class="se">\</span>
        --left-context <span class="m">64</span> <span class="se">\</span>
        --num-decode-streams <span class="m">100</span> <span class="se">\</span>
        --exp-dir pruned_transducer_stateless4/exp <span class="se">\</span>
        --max-duration <span class="m">600</span> <span class="se">\</span>
        --decoding-method <span class="nv">$m</span>
    <span class="k">done</span>
  <span class="k">done</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span>m<span class="w"> </span><span class="k">in</span><span class="w"> </span>greedy_search<span class="w"> </span>fast_beam_search<span class="w"> </span>modified_beam_search<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w">  </span><span class="k">for</span><span class="w"> </span>iter<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">474000</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w">    </span><span class="k">for</span><span class="w"> </span>avg<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">8</span><span class="w"> </span><span class="m">10</span><span class="w"> </span><span class="m">12</span><span class="w"> </span><span class="m">14</span><span class="w"> </span><span class="m">16</span><span class="w"> </span><span class="m">18</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w">      </span>./pruned_transducer_stateless4/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--iter<span class="w"> </span><span class="nv">$iter</span><span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--decode-chunk-size<span class="w"> </span><span class="m">16</span><span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--left-context<span class="w"> </span><span class="m">64</span><span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--num-decode-streams<span class="w"> </span><span class="m">100</span><span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless4/exp<span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
<span class="w">        </span>--decoding-method<span class="w"> </span><span class="nv">$m</span>
<span class="w">    </span><span class="k">done</span>
<span class="w">  </span><span class="k">done</span>
<span class="k">done</span>
</pre></div>
</div>
@ -704,13 +704,13 @@ command to extract <code class="docutils literal notranslate"><span class="pre">
<span class="nv">epoch</span><span class="o">=</span><span class="m">25</span>
<span class="nv">avg</span><span class="o">=</span><span class="m">3</span>

./pruned_transducer_stateless4/export.py <span class="se">\</span>
  --exp-dir ./pruned_transducer_stateless4/exp <span class="se">\</span>
  --streaming-model <span class="m">1</span> <span class="se">\</span>
  --causal-convolution <span class="m">1</span> <span class="se">\</span>
  --bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
  --epoch <span class="nv">$epoch</span> <span class="se">\</span>
  --avg <span class="nv">$avg</span>
./pruned_transducer_stateless4/export.py<span class="w"> </span><span class="se">\</span>
<span class="w">  </span>--exp-dir<span class="w"> </span>./pruned_transducer_stateless4/exp<span class="w"> </span><span class="se">\</span>
<span class="w">  </span>--streaming-model<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w">  </span>--causal-convolution<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w">  </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w">  </span>--epoch<span class="w"> </span><span class="nv">$epoch</span><span class="w"> </span><span class="se">\</span>
<span class="w">  </span>--avg<span class="w"> </span><span class="nv">$avg</span>
</pre></div>
</div>
<div class="admonition caution">
@ -723,8 +723,8 @@ a streaming model.</p>
<p class="admonition-title">Hint</p>
|
||||
<p>To use the generated <code class="docutils literal notranslate"><span class="pre">pretrained.pt</span></code> for <code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless4/decode.py</span></code>,
|
||||
you can run:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> pruned_transducer_stateless4/exp
|
||||
ln -s pretrained.pt epoch-999.pt
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>pruned_transducer_stateless4/exp
|
||||
ln<span class="w"> </span>-s<span class="w"> </span>pretrained.pt<span class="w"> </span>epoch-999.pt
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>And then pass <code class="docutils literal notranslate"><span class="pre">--epoch</span> <span class="pre">999</span> <span class="pre">--avg</span> <span class="pre">1</span> <span class="pre">--use-averaged-model</span> <span class="pre">0</span></code> to
|
||||
@ -732,27 +732,27 @@ ln -s pretrained.pt epoch-999.pt
|
||||
</div>
|
||||
<p>To use the exported model with <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless4/pretrained.py</span></code>, you
|
||||
can run:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless4/pretrained.py <span class="se">\</span>
|
||||
--checkpoint ./pruned_transducer_stateless4/exp/pretrained.pt <span class="se">\</span>
|
||||
--simulate-streaming <span class="m">1</span> <span class="se">\</span>
|
||||
--causal-convolution <span class="m">1</span> <span class="se">\</span>
--bpe-model ./data/lang_bpe_500/bpe.model <span class="se">\</span>
--method greedy_search <span class="se">\</span>
/path/to/foo.wav <span class="se">\</span>
/path/to/bar.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless4/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./pruned_transducer_stateless4/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--simulate-streaming<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--causal-convolution<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>./data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>greedy_search<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/foo.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/bar.wav
</pre></div>
</div>
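The symlink trick mentioned above works because `decode.py` locates checkpoints purely by the `epoch-<N>.pt` naming convention. A throwaway sketch of the effect (using a temporary directory, not the real `exp` dir):

```shell
# Throwaway sketch of the symlink trick (temporary dir, not the real exp dir):
# decode.py looks for files named epoch-<N>.pt, so exposing pretrained.pt
# under the name epoch-999.pt makes "--epoch 999 --avg 1" load exactly it.
demo_dir=$(mktemp -d)
touch "$demo_dir/pretrained.pt"
cd "$demo_dir"
ln -sf pretrained.pt epoch-999.pt
readlink epoch-999.pt
```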
</section>
<section id="export-model-using-torch-jit-script">
<h3>Export model using <code class="docutils literal notranslate"><span class="pre">torch.jit.script()</span></code><a class="headerlink" href="#export-model-using-torch-jit-script" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless4/export.py <span class="se">\</span>
--exp-dir ./pruned_transducer_stateless4/exp <span class="se">\</span>
--streaming-model <span class="m">1</span> <span class="se">\</span>
--causal-convolution <span class="m">1</span> <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--epoch <span class="m">25</span> <span class="se">\</span>
--avg <span class="m">3</span> <span class="se">\</span>
--jit <span class="m">1</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless4/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./pruned_transducer_stateless4/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--streaming-model<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--causal-convolution<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">25</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="m">3</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--jit<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
<div class="admonition caution">
@@ -141,8 +141,8 @@ That is, it has no recurrent connections.</p>
<p>The data preparation is the same as other recipes on LibriSpeech dataset,
if you have finished this step, you can skip to <code class="docutils literal notranslate"><span class="pre">Training</span></code> directly.</p>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh
</pre></div>
</div>
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
@@ -157,13 +157,13 @@ options:</p>
</div></blockquote>
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
<p>For example,</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">0</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>means to run only stage 0.</p>
<p>To run stage 2 to stage 5, use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage <span class="m">2</span> --stop-stage <span class="m">5</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">2</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">5</span>
</pre></div>
</div>
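Internally, icefall-style `prepare.sh` scripts gate each stage on `--stage`/`--stop-stage`. A minimal hedged sketch of that pattern (not the actual `prepare.sh` code; the `run_stage` helper is illustrative):

```shell
# Minimal sketch (not the actual prepare.sh) of the --stage/--stop-stage
# gating pattern: a stage runs only if stage <= s <= stop_stage.
stage=2
stop_stage=5

run_stage() {
  if [ "$stage" -le "$1" ] && [ "$stop_stage" -ge "$1" ]; then
    echo "running stage $1"
  fi
}

for s in 0 1 2 3 4 5 6; do
  run_stage "$s"
done
```

With `stage=2` and `stop_stage=5`, only stages 2 through 5 print, matching the example above.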
<div class="admonition hint">
@@ -195,8 +195,8 @@ the following YouTube channel by <a class="reference external" href="https://www
<h2>Training<a class="headerlink" href="#training" title="Permalink to this heading"></a></h2>
<section id="configurable-options">
<h3>Configurable options<a class="headerlink" href="#configurable-options" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless7_streaming/train.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./pruned_transducer_stateless7_streaming/train.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows you the training options that can be passed from the commandline.
@@ -248,26 +248,26 @@ training from epoch 10, based on the state from epoch 9.</p>
<div><p><strong>Use case 1</strong>: You have 4 GPUs, but you only want to use GPU 0 and
GPU 2 for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,2"</span>
$ ./pruned_transducer_stateless7_streaming/train.py --world-size <span class="m">2</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,2"</span>
$<span class="w"> </span>./pruned_transducer_stateless7_streaming/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">2</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 2</strong>: You have 4 GPUs and you want to use all of them
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless7_streaming/train.py --world-size <span class="m">4</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./pruned_transducer_stateless7_streaming/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 3</strong>: You have 4 GPUs but you only want to use GPU 3
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"3"</span>
$ ./pruned_transducer_stateless7_streaming/train.py --world-size <span class="m">1</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"3"</span>
$<span class="w"> </span>./pruned_transducer_stateless7_streaming/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
</div></blockquote>
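In all three use cases, `--world-size` must equal the number of GPUs listed in `CUDA_VISIBLE_DEVICES`. A hedged sketch (`--world-size` is the real flag; the counting helper is ours) that derives one from the other so the two settings cannot drift apart:

```shell
# Derive the world size from CUDA_VISIBLE_DEVICES (illustrative helper):
# e.g. "0,2" lists 2 GPUs, matching use case 1 above.
export CUDA_VISIBLE_DEVICES="0,2"
num_gpus=$(echo "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | wc -l)
echo "use --world-size $num_gpus"
```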
@@ -334,7 +334,7 @@ You will find the following files in that directory:</p>
<code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./pruned_transducer_stateless7_streaming/train.py --start-epoch <span class="m">11</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./pruned_transducer_stateless7_streaming/train.py<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">11</span>
</pre></div>
</div>
</div></blockquote>
@@ -344,7 +344,7 @@ To resume training from some checkpoint, say <code class="docutils literal notra
containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">checkpoint-436000</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./pruned_transducer_stateless7_streaming/train.py --start-batch <span class="m">436000</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./pruned_transducer_stateless7_streaming/train.py<span class="w"> </span>--start-batch<span class="w"> </span><span class="m">436000</span>
</pre></div>
</div>
</div></blockquote>
@@ -353,8 +353,8 @@ To resume training from some checkpoint, say <code class="docutils literal notra
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc., are recorded in these logs. You can visualize them by:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> pruned_transducer_stateless7_streaming/exp/tensorboard
$ tensorboard dev upload --logdir . --description <span class="s2">"pruned transducer training for LibriSpeech with icefall"</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>pruned_transducer_stateless7_streaming/exp/tensorboard
$<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w"> </span>upload<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--description<span class="w"> </span><span class="s2">"pruned transducer training for LibriSpeech with icefall"</span>
</pre></div>
</div>
</div></blockquote>
@@ -365,8 +365,8 @@ $ tensorboard dev upload --logdir . --description <span class="s2">"pruned
<p>If you don’t have access to Google, you can use the following command
to view the TensorBoard log locally:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> pruned_transducer_stateless7_streaming/exp/tensorboard
tensorboard --logdir . --port <span class="m">6008</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>pruned_transducer_stateless7_streaming/exp/tensorboard
tensorboard<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--port<span class="w"> </span><span class="m">6008</span>
</pre></div>
</div>
</div></blockquote>
@@ -391,15 +391,15 @@ you saw printed to the console during training.</p>
<section id="usage-example">
<h3>Usage example<a class="headerlink" href="#usage-example" title="Permalink to this heading"></a></h3>
<p>You can use the following command to start the training using 4 GPUs:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,1,2,3"</span>
./pruned_transducer_stateless7_streaming/train.py <span class="se">\</span>
--world-size <span class="m">4</span> <span class="se">\</span>
--num-epochs <span class="m">30</span> <span class="se">\</span>
--start-epoch <span class="m">1</span> <span class="se">\</span>
--use-fp16 <span class="m">1</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless7_streaming/exp <span class="se">\</span>
--full-libri <span class="m">1</span> <span class="se">\</span>
--max-duration <span class="m">550</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,1,2,3"</span>
./pruned_transducer_stateless7_streaming/train.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-epochs<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--use-fp16<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless7_streaming/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--full-libri<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">550</span>
</pre></div>
</div>
</section>
@@ -438,8 +438,8 @@ produce almost the same results given the same <code class="docutils literal not
</div>
<section id="simulate-streaming-decoding">
<h3>Simulate streaming decoding<a class="headerlink" href="#simulate-streaming-decoding" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless7_streaming/decode.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./pruned_transducer_stateless7_streaming/decode.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the options for decoding.
@@ -452,41 +452,41 @@ The default value is 32 (i.e., 320ms).</p>
</div></blockquote>
</div></blockquote>
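The 32 → 320 ms relation comes from the frame shift of the features; a quick sketch of the conversion (the 10 ms frame shift is an assumption inferred from the "32 (i.e., 320ms)" figure quoted above):

```shell
# --decode-chunk-len counts feature frames; with a 10 ms frame shift,
# 32 frames correspond to 320 ms of audio.
chunk_frames=32
frame_shift_ms=10
chunk_ms=$((chunk_frames * frame_shift_ms))
echo "${chunk_frames} frames = ${chunk_ms} ms"
```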
<p>The following shows two examples (for the two types of checkpoints):</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> epoch <span class="k">in</span> <span class="m">30</span><span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> avg <span class="k">in</span> <span class="m">12</span> <span class="m">11</span> <span class="m">10</span> <span class="m">9</span> <span class="m">8</span><span class="p">;</span> <span class="k">do</span>
./pruned_transducer_stateless7_streaming/decode.py <span class="se">\</span>
--epoch <span class="nv">$epoch</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--decode-chunk-len <span class="m">32</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless7_streaming/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--decoding-method <span class="nv">$m</span>
<span class="k">done</span>
<span class="k">done</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span>m<span class="w"> </span><span class="k">in</span><span class="w"> </span>greedy_search<span class="w"> </span>fast_beam_search<span class="w"> </span>modified_beam_search<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>epoch<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">30</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>avg<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">12</span><span class="w"> </span><span class="m">11</span><span class="w"> </span><span class="m">10</span><span class="w"> </span><span class="m">9</span><span class="w"> </span><span class="m">8</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span>./pruned_transducer_stateless7_streaming/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="nv">$epoch</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decode-chunk-len<span class="w"> </span><span class="m">32</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless7_streaming/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoding-method<span class="w"> </span><span class="nv">$m</span>
<span class="w"> </span><span class="k">done</span>
<span class="w"> </span><span class="k">done</span>
<span class="k">done</span>
</pre></div>
</div>
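The nested loops above sweep every (method, epoch, avg) combination; this sketch only counts how many decode runs that amounts to:

```shell
# Count the (method, epoch, avg) combinations swept by the loops above:
# 3 methods x 1 epoch x 5 avg values.
n=0
for m in greedy_search fast_beam_search modified_beam_search; do
  for epoch in 30; do
    for avg in 12 11 10 9 8; do
      n=$((n + 1))
    done
  done
done
echo "$n decode runs"
```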
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> iter <span class="k">in</span> <span class="m">474000</span><span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> avg <span class="k">in</span> <span class="m">8</span> <span class="m">10</span> <span class="m">12</span> <span class="m">14</span> <span class="m">16</span> <span class="m">18</span><span class="p">;</span> <span class="k">do</span>
./pruned_transducer_stateless7_streaming/decode.py <span class="se">\</span>
--iter <span class="nv">$iter</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--decode-chunk-len <span class="m">32</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless7_streaming/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--decoding-method <span class="nv">$m</span>
<span class="k">done</span>
<span class="k">done</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span>m<span class="w"> </span><span class="k">in</span><span class="w"> </span>greedy_search<span class="w"> </span>fast_beam_search<span class="w"> </span>modified_beam_search<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>iter<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">474000</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>avg<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">8</span><span class="w"> </span><span class="m">10</span><span class="w"> </span><span class="m">12</span><span class="w"> </span><span class="m">14</span><span class="w"> </span><span class="m">16</span><span class="w"> </span><span class="m">18</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span>./pruned_transducer_stateless7_streaming/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--iter<span class="w"> </span><span class="nv">$iter</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decode-chunk-len<span class="w"> </span><span class="m">32</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless7_streaming/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoding-method<span class="w"> </span><span class="nv">$m</span>
<span class="w"> </span><span class="k">done</span>
<span class="w"> </span><span class="k">done</span>
<span class="k">done</span>
</pre></div>
</div>
</section>
<section id="real-streaming-decoding">
<h3>Real streaming decoding<a class="headerlink" href="#real-streaming-decoding" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless7_streaming/streaming_decode.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./pruned_transducer_stateless7_streaming/streaming_decode.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the options for decoding.
@@ -507,33 +507,33 @@ suppose sequences 1 and 2 are done, so sequences 3 to 12 will be processed in parallel
</div></blockquote>
</div></blockquote>
<p>The following shows two examples (for the two types of checkpoints):</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> epoch <span class="k">in</span> <span class="m">30</span><span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> avg <span class="k">in</span> <span class="m">12</span> <span class="m">11</span> <span class="m">10</span> <span class="m">9</span> <span class="m">8</span><span class="p">;</span> <span class="k">do</span>
./pruned_transducer_stateless7_streaming/decode.py <span class="se">\</span>
--epoch <span class="nv">$epoch</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--decode-chunk-len <span class="m">32</span> <span class="se">\</span>
--num-decode-streams <span class="m">100</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless7_streaming/exp <span class="se">\</span>
--decoding-method <span class="nv">$m</span>
<span class="k">done</span>
<span class="k">done</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span>m<span class="w"> </span><span class="k">in</span><span class="w"> </span>greedy_search<span class="w"> </span>fast_beam_search<span class="w"> </span>modified_beam_search<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>epoch<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">30</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>avg<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">12</span><span class="w"> </span><span class="m">11</span><span class="w"> </span><span class="m">10</span><span class="w"> </span><span class="m">9</span><span class="w"> </span><span class="m">8</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span>./pruned_transducer_stateless7_streaming/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="nv">$epoch</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decode-chunk-len<span class="w"> </span><span class="m">32</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-decode-streams<span class="w"> </span><span class="m">100</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless7_streaming/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoding-method<span class="w"> </span><span class="nv">$m</span>
<span class="w"> </span><span class="k">done</span>
<span class="w"> </span><span class="k">done</span>
<span class="k">done</span>
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> iter <span class="k">in</span> <span class="m">474000</span><span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> avg <span class="k">in</span> <span class="m">8</span> <span class="m">10</span> <span class="m">12</span> <span class="m">14</span> <span class="m">16</span> <span class="m">18</span><span class="p">;</span> <span class="k">do</span>
./pruned_transducer_stateless7_streaming/decode.py <span class="se">\</span>
--iter <span class="nv">$iter</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--decode-chunk-len <span class="m">16</span> <span class="se">\</span>
--num-decode-streams <span class="m">100</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless7_streaming/exp <span class="se">\</span>
--decoding-method <span class="nv">$m</span>
<span class="k">done</span>
<span class="k">done</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span>m<span class="w"> </span><span class="k">in</span><span class="w"> </span>greedy_search<span class="w"> </span>fast_beam_search<span class="w"> </span>modified_beam_search<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>iter<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">474000</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>avg<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">8</span><span class="w"> </span><span class="m">10</span><span class="w"> </span><span class="m">12</span><span class="w"> </span><span class="m">14</span><span class="w"> </span><span class="m">16</span><span class="w"> </span><span class="m">18</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span>./pruned_transducer_stateless7_streaming/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--iter<span class="w"> </span><span class="nv">$iter</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decode-chunk-len<span class="w"> </span><span class="m">16</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--num-decode-streams<span class="w"> </span><span class="m">100</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless7_streaming/exp<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--decoding-method<span class="w"> </span><span class="nv">$m</span>
|
||||
<span class="w"> </span><span class="k">done</span>
|
||||
<span class="w"> </span><span class="k">done</span>
|
||||
<span class="k">done</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
@ -608,13 +608,13 @@ command to extract <code class="docutils literal notranslate"><span class="pre">
|
||||
<span class="nv">epoch</span><span class="o">=</span><span class="m">30</span>
|
||||
<span class="nv">avg</span><span class="o">=</span><span class="m">9</span>
|
||||
|
||||
./pruned_transducer_stateless7_streaming/export.py <span class="se">\</span>
|
||||
--exp-dir ./pruned_transducer_stateless7_streaming/exp <span class="se">\</span>
|
||||
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
|
||||
--epoch <span class="nv">$epoch</span> <span class="se">\</span>
|
||||
--avg <span class="nv">$avg</span> <span class="se">\</span>
|
||||
--use-averaged-model<span class="o">=</span>True <span class="se">\</span>
|
||||
--decode-chunk-len <span class="m">32</span>
|
||||
./pruned_transducer_stateless7_streaming/export.py<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--exp-dir<span class="w"> </span>./pruned_transducer_stateless7_streaming/exp<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--epoch<span class="w"> </span><span class="nv">$epoch</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--use-averaged-model<span class="o">=</span>True<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--decode-chunk-len<span class="w"> </span><span class="m">32</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>It will generate a file <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless7_streaming/exp/pretrained.pt</span></code>.</p>
|
||||
@ -622,8 +622,8 @@ command to extract <code class="docutils literal notranslate"><span class="pre">
|
||||
<p class="admonition-title">Hint</p>
|
||||
<p>To use the generated <code class="docutils literal notranslate"><span class="pre">pretrained.pt</span></code> for <code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless7_streaming/decode.py</span></code>,
|
||||
you can run:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> pruned_transducer_stateless7_streaming/exp
|
||||
ln -s pretrained.pt epoch-999.pt
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>pruned_transducer_stateless7_streaming/exp
|
||||
ln<span class="w"> </span>-s<span class="w"> </span>pretrained.pt<span class="w"> </span>epoch-999.pt
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>And then pass <code class="docutils literal notranslate"><span class="pre">--epoch</span> <span class="pre">999</span> <span class="pre">--avg</span> <span class="pre">1</span> <span class="pre">--use-averaged-model</span> <span class="pre">0</span></code> to
|
||||
@ -631,25 +631,25 @@ ln -s pretrained.pt epoch-999.pt
|
||||
</div>
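The symlink trick above can be sketched end-to-end. This is a hypothetical sketch only: `demo_exp` and the empty `pretrained.pt` file are placeholders standing in for the real `pruned_transducer_stateless7_streaming/exp` directory and the exported checkpoint, so that the `epoch-999` naming convention picked up by `decode.py` is easy to verify.

```shell
# Hypothetical sketch: expose an exported checkpoint to decode.py as
# "epoch 999". demo_exp and the empty pretrained.pt are placeholders,
# not a real experiment directory or model.
mkdir -p demo_exp
cd demo_exp
touch pretrained.pt               # stands in for the exported checkpoint
ln -sf pretrained.pt epoch-999.pt # decode.py loads it via --epoch 999
# decode.py would then be invoked with:
#   --epoch 999 --avg 1 --use-averaged-model 0
readlink epoch-999.pt             # prints: pretrained.pt
```

With `--avg 1` and `--use-averaged-model 0`, no checkpoint averaging is attempted, so the single symlinked file is used as-is.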
|
||||
<p>To use the exported model with <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless7_streaming/pretrained.py</span></code>, you
|
||||
can run:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_streaming/pretrained.py <span class="se">\</span>
|
||||
--checkpoint ./pruned_transducer_stateless7_streaming/exp/pretrained.pt <span class="se">\</span>
|
||||
--bpe-model ./data/lang_bpe_500/bpe.model <span class="se">\</span>
|
||||
--method greedy_search <span class="se">\</span>
|
||||
--decode-chunk-len <span class="m">32</span> <span class="se">\</span>
|
||||
/path/to/foo.wav <span class="se">\</span>
|
||||
/path/to/bar.wav
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_streaming/pretrained.py<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--checkpoint<span class="w"> </span>./pruned_transducer_stateless7_streaming/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--bpe-model<span class="w"> </span>./data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--method<span class="w"> </span>greedy_search<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--decode-chunk-len<span class="w"> </span><span class="m">32</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>/path/to/foo.wav<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>/path/to/bar.wav
|
||||
</pre></div>
|
||||
</div>
|
||||
</section>
|
||||
<section id="export-model-using-torch-jit-script">
|
||||
<h3>Export model using <code class="docutils literal notranslate"><span class="pre">torch.jit.script()</span></code><a class="headerlink" href="#export-model-using-torch-jit-script" title="Permalink to this heading"></a></h3>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_streaming/export.py <span class="se">\</span>
|
||||
--exp-dir ./pruned_transducer_stateless7_streaming/exp <span class="se">\</span>
|
||||
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
|
||||
--epoch <span class="m">30</span> <span class="se">\</span>
|
||||
--avg <span class="m">9</span> <span class="se">\</span>
|
||||
--decode-chunk-len <span class="m">32</span> <span class="se">\</span>
|
||||
--jit <span class="m">1</span>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_streaming/export.py<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--exp-dir<span class="w"> </span>./pruned_transducer_stateless7_streaming/exp<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--avg<span class="w"> </span><span class="m">9</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--decode-chunk-len<span class="w"> </span><span class="m">32</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--jit<span class="w"> </span><span class="m">1</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="admonition caution">
|
||||
@ -666,13 +666,13 @@ are on CPU. You can use <code class="docutils literal notranslate"><span class="
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nv">epoch</span><span class="o">=</span><span class="m">30</span>
|
||||
<span class="nv">avg</span><span class="o">=</span><span class="m">9</span>
|
||||
|
||||
./pruned_transducer_stateless7_streaming/jit_trace_export.py <span class="se">\</span>
|
||||
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
|
||||
--use-averaged-model<span class="o">=</span>True <span class="se">\</span>
|
||||
--decode-chunk-len <span class="m">32</span> <span class="se">\</span>
|
||||
--exp-dir ./pruned_transducer_stateless7_streaming/exp <span class="se">\</span>
|
||||
--epoch <span class="nv">$epoch</span> <span class="se">\</span>
|
||||
--avg <span class="nv">$avg</span>
|
||||
./pruned_transducer_stateless7_streaming/jit_trace_export.py<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--use-averaged-model<span class="o">=</span>True<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--decode-chunk-len<span class="w"> </span><span class="m">32</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--exp-dir<span class="w"> </span>./pruned_transducer_stateless7_streaming/exp<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--epoch<span class="w"> </span><span class="nv">$epoch</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="admonition caution">
|
||||
@ -688,13 +688,13 @@ are on CPU. You can use <code class="docutils literal notranslate"><span class="
|
||||
</ul>
|
||||
</div></blockquote>
|
||||
<p>To use the generated files with <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless7_streaming/jit_trace_pretrained.py</span></code>:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_streaming/jit_trace_pretrained.py <span class="se">\</span>
|
||||
--encoder-model-filename ./pruned_transducer_stateless7_streaming/exp/encoder_jit_trace.pt <span class="se">\</span>
|
||||
--decoder-model-filename ./pruned_transducer_stateless7_streaming/exp/decoder_jit_trace.pt <span class="se">\</span>
|
||||
--joiner-model-filename ./pruned_transducer_stateless7_streaming/exp/joiner_jit_trace.pt <span class="se">\</span>
|
||||
--bpe-model ./data/lang_bpe_500/bpe.model <span class="se">\</span>
|
||||
--decode-chunk-len <span class="m">32</span> <span class="se">\</span>
|
||||
/path/to/foo.wav
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_streaming/jit_trace_pretrained.py<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--encoder-model-filename<span class="w"> </span>./pruned_transducer_stateless7_streaming/exp/encoder_jit_trace.pt<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--decoder-model-filename<span class="w"> </span>./pruned_transducer_stateless7_streaming/exp/decoder_jit_trace.pt<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--joiner-model-filename<span class="w"> </span>./pruned_transducer_stateless7_streaming/exp/joiner_jit_trace.pt<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--bpe-model<span class="w"> </span>./data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--decode-chunk-len<span class="w"> </span><span class="m">32</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>/path/to/foo.wav
|
||||
</pre></div>
|
||||
</div>
|
||||
</section>