deploy: 2fd970b6821d47dacb2e6513321520db21fff67b

Author: csukuangfj
Date: 2023-01-02 00:09:27 +00:00
Parent: 67a922737c
Commit: 9289dab4d6

22 changed files with 1632 additions and 1632 deletions

View File

@@ -140,22 +140,22 @@ it should succeed this time:</p>
<p>If you want to check the style of your code before <code class="docutils literal notranslate"><span class="pre">git</span> <span class="pre">commit</span></code>, you
can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ pre-commit install
$ pre-commit run
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>pre-commit<span class="w"> </span>install
$<span class="w"> </span>pre-commit<span class="w"> </span>run
</pre></div>
</div>
</div></blockquote>
<p>Or without installing the pre-commit hooks:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> icefall
$ pip install <span class="nv">black</span><span class="o">==</span><span class="m">22</span>.3.0 <span class="nv">flake8</span><span class="o">==</span><span class="m">5</span>.0.4 <span class="nv">isort</span><span class="o">==</span><span class="m">5</span>.10.1
$ black --check your_changed_file.py
$ black your_changed_file.py <span class="c1"># modify it in-place</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>icefall
$<span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span><span class="nv">black</span><span class="o">==</span><span class="m">22</span>.3.0<span class="w"> </span><span class="nv">flake8</span><span class="o">==</span><span class="m">5</span>.0.4<span class="w"> </span><span class="nv">isort</span><span class="o">==</span><span class="m">5</span>.10.1
$<span class="w"> </span>black<span class="w"> </span>--check<span class="w"> </span>your_changed_file.py
$<span class="w"> </span>black<span class="w"> </span>your_changed_file.py<span class="w"> </span><span class="c1"># modify it in-place</span>
$
$ flake8 your_changed_file.py
$<span class="w"> </span>flake8<span class="w"> </span>your_changed_file.py
$
$ isort --check your_changed_file.py <span class="c1"># modify it in-place</span>
$ isort your_changed_file.py
$<span class="w"> </span>isort<span class="w"> </span>--check<span class="w"> </span>your_changed_file.py<span class="w"> </span><span class="c1"># modify it in-place</span>
$<span class="w"> </span>isort<span class="w"> </span>your_changed_file.py
</pre></div>
</div>
</div></blockquote>
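<p>Note that <code class="docutils literal notranslate"><span class="pre">pre-commit</span> <span class="pre">run</span></code> only checks the files that are currently staged. If you also want to check every file in the repository (optional, not required by the workflow above), you can run:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ pre-commit run --all-files   # check all tracked files, not just the staged ones
</pre></div>
</div>
</div></blockquote>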

View File

@@ -88,8 +88,8 @@
for documentation.</p>
<p>Before writing documentation, you have to prepare the environment:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> docs
$ pip install -r requirements.txt
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>docs
$<span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span>-r<span class="w"> </span>requirements.txt
</pre></div>
</div>
</div></blockquote>
@@ -99,16 +99,16 @@ if you are not familiar with <code class="docutils literal notranslate"><span cl
<p>After writing some documentation, you can build the documentation <strong>locally</strong>
to preview what it will look like once it is published:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> docs
$ make html
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>docs
$<span class="w"> </span>make<span class="w"> </span>html
</pre></div>
</div>
</div></blockquote>
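<p>If a rebuild seems to pick up stale output, the Sphinx-generated Makefile also has a <code class="docutils literal notranslate"><span class="pre">clean</span></code> target that removes the previously generated output (optional):</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ cd docs
$ make clean   # remove previously generated output
$ make html
</pre></div>
</div>
</div></blockquote>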
<p>The generated documentation is in <code class="docutils literal notranslate"><span class="pre">docs/build/html</span></code> and can be viewed
with the following commands:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> docs/build/html
$ python3 -m http.server
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>docs/build/html
$<span class="w"> </span>python3<span class="w"> </span>-m<span class="w"> </span>http.server
</pre></div>
</div>
</div></blockquote>
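<p>By default, <code class="docutils literal notranslate"><span class="pre">python3</span> <span class="pre">-m</span> <span class="pre">http.server</span></code> listens on port 8000, so you can view the pages at <code class="docutils literal notranslate"><span class="pre">http://localhost:8000</span></code>. If that port is already taken, pass another one explicitly, e.g.:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ cd docs/build/html
$ python3 -m http.server 8080   # then open http://localhost:8080 in your browser
</pre></div>
</div>
</div></blockquote>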

View File

@@ -140,12 +140,12 @@ $ touch README.md model.py train.py decode.py asr_datamodule.py pretrained.py
<p>For instance, the <code class="docutils literal notranslate"><span class="pre">yesno</span></code> recipe has a <code class="docutils literal notranslate"><span class="pre">tdnn</span></code> model and its directory structure
looks like the following:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>egs/yesno/ASR/tdnn/
<span class="p">|</span>-- README.md
<span class="p">|</span>-- asr_datamodule.py
<span class="p">|</span>-- decode.py
<span class="p">|</span>-- model.py
<span class="p">|</span>-- pretrained.py
<span class="sb">`</span>-- train.py
<span class="p">|</span>--<span class="w"> </span>README.md
<span class="p">|</span>--<span class="w"> </span>asr_datamodule.py
<span class="p">|</span>--<span class="w"> </span>decode.py
<span class="p">|</span>--<span class="w"> </span>model.py
<span class="p">|</span>--<span class="w"> </span>pretrained.py
<span class="sb">`</span>--<span class="w"> </span>train.py
</pre></div>
</div>
<p><strong>File description</strong>:</p>

View File

@@ -166,11 +166,11 @@ to install <code class="docutils literal notranslate"><span class="pre">lhotse</
and set the environment variable <code class="docutils literal notranslate"><span class="pre">PYTHONPATH</span></code> to point to it.</p>
<p>Assume you want to place <code class="docutils literal notranslate"><span class="pre">icefall</span></code> in the folder <code class="docutils literal notranslate"><span class="pre">/tmp</span></code>. The
following commands show you how to set up <code class="docutils literal notranslate"><span class="pre">icefall</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> /tmp
git clone https://github.com/k2-fsa/icefall
<span class="nb">cd</span> icefall
pip install -r requirements.txt
<span class="nb">export</span> <span class="nv">PYTHONPATH</span><span class="o">=</span>/tmp/icefall:<span class="nv">$PYTHONPATH</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>/tmp
git<span class="w"> </span>clone<span class="w"> </span>https://github.com/k2-fsa/icefall
<span class="nb">cd</span><span class="w"> </span>icefall
pip<span class="w"> </span>install<span class="w"> </span>-r<span class="w"> </span>requirements.txt
<span class="nb">export</span><span class="w"> </span><span class="nv">PYTHONPATH</span><span class="o">=</span>/tmp/icefall:<span class="nv">$PYTHONPATH</span>
</pre></div>
</div>
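<p>As a quick sanity check (not part of the setup itself), you can verify that <code class="docutils literal notranslate"><span class="pre">PYTHONPATH</span></code> is picked up correctly, assuming the clone lives in <code class="docutils literal notranslate"><span class="pre">/tmp/icefall</span></code> as above:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span># should print a path under /tmp/icefall
python3 -c "import icefall; print(icefall.__file__)"
</pre></div>
</div>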
<div class="admonition hint">
@@ -185,39 +185,39 @@ to point to the version you want.</p>
<p>The following shows an example of setting up the environment.</p>
<section id="create-a-virtual-environment">
<h3>(1) Create a virtual environment<a class="headerlink" href="#create-a-virtual-environment" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ virtualenv -p python3.8 test-icefall
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>virtualenv<span class="w"> </span>-p<span class="w"> </span>python3.8<span class="w"> </span>test-icefall
created virtual environment CPython3.8.6.final.0-64 <span class="k">in</span> 1540ms
creator CPython3Posix<span class="o">(</span><span class="nv">dest</span><span class="o">=</span>/ceph-fj/fangjun/test-icefall, <span class="nv">clear</span><span class="o">=</span>False, <span class="nv">no_vcs_ignore</span><span class="o">=</span>False, <span class="nv">global</span><span class="o">=</span>False<span class="o">)</span>
seeder FromAppData<span class="o">(</span><span class="nv">download</span><span class="o">=</span>False, <span class="nv">pip</span><span class="o">=</span>bundle, <span class="nv">setuptools</span><span class="o">=</span>bundle, <span class="nv">wheel</span><span class="o">=</span>bundle, <span class="nv">via</span><span class="o">=</span>copy, <span class="nv">app_data_dir</span><span class="o">=</span>/root/fangjun/.local/share/v
created<span class="w"> </span>virtual<span class="w"> </span>environment<span class="w"> </span>CPython3.8.6.final.0-64<span class="w"> </span><span class="k">in</span><span class="w"> </span>1540ms
<span class="w"> </span>creator<span class="w"> </span>CPython3Posix<span class="o">(</span><span class="nv">dest</span><span class="o">=</span>/ceph-fj/fangjun/test-icefall,<span class="w"> </span><span class="nv">clear</span><span class="o">=</span>False,<span class="w"> </span><span class="nv">no_vcs_ignore</span><span class="o">=</span>False,<span class="w"> </span><span class="nv">global</span><span class="o">=</span>False<span class="o">)</span>
<span class="w"> </span>seeder<span class="w"> </span>FromAppData<span class="o">(</span><span class="nv">download</span><span class="o">=</span>False,<span class="w"> </span><span class="nv">pip</span><span class="o">=</span>bundle,<span class="w"> </span><span class="nv">setuptools</span><span class="o">=</span>bundle,<span class="w"> </span><span class="nv">wheel</span><span class="o">=</span>bundle,<span class="w"> </span><span class="nv">via</span><span class="o">=</span>copy,<span class="w"> </span><span class="nv">app_data_dir</span><span class="o">=</span>/root/fangjun/.local/share/v
irtualenv<span class="o">)</span>
added seed packages: <span class="nv">pip</span><span class="o">==</span><span class="m">21</span>.1.3, <span class="nv">setuptools</span><span class="o">==</span><span class="m">57</span>.4.0, <span class="nv">wheel</span><span class="o">==</span><span class="m">0</span>.36.2
activators BashActivator,CShellActivator,FishActivator,PowerShellActivator,PythonActivator,XonshActivator
<span class="w"> </span>added<span class="w"> </span>seed<span class="w"> </span>packages:<span class="w"> </span><span class="nv">pip</span><span class="o">==</span><span class="m">21</span>.1.3,<span class="w"> </span><span class="nv">setuptools</span><span class="o">==</span><span class="m">57</span>.4.0,<span class="w"> </span><span class="nv">wheel</span><span class="o">==</span><span class="m">0</span>.36.2
<span class="w"> </span>activators<span class="w"> </span>BashActivator,CShellActivator,FishActivator,PowerShellActivator,PythonActivator,XonshActivator
</pre></div>
</div>
</section>
<section id="activate-your-virtual-environment">
<h3>(2) Activate your virtual environment<a class="headerlink" href="#activate-your-virtual-environment" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">source</span> test-icefall/bin/activate
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">source</span><span class="w"> </span>test-icefall/bin/activate
</pre></div>
</div>
</section>
<section id="id1">
<h3>(3) Install k2<a class="headerlink" href="#id1" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ pip install <span class="nv">k2</span><span class="o">==</span><span class="m">1</span>.4.dev20210822+cpu.torch1.9.0 -f https://k2-fsa.org/nightly/index.html
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span><span class="nv">k2</span><span class="o">==</span><span class="m">1</span>.4.dev20210822+cpu.torch1.9.0<span class="w"> </span>-f<span class="w"> </span>https://k2-fsa.org/nightly/index.html
Looking <span class="k">in</span> links: https://k2-fsa.org/nightly/index.html
Collecting <span class="nv">k2</span><span class="o">==</span><span class="m">1</span>.4.dev20210822+cpu.torch1.9.0
Downloading https://k2-fsa.org/nightly/whl/k2-1.4.dev20210822%2Bcpu.torch1.9.0-cp38-cp38-linux_x86_64.whl <span class="o">(</span><span class="m">1</span>.6 MB<span class="o">)</span>
<span class="p">|</span>________________________________<span class="p">|</span> <span class="m">1</span>.6 MB <span class="m">185</span> kB/s
Collecting graphviz
Downloading graphviz-0.17-py3-none-any.whl <span class="o">(</span><span class="m">18</span> kB<span class="o">)</span>
Collecting <span class="nv">torch</span><span class="o">==</span><span class="m">1</span>.9.0
Using cached torch-1.9.0-cp38-cp38-manylinux1_x86_64.whl <span class="o">(</span><span class="m">831</span>.4 MB<span class="o">)</span>
Collecting typing-extensions
Using cached typing_extensions-3.10.0.0-py3-none-any.whl <span class="o">(</span><span class="m">26</span> kB<span class="o">)</span>
Installing collected packages: typing-extensions, torch, graphviz, k2
Successfully installed graphviz-0.17 k2-1.4.dev20210822+cpu.torch1.9.0 torch-1.9.0 typing-extensions-3.10.0.0
Looking<span class="w"> </span><span class="k">in</span><span class="w"> </span>links:<span class="w"> </span>https://k2-fsa.org/nightly/index.html
Collecting<span class="w"> </span><span class="nv">k2</span><span class="o">==</span><span class="m">1</span>.4.dev20210822+cpu.torch1.9.0
<span class="w"> </span>Downloading<span class="w"> </span>https://k2-fsa.org/nightly/whl/k2-1.4.dev20210822%2Bcpu.torch1.9.0-cp38-cp38-linux_x86_64.whl<span class="w"> </span><span class="o">(</span><span class="m">1</span>.6<span class="w"> </span>MB<span class="o">)</span>
<span class="w"> </span><span class="p">|</span>________________________________<span class="p">|</span><span class="w"> </span><span class="m">1</span>.6<span class="w"> </span>MB<span class="w"> </span><span class="m">185</span><span class="w"> </span>kB/s
Collecting<span class="w"> </span>graphviz
<span class="w"> </span>Downloading<span class="w"> </span>graphviz-0.17-py3-none-any.whl<span class="w"> </span><span class="o">(</span><span class="m">18</span><span class="w"> </span>kB<span class="o">)</span>
Collecting<span class="w"> </span><span class="nv">torch</span><span class="o">==</span><span class="m">1</span>.9.0
<span class="w"> </span>Using<span class="w"> </span>cached<span class="w"> </span>torch-1.9.0-cp38-cp38-manylinux1_x86_64.whl<span class="w"> </span><span class="o">(</span><span class="m">831</span>.4<span class="w"> </span>MB<span class="o">)</span>
Collecting<span class="w"> </span>typing-extensions
<span class="w"> </span>Using<span class="w"> </span>cached<span class="w"> </span>typing_extensions-3.10.0.0-py3-none-any.whl<span class="w"> </span><span class="o">(</span><span class="m">26</span><span class="w"> </span>kB<span class="o">)</span>
Installing<span class="w"> </span>collected<span class="w"> </span>packages:<span class="w"> </span>typing-extensions,<span class="w"> </span>torch,<span class="w"> </span>graphviz,<span class="w"> </span>k2
Successfully<span class="w"> </span>installed<span class="w"> </span>graphviz-0.17<span class="w"> </span>k2-1.4.dev20210822+cpu.torch1.9.0<span class="w"> </span>torch-1.9.0<span class="w"> </span>typing-extensions-3.10.0.0
</pre></div>
</div>
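<p>Before moving on, it does no harm to confirm that <code class="docutils literal notranslate"><span class="pre">k2</span></code> can be imported in this environment, for example:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ python3 -c "import k2; print(k2.__file__)"
$ python3 -m k2.version   # prints detailed version/build information, if available
</pre></div>
</div>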
<div class="admonition warning">
@@ -393,10 +393,10 @@ the <a class="reference external" href="https://github.com/k2-fsa/icefall/tree/m
on CPU.</p>
<section id="data-preparation">
<h3>Data preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">export</span> <span class="nv">PYTHONPATH</span><span class="o">=</span>/tmp/icefall:<span class="nv">$PYTHONPATH</span>
$ <span class="nb">cd</span> /tmp/icefall
$ <span class="nb">cd</span> egs/yesno/ASR
$ ./prepare.sh
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">PYTHONPATH</span><span class="o">=</span>/tmp/icefall:<span class="nv">$PYTHONPATH</span>
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>/tmp/icefall
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/yesno/ASR
$<span class="w"> </span>./prepare.sh
</pre></div>
</div>
<p>The log of running <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> is:</p>
@@ -457,7 +457,7 @@ even if there are GPUs available.</p>
<p class="admonition-title">Hint</p>
<p>In case you get a <code class="docutils literal notranslate"><span class="pre">Segmentation</span> <span class="pre">fault</span> <span class="pre">(core</span> <span class="pre">dumped)</span></code> error, please use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span> <span class="nv">PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION</span><span class="o">=</span>python
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span><span class="w"> </span><span class="nv">PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION</span><span class="o">=</span>python
</pre></div>
</div>
</div></blockquote>

View File

@@ -123,13 +123,13 @@ as an example.</p>
<p class="admonition-title">Note</p>
<p>The steps for other recipes are almost the same.</p>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> egs/librispeech/ASR
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
./pruned_transducer_stateless3/export.py <span class="se">\</span>
--exp-dir ./pruned_transducer_stateless3/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--epoch <span class="m">20</span> <span class="se">\</span>
--avg <span class="m">10</span>
./pruned_transducer_stateless3/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./pruned_transducer_stateless3/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">20</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="m">10</span>
</pre></div>
</div>
<p>will generate a file <code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless3/exp/pretrained.pt</span></code>, which
@@ -141,10 +141,10 @@ is a dict containing <code class="docutils literal notranslate"><span class="pre
You can find links to pretrained models in <code class="docutils literal notranslate"><span class="pre">RESULTS.md</span></code> of each dataset.</p>
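<p>If you are curious what exactly is inside the generated <code class="docutils literal notranslate"><span class="pre">pretrained.pt</span></code>, a quick way to peek at it (run from <code class="docutils literal notranslate"><span class="pre">egs/librispeech/ASR</span></code>, assuming PyTorch is available in your environment) is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span># list the top-level keys of the exported checkpoint
python3 -c "import torch; print(torch.load('pruned_transducer_stateless3/exp/pretrained.pt', map_location='cpu').keys())"
</pre></div>
</div>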
<p>In the following, we demonstrate how to use the pretrained model from
<a class="reference external" href="https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13">https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13</a>.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> egs/librispeech/ASR
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
git lfs install
git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13
git<span class="w"> </span>lfs<span class="w"> </span>install
git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13
</pre></div>
</div>
<p>After cloning the repo with <code class="docutils literal notranslate"><span class="pre">git</span> <span class="pre">lfs</span></code>, you will find several files in the folder
@@ -153,15 +153,15 @@ that have a prefix <code class="docutils literal notranslate"><span class="pre">
exported by the above <code class="docutils literal notranslate"><span class="pre">export.py</span></code>.</p>
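<p>If those <code class="docutils literal notranslate"><span class="pre">*.pt</span></code> files turn out to be only a few hundred bytes, they are <code class="docutils literal notranslate"><span class="pre">git</span> <span class="pre">lfs</span></code> pointer files rather than the real checkpoints. In that case, fetch the actual contents inside the cloned directory:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>cd icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13
git lfs pull   # replace LFS pointer files with the real model files
</pre></div>
</div>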
<p>In each recipe, there is also a file <code class="docutils literal notranslate"><span class="pre">pretrained.py</span></code>, which can use
<code class="docutils literal notranslate"><span class="pre">pretrained-xxx.pt</span></code> to decode waves. The following is an example:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> egs/librispeech/ASR
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
./pruned_transducer_stateless3/pretrained.py <span class="se">\</span>
--checkpoint ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp/pretrained-iter-1224000-avg-14.pt <span class="se">\</span>
--bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/data/lang_bpe_500/bpe.model <span class="se">\</span>
--method greedy_search <span class="se">\</span>
./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1089-134686-0001.wav <span class="se">\</span>
./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0001.wav <span class="se">\</span>
./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0002.wav
./pruned_transducer_stateless3/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp/pretrained-iter-1224000-avg-14.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>greedy_search<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1089-134686-0001.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0001.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0002.wav
</pre></div>
</div>
<p>The above commands show how to use the exported model with <code class="docutils literal notranslate"><span class="pre">pretrained.py</span></code> to
@@ -195,25 +195,25 @@ decode multiple sound files. Its output is given as follows for reference:</p>
<p>When we publish the model, we always note down its WERs on some test
dataset in <code class="docutils literal notranslate"><span class="pre">RESULTS.md</span></code>. This section describes how to use the
pretrained model to reproduce the WER.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> egs/librispeech/ASR
git lfs install
git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
git<span class="w"> </span>lfs<span class="w"> </span>install
git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13
<span class="nb">cd</span> icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp
ln -s pretrained-iter-1224000-avg-14.pt epoch-9999.pt
<span class="nb">cd</span> ../..
<span class="nb">cd</span><span class="w"> </span>icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp
ln<span class="w"> </span>-s<span class="w"> </span>pretrained-iter-1224000-avg-14.pt<span class="w"> </span>epoch-9999.pt
<span class="nb">cd</span><span class="w"> </span>../..
</pre></div>
</div>
<p>We create a symlink with name <code class="docutils literal notranslate"><span class="pre">epoch-9999.pt</span></code> to <code class="docutils literal notranslate"><span class="pre">pretrained-iter-1224000-avg-14.pt</span></code>,
so that we can pass <code class="docutils literal notranslate"><span class="pre">--epoch</span> <span class="pre">9999</span> <span class="pre">--avg</span> <span class="pre">1</span></code> to <code class="docutils literal notranslate"><span class="pre">decode.py</span></code> in the following
command:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless3/decode.py <span class="se">\</span>
--epoch <span class="m">9999</span> <span class="se">\</span>
--avg <span class="m">1</span> <span class="se">\</span>
--exp-dir ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp <span class="se">\</span>
--lang-dir ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/data/lang_bpe_500 <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--decoding-method greedy_search
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless3/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">9999</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--lang-dir<span class="w"> </span>./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/data/lang_bpe_500<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoding-method<span class="w"> </span>greedy_search
</pre></div>
</div>
<p>You will find the decoding results in

View File

@@ -106,16 +106,16 @@ to run the pretrained model.</p>
<p>We use
<a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless3">https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless3</a>
as an example in the following.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> egs/librispeech/ASR
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
<span class="nv">epoch</span><span class="o">=</span><span class="m">14</span>
<span class="nv">avg</span><span class="o">=</span><span class="m">2</span>
./pruned_transducer_stateless3/export.py <span class="se">\</span>
--exp-dir ./pruned_transducer_stateless3/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--epoch <span class="nv">$epoch</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--onnx <span class="m">1</span>
./pruned_transducer_stateless3/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./pruned_transducer_stateless3/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="nv">$epoch</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--onnx<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
<p>It will generate the following files inside <code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless3/exp</span></code>:</p>
@@ -130,16 +130,16 @@ as an example in the following.</p>
</div></blockquote>
<p>You can use <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless3/onnx_pretrained.py</span></code> to decode
waves with the generated files:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless3/onnx_pretrained.py <span class="se">\</span>
--bpe-model ./data/lang_bpe_500/bpe.model <span class="se">\</span>
--encoder-model-filename ./pruned_transducer_stateless3/exp/encoder.onnx <span class="se">\</span>
--decoder-model-filename ./pruned_transducer_stateless3/exp/decoder.onnx <span class="se">\</span>
--joiner-model-filename ./pruned_transducer_stateless3/exp/joiner.onnx <span class="se">\</span>
--joiner-encoder-proj-model-filename ./pruned_transducer_stateless3/exp/joiner_encoder_proj.onnx <span class="se">\</span>
--joiner-decoder-proj-model-filename ./pruned_transducer_stateless3/exp/joiner_decoder_proj.onnx <span class="se">\</span>
/path/to/foo.wav <span class="se">\</span>
/path/to/bar.wav <span class="se">\</span>
/path/to/baz.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless3/onnx_pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>./data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--encoder-model-filename<span class="w"> </span>./pruned_transducer_stateless3/exp/encoder.onnx<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoder-model-filename<span class="w"> </span>./pruned_transducer_stateless3/exp/decoder.onnx<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--joiner-model-filename<span class="w"> </span>./pruned_transducer_stateless3/exp/joiner.onnx<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--joiner-encoder-proj-model-filename<span class="w"> </span>./pruned_transducer_stateless3/exp/joiner_encoder_proj.onnx<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--joiner-decoder-proj-model-filename<span class="w"> </span>./pruned_transducer_stateless3/exp/joiner_decoder_proj.onnx<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/foo.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/bar.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/baz.wav
</pre></div>
</div>
</section>

View File

@@ -108,16 +108,16 @@ if you want to use <code class="docutils literal notranslate"><span class="pre">
<p>We use
<a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless3">https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless3</a>
as an example in the following.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> egs/librispeech/ASR
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
<span class="nv">epoch</span><span class="o">=</span><span class="m">14</span>
<span class="nv">avg</span><span class="o">=</span><span class="m">1</span>
./pruned_transducer_stateless3/export.py <span class="se">\</span>
--exp-dir ./pruned_transducer_stateless3/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--epoch <span class="nv">$epoch</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--jit <span class="m">1</span>
./pruned_transducer_stateless3/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./pruned_transducer_stateless3/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="nv">$epoch</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--jit<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
<p>It will generate a file <code class="docutils literal notranslate"><span class="pre">cpu_jit.pt</span></code> in <code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless3/exp</span></code>.</p>
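<p>If the export used <code class="docutils literal notranslate"><span class="pre">--jit</span> <span class="pre">1</span></code> as above, the resulting <code class="docutils literal notranslate"><span class="pre">cpu_jit.pt</span></code> should be loadable with <code class="docutils literal notranslate"><span class="pre">torch.jit.load</span></code>, so a quick check (assuming PyTorch is installed) is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span># verify that the exported TorchScript model loads
python3 -c "import torch; m = torch.jit.load('pruned_transducer_stateless3/exp/cpu_jit.pt'); print(type(m))"
</pre></div>
</div>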

View File

@@ -111,14 +111,14 @@ as an example in the following.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nv">iter</span><span class="o">=</span><span class="m">468000</span>
<span class="nv">avg</span><span class="o">=</span><span class="m">16</span>
<span class="nb">cd</span> egs/librispeech/ASR
<span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
./lstm_transducer_stateless2/export.py <span class="se">\</span>
--exp-dir ./lstm_transducer_stateless2/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--iter <span class="nv">$iter</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--jit-trace <span class="m">1</span>
./lstm_transducer_stateless2/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./lstm_transducer_stateless2/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--iter<span class="w"> </span><span class="nv">$iter</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--jit-trace<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
<p>It will generate three files inside <code class="docutils literal notranslate"><span class="pre">lstm_transducer_stateless2/exp</span></code>:</p>
@@ -132,15 +132,15 @@ as an example in the following.</p>
<p>You can use
<a class="reference external" href="https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/lstm_transducer_stateless2/jit_pretrained.py">https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/lstm_transducer_stateless2/jit_pretrained.py</a>
to decode sound files with the following commands:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> egs/librispeech/ASR
./lstm_transducer_stateless2/jit_pretrained.py <span class="se">\</span>
--bpe-model ./data/lang_bpe_500/bpe.model <span class="se">\</span>
--encoder-model-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace.pt <span class="se">\</span>
--decoder-model-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace.pt <span class="se">\</span>
--joiner-model-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace.pt <span class="se">\</span>
/path/to/foo.wav <span class="se">\</span>
/path/to/bar.wav <span class="se">\</span>
/path/to/baz.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
./lstm_transducer_stateless2/jit_pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>./data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--encoder-model-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/encoder_jit_trace.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoder-model-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/decoder_jit_trace.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--joiner-model-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/joiner_jit_trace.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/foo.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/bar.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/baz.wav
</pre></div>
</div>
</section>

View File

@@ -130,8 +130,8 @@ the environment for <code class="docutils literal notranslate"><span class="pre"
</div></blockquote>
<section id="data-preparation">
<h2>Data preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading"></a></h2>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./prepare.sh
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./prepare.sh
</pre></div>
</div>
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
@@ -146,13 +146,13 @@ options:</p>
</div></blockquote>
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
<p>For example,</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">0</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>means to run only stage 0.</p>
<p>To run stage 2 to stage 5, use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage <span class="m">2</span> --stop-stage <span class="m">5</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">2</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">5</span>
</pre></div>
</div>
<div class="admonition hint">
@@ -167,8 +167,8 @@ the <code class="docutils literal notranslate"><span class="pre">dl_dir</span></
<p class="admonition-title">Hint</p>
<p>A 3-gram language model will be downloaded from Hugging Face. We assume you have
installed and initialized <code class="docutils literal notranslate"><span class="pre">git-lfs</span></code>. If not, you can install <code class="docutils literal notranslate"><span class="pre">git-lfs</span></code> with:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ sudo apt-get install git-lfs
$ git-lfs install
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>sudo<span class="w"> </span>apt-get<span class="w"> </span>install<span class="w"> </span>git-lfs
$<span class="w"> </span>git-lfs<span class="w"> </span>install
</pre></div>
</div>
<p>If you don't have <code class="docutils literal notranslate"><span class="pre">sudo</span></code> permission, you can download the
@@ -184,8 +184,8 @@ are saved in <code class="docutils literal notranslate"><span class="pre">./data
<h2>Training<a class="headerlink" href="#training" title="Permalink to this heading"></a></h2>
<section id="configurable-options">
<h3>Configurable options<a class="headerlink" href="#configurable-options" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./conformer_ctc/train.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows you the training options that can be passed from the command line.
@@ -227,26 +227,26 @@ training from epoch 10, based on the state from epoch 9.</p>
<div><p><strong>Use case 1</strong>: You have 4 GPUs, but you only want to use GPU 0 and
GPU 2 for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,2&quot;</span>
$ ./conformer_ctc/train.py --world-size <span class="m">2</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,2&quot;</span>
$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">2</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 2</strong>: You have 4 GPUs and you want to use all of them
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./conformer_ctc/train.py --world-size <span class="m">4</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 3</strong>: You have 4 GPUs but you only want to use GPU 3
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;3&quot;</span>
$ ./conformer_ctc/train.py --world-size <span class="m">1</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;3&quot;</span>
$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
</div></blockquote>
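<p>If you are unsure which GPUs are currently idle before setting <code class="docutils literal notranslate"><span class="pre">CUDA_VISIBLE_DEVICES</span></code>, you can first check their load, e.g.:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ nvidia-smi --query-gpu=index,memory.used,utilization.gpu --format=csv
</pre></div>
</div>
</div></blockquote>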
@@ -299,7 +299,7 @@ Each epoch actually processes <code class="docutils literal notranslate"><span c
<p>These are checkpoint files, containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./conformer_ctc/train.py --start-epoch <span class="m">11</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">11</span>
</pre></div>
</div>
</div></blockquote>
@@ -308,8 +308,8 @@ To resume training from some checkpoint, say <code class="docutils literal notra
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc., are recorded in these logs. You can visualize them by:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> conformer_ctc/exp/tensorboard
$ tensorboard dev upload --logdir . --name <span class="s2">&quot;Aishell conformer ctc training with icefall&quot;</span> --description <span class="s2">&quot;Training with new LabelSmoothing loss, see https://github.com/k2-fsa/icefall/pull/109&quot;</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>conformer_ctc/exp/tensorboard
$<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w"> </span>upload<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--name<span class="w"> </span><span class="s2">&quot;Aishell conformer ctc training with icefall&quot;</span><span class="w"> </span>--description<span class="w"> </span><span class="s2">&quot;Training with new LabelSmoothing loss, see https://github.com/k2-fsa/icefall/pull/109&quot;</span>
</pre></div>
</div>
</div></blockquote>
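<p>If you prefer to inspect the logs locally instead of uploading them, you can also start TensorBoard on your own machine (any free port works):</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ tensorboard --logdir conformer_ctc/exp/tensorboard --port 6006
# then open http://localhost:6006 in your browser
</pre></div>
</div>
</div></blockquote>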
@@ -351,25 +351,25 @@ you saw printed to the console during training.</p>
<p>The following shows typical use cases:</p>
<section id="case-1">
<h4><strong>Case 1</strong><a class="headerlink" href="#case-1" title="Permalink to this heading"></a></h4>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./conformer_ctc/train.py --max-duration <span class="m">200</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">200</span>
</pre></div>
</div>
<p>It sets <code class="docutils literal notranslate"><span class="pre">--max-duration</span></code> to 200 to avoid OOM.</p>
</section>
<section id="case-2">
<h4><strong>Case 2</strong><a class="headerlink" href="#case-2" title="Permalink to this heading"></a></h4>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,3&quot;</span>
$ ./conformer_ctc/train.py --world-size <span class="m">2</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,3&quot;</span>
$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">2</span>
</pre></div>
</div>
<p>It uses GPU 0 and GPU 3 for DDP training.</p>
</section>
<section id="case-3">
<h4><strong>Case 3</strong><a class="headerlink" href="#case-3" title="Permalink to this heading"></a></h4>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./conformer_ctc/train.py --num-epochs <span class="m">10</span> --start-epoch <span class="m">3</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--num-epochs<span class="w"> </span><span class="m">10</span><span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">3</span>
</pre></div>
</div>
<p>It loads checkpoint <code class="docutils literal notranslate"><span class="pre">./conformer_ctc/exp/epoch-2.pt</span></code> and starts
@@ -381,8 +381,8 @@ training from epoch 3. Also, it trains for 10 epochs.</p>
<h2>Decoding<a class="headerlink" href="#decoding" title="Permalink to this heading"></a></h2>
<p>The decoding part uses checkpoints saved by the training part, so you have
to run the training part first.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./conformer_ctc/decode.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./conformer_ctc/decode.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the options for decoding.</p>
@@ -440,27 +440,27 @@ $ git clone https://huggingface.co/pkufool/icefall_asr_aishell_conformer_ctc
<p>In order to use this pre-trained model, your k2 version has to be v1.7 or later.</p>
</div>
<p>After downloading, you will have the following files:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ tree tmp
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>tree<span class="w"> </span>tmp
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>tmp/
<span class="sb">`</span>-- icefall_asr_aishell_conformer_ctc
<span class="p">|</span>-- README.md
<span class="p">|</span>-- data
<span class="p">|</span> <span class="sb">`</span>-- lang_char
<span class="p">|</span> <span class="p">|</span>-- HLG.pt
<span class="p">|</span> <span class="p">|</span>-- tokens.txt
<span class="p">|</span> <span class="sb">`</span>-- words.txt
<span class="p">|</span>-- exp
<span class="p">|</span> <span class="sb">`</span>-- pretrained.pt
<span class="sb">`</span>-- test_waves
<span class="p">|</span>-- BAC009S0764W0121.wav
<span class="p">|</span>-- BAC009S0764W0122.wav
<span class="p">|</span>-- BAC009S0764W0123.wav
<span class="sb">`</span>-- trans.txt
<span class="sb">`</span>--<span class="w"> </span>icefall_asr_aishell_conformer_ctc
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>README.md
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>data
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>lang_char
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>HLG.pt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>tokens.txt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>words.txt
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>exp
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>pretrained.pt
<span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>test_waves
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>BAC009S0764W0121.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>BAC009S0764W0122.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>BAC009S0764W0123.wav
<span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>trans.txt
<span class="m">5</span> directories, <span class="m">9</span> files
<span class="m">5</span><span class="w"> </span>directories,<span class="w"> </span><span class="m">9</span><span class="w"> </span>files
</pre></div>
</div>
<p><strong>File descriptions</strong>:</p>
@@ -502,38 +502,38 @@ Note: We have removed optimizer <code class="docutils literal notranslate"><span
</ul>
</div></blockquote>
<p>The information of the test sound files is listed below:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ soxi tmp/icefall_asr_aishell_conformer_ctc/test_waves/*.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>soxi<span class="w"> </span>tmp/icefall_asr_aishell_conformer_ctc/test_waves/*.wav
Input File : <span class="s1">&#39;tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0121.wav&#39;</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">16000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:04.20 <span class="o">=</span> <span class="m">67263</span> samples ~ <span class="m">315</span>.295 CDDA sectors
File Size : 135k
Bit Rate : 256k
Sample Encoding: <span class="m">16</span>-bit Signed Integer PCM
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">&#39;tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0121.wav&#39;</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">16000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:04.20<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">67263</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">315</span>.295<span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>135k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>256k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>Signed<span class="w"> </span>Integer<span class="w"> </span>PCM
Input File : <span class="s1">&#39;tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0122.wav&#39;</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">16000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:04.12 <span class="o">=</span> <span class="m">65840</span> samples ~ <span class="m">308</span>.625 CDDA sectors
File Size : 132k
Bit Rate : 256k
Sample Encoding: <span class="m">16</span>-bit Signed Integer PCM
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">&#39;tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0122.wav&#39;</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">16000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:04.12<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">65840</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">308</span>.625<span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>132k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>256k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>Signed<span class="w"> </span>Integer<span class="w"> </span>PCM
Input File : <span class="s1">&#39;tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0123.wav&#39;</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">16000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:04.00 <span class="o">=</span> <span class="m">64000</span> samples ~ <span class="m">300</span> CDDA sectors
File Size : 128k
Bit Rate : 256k
Sample Encoding: <span class="m">16</span>-bit Signed Integer PCM
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">&#39;tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0123.wav&#39;</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">16000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:04.00<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">64000</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">300</span><span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>128k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>256k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>Signed<span class="w"> </span>Integer<span class="w"> </span>PCM
Total Duration of <span class="m">3</span> files: <span class="m">00</span>:00:12.32
Total<span class="w"> </span>Duration<span class="w"> </span>of<span class="w"> </span><span class="m">3</span><span class="w"> </span>files:<span class="w"> </span><span class="m">00</span>:00:12.32
</pre></div>
</div>
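<p>All of the test files above are 16 kHz, single-channel recordings. If you would like to try your own audio and it has a different sample rate or number of channels, you can convert it first, e.g. with <code class="docutils literal notranslate"><span class="pre">sox</span></code> (a sketch; <code class="docutils literal notranslate"><span class="pre">your_audio.wav</span></code> is a placeholder and <code class="docutils literal notranslate"><span class="pre">sox</span></code> must be installed separately):</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ sox your_audio.wav -r 16000 -c 1 -b 16 converted.wav   # resample to 16 kHz, mono, 16-bit
$ soxi converted.wav                                     # verify the result
</pre></div>
</div>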
</section>
@ -556,14 +556,14 @@ $ ./conformer_ctc/pretrained.py --help
<h4>CTC decoding<a class="headerlink" href="#ctc-decoding" title="Permalink to this heading"></a></h4>
<p>CTC decoding uses only the CTC topology for decoding; it does not use a lexicon or a language model.</p>
<p>The command to run CTC decoding is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./conformer_ctc/pretrained.py <span class="se">\</span>
--checkpoint ./tmp/icefall_asr_aishell_conformer_ctc/exp/pretrained.pt <span class="se">\</span>
--tokens-file ./tmp/icefall_asr_aishell_conformer_ctc/data/lang_char/tokens.txt <span class="se">\</span>
--method ctc-decoding <span class="se">\</span>
./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0121.wav <span class="se">\</span>
./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0122.wav <span class="se">\</span>
./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0123.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./conformer_ctc/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--tokens-file<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/data/lang_char/tokens.txt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>ctc-decoding<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0121.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0122.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0123.wav
</pre></div>
</div>
<p>The output is given below:</p>
@ -593,15 +593,15 @@ $ ./conformer_ctc/pretrained.py <span class="se">\</span>
<h4>HLG decoding<a class="headerlink" href="#hlg-decoding" title="Permalink to this heading"></a></h4>
<p>HLG decoding uses the best path of the decoding lattice as the decoding result.</p>
<p>The command to run HLG decoding is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./conformer_ctc/pretrained.py <span class="se">\</span>
--checkpoint ./tmp/icefall_asr_aishell_conformer_ctc/exp/pretrained.pt <span class="se">\</span>
--words-file ./tmp/icefall_asr_aishell_conformer_ctc/data/lang_char/words.txt <span class="se">\</span>
--HLG ./tmp/icefall_asr_aishell_conformer_ctc/data/lang_char/HLG.pt <span class="se">\</span>
--method 1best <span class="se">\</span>
./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0121.wav <span class="se">\</span>
./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0122.wav <span class="se">\</span>
./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0123.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./conformer_ctc/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--words-file<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/data/lang_char/words.txt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--HLG<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/data/lang_char/HLG.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>1best<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0121.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0122.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0123.wav
</pre></div>
</div>
<p>The output is given below:</p>
@ -633,15 +633,15 @@ $ ./conformer_ctc/pretrained.py <span class="se">\</span>
<p>It extracts n paths from the lattice and rescores the extracted paths with
an attention decoder. The path with the highest score is the decoding result.</p>
<p>The command to run HLG decoding + attention decoder rescoring is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./conformer_ctc/pretrained.py <span class="se">\</span>
--checkpoint ./tmp/icefall_asr_aishell_conformer_ctc/exp/pretrained.pt <span class="se">\</span>
--words-file ./tmp/icefall_asr_aishell_conformer_ctc/data/lang_char/words.txt <span class="se">\</span>
--HLG ./tmp/icefall_asr_aishell_conformer_ctc/data/lang_char/HLG.pt <span class="se">\</span>
--method attention-decoder <span class="se">\</span>
./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0121.wav <span class="se">\</span>
./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0122.wav <span class="se">\</span>
./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0123.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./conformer_ctc/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--words-file<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/data/lang_char/words.txt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--HLG<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/data/lang_char/HLG.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>attention-decoder<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0121.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0122.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0123.wav
</pre></div>
</div>
<p>The output is below:</p>
@ -693,20 +693,20 @@ Python dependencies.</p>
<p>At present, it does NOT support streaming decoding.</p>
</div>
<p>First, let us compile k2 from source:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> <span class="nv">$HOME</span>
$ git clone https://github.com/k2-fsa/k2
$ <span class="nb">cd</span> k2
$ git checkout v2.0-pre
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span><span class="nv">$HOME</span>
$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/k2-fsa/k2
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>k2
$<span class="w"> </span>git<span class="w"> </span>checkout<span class="w"> </span>v2.0-pre
</pre></div>
</div>
<div class="admonition caution">
<p class="admonition-title">Caution</p>
<p>You have to switch to the branch <code class="docutils literal notranslate"><span class="pre">v2.0-pre</span></code>!</p>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ mkdir build-release
$ <span class="nb">cd</span> build-release
$ cmake -DCMAKE_BUILD_TYPE<span class="o">=</span>Release ..
$ make -j hlg_decode
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>mkdir<span class="w"> </span>build-release
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>build-release
$<span class="w"> </span>cmake<span class="w"> </span>-DCMAKE_BUILD_TYPE<span class="o">=</span>Release<span class="w"> </span>..
$<span class="w"> </span>make<span class="w"> </span>-j<span class="w"> </span>hlg_decode
<span class="c1"># You will find four binaries in `./bin`, i.e. ./bin/hlg_decode,</span>
</pre></div>
@ -714,8 +714,8 @@ $ make -j hlg_decode
<p>Now you are ready to go!</p>
<p>Assume you have run:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> k2/build-release
$ ln -s /path/to/icefall-asr-aishell-conformer-ctc ./
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>k2/build-release
$<span class="w"> </span>ln<span class="w"> </span>-s<span class="w"> </span>/path/to/icefall-asr-aishell-conformer-ctc<span class="w"> </span>./
</pre></div>
</div>
</div></blockquote>
@ -724,40 +724,40 @@ $ ln -s /path/to/icefall-asr-aishell-conformer-ctc ./
</pre></div>
</div>
<p>It will show you the following message:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>Please provide --nn_model
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>Please<span class="w"> </span>provide<span class="w"> </span>--nn_model
This file implements decoding with an HLG decoding graph.
This<span class="w"> </span>file<span class="w"> </span>implements<span class="w"> </span>decoding<span class="w"> </span>with<span class="w"> </span>an<span class="w"> </span>HLG<span class="w"> </span>decoding<span class="w"> </span>graph.
Usage:
./bin/hlg_decode <span class="se">\</span>
--use_gpu <span class="nb">true</span> <span class="se">\</span>
--nn_model &lt;path to torch scripted pt file&gt; <span class="se">\</span>
--hlg &lt;path to HLG.pt&gt; <span class="se">\</span>
--word_table &lt;path to words.txt&gt; <span class="se">\</span>
&lt;path to foo.wav&gt; <span class="se">\</span>
&lt;path to bar.wav&gt; <span class="se">\</span>
&lt;more waves <span class="k">if</span> any&gt;
<span class="w"> </span>./bin/hlg_decode<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--use_gpu<span class="w"> </span><span class="nb">true</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--nn_model<span class="w"> </span>&lt;path<span class="w"> </span>to<span class="w"> </span>torch<span class="w"> </span>scripted<span class="w"> </span>pt<span class="w"> </span>file&gt;<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--hlg<span class="w"> </span>&lt;path<span class="w"> </span>to<span class="w"> </span>HLG.pt&gt;<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--word_table<span class="w"> </span>&lt;path<span class="w"> </span>to<span class="w"> </span>words.txt&gt;<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>&lt;path<span class="w"> </span>to<span class="w"> </span>foo.wav&gt;<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>&lt;path<span class="w"> </span>to<span class="w"> </span>bar.wav&gt;<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>&lt;more<span class="w"> </span>waves<span class="w"> </span><span class="k">if</span><span class="w"> </span>any&gt;
To see all possible options, use
./bin/hlg_decode --help
To<span class="w"> </span>see<span class="w"> </span>all<span class="w"> </span>possible<span class="w"> </span>options,<span class="w"> </span>use
<span class="w"> </span>./bin/hlg_decode<span class="w"> </span>--help
Caution:
- Only sound files <span class="o">(</span>*.wav<span class="o">)</span> with single channel are supported.
- It assumes the model is conformer_ctc/transformer.py from icefall.
If you use a different model, you have to change the code
related to <span class="sb">`</span>model.forward<span class="sb">`</span> <span class="k">in</span> this file.
<span class="w"> </span>-<span class="w"> </span>Only<span class="w"> </span>sound<span class="w"> </span>files<span class="w"> </span><span class="o">(</span>*.wav<span class="o">)</span><span class="w"> </span>with<span class="w"> </span>single<span class="w"> </span>channel<span class="w"> </span>are<span class="w"> </span>supported.
<span class="w"> </span>-<span class="w"> </span>It<span class="w"> </span>assumes<span class="w"> </span>the<span class="w"> </span>model<span class="w"> </span>is<span class="w"> </span>conformer_ctc/transformer.py<span class="w"> </span>from<span class="w"> </span>icefall.
<span class="w"> </span>If<span class="w"> </span>you<span class="w"> </span>use<span class="w"> </span>a<span class="w"> </span>different<span class="w"> </span>model,<span class="w"> </span>you<span class="w"> </span>have<span class="w"> </span>to<span class="w"> </span>change<span class="w"> </span>the<span class="w"> </span>code
<span class="w"> </span>related<span class="w"> </span>to<span class="w"> </span><span class="sb">`</span>model.forward<span class="sb">`</span><span class="w"> </span><span class="k">in</span><span class="w"> </span>this<span class="w"> </span>file.
</pre></div>
</div>
<section id="id2">
<h3>HLG decoding<a class="headerlink" href="#id2" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./bin/hlg_decode <span class="se">\</span>
--use_gpu <span class="nb">true</span> <span class="se">\</span>
--nn_model icefall_asr_aishell_conformer_ctc/exp/cpu_jit.pt <span class="se">\</span>
--hlg icefall_asr_aishell_conformer_ctc/data/lang_char/HLG.pt <span class="se">\</span>
--word_table icefall_asr_aishell_conformer_ctc/data/lang_char/words.txt <span class="se">\</span>
icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0121.wav <span class="se">\</span>
icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0122.wav <span class="se">\</span>
icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0123.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./bin/hlg_decode<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--use_gpu<span class="w"> </span><span class="nb">true</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--nn_model<span class="w"> </span>icefall_asr_aishell_conformer_ctc/exp/cpu_jit.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--hlg<span class="w"> </span>icefall_asr_aishell_conformer_ctc/data/lang_char/HLG.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--word_table<span class="w"> </span>icefall_asr_aishell_conformer_ctc/data/lang_char/words.txt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0121.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0122.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0123.wav
</pre></div>
</div>
<p>The output is:</p>

View File

@ -211,9 +211,9 @@ alternatives.</p>
<section id="data-preparation">
<h2>Data Preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading"></a></h2>
<p>To prepare the data for training, please use the following commands:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> egs/aishell/ASR
./prepare.sh --stop-stage <span class="m">4</span>
./prepare.sh --stage <span class="m">6</span> --stop-stage <span class="m">6</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
./prepare.sh<span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">4</span>
./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">6</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">6</span>
</pre></div>
</div>
<div class="admonition note">
@ -231,8 +231,8 @@ are not used in transducer training.</p>
</section>
<section id="training">
<h2>Training<a class="headerlink" href="#training" title="Permalink to this heading"></a></h2>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> egs/aishell/ASR
./transducer_stateless_modified/train.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
./transducer_stateless_modified/train.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows you the training options that can be passed from the command line.
@ -274,26 +274,26 @@ training from epoch 10, based on the state from epoch 9.</p>
<div><p><strong>Use case 1</strong>: You have 4 GPUs, but you only want to use GPU 0 and
GPU 2 for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,2&quot;</span>
$ ./transducer_stateless_modified/train.py --world-size <span class="m">2</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,2&quot;</span>
$<span class="w"> </span>./transducer_stateless_modified/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">2</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 2</strong>: You have 4 GPUs and you want to use all of them
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./transducer_stateless_modified/train.py --world-size <span class="m">4</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./transducer_stateless_modified/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 3</strong>: You have 4 GPUs but you only want to use GPU 3
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;3&quot;</span>
$ ./transducer_stateless_modified/train.py --world-size <span class="m">1</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;3&quot;</span>
$<span class="w"> </span>./transducer_stateless_modified/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
</div></blockquote>
@ -358,7 +358,7 @@ Each epoch actually processes <code class="docutils literal notranslate"><span c
<p>These are checkpoint files, containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./transducer_stateless_modified/train.py --start-epoch <span class="m">11</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./transducer_stateless_modified/train.py<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">11</span>
</pre></div>
</div>
</div></blockquote>
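<p>If you are curious what such a checkpoint contains, you can inspect its top-level keys with PyTorch (a quick sketch, assuming PyTorch is importable and the checkpoint file exists):</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ python3 -c 'import torch; ckpt = torch.load(&quot;transducer_stateless_modified/exp/epoch-10.pt&quot;, map_location=&quot;cpu&quot;); print(sorted(ckpt.keys()))'
</pre></div>
</div>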
@ -367,8 +367,8 @@ To resume training from some checkpoint, say <code class="docutils literal notra
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc., are recorded in these logs. You can visualize them by:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> transducer_stateless_modified/exp/tensorboard
$ tensorboard dev upload --logdir . --name <span class="s2">&quot;Aishell transducer training with icefall&quot;</span> --description <span class="s2">&quot;Training modified transducer, see https://github.com/k2-fsa/icefall/pull/219&quot;</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>transducer_stateless_modified/exp/tensorboard
$<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w"> </span>upload<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--name<span class="w"> </span><span class="s2">&quot;Aishell transducer training with icefall&quot;</span><span class="w"> </span>--description<span class="w"> </span><span class="s2">&quot;Training modified transducer, see https://github.com/k2-fsa/icefall/pull/219&quot;</span>
</pre></div>
</div>
</div></blockquote>
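<p>If you prefer not to upload the logs, you can also view them locally (assuming TensorBoard is installed in your environment) and open the printed URL, usually <code class="docutils literal notranslate"><span class="pre">http://localhost:6006</span></code>, in a browser:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ cd transducer_stateless_modified/exp/tensorboard
$ tensorboard --logdir . --port 6006
</pre></div>
</div>
</div></blockquote>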
@ -410,25 +410,25 @@ you saw printed to the console during training.</p>
<p>The following shows typical use cases:</p>
<section id="case-1">
<h4><strong>Case 1</strong><a class="headerlink" href="#case-1" title="Permalink to this heading"></a></h4>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./transducer_stateless_modified/train.py --max-duration <span class="m">250</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./transducer_stateless_modified/train.py<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">250</span>
</pre></div>
</div>
<p>It uses a <code class="docutils literal notranslate"><span class="pre">--max-duration</span></code> of 250 to avoid OOM (running out of GPU memory).</p>
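<p>If you are not sure which value fits your GPU, one simple approach (a sketch, assuming <code class="docutils literal notranslate"><span class="pre">nvidia-smi</span></code> is available) is to watch GPU memory while training and lower the value further if you still run out of memory:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ watch -n 1 nvidia-smi   # run in a separate terminal to monitor GPU memory
$ ./transducer_stateless_modified/train.py --max-duration 200   # example of a smaller value
</pre></div>
</div>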
</section>
<section id="case-2">
<h4><strong>Case 2</strong><a class="headerlink" href="#case-2" title="Permalink to this heading"></a></h4>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,3&quot;</span>
$ ./transducer_stateless_modified/train.py --world-size <span class="m">2</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,3&quot;</span>
$<span class="w"> </span>./transducer_stateless_modified/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">2</span>
</pre></div>
</div>
<p>It uses GPU 0 and GPU 3 for DDP training.</p>
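<p>You can double-check how many devices are visible to PyTorch before launching training; the following sketch (assuming PyTorch is importable) should print <code class="docutils literal notranslate"><span class="pre">2</span></code> in this case:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ export CUDA_VISIBLE_DEVICES=&quot;0,3&quot;
$ python3 -c 'import torch; print(torch.cuda.device_count())'
</pre></div>
</div>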
</section>
<section id="case-3">
<h4><strong>Case 3</strong><a class="headerlink" href="#case-3" title="Permalink to this heading"></a></h4>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./transducer_stateless_modified/train.py --num-epochs <span class="m">10</span> --start-epoch <span class="m">3</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./transducer_stateless_modified/train.py<span class="w"> </span>--num-epochs<span class="w"> </span><span class="m">10</span><span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">3</span>
</pre></div>
</div>
<p>It loads checkpoint <code class="docutils literal notranslate"><span class="pre">./transducer_stateless_modified/exp/epoch-2.pt</span></code> and starts
@ -440,8 +440,8 @@ training from epoch 3. Also, it trains for 10 epochs.</p>
<h2>Decoding<a class="headerlink" href="#decoding" title="Permalink to this heading"></a></h2>
<p>The decoding part uses checkpoints saved by the training part, so you have
to run the training part first.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./transducer_stateless_modified/decode.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./transducer_stateless_modified/decode.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the options for decoding.</p>
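<p>A typical invocation looks roughly like the following. This is only a sketch: <code class="docutils literal notranslate"><span class="pre">--epoch</span></code> and <code class="docutils literal notranslate"><span class="pre">--avg</span></code> (which checkpoint to use and how many checkpoints to average) are assumptions based on the log file names shown below, so please confirm the exact flags from the <code class="docutils literal notranslate"><span class="pre">--help</span></code> output above:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./transducer_stateless_modified/decode.py --epoch 64 --avg 33
</pre></div>
</div>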
@ -539,34 +539,34 @@ $ git clone https://huggingface.co/csukuangfj/icefall-aishell-transducer-statele
<p>You have to use <code class="docutils literal notranslate"><span class="pre">git</span> <span class="pre">lfs</span></code> to download the pre-trained model.</p>
</div>
<p>After downloading, you will have the following files:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ tree tmp/icefall-aishell-transducer-stateless-modified-2022-03-01
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>tree<span class="w"> </span>tmp/icefall-aishell-transducer-stateless-modified-2022-03-01
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/
<span class="p">|</span>-- README.md
<span class="p">|</span>-- data
<span class="p">|</span> <span class="sb">`</span>-- lang_char
<span class="p">|</span> <span class="p">|</span>-- L.pt
<span class="p">|</span> <span class="p">|</span>-- lexicon.txt
<span class="p">|</span> <span class="p">|</span>-- tokens.txt
<span class="p">|</span> <span class="sb">`</span>-- words.txt
<span class="p">|</span>-- exp
<span class="p">|</span> <span class="sb">`</span>-- pretrained.pt
<span class="p">|</span>-- log
<span class="p">|</span> <span class="p">|</span>-- errs-test-beam_4-epoch-64-avg-33-beam-4.txt
<span class="p">|</span> <span class="p">|</span>-- errs-test-greedy_search-epoch-64-avg-33-context-2-max-sym-per-frame-1.txt
<span class="p">|</span> <span class="p">|</span>-- log-decode-epoch-64-avg-33-beam-4-2022-03-02-12-05-03
<span class="p">|</span> <span class="p">|</span>-- log-decode-epoch-64-avg-33-context-2-max-sym-per-frame-1-2022-02-28-18-13-07
<span class="p">|</span> <span class="p">|</span>-- recogs-test-beam_4-epoch-64-avg-33-beam-4.txt
<span class="p">|</span> <span class="sb">`</span>-- recogs-test-greedy_search-epoch-64-avg-33-context-2-max-sym-per-frame-1.txt
<span class="sb">`</span>-- test_wavs
<span class="p">|</span>-- BAC009S0764W0121.wav
<span class="p">|</span>-- BAC009S0764W0122.wav
<span class="p">|</span>-- BAC009S0764W0123.wav
<span class="sb">`</span>-- transcript.txt
<span class="p">|</span>--<span class="w"> </span>README.md
<span class="p">|</span>--<span class="w"> </span>data
<span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>lang_char
<span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>L.pt
<span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>lexicon.txt
<span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>tokens.txt
<span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>words.txt
<span class="p">|</span>--<span class="w"> </span>exp
<span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>pretrained.pt
<span class="p">|</span>--<span class="w"> </span>log
<span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>errs-test-beam_4-epoch-64-avg-33-beam-4.txt
<span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>errs-test-greedy_search-epoch-64-avg-33-context-2-max-sym-per-frame-1.txt
<span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>log-decode-epoch-64-avg-33-beam-4-2022-03-02-12-05-03
<span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>log-decode-epoch-64-avg-33-context-2-max-sym-per-frame-1-2022-02-28-18-13-07
<span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>recogs-test-beam_4-epoch-64-avg-33-beam-4.txt
<span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>recogs-test-greedy_search-epoch-64-avg-33-context-2-max-sym-per-frame-1.txt
<span class="sb">`</span>--<span class="w"> </span>test_wavs
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>BAC009S0764W0121.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>BAC009S0764W0122.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>BAC009S0764W0123.wav
<span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>transcript.txt
<span class="m">5</span> directories, <span class="m">16</span> files
<span class="m">5</span><span class="w"> </span>directories,<span class="w"> </span><span class="m">16</span><span class="w"> </span>files
</pre></div>
</div>
<p><strong>File descriptions</strong>:</p>
@ -595,38 +595,38 @@ Note: We have removed optimizer <code class="docutils literal notranslate"><span
</ul>
</div></blockquote>
<p>The information about the test sound files is listed below:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ soxi tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/*.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>soxi<span class="w"> </span>tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/*.wav
Input File : <span class="s1">&#39;tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0121.wav&#39;</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">16000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:04.20 <span class="o">=</span> <span class="m">67263</span> samples ~ <span class="m">315</span>.295 CDDA sectors
File Size : 135k
Bit Rate : 256k
Sample Encoding: <span class="m">16</span>-bit Signed Integer PCM
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">&#39;tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0121.wav&#39;</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">16000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:04.20<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">67263</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">315</span>.295<span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>135k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>256k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>Signed<span class="w"> </span>Integer<span class="w"> </span>PCM
Input File : <span class="s1">&#39;tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0122.wav&#39;</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">16000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:04.12 <span class="o">=</span> <span class="m">65840</span> samples ~ <span class="m">308</span>.625 CDDA sectors
File Size : 132k
Bit Rate : 256k
Sample Encoding: <span class="m">16</span>-bit Signed Integer PCM
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">&#39;tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0122.wav&#39;</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">16000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:04.12<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">65840</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">308</span>.625<span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>132k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>256k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>Signed<span class="w"> </span>Integer<span class="w"> </span>PCM
Input File : <span class="s1">&#39;tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0123.wav&#39;</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">16000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:04.00 <span class="o">=</span> <span class="m">64000</span> samples ~ <span class="m">300</span> CDDA sectors
File Size : 128k
Bit Rate : 256k
Sample Encoding: <span class="m">16</span>-bit Signed Integer PCM
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">&#39;tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0123.wav&#39;</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">16000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:04.00<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">64000</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">300</span><span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>128k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>256k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>Signed<span class="w"> </span>Integer<span class="w"> </span>PCM
Total Duration of <span class="m">3</span> files: <span class="m">00</span>:00:12.32
Total<span class="w"> </span>Duration<span class="w"> </span>of<span class="w"> </span><span class="m">3</span><span class="w"> </span>files:<span class="w"> </span><span class="m">00</span>:00:12.32
</pre></div>
</div>
</section>
@ -655,14 +655,14 @@ it may give you poor results.</p>
<section id="greedy-search">
<h4>Greedy search<a class="headerlink" href="#greedy-search" title="Permalink to this heading"></a></h4>
<p>The command to run greedy search is given below:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./transducer_stateless_modified/pretrained.py <span class="se">\</span>
--checkpoint ./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/exp/pretrained.pt <span class="se">\</span>
--lang-dir ./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/data/lang_char <span class="se">\</span>
--method greedy_search <span class="se">\</span>
./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0121.wav <span class="se">\</span>
./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0122.wav <span class="se">\</span>
./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0123.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./transducer_stateless_modified/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--lang-dir<span class="w"> </span>./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/data/lang_char<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>greedy_search<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0121.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0122.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0123.wav
</pre></div>
</div>
<p>The output is as follows:</p>
@ -692,16 +692,16 @@ $ ./transducer_stateless_modified/pretrained.py <span class="se">\</span>
<section id="beam-search">
<h4>Beam search<a class="headerlink" href="#beam-search" title="Permalink to this heading"></a></h4>
<p>The command to run beam search is given below:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$ ./transducer_stateless_modified/pretrained.py <span class="se">\</span>
--checkpoint ./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/exp/pretrained.pt <span class="se">\</span>
--lang-dir ./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/data/lang_char <span class="se">\</span>
--method beam_search <span class="se">\</span>
--beam-size <span class="m">4</span> <span class="se">\</span>
./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0121.wav <span class="se">\</span>
./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0122.wav <span class="se">\</span>
./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0123.wav
$<span class="w"> </span>./transducer_stateless_modified/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--lang-dir<span class="w"> </span>./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/data/lang_char<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>beam_search<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--beam-size<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0121.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0122.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0123.wav
</pre></div>
</div>
<p>The output is as follows:</p>
@ -731,16 +731,16 @@ $ ./transducer_stateless_modified/pretrained.py <span class="se">\</span>
<section id="modified-beam-search">
<h4>Modified beam search<a class="headerlink" href="#modified-beam-search" title="Permalink to this heading"></a></h4>
<p>The command to run modified beam search is given below:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$ ./transducer_stateless_modified/pretrained.py <span class="se">\</span>
--checkpoint ./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/exp/pretrained.pt <span class="se">\</span>
--lang-dir ./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/data/lang_char <span class="se">\</span>
--method modified_beam_search <span class="se">\</span>
--beam-size <span class="m">4</span> <span class="se">\</span>
./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0121.wav <span class="se">\</span>
./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0122.wav <span class="se">\</span>
./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0123.wav
$<span class="w"> </span>./transducer_stateless_modified/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--lang-dir<span class="w"> </span>./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/data/lang_char<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>modified_beam_search<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--beam-size<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0121.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0122.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall-aishell-transducer-stateless-modified-2022-03-01/test_wavs/BAC009S0764W0123.wav
</pre></div>
</div>
<p>The output is as follows:</p>

View File

@ -130,8 +130,8 @@ the environment for <code class="docutils literal notranslate"><span class="pre"
</div></blockquote>
<section id="data-preparation">
<h2>Data preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading"></a></h2>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./prepare.sh
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./prepare.sh
</pre></div>
</div>
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
@ -146,13 +146,13 @@ options:</p>
</div></blockquote>
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
<p>For example,</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">0</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>means to run only stage 0.</p>
<p>To run stage 2 to stage 5, use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage <span class="m">2</span> --stop-stage <span class="m">5</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">2</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">5</span>
</pre></div>
</div>
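<p>Non-contiguous stages can be run by simply invoking the script more than once, for example (a sketch):</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage 0 --stop-stage 1
$ ./prepare.sh --stage 3 --stop-stage 3
</pre></div>
</div>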
<div class="admonition hint">
@ -167,8 +167,8 @@ the <code class="docutils literal notranslate"><span class="pre">dl_dir</span></
<p class="admonition-title">Hint</p>
<p>A 3-gram language model will be downloaded from Hugging Face. We assume you have
installed and initialized <code class="docutils literal notranslate"><span class="pre">git-lfs</span></code>. If not, you can install <code class="docutils literal notranslate"><span class="pre">git-lfs</span></code> by</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ sudo apt-get install git-lfs
$ git-lfs install
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>sudo<span class="w"> </span>apt-get<span class="w"> </span>install<span class="w"> </span>git-lfs
$<span class="w"> </span>git-lfs<span class="w"> </span>install
</pre></div>
</div>
<p>If you don't have <code class="docutils literal notranslate"><span class="pre">sudo</span></code> permission, you can download the
@ -184,8 +184,8 @@ are saved in <code class="docutils literal notranslate"><span class="pre">./data
<h2>Training<a class="headerlink" href="#training" title="Permalink to this heading"></a></h2>
<section id="configurable-options">
<h3>Configurable options<a class="headerlink" href="#configurable-options" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./tdnn_lstm_ctc/train.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./tdnn_lstm_ctc/train.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows you the training options that can be passed from the command line.
@ -223,26 +223,26 @@ training from epoch 10, based on the state from epoch 9.</p>
<div><p><strong>Use case 1</strong>: You have 4 GPUs, but you only want to use GPU 0 and
GPU 2 for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,2&quot;</span>
$ ./tdnn_lstm_ctc/train.py --world-size <span class="m">2</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,2&quot;</span>
$<span class="w"> </span>./tdnn_lstm_ctc/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">2</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 2</strong>: You have 4 GPUs and you want to use all of them
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./tdnn_lstm_ctc/train.py --world-size <span class="m">4</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./tdnn_lstm_ctc/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 3</strong>: You have 4 GPUs but you only want to use GPU 3
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;3&quot;</span>
$ ./tdnn_lstm_ctc/train.py --world-size <span class="m">1</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;3&quot;</span>
$<span class="w"> </span>./tdnn_lstm_ctc/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
</div></blockquote>
@ -295,7 +295,7 @@ You will find the following files in that directory:</p>
<p>These are checkpoint files, containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn_lstm_ctc/train.py --start-epoch <span class="m">11</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn_lstm_ctc/train.py<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">11</span>
</pre></div>
</div>
</div></blockquote>
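<p>To see which checkpoints are available before choosing <code class="docutils literal notranslate"><span class="pre">--start-epoch</span></code>, you can simply list them (a sketch):</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ls -lh tdnn_lstm_ctc/exp/epoch-*.pt
</pre></div>
</div>
</div></blockquote>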
@ -304,8 +304,8 @@ To resume training from some checkpoint, say <code class="docutils literal notra
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc., are recorded in these logs. You can visualize them by:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> tdnn_lstm_ctc/exp/tensorboard
$ tensorboard dev upload --logdir . --description <span class="s2">&quot;TDNN-LSTM CTC training for Aishell with icefall&quot;</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>tdnn_lstm_ctc/exp/tensorboard
$<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w"> </span>upload<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--description<span class="w"> </span><span class="s2">&quot;TDNN-LSTM CTC training for Aishell with icefall&quot;</span>
</pre></div>
</div>
</div></blockquote>
@ -347,17 +347,17 @@ you saw printed to the console during training.</p>
<p>The following shows typical use cases:</p>
<section id="case-1">
<h4><strong>Case 1</strong><a class="headerlink" href="#case-1" title="Permalink to this heading"></a></h4>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,3&quot;</span>
$ ./tdnn_lstm_ctc/train.py --world-size <span class="m">2</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,3&quot;</span>
$<span class="w"> </span>./tdnn_lstm_ctc/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">2</span>
</pre></div>
</div>
<p>It uses GPU 0 and GPU 3 for DDP training.</p>
</section>
<section id="case-2">
<h4><strong>Case 2</strong><a class="headerlink" href="#case-2" title="Permalink to this heading"></a></h4>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./tdnn_lstm_ctc/train.py --num-epochs <span class="m">10</span> --start-epoch <span class="m">3</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./tdnn_lstm_ctc/train.py<span class="w"> </span>--num-epochs<span class="w"> </span><span class="m">10</span><span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">3</span>
</pre></div>
</div>
<p>It loads checkpoint <code class="docutils literal notranslate"><span class="pre">./tdnn_lstm_ctc/exp/epoch-2.pt</span></code> and starts
@ -369,8 +369,8 @@ training from epoch 3. Also, it trains for 10 epochs.</p>
<h2>Decoding<a class="headerlink" href="#decoding" title="Permalink to this heading"></a></h2>
<p>The decoding part uses checkpoints saved by the training part, so you have
to run the training part first.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./tdnn_lstm_ctc/decode.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./tdnn_lstm_ctc/decode.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the options for decoding.</p>
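<p>A typical invocation looks like the following. This is only a sketch: the <code class="docutils literal notranslate"><span class="pre">--epoch</span></code> and <code class="docutils literal notranslate"><span class="pre">--avg</span></code> flags follow the convention used by other icefall recipes (see the conformer_ctc example further below on this page), and the values are placeholders you should adapt to your own training run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ cd egs/aishell/ASR
# Placeholder values; --epoch/--avg follow the usual icefall convention
# (decode with the checkpoint of epoch 25, averaged over the last 5 epochs).
$ ./tdnn_lstm_ctc/decode.py --epoch 25 --avg 5
</pre></div>
</div>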
@ -424,27 +424,27 @@ $ git clone https://huggingface.co/pkufool/icefall_asr_aishell_tdnn_lstm_ctc
<p>In order to use this pre-trained model, your k2 version has to be v1.7 or later.</p>
</div>
<p>After downloading, you will have the following files:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ tree tmp
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>tree<span class="w"> </span>tmp
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>tmp/
<span class="sb">`</span>-- icefall_asr_aishell_tdnn_lstm_ctc
<span class="p">|</span>-- README.md
<span class="p">|</span>-- data
<span class="p">|</span> <span class="sb">`</span>-- lang_phone
<span class="p">|</span> <span class="p">|</span>-- HLG.pt
<span class="p">|</span> <span class="p">|</span>-- tokens.txt
<span class="p">|</span> <span class="sb">`</span>-- words.txt
<span class="p">|</span>-- exp
<span class="p">|</span> <span class="sb">`</span>-- pretrained.pt
<span class="sb">`</span>-- test_waves
<span class="p">|</span>-- BAC009S0764W0121.wav
<span class="p">|</span>-- BAC009S0764W0122.wav
<span class="p">|</span>-- BAC009S0764W0123.wav
<span class="sb">`</span>-- trans.txt
<span class="sb">`</span>--<span class="w"> </span>icefall_asr_aishell_tdnn_lstm_ctc
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>README.md
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>data
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>lang_phone
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>HLG.pt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>tokens.txt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>words.txt
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>exp
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>pretrained.pt
<span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>test_waves
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>BAC009S0764W0121.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>BAC009S0764W0122.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>BAC009S0764W0123.wav
<span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>trans.txt
<span class="m">5</span> directories, <span class="m">9</span> files
<span class="m">5</span><span class="w"> </span>directories,<span class="w"> </span><span class="m">9</span><span class="w"> </span>files
</pre></div>
</div>
<p><strong>File descriptions</strong>:</p>
@ -486,38 +486,38 @@ Note: We have removed optimizer <code class="docutils literal notranslate"><span
</ul>
</div></blockquote>
<p>Information about the test sound files is listed below:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ soxi tmp/icefall_asr_aishell_tdnn_lstm_ctc/test_waves/*.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>soxi<span class="w"> </span>tmp/icefall_asr_aishell_tdnn_lstm_ctc/test_waves/*.wav
Input File : <span class="s1">&#39;tmp/icefall_asr_aishell_tdnn_lstm_ctc/test_waves/BAC009S0764W0121.wav&#39;</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">16000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:04.20 <span class="o">=</span> <span class="m">67263</span> samples ~ <span class="m">315</span>.295 CDDA sectors
File Size : 135k
Bit Rate : 256k
Sample Encoding: <span class="m">16</span>-bit Signed Integer PCM
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">&#39;tmp/icefall_asr_aishell_tdnn_lstm_ctc/test_waves/BAC009S0764W0121.wav&#39;</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">16000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:04.20<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">67263</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">315</span>.295<span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>135k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>256k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>Signed<span class="w"> </span>Integer<span class="w"> </span>PCM
Input File : <span class="s1">&#39;tmp/icefall_asr_aishell_tdnn_lstm_ctc/test_waves/BAC009S0764W0122.wav&#39;</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">16000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:04.12 <span class="o">=</span> <span class="m">65840</span> samples ~ <span class="m">308</span>.625 CDDA sectors
File Size : 132k
Bit Rate : 256k
Sample Encoding: <span class="m">16</span>-bit Signed Integer PCM
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">&#39;tmp/icefall_asr_aishell_tdnn_lstm_ctc/test_waves/BAC009S0764W0122.wav&#39;</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">16000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:04.12<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">65840</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">308</span>.625<span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>132k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>256k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>Signed<span class="w"> </span>Integer<span class="w"> </span>PCM
Input File : <span class="s1">&#39;tmp/icefall_asr_aishell_tdnn_lstm_ctc/test_waves/BAC009S0764W0123.wav&#39;</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">16000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:04.00 <span class="o">=</span> <span class="m">64000</span> samples ~ <span class="m">300</span> CDDA sectors
File Size : 128k
Bit Rate : 256k
Sample Encoding: <span class="m">16</span>-bit Signed Integer PCM
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">&#39;tmp/icefall_asr_aishell_tdnn_lstm_ctc/test_waves/BAC009S0764W0123.wav&#39;</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">16000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:04.00<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">64000</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">300</span><span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>128k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>256k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>Signed<span class="w"> </span>Integer<span class="w"> </span>PCM
Total Duration of <span class="m">3</span> files: <span class="m">00</span>:00:12.32
Total<span class="w"> </span>Duration<span class="w"> </span>of<span class="w"> </span><span class="m">3</span><span class="w"> </span>files:<span class="w"> </span><span class="m">00</span>:00:12.32
</pre></div>
</div>
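<p>The test waves above are 16 kHz, 16-bit, single-channel audio, which is the format the pre-trained model expects. If you want to try your own recordings and they are in a different format, you can convert them with <code class="docutils literal notranslate"><span class="pre">sox</span></code>; the file names below are placeholders:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span># Convert an arbitrary wav to 16 kHz, 16-bit, mono (adjust file names to your data).
$ sox input.wav -r 16000 -b 16 -c 1 output.wav
</pre></div>
</div>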
</section>
@ -532,15 +532,15 @@ $ ./tdnn_lstm_ctc/pretrained.py --help
<h4>HLG decoding<a class="headerlink" href="#hlg-decoding" title="Permalink to this heading"></a></h4>
<p>HLG decoding uses the best path of the decoding lattice as the decoding result.</p>
<p>The command to run HLG decoding is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/aishell/ASR
$ ./tdnn_lstm_ctc/pretrained.py <span class="se">\</span>
--checkpoint ./tmp/icefall_asr_aishell_tdnn_lstm_ctc/exp/pretrained.pt <span class="se">\</span>
--words-file ./tmp/icefall_asr_aishell_tdnn_lstm_ctc/data/lang_phone/words.txt <span class="se">\</span>
--HLG ./tmp/icefall_asr_aishell_tdnn_lstm_ctc/data/lang_phone/HLG.pt <span class="se">\</span>
--method 1best <span class="se">\</span>
./tmp/icefall_asr_aishell_tdnn_lstm_ctc/test_waves/BAC009S0764W0121.wav <span class="se">\</span>
./tmp/icefall_asr_aishell_tdnn_lstm_ctc/test_waves/BAC009S0764W0122.wav <span class="se">\</span>
./tmp/icefall_asr_aishell_tdnn_lstm_ctc/test_waves/BAC009S0764W0123.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/aishell/ASR
$<span class="w"> </span>./tdnn_lstm_ctc/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./tmp/icefall_asr_aishell_tdnn_lstm_ctc/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--words-file<span class="w"> </span>./tmp/icefall_asr_aishell_tdnn_lstm_ctc/data/lang_phone/words.txt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--HLG<span class="w"> </span>./tmp/icefall_asr_aishell_tdnn_lstm_ctc/data/lang_phone/HLG.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>1best<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_aishell_tdnn_lstm_ctc/test_waves/BAC009S0764W0121.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_aishell_tdnn_lstm_ctc/test_waves/BAC009S0764W0122.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_aishell_tdnn_lstm_ctc/test_waves/BAC009S0764W0123.wav
</pre></div>
</div>
<p>The output is given below:</p>

View File

@ -136,8 +136,8 @@ the environment for <code class="docutils literal notranslate"><span class="pre"
</div></blockquote>
<section id="data-preparation">
<h2>Data preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading"></a></h2>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh
</pre></div>
</div>
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
@ -152,13 +152,13 @@ options:</p>
</div></blockquote>
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
<p>For example,</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">0</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>means to run only stage 0.</p>
<p>To run stage 2 to stage 5, use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage <span class="m">2</span> --stop-stage <span class="m">5</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">2</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">5</span>
</pre></div>
</div>
<div class="admonition hint">
@ -190,8 +190,8 @@ the following YouTube channel by <a class="reference external" href="https://www
<h2>Training<a class="headerlink" href="#training" title="Permalink to this heading"></a></h2>
<section id="configurable-options">
<h3>Configurable options<a class="headerlink" href="#configurable-options" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./conformer_ctc/train.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows you the training options that can be passed from the command line.
@ -240,26 +240,26 @@ training from epoch 10, based on the state from epoch 9.</p>
<div><p><strong>Use case 1</strong>: You have 4 GPUs, but you only want to use GPU 0 and
GPU 2 for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,2&quot;</span>
$ ./conformer_ctc/train.py --world-size <span class="m">2</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,2&quot;</span>
$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">2</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 2</strong>: You have 4 GPUs and you want to use all of them
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./conformer_ctc/train.py --world-size <span class="m">4</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 3</strong>: You have 4 GPUs but you only want to use GPU 3
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;3&quot;</span>
$ ./conformer_ctc/train.py --world-size <span class="m">1</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;3&quot;</span>
$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
</div></blockquote>
@ -307,7 +307,7 @@ You will find the following files in that directory:</p>
<p>These are checkpoint files, containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./conformer_ctc/train.py --start-epoch <span class="m">11</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">11</span>
</pre></div>
</div>
</div></blockquote>
@ -316,8 +316,8 @@ To resume training from some checkpoint, say <code class="docutils literal notra
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc., are recorded in these logs. You can visualize them by:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> conformer_ctc/exp/tensorboard
$ tensorboard dev upload --logdir . --description <span class="s2">&quot;Conformer CTC training for LibriSpeech with icefall&quot;</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>conformer_ctc/exp/tensorboard
$<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w"> </span>upload<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--description<span class="w"> </span><span class="s2">&quot;Conformer CTC training for LibriSpeech with icefall&quot;</span>
</pre></div>
</div>
</div></blockquote>
@ -358,8 +358,8 @@ you saw printed to the console during training.</p>
<p>The following shows typical use cases:</p>
<section id="case-1">
<h4><strong>Case 1</strong><a class="headerlink" href="#case-1" title="Permalink to this heading"></a></h4>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./conformer_ctc/train.py --max-duration <span class="m">200</span> --full-libri <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">200</span><span class="w"> </span>--full-libri<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>It uses <code class="docutils literal notranslate"><span class="pre">--max-duration</span></code> of 200 to avoid OOM. Also, it uses only
@ -367,17 +367,17 @@ a subset of the LibriSpeech data for training.</p>
</section>
<section id="case-2">
<h4><strong>Case 2</strong><a class="headerlink" href="#case-2" title="Permalink to this heading"></a></h4>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,3&quot;</span>
$ ./conformer_ctc/train.py --world-size <span class="m">2</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,3&quot;</span>
$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">2</span>
</pre></div>
</div>
<p>It uses GPU 0 and GPU 3 for DDP training.</p>
</section>
<section id="case-3">
<h4><strong>Case 3</strong><a class="headerlink" href="#case-3" title="Permalink to this heading"></a></h4>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./conformer_ctc/train.py --num-epochs <span class="m">10</span> --start-epoch <span class="m">3</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./conformer_ctc/train.py<span class="w"> </span>--num-epochs<span class="w"> </span><span class="m">10</span><span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">3</span>
</pre></div>
</div>
<p>It loads checkpoint <code class="docutils literal notranslate"><span class="pre">./conformer_ctc/exp/epoch-2.pt</span></code> and starts
@ -389,8 +389,8 @@ training from epoch 3. Also, it trains for 10 epochs.</p>
<h2>Decoding<a class="headerlink" href="#decoding" title="Permalink to this heading"></a></h2>
<p>The decoding part uses checkpoints saved by the training part, so you have
to run the training part first.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./conformer_ctc/decode.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./conformer_ctc/decode.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the options for decoding.</p>
@ -425,54 +425,54 @@ value may cause OOM.</p>
</div></blockquote>
<p>Here are some results for CTC decoding with a vocab size of 500:</p>
<p>Usage:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
<span class="c1"># NOTE: Tested with a model with vocab size 500.</span>
<span class="c1"># It won&#39;t work for a model with vocab size 5000.</span>
$ ./conformer_ctc/decode.py <span class="se">\</span>
--epoch <span class="m">25</span> <span class="se">\</span>
--avg <span class="m">1</span> <span class="se">\</span>
--max-duration <span class="m">300</span> <span class="se">\</span>
--exp-dir conformer_ctc/exp <span class="se">\</span>
--lang-dir data/lang_bpe_500 <span class="se">\</span>
--method ctc-decoding
$<span class="w"> </span>./conformer_ctc/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">25</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">300</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>conformer_ctc/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--lang-dir<span class="w"> </span>data/lang_bpe_500<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>ctc-decoding
</pre></div>
</div>
<p>The output is given below:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="m">2021</span>-09-26 <span class="m">12</span>:44:31,033 INFO <span class="o">[</span>decode.py:537<span class="o">]</span> Decoding started
<span class="m">2021</span>-09-26 <span class="m">12</span>:44:31,033 INFO <span class="o">[</span>decode.py:538<span class="o">]</span>
<span class="o">{</span><span class="s1">&#39;lm_dir&#39;</span>: PosixPath<span class="o">(</span><span class="s1">&#39;data/lm&#39;</span><span class="o">)</span>, <span class="s1">&#39;subsampling_factor&#39;</span>: <span class="m">4</span>, <span class="s1">&#39;vgg_frontend&#39;</span>: False, <span class="s1">&#39;use_feat_batchnorm&#39;</span>: True,
<span class="s1">&#39;feature_dim&#39;</span>: <span class="m">80</span>, <span class="s1">&#39;nhead&#39;</span>: <span class="m">8</span>, <span class="s1">&#39;attention_dim&#39;</span>: <span class="m">512</span>, <span class="s1">&#39;num_decoder_layers&#39;</span>: <span class="m">6</span>, <span class="s1">&#39;search_beam&#39;</span>: <span class="m">20</span>, <span class="s1">&#39;output_beam&#39;</span>: <span class="m">8</span>,
<span class="s1">&#39;min_active_states&#39;</span>: <span class="m">30</span>, <span class="s1">&#39;max_active_states&#39;</span>: <span class="m">10000</span>, <span class="s1">&#39;use_double_scores&#39;</span>: True,
<span class="s1">&#39;epoch&#39;</span>: <span class="m">25</span>, <span class="s1">&#39;avg&#39;</span>: <span class="m">1</span>, <span class="s1">&#39;method&#39;</span>: <span class="s1">&#39;ctc-decoding&#39;</span>, <span class="s1">&#39;num_paths&#39;</span>: <span class="m">100</span>, <span class="s1">&#39;nbest_scale&#39;</span>: <span class="m">0</span>.5,
<span class="s1">&#39;export&#39;</span>: False, <span class="s1">&#39;exp_dir&#39;</span>: PosixPath<span class="o">(</span><span class="s1">&#39;conformer_ctc/exp&#39;</span><span class="o">)</span>, <span class="s1">&#39;lang_dir&#39;</span>: PosixPath<span class="o">(</span><span class="s1">&#39;data/lang_bpe_500&#39;</span><span class="o">)</span>, <span class="s1">&#39;full_libri&#39;</span>: False,
<span class="s1">&#39;feature_dir&#39;</span>: PosixPath<span class="o">(</span><span class="s1">&#39;data/fbank&#39;</span><span class="o">)</span>, <span class="s1">&#39;max_duration&#39;</span>: <span class="m">100</span>, <span class="s1">&#39;bucketing_sampler&#39;</span>: False, <span class="s1">&#39;num_buckets&#39;</span>: <span class="m">30</span>,
<span class="s1">&#39;concatenate_cuts&#39;</span>: False, <span class="s1">&#39;duration_factor&#39;</span>: <span class="m">1</span>.0, <span class="s1">&#39;gap&#39;</span>: <span class="m">1</span>.0, <span class="s1">&#39;on_the_fly_feats&#39;</span>: False,
<span class="s1">&#39;shuffle&#39;</span>: True, <span class="s1">&#39;return_cuts&#39;</span>: True, <span class="s1">&#39;num_workers&#39;</span>: <span class="m">2</span><span class="o">}</span>
<span class="m">2021</span>-09-26 <span class="m">12</span>:44:31,406 INFO <span class="o">[</span>lexicon.py:113<span class="o">]</span> Loading pre-compiled data/lang_bpe_500/Linv.pt
<span class="m">2021</span>-09-26 <span class="m">12</span>:44:31,464 INFO <span class="o">[</span>decode.py:548<span class="o">]</span> device: cuda:0
<span class="m">2021</span>-09-26 <span class="m">12</span>:44:36,171 INFO <span class="o">[</span>checkpoint.py:92<span class="o">]</span> Loading checkpoint from conformer_ctc/exp/epoch-25.pt
<span class="m">2021</span>-09-26 <span class="m">12</span>:44:36,776 INFO <span class="o">[</span>decode.py:652<span class="o">]</span> Number of model parameters: <span class="m">109226120</span>
<span class="m">2021</span>-09-26 <span class="m">12</span>:44:37,714 INFO <span class="o">[</span>decode.py:473<span class="o">]</span> batch <span class="m">0</span>/206, cuts processed <span class="k">until</span> now is <span class="m">12</span>
<span class="m">2021</span>-09-26 <span class="m">12</span>:45:15,944 INFO <span class="o">[</span>decode.py:473<span class="o">]</span> batch <span class="m">100</span>/206, cuts processed <span class="k">until</span> now is <span class="m">1328</span>
<span class="m">2021</span>-09-26 <span class="m">12</span>:45:54,443 INFO <span class="o">[</span>decode.py:473<span class="o">]</span> batch <span class="m">200</span>/206, cuts processed <span class="k">until</span> now is <span class="m">2563</span>
<span class="m">2021</span>-09-26 <span class="m">12</span>:45:56,411 INFO <span class="o">[</span>decode.py:494<span class="o">]</span> The transcripts are stored <span class="k">in</span> conformer_ctc/exp/recogs-test-clean-ctc-decoding.txt
<span class="m">2021</span>-09-26 <span class="m">12</span>:45:56,592 INFO <span class="o">[</span>utils.py:331<span class="o">]</span> <span class="o">[</span>test-clean-ctc-decoding<span class="o">]</span> %WER <span class="m">3</span>.26% <span class="o">[</span><span class="m">1715</span> / <span class="m">52576</span>, <span class="m">163</span> ins, <span class="m">128</span> del, <span class="m">1424</span> sub <span class="o">]</span>
<span class="m">2021</span>-09-26 <span class="m">12</span>:45:56,807 INFO <span class="o">[</span>decode.py:506<span class="o">]</span> Wrote detailed error stats to conformer_ctc/exp/errs-test-clean-ctc-decoding.txt
<span class="m">2021</span>-09-26 <span class="m">12</span>:45:56,808 INFO <span class="o">[</span>decode.py:522<span class="o">]</span>
For test-clean, WER of different settings are:
ctc-decoding <span class="m">3</span>.26 best <span class="k">for</span> test-clean
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:44:31,033<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:537<span class="o">]</span><span class="w"> </span>Decoding<span class="w"> </span>started
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:44:31,033<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:538<span class="o">]</span>
<span class="o">{</span><span class="s1">&#39;lm_dir&#39;</span>:<span class="w"> </span>PosixPath<span class="o">(</span><span class="s1">&#39;data/lm&#39;</span><span class="o">)</span>,<span class="w"> </span><span class="s1">&#39;subsampling_factor&#39;</span>:<span class="w"> </span><span class="m">4</span>,<span class="w"> </span><span class="s1">&#39;vgg_frontend&#39;</span>:<span class="w"> </span>False,<span class="w"> </span><span class="s1">&#39;use_feat_batchnorm&#39;</span>:<span class="w"> </span>True,
<span class="s1">&#39;feature_dim&#39;</span>:<span class="w"> </span><span class="m">80</span>,<span class="w"> </span><span class="s1">&#39;nhead&#39;</span>:<span class="w"> </span><span class="m">8</span>,<span class="w"> </span><span class="s1">&#39;attention_dim&#39;</span>:<span class="w"> </span><span class="m">512</span>,<span class="w"> </span><span class="s1">&#39;num_decoder_layers&#39;</span>:<span class="w"> </span><span class="m">6</span>,<span class="w"> </span><span class="s1">&#39;search_beam&#39;</span>:<span class="w"> </span><span class="m">20</span>,<span class="w"> </span><span class="s1">&#39;output_beam&#39;</span>:<span class="w"> </span><span class="m">8</span>,
<span class="s1">&#39;min_active_states&#39;</span>:<span class="w"> </span><span class="m">30</span>,<span class="w"> </span><span class="s1">&#39;max_active_states&#39;</span>:<span class="w"> </span><span class="m">10000</span>,<span class="w"> </span><span class="s1">&#39;use_double_scores&#39;</span>:<span class="w"> </span>True,
<span class="s1">&#39;epoch&#39;</span>:<span class="w"> </span><span class="m">25</span>,<span class="w"> </span><span class="s1">&#39;avg&#39;</span>:<span class="w"> </span><span class="m">1</span>,<span class="w"> </span><span class="s1">&#39;method&#39;</span>:<span class="w"> </span><span class="s1">&#39;ctc-decoding&#39;</span>,<span class="w"> </span><span class="s1">&#39;num_paths&#39;</span>:<span class="w"> </span><span class="m">100</span>,<span class="w"> </span><span class="s1">&#39;nbest_scale&#39;</span>:<span class="w"> </span><span class="m">0</span>.5,
<span class="s1">&#39;export&#39;</span>:<span class="w"> </span>False,<span class="w"> </span><span class="s1">&#39;exp_dir&#39;</span>:<span class="w"> </span>PosixPath<span class="o">(</span><span class="s1">&#39;conformer_ctc/exp&#39;</span><span class="o">)</span>,<span class="w"> </span><span class="s1">&#39;lang_dir&#39;</span>:<span class="w"> </span>PosixPath<span class="o">(</span><span class="s1">&#39;data/lang_bpe_500&#39;</span><span class="o">)</span>,<span class="w"> </span><span class="s1">&#39;full_libri&#39;</span>:<span class="w"> </span>False,
<span class="s1">&#39;feature_dir&#39;</span>:<span class="w"> </span>PosixPath<span class="o">(</span><span class="s1">&#39;data/fbank&#39;</span><span class="o">)</span>,<span class="w"> </span><span class="s1">&#39;max_duration&#39;</span>:<span class="w"> </span><span class="m">100</span>,<span class="w"> </span><span class="s1">&#39;bucketing_sampler&#39;</span>:<span class="w"> </span>False,<span class="w"> </span><span class="s1">&#39;num_buckets&#39;</span>:<span class="w"> </span><span class="m">30</span>,
<span class="s1">&#39;concatenate_cuts&#39;</span>:<span class="w"> </span>False,<span class="w"> </span><span class="s1">&#39;duration_factor&#39;</span>:<span class="w"> </span><span class="m">1</span>.0,<span class="w"> </span><span class="s1">&#39;gap&#39;</span>:<span class="w"> </span><span class="m">1</span>.0,<span class="w"> </span><span class="s1">&#39;on_the_fly_feats&#39;</span>:<span class="w"> </span>False,
<span class="s1">&#39;shuffle&#39;</span>:<span class="w"> </span>True,<span class="w"> </span><span class="s1">&#39;return_cuts&#39;</span>:<span class="w"> </span>True,<span class="w"> </span><span class="s1">&#39;num_workers&#39;</span>:<span class="w"> </span><span class="m">2</span><span class="o">}</span>
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:44:31,406<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>lexicon.py:113<span class="o">]</span><span class="w"> </span>Loading<span class="w"> </span>pre-compiled<span class="w"> </span>data/lang_bpe_500/Linv.pt
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:44:31,464<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:548<span class="o">]</span><span class="w"> </span>device:<span class="w"> </span>cuda:0
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:44:36,171<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>checkpoint.py:92<span class="o">]</span><span class="w"> </span>Loading<span class="w"> </span>checkpoint<span class="w"> </span>from<span class="w"> </span>conformer_ctc/exp/epoch-25.pt
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:44:36,776<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:652<span class="o">]</span><span class="w"> </span>Number<span class="w"> </span>of<span class="w"> </span>model<span class="w"> </span>parameters:<span class="w"> </span><span class="m">109226120</span>
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:44:37,714<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:473<span class="o">]</span><span class="w"> </span>batch<span class="w"> </span><span class="m">0</span>/206,<span class="w"> </span>cuts<span class="w"> </span>processed<span class="w"> </span><span class="k">until</span><span class="w"> </span>now<span class="w"> </span>is<span class="w"> </span><span class="m">12</span>
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:45:15,944<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:473<span class="o">]</span><span class="w"> </span>batch<span class="w"> </span><span class="m">100</span>/206,<span class="w"> </span>cuts<span class="w"> </span>processed<span class="w"> </span><span class="k">until</span><span class="w"> </span>now<span class="w"> </span>is<span class="w"> </span><span class="m">1328</span>
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:45:54,443<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:473<span class="o">]</span><span class="w"> </span>batch<span class="w"> </span><span class="m">200</span>/206,<span class="w"> </span>cuts<span class="w"> </span>processed<span class="w"> </span><span class="k">until</span><span class="w"> </span>now<span class="w"> </span>is<span class="w"> </span><span class="m">2563</span>
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:45:56,411<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:494<span class="o">]</span><span class="w"> </span>The<span class="w"> </span>transcripts<span class="w"> </span>are<span class="w"> </span>stored<span class="w"> </span><span class="k">in</span><span class="w"> </span>conformer_ctc/exp/recogs-test-clean-ctc-decoding.txt
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:45:56,592<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>utils.py:331<span class="o">]</span><span class="w"> </span><span class="o">[</span>test-clean-ctc-decoding<span class="o">]</span><span class="w"> </span>%WER<span class="w"> </span><span class="m">3</span>.26%<span class="w"> </span><span class="o">[</span><span class="m">1715</span><span class="w"> </span>/<span class="w"> </span><span class="m">52576</span>,<span class="w"> </span><span class="m">163</span><span class="w"> </span>ins,<span class="w"> </span><span class="m">128</span><span class="w"> </span>del,<span class="w"> </span><span class="m">1424</span><span class="w"> </span>sub<span class="w"> </span><span class="o">]</span>
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:45:56,807<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:506<span class="o">]</span><span class="w"> </span>Wrote<span class="w"> </span>detailed<span class="w"> </span>error<span class="w"> </span>stats<span class="w"> </span>to<span class="w"> </span>conformer_ctc/exp/errs-test-clean-ctc-decoding.txt
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:45:56,808<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:522<span class="o">]</span>
For<span class="w"> </span>test-clean,<span class="w"> </span>WER<span class="w"> </span>of<span class="w"> </span>different<span class="w"> </span>settings<span class="w"> </span>are:
ctc-decoding<span class="w"> </span><span class="m">3</span>.26<span class="w"> </span>best<span class="w"> </span><span class="k">for</span><span class="w"> </span>test-clean
<span class="m">2021</span>-09-26 <span class="m">12</span>:45:57,362 INFO <span class="o">[</span>decode.py:473<span class="o">]</span> batch <span class="m">0</span>/203, cuts processed <span class="k">until</span> now is <span class="m">15</span>
<span class="m">2021</span>-09-26 <span class="m">12</span>:46:35,565 INFO <span class="o">[</span>decode.py:473<span class="o">]</span> batch <span class="m">100</span>/203, cuts processed <span class="k">until</span> now is <span class="m">1477</span>
<span class="m">2021</span>-09-26 <span class="m">12</span>:47:15,106 INFO <span class="o">[</span>decode.py:473<span class="o">]</span> batch <span class="m">200</span>/203, cuts processed <span class="k">until</span> now is <span class="m">2922</span>
<span class="m">2021</span>-09-26 <span class="m">12</span>:47:16,131 INFO <span class="o">[</span>decode.py:494<span class="o">]</span> The transcripts are stored <span class="k">in</span> conformer_ctc/exp/recogs-test-other-ctc-decoding.txt
<span class="m">2021</span>-09-26 <span class="m">12</span>:47:16,208 INFO <span class="o">[</span>utils.py:331<span class="o">]</span> <span class="o">[</span>test-other-ctc-decoding<span class="o">]</span> %WER <span class="m">8</span>.21% <span class="o">[</span><span class="m">4295</span> / <span class="m">52343</span>, <span class="m">396</span> ins, <span class="m">315</span> del, <span class="m">3584</span> sub <span class="o">]</span>
<span class="m">2021</span>-09-26 <span class="m">12</span>:47:16,432 INFO <span class="o">[</span>decode.py:506<span class="o">]</span> Wrote detailed error stats to conformer_ctc/exp/errs-test-other-ctc-decoding.txt
<span class="m">2021</span>-09-26 <span class="m">12</span>:47:16,432 INFO <span class="o">[</span>decode.py:522<span class="o">]</span>
For test-other, WER of different settings are:
ctc-decoding <span class="m">8</span>.21 best <span class="k">for</span> test-other
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:45:57,362<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:473<span class="o">]</span><span class="w"> </span>batch<span class="w"> </span><span class="m">0</span>/203,<span class="w"> </span>cuts<span class="w"> </span>processed<span class="w"> </span><span class="k">until</span><span class="w"> </span>now<span class="w"> </span>is<span class="w"> </span><span class="m">15</span>
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:46:35,565<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:473<span class="o">]</span><span class="w"> </span>batch<span class="w"> </span><span class="m">100</span>/203,<span class="w"> </span>cuts<span class="w"> </span>processed<span class="w"> </span><span class="k">until</span><span class="w"> </span>now<span class="w"> </span>is<span class="w"> </span><span class="m">1477</span>
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:47:15,106<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:473<span class="o">]</span><span class="w"> </span>batch<span class="w"> </span><span class="m">200</span>/203,<span class="w"> </span>cuts<span class="w"> </span>processed<span class="w"> </span><span class="k">until</span><span class="w"> </span>now<span class="w"> </span>is<span class="w"> </span><span class="m">2922</span>
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:47:16,131<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:494<span class="o">]</span><span class="w"> </span>The<span class="w"> </span>transcripts<span class="w"> </span>are<span class="w"> </span>stored<span class="w"> </span><span class="k">in</span><span class="w"> </span>conformer_ctc/exp/recogs-test-other-ctc-decoding.txt
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:47:16,208<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>utils.py:331<span class="o">]</span><span class="w"> </span><span class="o">[</span>test-other-ctc-decoding<span class="o">]</span><span class="w"> </span>%WER<span class="w"> </span><span class="m">8</span>.21%<span class="w"> </span><span class="o">[</span><span class="m">4295</span><span class="w"> </span>/<span class="w"> </span><span class="m">52343</span>,<span class="w"> </span><span class="m">396</span><span class="w"> </span>ins,<span class="w"> </span><span class="m">315</span><span class="w"> </span>del,<span class="w"> </span><span class="m">3584</span><span class="w"> </span>sub<span class="w"> </span><span class="o">]</span>
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:47:16,432<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:506<span class="o">]</span><span class="w"> </span>Wrote<span class="w"> </span>detailed<span class="w"> </span>error<span class="w"> </span>stats<span class="w"> </span>to<span class="w"> </span>conformer_ctc/exp/errs-test-other-ctc-decoding.txt
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:47:16,432<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:522<span class="o">]</span>
For<span class="w"> </span>test-other,<span class="w"> </span>WER<span class="w"> </span>of<span class="w"> </span>different<span class="w"> </span>settings<span class="w"> </span>are:
ctc-decoding<span class="w"> </span><span class="m">8</span>.21<span class="w"> </span>best<span class="w"> </span><span class="k">for</span><span class="w"> </span>test-other
<span class="m">2021</span>-09-26 <span class="m">12</span>:47:16,433 INFO <span class="o">[</span>decode.py:680<span class="o">]</span> Done!
<span class="m">2021</span>-09-26<span class="w"> </span><span class="m">12</span>:47:16,433<span class="w"> </span>INFO<span class="w"> </span><span class="o">[</span>decode.py:680<span class="o">]</span><span class="w"> </span>Done!
</pre></div>
</div>
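<p>The command above uses <code class="docutils literal notranslate"><span class="pre">--avg</span> <span class="pre">1</span></code>, i.e. only the checkpoint of epoch 25. In icefall recipes, setting <code class="docutils literal notranslate"><span class="pre">--avg</span></code> to a larger value averages the model parameters of the last several epochs before decoding, which usually gives a small WER improvement. A sketch with placeholder epoch/avg values:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span># Average the checkpoints of epochs 21 to 25 before decoding (placeholder values).
$ ./conformer_ctc/decode.py \
    --epoch 25 \
    --avg 5 \
    --max-duration 300 \
    --exp-dir conformer_ctc/exp \
    --lang-dir data/lang_bpe_500 \
    --method ctc-decoding
</pre></div>
</div>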
</section>
@ -492,10 +492,10 @@ at the same time.</p>
<section id="download-the-pre-trained-model">
<h3>Download the pre-trained model<a class="headerlink" href="#download-the-pre-trained-model" title="Permalink to this heading"></a></h3>
<p>The following commands show how to download the pre-trained model:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
$ <span class="nb">cd</span> icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
$ git lfs pull
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/csukuangfj/icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
$<span class="w"> </span>git<span class="w"> </span>lfs<span class="w"> </span>pull
</pre></div>
</div>
<div class="admonition caution">
@ -509,8 +509,8 @@ Otherwise, you will have the following issue when running <code class="docutils
</div></blockquote>
<p>To fix that issue, please use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
git lfs pull
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
git<span class="w"> </span>lfs<span class="w"> </span>pull
</pre></div>
</div>
</div></blockquote>
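<p>A quick way to confirm that <code class="docutils literal notranslate"><span class="pre">git</span> <span class="pre">lfs</span> <span class="pre">pull</span></code> actually fetched the large files is to look at their sizes: an un-fetched LFS pointer file is only a few hundred bytes, while the real checkpoints and the 4-gram LM are much larger. For example:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ cd icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
# The .pt files should be large; tiny files mean the LFS objects were not downloaded.
$ ls -lh exp/*.pt data/lm/G_4_gram.pt
</pre></div>
</div>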
@ -520,31 +520,31 @@ git lfs pull
<p>In order to use this pre-trained model, your k2 version has to be v1.9 or later.</p>
</div>
<p>After downloading, you will have the following files:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ tree icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>tree<span class="w"> </span>icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
<span class="p">|</span>-- README.md
<span class="p">|</span>-- data
<span class="p">|</span> <span class="p">|</span>-- lang_bpe_500
<span class="p">|</span> <span class="p">|</span> <span class="p">|</span>-- HLG.pt
<span class="p">|</span> <span class="p">|</span> <span class="p">|</span>-- HLG_modified.pt
<span class="p">|</span> <span class="p">|</span> <span class="p">|</span>-- bpe.model
<span class="p">|</span> <span class="p">|</span> <span class="p">|</span>-- tokens.txt
<span class="p">|</span> <span class="p">|</span> <span class="sb">`</span>-- words.txt
<span class="p">|</span> <span class="sb">`</span>-- lm
<span class="p">|</span> <span class="sb">`</span>-- G_4_gram.pt
<span class="p">|</span>-- exp
<span class="p">|</span> <span class="p">|</span>-- cpu_jit.pt
<span class="p">|</span> <span class="sb">`</span>-- pretrained.pt
<span class="p">|</span>-- log
<span class="p">|</span> <span class="sb">`</span>-- log-decode-2021-11-09-17-38-28
<span class="sb">`</span>-- test_wavs
<span class="p">|</span>-- <span class="m">1089</span>-134686-0001.wav
<span class="p">|</span>-- <span class="m">1221</span>-135766-0001.wav
<span class="p">|</span>-- <span class="m">1221</span>-135766-0002.wav
<span class="sb">`</span>-- trans.txt
<span class="p">|</span>--<span class="w"> </span>README.md
<span class="p">|</span>--<span class="w"> </span>data
<span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>lang_bpe_500
<span class="p">|</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>HLG.pt
<span class="p">|</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>HLG_modified.pt
<span class="p">|</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>bpe.model
<span class="p">|</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>tokens.txt
<span class="p">|</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>words.txt
<span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>lm
<span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>G_4_gram.pt
<span class="p">|</span>--<span class="w"> </span>exp
<span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>cpu_jit.pt
<span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>pretrained.pt
<span class="p">|</span>--<span class="w"> </span>log
<span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>log-decode-2021-11-09-17-38-28
<span class="sb">`</span>--<span class="w"> </span>test_wavs
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span><span class="m">1089</span>-134686-0001.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span><span class="m">1221</span>-135766-0001.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span><span class="m">1221</span>-135766-0002.wav
<span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>trans.txt
</pre></div>
</div>
<dl>
@ -606,38 +606,38 @@ Note: We have removed optimizer <code class="docutils literal notranslate"><span
</dd>
</dl>
<p>Information about the test sound files is listed below:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ soxi icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/*.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>soxi<span class="w"> </span>icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/*.wav
Input File : <span class="s1">&#39;icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav&#39;</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">16000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:06.62 <span class="o">=</span> <span class="m">106000</span> samples ~ <span class="m">496</span>.875 CDDA sectors
File Size : 212k
Bit Rate : 256k
Sample Encoding: <span class="m">16</span>-bit Signed Integer PCM
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">&#39;icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav&#39;</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">16000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:06.62<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">106000</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">496</span>.875<span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>212k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>256k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>Signed<span class="w"> </span>Integer<span class="w"> </span>PCM
Input File : <span class="s1">&#39;icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav&#39;</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">16000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:16.71 <span class="o">=</span> <span class="m">267440</span> samples ~ <span class="m">1253</span>.62 CDDA sectors
File Size : 535k
Bit Rate : 256k
Sample Encoding: <span class="m">16</span>-bit Signed Integer PCM
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">&#39;icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav&#39;</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">16000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:16.71<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">267440</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">1253</span>.62<span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>535k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>256k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>Signed<span class="w"> </span>Integer<span class="w"> </span>PCM
Input File : <span class="s1">&#39;icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav&#39;</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">16000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:04.83 <span class="o">=</span> <span class="m">77200</span> samples ~ <span class="m">361</span>.875 CDDA sectors
File Size : 154k
Bit Rate : 256k
Sample Encoding: <span class="m">16</span>-bit Signed Integer PCM
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">&#39;icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav&#39;</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">16000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:04.83<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">77200</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">361</span>.875<span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>154k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>256k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>Signed<span class="w"> </span>Integer<span class="w"> </span>PCM
Total Duration of <span class="m">3</span> files: <span class="m">00</span>:00:28.16
Total<span class="w"> </span>Duration<span class="w"> </span>of<span class="w"> </span><span class="m">3</span><span class="w"> </span>files:<span class="w"> </span><span class="m">00</span>:00:28.16
</pre></div>
</div>
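<p>If you want to try the pre-trained model on your own recordings, note that the decoding commands below assume 16 kHz, single-channel wave files like the ones listed above. A minimal conversion sketch using <code class="docutils literal notranslate"><span class="pre">sox</span></code> is given below; the file names <code class="docutils literal notranslate"><span class="pre">my_rec.wav</span></code> and <code class="docutils literal notranslate"><span class="pre">my_rec_16k.wav</span></code> are placeholders, not files from the repository:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span># Convert a recording to 16-bit, 16 kHz, mono WAV (sketch; adjust the paths)
$ sox my_rec.wav -b 16 my_rec_16k.wav rate 16000 channels 1

# Verify the sample rate and channel count
$ soxi my_rec_16k.wav
</pre></div>
</div>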
</section>
@@ -662,15 +662,15 @@ $ ./conformer_ctc/pretrained.py <span class="se">\</span>
<p>CTC decoding uses the best path of the decoding lattice as the decoding result
without any LM or lexicon.</p>
<p>The command to run CTC decoding is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./conformer_ctc/pretrained.py <span class="se">\</span>
--checkpoint ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/pretrained.pt <span class="se">\</span>
--bpe-model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/bpe.model <span class="se">\</span>
--method ctc-decoding <span class="se">\</span>
--num-classes <span class="m">500</span> <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./conformer_ctc/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>ctc-decoding<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-classes<span class="w"> </span><span class="m">500</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
</pre></div>
</div>
<p>The output is given below:</p>
@@ -700,16 +700,16 @@ $ ./conformer_ctc/pretrained.py <span class="se">\</span>
<h4>HLG decoding<a class="headerlink" href="#hlg-decoding" title="Permalink to this heading"></a></h4>
<p>HLG decoding uses the best path of the decoding lattice as the decoding result.</p>
<p>The command to run HLG decoding is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./conformer_ctc/pretrained.py <span class="se">\</span>
--checkpoint ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/pretrained.pt <span class="se">\</span>
--words-file ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt <span class="se">\</span>
--method 1best <span class="se">\</span>
--num-classes <span class="m">500</span> <span class="se">\</span>
--HLG ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./conformer_ctc/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--words-file<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>1best<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-classes<span class="w"> </span><span class="m">500</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--HLG<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
</pre></div>
</div>
<p>The output is given below:</p>
@@ -740,18 +740,18 @@ $ ./conformer_ctc/pretrained.py <span class="se">\</span>
<p>It uses an n-gram LM to rescore the decoding lattice, and the best
path of the rescored lattice is the decoding result.</p>
<p>The command to run HLG decoding + LM rescoring is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
./conformer_ctc/pretrained.py <span class="se">\</span>
--checkpoint ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/pretrained.pt <span class="se">\</span>
--words-file ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt <span class="se">\</span>
--method whole-lattice-rescoring <span class="se">\</span>
--num-classes <span class="m">500</span> <span class="se">\</span>
--HLG ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt <span class="se">\</span>
--G ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm/G_4_gram.pt <span class="se">\</span>
--ngram-lm-scale <span class="m">1</span>.0 <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
./conformer_ctc/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--words-file<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>whole-lattice-rescoring<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-classes<span class="w"> </span><span class="m">500</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--HLG<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--G<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm/G_4_gram.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--ngram-lm-scale<span class="w"> </span><span class="m">1</span>.0<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
</pre></div>
</div>
<p>Its output is:</p>
@@ -784,23 +784,23 @@ path of the rescored lattice is the decoding result.</p>
n paths from the rescored lattice, rescores the extracted paths with
an attention decoder. The path with the highest score is the decoding result.</p>
<p>The command to run HLG decoding + LM rescoring + attention decoder rescoring is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./conformer_ctc/pretrained.py <span class="se">\</span>
--checkpoint ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/pretrained.pt <span class="se">\</span>
--words-file ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt <span class="se">\</span>
--method attention-decoder <span class="se">\</span>
--num-classes <span class="m">500</span> <span class="se">\</span>
--HLG ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt <span class="se">\</span>
--G ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm/G_4_gram.pt <span class="se">\</span>
--ngram-lm-scale <span class="m">2</span>.0 <span class="se">\</span>
--attention-decoder-scale <span class="m">2</span>.0 <span class="se">\</span>
--nbest-scale <span class="m">0</span>.5 <span class="se">\</span>
--num-paths <span class="m">100</span> <span class="se">\</span>
--sos-id <span class="m">1</span> <span class="se">\</span>
--eos-id <span class="m">1</span> <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./conformer_ctc/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--words-file<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>attention-decoder<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-classes<span class="w"> </span><span class="m">500</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--HLG<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--G<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm/G_4_gram.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--ngram-lm-scale<span class="w"> </span><span class="m">2</span>.0<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--attention-decoder-scale<span class="w"> </span><span class="m">2</span>.0<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--nbest-scale<span class="w"> </span><span class="m">0</span>.5<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-paths<span class="w"> </span><span class="m">100</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--sos-id<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--eos-id<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
</pre></div>
</div>
<p>The output is below:</p>
@@ -831,22 +831,22 @@ $ ./conformer_ctc/pretrained.py <span class="se">\</span>
<section id="compute-wer-with-the-pre-trained-model">
<h3>Compute WER with the pre-trained model<a class="headerlink" href="#compute-wer-with-the-pre-trained-model" title="Permalink to this heading"></a></h3>
<p>To check the WER of the pre-trained model on the test datasets, run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">cd</span> icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/
$ ln -s pretrained.pt epoch-999.pt
$ <span class="nb">cd</span> ../..
$ ./conformer_ctc/decode.py <span class="se">\</span>
--exp-dir ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp <span class="se">\</span>
--lang-dir ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500 <span class="se">\</span>
--lm-dir ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm <span class="se">\</span>
--epoch <span class="m">999</span> <span class="se">\</span>
--avg <span class="m">1</span> <span class="se">\</span>
--concatenate-cuts <span class="m">0</span> <span class="se">\</span>
--bucketing-sampler <span class="m">1</span> <span class="se">\</span>
--max-duration <span class="m">30</span> <span class="se">\</span>
--num-paths <span class="m">1000</span> <span class="se">\</span>
--method attention-decoder <span class="se">\</span>
--nbest-scale <span class="m">0</span>.5
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/
$<span class="w"> </span>ln<span class="w"> </span>-s<span class="w"> </span>pretrained.pt<span class="w"> </span>epoch-999.pt
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>../..
$<span class="w"> </span>./conformer_ctc/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--lang-dir<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--lm-dir<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">999</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--concatenate-cuts<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bucketing-sampler<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-paths<span class="w"> </span><span class="m">1000</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>attention-decoder<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--nbest-scale<span class="w"> </span><span class="m">0</span>.5
</pre></div>
</div>
</section>
@@ -875,20 +875,20 @@ Python dependencies.</p>
<p>At present, it does NOT support streaming decoding.</p>
</div>
<p>First, let us compile k2 from source:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> <span class="nv">$HOME</span>
$ git clone https://github.com/k2-fsa/k2
$ <span class="nb">cd</span> k2
$ git checkout v2.0-pre
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span><span class="nv">$HOME</span>
$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/k2-fsa/k2
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>k2
$<span class="w"> </span>git<span class="w"> </span>checkout<span class="w"> </span>v2.0-pre
</pre></div>
</div>
<div class="admonition caution">
<p class="admonition-title">Caution</p>
<p>You have to switch to the branch <code class="docutils literal notranslate"><span class="pre">v2.0-pre</span></code>!</p>
</div>
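<p>To double-check that the checkout succeeded before building, you can run the following (a quick sanity check; it assumes a reasonably recent version of <code class="docutils literal notranslate"><span class="pre">git</span></code>):</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ git branch --show-current
# It should print: v2.0-pre
</pre></div>
</div>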
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ mkdir build-release
$ <span class="nb">cd</span> build-release
$ cmake -DCMAKE_BUILD_TYPE<span class="o">=</span>Release ..
$ make -j ctc_decode hlg_decode ngram_lm_rescore attention_rescore
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>mkdir<span class="w"> </span>build-release
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>build-release
$<span class="w"> </span>cmake<span class="w"> </span>-DCMAKE_BUILD_TYPE<span class="o">=</span>Release<span class="w"> </span>..
$<span class="w"> </span>make<span class="w"> </span>-j<span class="w"> </span>ctc_decode<span class="w"> </span>hlg_decode<span class="w"> </span>ngram_lm_rescore<span class="w"> </span>attention_rescore
<span class="c1"># You will find four binaries in `./bin`, i.e.,</span>
<span class="c1"># ./bin/ctc_decode, ./bin/hlg_decode,</span>
@@ -898,8 +898,8 @@ $ make -j ctc_decode hlg_decode ngram_lm_rescore attention_rescore
<p>Now you are ready to go!</p>
<p>Assume you have run:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> k2/build-release
$ ln -s /path/to/icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09 ./
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>k2/build-release
$<span class="w"> </span>ln<span class="w"> </span>-s<span class="w"> </span>/path/to/icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09<span class="w"> </span>./
</pre></div>
</div>
</div></blockquote>
@@ -908,39 +908,39 @@
</pre></div>
</div>
<p>It will show you the following message:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>Please provide --nn_model
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>Please<span class="w"> </span>provide<span class="w"> </span>--nn_model
This file implements decoding with a CTC topology, without any
kinds of LM or lexicons.
This<span class="w"> </span>file<span class="w"> </span>implements<span class="w"> </span>decoding<span class="w"> </span>with<span class="w"> </span>a<span class="w"> </span>CTC<span class="w"> </span>topology,<span class="w"> </span>without<span class="w"> </span>any
kinds<span class="w"> </span>of<span class="w"> </span>LM<span class="w"> </span>or<span class="w"> </span>lexicons.
Usage:
./bin/ctc_decode <span class="se">\</span>
--use_gpu <span class="nb">true</span> <span class="se">\</span>
--nn_model &lt;path to torch scripted pt file&gt; <span class="se">\</span>
--bpe_model &lt;path to pre-trained BPE model&gt; <span class="se">\</span>
&lt;path to foo.wav&gt; <span class="se">\</span>
&lt;path to bar.wav&gt; <span class="se">\</span>
&lt;more waves <span class="k">if</span> any&gt;
<span class="w"> </span>./bin/ctc_decode<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--use_gpu<span class="w"> </span><span class="nb">true</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--nn_model<span class="w"> </span>&lt;path<span class="w"> </span>to<span class="w"> </span>torch<span class="w"> </span>scripted<span class="w"> </span>pt<span class="w"> </span>file&gt;<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe_model<span class="w"> </span>&lt;path<span class="w"> </span>to<span class="w"> </span>pre-trained<span class="w"> </span>BPE<span class="w"> </span>model&gt;<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>&lt;path<span class="w"> </span>to<span class="w"> </span>foo.wav&gt;<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>&lt;path<span class="w"> </span>to<span class="w"> </span>bar.wav&gt;<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>&lt;more<span class="w"> </span>waves<span class="w"> </span><span class="k">if</span><span class="w"> </span>any&gt;
To see all possible options, use
./bin/ctc_decode --help
To<span class="w"> </span>see<span class="w"> </span>all<span class="w"> </span>possible<span class="w"> </span>options,<span class="w"> </span>use
<span class="w"> </span>./bin/ctc_decode<span class="w"> </span>--help
Caution:
- Only sound files <span class="o">(</span>*.wav<span class="o">)</span> with single channel are supported.
- It assumes the model is conformer_ctc/transformer.py from icefall.
If you use a different model, you have to change the code
related to <span class="sb">`</span>model.forward<span class="sb">`</span> <span class="k">in</span> this file.
<span class="w"> </span>-<span class="w"> </span>Only<span class="w"> </span>sound<span class="w"> </span>files<span class="w"> </span><span class="o">(</span>*.wav<span class="o">)</span><span class="w"> </span>with<span class="w"> </span>single<span class="w"> </span>channel<span class="w"> </span>are<span class="w"> </span>supported.
<span class="w"> </span>-<span class="w"> </span>It<span class="w"> </span>assumes<span class="w"> </span>the<span class="w"> </span>model<span class="w"> </span>is<span class="w"> </span>conformer_ctc/transformer.py<span class="w"> </span>from<span class="w"> </span>icefall.
<span class="w"> </span>If<span class="w"> </span>you<span class="w"> </span>use<span class="w"> </span>a<span class="w"> </span>different<span class="w"> </span>model,<span class="w"> </span>you<span class="w"> </span>have<span class="w"> </span>to<span class="w"> </span>change<span class="w"> </span>the<span class="w"> </span>code
<span class="w"> </span>related<span class="w"> </span>to<span class="w"> </span><span class="sb">`</span>model.forward<span class="sb">`</span><span class="w"> </span><span class="k">in</span><span class="w"> </span>this<span class="w"> </span>file.
</pre></div>
</div>
<section id="id2">
<h3>CTC decoding<a class="headerlink" href="#id2" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./bin/ctc_decode <span class="se">\</span>
--use_gpu <span class="nb">true</span> <span class="se">\</span>
--nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt <span class="se">\</span>
--bpe_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/bpe.model <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./bin/ctc_decode<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--use_gpu<span class="w"> </span><span class="nb">true</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--nn_model<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe_model<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
</pre></div>
</div>
<p>Its output is:</p>
@@ -969,14 +969,14 @@ Caution:
</section>
<section id="id3">
<h3>HLG decoding<a class="headerlink" href="#id3" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./bin/hlg_decode <span class="se">\</span>
--use_gpu <span class="nb">true</span> <span class="se">\</span>
--nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt <span class="se">\</span>
--hlg ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt <span class="se">\</span>
--word_table ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./bin/hlg_decode<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--use_gpu<span class="w"> </span><span class="nb">true</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--nn_model<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--hlg<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--word_table<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
</pre></div>
</div>
<p>The output is:</p>
@@ -1005,16 +1005,16 @@ Caution:
</section>
<section id="hlg-decoding-n-gram-lm-rescoring">
<h3>HLG decoding + n-gram LM rescoring<a class="headerlink" href="#hlg-decoding-n-gram-lm-rescoring" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./bin/ngram_lm_rescore <span class="se">\</span>
--use_gpu <span class="nb">true</span> <span class="se">\</span>
--nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt <span class="se">\</span>
--hlg ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt <span class="se">\</span>
--g ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm/G_4_gram.pt <span class="se">\</span>
--ngram_lm_scale <span class="m">1</span>.0 <span class="se">\</span>
--word_table ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./bin/ngram_lm_rescore<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--use_gpu<span class="w"> </span><span class="nb">true</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--nn_model<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--hlg<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--g<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm/G_4_gram.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--ngram_lm_scale<span class="w"> </span><span class="m">1</span>.0<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--word_table<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
</pre></div>
</div>
<p>The output is:</p>
@@ -1047,21 +1047,21 @@ Caution:
</section>
<section id="hlg-decoding-n-gram-lm-rescoring-attention-decoder-rescoring">
<h3>HLG decoding + n-gram LM rescoring + attention decoder rescoring<a class="headerlink" href="#hlg-decoding-n-gram-lm-rescoring-attention-decoder-rescoring" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./bin/attention_rescore <span class="se">\</span>
--use_gpu <span class="nb">true</span> <span class="se">\</span>
--nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt <span class="se">\</span>
--hlg ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt <span class="se">\</span>
--g ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm/G_4_gram.pt <span class="se">\</span>
--ngram_lm_scale <span class="m">2</span>.0 <span class="se">\</span>
--attention_scale <span class="m">2</span>.0 <span class="se">\</span>
--num_paths <span class="m">100</span> <span class="se">\</span>
--nbest_scale <span class="m">0</span>.5 <span class="se">\</span>
--word_table ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt <span class="se">\</span>
--sos_id <span class="m">1</span> <span class="se">\</span>
--eos_id <span class="m">1</span> <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav <span class="se">\</span>
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./bin/attention_rescore<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--use_gpu<span class="w"> </span><span class="nb">true</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--nn_model<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--hlg<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--g<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm/G_4_gram.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--ngram_lm_scale<span class="w"> </span><span class="m">2</span>.0<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--attention_scale<span class="w"> </span><span class="m">2</span>.0<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num_paths<span class="w"> </span><span class="m">100</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--nbest_scale<span class="w"> </span><span class="m">0</span>.5<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--word_table<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--sos_id<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--eos_id<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
</pre></div>
</div>
<p>The output is:</p>

View File

@@ -149,8 +149,8 @@ That is, it has no recurrent connections.</p>
<p>The data preparation is the same as for other recipes on the LibriSpeech dataset;
if you have finished this step, you can skip to <code class="docutils literal notranslate"><span class="pre">Training</span></code> directly.</p>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh
</pre></div>
</div>
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
@@ -165,13 +165,13 @@ options:</p>
</div></blockquote>
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
<p>For example,</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">0</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>means to run only stage 0.</p>
<p>To run stage 2 to stage 5, use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage <span class="m">2</span> --stop-stage <span class="m">5</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">2</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">5</span>
</pre></div>
</div>
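<p>Data preparation can take quite a while. If you run it on a remote machine, you may want to keep it running in the background and save its output to a log file (an optional sketch; the log file name is arbitrary):</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ cd egs/librispeech/ASR
$ nohup ./prepare.sh &gt; prepare.log 2&gt;&amp;1 &amp;
$ tail -f prepare.log  # follow the progress
</pre></div>
</div>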
<div class="admonition hint">
@@ -203,8 +203,8 @@ the following YouTube channel by <a class="reference external" href="https://www
<h2>Training<a class="headerlink" href="#training" title="Permalink to this heading"></a></h2>
<section id="configurable-options">
<h3>Configurable options<a class="headerlink" href="#configurable-options" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless4/train.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./pruned_transducer_stateless4/train.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows you the training options that can be passed from the command line.
@@ -256,26 +256,26 @@ training from epoch 10, based on the state from epoch 9.</p>
<div><p><strong>Use case 1</strong>: You have 4 GPUs, but you only want to use GPU 0 and
GPU 2 for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,2&quot;</span>
$ ./pruned_transducer_stateless4/train.py --world-size <span class="m">2</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,2&quot;</span>
$<span class="w"> </span>./pruned_transducer_stateless4/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">2</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 2</strong>: You have 4 GPUs and you want to use all of them
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless4/train.py --world-size <span class="m">4</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./pruned_transducer_stateless4/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 3</strong>: You have 4 GPUs but you only want to use GPU 3
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;3&quot;</span>
$ ./pruned_transducer_stateless4/train.py --world-size <span class="m">1</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;3&quot;</span>
$<span class="w"> </span>./pruned_transducer_stateless4/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
</div></blockquote>
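<p>In all of the use cases above, the value passed to <code class="docutils literal notranslate"><span class="pre">--world-size</span></code> equals the number of GPUs made visible through <code class="docutils literal notranslate"><span class="pre">CUDA_VISIBLE_DEVICES</span></code>. You can list the GPUs available on your machine with (assuming the NVIDIA driver is installed):</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ nvidia-smi --list-gpus
</pre></div>
</div>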
@@ -333,7 +333,7 @@ You will find the following files in that directory:</p>
<code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./pruned_transducer_stateless4/train.py --start-epoch <span class="m">11</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./pruned_transducer_stateless4/train.py<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">11</span>
</pre></div>
</div>
</div></blockquote>
@@ -343,7 +343,7 @@ To resume training from some checkpoint, say <code class="docutils literal notra
containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">checkpoint-436000</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./pruned_transducer_stateless4/train.py --start-batch <span class="m">436000</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./pruned_transducer_stateless4/train.py<span class="w"> </span>--start-batch<span class="w"> </span><span class="m">436000</span>
</pre></div>
</div>
</div></blockquote>
@@ -352,8 +352,8 @@ To resume training from some checkpoint, say <code class="docutils literal notra
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc., are recorded in these logs. You can visualize them by:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> pruned_transducer_stateless4/exp/tensorboard
$ tensorboard dev upload --logdir . --description <span class="s2">&quot;pruned transducer training for LibriSpeech with icefall&quot;</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>pruned_transducer_stateless4/exp/tensorboard
$<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w"> </span>upload<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--description<span class="w"> </span><span class="s2">&quot;pruned transducer training for LibriSpeech with icefall&quot;</span>
</pre></div>
</div>
</div></blockquote>
@@ -390,8 +390,8 @@ the following screenshot:</p>
<p>If you don't have access to Google, you can use the following command
to view the tensorboard log locally:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> pruned_transducer_stateless4/exp/tensorboard
tensorboard --logdir . --port <span class="m">6008</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>pruned_transducer_stateless4/exp/tensorboard
tensorboard<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--port<span class="w"> </span><span class="m">6008</span>
</pre></div>
</div>
</div></blockquote>
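<p>If training runs on a remote server, one way to view this locally served TensorBoard from your own computer is to forward the port over SSH (a sketch; replace the user and host names with your own):</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span># Run on your local computer, then open http://localhost:6008 in a browser
$ ssh -L 6008:localhost:6008 your_user@your_server
</pre></div>
</div>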
@@ -416,14 +416,14 @@ you saw printed to the console during training.</p>
<section id="usage-example">
<h3>Usage example<a class="headerlink" href="#usage-example" title="Permalink to this heading"></a></h3>
<p>You can use the following command to start the training using 6 GPUs:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,1,2,3,4,5&quot;</span>
./pruned_transducer_stateless4/train.py <span class="se">\</span>
--world-size <span class="m">6</span> <span class="se">\</span>
--num-epochs <span class="m">30</span> <span class="se">\</span>
--start-epoch <span class="m">1</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless4/exp <span class="se">\</span>
--full-libri <span class="m">1</span> <span class="se">\</span>
--max-duration <span class="m">300</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,1,2,3,4,5&quot;</span>
./pruned_transducer_stateless4/train.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--world-size<span class="w"> </span><span class="m">6</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-epochs<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless4/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--full-libri<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">300</span>
</pre></div>
</div>
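<p>If you only have a single GPU, a smaller-scale variant of the above looks like the following (a sketch; depending on your GPU memory you may need to lower <code class="docutils literal notranslate"><span class="pre">--max-duration</span></code> further, and training will take considerably longer):</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>export CUDA_VISIBLE_DEVICES=&quot;0&quot;
./pruned_transducer_stateless4/train.py \
  --world-size 1 \
  --num-epochs 30 \
  --start-epoch 1 \
  --exp-dir pruned_transducer_stateless4/exp \
  --full-libri 1 \
  --max-duration 100
</pre></div>
</div>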
</section>
@@ -448,37 +448,37 @@ every <code class="docutils literal notranslate"><span class="pre">--save-every-
that produces the lowest WERs.</p>
</div></blockquote>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless4/decode.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./pruned_transducer_stateless4/decode.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the options for decoding.</p>
<p>The following shows two examples (for two types of checkpoints):</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> epoch <span class="k">in</span> <span class="m">25</span> <span class="m">20</span><span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> avg <span class="k">in</span> <span class="m">7</span> <span class="m">5</span> <span class="m">3</span> <span class="m">1</span><span class="p">;</span> <span class="k">do</span>
./pruned_transducer_stateless4/decode.py <span class="se">\</span>
--epoch <span class="nv">$epoch</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless4/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--decoding-method <span class="nv">$m</span>
<span class="k">done</span>
<span class="k">done</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span>m<span class="w"> </span><span class="k">in</span><span class="w"> </span>greedy_search<span class="w"> </span>fast_beam_search<span class="w"> </span>modified_beam_search<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>epoch<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">25</span><span class="w"> </span><span class="m">20</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>avg<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">7</span><span class="w"> </span><span class="m">5</span><span class="w"> </span><span class="m">3</span><span class="w"> </span><span class="m">1</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span>./pruned_transducer_stateless4/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="nv">$epoch</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless4/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoding-method<span class="w"> </span><span class="nv">$m</span>
<span class="w"> </span><span class="k">done</span>
<span class="w"> </span><span class="k">done</span>
<span class="k">done</span>
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> iter <span class="k">in</span> <span class="m">474000</span><span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> avg <span class="k">in</span> <span class="m">8</span> <span class="m">10</span> <span class="m">12</span> <span class="m">14</span> <span class="m">16</span> <span class="m">18</span><span class="p">;</span> <span class="k">do</span>
./pruned_transducer_stateless4/decode.py <span class="se">\</span>
--iter <span class="nv">$iter</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless4/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--decoding-method <span class="nv">$m</span>
<span class="k">done</span>
<span class="k">done</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span>m<span class="w"> </span><span class="k">in</span><span class="w"> </span>greedy_search<span class="w"> </span>fast_beam_search<span class="w"> </span>modified_beam_search<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>iter<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">474000</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>avg<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">8</span><span class="w"> </span><span class="m">10</span><span class="w"> </span><span class="m">12</span><span class="w"> </span><span class="m">14</span><span class="w"> </span><span class="m">16</span><span class="w"> </span><span class="m">18</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span>./pruned_transducer_stateless4/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--iter<span class="w"> </span><span class="nv">$iter</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless4/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoding-method<span class="w"> </span><span class="nv">$m</span>
<span class="w"> </span><span class="k">done</span>
<span class="w"> </span><span class="k">done</span>
<span class="k">done</span>
</pre></div>
</div>
@ -547,11 +547,11 @@ command to extract <code class="docutils literal notranslate"><span class="pre">
<span class="nv">epoch</span><span class="o">=</span><span class="m">25</span>
<span class="nv">avg</span><span class="o">=</span><span class="m">3</span>
./pruned_transducer_stateless4/export.py <span class="se">\</span>
--exp-dir ./pruned_transducer_stateless4/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--epoch <span class="nv">$epoch</span> <span class="se">\</span>
--avg <span class="nv">$avg</span>
./pruned_transducer_stateless4/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./pruned_transducer_stateless4/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="nv">$epoch</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span>
</pre></div>
</div>
<p>It will generate a file <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless4/exp/pretrained.pt</span></code>.</p>
@ -559,8 +559,8 @@ command to extract <code class="docutils literal notranslate"><span class="pre">
<p class="admonition-title">Hint</p>
<p>To use the generated <code class="docutils literal notranslate"><span class="pre">pretrained.pt</span></code> for <code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless4/decode.py</span></code>,
you can run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> pruned_transducer_stateless4/exp
ln -s pretrained.pt epoch-999.pt
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>pruned_transducer_stateless4/exp
ln<span class="w"> </span>-s<span class="w"> </span>pretrained.pt<span class="w"> </span>epoch-999.pt
</pre></div>
</div>
<p>And then pass <code class="docutils literal notranslate"><span class="pre">--epoch</span> <span class="pre">999</span> <span class="pre">--avg</span> <span class="pre">1</span> <span class="pre">--use-averaged-model</span> <span class="pre">0</span></code> to
@ -568,23 +568,23 @@ ln -s pretrained.pt epoch-999.pt
</div>
<p>To use the exported model with <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless4/pretrained.py</span></code>, you
can run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless4/pretrained.py <span class="se">\</span>
--checkpoint ./pruned_transducer_stateless4/exp/pretrained.pt <span class="se">\</span>
--bpe-model ./data/lang_bpe_500/bpe.model <span class="se">\</span>
--method greedy_search <span class="se">\</span>
/path/to/foo.wav <span class="se">\</span>
/path/to/bar.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless4/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./pruned_transducer_stateless4/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>./data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>greedy_search<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/foo.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/bar.wav
</pre></div>
</div>
</section>
<section id="export-model-using-torch-jit-script">
<h3>Export model using <code class="docutils literal notranslate"><span class="pre">torch.jit.script()</span></code><a class="headerlink" href="#export-model-using-torch-jit-script" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless4/export.py <span class="se">\</span>
--exp-dir ./pruned_transducer_stateless4/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--epoch <span class="m">25</span> <span class="se">\</span>
--avg <span class="m">3</span> <span class="se">\</span>
--jit <span class="m">1</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless4/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./pruned_transducer_stateless4/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">25</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="m">3</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--jit<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
<p>It will generate a file <code class="docutils literal notranslate"><span class="pre">cpu_jit.pt</span></code> in the given <code class="docutils literal notranslate"><span class="pre">exp_dir</span></code>. You can later

View File

@ -106,8 +106,8 @@ the environment for <code class="docutils literal notranslate"><span class="pre"
</div>
<section id="data-preparation">
<h2>Data preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading"></a></h2>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh
</pre></div>
</div>
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
@ -122,13 +122,13 @@ options:</p>
</div></blockquote>
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
<p>For example,</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">0</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>means to run only stage 0.</p>
<p>To run stage 2 to stage 5, use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage <span class="m">2</span> --stop-stage <span class="m">5</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">2</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">5</span>
</pre></div>
</div>
<p>We provide the following YouTube video showing how to run <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code>.</p>
@ -149,9 +149,9 @@ the following YouTube channel by <a class="reference external" href="https://www
the <a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/tdnn_lstm_ctc">tdnn_lstm_ctc</a>
folder.</p>
<p>The command to run the training part is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,1,2,3&quot;</span>
$ ./tdnn_lstm_ctc/train.py --world-size <span class="m">4</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,1,2,3&quot;</span>
$<span class="w"> </span>./tdnn_lstm_ctc/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span>
</pre></div>
</div>
<p>By default, it will run <code class="docutils literal notranslate"><span class="pre">20</span></code> epochs. Training logs and checkpoints are saved
@ -163,7 +163,7 @@ in <code class="docutils literal notranslate"><span class="pre">tdnn_lstm_ctc/ex
<p>These are checkpoint files, containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn_lstm_ctc/train.py --start-epoch <span class="m">11</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn_lstm_ctc/train.py<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">11</span>
</pre></div>
</div>
</div></blockquote>
@ -172,8 +172,8 @@ To resume training from some checkpoint, say <code class="docutils literal notra
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc., are recorded in these logs. You can visualize them by:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> tdnn_lstm_ctc/exp/tensorboard
$ tensorboard dev upload --logdir . --description <span class="s2">&quot;TDNN LSTM training for librispeech with icefall&quot;</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>tdnn_lstm_ctc/exp/tensorboard
$<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w"> </span>upload<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--description<span class="w"> </span><span class="s2">&quot;TDNN LSTM training for librispeech with icefall&quot;</span>
</pre></div>
</div>
</div></blockquote>
@ -185,7 +185,7 @@ you saw printed to the console during training.</p>
</ul>
</div></blockquote>
<p>To see available training options, you can use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn_lstm_ctc/train.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn_lstm_ctc/train.py<span class="w"> </span>--help
</pre></div>
</div>
<p>Other training options, e.g., learning rate, results dir, etc., are
@ -199,13 +199,13 @@ you want.</p>
<p>The decoding part uses checkpoints saved by the training part, so you have
to run the training part first.</p>
<p>The command for decoding is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0&quot;</span>
$ ./tdnn_lstm_ctc/decode.py
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0&quot;</span>
$<span class="w"> </span>./tdnn_lstm_ctc/decode.py
</pre></div>
</div>
<p>You will see the WER in the output log.</p>
<p>Decoded results are saved in <code class="docutils literal notranslate"><span class="pre">tdnn_lstm_ctc/exp</span></code>.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn_lstm_ctc/decode.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn_lstm_ctc/decode.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows you the available decoding options.</p>
@ -222,7 +222,7 @@ For instance, <code class="docutils literal notranslate"><span class="pre">./tdn
to be averaged. The averaged model is used for decoding.
For example, the following command:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn_lstm_ctc/decode.py --epoch <span class="m">10</span> --avg <span class="m">3</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn_lstm_ctc/decode.py<span class="w"> </span>--epoch<span class="w"> </span><span class="m">10</span><span class="w"> </span>--avg<span class="w"> </span><span class="m">3</span>
</pre></div>
</div>
</div></blockquote>
@ -251,11 +251,11 @@ at the same time.</p>
</section>
<section id="download-the-pre-trained-model">
<h3>Download the pre-trained model<a class="headerlink" href="#download-the-pre-trained-model" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ mkdir tmp
$ <span class="nb">cd</span> tmp
$ git lfs install
$ git clone https://huggingface.co/pkufool/icefall_asr_librispeech_tdnn-lstm_ctc
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>mkdir<span class="w"> </span>tmp
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>tmp
$<span class="w"> </span>git<span class="w"> </span>lfs<span class="w"> </span>install
$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/pkufool/icefall_asr_librispeech_tdnn-lstm_ctc
</pre></div>
</div>
<div class="admonition caution">
@ -267,29 +267,29 @@ $ git clone https://huggingface.co/pkufool/icefall_asr_librispeech_tdnn-lstm_ctc
<p>In order to use this pre-trained model, your k2 version has to be v1.7 or later.</p>
</div>
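<p>If you are unsure which k2 version you have installed, a quick check is sketched below. This snippet is illustrative only and assumes k2 was installed as a Python package registered under the distribution name <code class="docutils literal notranslate"><span class="pre">k2</span></code>:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span># Illustrative sketch: print the installed k2 version.
# Assumes k2 is installed as a package whose distribution name is "k2".
import importlib.metadata

print(importlib.metadata.version("k2"))  # should be 1.7 or later for this model
</pre></div>
</div>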
<p>After downloading, you will have the following files:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ tree tmp
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>tree<span class="w"> </span>tmp
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>tmp/
<span class="sb">`</span>-- icefall_asr_librispeech_tdnn-lstm_ctc
<span class="p">|</span>-- README.md
<span class="p">|</span>-- data
<span class="p">|</span> <span class="p">|</span>-- lang_phone
<span class="p">|</span> <span class="p">|</span> <span class="p">|</span>-- HLG.pt
<span class="p">|</span> <span class="p">|</span> <span class="p">|</span>-- tokens.txt
<span class="p">|</span> <span class="p">|</span> <span class="sb">`</span>-- words.txt
<span class="p">|</span> <span class="sb">`</span>-- lm
<span class="p">|</span> <span class="sb">`</span>-- G_4_gram.pt
<span class="p">|</span>-- exp
<span class="p">|</span> <span class="sb">`</span>-- pretrained.pt
<span class="sb">`</span>-- test_wavs
<span class="p">|</span>-- <span class="m">1089</span>-134686-0001.flac
<span class="p">|</span>-- <span class="m">1221</span>-135766-0001.flac
<span class="p">|</span>-- <span class="m">1221</span>-135766-0002.flac
<span class="sb">`</span>-- trans.txt
<span class="sb">`</span>--<span class="w"> </span>icefall_asr_librispeech_tdnn-lstm_ctc
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>README.md
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>data
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>lang_phone
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>HLG.pt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>tokens.txt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>words.txt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>lm
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>G_4_gram.pt
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>exp
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>pretrained.pt
<span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>test_wavs
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span><span class="m">1089</span>-134686-0001.flac
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span><span class="m">1221</span>-135766-0001.flac
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span><span class="m">1221</span>-135766-0002.flac
<span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>trans.txt
<span class="m">6</span> directories, <span class="m">10</span> files
<span class="m">6</span><span class="w"> </span>directories,<span class="w"> </span><span class="m">10</span><span class="w"> </span>files
</pre></div>
</div>
<p><strong>File descriptions</strong>:</p>
@ -335,56 +335,56 @@ Note: We have removed optimizer <code class="docutils literal notranslate"><span
</ul>
</div></blockquote>
<p>The information of the test sound files is listed below:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ soxi tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/*.flac
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>soxi<span class="w"> </span>tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/*.flac
Input File : <span class="s1">&#39;tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac&#39;</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">16000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:06.62 <span class="o">=</span> <span class="m">106000</span> samples ~ <span class="m">496</span>.875 CDDA sectors
File Size : 116k
Bit Rate : 140k
Sample Encoding: <span class="m">16</span>-bit FLAC
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">&#39;tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac&#39;</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">16000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:06.62<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">106000</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">496</span>.875<span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>116k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>140k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>FLAC
Input File : <span class="s1">&#39;tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac&#39;</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">16000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:16.71 <span class="o">=</span> <span class="m">267440</span> samples ~ <span class="m">1253</span>.62 CDDA sectors
File Size : 343k
Bit Rate : 164k
Sample Encoding: <span class="m">16</span>-bit FLAC
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">&#39;tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac&#39;</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">16000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:16.71<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">267440</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">1253</span>.62<span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>343k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>164k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>FLAC
Input File : <span class="s1">&#39;tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac&#39;</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">16000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:04.83 <span class="o">=</span> <span class="m">77200</span> samples ~ <span class="m">361</span>.875 CDDA sectors
File Size : 105k
Bit Rate : 174k
Sample Encoding: <span class="m">16</span>-bit FLAC
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">&#39;tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac&#39;</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">16000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:04.83<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">77200</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">361</span>.875<span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>105k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>174k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>FLAC
Total Duration of <span class="m">3</span> files: <span class="m">00</span>:00:28.16
Total<span class="w"> </span>Duration<span class="w"> </span>of<span class="w"> </span><span class="m">3</span><span class="w"> </span>files:<span class="w"> </span><span class="m">00</span>:00:28.16
</pre></div>
</div>
</section>
<section id="inference-with-a-pre-trained-model">
<h3>Inference with a pre-trained model<a class="headerlink" href="#inference-with-a-pre-trained-model" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./tdnn_lstm_ctc/pretrained.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./tdnn_lstm_ctc/pretrained.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the usage information of <code class="docutils literal notranslate"><span class="pre">./tdnn_lstm_ctc/pretrained.py</span></code>.</p>
<p>To decode with the <code class="docutils literal notranslate"><span class="pre">1best</span></code> method, we can use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./tdnn_lstm_ctc/pretrained.py <span class="se">\</span>
  --checkpoint ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/exp/pretrained.pt <span class="se">\</span>
--words-file ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/words.txt <span class="se">\</span>
--HLG ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/HLG.pt <span class="se">\</span>
./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac <span class="se">\</span>
./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac <span class="se">\</span>
./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./tdnn_lstm_ctc/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/exp/pretraind.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--words-file<span class="w"> </span>./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/words.txt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--HLG<span class="w"> </span>./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/HLG.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac
</pre></div>
</div>
<p>The output is:</p>
@ -410,16 +410,16 @@ $ ./tdnn_lstm_ctc/pretrained.py --help
</pre></div>
</div>
<p>To decode with the <code class="docutils literal notranslate"><span class="pre">whole-lattice-rescoring</span></code> method, you can use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./tdnn_lstm_ctc/pretrained.py <span class="se">\</span>
  --checkpoint ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/exp/pretrained.pt <span class="se">\</span>
--words-file ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/words.txt <span class="se">\</span>
--HLG ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/HLG.pt <span class="se">\</span>
--method whole-lattice-rescoring <span class="se">\</span>
--G ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lm/G_4_gram.pt <span class="se">\</span>
--ngram-lm-scale <span class="m">0</span>.8 <span class="se">\</span>
./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac <span class="se">\</span>
./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac <span class="se">\</span>
./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./tdnn_lstm_ctc/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/exp/pretraind.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--words-file<span class="w"> </span>./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/words.txt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--HLG<span class="w"> </span>./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/HLG.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>whole-lattice-rescoring<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--G<span class="w"> </span>./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lm/G_4_gram.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--ngram-lm-scale<span class="w"> </span><span class="m">0</span>.8<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac
</pre></div>
</div>
<p>The decoding output is:</p>

View File

@ -116,8 +116,8 @@ similar to the one used in conformer (referred to as “LConv”) before the fra
</div>
<section id="data-preparation">
<h2>Data preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading"></a></h2>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh
</pre></div>
</div>
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
@ -136,13 +136,13 @@ options:</p>
</div></blockquote>
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
<p>For example,</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">0</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>means to run only stage 0.</p>
<p>To run stage 2 to stage 5, use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage <span class="m">2</span> --stop-stage <span class="m">5</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">2</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">5</span>
</pre></div>
</div>
<div class="admonition hint">
@ -175,8 +175,8 @@ the following YouTube channel by <a class="reference external" href="https://www
<p>For stability, it doesn't use the blank skip method until model warm-up is finished.</p>
<section id="configurable-options">
<h3>Configurable options<a class="headerlink" href="#configurable-options" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless7_ctc_bs/train.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./pruned_transducer_stateless7_ctc_bs/train.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows you the training options that can be passed from the commandline.
@ -225,26 +225,26 @@ training from epoch 10, based on the state from epoch 9.</p>
<div><p><strong>Use case 1</strong>: You have 4 GPUs, but you only want to use GPU 0 and
GPU 2 for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,2&quot;</span>
$ ./pruned_transducer_stateless7_ctc_bs/train.py --world-size <span class="m">2</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,2&quot;</span>
$<span class="w"> </span>./pruned_transducer_stateless7_ctc_bs/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">2</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 2</strong>: You have 4 GPUs and you want to use all of them
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless7_ctc_bs/train.py --world-size <span class="m">4</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./pruned_transducer_stateless7_ctc_bs/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 3</strong>: You have 4 GPUs but you only want to use GPU 3
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;3&quot;</span>
$ ./pruned_transducer_stateless7_ctc_bs/train.py --world-size <span class="m">1</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;3&quot;</span>
$<span class="w"> </span>./pruned_transducer_stateless7_ctc_bs/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
</div></blockquote>
@ -292,7 +292,7 @@ You will find the following files in that directory:</p>
<code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./pruned_transducer_stateless7_ctc_bs/train.py --start-epoch <span class="m">11</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./pruned_transducer_stateless7_ctc_bs/train.py<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">11</span>
</pre></div>
</div>
</div></blockquote>
@ -302,7 +302,7 @@ To resume training from some checkpoint, say <code class="docutils literal notra
containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">checkpoint-436000</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./pruned_transducer_stateless7_ctc_bs/train.py --start-batch <span class="m">436000</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./pruned_transducer_stateless7_ctc_bs/train.py<span class="w"> </span>--start-batch<span class="w"> </span><span class="m">436000</span>
</pre></div>
</div>
</div></blockquote>
@ -311,8 +311,8 @@ To resume training from some checkpoint, say <code class="docutils literal notra
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc., are recorded in these logs. You can visualize them by:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> pruned_transducer_stateless7_ctc_bs/exp/tensorboard
$ tensorboard dev upload --logdir . --description <span class="s2">&quot;Zipformer-CTC co-training using blank skip for LibriSpeech with icefall&quot;</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>pruned_transducer_stateless7_ctc_bs/exp/tensorboard
$<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w"> </span>upload<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--description<span class="w"> </span><span class="s2">&quot;Zipformer-CTC co-training using blank skip for LibriSpeech with icefall&quot;</span>
</pre></div>
</div>
</div></blockquote>
@ -336,8 +336,8 @@ tensorboard.</p>
<p>If you don't have access to Google, you can use the following command
to view the TensorBoard log locally:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> pruned_transducer_stateless7_ctc_bs/exp/tensorboard
tensorboard --logdir . --port <span class="m">6008</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>pruned_transducer_stateless7_ctc_bs/exp/tensorboard
tensorboard<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--port<span class="w"> </span><span class="m">6008</span>
</pre></div>
</div>
</div></blockquote>
@ -362,15 +362,15 @@ you saw printed to the console during training.</p>
<section id="usage-example">
<h3>Usage example<a class="headerlink" href="#usage-example" title="Permalink to this heading"></a></h3>
<p>You can use the following command to start the training using 4 GPUs:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,1,2,3&quot;</span>
./pruned_transducer_stateless7_ctc_bs/train.py <span class="se">\</span>
--world-size <span class="m">4</span> <span class="se">\</span>
--num-epochs <span class="m">30</span> <span class="se">\</span>
--start-epoch <span class="m">1</span> <span class="se">\</span>
--full-libri <span class="m">1</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless7_ctc_bs/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--use-fp16 <span class="m">1</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,1,2,3&quot;</span>
./pruned_transducer_stateless7_ctc_bs/train.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-epochs<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--full-libri<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless7_ctc_bs/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--use-fp16<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
</section>
@ -395,30 +395,30 @@ every <code class="docutils literal notranslate"><span class="pre">--save-every-
that produces the lowest WERs.</p>
</div></blockquote>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless7_ctc_bs/ctc_guild_decode_bs.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./pruned_transducer_stateless7_ctc_bs/ctc_guild_decode_bs.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the options for decoding.</p>
<p>The following shows the example using <code class="docutils literal notranslate"><span class="pre">epoch-*.pt</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
./pruned_transducer_stateless7_ctc_bs/ctc_guild_decode_bs.py <span class="se">\</span>
--epoch <span class="m">30</span> <span class="se">\</span>
--avg <span class="m">13</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless7_ctc_bs/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--decoding-method <span class="nv">$m</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span>m<span class="w"> </span><span class="k">in</span><span class="w"> </span>greedy_search<span class="w"> </span>fast_beam_search<span class="w"> </span>modified_beam_search<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span>./pruned_transducer_stateless7_ctc_bs/ctc_guild_decode_bs.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="m">13</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless7_ctc_bs/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoding-method<span class="w"> </span><span class="nv">$m</span>
<span class="k">done</span>
</pre></div>
</div>
<p>To test the CTC branch, you can use the following command:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> ctc-decoding 1best<span class="p">;</span> <span class="k">do</span>
./pruned_transducer_stateless7_ctc_bs/ctc_guild_decode_bs.py <span class="se">\</span>
--epoch <span class="m">30</span> <span class="se">\</span>
--avg <span class="m">13</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless7_ctc_bs/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--decoding-method <span class="nv">$m</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span>m<span class="w"> </span><span class="k">in</span><span class="w"> </span>ctc-decoding<span class="w"> </span>1best<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span>./pruned_transducer_stateless7_ctc_bs/ctc_guild_decode_bs.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="m">13</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless7_ctc_bs/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoding-method<span class="w"> </span><span class="nv">$m</span>
<span class="k">done</span>
</pre></div>
</div>
@ -432,12 +432,12 @@ $ ./pruned_transducer_stateless7_ctc_bs/ctc_guild_decode_bs.py --help
<code class="docutils literal notranslate"><span class="pre">optimizer.state_dict()</span></code>. It is useful for resuming training. But after training,
we are interested only in <code class="docutils literal notranslate"><span class="pre">model.state_dict()</span></code>. You can use the following
command to extract <code class="docutils literal notranslate"><span class="pre">model.state_dict()</span></code>.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_ctc_bs/export.py <span class="se">\</span>
--exp-dir ./pruned_transducer_stateless7_ctc_bs/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--epoch <span class="m">30</span> <span class="se">\</span>
--avg <span class="m">13</span> <span class="se">\</span>
--jit <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_ctc_bs/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./pruned_transducer_stateless7_ctc_bs/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="m">13</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--jit<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>It will generate a file <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless7_ctc_bs/exp/pretrained.pt</span></code>.</p>
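<p>As a quick sanity check (an illustrative sketch, not part of the recipe), you can inspect the exported checkpoint from Python; this assumes the weights are stored under a <code class="docutils literal notranslate"><span class="pre">&quot;model&quot;</span></code> key, as icefall export scripts typically do:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span># Illustrative sketch: inspect the exported checkpoint on CPU.
import torch

ckpt = torch.load(
    "pruned_transducer_stateless7_ctc_bs/exp/pretrained.pt", map_location="cpu"
)
print(list(ckpt.keys()))   # expected to contain "model"
print(len(ckpt["model"]))  # number of tensors in model.state_dict()
</pre></div>
</div>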
@ -445,8 +445,8 @@ command to extract <code class="docutils literal notranslate"><span class="pre">
<p class="admonition-title">Hint</p>
<p>To use the generated <code class="docutils literal notranslate"><span class="pre">pretrained.pt</span></code> for <code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless7_ctc_bs/ctc_guild_decode_bs.py</span></code>,
you can run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> pruned_transducer_stateless7_ctc_bs/exp
ln -s pretrained.pt epoch-9999.pt
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>pruned_transducer_stateless7_ctc_bs/exp
ln<span class="w"> </span>-s<span class="w"> </span>pretrained.pt<span class="w"> </span>epoch-9999.pt
</pre></div>
</div>
<p>And then pass <code class="docutils literal notranslate"><span class="pre">--epoch</span> <span class="pre">9999</span> <span class="pre">--avg</span> <span class="pre">1</span> <span class="pre">--use-averaged-model</span> <span class="pre">0</span></code> to
@ -454,33 +454,33 @@ ln -s pretrained epoch-9999.pt
</div>
<p>To use the exported model with <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless7_ctc_bs/pretrained.py</span></code>, you
can run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_ctc_bs/pretrained.py <span class="se">\</span>
--checkpoint ./pruned_transducer_stateless7_ctc_bs/exp/pretrained.pt <span class="se">\</span>
--bpe-model ./data/lang_bpe_500/bpe.model <span class="se">\</span>
--method greedy_search <span class="se">\</span>
/path/to/foo.wav <span class="se">\</span>
/path/to/bar.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_ctc_bs/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./pruned_transducer_stateless7_ctc_bs/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>./data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>greedy_search<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/foo.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/bar.wav
</pre></div>
</div>
<p>To test the CTC branch using the exported model with <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless7_ctc_bs/pretrained_ctc.py</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_ctc_bs/jit_pretrained_ctc.py <span class="se">\</span>
--checkpoint ./pruned_transducer_stateless7_ctc_bs/exp/pretrained.pt <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--method ctc-decoding <span class="se">\</span>
--sample-rate <span class="m">16000</span> <span class="se">\</span>
/path/to/foo.wav <span class="se">\</span>
/path/to/bar.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_ctc_bs/jit_pretrained_ctc.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./pruned_transducer_stateless7_ctc_bs/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>ctc-decoding<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--sample-rate<span class="w"> </span><span class="m">16000</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/foo.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/bar.wav
</pre></div>
</div>
</section>
<section id="export-model-using-torch-jit-script">
<h3>Export model using <code class="docutils literal notranslate"><span class="pre">torch.jit.script()</span></code><a class="headerlink" href="#export-model-using-torch-jit-script" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_ctc_bs/export.py <span class="se">\</span>
--exp-dir ./pruned_transducer_stateless7_ctc_bs/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--epoch <span class="m">30</span> <span class="se">\</span>
--avg <span class="m">13</span> <span class="se">\</span>
--jit <span class="m">1</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_ctc_bs/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./pruned_transducer_stateless7_ctc_bs/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="m">13</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--jit<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
<p>It will generate a file <code class="docutils literal notranslate"><span class="pre">cpu_jit.pt</span></code> in the given <code class="docutils literal notranslate"><span class="pre">exp_dir</span></code>. You can later
@ -488,20 +488,20 @@ load it by <code class="docutils literal notranslate"><span class="pre">torch.ji
<p>Note <code class="docutils literal notranslate"><span class="pre">cpu</span></code> in the name <code class="docutils literal notranslate"><span class="pre">cpu_jit.pt</span></code> means the parameters when loaded into Python
are on CPU. You can use <code class="docutils literal notranslate"><span class="pre">to(&quot;cuda&quot;)</span></code> to move them to a CUDA device.</p>
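<p>For illustration only (a sketch, not part of the recipe), the following Python snippet shows how the exported <code class="docutils literal notranslate"><span class="pre">cpu_jit.pt</span></code> can be loaded and moved to a CUDA device; the path is assumed to match the export command above:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span># Illustrative sketch: load the TorchScript model and move it to GPU if available.
import torch

model = torch.jit.load("pruned_transducer_stateless7_ctc_bs/exp/cpu_jit.pt")
model.eval()
if torch.cuda.is_available():
    model = model.to("cuda")  # parameters are on CPU after loading
</pre></div>
</div>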
<p>To use the generated files with <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless7_ctc_bs/jit_pretrained.py</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_ctc_bs/jit_pretrained.py <span class="se">\</span>
--nn-model-filename ./pruned_transducer_stateless7_ctc_bs/exp/cpu_jit.pt <span class="se">\</span>
/path/to/foo.wav <span class="se">\</span>
/path/to/bar.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_ctc_bs/jit_pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--nn-model-filename<span class="w"> </span>./pruned_transducer_stateless7_ctc_bs/exp/cpu_jit.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/foo.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/bar.wav
</pre></div>
</div>
<p>To test CTC branch using the generated files with <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless7_ctc_bs/jit_pretrained_ctc.py</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_ctc_bs/jit_pretrained_ctc.py <span class="se">\</span>
--model-filename ./pruned_transducer_stateless7_ctc_bs/exp/cpu_jit.pt <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--method ctc-decoding <span class="se">\</span>
--sample-rate <span class="m">16000</span> <span class="se">\</span>
/path/to/foo.wav <span class="se">\</span>
/path/to/bar.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_ctc_bs/jit_pretrained_ctc.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--model-filename<span class="w"> </span>./pruned_transducer_stateless7_ctc_bs/exp/cpu_jit.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>ctc-decoding<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--sample-rate<span class="w"> </span><span class="m">16000</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/foo.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/bar.wav
</pre></div>
</div>
</section>

View File

@ -113,8 +113,8 @@ with the <a class="reference external" href="https://www.openslr.org/12">LibriSp
</div>
<section id="data-preparation">
<h2>Data preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading"></a></h2>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh
</pre></div>
</div>
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
@ -133,13 +133,13 @@ options:</p>
</div></blockquote>
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
<p>For example,</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">0</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>means to run only stage 0.</p>
<p>To run stage 2 to stage 5, use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage <span class="m">2</span> --stop-stage <span class="m">5</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">2</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">5</span>
</pre></div>
</div>
<div class="admonition hint">
@ -172,8 +172,8 @@ the following YouTube channel by <a class="reference external" href="https://www
<p>For stability, it uses CTC loss for model warm-up and then switches to MMI loss.</p>
<section id="configurable-options">
<h3>Configurable options<a class="headerlink" href="#configurable-options" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./zipformer_mmi/train.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./zipformer_mmi/train.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows you the training options that can be passed from the commandline.
@ -222,26 +222,26 @@ training from epoch 10, based on the state from epoch 9.</p>
<div><p><strong>Use case 1</strong>: You have 4 GPUs, but you only want to use GPU 0 and
GPU 2 for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,2&quot;</span>
$ ./zipformer_mmi/train.py --world-size <span class="m">2</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,2&quot;</span>
$<span class="w"> </span>./zipformer_mmi/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">2</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 2</strong>: You have 4 GPUs and you want to use all of them
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./zipformer_mmi/train.py --world-size <span class="m">4</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./zipformer_mmi/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 3</strong>: You have 4 GPUs but you only want to use GPU 3
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;3&quot;</span>
$ ./zipformer_mmi/train.py --world-size <span class="m">1</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;3&quot;</span>
$<span class="w"> </span>./zipformer_mmi/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
</div></blockquote>
@ -289,7 +289,7 @@ You will find the following files in that directory:</p>
<code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./zipformer_mmi/train.py --start-epoch <span class="m">11</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./zipformer_mmi/train.py<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">11</span>
</pre></div>
</div>
</div></blockquote>
@ -299,7 +299,7 @@ To resume training from some checkpoint, say <code class="docutils literal notra
containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">checkpoint-436000</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./zipformer_mmi/train.py --start-batch <span class="m">436000</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./zipformer_mmi/train.py<span class="w"> </span>--start-batch<span class="w"> </span><span class="m">436000</span>
</pre></div>
</div>
</div></blockquote>
@ -308,8 +308,8 @@ To resume training from some checkpoint, say <code class="docutils literal notra
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc, are recorded in these logs. You can visualize them by:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> zipformer_mmi/exp/tensorboard
$ tensorboard dev upload --logdir . --description <span class="s2">&quot;Zipformer MMI training for LibriSpeech with icefall&quot;</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>zipformer_mmi/exp/tensorboard
$<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w"> </span>upload<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--description<span class="w"> </span><span class="s2">&quot;Zipformer MMI training for LibriSpeech with icefall&quot;</span>
</pre></div>
</div>
</div></blockquote>
@ -333,8 +333,8 @@ tensorboard.</p>
<p>If you don't have access to Google, you can use the following command
to view the tensorboard log locally:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> zipformer_mmi/exp/tensorboard
tensorboard --logdir . --port <span class="m">6008</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>zipformer_mmi/exp/tensorboard
tensorboard<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--port<span class="w"> </span><span class="m">6008</span>
</pre></div>
</div>
</div></blockquote>
@ -359,16 +359,16 @@ you saw printed to the console during training.</p>
<section id="usage-example">
<h3>Usage example<a class="headerlink" href="#usage-example" title="Permalink to this heading"></a></h3>
<p>You can use the following command to start the training using 4 GPUs:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,1,2,3&quot;</span>
./zipformer_mmi/train.py <span class="se">\</span>
--world-size <span class="m">4</span> <span class="se">\</span>
--num-epochs <span class="m">30</span> <span class="se">\</span>
--start-epoch <span class="m">1</span> <span class="se">\</span>
--full-libri <span class="m">1</span> <span class="se">\</span>
--exp-dir zipformer_mmi/exp <span class="se">\</span>
--max-duration <span class="m">500</span> <span class="se">\</span>
--use-fp16 <span class="m">1</span> <span class="se">\</span>
--num-workers <span class="m">2</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,1,2,3&quot;</span>
./zipformer_mmi/train.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-epochs<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--full-libri<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>zipformer_mmi/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">500</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--use-fp16<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-workers<span class="w"> </span><span class="m">2</span>
</pre></div>
</div>
</section>
@ -393,22 +393,22 @@ every <code class="docutils literal notranslate"><span class="pre">--save-every-
that produces the lowest WERs.</p>
</div></blockquote>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./zipformer_mmi/decode.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./zipformer_mmi/decode.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the options for decoding.</p>
<p>The following shows the example using <code class="docutils literal notranslate"><span class="pre">epoch-*.pt</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> nbest nbest-rescoring-LG nbest-rescoring-3-gram nbest-rescoring-4-gram<span class="p">;</span> <span class="k">do</span>
./zipformer_mmi/decode.py <span class="se">\</span>
--epoch <span class="m">30</span> <span class="se">\</span>
--avg <span class="m">10</span> <span class="se">\</span>
--exp-dir ./zipformer_mmi/exp/ <span class="se">\</span>
--max-duration <span class="m">100</span> <span class="se">\</span>
--lang-dir data/lang_bpe_500 <span class="se">\</span>
--nbest-scale <span class="m">1</span>.2 <span class="se">\</span>
--hp-scale <span class="m">1</span>.0 <span class="se">\</span>
--decoding-method <span class="nv">$m</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span>m<span class="w"> </span><span class="k">in</span><span class="w"> </span>nbest<span class="w"> </span>nbest-rescoring-LG<span class="w"> </span>nbest-rescoring-3-gram<span class="w"> </span>nbest-rescoring-4-gram<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span>./zipformer_mmi/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="m">10</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./zipformer_mmi/exp/<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">100</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--lang-dir<span class="w"> </span>data/lang_bpe_500<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--nbest-scale<span class="w"> </span><span class="m">1</span>.2<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--hp-scale<span class="w"> </span><span class="m">1</span>.0<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoding-method<span class="w"> </span><span class="nv">$m</span>
<span class="k">done</span>
</pre></div>
</div>
@ -422,12 +422,12 @@ $ ./zipformer_mmi/decode.py --help
<code class="docutils literal notranslate"><span class="pre">optimizer.state_dict()</span></code>. It is useful for resuming training. But after training,
we are interested only in <code class="docutils literal notranslate"><span class="pre">model.state_dict()</span></code>. You can use the following
command to extract <code class="docutils literal notranslate"><span class="pre">model.state_dict()</span></code>.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./zipformer_mmi/export.py <span class="se">\</span>
--exp-dir ./zipformer_mmi/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--epoch <span class="m">30</span> <span class="se">\</span>
--avg <span class="m">9</span> <span class="se">\</span>
--jit <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./zipformer_mmi/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./zipformer_mmi/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="m">9</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--jit<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>It will generate a file <code class="docutils literal notranslate"><span class="pre">./zipformer_mmi/exp/pretrained.pt</span></code>.</p>
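<p>If you want to confirm that the optimizer state was indeed stripped, the following
sketch inspects the saved dictionary. The path and the expected <code class="docutils literal notranslate"><span class="pre">model</span></code> key are
assumptions based on the description above:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>python3 &lt;&lt;&#39;EOF&#39;
import torch

# pretrained.pt should contain the model state_dict but no optimizer state.
ckpt = torch.load(&quot;./zipformer_mmi/exp/pretrained.pt&quot;, map_location=&quot;cpu&quot;)
print(list(ckpt.keys()))  # expected to include a &quot;model&quot; entry
EOF
</pre></div>
</div>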
@ -435,8 +435,8 @@ command to extract <code class="docutils literal notranslate"><span class="pre">
<p class="admonition-title">Hint</p>
<p>To use the generated <code class="docutils literal notranslate"><span class="pre">pretrained.pt</span></code> for <code class="docutils literal notranslate"><span class="pre">zipformer_mmi/decode.py</span></code>,
you can run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> zipformer_mmi/exp
ln -s pretrained.pt epoch-9999.pt
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>zipformer_mmi/exp
ln<span class="w"> </span>-s<span class="w"> </span>pretrained.pt<span class="w"> </span>epoch-9999.pt
</pre></div>
</div>
<p>And then pass <code class="docutils literal notranslate"><span class="pre">--epoch</span> <span class="pre">9999</span> <span class="pre">--avg</span> <span class="pre">1</span> <span class="pre">--use-averaged-model</span> <span class="pre">0</span></code> to
@ -444,23 +444,23 @@ ln -s pretrained epoch-9999.pt
</div>
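<p>Putting the hint above together with the decoding options shown earlier, a decoding
run that uses the symlinked checkpoint might look like the following. This is only a
sketch; adjust <code class="docutils literal notranslate"><span class="pre">--exp-dir</span></code> and the other options to your setup:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> egs/librispeech/ASR
./zipformer_mmi/decode.py <span class="se">\</span>
  --epoch <span class="m">9999</span> <span class="se">\</span>
  --avg <span class="m">1</span> <span class="se">\</span>
  --use-averaged-model <span class="m">0</span> <span class="se">\</span>
  --exp-dir ./zipformer_mmi/exp <span class="se">\</span>
  --max-duration <span class="m">100</span> <span class="se">\</span>
  --lang-dir data/lang_bpe_500 <span class="se">\</span>
  --decoding-method nbest
</pre></div>
</div>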
<p>To use the exported model with <code class="docutils literal notranslate"><span class="pre">./zipformer_mmi/pretrained.py</span></code>, you
can run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./zipformer_mmi/pretrained.py <span class="se">\</span>
--checkpoint ./zipformer_mmi/exp/pretrained.pt <span class="se">\</span>
--bpe-model ./data/lang_bpe_500/bpe.model <span class="se">\</span>
--method 1best <span class="se">\</span>
/path/to/foo.wav <span class="se">\</span>
/path/to/bar.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./zipformer_mmi/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./zipformer_mmi/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>./data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>1best<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/foo.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/bar.wav
</pre></div>
</div>
</section>
<section id="export-model-using-torch-jit-script">
<h3>Export model using <code class="docutils literal notranslate"><span class="pre">torch.jit.script()</span></code><a class="headerlink" href="#export-model-using-torch-jit-script" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./zipformer_mmi/export.py <span class="se">\</span>
--exp-dir ./zipformer_mmi/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--epoch <span class="m">30</span> <span class="se">\</span>
--avg <span class="m">9</span> <span class="se">\</span>
--jit <span class="m">1</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./zipformer_mmi/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./zipformer_mmi/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="m">9</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--jit<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
<p>It will generate a file <code class="docutils literal notranslate"><span class="pre">cpu_jit.pt</span></code> in the given <code class="docutils literal notranslate"><span class="pre">exp_dir</span></code>. You can later
@ -468,12 +468,12 @@ load it by <code class="docutils literal notranslate"><span class="pre">torch.ji
<p>Note that <code class="docutils literal notranslate"><span class="pre">cpu</span></code> in the name <code class="docutils literal notranslate"><span class="pre">cpu_jit.pt</span></code> means that the parameters are on CPU when loaded
into Python. You can use <code class="docutils literal notranslate"><span class="pre">to(&quot;cuda&quot;)</span></code> to move them to a CUDA device.</p>
<p>To use the generated files with <code class="docutils literal notranslate"><span class="pre">./zipformer_mmi/jit_pretrained.py</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./zipformer_mmi/jit_pretrained.py <span class="se">\</span>
--nn-model-filename ./zipformer_mmi/exp/cpu_jit.pt <span class="se">\</span>
--bpe-model ./data/lang_bpe_500/bpe.model <span class="se">\</span>
--method 1best <span class="se">\</span>
/path/to/foo.wav <span class="se">\</span>
/path/to/bar.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./zipformer_mmi/jit_pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--nn-model-filename<span class="w"> </span>./zipformer_mmi/exp/cpu_jit.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>./data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>1best<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/foo.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/bar.wav
</pre></div>
</div>
</section>

View File

@ -103,8 +103,8 @@ the environment for <code class="docutils literal notranslate"><span class="pre"
</div>
<section id="data-preparation">
<h2>Data preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading"></a></h2>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/timit/ASR
$ ./prepare.sh
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/timit/ASR
$<span class="w"> </span>./prepare.sh
</pre></div>
</div>
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
@ -119,13 +119,13 @@ options:</p>
</div></blockquote>
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
<p>For example,</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/timit/ASR
$ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/timit/ASR
$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">0</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>means to run only stage 0.</p>
<p>To run stage 2 to stage 5, use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage <span class="m">2</span> --stop-stage <span class="m">5</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">2</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">5</span>
</pre></div>
</div>
</section>
@ -139,9 +139,9 @@ folder.</p>
<p>TIMIT is a very small dataset. So one GPU is enough.</p>
</div>
<p>The command to run the training part is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/timit/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0&quot;</span>
$ ./tdnn_ligru_ctc/train.py
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/timit/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0&quot;</span>
$<span class="w"> </span>./tdnn_ligru_ctc/train.py
</pre></div>
</div>
<p>By default, it will run <code class="docutils literal notranslate"><span class="pre">25</span></code> epochs. Training logs and checkpoints are saved
@ -153,7 +153,7 @@ in <code class="docutils literal notranslate"><span class="pre">tdnn_ligru_ctc/e
<p>These are checkpoint files, containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn_ligru_ctc/train.py --start-epoch <span class="m">11</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn_ligru_ctc/train.py<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">11</span>
</pre></div>
</div>
</div></blockquote>
@ -162,8 +162,8 @@ To resume training from some checkpoint, say <code class="docutils literal notra
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc, are recorded in these logs. You can visualize them by:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> tdnn_ligru_ctc/exp/tensorboard
$ tensorboard dev upload --logdir . --description <span class="s2">&quot;TDNN ligru training for timit with icefall&quot;</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>tdnn_ligru_ctc/exp/tensorboard
$<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w"> </span>upload<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--description<span class="w"> </span><span class="s2">&quot;TDNN ligru training for timit with icefall&quot;</span>
</pre></div>
</div>
</div></blockquote>
@ -175,7 +175,7 @@ you saw printed to the console during training.</p>
</ul>
</div></blockquote>
<p>To see available training options, you can use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn_ligru_ctc/train.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn_ligru_ctc/train.py<span class="w"> </span>--help
</pre></div>
</div>
<p>Other training options, e.g., learning rate, results dir, etc., are
@ -189,13 +189,13 @@ you want.</p>
<p>The decoding part uses checkpoints saved by the training part, so you have
to run the training part first.</p>
<p>The command for decoding is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0&quot;</span>
$ ./tdnn_ligru_ctc/decode.py
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0&quot;</span>
$<span class="w"> </span>./tdnn_ligru_ctc/decode.py
</pre></div>
</div>
<p>You will see the WER in the output log.</p>
<p>Decoded results are saved in <code class="docutils literal notranslate"><span class="pre">tdnn_ligru_ctc/exp</span></code>.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn_ligru_ctc/decode.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn_ligru_ctc/decode.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows you the available decoding options.</p>
@ -212,7 +212,7 @@ For instance, <code class="docutils literal notranslate"><span class="pre">./tdn
to be averaged. The averaged model is used for decoding.
For example, the following command:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn_ligru_ctc/decode.py --epoch <span class="m">25</span> --avg <span class="m">17</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn_ligru_ctc/decode.py<span class="w"> </span>--epoch<span class="w"> </span><span class="m">25</span><span class="w"> </span>--avg<span class="w"> </span><span class="m">17</span>
</pre></div>
</div>
</div></blockquote>
@ -245,11 +245,11 @@ at the same time.</p>
</section>
<section id="download-the-pre-trained-model">
<h3>Download the pre-trained model<a class="headerlink" href="#download-the-pre-trained-model" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/timit/ASR
$ mkdir tmp-ligru
$ <span class="nb">cd</span> tmp-ligru
$ git lfs install
$ git clone https://huggingface.co/luomingshuang/icefall_asr_timit_tdnn_ligru_ctc
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/timit/ASR
$<span class="w"> </span>mkdir<span class="w"> </span>tmp-ligru
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>tmp-ligru
$<span class="w"> </span>git<span class="w"> </span>lfs<span class="w"> </span>install
$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/luomingshuang/icefall_asr_timit_tdnn_ligru_ctc
</pre></div>
</div>
<div class="admonition caution">
@ -261,29 +261,29 @@ $ git clone https://huggingface.co/luomingshuang/icefall_asr_timit_tdnn_ligru_ct
<p>In order to use this pre-trained model, your k2 version has to be v1.7 or later.</p>
</div>
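<p>If you are unsure which k2 version you have, you can print it before downloading.
This is a sketch that assumes k2 is installed in the active Python environment:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ python3 -m k2.version
</pre></div>
</div>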
<p>After downloading, you will have the following files:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/timit/ASR
$ tree tmp-ligru
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/timit/ASR
$<span class="w"> </span>tree<span class="w"> </span>tmp-ligru
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>tmp-ligru/
<span class="sb">`</span>-- icefall_asr_timit_tdnn_ligru_ctc
<span class="p">|</span>-- README.md
<span class="p">|</span>-- data
<span class="p">|</span> <span class="p">|</span>-- lang_phone
<span class="p">|</span> <span class="p">|</span> <span class="p">|</span>-- HLG.pt
<span class="p">|</span> <span class="p">|</span> <span class="p">|</span>-- tokens.txt
<span class="p">|</span> <span class="p">|</span> <span class="sb">`</span>-- words.txt
<span class="p">|</span> <span class="sb">`</span>-- lm
<span class="p">|</span> <span class="sb">`</span>-- G_4_gram.pt
<span class="p">|</span>-- exp
<span class="p">|</span> <span class="sb">`</span>-- pretrained_average_9_25.pt
<span class="sb">`</span>-- test_wavs
<span class="p">|</span>-- FDHC0_SI1559.WAV
<span class="p">|</span>-- FELC0_SI756.WAV
<span class="p">|</span>-- FMGD0_SI1564.WAV
<span class="sb">`</span>-- trans.txt
<span class="sb">`</span>--<span class="w"> </span>icefall_asr_timit_tdnn_ligru_ctc
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>README.md
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>data
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>lang_phone
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>HLG.pt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>tokens.txt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>words.txt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>lm
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>G_4_gram.pt
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>exp
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>pretrained_average_9_25.pt
<span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>test_wavs
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>FDHC0_SI1559.WAV
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>FELC0_SI756.WAV
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>FMGD0_SI1564.WAV
<span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>trans.txt
<span class="m">6</span> directories, <span class="m">10</span> files
<span class="m">6</span><span class="w"> </span>directories,<span class="w"> </span><span class="m">10</span><span class="w"> </span>files
</pre></div>
</div>
<p><strong>File descriptions</strong>:</p>
@ -329,60 +329,60 @@ Note: We have removed optimizer <code class="docutils literal notranslate"><span
</ul>
</div></blockquote>
<p>The information of the test sound files is listed below:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ffprobe -show_format tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FDHC0_SI1559.WAV
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>ffprobe<span class="w"> </span>-show_format<span class="w"> </span>tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FDHC0_SI1559.WAV
Input <span class="c1">#0, nistsphere, from &#39;tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FDHC0_SI1559.WAV&#39;:</span>
Input<span class="w"> </span><span class="c1">#0, nistsphere, from &#39;tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FDHC0_SI1559.WAV&#39;:</span>
Metadata:
database_id : TIMIT
database_version: <span class="m">1</span>.0
utterance_id : dhc0_si1559
sample_min : -4176
sample_max : <span class="m">5984</span>
Duration: <span class="m">00</span>:00:03.40, bitrate: <span class="m">258</span> kb/s
Stream <span class="c1">#0:0: Audio: pcm_s16le, 16000 Hz, 1 channels, s16, 256 kb/s</span>
<span class="w"> </span>database_id<span class="w"> </span>:<span class="w"> </span>TIMIT
<span class="w"> </span>database_version:<span class="w"> </span><span class="m">1</span>.0
<span class="w"> </span>utterance_id<span class="w"> </span>:<span class="w"> </span>dhc0_si1559
<span class="w"> </span>sample_min<span class="w"> </span>:<span class="w"> </span>-4176
<span class="w"> </span>sample_max<span class="w"> </span>:<span class="w"> </span><span class="m">5984</span>
Duration:<span class="w"> </span><span class="m">00</span>:00:03.40,<span class="w"> </span>bitrate:<span class="w"> </span><span class="m">258</span><span class="w"> </span>kb/s
<span class="w"> </span>Stream<span class="w"> </span><span class="c1">#0:0: Audio: pcm_s16le, 16000 Hz, 1 channels, s16, 256 kb/s</span>
$ ffprobe -show_format tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FELC0_SI756.WAV
$<span class="w"> </span>ffprobe<span class="w"> </span>-show_format<span class="w"> </span>tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FELC0_SI756.WAV
Input <span class="c1">#0, nistsphere, from &#39;tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FELC0_SI756.WAV&#39;:</span>
Input<span class="w"> </span><span class="c1">#0, nistsphere, from &#39;tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FELC0_SI756.WAV&#39;:</span>
Metadata:
database_id : TIMIT
database_version: <span class="m">1</span>.0
utterance_id : elc0_si756
sample_min : -1546
sample_max : <span class="m">1989</span>
Duration: <span class="m">00</span>:00:04.19, bitrate: <span class="m">257</span> kb/s
Stream <span class="c1">#0:0: Audio: pcm_s16le, 16000 Hz, 1 channels, s16, 256 kb/s</span>
<span class="w"> </span>database_id<span class="w"> </span>:<span class="w"> </span>TIMIT
<span class="w"> </span>database_version:<span class="w"> </span><span class="m">1</span>.0
<span class="w"> </span>utterance_id<span class="w"> </span>:<span class="w"> </span>elc0_si756
<span class="w"> </span>sample_min<span class="w"> </span>:<span class="w"> </span>-1546
<span class="w"> </span>sample_max<span class="w"> </span>:<span class="w"> </span><span class="m">1989</span>
Duration:<span class="w"> </span><span class="m">00</span>:00:04.19,<span class="w"> </span>bitrate:<span class="w"> </span><span class="m">257</span><span class="w"> </span>kb/s
<span class="w"> </span>Stream<span class="w"> </span><span class="c1">#0:0: Audio: pcm_s16le, 16000 Hz, 1 channels, s16, 256 kb/s</span>
$ ffprobe -show_format tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FMGD0_SI1564.WAV
$<span class="w"> </span>ffprobe<span class="w"> </span>-show_format<span class="w"> </span>tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FMGD0_SI1564.WAV
Input <span class="c1">#0, nistsphere, from &#39;tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FMGD0_SI1564.WAV&#39;:</span>
Input<span class="w"> </span><span class="c1">#0, nistsphere, from &#39;tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FMGD0_SI1564.WAV&#39;:</span>
Metadata:
database_id : TIMIT
database_version: <span class="m">1</span>.0
utterance_id : mgd0_si1564
sample_min : -7626
sample_max : <span class="m">10573</span>
Duration: <span class="m">00</span>:00:04.44, bitrate: <span class="m">257</span> kb/s
Stream <span class="c1">#0:0: Audio: pcm_s16le, 16000 Hz, 1 channels, s16, 256 kb/s</span>
<span class="w"> </span>database_id<span class="w"> </span>:<span class="w"> </span>TIMIT
<span class="w"> </span>database_version:<span class="w"> </span><span class="m">1</span>.0
<span class="w"> </span>utterance_id<span class="w"> </span>:<span class="w"> </span>mgd0_si1564
<span class="w"> </span>sample_min<span class="w"> </span>:<span class="w"> </span>-7626
<span class="w"> </span>sample_max<span class="w"> </span>:<span class="w"> </span><span class="m">10573</span>
Duration:<span class="w"> </span><span class="m">00</span>:00:04.44,<span class="w"> </span>bitrate:<span class="w"> </span><span class="m">257</span><span class="w"> </span>kb/s
<span class="w"> </span>Stream<span class="w"> </span><span class="c1">#0:0: Audio: pcm_s16le, 16000 Hz, 1 channels, s16, 256 kb/s</span>
</pre></div>
</div>
</section>
<section id="inference-with-a-pre-trained-model">
<h3>Inference with a pre-trained model<a class="headerlink" href="#inference-with-a-pre-trained-model" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/timit/ASR
$ ./tdnn_ligru_ctc/pretrained.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/timit/ASR
$<span class="w"> </span>./tdnn_ligru_ctc/pretrained.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the usage information of <code class="docutils literal notranslate"><span class="pre">./tdnn_ligru_ctc/pretrained.py</span></code>.</p>
<p>To decode with <code class="docutils literal notranslate"><span class="pre">1best</span></code> method, we can use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./tdnn_ligru_ctc/pretrained.py
--method 1best
--checkpoint ./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/exp/pretrained_average_9_25.pt
--words-file ./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/data/lang_phone/words.txt
--HLG ./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/data/lang_phone/HLG.pt
./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FDHC0_SI1559.WAV
./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FELC0_SI756.WAV
./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FMGD0_SI1564.WAV
<span class="w"> </span>--method<span class="w"> </span>1best
<span class="w"> </span>--checkpoint<span class="w"> </span>./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/exp/pretrained_average_9_25.pt
<span class="w"> </span>--words-file<span class="w"> </span>./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/data/lang_phone/words.txt
<span class="w"> </span>--HLG<span class="w"> </span>./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/data/lang_phone/HLG.pt
<span class="w"> </span>./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FDHC0_SI1559.WAV
<span class="w"> </span>./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FELC0_SI756.WAV
<span class="w"> </span>./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FMGD0_SI1564.WAV
</pre></div>
</div>
<p>The output is:</p>
@ -408,16 +408,16 @@ $ ./tdnn_ligru_ctc/pretrained.py --help
</pre></div>
</div>
<p>To decode with the <code class="docutils literal notranslate"><span class="pre">whole-lattice-rescoring</span></code> method, you can use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./tdnn_ligru_ctc/pretrained.py <span class="se">\</span>
--method whole-lattice-rescoring <span class="se">\</span>
--checkpoint ./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/exp/pretrained_average_9_25.pt <span class="se">\</span>
--words-file ./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/data/lang_phone/words.txt <span class="se">\</span>
--HLG ./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/data/lang_phone/HLG.pt <span class="se">\</span>
--G ./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/data/lm/G_4_gram.pt <span class="se">\</span>
--ngram-lm-scale <span class="m">0</span>.1 <span class="se">\</span>
./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FDHC0_SI1559.WAV
./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FELC0_SI756.WAV
./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FMGD0_SI1564.WAV
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./tdnn_ligru_ctc/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>whole-lattice-rescoring<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/exp/pretrained_average_9_25.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--words-file<span class="w"> </span>./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/data/lang_phone/words.txt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--HLG<span class="w"> </span>./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/data/lang_phone/HLG.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--G<span class="w"> </span>./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/data/lm/G_4_gram.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--ngram-lm-scale<span class="w"> </span><span class="m">0</span>.1<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FDHC0_SI1559.WAV
<span class="w"> </span>./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FELC0_SI756.WAV
<span class="w"> </span>./tmp-ligru/icefall_asr_timit_tdnn_ligru_ctc/test_waves/FMGD0_SI1564.WAV
</pre></div>
</div>
<p>The decoding output is:</p>

View File

@ -103,8 +103,8 @@ the environment for <code class="docutils literal notranslate"><span class="pre"
</div>
<section id="data-preparation">
<h2>Data preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading"></a></h2>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/timit/ASR
$ ./prepare.sh
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/timit/ASR
$<span class="w"> </span>./prepare.sh
</pre></div>
</div>
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
@ -119,13 +119,13 @@ options:</p>
</div></blockquote>
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
<p>For example,</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/timit/ASR
$ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/timit/ASR
$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">0</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>means to run only stage 0.</p>
<p>To run stage 2 to stage 5, use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage <span class="m">2</span> --stop-stage <span class="m">5</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">2</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">5</span>
</pre></div>
</div>
</section>
@ -139,9 +139,9 @@ folder.</p>
<p>TIMIT is a very small dataset. So one GPU for training is enough.</p>
</div>
<p>The command to run the training part is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/timit/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0&quot;</span>
$ ./tdnn_lstm_ctc/train.py
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/timit/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0&quot;</span>
$<span class="w"> </span>./tdnn_lstm_ctc/train.py
</pre></div>
</div>
<p>By default, it will run <code class="docutils literal notranslate"><span class="pre">25</span></code> epochs. Training logs and checkpoints are saved
@ -153,7 +153,7 @@ in <code class="docutils literal notranslate"><span class="pre">tdnn_lstm_ctc/ex
<p>These are checkpoint files, containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn_lstm_ctc/train.py --start-epoch <span class="m">11</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn_lstm_ctc/train.py<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">11</span>
</pre></div>
</div>
</div></blockquote>
@ -162,8 +162,8 @@ To resume training from some checkpoint, say <code class="docutils literal notra
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc, are recorded in these logs. You can visualize them by:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> tdnn_lstm_ctc/exp/tensorboard
$ tensorboard dev upload --logdir . --description <span class="s2">&quot;TDNN LSTM training for timit with icefall&quot;</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>tdnn_lstm_ctc/exp/tensorboard
$<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w"> </span>upload<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--description<span class="w"> </span><span class="s2">&quot;TDNN LSTM training for timit with icefall&quot;</span>
</pre></div>
</div>
</div></blockquote>
@ -175,7 +175,7 @@ you saw printed to the console during training.</p>
</ul>
</div></blockquote>
<p>To see available training options, you can use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn_lstm_ctc/train.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn_lstm_ctc/train.py<span class="w"> </span>--help
</pre></div>
</div>
<p>Other training options, e.g., learning rate, results dir, etc., are
@ -189,13 +189,13 @@ you want.</p>
<p>The decoding part uses checkpoints saved by the training part, so you have
to run the training part first.</p>
<p>The command for decoding is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0&quot;</span>
$ ./tdnn_lstm_ctc/decode.py
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0&quot;</span>
$<span class="w"> </span>./tdnn_lstm_ctc/decode.py
</pre></div>
</div>
<p>You will see the WER in the output log.</p>
<p>Decoded results are saved in <code class="docutils literal notranslate"><span class="pre">tdnn_lstm_ctc/exp</span></code>.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn_lstm_ctc/decode.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn_lstm_ctc/decode.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows you the available decoding options.</p>
@ -212,7 +212,7 @@ For instance, <code class="docutils literal notranslate"><span class="pre">./tdn
to be averaged. The averaged model is used for decoding.
For example, the following command:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn_lstm_ctc/decode.py --epoch <span class="m">25</span> --avg <span class="m">10</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn_lstm_ctc/decode.py<span class="w"> </span>--epoch<span class="w"> </span><span class="m">25</span><span class="w"> </span>--avg<span class="w"> </span><span class="m">10</span>
</pre></div>
</div>
</div></blockquote>
@ -243,11 +243,11 @@ at the same time.</p>
</section>
<section id="download-the-pre-trained-model">
<h3>Download the pre-trained model<a class="headerlink" href="#download-the-pre-trained-model" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/timit/ASR
$ mkdir tmp-lstm
$ <span class="nb">cd</span> tmp-lstm
$ git lfs install
$ git clone https://huggingface.co/luomingshuang/icefall_asr_timit_tdnn_lstm_ctc
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/timit/ASR
$<span class="w"> </span>mkdir<span class="w"> </span>tmp-lstm
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>tmp-lstm
$<span class="w"> </span>git<span class="w"> </span>lfs<span class="w"> </span>install
$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/luomingshuang/icefall_asr_timit_tdnn_lstm_ctc
</pre></div>
</div>
<div class="admonition caution">
@ -259,29 +259,29 @@ $ git clone https://huggingface.co/luomingshuang/icefall_asr_timit_tdnn_lstm_ctc
<p>In order to use this pre-trained model, your k2 version has to be v1.7 or later.</p>
</div>
<p>After downloading, you will have the following files:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/timit/ASR
$ tree tmp-lstm
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/timit/ASR
$<span class="w"> </span>tree<span class="w"> </span>tmp-lstm
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>tmp-lstm/
<span class="sb">`</span>-- icefall_asr_timit_tdnn_lstm_ctc
<span class="p">|</span>-- README.md
<span class="p">|</span>-- data
<span class="p">|</span> <span class="p">|</span>-- lang_phone
<span class="p">|</span> <span class="p">|</span> <span class="p">|</span>-- HLG.pt
<span class="p">|</span> <span class="p">|</span> <span class="p">|</span>-- tokens.txt
<span class="p">|</span> <span class="p">|</span> <span class="sb">`</span>-- words.txt
<span class="p">|</span> <span class="sb">`</span>-- lm
<span class="p">|</span> <span class="sb">`</span>-- G_4_gram.pt
<span class="p">|</span>-- exp
<span class="p">|</span> <span class="sb">`</span>-- pretrained_average_16_25.pt
<span class="sb">`</span>-- test_wavs
<span class="p">|</span>-- FDHC0_SI1559.WAV
<span class="p">|</span>-- FELC0_SI756.WAV
<span class="p">|</span>-- FMGD0_SI1564.WAV
<span class="sb">`</span>-- trans.txt
<span class="sb">`</span>--<span class="w"> </span>icefall_asr_timit_tdnn_lstm_ctc
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>README.md
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>data
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>lang_phone
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>HLG.pt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>tokens.txt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>words.txt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>lm
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>G_4_gram.pt
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>exp
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>pretrained_average_16_25.pt
<span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>test_wavs
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>FDHC0_SI1559.WAV
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>FELC0_SI756.WAV
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>FMGD0_SI1564.WAV
<span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>trans.txt
<span class="m">6</span> directories, <span class="m">10</span> files
<span class="m">6</span><span class="w"> </span>directories,<span class="w"> </span><span class="m">10</span><span class="w"> </span>files
</pre></div>
</div>
<p><strong>File descriptions</strong>:</p>
@ -327,60 +327,60 @@ Note: We have removed optimizer <code class="docutils literal notranslate"><span
</ul>
</div></blockquote>
<p>Information about the test sound files is listed below:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ffprobe -show_format tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FDHC0_SI1559.WAV
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>ffprobe<span class="w"> </span>-show_format<span class="w"> </span>tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FDHC0_SI1559.WAV
Input <span class="c1">#0, nistsphere, from &#39;tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FDHC0_SI1559.WAV&#39;:</span>
Input<span class="w"> </span><span class="c1">#0, nistsphere, from &#39;tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FDHC0_SI1559.WAV&#39;:</span>
Metadata:
database_id : TIMIT
database_version: <span class="m">1</span>.0
utterance_id : dhc0_si1559
sample_min : -4176
sample_max : <span class="m">5984</span>
Duration: <span class="m">00</span>:00:03.40, bitrate: <span class="m">258</span> kb/s
Stream <span class="c1">#0:0: Audio: pcm_s16le, 16000 Hz, 1 channels, s16, 256 kb/s</span>
<span class="w"> </span>database_id<span class="w"> </span>:<span class="w"> </span>TIMIT
<span class="w"> </span>database_version:<span class="w"> </span><span class="m">1</span>.0
<span class="w"> </span>utterance_id<span class="w"> </span>:<span class="w"> </span>dhc0_si1559
<span class="w"> </span>sample_min<span class="w"> </span>:<span class="w"> </span>-4176
<span class="w"> </span>sample_max<span class="w"> </span>:<span class="w"> </span><span class="m">5984</span>
Duration:<span class="w"> </span><span class="m">00</span>:00:03.40,<span class="w"> </span>bitrate:<span class="w"> </span><span class="m">258</span><span class="w"> </span>kb/s
<span class="w"> </span>Stream<span class="w"> </span><span class="c1">#0:0: Audio: pcm_s16le, 16000 Hz, 1 channels, s16, 256 kb/s</span>
$ ffprobe -show_format tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FELC0_SI756.WAV
$<span class="w"> </span>ffprobe<span class="w"> </span>-show_format<span class="w"> </span>tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FELC0_SI756.WAV
Input <span class="c1">#0, nistsphere, from &#39;tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FELC0_SI756.WAV&#39;:</span>
Input<span class="w"> </span><span class="c1">#0, nistsphere, from &#39;tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FELC0_SI756.WAV&#39;:</span>
Metadata:
database_id : TIMIT
database_version: <span class="m">1</span>.0
utterance_id : elc0_si756
sample_min : -1546
sample_max : <span class="m">1989</span>
Duration: <span class="m">00</span>:00:04.19, bitrate: <span class="m">257</span> kb/s
Stream <span class="c1">#0:0: Audio: pcm_s16le, 16000 Hz, 1 channels, s16, 256 kb/s</span>
<span class="w"> </span>database_id<span class="w"> </span>:<span class="w"> </span>TIMIT
<span class="w"> </span>database_version:<span class="w"> </span><span class="m">1</span>.0
<span class="w"> </span>utterance_id<span class="w"> </span>:<span class="w"> </span>elc0_si756
<span class="w"> </span>sample_min<span class="w"> </span>:<span class="w"> </span>-1546
<span class="w"> </span>sample_max<span class="w"> </span>:<span class="w"> </span><span class="m">1989</span>
Duration:<span class="w"> </span><span class="m">00</span>:00:04.19,<span class="w"> </span>bitrate:<span class="w"> </span><span class="m">257</span><span class="w"> </span>kb/s
<span class="w"> </span>Stream<span class="w"> </span><span class="c1">#0:0: Audio: pcm_s16le, 16000 Hz, 1 channels, s16, 256 kb/s</span>
$ ffprobe -show_format tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FMGD0_SI1564.WAV
$<span class="w"> </span>ffprobe<span class="w"> </span>-show_format<span class="w"> </span>tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FMGD0_SI1564.WAV
Input <span class="c1">#0, nistsphere, from &#39;tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FMGD0_SI1564.WAV&#39;:</span>
Input<span class="w"> </span><span class="c1">#0, nistsphere, from &#39;tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FMGD0_SI1564.WAV&#39;:</span>
Metadata:
database_id : TIMIT
database_version: <span class="m">1</span>.0
utterance_id : mgd0_si1564
sample_min : -7626
sample_max : <span class="m">10573</span>
Duration: <span class="m">00</span>:00:04.44, bitrate: <span class="m">257</span> kb/s
Stream <span class="c1">#0:0: Audio: pcm_s16le, 16000 Hz, 1 channels, s16, 256 kb/s</span>
<span class="w"> </span>database_id<span class="w"> </span>:<span class="w"> </span>TIMIT
<span class="w"> </span>database_version:<span class="w"> </span><span class="m">1</span>.0
<span class="w"> </span>utterance_id<span class="w"> </span>:<span class="w"> </span>mgd0_si1564
<span class="w"> </span>sample_min<span class="w"> </span>:<span class="w"> </span>-7626
<span class="w"> </span>sample_max<span class="w"> </span>:<span class="w"> </span><span class="m">10573</span>
Duration:<span class="w"> </span><span class="m">00</span>:00:04.44,<span class="w"> </span>bitrate:<span class="w"> </span><span class="m">257</span><span class="w"> </span>kb/s
<span class="w"> </span>Stream<span class="w"> </span><span class="c1">#0:0: Audio: pcm_s16le, 16000 Hz, 1 channels, s16, 256 kb/s</span>
</pre></div>
</div>
</section>
<section id="inference-with-a-pre-trained-model">
<h3>Inference with a pre-trained model<a class="headerlink" href="#inference-with-a-pre-trained-model" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/timit/ASR
$ ./tdnn_lstm_ctc/pretrained.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/timit/ASR
$<span class="w"> </span>./tdnn_lstm_ctc/pretrained.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the usage information of <code class="docutils literal notranslate"><span class="pre">./tdnn_lstm_ctc/pretrained.py</span></code>.</p>
<p>To decode with <code class="docutils literal notranslate"><span class="pre">1best</span></code> method, we can use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./tdnn_lstm_ctc/pretrained.py
--method 1best
--checkpoint ./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/exp/pretrained_average_16_25.pt
--words-file ./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/data/lang_phone/words.txt
--HLG ./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/data/lang_phone/HLG.pt
./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FDHC0_SI1559.WAV
./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FELC0_SI756.WAV
./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FMGD0_SI1564.WAV
<span class="w"> </span>--method<span class="w"> </span>1best
<span class="w"> </span>--checkpoint<span class="w"> </span>./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/exp/pretrained_average_16_25.pt
<span class="w"> </span>--words-file<span class="w"> </span>./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/data/lang_phone/words.txt
<span class="w"> </span>--HLG<span class="w"> </span>./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/data/lang_phone/HLG.pt
<span class="w"> </span>./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FDHC0_SI1559.WAV
<span class="w"> </span>./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FELC0_SI756.WAV
<span class="w"> </span>./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FMGD0_SI1564.WAV
</pre></div>
</div>
<p>The output is:</p>
@ -406,16 +406,16 @@ $ ./tdnn_lstm_ctc/pretrained.py --help
</pre></div>
</div>
<p>To decode with the <code class="docutils literal notranslate"><span class="pre">whole-lattice-rescoring</span></code> method, you can use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./tdnn_lstm_ctc/pretrained.py <span class="se">\</span>
--method whole-lattice-rescoring <span class="se">\</span>
--checkpoint ./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/exp/pretrained_average_16_25.pt <span class="se">\</span>
--words-file ./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/data/lang_phone/words.txt <span class="se">\</span>
--HLG ./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/data/lang_phone/HLG.pt <span class="se">\</span>
--G ./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/data/lm/G_4_gram.pt <span class="se">\</span>
--ngram-lm-scale <span class="m">0</span>.08 <span class="se">\</span>
./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FDHC0_SI1559.WAV
./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FELC0_SI756.WAV
./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FMGD0_SI1564.WAV
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./tdnn_lstm_ctc/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>whole-lattice-rescoring<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/exp/pretrained_average_16_25.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--words-file<span class="w"> </span>./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/data/lang_phone/words.txt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--HLG<span class="w"> </span>./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/data/lang_phone/HLG.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--G<span class="w"> </span>./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/data/lm/G_4_gram.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--ngram-lm-scale<span class="w"> </span><span class="m">0</span>.08<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FDHC0_SI1559.WAV
<span class="w"> </span>./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FELC0_SI756.WAV
<span class="w"> </span>./tmp-lstm/icefall_asr_timit_tdnn_lstm_ctc/test_waves/FMGD0_SI1564.WAV
</pre></div>
</div>
<p>The decoding output is:</p>


@ -204,8 +204,8 @@ the following WER at the end:</p>
</div>
<section id="data-preparation">
<h2>Data preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading"></a></h2>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/yesno/ASR
$ ./prepare.sh
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/yesno/ASR
$<span class="w"> </span>./prepare.sh
</pre></div>
</div>
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
@ -220,13 +220,13 @@ options:</p>
</div></blockquote>
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
<p>For example,</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/yesno/ASR
$ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/yesno/ASR
$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">0</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>means to run only stage 0.</p>
<p>To run stage 2 to stage 5, use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage <span class="m">2</span> --stop-stage <span class="m">5</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">2</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">5</span>
</pre></div>
</div>
</section>
@ -236,9 +236,9 @@ $ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</
the <a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/yesno/ASR/tdnn">tdnn</a>
folder, for <code class="docutils literal notranslate"><span class="pre">yesno</span></code>.</p>
<p>The command to run the training part is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/yesno/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;&quot;</span>
$ ./tdnn/train.py
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/yesno/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;&quot;</span>
$<span class="w"> </span>./tdnn/train.py
</pre></div>
</div>
<p>By default, it will run <code class="docutils literal notranslate"><span class="pre">15</span></code> epochs. Training logs and checkpoints are saved
@ -250,7 +250,7 @@ in <code class="docutils literal notranslate"><span class="pre">tdnn/exp</span><
<p>These are checkpoint files, containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn/train.py --start-epoch <span class="m">11</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn/train.py<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">11</span>
</pre></div>
</div>
</div></blockquote>
@ -259,8 +259,8 @@ To resume training from some checkpoint, say <code class="docutils literal notra
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc., are recorded in these logs. You can visualize them by:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> tdnn/exp/tensorboard
$ tensorboard dev upload --logdir . --description <span class="s2">&quot;TDNN training for yesno with icefall&quot;</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>tdnn/exp/tensorboard
$<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w"> </span>upload<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--description<span class="w"> </span><span class="s2">&quot;TDNN training for yesno with icefall&quot;</span>
</pre></div>
</div>
</div></blockquote>
@ -302,15 +302,15 @@ you saw printed to the console during training.</p>
If you have two GPUs, say, GPU 0 and GPU 1, and you want to use GPU 1 for
training, you can run:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;1&quot;</span>
$ ./tdnn/train.py
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;1&quot;</span>
$<span class="w"> </span>./tdnn/train.py
</pre></div>
</div>
</div></blockquote>
<p>Since the <code class="docutils literal notranslate"><span class="pre">yesno</span></code> dataset is very small, containing only 30 sound files
for training, and the model in use is also very small, we use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;&quot;</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;&quot;</span>
</pre></div>
</div>
</div></blockquote>
@ -319,7 +319,7 @@ for training, and the model in use is also very small, we use:</p>
run <code class="docutils literal notranslate"><span class="pre">export</span> <span class="pre">CUDA_VISIBLE_DEVICES=&quot;&quot;</span></code>.</p>
</div>
<p>To see available training options, you can use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn/train.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn/train.py<span class="w"> </span>--help
</pre></div>
</div>
<p>Other training options, e.g., learning rate, results dir, etc., are
@ -333,13 +333,13 @@ you want.</p>
<p>The decoding part uses checkpoints saved by the training part, so you have
to run the training part first.</p>
<p>The command for decoding is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;&quot;</span>
$ ./tdnn/decode.py
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;&quot;</span>
$<span class="w"> </span>./tdnn/decode.py
</pre></div>
</div>
<p>You will see the WER in the output log.</p>
<p>Decoded results are saved in <code class="docutils literal notranslate"><span class="pre">tdnn/exp</span></code>.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn/decode.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn/decode.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows you the available decoding options.</p>
@ -356,7 +356,7 @@ For instance, <code class="docutils literal notranslate"><span class="pre">./tdn
to be averaged. The averaged model is used for decoding.
For example, the following command:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./tdnn/decode.py --epoch <span class="m">10</span> --avg <span class="m">3</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./tdnn/decode.py<span class="w"> </span>--epoch<span class="w"> </span><span class="m">10</span><span class="w"> </span>--avg<span class="w"> </span><span class="m">3</span>
</pre></div>
</div>
</div></blockquote>
@ -378,11 +378,11 @@ See <a class="reference internal" href="#yesno-use-a-pre-trained-model"><span cl
<p>The following shows you how to use the pre-trained model.</p>
<section id="download-the-pre-trained-model">
<h3>Download the pre-trained model<a class="headerlink" href="#download-the-pre-trained-model" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/yesno/ASR
$ mkdir tmp
$ <span class="nb">cd</span> tmp
$ git lfs install
$ git clone https://huggingface.co/csukuangfj/icefall_asr_yesno_tdnn
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/yesno/ASR
$<span class="w"> </span>mkdir<span class="w"> </span>tmp
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>tmp
$<span class="w"> </span>git<span class="w"> </span>lfs<span class="w"> </span>install
$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/csukuangfj/icefall_asr_yesno_tdnn
</pre></div>
</div>
<div class="admonition caution">
@ -390,71 +390,71 @@ $ git clone https://huggingface.co/csukuangfj/icefall_asr_yesno_tdnn
<p>You have to use <code class="docutils literal notranslate"><span class="pre">git</span> <span class="pre">lfs</span></code> to download the pre-trained model.</p>
</div>
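<p>If <code class="docutils literal notranslate"><span class="pre">git</span> <span class="pre">lfs</span></code> is not installed on your machine yet, one common way to get it (assuming a Debian/Ubuntu system; other platforms provide their own packages) is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ sudo apt-get install git-lfs
$ git lfs install
</pre></div>
</div>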
<p>After downloading, you will have the following files:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/yesno/ASR
$ tree tmp
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/yesno/ASR
$<span class="w"> </span>tree<span class="w"> </span>tmp
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>tmp/
<span class="sb">`</span>-- icefall_asr_yesno_tdnn
<span class="p">|</span>-- README.md
<span class="p">|</span>-- lang_phone
<span class="p">|</span> <span class="p">|</span>-- HLG.pt
<span class="p">|</span> <span class="p">|</span>-- L.pt
<span class="p">|</span> <span class="p">|</span>-- L_disambig.pt
<span class="p">|</span> <span class="p">|</span>-- Linv.pt
<span class="p">|</span> <span class="p">|</span>-- lexicon.txt
<span class="p">|</span> <span class="p">|</span>-- lexicon_disambig.txt
<span class="p">|</span> <span class="p">|</span>-- tokens.txt
<span class="p">|</span> <span class="sb">`</span>-- words.txt
<span class="p">|</span>-- lm
<span class="p">|</span> <span class="p">|</span>-- G.arpa
<span class="p">|</span> <span class="sb">`</span>-- G.fst.txt
<span class="p">|</span>-- pretrained.pt
<span class="sb">`</span>-- test_waves
<span class="p">|</span>-- 0_0_0_1_0_0_0_1.wav
<span class="p">|</span>-- 0_0_1_0_0_0_1_0.wav
<span class="p">|</span>-- 0_0_1_0_0_1_1_1.wav
<span class="p">|</span>-- 0_0_1_0_1_0_0_1.wav
<span class="p">|</span>-- 0_0_1_1_0_0_0_1.wav
<span class="p">|</span>-- 0_0_1_1_0_1_1_0.wav
<span class="p">|</span>-- 0_0_1_1_1_0_0_0.wav
<span class="p">|</span>-- 0_0_1_1_1_1_0_0.wav
<span class="p">|</span>-- 0_1_0_0_0_1_0_0.wav
<span class="p">|</span>-- 0_1_0_0_1_0_1_0.wav
<span class="p">|</span>-- 0_1_0_1_0_0_0_0.wav
<span class="p">|</span>-- 0_1_0_1_1_1_0_0.wav
<span class="p">|</span>-- 0_1_1_0_0_1_1_1.wav
<span class="p">|</span>-- 0_1_1_1_0_0_1_0.wav
<span class="p">|</span>-- 0_1_1_1_1_0_1_0.wav
<span class="p">|</span>-- 1_0_0_0_0_0_0_0.wav
<span class="p">|</span>-- 1_0_0_0_0_0_1_1.wav
<span class="p">|</span>-- 1_0_0_1_0_1_1_1.wav
<span class="p">|</span>-- 1_0_1_1_0_1_1_1.wav
<span class="p">|</span>-- 1_0_1_1_1_1_0_1.wav
<span class="p">|</span>-- 1_1_0_0_0_1_1_1.wav
<span class="p">|</span>-- 1_1_0_0_1_0_1_1.wav
<span class="p">|</span>-- 1_1_0_1_0_1_0_0.wav
<span class="p">|</span>-- 1_1_0_1_1_0_0_1.wav
<span class="p">|</span>-- 1_1_0_1_1_1_1_0.wav
<span class="p">|</span>-- 1_1_1_0_0_1_0_1.wav
<span class="p">|</span>-- 1_1_1_0_1_0_1_0.wav
<span class="p">|</span>-- 1_1_1_1_0_0_1_0.wav
<span class="p">|</span>-- 1_1_1_1_1_0_0_0.wav
<span class="sb">`</span>-- 1_1_1_1_1_1_1_1.wav
<span class="sb">`</span>--<span class="w"> </span>icefall_asr_yesno_tdnn
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>README.md
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>lang_phone
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>HLG.pt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>L.pt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>L_disambig.pt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>Linv.pt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>lexicon.txt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>lexicon_disambig.txt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>tokens.txt
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>words.txt
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>lm
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="p">|</span>--<span class="w"> </span>G.arpa
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>G.fst.txt
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>pretrained.pt
<span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>test_waves
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>0_0_0_1_0_0_0_1.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>0_0_1_0_0_0_1_0.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>0_0_1_0_0_1_1_1.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>0_0_1_0_1_0_0_1.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>0_0_1_1_0_0_0_1.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>0_0_1_1_0_1_1_0.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>0_0_1_1_1_0_0_0.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>0_0_1_1_1_1_0_0.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>0_1_0_0_0_1_0_0.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>0_1_0_0_1_0_1_0.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>0_1_0_1_0_0_0_0.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>0_1_0_1_1_1_0_0.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>0_1_1_0_0_1_1_1.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>0_1_1_1_0_0_1_0.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>0_1_1_1_1_0_1_0.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>1_0_0_0_0_0_0_0.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>1_0_0_0_0_0_1_1.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>1_0_0_1_0_1_1_1.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>1_0_1_1_0_1_1_1.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>1_0_1_1_1_1_0_1.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>1_1_0_0_0_1_1_1.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>1_1_0_0_1_0_1_1.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>1_1_0_1_0_1_0_0.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>1_1_0_1_1_0_0_1.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>1_1_0_1_1_1_1_0.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>1_1_1_0_0_1_0_1.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>1_1_1_0_1_0_1_0.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>1_1_1_1_0_0_1_0.wav
<span class="w"> </span><span class="p">|</span>--<span class="w"> </span>1_1_1_1_1_0_0_0.wav
<span class="w"> </span><span class="sb">`</span>--<span class="w"> </span>1_1_1_1_1_1_1_1.wav
<span class="m">4</span> directories, <span class="m">42</span> files
<span class="m">4</span><span class="w"> </span>directories,<span class="w"> </span><span class="m">42</span><span class="w"> </span>files
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ soxi tmp/icefall_asr_yesno_tdnn/test_waves/0_0_1_0_1_0_0_1.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>soxi<span class="w"> </span>tmp/icefall_asr_yesno_tdnn/test_waves/0_0_1_0_1_0_0_1.wav
Input File : <span class="s1">&#39;tmp/icefall_asr_yesno_tdnn/test_waves/0_0_1_0_1_0_0_1.wav&#39;</span>
Channels : <span class="m">1</span>
Sample Rate : <span class="m">8000</span>
Precision : <span class="m">16</span>-bit
Duration : <span class="m">00</span>:00:06.76 <span class="o">=</span> <span class="m">54080</span> samples ~ <span class="m">507</span> CDDA sectors
File Size : 108k
Bit Rate : 128k
Sample Encoding: <span class="m">16</span>-bit Signed Integer PCM
Input<span class="w"> </span>File<span class="w"> </span>:<span class="w"> </span><span class="s1">&#39;tmp/icefall_asr_yesno_tdnn/test_waves/0_0_1_0_1_0_0_1.wav&#39;</span>
Channels<span class="w"> </span>:<span class="w"> </span><span class="m">1</span>
Sample<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span><span class="m">8000</span>
Precision<span class="w"> </span>:<span class="w"> </span><span class="m">16</span>-bit
Duration<span class="w"> </span>:<span class="w"> </span><span class="m">00</span>:00:06.76<span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">54080</span><span class="w"> </span>samples<span class="w"> </span>~<span class="w"> </span><span class="m">507</span><span class="w"> </span>CDDA<span class="w"> </span>sectors
File<span class="w"> </span>Size<span class="w"> </span>:<span class="w"> </span>108k
Bit<span class="w"> </span>Rate<span class="w"> </span>:<span class="w"> </span>128k
Sample<span class="w"> </span>Encoding:<span class="w"> </span><span class="m">16</span>-bit<span class="w"> </span>Signed<span class="w"> </span>Integer<span class="w"> </span>PCM
</pre></div>
</div>
<ul>
@ -475,17 +475,17 @@ features from a single or multiple sound files. Please refer to
</section>
<section id="inference-with-a-pre-trained-model">
<h3>Inference with a pre-trained model<a class="headerlink" href="#inference-with-a-pre-trained-model" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/yesno/ASR
$ ./tdnn/pretrained.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/yesno/ASR
$<span class="w"> </span>./tdnn/pretrained.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the usage information of <code class="docutils literal notranslate"><span class="pre">./tdnn/pretrained.py</span></code>.</p>
<p>To decode a single file, we can use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./tdnn/pretrained.py <span class="se">\</span>
--checkpoint ./tmp/icefall_asr_yesno_tdnn/pretrained.pt <span class="se">\</span>
--words-file ./tmp/icefall_asr_yesno_tdnn/lang_phone/words.txt <span class="se">\</span>
--HLG ./tmp/icefall_asr_yesno_tdnn/lang_phone/HLG.pt <span class="se">\</span>
./tmp/icefall_asr_yesno_tdnn/test_waves/0_0_1_0_1_0_0_1.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./tdnn/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./tmp/icefall_asr_yesno_tdnn/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--words-file<span class="w"> </span>./tmp/icefall_asr_yesno_tdnn/lang_phone/words.txt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--HLG<span class="w"> </span>./tmp/icefall_asr_yesno_tdnn/lang_phone/HLG.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_yesno_tdnn/test_waves/0_0_1_0_1_0_0_1.wav
</pre></div>
</div>
<p>The output is:</p>
@ -507,12 +507,12 @@ $ ./tdnn/pretrained.py --help
<p>You can see that for the sound file <code class="docutils literal notranslate"><span class="pre">0_0_1_0_1_0_0_1.wav</span></code>, the decoding result is
<code class="docutils literal notranslate"><span class="pre">NO</span> <span class="pre">NO</span> <span class="pre">YES</span> <span class="pre">NO</span> <span class="pre">YES</span> <span class="pre">NO</span> <span class="pre">NO</span> <span class="pre">YES</span></code>.</p>
<p>To decode <strong>multiple</strong> files at the same time, you can use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./tdnn/pretrained.py <span class="se">\</span>
--checkpoint ./tmp/icefall_asr_yesno_tdnn/pretrained.pt <span class="se">\</span>
--words-file ./tmp/icefall_asr_yesno_tdnn/lang_phone/words.txt <span class="se">\</span>
--HLG ./tmp/icefall_asr_yesno_tdnn/lang_phone/HLG.pt <span class="se">\</span>
./tmp/icefall_asr_yesno_tdnn/test_waves/0_0_1_0_1_0_0_1.wav <span class="se">\</span>
./tmp/icefall_asr_yesno_tdnn/test_waves/1_0_1_1_0_1_1_1.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./tdnn/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./tmp/icefall_asr_yesno_tdnn/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--words-file<span class="w"> </span>./tmp/icefall_asr_yesno_tdnn/lang_phone/words.txt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--HLG<span class="w"> </span>./tmp/icefall_asr_yesno_tdnn/lang_phone/HLG.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_yesno_tdnn/test_waves/0_0_1_0_1_0_0_1.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>./tmp/icefall_asr_yesno_tdnn/test_waves/1_0_1_1_0_1_1_1.wav
</pre></div>
</div>
<p>The decoding output is:</p>


@ -151,11 +151,11 @@ to run <code class="docutils literal notranslate"><span class="pre">(2)</span></
</section>
<section id="data-preparation">
<h2>Data preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading"></a></h2>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh
<span class="c1"># If you use (1), you can **skip** the following command</span>
$ ./prepare_giga_speech.sh
$<span class="w"> </span>./prepare_giga_speech.sh
</pre></div>
</div>
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
@ -174,13 +174,13 @@ options:</p>
</div></blockquote>
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
<p>For example,</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">0</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>means to run only stage 0.</p>
<p>To run stage 2 to stage 5, use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage <span class="m">2</span> --stop-stage <span class="m">5</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">2</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">5</span>
</pre></div>
</div>
<div class="admonition hint">
@ -212,8 +212,8 @@ the following YouTube channel by <a class="reference external" href="https://www
<h2>Training<a class="headerlink" href="#training" title="Permalink to this heading"></a></h2>
<section id="configurable-options">
<h3>Configurable options<a class="headerlink" href="#configurable-options" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./lstm_transducer_stateless2/train.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./lstm_transducer_stateless2/train.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows you the training options that can be passed from the command line.
@ -262,26 +262,26 @@ training from epoch 10, based on the state from epoch 9.</p>
<div><p><strong>Use case 1</strong>: You have 4 GPUs, but you only want to use GPU 0 and
GPU 2 for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,2&quot;</span>
$ ./lstm_transducer_stateless2/train.py --world-size <span class="m">2</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,2&quot;</span>
$<span class="w"> </span>./lstm_transducer_stateless2/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">2</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 2</strong>: You have 4 GPUs and you want to use all of them
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./lstm_transducer_stateless2/train.py --world-size <span class="m">4</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./lstm_transducer_stateless2/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 3</strong>: You have 4 GPUs but you only want to use GPU 3
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;3&quot;</span>
$ ./lstm_transducer_stateless2/train.py --world-size <span class="m">1</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;3&quot;</span>
$<span class="w"> </span>./lstm_transducer_stateless2/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
</div></blockquote>
@ -333,7 +333,7 @@ You will find the following files in that directory:</p>
<code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./lstm_transducer_stateless2/train.py --start-epoch <span class="m">11</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./lstm_transducer_stateless2/train.py<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">11</span>
</pre></div>
</div>
</div></blockquote>
@ -343,7 +343,7 @@ To resume training from some checkpoint, say <code class="docutils literal notra
containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">checkpoint-436000</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./lstm_transducer_stateless2/train.py --start-batch <span class="m">436000</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./lstm_transducer_stateless2/train.py<span class="w"> </span>--start-batch<span class="w"> </span><span class="m">436000</span>
</pre></div>
</div>
</div></blockquote>
@ -352,8 +352,8 @@ To resume training from some checkpoint, say <code class="docutils literal notra
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc., are recorded in these logs. You can visualize them by:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> lstm_transducer_stateless2/exp/tensorboard
$ tensorboard dev upload --logdir . --description <span class="s2">&quot;LSTM transducer training for LibriSpeech with icefall&quot;</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>lstm_transducer_stateless2/exp/tensorboard
$<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w"> </span>upload<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--description<span class="w"> </span><span class="s2">&quot;LSTM transducer training for LibriSpeech with icefall&quot;</span>
</pre></div>
</div>
</div></blockquote>
@ -390,8 +390,8 @@ the following screenshot:</p>
<p>If you don’t have access to Google, you can use the following command
to view the TensorBoard log locally:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> lstm_transducer_stateless2/exp/tensorboard
tensorboard --logdir . --port <span class="m">6008</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>lstm_transducer_stateless2/exp/tensorboard
tensorboard<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--port<span class="w"> </span><span class="m">6008</span>
</pre></div>
</div>
</div></blockquote>
@ -416,18 +416,18 @@ you saw printed to the console during training.</p>
<section id="usage-example">
<h3>Usage example<a class="headerlink" href="#usage-example" title="Permalink to this heading"></a></h3>
<p>You can use the following command to start the training using 8 GPUs:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,1,2,3,4,5,6,7&quot;</span>
./lstm_transducer_stateless2/train.py <span class="se">\</span>
--world-size <span class="m">8</span> <span class="se">\</span>
--num-epochs <span class="m">35</span> <span class="se">\</span>
--start-epoch <span class="m">1</span> <span class="se">\</span>
--full-libri <span class="m">1</span> <span class="se">\</span>
--exp-dir lstm_transducer_stateless2/exp <span class="se">\</span>
--max-duration <span class="m">500</span> <span class="se">\</span>
--use-fp16 <span class="m">0</span> <span class="se">\</span>
--lr-epochs <span class="m">10</span> <span class="se">\</span>
--num-workers <span class="m">2</span> <span class="se">\</span>
--giga-prob <span class="m">0</span>.9
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,1,2,3,4,5,6,7&quot;</span>
./lstm_transducer_stateless2/train.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--world-size<span class="w"> </span><span class="m">8</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-epochs<span class="w"> </span><span class="m">35</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--full-libri<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>lstm_transducer_stateless2/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">500</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--use-fp16<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--lr-epochs<span class="w"> </span><span class="m">10</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-workers<span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--giga-prob<span class="w"> </span><span class="m">0</span>.9
</pre></div>
</div>
</section>
@ -452,51 +452,51 @@ every <code class="docutils literal notranslate"><span class="pre">--save-every-
that produces the lowest WERs.</p>
</div></blockquote>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./lstm_transducer_stateless2/decode.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./lstm_transducer_stateless2/decode.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the options for decoding.</p>
<p>The following shows two examples:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> epoch <span class="k">in</span> <span class="m">17</span><span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> avg <span class="k">in</span> <span class="m">1</span> <span class="m">2</span><span class="p">;</span> <span class="k">do</span>
./lstm_transducer_stateless2/decode.py <span class="se">\</span>
--epoch <span class="nv">$epoch</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--exp-dir lstm_transducer_stateless2/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--num-encoder-layers <span class="m">12</span> <span class="se">\</span>
--rnn-hidden-size <span class="m">1024</span> <span class="se">\</span>
--decoding-method <span class="nv">$m</span> <span class="se">\</span>
--use-averaged-model True <span class="se">\</span>
--beam <span class="m">4</span> <span class="se">\</span>
--max-contexts <span class="m">4</span> <span class="se">\</span>
--max-states <span class="m">8</span> <span class="se">\</span>
--beam-size <span class="m">4</span>
<span class="k">done</span>
<span class="k">done</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span>m<span class="w"> </span><span class="k">in</span><span class="w"> </span>greedy_search<span class="w"> </span>fast_beam_search<span class="w"> </span>modified_beam_search<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>epoch<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">17</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>avg<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="m">2</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span>./lstm_transducer_stateless2/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="nv">$epoch</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>lstm_transducer_stateless2/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-encoder-layers<span class="w"> </span><span class="m">12</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--rnn-hidden-size<span class="w"> </span><span class="m">1024</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoding-method<span class="w"> </span><span class="nv">$m</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--use-averaged-model<span class="w"> </span>True<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--beam<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-contexts<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-states<span class="w"> </span><span class="m">8</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--beam-size<span class="w"> </span><span class="m">4</span>
<span class="w"> </span><span class="k">done</span>
<span class="w"> </span><span class="k">done</span>
<span class="k">done</span>
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> iter <span class="k">in</span> <span class="m">474000</span><span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> avg <span class="k">in</span> <span class="m">8</span> <span class="m">10</span> <span class="m">12</span> <span class="m">14</span> <span class="m">16</span> <span class="m">18</span><span class="p">;</span> <span class="k">do</span>
./lstm_transducer_stateless2/decode.py <span class="se">\</span>
--iter <span class="nv">$iter</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--exp-dir lstm_transducer_stateless2/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--num-encoder-layers <span class="m">12</span> <span class="se">\</span>
--rnn-hidden-size <span class="m">1024</span> <span class="se">\</span>
--decoding-method <span class="nv">$m</span> <span class="se">\</span>
--use-averaged-model True <span class="se">\</span>
--beam <span class="m">4</span> <span class="se">\</span>
--max-contexts <span class="m">4</span> <span class="se">\</span>
--max-states <span class="m">8</span> <span class="se">\</span>
--beam-size <span class="m">4</span>
<span class="k">done</span>
<span class="k">done</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span>m<span class="w"> </span><span class="k">in</span><span class="w"> </span>greedy_search<span class="w"> </span>fast_beam_search<span class="w"> </span>modified_beam_search<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>iter<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">474000</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>avg<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">8</span><span class="w"> </span><span class="m">10</span><span class="w"> </span><span class="m">12</span><span class="w"> </span><span class="m">14</span><span class="w"> </span><span class="m">16</span><span class="w"> </span><span class="m">18</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span>./lstm_transducer_stateless2/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--iter<span class="w"> </span><span class="nv">$iter</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>lstm_transducer_stateless2/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-encoder-layers<span class="w"> </span><span class="m">12</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--rnn-hidden-size<span class="w"> </span><span class="m">1024</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoding-method<span class="w"> </span><span class="nv">$m</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--use-averaged-model<span class="w"> </span>True<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--beam<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-contexts<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-states<span class="w"> </span><span class="m">8</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--beam-size<span class="w"> </span><span class="m">4</span>
<span class="w"> </span><span class="k">done</span>
<span class="w"> </span><span class="k">done</span>
<span class="k">done</span>
</pre></div>
</div>
@ -516,11 +516,11 @@ command to extract <code class="docutils literal notranslate"><span class="pre">
<span class="nv">iter</span><span class="o">=</span><span class="m">468000</span>
<span class="nv">avg</span><span class="o">=</span><span class="m">16</span>
./lstm_transducer_stateless2/export.py <span class="se">\</span>
--exp-dir ./lstm_transducer_stateless2/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--iter <span class="nv">$iter</span> <span class="se">\</span>
--avg <span class="nv">$avg</span>
./lstm_transducer_stateless2/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./lstm_transducer_stateless2/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--iter<span class="w"> </span><span class="nv">$iter</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span>
</pre></div>
</div>
<p>It will generate a file <code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/exp/pretrained.pt</span></code>.</p>
@ -528,8 +528,8 @@ command to extract <code class="docutils literal notranslate"><span class="pre">
<p class="admonition-title">Hint</p>
<p>To use the generated <code class="docutils literal notranslate"><span class="pre">pretrained.pt</span></code> for <code class="docutils literal notranslate"><span class="pre">lstm_transducer_stateless2/decode.py</span></code>,
you can run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> lstm_transducer_stateless2/exp
ln -s pretrained.pt epoch-9999.pt
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>lstm_transducer_stateless2/exp
ln<span class="w"> </span>-s<span class="w"> </span>pretrained.pt<span class="w"> </span>epoch-9999.pt
</pre></div>
</div>
<p>And then pass <code class="docutils literal notranslate"><span class="pre">--epoch</span> <span class="pre">9999</span> <span class="pre">--avg</span> <span class="pre">1</span> <span class="pre">--use-averaged-model</span> <span class="pre">0</span></code> to
@ -537,12 +537,12 @@ ln -s pretrained.pt epoch-9999.pt
</div>
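<p>For example, a decoding run with the symlinked checkpoint could look like the following sketch; apart from <code class="docutils literal notranslate"><span class="pre">--epoch</span> <span class="pre">9999</span> <span class="pre">--avg</span> <span class="pre">1</span> <span class="pre">--use-averaged-model</span> <span class="pre">0</span></code>, the flags simply mirror the decoding commands shown earlier:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>cd egs/librispeech/ASR
./lstm_transducer_stateless2/decode.py \
  --epoch 9999 \
  --avg 1 \
  --use-averaged-model 0 \
  --exp-dir lstm_transducer_stateless2/exp \
  --max-duration 600 \
  --num-encoder-layers 12 \
  --rnn-hidden-size 1024 \
  --decoding-method greedy_search
</pre></div>
</div>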
<p>To use the exported model with <code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/pretrained.py</span></code>, you
can run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./lstm_transducer_stateless2/pretrained.py <span class="se">\</span>
--checkpoint ./lstm_transducer_stateless2/exp/pretrained.pt <span class="se">\</span>
--bpe-model ./data/lang_bpe_500/bpe.model <span class="se">\</span>
--method greedy_search <span class="se">\</span>
/path/to/foo.wav <span class="se">\</span>
/path/to/bar.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./lstm_transducer_stateless2/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./lstm_transducer_stateless2/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>./data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>greedy_search<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/foo.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/bar.wav
</pre></div>
</div>
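<p>The test waves such as <code class="docutils literal notranslate"><span class="pre">/path/to/foo.wav</span></code> are expected to be 16 kHz, single-channel recordings, matching the data these LibriSpeech models were trained on. If your audio is in another format, one way to convert it is with <code class="docutils literal notranslate"><span class="pre">sox</span></code>; the input file name below is only a placeholder:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span># resample to 16 kHz, mix down to 1 channel, write 16-bit PCM wav
sox /path/to/input.flac -r 16000 -c 1 -b 16 /path/to/foo.wav
</pre></div>
</div>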
</section>
@ -551,12 +551,12 @@ can run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nv">iter</span><span class="o">=</span><span class="m">468000</span>
<span class="nv">avg</span><span class="o">=</span><span class="m">16</span>
./lstm_transducer_stateless2/export.py <span class="se">\</span>
--exp-dir ./lstm_transducer_stateless2/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--iter <span class="nv">$iter</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--jit-trace <span class="m">1</span>
./lstm_transducer_stateless2/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./lstm_transducer_stateless2/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--iter<span class="w"> </span><span class="nv">$iter</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--jit-trace<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
<p>It will generate 3 files:</p>
@ -568,13 +568,13 @@ can run:</p>
</ul>
</div></blockquote>
<p>To use the generated files with <code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/jit_pretrained</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./lstm_transducer_stateless2/jit_pretrained.py <span class="se">\</span>
--bpe-model ./data/lang_bpe_500/bpe.model <span class="se">\</span>
--encoder-model-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace.pt <span class="se">\</span>
--decoder-model-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace.pt <span class="se">\</span>
--joiner-model-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace.pt <span class="se">\</span>
/path/to/foo.wav <span class="se">\</span>
/path/to/bar.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./lstm_transducer_stateless2/jit_pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>./data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--encoder-model-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/encoder_jit_trace.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoder-model-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/decoder_jit_trace.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--joiner-model-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/joiner_jit_trace.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/foo.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/bar.wav
</pre></div>
</div>
<div class="admonition hint">
@ -589,19 +589,19 @@ for how to use the exported models in <code class="docutils literal notranslate"
<a class="reference external" href="https://github.com/tencent/ncnn">ncnn</a> using
<a class="reference external" href="https://github.com/Tencent/ncnn/tree/master/tools/pnnx">pnnx</a>.</p>
<p>First, let us install a modified version of <code class="docutils literal notranslate"><span class="pre">ncnn</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>git clone https://github.com/csukuangfj/ncnn
<span class="nb">cd</span> ncnn
git submodule update --recursive --init
python3 setup.py bdist_wheel
ls -lh dist/
pip install ./dist/*.whl
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/csukuangfj/ncnn
<span class="nb">cd</span><span class="w"> </span>ncnn
git<span class="w"> </span>submodule<span class="w"> </span>update<span class="w"> </span>--recursive<span class="w"> </span>--init
python3<span class="w"> </span>setup.py<span class="w"> </span>bdist_wheel
ls<span class="w"> </span>-lh<span class="w"> </span>dist/
pip<span class="w"> </span>install<span class="w"> </span>./dist/*.whl
<span class="c1"># now build pnnx</span>
<span class="nb">cd</span> tools/pnnx
mkdir build
<span class="nb">cd</span> build
make -j4
<span class="nb">export</span> <span class="nv">PATH</span><span class="o">=</span><span class="nv">$PWD</span>/src:<span class="nv">$PATH</span>
<span class="nb">cd</span><span class="w"> </span>tools/pnnx
mkdir<span class="w"> </span>build
<span class="nb">cd</span><span class="w"> </span>build
make<span class="w"> </span>-j4
<span class="nb">export</span><span class="w"> </span><span class="nv">PATH</span><span class="o">=</span><span class="nv">$PWD</span>/src:<span class="nv">$PATH</span>
./src/pnnx
</pre></div>
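<p>Note that the steps above run <code class="docutils literal notranslate"><span class="pre">make</span></code> in a freshly created <code class="docutils literal notranslate"><span class="pre">build</span></code> directory. If <code class="docutils literal notranslate"><span class="pre">make</span></code> complains that no Makefile is found, the project most likely still needs to be configured with CMake first; a sketch, assuming the standard CMake setup of <code class="docutils literal notranslate"><span class="pre">pnnx</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>cd tools/pnnx
mkdir build
cd build
cmake ..   # configure pnnx; the exact options may vary with your toolchain
make -j4
export PATH=$PWD/src:$PATH
</pre></div>
</div>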
@ -616,12 +616,12 @@ for <code class="docutils literal notranslate"><span class="pre">pnnx</span></co
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nv">iter</span><span class="o">=</span><span class="m">468000</span>
<span class="nv">avg</span><span class="o">=</span><span class="m">16</span>
./lstm_transducer_stateless2/export.py <span class="se">\</span>
--exp-dir ./lstm_transducer_stateless2/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--iter <span class="nv">$iter</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--pnnx <span class="m">1</span>
./lstm_transducer_stateless2/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./lstm_transducer_stateless2/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--iter<span class="w"> </span><span class="nv">$iter</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--pnnx<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
<p>It will generate 3 files:</p>
@ -650,26 +650,26 @@ for <code class="docutils literal notranslate"><span class="pre">pnnx</span></co
</ul>
</div></blockquote>
<p>To use the above generated files, run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./lstm_transducer_stateless2/ncnn-decode.py <span class="se">\</span>
--bpe-model-filename ./data/lang_bpe_500/bpe.model <span class="se">\</span>
--encoder-param-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.param <span class="se">\</span>
--encoder-bin-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.bin <span class="se">\</span>
--decoder-param-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.param <span class="se">\</span>
--decoder-bin-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.bin <span class="se">\</span>
--joiner-param-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.param <span class="se">\</span>
--joiner-bin-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.bin <span class="se">\</span>
/path/to/foo.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./lstm_transducer_stateless2/ncnn-decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model-filename<span class="w"> </span>./data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--encoder-param-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.param<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--encoder-bin-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.bin<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoder-param-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.param<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoder-bin-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.bin<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--joiner-param-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.param<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--joiner-bin-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.bin<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/foo.wav
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./lstm_transducer_stateless2/streaming-ncnn-decode.py <span class="se">\</span>
--bpe-model-filename ./data/lang_bpe_500/bpe.model <span class="se">\</span>
--encoder-param-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.param <span class="se">\</span>
--encoder-bin-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.bin <span class="se">\</span>
--decoder-param-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.param <span class="se">\</span>
--decoder-bin-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.bin <span class="se">\</span>
--joiner-param-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.param <span class="se">\</span>
--joiner-bin-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.bin <span class="se">\</span>
/path/to/foo.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./lstm_transducer_stateless2/streaming-ncnn-decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model-filename<span class="w"> </span>./data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--encoder-param-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.param<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--encoder-bin-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.bin<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoder-param-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.param<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoder-bin-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.bin<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--joiner-param-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.param<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--joiner-bin-filename<span class="w"> </span>./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.bin<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/foo.wav
</pre></div>
</div>
<p>To use the above generated files in C++, please see

View File

@ -145,8 +145,8 @@ That is, it has no recurrent connections.</p>
<p>The data preparation is the same as other recipes on LibriSpeech dataset,
if you have finished this step, you can skip to <code class="docutils literal notranslate"><span class="pre">Training</span></code> directly.</p>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh
</pre></div>
</div>
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
@ -161,13 +161,13 @@ options:</p>
</div></blockquote>
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
<p>For example,</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">0</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>means to run only stage 0.</p>
<p>To run stage 2 to stage 5, use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage <span class="m">2</span> --stop-stage <span class="m">5</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">2</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">5</span>
</pre></div>
</div>
<div class="admonition hint">
@ -206,8 +206,8 @@ You can see the configurable options below for their meanings or read <a class="
</div>
<section id="configurable-options">
<h3>Configurable options<a class="headerlink" href="#configurable-options" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless4/train.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./pruned_transducer_stateless4/train.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows you the training options that can be passed from the commandline.
@ -259,26 +259,26 @@ training from epoch 10, based on the state from epoch 9.</p>
<div><p><strong>Use case 1</strong>: You have 4 GPUs, but you only want to use GPU 0 and
GPU 2 for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,2&quot;</span>
$ ./pruned_transducer_stateless4/train.py --world-size <span class="m">2</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,2&quot;</span>
$<span class="w"> </span>./pruned_transducer_stateless4/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">2</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 2</strong>: You have 4 GPUs and you want to use all of them
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless4/train.py --world-size <span class="m">4</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./pruned_transducer_stateless4/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 3</strong>: You have 4 GPUs but you only want to use GPU 3
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;3&quot;</span>
$ ./pruned_transducer_stateless4/train.py --world-size <span class="m">1</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;3&quot;</span>
$<span class="w"> </span>./pruned_transducer_stateless4/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
</div></blockquote>
@ -353,7 +353,7 @@ You will find the following files in that directory:</p>
<code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./pruned_transducer_stateless4/train.py --start-epoch <span class="m">11</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./pruned_transducer_stateless4/train.py<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">11</span>
</pre></div>
</div>
</div></blockquote>
@ -363,7 +363,7 @@ To resume training from some checkpoint, say <code class="docutils literal notra
containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">checkpoint-436000</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./pruned_transducer_stateless4/train.py --start-batch <span class="m">436000</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./pruned_transducer_stateless4/train.py<span class="w"> </span>--start-batch<span class="w"> </span><span class="m">436000</span>
</pre></div>
</div>
</div></blockquote>
@ -372,8 +372,8 @@ To resume training from some checkpoint, say <code class="docutils literal notra
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc., are recorded in these logs. You can visualize them by:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> pruned_transducer_stateless4/exp/tensorboard
$ tensorboard dev upload --logdir . --description <span class="s2">&quot;pruned transducer training for LibriSpeech with icefall&quot;</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>pruned_transducer_stateless4/exp/tensorboard
$<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w"> </span>upload<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--description<span class="w"> </span><span class="s2">&quot;pruned transducer training for LibriSpeech with icefall&quot;</span>
</pre></div>
</div>
</div></blockquote>
@ -410,8 +410,8 @@ the following screenshot:</p>
<p>If you don't have access to Google, you can use the following command
to view the TensorBoard log locally:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> pruned_transducer_stateless4/exp/tensorboard
tensorboard --logdir . --port <span class="m">6008</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>pruned_transducer_stateless4/exp/tensorboard
tensorboard<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--port<span class="w"> </span><span class="m">6008</span>
</pre></div>
</div>
</div></blockquote>
@ -436,16 +436,16 @@ you saw printed to the console during training.</p>
<section id="usage-example">
<h3>Usage example<a class="headerlink" href="#usage-example" title="Permalink to this heading"></a></h3>
<p>You can use the following command to start the training using 4 GPUs:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,1,2,3&quot;</span>
./pruned_transducer_stateless4/train.py <span class="se">\</span>
--world-size <span class="m">4</span> <span class="se">\</span>
--dynamic-chunk-training <span class="m">1</span> <span class="se">\</span>
--causal-convolution <span class="m">1</span> <span class="se">\</span>
--num-epochs <span class="m">30</span> <span class="se">\</span>
--start-epoch <span class="m">1</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless4/exp <span class="se">\</span>
--full-libri <span class="m">1</span> <span class="se">\</span>
--max-duration <span class="m">300</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,1,2,3&quot;</span>
./pruned_transducer_stateless4/train.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--dynamic-chunk-training<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--causal-convolution<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-epochs<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless4/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--full-libri<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">300</span>
</pre></div>
</div>
<div class="admonition note">
@ -489,8 +489,8 @@ produce almost the same results given the same <code class="docutils literal not
</div>
<section id="simulate-streaming-decoding">
<h3>Simulate streaming decoding<a class="headerlink" href="#simulate-streaming-decoding" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless4/decode.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./pruned_transducer_stateless4/decode.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the options for decoding.
@ -525,47 +525,47 @@ the attention mask.</p>
</div></blockquote>
</div></blockquote>
<p>The following shows two examples (for the two types of checkpoints):</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> epoch <span class="k">in</span> <span class="m">25</span> <span class="m">20</span><span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> avg <span class="k">in</span> <span class="m">7</span> <span class="m">5</span> <span class="m">3</span> <span class="m">1</span><span class="p">;</span> <span class="k">do</span>
./pruned_transducer_stateless4/decode.py <span class="se">\</span>
--epoch <span class="nv">$epoch</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--simulate-streaming <span class="m">1</span> <span class="se">\</span>
--causal-convolution <span class="m">1</span> <span class="se">\</span>
--decode-chunk-size <span class="m">16</span> <span class="se">\</span>
--left-context <span class="m">64</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless4/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--decoding-method <span class="nv">$m</span>
<span class="k">done</span>
<span class="k">done</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span>m<span class="w"> </span><span class="k">in</span><span class="w"> </span>greedy_search<span class="w"> </span>fast_beam_search<span class="w"> </span>modified_beam_search<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>epoch<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">25</span><span class="w"> </span><span class="m">20</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>avg<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">7</span><span class="w"> </span><span class="m">5</span><span class="w"> </span><span class="m">3</span><span class="w"> </span><span class="m">1</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span>./pruned_transducer_stateless4/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="nv">$epoch</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--simulate-streaming<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--causal-convolution<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decode-chunk-size<span class="w"> </span><span class="m">16</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--left-context<span class="w"> </span><span class="m">64</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless4/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoding-method<span class="w"> </span><span class="nv">$m</span>
<span class="w"> </span><span class="k">done</span>
<span class="w"> </span><span class="k">done</span>
<span class="k">done</span>
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> iter <span class="k">in</span> <span class="m">474000</span><span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> avg <span class="k">in</span> <span class="m">8</span> <span class="m">10</span> <span class="m">12</span> <span class="m">14</span> <span class="m">16</span> <span class="m">18</span><span class="p">;</span> <span class="k">do</span>
./pruned_transducer_stateless4/decode.py <span class="se">\</span>
--iter <span class="nv">$iter</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--simulate-streaming <span class="m">1</span> <span class="se">\</span>
--causal-convolution <span class="m">1</span> <span class="se">\</span>
--decode-chunk-size <span class="m">16</span> <span class="se">\</span>
--left-context <span class="m">64</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless4/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--decoding-method <span class="nv">$m</span>
<span class="k">done</span>
<span class="k">done</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span>m<span class="w"> </span><span class="k">in</span><span class="w"> </span>greedy_search<span class="w"> </span>fast_beam_search<span class="w"> </span>modified_beam_search<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>iter<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">474000</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>avg<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">8</span><span class="w"> </span><span class="m">10</span><span class="w"> </span><span class="m">12</span><span class="w"> </span><span class="m">14</span><span class="w"> </span><span class="m">16</span><span class="w"> </span><span class="m">18</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span>./pruned_transducer_stateless4/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--iter<span class="w"> </span><span class="nv">$iter</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--simulate-streaming<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--causal-convolution<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decode-chunk-size<span class="w"> </span><span class="m">16</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--left-context<span class="w"> </span><span class="m">64</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless4/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoding-method<span class="w"> </span><span class="nv">$m</span>
<span class="w"> </span><span class="k">done</span>
<span class="w"> </span><span class="k">done</span>
<span class="k">done</span>
</pre></div>
</div>
</section>
<section id="real-streaming-decoding">
<h3>Real streaming decoding<a class="headerlink" href="#real-streaming-decoding" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless4/streaming_decode.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./pruned_transducer_stateless4/streaming_decode.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the options for decoding.
@ -599,37 +599,37 @@ the performance for all the models, the reasons might be the training and decodi
can try decoding with <code class="docutils literal notranslate"><span class="pre">--right-context</span></code> to see if it helps. The default value is 0.</p>
</div>
<p>The following shows two examples (for the two types of checkpoints):</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> epoch <span class="k">in</span> <span class="m">25</span> <span class="m">20</span><span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> avg <span class="k">in</span> <span class="m">7</span> <span class="m">5</span> <span class="m">3</span> <span class="m">1</span><span class="p">;</span> <span class="k">do</span>
./pruned_transducer_stateless4/streaming_decode.py <span class="se">\</span>
--epoch <span class="nv">$epoch</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--decode-chunk-size <span class="m">16</span> <span class="se">\</span>
--left-context <span class="m">64</span> <span class="se">\</span>
--num-decode-streams <span class="m">100</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless4/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--decoding-method <span class="nv">$m</span>
<span class="k">done</span>
<span class="k">done</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span>m<span class="w"> </span><span class="k">in</span><span class="w"> </span>greedy_search<span class="w"> </span>fast_beam_search<span class="w"> </span>modified_beam_search<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>epoch<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">25</span><span class="w"> </span><span class="m">20</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>avg<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">7</span><span class="w"> </span><span class="m">5</span><span class="w"> </span><span class="m">3</span><span class="w"> </span><span class="m">1</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span>./pruned_transducer_stateless4/streaming_decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="nv">$epoch</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decode-chunk-size<span class="w"> </span><span class="m">16</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--left-context<span class="w"> </span><span class="m">64</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-decode-streams<span class="w"> </span><span class="m">100</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless4/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoding-method<span class="w"> </span><span class="nv">$m</span>
<span class="w"> </span><span class="k">done</span>
<span class="w"> </span><span class="k">done</span>
<span class="k">done</span>
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> iter <span class="k">in</span> <span class="m">474000</span><span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> avg <span class="k">in</span> <span class="m">8</span> <span class="m">10</span> <span class="m">12</span> <span class="m">14</span> <span class="m">16</span> <span class="m">18</span><span class="p">;</span> <span class="k">do</span>
./pruned_transducer_stateless4/streaming_decode.py <span class="se">\</span>
--iter <span class="nv">$iter</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--decode-chunk-size <span class="m">16</span> <span class="se">\</span>
--left-context <span class="m">64</span> <span class="se">\</span>
--num-decode-streams <span class="m">100</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless4/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--decoding-method <span class="nv">$m</span>
<span class="k">done</span>
<span class="k">done</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span>m<span class="w"> </span><span class="k">in</span><span class="w"> </span>greedy_search<span class="w"> </span>fast_beam_search<span class="w"> </span>modified_beam_search<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>iter<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">474000</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>avg<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">8</span><span class="w"> </span><span class="m">10</span><span class="w"> </span><span class="m">12</span><span class="w"> </span><span class="m">14</span><span class="w"> </span><span class="m">16</span><span class="w"> </span><span class="m">18</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span>./pruned_transducer_stateless4/streaming_decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--iter<span class="w"> </span><span class="nv">$iter</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decode-chunk-size<span class="w"> </span><span class="m">16</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--left-context<span class="w"> </span><span class="m">64</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-decode-streams<span class="w"> </span><span class="m">100</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless4/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoding-method<span class="w"> </span><span class="nv">$m</span>
<span class="w"> </span><span class="k">done</span>
<span class="w"> </span><span class="k">done</span>
<span class="k">done</span>
</pre></div>
</div>
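<p>If you want to experiment with the <code class="docutils literal notranslate"><span class="pre">--right-context</span></code> option mentioned above, you can append it to the command; the sketch below uses a value of 16 purely as an illustration, while the remaining flags mirror the loops above:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless4/streaming_decode.py \
  --epoch 25 \
  --avg 3 \
  --decode-chunk-size 16 \
  --left-context 64 \
  --right-context 16 \
  --num-decode-streams 100 \
  --exp-dir pruned_transducer_stateless4/exp \
  --max-duration 600 \
  --decoding-method greedy_search
</pre></div>
</div>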
@ -704,13 +704,13 @@ command to extract <code class="docutils literal notranslate"><span class="pre">
<span class="nv">epoch</span><span class="o">=</span><span class="m">25</span>
<span class="nv">avg</span><span class="o">=</span><span class="m">3</span>
./pruned_transducer_stateless4/export.py <span class="se">\</span>
--exp-dir ./pruned_transducer_stateless4/exp <span class="se">\</span>
--streaming-model <span class="m">1</span> <span class="se">\</span>
--causal-convolution <span class="m">1</span> <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--epoch <span class="nv">$epoch</span> <span class="se">\</span>
--avg <span class="nv">$avg</span>
./pruned_transducer_stateless4/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./pruned_transducer_stateless4/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--streaming-model<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--causal-convolution<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="nv">$epoch</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span>
</pre></div>
</div>
<div class="admonition caution">
@ -723,8 +723,8 @@ a streaming model.</p>
<p class="admonition-title">Hint</p>
<p>To use the generated <code class="docutils literal notranslate"><span class="pre">pretrained.pt</span></code> for <code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless4/decode.py</span></code>,
you can run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> pruned_transducer_stateless4/exp
ln -s pretrained.pt epoch-999.pt
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>pruned_transducer_stateless4/exp
ln<span class="w"> </span>-s<span class="w"> </span>pretrained.pt<span class="w"> </span>epoch-999.pt
</pre></div>
</div>
<p>And then pass <code class="docutils literal notranslate"><span class="pre">--epoch</span> <span class="pre">999</span> <span class="pre">--avg</span> <span class="pre">1</span> <span class="pre">--use-averaged-model</span> <span class="pre">0</span></code> to
@ -732,27 +732,27 @@ ln -s pretrained.pt epoch-999.pt
</div>
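<p>For example, a decoding run with the symlinked checkpoint could look like the sketch below; apart from <code class="docutils literal notranslate"><span class="pre">--epoch</span> <span class="pre">999</span> <span class="pre">--avg</span> <span class="pre">1</span> <span class="pre">--use-averaged-model</span> <span class="pre">0</span></code>, the flags mirror the simulated streaming decoding commands shown earlier:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>cd egs/librispeech/ASR
./pruned_transducer_stateless4/decode.py \
  --epoch 999 \
  --avg 1 \
  --use-averaged-model 0 \
  --simulate-streaming 1 \
  --causal-convolution 1 \
  --decode-chunk-size 16 \
  --left-context 64 \
  --exp-dir pruned_transducer_stateless4/exp \
  --max-duration 600 \
  --decoding-method greedy_search
</pre></div>
</div>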
<p>To use the exported model with <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless4/pretrained.py</span></code>, you
can run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless4/pretrained.py <span class="se">\</span>
--checkpoint ./pruned_transducer_stateless4/exp/pretrained.pt <span class="se">\</span>
--simulate-streaming <span class="m">1</span> <span class="se">\</span>
--causal-convolution <span class="m">1</span> <span class="se">\</span>
--bpe-model ./data/lang_bpe_500/bpe.model <span class="se">\</span>
--method greedy_search <span class="se">\</span>
/path/to/foo.wav <span class="se">\</span>
/path/to/bar.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless4/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./pruned_transducer_stateless4/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--simulate-streaming<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--causal-convolution<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>./data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>greedy_search<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/foo.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/bar.wav
</pre></div>
</div>
</section>
<section id="export-model-using-torch-jit-script">
<h3>Export model using <code class="docutils literal notranslate"><span class="pre">torch.jit.script()</span></code><a class="headerlink" href="#export-model-using-torch-jit-script" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless4/export.py <span class="se">\</span>
--exp-dir ./pruned_transducer_stateless4/exp <span class="se">\</span>
--streaming-model <span class="m">1</span> <span class="se">\</span>
--causal-convolution <span class="m">1</span> <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--epoch <span class="m">25</span> <span class="se">\</span>
--avg <span class="m">3</span> <span class="se">\</span>
--jit <span class="m">1</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless4/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./pruned_transducer_stateless4/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--streaming-model<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--causal-convolution<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">25</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="m">3</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--jit<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
<div class="admonition caution">

View File

@ -141,8 +141,8 @@ That is, it has no recurrent connections.</p>
<p>The data preparation is the same as other recipes on LibriSpeech dataset,
if you have finished this step, you can skip to <code class="docutils literal notranslate"><span class="pre">Training</span></code> directly.</p>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh
</pre></div>
</div>
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
@ -157,13 +157,13 @@ options:</p>
</div></blockquote>
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
<p>For example,</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">0</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">0</span>
</pre></div>
</div>
<p>means to run only stage 0.</p>
<p>To run stage 2 to stage 5, use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage <span class="m">2</span> --stop-stage <span class="m">5</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">2</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">5</span>
</pre></div>
</div>
<div class="admonition hint">
@ -195,8 +195,8 @@ the following YouTube channel by <a class="reference external" href="https://www
<h2>Training<a class="headerlink" href="#training" title="Permalink to this heading"></a></h2>
<section id="configurable-options">
<h3>Configurable options<a class="headerlink" href="#configurable-options" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless7_streaming/train.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./pruned_transducer_stateless7_streaming/train.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows you the training options that can be passed from the commandline.
@@ -248,26 +248,26 @@ training from epoch 10, based on the state from epoch 9.</p>
<div><p><strong>Use case 1</strong>: You have 4 GPUs, but you only want to use GPU 0 and
GPU 2 for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,2&quot;</span>
$ ./pruned_transducer_stateless7_streaming/train.py --world-size <span class="m">2</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,2&quot;</span>
$<span class="w"> </span>./pruned_transducer_stateless7_streaming/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">2</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 2</strong>: You have 4 GPUs and you want to use all of them
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless7_streaming/train.py --world-size <span class="m">4</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./pruned_transducer_stateless7_streaming/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 3</strong>: You have 4 GPUs but you only want to use GPU 3
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;3&quot;</span>
$ ./pruned_transducer_stateless7_streaming/train.py --world-size <span class="m">1</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;3&quot;</span>
$<span class="w"> </span>./pruned_transducer_stateless7_streaming/train.py<span class="w"> </span>--world-size<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
</div></blockquote>
@@ -334,7 +334,7 @@ You will find the following files in that directory:</p>
<code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./pruned_transducer_stateless7_streaming/train.py --start-epoch <span class="m">11</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./pruned_transducer_stateless7_streaming/train.py<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">11</span>
</pre></div>
</div>
</div></blockquote>
@@ -344,7 +344,7 @@ To resume training from some checkpoint, say <code class="docutils literal notra
containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">checkpoint-436000</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./pruned_transducer_stateless7_streaming/train.py --start-batch <span class="m">436000</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./pruned_transducer_stateless7_streaming/train.py<span class="w"> </span>--start-batch<span class="w"> </span><span class="m">436000</span>
</pre></div>
</div>
</div></blockquote>
@@ -353,8 +353,8 @@ To resume training from some checkpoint, say <code class="docutils literal notra
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc., are recorded in these logs. You can visualize them by:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> pruned_transducer_stateless7_streaming/exp/tensorboard
$ tensorboard dev upload --logdir . --description <span class="s2">&quot;pruned transducer training for LibriSpeech with icefall&quot;</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>pruned_transducer_stateless7_streaming/exp/tensorboard
$<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w"> </span>upload<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--description<span class="w"> </span><span class="s2">&quot;pruned transducer training for LibriSpeech with icefall&quot;</span>
</pre></div>
</div>
</div></blockquote>
@@ -365,8 +365,8 @@ $ tensorboard dev upload --logdir . --description <span class="s2">&quot;pruned
<p>If you don't have access to Google, you can use the following command
to view the TensorBoard log locally:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> pruned_transducer_stateless7_streaming/exp/tensorboard
tensorboard --logdir . --port <span class="m">6008</span>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>pruned_transducer_stateless7_streaming/exp/tensorboard
tensorboard<span class="w"> </span>--logdir<span class="w"> </span>.<span class="w"> </span>--port<span class="w"> </span><span class="m">6008</span>
</pre></div>
</div>
</div></blockquote>
@@ -391,15 +391,15 @@ you saw printed to the console during training.</p>
<section id="usage-example">
<h3>Usage example<a class="headerlink" href="#usage-example" title="Permalink to this heading"></a></h3>
<p>You can use the following command to start the training using 4 GPUs:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,1,2,3&quot;</span>
./pruned_transducer_stateless7_streaming/train.py <span class="se">\</span>
--world-size <span class="m">4</span> <span class="se">\</span>
--num-epochs <span class="m">30</span> <span class="se">\</span>
--start-epoch <span class="m">1</span> <span class="se">\</span>
--use-fp16 <span class="m">1</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless7_streaming/exp <span class="se">\</span>
--full-libri <span class="m">1</span> <span class="se">\</span>
--max-duration <span class="m">550</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,1,2,3&quot;</span>
./pruned_transducer_stateless7_streaming/train.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-epochs<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--use-fp16<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless7_streaming/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--full-libri<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">550</span>
</pre></div>
</div>
</section>
@@ -438,8 +438,8 @@ produce almost the same results given the same <code class="docutils literal not
</div>
<section id="simulate-streaming-decoding">
<h3>Simulate streaming decoding<a class="headerlink" href="#simulate-streaming-decoding" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless7_streaming/decode.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./pruned_transducer_stateless7_streaming/decode.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the options for decoding.
@@ -452,41 +452,41 @@ The default value is 32 (i.e., 320ms).</p>
</div></blockquote>
</div></blockquote>
<p>The following shows two examples (for the two types of checkpoints):</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> epoch <span class="k">in</span> <span class="m">30</span><span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> avg <span class="k">in</span> <span class="m">12</span> <span class="m">11</span> <span class="m">10</span> <span class="m">9</span> <span class="m">8</span><span class="p">;</span> <span class="k">do</span>
./pruned_transducer_stateless7_streaming/decode.py <span class="se">\</span>
--epoch <span class="nv">$epoch</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--decode-chunk-len <span class="m">32</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless7_streaming/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--decoding-method <span class="nv">$m</span>
<span class="k">done</span>
<span class="k">done</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span>m<span class="w"> </span><span class="k">in</span><span class="w"> </span>greedy_search<span class="w"> </span>fast_beam_search<span class="w"> </span>modified_beam_search<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>epoch<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">30</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>avg<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">12</span><span class="w"> </span><span class="m">11</span><span class="w"> </span><span class="m">10</span><span class="w"> </span><span class="m">9</span><span class="w"> </span><span class="m">8</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span>./pruned_transducer_stateless7_streaming/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="nv">$epoch</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decode-chunk-len<span class="w"> </span><span class="m">32</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless7_streaming/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoding-method<span class="w"> </span><span class="nv">$m</span>
<span class="w"> </span><span class="k">done</span>
<span class="w"> </span><span class="k">done</span>
<span class="k">done</span>
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> iter <span class="k">in</span> <span class="m">474000</span><span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> avg <span class="k">in</span> <span class="m">8</span> <span class="m">10</span> <span class="m">12</span> <span class="m">14</span> <span class="m">16</span> <span class="m">18</span><span class="p">;</span> <span class="k">do</span>
./pruned_transducer_stateless7_streaming/decode.py <span class="se">\</span>
--iter <span class="nv">$iter</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--decode-chunk-len <span class="m">32</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless7_streaming/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--decoding-method <span class="nv">$m</span>
<span class="k">done</span>
<span class="k">done</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span>m<span class="w"> </span><span class="k">in</span><span class="w"> </span>greedy_search<span class="w"> </span>fast_beam_search<span class="w"> </span>modified_beam_search<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>iter<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">474000</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>avg<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">8</span><span class="w"> </span><span class="m">10</span><span class="w"> </span><span class="m">12</span><span class="w"> </span><span class="m">14</span><span class="w"> </span><span class="m">16</span><span class="w"> </span><span class="m">18</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span>./pruned_transducer_stateless7_streaming/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--iter<span class="w"> </span><span class="nv">$iter</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decode-chunk-len<span class="w"> </span><span class="m">32</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless7_streaming/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoding-method<span class="w"> </span><span class="nv">$m</span>
<span class="w"> </span><span class="k">done</span>
<span class="w"> </span><span class="k">done</span>
<span class="k">done</span>
</pre></div>
</div>
</section>
<section id="real-streaming-decoding">
<h3>Real streaming decoding<a class="headerlink" href="#real-streaming-decoding" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless7_streaming/streaming_decode.py --help
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./pruned_transducer_stateless7_streaming/streaming_decode.py<span class="w"> </span>--help
</pre></div>
</div>
<p>shows the options for decoding.
@@ -507,33 +507,33 @@ suppose sequence 1 and 2 are done, so, sequence 3 to 12 will be processed parall
</div></blockquote>
</div></blockquote>
<p>The following shows two examples (for the two types of checkpoints):</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> epoch <span class="k">in</span> <span class="m">30</span><span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> avg <span class="k">in</span> <span class="m">12</span> <span class="m">11</span> <span class="m">10</span> <span class="m">9</span> <span class="m">8</span><span class="p">;</span> <span class="k">do</span>
      ./pruned_transducer_stateless7_streaming/streaming_decode.py <span class="se">\</span>
--epoch <span class="nv">$epoch</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--decode-chunk-len <span class="m">32</span> <span class="se">\</span>
--num-decode-streams <span class="m">100</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless7_streaming/exp <span class="se">\</span>
--decoding-method <span class="nv">$m</span>
<span class="k">done</span>
<span class="k">done</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span>m<span class="w"> </span><span class="k">in</span><span class="w"> </span>greedy_search<span class="w"> </span>fast_beam_search<span class="w"> </span>modified_beam_search<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>epoch<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">30</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>avg<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">12</span><span class="w"> </span><span class="m">11</span><span class="w"> </span><span class="m">10</span><span class="w"> </span><span class="m">9</span><span class="w"> </span><span class="m">8</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span>./pruned_transducer_stateless7_streaming/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="nv">$epoch</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decode-chunk-len<span class="w"> </span><span class="m">32</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-decode-streams<span class="w"> </span><span class="m">100</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless7_streaming/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoding-method<span class="w"> </span><span class="nv">$m</span>
<span class="w"> </span><span class="k">done</span>
<span class="w"> </span><span class="k">done</span>
<span class="k">done</span>
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> iter <span class="k">in</span> <span class="m">474000</span><span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> avg <span class="k">in</span> <span class="m">8</span> <span class="m">10</span> <span class="m">12</span> <span class="m">14</span> <span class="m">16</span> <span class="m">18</span><span class="p">;</span> <span class="k">do</span>
      ./pruned_transducer_stateless7_streaming/streaming_decode.py <span class="se">\</span>
--iter <span class="nv">$iter</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--decode-chunk-len <span class="m">16</span> <span class="se">\</span>
--num-decode-streams <span class="m">100</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless7_streaming/exp <span class="se">\</span>
--decoding-method <span class="nv">$m</span>
<span class="k">done</span>
<span class="k">done</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span>m<span class="w"> </span><span class="k">in</span><span class="w"> </span>greedy_search<span class="w"> </span>fast_beam_search<span class="w"> </span>modified_beam_search<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>iter<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">474000</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span>avg<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">8</span><span class="w"> </span><span class="m">10</span><span class="w"> </span><span class="m">12</span><span class="w"> </span><span class="m">14</span><span class="w"> </span><span class="m">16</span><span class="w"> </span><span class="m">18</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span>./pruned_transducer_stateless7_streaming/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--iter<span class="w"> </span><span class="nv">$iter</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decode-chunk-len<span class="w"> </span><span class="m">16</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-decode-streams<span class="w"> </span><span class="m">100</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>pruned_transducer_stateless7_streaming/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoding-method<span class="w"> </span><span class="nv">$m</span>
<span class="w"> </span><span class="k">done</span>
<span class="w"> </span><span class="k">done</span>
<span class="k">done</span>
</pre></div>
</div>
@@ -608,13 +608,13 @@ command to extract <code class="docutils literal notranslate"><span class="pre">
<span class="nv">epoch</span><span class="o">=</span><span class="m">30</span>
<span class="nv">avg</span><span class="o">=</span><span class="m">9</span>
./pruned_transducer_stateless7_streaming/export.py <span class="se">\</span>
--exp-dir ./pruned_transducer_stateless7_streaming/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--epoch <span class="nv">$epoch</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--use-averaged-model<span class="o">=</span>True <span class="se">\</span>
--decode-chunk-len <span class="m">32</span>
./pruned_transducer_stateless7_streaming/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./pruned_transducer_stateless7_streaming/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="nv">$epoch</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--use-averaged-model<span class="o">=</span>True<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decode-chunk-len<span class="w"> </span><span class="m">32</span>
</pre></div>
</div>
<p>It will generate a file <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless7_streaming/exp/pretrained.pt</span></code>.</p>
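<p>As a quick, optional check (not part of the export command itself), you can confirm that the file was written:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span># Confirm that the exported checkpoint exists and check its size.
ls -lh ./pruned_transducer_stateless7_streaming/exp/pretrained.pt
</pre></div>
</div>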
@@ -622,8 +622,8 @@ command to extract <code class="docutils literal notranslate"><span class="pre">
<p class="admonition-title">Hint</p>
<p>To use the generated <code class="docutils literal notranslate"><span class="pre">pretrained.pt</span></code> for <code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless7_streaming/decode.py</span></code>,
you can run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> pruned_transducer_stateless7_streaming/exp
ln -s pretrained.pt epoch-999.pt
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>pruned_transducer_stateless7_streaming/exp
ln<span class="w"> </span>-s<span class="w"> </span>pretrained.pt<span class="w"> </span>epoch-999.pt
</pre></div>
</div>
<p>And then pass <code class="docutils literal notranslate"><span class="pre">--epoch</span> <span class="pre">999</span> <span class="pre">--avg</span> <span class="pre">1</span> <span class="pre">--use-averaged-model</span> <span class="pre">0</span></code> to
@@ -631,25 +631,25 @@ ln -s pretrained.pt epoch-999.pt
</div>
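<p>Putting the hint together, a decoding run with the symlinked checkpoint might look like the sketch below; apart from <code class="docutils literal notranslate"><span class="pre">--epoch</span> <span class="pre">999</span> <span class="pre">--avg</span> <span class="pre">1</span> <span class="pre">--use-averaged-model</span> <span class="pre">0</span></code>, the flags simply mirror the earlier decoding examples and can be adjusted:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span># A minimal sketch, not taken verbatim from the recipe; adjust flags as needed.
./pruned_transducer_stateless7_streaming/decode.py \
  --epoch 999 \
  --avg 1 \
  --use-averaged-model 0 \
  --decode-chunk-len 32 \
  --exp-dir pruned_transducer_stateless7_streaming/exp \
  --max-duration 600 \
  --decoding-method greedy_search
</pre></div>
</div>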
<p>To use the exported model with <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless7_streaming/pretrained.py</span></code>, you
can run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_streaming/pretrained.py <span class="se">\</span>
--checkpoint ./pruned_transducer_stateless7_streaming/exp/pretrained.pt <span class="se">\</span>
--bpe-model ./data/lang_bpe_500/bpe.model <span class="se">\</span>
--method greedy_search <span class="se">\</span>
--decode-chunk-len <span class="m">32</span> <span class="se">\</span>
/path/to/foo.wav <span class="se">\</span>
/path/to/bar.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_streaming/pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--checkpoint<span class="w"> </span>./pruned_transducer_stateless7_streaming/exp/pretrained.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>./data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--method<span class="w"> </span>greedy_search<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decode-chunk-len<span class="w"> </span><span class="m">32</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/foo.wav<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/bar.wav
</pre></div>
</div>
</section>
<section id="export-model-using-torch-jit-script">
<h3>Export model using <code class="docutils literal notranslate"><span class="pre">torch.jit.script()</span></code><a class="headerlink" href="#export-model-using-torch-jit-script" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_streaming/export.py <span class="se">\</span>
--exp-dir ./pruned_transducer_stateless7_streaming/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--epoch <span class="m">30</span> <span class="se">\</span>
--avg <span class="m">9</span> <span class="se">\</span>
--decode-chunk-len <span class="m">32</span> <span class="se">\</span>
--jit <span class="m">1</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_streaming/export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./pruned_transducer_stateless7_streaming/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="m">9</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decode-chunk-len<span class="w"> </span><span class="m">32</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--jit<span class="w"> </span><span class="m">1</span>
</pre></div>
</div>
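<p>As a quick, optional sanity check, you can try loading the exported TorchScript model with <code class="docutils literal notranslate"><span class="pre">torch.jit.load()</span></code>. The filename <code class="docutils literal notranslate"><span class="pre">cpu_jit.pt</span></code> below is an assumption; use the output path printed by <code class="docutils literal notranslate"><span class="pre">export.py</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span># Load the scripted model on CPU to make sure the export is usable.
# NOTE: cpu_jit.pt is an assumed filename; check the log of export.py for the real one.
python3 -c "import torch; m = torch.jit.load('pruned_transducer_stateless7_streaming/exp/cpu_jit.pt'); print(type(m))"
</pre></div>
</div>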
<div class="admonition caution">
@@ -666,13 +666,13 @@ are on CPU. You can use <code class="docutils literal notranslate"><span class="
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nv">epoch</span><span class="o">=</span><span class="m">30</span>
<span class="nv">avg</span><span class="o">=</span><span class="m">9</span>
./pruned_transducer_stateless7_streaming/jit_trace_export.py <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--use-averaged-model<span class="o">=</span>True <span class="se">\</span>
--decode-chunk-len <span class="m">32</span> <span class="se">\</span>
--exp-dir ./pruned_transducer_stateless7_streaming/exp <span class="se">\</span>
--epoch <span class="nv">$epoch</span> <span class="se">\</span>
--avg <span class="nv">$avg</span>
./pruned_transducer_stateless7_streaming/jit_trace_export.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--use-averaged-model<span class="o">=</span>True<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decode-chunk-len<span class="w"> </span><span class="m">32</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./pruned_transducer_stateless7_streaming/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="nv">$epoch</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="nv">$avg</span>
</pre></div>
</div>
<div class="admonition caution">
@@ -688,13 +688,13 @@ are on CPU. You can use <code class="docutils literal notranslate"><span class="
</ul>
</div></blockquote>
<p>To use the generated files with <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless7_streaming/jit_trace_pretrained.py</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_streaming/jit_trace_pretrained.py <span class="se">\</span>
--encoder-model-filename ./pruned_transducer_stateless7_streaming/exp/encoder_jit_trace.pt <span class="se">\</span>
--decoder-model-filename ./pruned_transducer_stateless7_streaming/exp/decoder_jit_trace.pt <span class="se">\</span>
--joiner-model-filename ./pruned_transducer_stateless7_streaming/exp/joiner_jit_trace.pt <span class="se">\</span>
--bpe-model ./data/lang_bpe_500/bpe.model <span class="se">\</span>
--decode-chunk-len <span class="m">32</span> <span class="se">\</span>
/path/to/foo.wav
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_streaming/jit_trace_pretrained.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--encoder-model-filename<span class="w"> </span>./pruned_transducer_stateless7_streaming/exp/encoder_jit_trace.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoder-model-filename<span class="w"> </span>./pruned_transducer_stateless7_streaming/exp/decoder_jit_trace.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--joiner-model-filename<span class="w"> </span>./pruned_transducer_stateless7_streaming/exp/joiner_jit_trace.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>./data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decode-chunk-len<span class="w"> </span><span class="m">32</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>/path/to/foo.wav
</pre></div>
</div>
</section>