deploy: 97f9b9c33b9e3d4a7152c45f28dec397202aabb6

This commit is contained in:
marcoyang1998 2023-09-25 03:55:03 +00:00
parent b846e1a5c6
commit e77ea61c46
38 changed files with 509 additions and 15 deletions

View File

@ -2,12 +2,13 @@ Decoding with language models
=============================
This section describes how to use external langugage models
during decoding to improve the WER of transducer models.
during decoding to improve the WER of transducer models. To train an external language model,
please refer to this tutorial: :ref:`train_nnlm`.
The following decoding methods with external langugage models are available:
.. list-table:: LM-rescoring-based methods vs shallow-fusion-based methods (The numbers in each field is WER on test-clean, WER on test-other and decoding time on test-clean)
.. list-table::
:widths: 25 50
:header-rows: 1

View File

@ -0,0 +1,7 @@
RNN-LM
======
.. toctree::
:maxdepth: 2
librispeech/lm-training

View File

@ -0,0 +1,104 @@
.. _train_nnlm:
Train an RNN langugage model
======================================
If you have enough text data, you can train a neural network language model (NNLM) to improve
the WER of your E2E ASR system. This tutorial shows you how to train an RNNLM from
scratch.
.. HINT::
For how to use an NNLM during decoding, please refer to the following tutorials:
:ref:`shallow_fusion`, :ref:`LODR`, :ref:`rescoring`
.. note::
This tutorial is based on the LibriSpeech recipe. Please check it out for the necessary
python scripts for this tutorial. We use the LibriSpeech LM-corpus as the LM training set
for illustration purpose. You can also collect your own data. The data format is quite simple:
each line should contain a complete sentence, and words should be separated by space.
First, let's download the training data for the RNNLM. This can be done via the
following command:
.. code-block:: bash
$ wget https://www.openslr.org/resources/11/librispeech-lm-norm.txt.gz
$ gzip -d librispeech-lm-norm.txt.gz
As we are training a BPE-level RNNLM, we need to tokenize the training text, which requires a
BPE tokenizer. This can be achieved by executing the following command:
.. code-block:: bash
$ # if you don't have the BPE
$ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Zengwei/icefall-asr-librispeech-zipformer-2023-05-15
$ cd icefall-asr-librispeech-zipformer-2023-05-15/data/lang_bpe_500
$ git lfs pull --include bpe.model
$ cd ../../..
$ ./local/prepare_lm_training_data.py \
--bpe-model icefall-asr-librispeech-zipformer-2023-05-15/data/lang_bpe_500/bpe.model \
--lm-data librispeech-lm-norm.txt \
--lm-archive data/lang_bpe_500/lm_data.pt
Now, you should have a file name ``lm_data.pt`` file store under the directory ``data/lang_bpe_500``.
This is the packed training data for the RNNLM. We then sort the training data according to its
sentence length.
.. code-block:: bash
$ # This could take a while (~ 20 minutes), feel free to grab a cup of coffee :)
$ ./local/sort_lm_training_data.py \
--in-lm-data data/lang_bpe_500/lm_data.pt \
--out-lm-data data/lang_bpe_500/sorted_lm_data.pt \
--out-statistics data/lang_bpe_500/lm_data_stats.txt
The aforementioned steps can be repeated to create a a validation set for you RNNLM. Let's say
you have a validation set in ``valid.txt``, you can just set ``--lm-data valid.txt``
and ``--lm-archive data/lang_bpe_500/lm-data-valid.pt`` when calling ``./local/prepare_lm_training_data.py``.
After completing the previous steps, the training and testing sets for training RNNLM are ready.
The next step is to train the RNNLM model. The training command is as follows:
.. code-block:: bash
$ # assume you are in the icefall root directory
$ cd rnn_lm
$ ln -s ../../egs/librispeech/ASR/data .
$ cd ..
$ ./rnn_lm/train.py \
--world-size 4 \
--exp-dir ./rnn_lm/exp \
--start-epoch 0 \
--num-epochs 10 \
--use-fp16 0 \
--tie-weights 1 \
--embedding-dim 2048 \
--hidden_dim 2048 \
--num-layers 3 \
--batch-size 300 \
--lm-data rnn_lm/data/lang_bpe_500/sorted_lm_data.pt \
--lm-data-valid rnn_lm/data/lang_bpe_500/sorted_lm_data.pt
.. note::
You can adjust the RNNLM hyper parameters to control the size of the RNNLM,
such as embedding dimension and hidden state dimension. For more details, please
run ``./rnn_lm/train.py --help``.
.. note::
The training of RNNLM can take a long time (usually a couple of days).

View File

@ -15,3 +15,4 @@ We may add recipes for other tasks as well in the future.
Non-streaming-ASR/index
Streaming-ASR/index
RNN-LM/index

View File

@ -20,7 +20,7 @@
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="next" title="Contributing to Documentation" href="doc.html" />
<link rel="prev" title="Zipformer Transducer" href="../recipes/Streaming-ASR/librispeech/zipformer_transducer.html" />
<link rel="prev" title="Train an RNN langugage model" href="../recipes/RNN-LM/librispeech/lm-training.html" />
</head>
<body class="wy-body-for-nav">
@ -133,7 +133,7 @@ and code to <code class="docutils literal notranslate"><span class="pre">icefall
</div>
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="../recipes/Streaming-ASR/librispeech/zipformer_transducer.html" class="btn btn-neutral float-left" title="Zipformer Transducer" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="../recipes/RNN-LM/librispeech/lm-training.html" class="btn btn-neutral float-left" title="Train an RNN langugage model" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="doc.html" class="btn btn-neutral float-right" title="Contributing to Documentation" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>

View File

@ -234,7 +234,7 @@ $ beam_size_4 6.74 best for test-other
<p>Recall that the lowest WER we obtained in <a class="reference internal" href="shallow-fusion.html#shallow-fusion"><span class="std std-ref">Shallow fusion for Transducer</span></a> with beam size of 4 is <code class="docutils literal notranslate"><span class="pre">2.77/7.08</span></code>, LODR
indeed <strong>further improves</strong> the WER. We can do even better if we increase <code class="docutils literal notranslate"><span class="pre">--beam-size</span></code>:</p>
<table class="docutils align-default" id="id1">
<caption><span class="caption-number">Table 3 </span><span class="caption-text">WER of LODR with different beam sizes</span><a class="headerlink" href="#id1" title="Permalink to this table"></a></caption>
<caption><span class="caption-number">Table 2 </span><span class="caption-text">WER of LODR with different beam sizes</span><a class="headerlink" href="#id1" title="Permalink to this table"></a></caption>
<colgroup>
<col style="width: 25%" />
<col style="width: 25%" />

View File

@ -93,10 +93,10 @@
<section id="decoding-with-language-models">
<h1>Decoding with language models<a class="headerlink" href="#decoding-with-language-models" title="Permalink to this heading"></a></h1>
<p>This section describes how to use external langugage models
during decoding to improve the WER of transducer models.</p>
during decoding to improve the WER of transducer models. To train an external language model,
please refer to this tutorial: <a class="reference internal" href="../recipes/RNN-LM/librispeech/lm-training.html#train-nnlm"><span class="std std-ref">Train an RNN langugage model</span></a>.</p>
<p>The following decoding methods with external langugage models are available:</p>
<table class="docutils align-default" id="id1">
<caption><span class="caption-number">Table 1 </span><span class="caption-text">LM-rescoring-based methods vs shallow-fusion-based methods (The numbers in each field is WER on test-clean, WER on test-other and decoding time on test-clean)</span><a class="headerlink" href="#id1" title="Permalink to this table"></a></caption>
<table class="docutils align-default">
<colgroup>
<col style="width: 33%" />
<col style="width: 67%" />

View File

@ -201,7 +201,7 @@ $ beam_size_4 7.6 best for test-other
<p>Great! We made some improvements! Increasing the size of the n-best hypotheses will further boost the performance,
see the following table:</p>
<table class="docutils align-default" id="id1">
<caption><span class="caption-number">Table 4 </span><span class="caption-text">WERs of LM rescoring with different beam sizes</span><a class="headerlink" href="#id1" title="Permalink to this table"></a></caption>
<caption><span class="caption-number">Table 3 </span><span class="caption-text">WERs of LM rescoring with different beam sizes</span><a class="headerlink" href="#id1" title="Permalink to this table"></a></caption>
<colgroup>
<col style="width: 33%" />
<col style="width: 33%" />
@ -279,7 +279,7 @@ $ beam_size_4 7.57 best for test-other
<p>Its slightly better than LM rescoring. If we further increase the beam size, we will see
further improvements from LM rescoring + LODR:</p>
<table class="docutils align-default" id="id2">
<caption><span class="caption-number">Table 5 </span><span class="caption-text">WERs of LM rescoring + LODR with different beam sizes</span><a class="headerlink" href="#id2" title="Permalink to this table"></a></caption>
<caption><span class="caption-number">Table 4 </span><span class="caption-text">WERs of LM rescoring + LODR with different beam sizes</span><a class="headerlink" href="#id2" title="Permalink to this table"></a></caption>
<colgroup>
<col style="width: 33%" />
<col style="width: 33%" />
@ -309,7 +309,7 @@ further improvements from LM rescoring + LODR:</p>
<p>As mentioned earlier, LM rescoring is usually faster than shallow-fusion based methods.
Here, we benchmark the WERs and decoding speed of them:</p>
<table class="docutils align-default" id="id3">
<caption><span class="caption-number">Table 6 </span><span class="caption-text">LM-rescoring-based methods vs shallow-fusion-based methods (The numbers in each field is WER on test-clean, WER on test-other and decoding time on test-clean)</span><a class="headerlink" href="#id3" title="Permalink to this table"></a></caption>
<caption><span class="caption-number">Table 5 </span><span class="caption-text">LM-rescoring-based methods vs shallow-fusion-based methods (The numbers in each field is WER on test-clean, WER on test-other and decoding time on test-clean)</span><a class="headerlink" href="#id3" title="Permalink to this table"></a></caption>
<colgroup>
<col style="width: 25%" />
<col style="width: 25%" />

View File

@ -232,7 +232,7 @@ the LM score may dominant during decoding, leading to bad WER. A typical value o
</ul>
<p>Here, we also show how <cite>beam-size</cite> effect the WER and decoding time:</p>
<table class="docutils align-default" id="id2">
<caption><span class="caption-number">Table 2 </span><span class="caption-text">WERs and decoding time (on test-clean) of shallow fusion with different beam sizes</span><a class="headerlink" href="#id2" title="Permalink to this table"></a></caption>
<caption><span class="caption-number">Table 1 </span><span class="caption-text">WERs and decoding time (on test-clean) of shallow fusion with different beam sizes</span><a class="headerlink" href="#id2" title="Permalink to this table"></a></caption>
<colgroup>
<col style="width: 25%" />
<col style="width: 25%" />

View File

@ -151,6 +151,10 @@ speech recognition recipes using <a class="reference external" href="https://git
<li class="toctree-l3"><a class="reference internal" href="recipes/Streaming-ASR/librispeech/index.html">LibriSpeech</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="recipes/RNN-LM/index.html">RNN-LM</a><ul>
<li class="toctree-l3"><a class="reference internal" href="recipes/RNN-LM/librispeech/lm-training.html">Train an RNN langugage model</a></li>
</ul>
</li>
</ul>
</li>
</ul>

Binary file not shown.

View File

@ -65,6 +65,7 @@
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
</ul>
</li>
</ul>

View File

@ -65,6 +65,7 @@
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
</ul>
</li>
</ul>

View File

@ -65,6 +65,7 @@
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
</ul>
</li>
</ul>

View File

@ -65,6 +65,7 @@
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
</ul>
</li>
</ul>

View File

@ -60,6 +60,7 @@
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../Streaming-ASR/index.html">Streaming ASR</a></li>
<li class="toctree-l2"><a class="reference internal" href="../RNN-LM/index.html">RNN-LM</a></li>
</ul>
</li>
</ul>

View File

@ -68,6 +68,7 @@
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
</ul>
</li>
</ul>

View File

@ -68,6 +68,7 @@
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
</ul>
</li>
</ul>

View File

@ -68,6 +68,7 @@
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
</ul>
</li>
</ul>

View File

@ -68,6 +68,7 @@
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
</ul>
</li>
</ul>

View File

@ -68,6 +68,7 @@
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
</ul>
</li>
</ul>

View File

@ -68,6 +68,7 @@
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
</ul>
</li>
</ul>

View File

@ -68,6 +68,7 @@
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
</ul>
</li>
</ul>

View File

@ -64,6 +64,7 @@
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
</ul>
</li>
</ul>

View File

@ -64,6 +64,7 @@
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
</ul>
</li>
</ul>

View File

@ -64,6 +64,7 @@
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
</ul>
</li>
</ul>

View File

@ -63,6 +63,7 @@
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
</ul>
</li>
</ul>

View File

@ -63,6 +63,7 @@
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
</ul>
</li>
</ul>

137
recipes/RNN-LM/index.html Normal file
View File

@ -0,0 +1,137 @@
<!DOCTYPE html>
<html class="writer-html5" lang="en" >
<head>
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.18.1: http://docutils.sourceforge.net/" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>RNN-LM &mdash; icefall 0.1 documentation</title>
<link rel="stylesheet" href="../../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../../_static/css/theme.css" type="text/css" />
<!--[if lt IE 9]>
<script src="../../_static/js/html5shiv.min.js"></script>
<![endif]-->
<script src="../../_static/jquery.js?v=5d32c60e"></script>
<script src="../../_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
<script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js?v=e031e9a9"></script>
<script src="../../_static/doctools.js?v=888ff710"></script>
<script src="../../_static/sphinx_highlight.js?v=4825356b"></script>
<script src="../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../genindex.html" />
<link rel="search" title="Search" href="../../search.html" />
<link rel="next" title="Train an RNN langugage model" href="librispeech/lm-training.html" />
<link rel="prev" title="Zipformer Transducer" href="../Streaming-ASR/librispeech/zipformer_transducer.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="../../index.html" class="icon icon-home">
icefall
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../for-dummies/index.html">Icefall for dummies tutorial</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../docker/index.html">Docker</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../faqs.html">Frequently Asked Questions (FAQs)</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../model-export/index.html">Model export</a></li>
</ul>
<ul class="current">
<li class="toctree-l1 current"><a class="reference internal" href="../index.html">Recipes</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="../Non-streaming-ASR/index.html">Non Streaming ASR</a></li>
<li class="toctree-l2"><a class="reference internal" href="../Streaming-ASR/index.html">Streaming ASR</a></li>
<li class="toctree-l2 current"><a class="current reference internal" href="#">RNN-LM</a><ul>
<li class="toctree-l3"><a class="reference internal" href="librispeech/lm-training.html">Train an RNN langugage model</a></li>
</ul>
</li>
</ul>
</li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../index.html">icefall</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs">
<li><a href="../../index.html" class="icon icon-home" aria-label="Home"></a></li>
<li class="breadcrumb-item"><a href="../index.html">Recipes</a></li>
<li class="breadcrumb-item active">RNN-LM</li>
<li class="wy-breadcrumbs-aside">
<a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/RNN-LM/index.rst" class="fa fa-github"> Edit on GitHub</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<section id="rnn-lm">
<h1>RNN-LM<a class="headerlink" href="#rnn-lm" title="Permalink to this heading"></a></h1>
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="librispeech/lm-training.html">Train an RNN langugage model</a></li>
</ul>
</div>
</section>
</div>
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="../Streaming-ASR/librispeech/zipformer_transducer.html" class="btn btn-neutral float-left" title="Zipformer Transducer" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="librispeech/lm-training.html" class="btn btn-neutral float-right" title="Train an RNN langugage model" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>
<hr/>
<div role="contentinfo">
<p>&#169; Copyright 2021, icefall development team.</p>
</div>
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script>
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>

View File

@ -0,0 +1,212 @@
<!DOCTYPE html>
<html class="writer-html5" lang="en" >
<head>
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.18.1: http://docutils.sourceforge.net/" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Train an RNN langugage model &mdash; icefall 0.1 documentation</title>
<link rel="stylesheet" href="../../../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../../../_static/css/theme.css" type="text/css" />
<!--[if lt IE 9]>
<script src="../../../_static/js/html5shiv.min.js"></script>
<![endif]-->
<script src="../../../_static/jquery.js?v=5d32c60e"></script>
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
<script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js?v=e031e9a9"></script>
<script src="../../../_static/doctools.js?v=888ff710"></script>
<script src="../../../_static/sphinx_highlight.js?v=4825356b"></script>
<script src="../../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../../search.html" />
<link rel="next" title="Contributing" href="../../../contributing/index.html" />
<link rel="prev" title="RNN-LM" href="../index.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="../../../index.html" class="icon icon-home">
icefall
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../for-dummies/index.html">Icefall for dummies tutorial</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../docker/index.html">Docker</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../faqs.html">Frequently Asked Questions (FAQs)</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../model-export/index.html">Model export</a></li>
</ul>
<ul class="current">
<li class="toctree-l1 current"><a class="reference internal" href="../../index.html">Recipes</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="../../Non-streaming-ASR/index.html">Non Streaming ASR</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
<li class="toctree-l2 current"><a class="reference internal" href="../index.html">RNN-LM</a><ul class="current">
<li class="toctree-l3 current"><a class="current reference internal" href="#">Train an RNN langugage model</a></li>
</ul>
</li>
</ul>
</li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../../index.html">icefall</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs">
<li><a href="../../../index.html" class="icon icon-home" aria-label="Home"></a></li>
<li class="breadcrumb-item"><a href="../../index.html">Recipes</a></li>
<li class="breadcrumb-item"><a href="../index.html">RNN-LM</a></li>
<li class="breadcrumb-item active">Train an RNN langugage model</li>
<li class="wy-breadcrumbs-aside">
<a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/RNN-LM/librispeech/lm-training.rst" class="fa fa-github"> Edit on GitHub</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<section id="train-an-rnn-langugage-model">
<span id="train-nnlm"></span><h1>Train an RNN langugage model<a class="headerlink" href="#train-an-rnn-langugage-model" title="Permalink to this heading"></a></h1>
<p>If you have enough text data, you can train a neural network language model (NNLM) to improve
the WER of your E2E ASR system. This tutorial shows you how to train an RNNLM from
scratch.</p>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>For how to use an NNLM during decoding, please refer to the following tutorials:
<a class="reference internal" href="../../../decoding-with-langugage-models/shallow-fusion.html#shallow-fusion"><span class="std std-ref">Shallow fusion for Transducer</span></a>, <a class="reference internal" href="../../../decoding-with-langugage-models/LODR.html#lodr"><span class="std std-ref">LODR for RNN Transducer</span></a>, <a class="reference internal" href="../../../decoding-with-langugage-models/rescoring.html#rescoring"><span class="std std-ref">LM rescoring for Transducer</span></a></p>
</div>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>This tutorial is based on the LibriSpeech recipe. Please check it out for the necessary
python scripts for this tutorial. We use the LibriSpeech LM-corpus as the LM training set
for illustration purpose. You can also collect your own data. The data format is quite simple:
each line should contain a complete sentence, and words should be separated by space.</p>
</div>
<p>First, lets download the training data for the RNNLM. This can be done via the
following command:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>wget<span class="w"> </span>https://www.openslr.org/resources/11/librispeech-lm-norm.txt.gz
$<span class="w"> </span>gzip<span class="w"> </span>-d<span class="w"> </span>librispeech-lm-norm.txt.gz
</pre></div>
</div>
<p>As we are training a BPE-level RNNLM, we need to tokenize the training text, which requires a
BPE tokenizer. This can be achieved by executing the following command:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="c1"># if you don&#39;t have the BPE</span>
$<span class="w"> </span><span class="nv">GIT_LFS_SKIP_SMUDGE</span><span class="o">=</span><span class="m">1</span><span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/Zengwei/icefall-asr-librispeech-zipformer-2023-05-15
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>icefall-asr-librispeech-zipformer-2023-05-15/data/lang_bpe_500
$<span class="w"> </span>git<span class="w"> </span>lfs<span class="w"> </span>pull<span class="w"> </span>--include<span class="w"> </span>bpe.model
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>../../..
$<span class="w"> </span>./local/prepare_lm_training_data.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--bpe-model<span class="w"> </span>icefall-asr-librispeech-zipformer-2023-05-15/data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--lm-data<span class="w"> </span>librispeech-lm-norm.txt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--lm-archive<span class="w"> </span>data/lang_bpe_500/lm_data.pt
</pre></div>
</div>
<p>Now, you should have a file name <code class="docutils literal notranslate"><span class="pre">lm_data.pt</span></code> file store under the directory <code class="docutils literal notranslate"><span class="pre">data/lang_bpe_500</span></code>.
This is the packed training data for the RNNLM. We then sort the training data according to its
sentence length.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="c1"># This could take a while (~ 20 minutes), feel free to grab a cup of coffee :)</span>
$<span class="w"> </span>./local/sort_lm_training_data.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--in-lm-data<span class="w"> </span>data/lang_bpe_500/lm_data.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--out-lm-data<span class="w"> </span>data/lang_bpe_500/sorted_lm_data.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--out-statistics<span class="w"> </span>data/lang_bpe_500/lm_data_stats.txt
</pre></div>
</div>
<p>The aforementioned steps can be repeated to create a a validation set for you RNNLM. Lets say
you have a validation set in <code class="docutils literal notranslate"><span class="pre">valid.txt</span></code>, you can just set <code class="docutils literal notranslate"><span class="pre">--lm-data</span> <span class="pre">valid.txt</span></code>
and <code class="docutils literal notranslate"><span class="pre">--lm-archive</span> <span class="pre">data/lang_bpe_500/lm-data-valid.pt</span></code> when calling <code class="docutils literal notranslate"><span class="pre">./local/prepare_lm_training_data.py</span></code>.</p>
<p>After completing the previous steps, the training and testing sets for training RNNLM are ready.
The next step is to train the RNNLM model. The training command is as follows:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="c1"># assume you are in the icefall root directory</span>
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>rnn_lm
$<span class="w"> </span>ln<span class="w"> </span>-s<span class="w"> </span>../../egs/librispeech/ASR/data<span class="w"> </span>.
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>..
$<span class="w"> </span>./rnn_lm/train.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--world-size<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span>./rnn_lm/exp<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--start-epoch<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-epochs<span class="w"> </span><span class="m">10</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--use-fp16<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--tie-weights<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--embedding-dim<span class="w"> </span><span class="m">2048</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--hidden_dim<span class="w"> </span><span class="m">2048</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-layers<span class="w"> </span><span class="m">3</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--batch-size<span class="w"> </span><span class="m">300</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--lm-data<span class="w"> </span>rnn_lm/data/lang_bpe_500/sorted_lm_data.pt<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--lm-data-valid<span class="w"> </span>rnn_lm/data/lang_bpe_500/sorted_lm_data.pt
</pre></div>
</div>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>You can adjust the RNNLM hyper parameters to control the size of the RNNLM,
such as embedding dimension and hidden state dimension. For more details, please
run <code class="docutils literal notranslate"><span class="pre">./rnn_lm/train.py</span> <span class="pre">--help</span></code>.</p>
</div>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>The training of RNNLM can take a long time (usually a couple of days).</p>
</div>
</section>
</div>
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="../index.html" class="btn btn-neutral float-left" title="RNN-LM" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="../../../contributing/index.html" class="btn btn-neutral float-right" title="Contributing" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>
<hr/>
<div role="contentinfo">
<p>&#169; Copyright 2021, icefall development team.</p>
</div>
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script>
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>

View File

@ -58,6 +58,7 @@
<li class="toctree-l3"><a class="reference internal" href="librispeech/index.html">LibriSpeech</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../RNN-LM/index.html">RNN-LM</a></li>
</ul>
</li>
</ul>

View File

@ -62,6 +62,7 @@
<li class="toctree-l3"><a class="reference internal" href="librispeech/index.html">LibriSpeech</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../RNN-LM/index.html">RNN-LM</a></li>
</ul>
</li>
</ul>

View File

@ -63,6 +63,7 @@
</li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
</ul>
</li>
</ul>

View File

@ -63,6 +63,7 @@
</li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
</ul>
</li>
</ul>

View File

@ -63,6 +63,7 @@
</li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
</ul>
</li>
</ul>

View File

@ -19,7 +19,7 @@
<script src="../../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../../search.html" />
<link rel="next" title="Contributing" href="../../../contributing/index.html" />
<link rel="next" title="RNN-LM" href="../../RNN-LM/index.html" />
<link rel="prev" title="LSTM Transducer" href="lstm_pruned_stateless_transducer.html" />
</head>
@ -63,6 +63,7 @@
</li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../RNN-LM/index.html">RNN-LM</a></li>
</ul>
</li>
</ul>
@ -732,7 +733,7 @@ for how to deploy the models in <code class="docutils literal notranslate"><span
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="lstm_pruned_stateless_transducer.html" class="btn btn-neutral float-left" title="LSTM Transducer" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="../../../contributing/index.html" class="btn btn-neutral float-right" title="Contributing" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
<a href="../../RNN-LM/index.html" class="btn btn-neutral float-right" title="RNN-LM" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>
<hr/>

View File

@ -54,6 +54,7 @@
<li class="toctree-l1 current"><a class="current reference internal" href="#">Recipes</a><ul>
<li class="toctree-l2"><a class="reference internal" href="Non-streaming-ASR/index.html">Non Streaming ASR</a></li>
<li class="toctree-l2"><a class="reference internal" href="Streaming-ASR/index.html">Streaming ASR</a></li>
<li class="toctree-l2"><a class="reference internal" href="RNN-LM/index.html">RNN-LM</a></li>
</ul>
</li>
</ul>
@ -109,6 +110,10 @@ Currently, only speech recognition recipes are provided.</p>
<li class="toctree-l2"><a class="reference internal" href="Streaming-ASR/librispeech/index.html">LibriSpeech</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="RNN-LM/index.html">RNN-LM</a><ul>
<li class="toctree-l2"><a class="reference internal" href="RNN-LM/librispeech/lm-training.html">Train an RNN langugage model</a></li>
</ul>
</li>
</ul>
</div>
</section>

File diff suppressed because one or more lines are too long