deploy: e2fcb42f5f176d9e39eb38506ab99d0a3adaf202

2025-12-11 06:55:27 +00:00 · 2024-01-09 07:44:23 +00:00 · 2024-01-09 07:44:23 +00:00 · a1d1f2e434
commit a1d1f2e434
parent 6dfba8b4a3
3 changed files with 11 additions and 20 deletions
--- a/_sources/recipes/RNN-LM/librispeech/lm-training.rst.txt
+++ b/_sources/recipes/RNN-LM/librispeech/lm-training.rst.txt
@@ -4,7 +4,7 @@ Train an RNN language model
 ======================================

 If you have enough text data, you can train a neural network language model (NNLM) to improve
-the WER of your E2E ASR system. This tutorial shows you how to train an RNNLM from 
+the WER of your E2E ASR system. This tutorial shows you how to train an RNNLM from
 scratch.

 .. HINT::
@@ -15,23 +15,23 @@ scratch.
 .. note::

    This tutorial is based on the LibriSpeech recipe. Please check it out for the necessary
-    python scripts for this tutorial. We use the LibriSpeech LM-corpus as the LM training set 
+    python scripts for this tutorial. We use the LibriSpeech LM-corpus as the LM training set
    for illustration purpose. You can also collect your own data. The data format is quite simple:
    each line should contain a complete sentence, and words should be separated by space.

-First, let's download the training data for the RNNLM. This can be done via the 
+First, let's download the training data for the RNNLM. This can be done via the
 following command:

 .. code-block:: bash

-    $ wget https://www.openslr.org/resources/11/librispeech-lm-norm.txt.gz 
+    $ wget https://www.openslr.org/resources/11/librispeech-lm-norm.txt.gz
    $ gzip -d librispeech-lm-norm.txt.gz

 As we are training a BPE-level RNNLM, we need to tokenize the training text, which requires a
 BPE tokenizer. This can be achieved by executing the following command:

 .. code-block:: bash
-    
+
    $ # if you don't have the BPE
    $ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Zengwei/icefall-asr-librispeech-zipformer-2023-05-15
    $ cd icefall-asr-librispeech-zipformer-2023-05-15/data/lang_bpe_500
@@ -56,11 +56,11 @@ sentence length.
        --out-statistics data/lang_bpe_500/lm_data_stats.txt


-The aforementioned steps can be repeated to create a validation set for your RNNLM. Let's say 
-you have a validation set in ``valid.txt``, you can just set ``--lm-data valid.txt`` 
+The aforementioned steps can be repeated to create a validation set for your RNNLM. Let's say
+you have a validation set in ``valid.txt``, you can just set ``--lm-data valid.txt``
 and ``--lm-archive data/lang_bpe_500/lm-data-valid.pt`` when calling ``./local/prepare_lm_training_data.py``.

-After completing the previous steps, the training and testing sets for training RNNLM are ready. 
+After completing the previous steps, the training and testing sets for training RNNLM are ready.
 The next step is to train the RNNLM model. The training command is as follows:

 .. code-block:: bash
@@ -77,7 +77,7 @@ The next step is to train the RNNLM model. The training command is as follows:
        --use-fp16 0 \
        --tie-weights 1 \
        --embedding-dim 2048 \
-        --hidden_dim 2048 \
+        --hidden-dim 2048 \
        --num-layers 3 \
        --batch-size 300 \
        --lm-data rnn_lm/data/lang_bpe_500/sorted_lm_data.pt \
@@ -93,12 +93,3 @@ The next step is to train the RNNLM model. The training command is as follows:
 .. note::

    The training of RNNLM can take a long time (usually a couple of days).
-
-
-
-
-
-
-
-
-
--- a/recipes/RNN-LM/librispeech/lm-training.html
+++ b/recipes/RNN-LM/librispeech/lm-training.html
@@ -162,7 +162,7 @@ $<span class="w"> </span>./rnn_lm/train.py<span class="w"> </span><span class="s
 <span class="w">    </span>--use-fp16<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="se">\</span>
 <span class="w">    </span>--tie-weights<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
 <span class="w">    </span>--embedding-dim<span class="w"> </span><span class="m">2048</span><span class="w"> </span><span class="se">\</span>
-<span class="w">    </span>--hidden_dim<span class="w"> </span><span class="m">2048</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--hidden-dim<span class="w"> </span><span class="m">2048</span><span class="w"> </span><span class="se">\</span>
 <span class="w">    </span>--num-layers<span class="w"> </span><span class="m">3</span><span class="w"> </span><span class="se">\</span>
 <span class="w">    </span>--batch-size<span class="w"> </span><span class="m">300</span><span class="w"> </span><span class="se">\</span>
 <span class="w">    </span>--lm-data<span class="w"> </span>rnn_lm/data/lang_bpe_500/sorted_lm_data.pt<span class="w"> </span><span class="se">\</span>
--- a/searchindex.js
+++ b/searchindex.js