From 97f9b9c33b9e3d4a7152c45f28dec397202aabb6 Mon Sep 17 00:00:00 2001
From: marcoyang1998 <45973641+marcoyang1998@users.noreply.github.com>
Date: Mon, 25 Sep 2023 10:48:50 +0800
Subject: [PATCH 1/3] Add documentation for RNNLM training (#1267)

* add documentation for training an RNNLM
---
 .../decoding-with-langugage-models/index.rst |   5 +-
 docs/source/recipes/RNN-LM/index.rst         |   7 ++
 .../RNN-LM/librispeech/lm-training.rst       | 104 ++++++++++++++++++
 docs/source/recipes/index.rst                |   1 +
 4 files changed, 115 insertions(+), 2 deletions(-)
 create mode 100644 docs/source/recipes/RNN-LM/index.rst
 create mode 100644 docs/source/recipes/RNN-LM/librispeech/lm-training.rst

diff --git a/docs/source/decoding-with-langugage-models/index.rst b/docs/source/decoding-with-langugage-models/index.rst
index 6e5e3a4d9..c49da9a4e 100644
--- a/docs/source/decoding-with-langugage-models/index.rst
+++ b/docs/source/decoding-with-langugage-models/index.rst
@@ -2,12 +2,13 @@ Decoding with language models
 =============================

 This section describes how to use external language models
-during decoding to improve the WER of transducer models.
+during decoding to improve the WER of transducer models. To train an external language model,
+please refer to this tutorial: :ref:`train_nnlm`.

 The following decoding methods with external language models are available:

-.. list-table:: LM-rescoring-based methods vs shallow-fusion-based methods (The numbers in each field is WER on test-clean, WER on test-other and decoding time on test-clean)
+.. list-table::
    :widths: 25 50
    :header-rows: 1

diff --git a/docs/source/recipes/RNN-LM/index.rst b/docs/source/recipes/RNN-LM/index.rst
new file mode 100644
index 000000000..4b74e64c7
--- /dev/null
+++ b/docs/source/recipes/RNN-LM/index.rst
@@ -0,0 +1,7 @@
+RNN-LM
+======
+
+.. toctree::
+   :maxdepth: 2
+
+   librispeech/lm-training
\ No newline at end of file

diff --git a/docs/source/recipes/RNN-LM/librispeech/lm-training.rst b/docs/source/recipes/RNN-LM/librispeech/lm-training.rst
new file mode 100644
index 000000000..736120275
--- /dev/null
+++ b/docs/source/recipes/RNN-LM/librispeech/lm-training.rst
@@ -0,0 +1,104 @@
.. _train_nnlm:

Train an RNN language model
======================================

If you have enough text data, you can train a neural network language model (NNLM) to improve
the WER of your E2E ASR system. This tutorial shows you how to train an RNNLM from
scratch.

.. HINT::

   For how to use an NNLM during decoding, please refer to the following tutorials:
   :ref:`shallow_fusion`, :ref:`LODR`, :ref:`rescoring`

.. note::

   This tutorial is based on the LibriSpeech recipe. Please refer to it for the Python
   scripts needed by this tutorial. We use the LibriSpeech LM corpus as the LM training set
   for illustration purposes. You can also collect your own data. The data format is quite simple:
   each line should contain a complete sentence, and words should be separated by spaces.

First, let's download the training data for the RNNLM. This can be done via the
following command:

.. code-block:: bash

    $ wget https://www.openslr.org/resources/11/librispeech-lm-norm.txt.gz
    $ gzip -d librispeech-lm-norm.txt.gz

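If you plug in your own corpus instead, a quick way to sanity-check that it matches the
expected format (one sentence per line, words separated by spaces) is shown below. This is an
optional sketch assuming standard tools such as ``head`` and ``wc`` are available; it is not
part of the recipe:

.. code-block:: bash

    $ # optional check: preview a few sentences and count how many lines (sentences) there are
    $ head -n 3 librispeech-lm-norm.txt
    $ wc -l librispeech-lm-norm.txt
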
As we are training a BPE-level RNNLM, we need to tokenize the training text, which requires a
BPE tokenizer. This can be achieved by executing the following command:

.. code-block:: bash

    $ # if you don't have the BPE model, download it from HuggingFace first
    $ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Zengwei/icefall-asr-librispeech-zipformer-2023-05-15
    $ cd icefall-asr-librispeech-zipformer-2023-05-15/data/lang_bpe_500
    $ git lfs pull --include bpe.model
    $ cd ../../..

    $ ./local/prepare_lm_training_data.py \
        --bpe-model icefall-asr-librispeech-zipformer-2023-05-15/data/lang_bpe_500/bpe.model \
        --lm-data librispeech-lm-norm.txt \
        --lm-archive data/lang_bpe_500/lm_data.pt

Now you should have a file named ``lm_data.pt`` stored under the directory ``data/lang_bpe_500``.
This is the packed training data for the RNNLM. We then sort the training data by
sentence length.

.. code-block:: bash

    $ # This could take a while (~ 20 minutes), feel free to grab a cup of coffee :)
    $ ./local/sort_lm_training_data.py \
        --in-lm-data data/lang_bpe_500/lm_data.pt \
        --out-lm-data data/lang_bpe_500/sorted_lm_data.pt \
        --out-statistics data/lang_bpe_500/lm_data_stats.txt


The steps above can be repeated to create a validation set for your RNNLM. Say your validation
text is stored in ``valid.txt``; simply set ``--lm-data valid.txt`` and
``--lm-archive data/lang_bpe_500/lm-data-valid.pt`` when calling ``./local/prepare_lm_training_data.py``,
and then sort the result in the same way (for example, into ``data/lang_bpe_500/sorted_lm_data-valid.pt``).

After completing the previous steps, the training and validation sets for the RNNLM are ready.
The next step is to train the RNNLM. The training command is as follows:

.. code-block:: bash

    $ # assume you are in the icefall root directory
    $ cd rnn_lm
    $ ln -s ../../egs/librispeech/ASR/data .
    $ cd ..
    $ ./rnn_lm/train.py \
        --world-size 4 \
        --exp-dir ./rnn_lm/exp \
        --start-epoch 0 \
        --num-epochs 10 \
        --use-fp16 0 \
        --tie-weights 1 \
        --embedding-dim 2048 \
        --hidden-dim 2048 \
        --num-layers 3 \
        --batch-size 300 \
        --lm-data rnn_lm/data/lang_bpe_500/sorted_lm_data.pt \
        --lm-data-valid rnn_lm/data/lang_bpe_500/sorted_lm_data-valid.pt


.. note::

   You can adjust the RNNLM hyperparameters to control the size of the RNNLM,
   such as the embedding dimension and the hidden state dimension. For more details, please
   run ``./rnn_lm/train.py --help``.

.. note::

   Training the RNNLM can take a long time (usually a couple of days).

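For illustration, the validation-set preparation described in the tutorial above can be sketched
as follows. This is a minimal sketch, not part of the patch; the file names ``valid.txt``,
``lm-data-valid.pt``, ``sorted_lm_data-valid.pt`` and ``lm_data_stats-valid.txt`` are
placeholders you can choose freely:

.. code-block:: bash

    $ # tokenize and pack the validation text with the same BPE model
    $ ./local/prepare_lm_training_data.py \
        --bpe-model icefall-asr-librispeech-zipformer-2023-05-15/data/lang_bpe_500/bpe.model \
        --lm-data valid.txt \
        --lm-archive data/lang_bpe_500/lm-data-valid.pt

    $ # sort it by sentence length, just like the training data
    $ ./local/sort_lm_training_data.py \
        --in-lm-data data/lang_bpe_500/lm-data-valid.pt \
        --out-lm-data data/lang_bpe_500/sorted_lm_data-valid.pt \
        --out-statistics data/lang_bpe_500/lm_data_stats-valid.txt

The sorted archive is then what ``--lm-data-valid`` points to in the training command.
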
diff --git a/docs/source/recipes/index.rst b/docs/source/recipes/index.rst
index 63793275c..7265e1cf6 100644
--- a/docs/source/recipes/index.rst
+++ b/docs/source/recipes/index.rst
@@ -15,3 +15,4 @@ We may add recipes for other tasks as well in the future.

    Non-streaming-ASR/index
    Streaming-ASR/index
+   RNN-LM/index

From e17f884ace2dba7561d4d4eaaac6726234cad20f Mon Sep 17 00:00:00 2001
From: marcoyang1998 <45973641+marcoyang1998@users.noreply.github.com>
Date: Mon, 25 Sep 2023 15:36:40 +0800
Subject: [PATCH 2/3] Fix docs for MVQ (#1272)

* typo fix
---
 .../librispeech/distillation.rst                | 16 ++++++++--------
 egs/librispeech/ASR/distillation_with_hubert.sh |  2 ++
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/docs/source/recipes/Non-streaming-ASR/librispeech/distillation.rst b/docs/source/recipes/Non-streaming-ASR/librispeech/distillation.rst
index 2e8d0893a..37edf7de9 100644
--- a/docs/source/recipes/Non-streaming-ASR/librispeech/distillation.rst
+++ b/docs/source/recipes/Non-streaming-ASR/librispeech/distillation.rst
@@ -47,7 +47,7 @@ The data preparation contains several stages, you can use the following two options:

 - ``--stage``
-- ``--stop-stage``
+- ``--stop_stage``

 to control which stage(s) should be run. By default, all stages are executed.

 For example,

 .. code-block:: bash

   $ cd egs/librispeech/ASR
-  $ ./prepare.sh --stage 0 --stop-stage 0 # run only stage 0
-  $ ./prepare.sh --stage 2 --stop-stage 5 # run from stage 2 to stage 5
+  $ ./prepare.sh --stage 0 --stop_stage 0 # run only stage 0
+  $ ./prepare.sh --stage 2 --stop_stage 5 # run from stage 2 to stage 5

 .. HINT::

@@ -108,15 +108,15 @@ As usual, you can control the stages you want to run by specifying the following
 two options:

 - ``--stage``
-- ``--stop-stage``
+- ``--stop_stage``

 For example,

 .. code-block:: bash

   $ cd egs/librispeech/ASR
-  $ ./distillation_with_hubert.sh --stage 0 --stop-stage 0 # run only stage 0
-  $ ./distillation_with_hubert.sh --stage 2 --stop-stage 4 # run from stage 2 to stage 5
+  $ ./distillation_with_hubert.sh --stage 0 --stop_stage 0 # run only stage 0
+  $ ./distillation_with_hubert.sh --stage 2 --stop_stage 4 # run from stage 2 to stage 4

 Here are a few options in `./distillation_with_hubert.sh `_
 you need to know before you proceed.

@@ -134,7 +134,7 @@ and prepares MVQ-augmented training manifests.

 .. code-block:: bash

-   $ ./distillation_with_hubert.sh --stage 2 --stop-stage 2 # run only stage 2
+   $ ./distillation_with_hubert.sh --stage 2 --stop_stage 2 # run only stage 2

 Please see the following screenshot for the output of an example execution.

@@ -172,7 +172,7 @@ To perform training, please run stage 3 by executing the following command.

 .. code-block:: bash

-   $ ./prepare.sh --stage 3 --stop-stage 3 # run MVQ training
+   $ ./prepare.sh --stage 3 --stop_stage 3 # run MVQ training

 Here is the code snippet for training:

diff --git a/egs/librispeech/ASR/distillation_with_hubert.sh b/egs/librispeech/ASR/distillation_with_hubert.sh
index 6aaa0333b..a5b0b85af 100755
--- a/egs/librispeech/ASR/distillation_with_hubert.sh
+++ b/egs/librispeech/ASR/distillation_with_hubert.sh
@@ -56,6 +56,8 @@ use_extracted_codebook=True
 # "hubert_xtralarge_ll60k" -> pretrained model without fine-tuning
 teacher_model_id=hubert_xtralarge_ll60k_finetune_ls960

+. shared/parse_options.sh || exit 1
+
 log() {
   # This function is from espnet
   local fname=${BASH_SOURCE[1]##*/}

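The line added above sources icefall's ``shared/parse_options.sh``, the helper that turns
command-line flags such as ``--stage`` and ``--stop_stage`` into overrides of shell variables
defined earlier in the script. A minimal sketch of the pattern is shown below; the script name
``toy_script.sh`` is hypothetical, and it assumes the script is run from a directory that
contains ``shared/parse_options.sh`` (e.g. ``egs/librispeech/ASR``):

.. code-block:: bash

    #!/usr/bin/env bash
    # toy_script.sh (hypothetical): define the defaults *before* sourcing
    # parse_options.sh so that --stage / --stop_stage given on the command
    # line can override them.
    stage=0
    stop_stage=100

    . shared/parse_options.sh || exit 1

    echo "Running stages ${stage} to ${stop_stage}"

Invoking it as ``./toy_script.sh --stage 2 --stop_stage 4`` would then print
``Running stages 2 to 4``.
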
From 1b565dd25198f700bcfe88e86a0f6a435e11a429 Mon Sep 17 00:00:00 2001
From: zr_jin
Date: Tue, 26 Sep 2023 15:41:39 +0800
Subject: [PATCH 3/3] added softlinks to local dir (#1273)

---
 egs/tedlium3/ASR/conformer_ctc2/local              | 1 +
 egs/tedlium3/ASR/pruned_transducer_stateless/local | 1 +
 egs/tedlium3/ASR/transducer_stateless/local        | 1 +
 egs/tedlium3/ASR/zipformer/local                   | 1 +
 4 files changed, 4 insertions(+)
 create mode 120000 egs/tedlium3/ASR/conformer_ctc2/local
 create mode 120000 egs/tedlium3/ASR/pruned_transducer_stateless/local
 create mode 120000 egs/tedlium3/ASR/transducer_stateless/local
 create mode 120000 egs/tedlium3/ASR/zipformer/local

diff --git a/egs/tedlium3/ASR/conformer_ctc2/local b/egs/tedlium3/ASR/conformer_ctc2/local
new file mode 120000
index 000000000..c820590c5
--- /dev/null
+++ b/egs/tedlium3/ASR/conformer_ctc2/local
@@ -0,0 +1 @@
+../local
\ No newline at end of file

diff --git a/egs/tedlium3/ASR/pruned_transducer_stateless/local b/egs/tedlium3/ASR/pruned_transducer_stateless/local
new file mode 120000
index 000000000..c820590c5
--- /dev/null
+++ b/egs/tedlium3/ASR/pruned_transducer_stateless/local
@@ -0,0 +1 @@
+../local
\ No newline at end of file

diff --git a/egs/tedlium3/ASR/transducer_stateless/local b/egs/tedlium3/ASR/transducer_stateless/local
new file mode 120000
index 000000000..c820590c5
--- /dev/null
+++ b/egs/tedlium3/ASR/transducer_stateless/local
@@ -0,0 +1 @@
+../local
\ No newline at end of file

diff --git a/egs/tedlium3/ASR/zipformer/local b/egs/tedlium3/ASR/zipformer/local
new file mode 120000
index 000000000..c820590c5
--- /dev/null
+++ b/egs/tedlium3/ASR/zipformer/local
@@ -0,0 +1 @@
+../local
\ No newline at end of file

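For reference, the symlinks added by this patch point each recipe directory at the shared
``../local`` scripts of the TED-LIUM 3 recipe. A minimal sketch of how such a link is typically
created and checked (run from ``egs/tedlium3/ASR``; purely illustrative, not part of the patch):

.. code-block:: bash

    $ cd egs/tedlium3/ASR
    $ # create (or refresh) the relative symlink for one recipe directory;
    $ # the same command works for the other three directories touched by this patch
    $ ln -sfn ../local zipformer/local
    $ ls -l zipformer/local   # expected output ends with: local -> ../local
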