mirror of
https://github.com/k2-fsa/icefall.git
synced 2025-09-07 08:04:18 +00:00
Merge branch 'k2-fsa:master' into dev_swbd
commit
02791de220
@ -2,12 +2,13 @@ Decoding with language models
|
|||||||
=============================
|
=============================
|
||||||
|
|
||||||
This section describes how to use external language models
|
This section describes how to use external language models
|
||||||
during decoding to improve the WER of transducer models.
|
during decoding to improve the WER of transducer models. To train an external language model,
|
||||||
|
please refer to this tutorial: :ref:`train_nnlm`.
|
||||||
|
|
||||||
The following decoding methods with external language models are available:
|
The following decoding methods with external language models are available:
|
||||||
|
|
||||||
|
|
||||||
.. list-table:: LM-rescoring-based methods vs shallow-fusion-based methods (The numbers in each field is WER on test-clean, WER on test-other and decoding time on test-clean)
|
.. list-table::
|
||||||
:widths: 25 50
|
:widths: 25 50
|
||||||
:header-rows: 1
|
:header-rows: 1
|
||||||
|
|
||||||
|
@ -47,7 +47,7 @@ The data preparation contains several stages, you can use the following two
|
|||||||
options:
|
options:
|
||||||
|
|
||||||
- ``--stage``
|
- ``--stage``
|
||||||
- ``--stop-stage``
|
- ``--stop_stage``
|
||||||
|
|
||||||
to control which stage(s) should be run. By default, all stages are executed.
|
to control which stage(s) should be run. By default, all stages are executed.
|
||||||
|
|
||||||
@ -56,8 +56,8 @@ For example,
|
|||||||
.. code-block:: bash
|
.. code-block:: bash
|
||||||
|
|
||||||
$ cd egs/librispeech/ASR
|
$ cd egs/librispeech/ASR
|
||||||
$ ./prepare.sh --stage 0 --stop-stage 0 # run only stage 0
|
$ ./prepare.sh --stage 0 --stop_stage 0 # run only stage 0
|
||||||
$ ./prepare.sh --stage 2 --stop-stage 5 # run from stage 2 to stage 5
|
$ ./prepare.sh --stage 2 --stop_stage 5 # run from stage 2 to stage 5
|
||||||
|
|
||||||
.. HINT::
|
.. HINT::
|
||||||
|
|
||||||
@ -108,15 +108,15 @@ As usual, you can control the stages you want to run by specifying the following
|
|||||||
two options:
|
two options:
|
||||||
|
|
||||||
- ``--stage``
|
- ``--stage``
|
||||||
- ``--stop-stage``
|
- ``--stop_stage``
|
||||||
|
|
||||||
For example,
|
For example,
|
||||||
|
|
||||||
.. code-block:: bash
|
.. code-block:: bash
|
||||||
|
|
||||||
$ cd egs/librispeech/ASR
|
$ cd egs/librispeech/ASR
|
||||||
$ ./distillation_with_hubert.sh --stage 0 --stop-stage 0 # run only stage 0
|
$ ./distillation_with_hubert.sh --stage 0 --stop_stage 0 # run only stage 0
|
||||||
$ ./distillation_with_hubert.sh --stage 2 --stop-stage 4 # run from stage 2 to stage 4
|
$ ./distillation_with_hubert.sh --stage 2 --stop_stage 4 # run from stage 2 to stage 4
|
||||||
|
|
||||||
Here are a few options in `./distillation_with_hubert.sh <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/distillation_with_hubert.sh>`_
|
Here are a few options in `./distillation_with_hubert.sh <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/distillation_with_hubert.sh>`_
|
||||||
you need to know before you proceed.
|
you need to know before you proceed.
|
||||||
@ -134,7 +134,7 @@ and prepares MVQ-augmented training manifests.
|
|||||||
|
|
||||||
.. code-block:: bash
|
.. code-block:: bash
|
||||||
|
|
||||||
$ ./distillation_with_hubert.sh --stage 2 --stop-stage 2 # run only stage 2
|
$ ./distillation_with_hubert.sh --stage 2 --stop_stage 2 # run only stage 2
|
||||||
|
|
||||||
Please see the
|
Please see the
|
||||||
following screenshot for the output of an example execution.
|
following screenshot for the output of an example execution.
|
||||||
@ -172,7 +172,7 @@ To perform training, please run stage 3 by executing the following command.
|
|||||||
|
|
||||||
.. code-block:: bash
|
.. code-block:: bash
|
||||||
|
|
||||||
$ ./prepare.sh --stage 3 --stop-stage 3 # run MVQ training
|
$ ./prepare.sh --stage 3 --stop_stage 3 # run MVQ training
|
||||||
|
|
||||||
Here is the code snippet for training:
|
Here is the code snippet for training:
|
||||||
|
|
||||||
|
7
docs/source/recipes/RNN-LM/index.rst
Normal file
@ -0,0 +1,7 @@
RNN-LM
======

.. toctree::
   :maxdepth: 2

   librispeech/lm-training
|
104
docs/source/recipes/RNN-LM/librispeech/lm-training.rst
Normal file
@ -0,0 +1,104 @@
.. _train_nnlm:

Train an RNN language model
======================================

If you have enough text data, you can train a neural network language model (NNLM) to improve
the WER of your E2E ASR system. This tutorial shows you how to train an RNNLM from
scratch.

.. HINT::

    For how to use an NNLM during decoding, please refer to the following tutorials:
    :ref:`shallow_fusion`, :ref:`LODR`, :ref:`rescoring`

.. note::

    This tutorial is based on the LibriSpeech recipe. Please check it out for the necessary
    Python scripts for this tutorial. We use the LibriSpeech LM-corpus as the LM training set
    for illustration purposes. You can also collect your own data. The data format is quite simple:
    each line should contain a complete sentence, and words should be separated by spaces.
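
If you collect your own text, the format described in the note can be enforced with a few lines of Python. This is only an illustrative sketch: ``normalize_lm_lines`` is a hypothetical helper, not part of icefall, and all it does is collapse runs of whitespace and drop empty lines.

```python
def normalize_lm_lines(lines):
    """Yield one sentence per line with words separated by single spaces.

    Hypothetical helper for illustration; it does not exist in icefall.
    """
    for line in lines:
        words = line.split()  # splits on any whitespace and trims both ends
        if words:  # skip lines that are empty after trimming
            yield " ".join(words)


if __name__ == "__main__":
    raw = ["  HELLO   WORLD \n", "\n", "A\tB\n"]
    for sentence in normalize_lm_lines(raw):
        print(sentence)
```

Because the helper accepts any iterable of strings, it can also be applied directly to an open file object before the text is passed to the tokenization step.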

First, let's download the training data for the RNNLM. This can be done via the
following command:

.. code-block:: bash

    $ wget https://www.openslr.org/resources/11/librispeech-lm-norm.txt.gz
    $ gzip -d librispeech-lm-norm.txt.gz

As we are training a BPE-level RNNLM, we need to tokenize the training text, which requires a
BPE tokenizer. This can be achieved by executing the following command:

.. code-block:: bash

    $ # if you don't have the BPE model
    $ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Zengwei/icefall-asr-librispeech-zipformer-2023-05-15
    $ cd icefall-asr-librispeech-zipformer-2023-05-15/data/lang_bpe_500
    $ git lfs pull --include bpe.model
    $ cd ../../..

    $ ./local/prepare_lm_training_data.py \
        --bpe-model icefall-asr-librispeech-zipformer-2023-05-15/data/lang_bpe_500/bpe.model \
        --lm-data librispeech-lm-norm.txt \
        --lm-archive data/lang_bpe_500/lm_data.pt

Now, you should have a file named ``lm_data.pt`` stored under the directory ``data/lang_bpe_500``.
This is the packed training data for the RNNLM. We then sort the training data according to its
sentence length.

.. code-block:: bash

    $ # This could take a while (~ 20 minutes), feel free to grab a cup of coffee :)
    $ ./local/sort_lm_training_data.py \
        --in-lm-data data/lang_bpe_500/lm_data.pt \
        --out-lm-data data/lang_bpe_500/sorted_lm_data.pt \
        --out-statistics data/lang_bpe_500/lm_data_stats.txt
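
Sorting by length places sentences of similar size next to each other, which generally reduces padding when they are batched together. The sketch below shows the idea on plain Python lists of token IDs. It is not the implementation of ``sort_lm_training_data.py``: the function name ``sort_by_length`` and the longest-first order are assumptions made for illustration.

```python
def sort_by_length(sentences, descending=True):
    """Return (sorted_sentences, lengths), ordered by number of tokens.

    `sentences` is a list of token-ID lists. The longest-first default
    is an assumption of this sketch, not a documented behavior.
    """
    # sorted() is stable, so sentences of equal length keep their input order.
    order = sorted(
        range(len(sentences)),
        key=lambda i: len(sentences[i]),
        reverse=descending,
    )
    sorted_sentences = [sentences[i] for i in order]
    lengths = [len(s) for s in sorted_sentences]
    return sorted_sentences, lengths


if __name__ == "__main__":
    toy = [[1, 2], [3], [4, 5, 6], [7, 8]]
    ordered, lengths = sort_by_length(toy)
    print(ordered)
    print(lengths)
```

The returned lengths give a quick view of the length distribution of the corpus, similar in spirit to the statistics file written by the command above.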

The aforementioned steps can be repeated to create a validation set for your RNNLM. Say
you have a validation set in ``valid.txt``: you can just set ``--lm-data valid.txt``
and ``--lm-archive data/lang_bpe_500/lm-data-valid.pt`` when calling ``./local/prepare_lm_training_data.py``.

After completing the previous steps, the training and validation sets for the RNNLM are ready.
The next step is to train the RNNLM. The training command is as follows:

.. code-block:: bash

    $ # assume you are in the icefall root directory
    $ cd rnn_lm
    $ ln -s ../../egs/librispeech/ASR/data .
    $ cd ..
    $ ./rnn_lm/train.py \
        --world-size 4 \
        --exp-dir ./rnn_lm/exp \
        --start-epoch 0 \
        --num-epochs 10 \
        --use-fp16 0 \
        --tie-weights 1 \
        --embedding-dim 2048 \
        --hidden_dim 2048 \
        --num-layers 3 \
        --batch-size 300 \
        --lm-data rnn_lm/data/lang_bpe_500/sorted_lm_data.pt \
        --lm-data-valid rnn_lm/data/lang_bpe_500/sorted_lm_data.pt

.. note::

    You can adjust the RNNLM hyperparameters to control the size of the RNNLM,
    such as the embedding dimension and the hidden state dimension. For more details, please
    run ``./rnn_lm/train.py --help``.
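
To get a feel for how these hyperparameters affect model size, the parameter count can be estimated by hand. The sketch below assumes a common layout that this tutorial does not spell out: an embedding table, a stack of LSTM layers parametrized as in PyTorch (four gates, two bias vectors per layer), and an output projection with bias whose weight is shared with the embedding when weights are tied. The function name and the vocabulary size of 500 (suggested by ``lang_bpe_500``) are likewise assumptions.

```python
def lstm_lm_param_count(vocab_size, embedding_dim, hidden_dim, num_layers, tie_weights):
    """Estimate trainable parameters of an embedding + LSTM + linear LM.

    Assumes a PyTorch-style LSTM parametrization; the actual RNNLM in
    icefall may differ in detail.
    """
    if tie_weights and embedding_dim != hidden_dim:
        raise ValueError("tying weights requires embedding_dim == hidden_dim")

    embedding = vocab_size * embedding_dim

    lstm = 0
    for layer in range(num_layers):
        input_dim = embedding_dim if layer == 0 else hidden_dim
        # 4 gates, each with input weights, recurrent weights and two biases.
        lstm += 4 * (input_dim * hidden_dim + hidden_dim * hidden_dim + 2 * hidden_dim)

    # With tied weights the output matrix reuses the embedding table.
    output_weight = 0 if tie_weights else hidden_dim * vocab_size
    output_bias = vocab_size

    return embedding + lstm + output_weight + output_bias


if __name__ == "__main__":
    total = lstm_lm_param_count(
        vocab_size=500, embedding_dim=2048, hidden_dim=2048,
        num_layers=3, tie_weights=True,
    )
    print(f"{total / 1e6:.1f}M parameters")
```

Under these assumptions the recurrent layers dominate the total, so changing the hidden dimension or the number of layers affects model size far more than the vocabulary does.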

.. note::

    The training of RNNLM can take a long time (usually a couple of days).
@ -15,3 +15,4 @@ We may add recipes for other tasks as well in the future.
|
|||||||
|
|
||||||
Non-streaming-ASR/index
|
Non-streaming-ASR/index
|
||||||
Streaming-ASR/index
|
Streaming-ASR/index
|
||||||
|
RNN-LM/index
|
||||||
|
@ -56,6 +56,8 @@ use_extracted_codebook=True
|
|||||||
# "hubert_xtralarge_ll60k" -> pretrained model without finetuning
|
# "hubert_xtralarge_ll60k" -> pretrained model without finetuning
|
||||||
teacher_model_id=hubert_xtralarge_ll60k_finetune_ls960
|
teacher_model_id=hubert_xtralarge_ll60k_finetune_ls960
|
||||||
|
|
||||||
|
. shared/parse_options.sh || exit 1
|
||||||
|
|
||||||
log() {
|
log() {
|
||||||
# This function is from espnet
|
# This function is from espnet
|
||||||
local fname=${BASH_SOURCE[1]##*/}
|
local fname=${BASH_SOURCE[1]##*/}
|
||||||
|
1
egs/tedlium3/ASR/conformer_ctc2/local
Symbolic link
@ -0,0 +1 @@
|
|||||||
|
../local
|
1
egs/tedlium3/ASR/pruned_transducer_stateless/local
Symbolic link
@ -0,0 +1 @@
|
|||||||
|
../local
|
1
egs/tedlium3/ASR/transducer_stateless/local
Symbolic link
@ -0,0 +1 @@
|
|||||||
|
../local
|
1
egs/tedlium3/ASR/zipformer/local
Symbolic link
@ -0,0 +1 @@
|
|||||||
|
../local
|