Add docs for distillation (#812)

* add README to docs
* update documents for distillation
* upload png files

@@ -0,0 +1,220 @@

Distillation with HuBERT
========================

This tutorial shows you how to perform knowledge distillation in ``icefall``
with the `LibriSpeech <https://www.openslr.org/12>`_ dataset. The distillation method
used here is called "Multi Vector Quantization Knowledge Distillation" (MVQ-KD).
Please have a look at our paper `Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation <https://arxiv.org/abs/2211.00508>`_
for more details about MVQ-KD.

.. note::

   This tutorial is based on the recipe
   `pruned_transducer_stateless4 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless4>`_.
   Currently, we only implement MVQ-KD in this recipe. However, MVQ-KD is
   theoretically applicable to all recipes, with only minor changes needed.
   Feel free to try out MVQ-KD in other recipes. If you encounter any
   problems, please open an issue in `icefall <https://github.com/k2-fsa/icefall/issues>`_.

.. note::

   We assume you have read the page :ref:`install icefall` and have set up
   the environment for ``icefall``.

.. HINT::

   We recommend that you use one or more GPUs to run this recipe.

Data preparation
----------------

We first prepare the necessary training data for ``LibriSpeech``.
This is the same as in `Pruned_transducer_statelessX <./pruned_transducer_stateless.rst>`_.

.. hint::

   The data preparation is the same as for other LibriSpeech recipes.
   If you have already finished this step, you can skip ahead to
   ``Codebook index preparation`` directly.

.. code-block:: bash

   $ cd egs/librispeech/ASR
   $ ./prepare.sh

The script ``./prepare.sh`` handles the data preparation for you, **automagically**.
All you need to do is to run it.

The data preparation contains several stages. You can use the following two
options:

- ``--stage``
- ``--stop-stage``

to control which stage(s) should be run. By default, all stages are executed.

For example,

.. code-block:: bash

   $ cd egs/librispeech/ASR
   $ ./prepare.sh --stage 0 --stop-stage 0 # run only stage 0
   $ ./prepare.sh --stage 2 --stop-stage 5 # run from stage 2 to stage 5

.. HINT::

   If you have pre-downloaded the `LibriSpeech <https://www.openslr.org/12>`_
   dataset and the `musan <http://www.openslr.org/17/>`_ dataset, say,
   they are saved in ``/tmp/LibriSpeech`` and ``/tmp/musan``, you can modify
   the ``dl_dir`` variable in ``./prepare.sh`` to point to ``/tmp`` so that
   ``./prepare.sh`` won't re-download them.
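
   For instance, a minimal sketch of the change (assuming your pre-downloaded
   copies live directly under ``/tmp``):

   .. code-block:: bash

      # In ./prepare.sh, point dl_dir at the directory that already
      # contains LibriSpeech/ and musan/:
      dl_dir=/tmp
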
.. NOTE::

   All files generated by ``./prepare.sh``, e.g., features, lexicon, etc.,
   are saved in the ``./data`` directory.
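
   For example, after a full run you can inspect it like this (the
   subdirectory names shown are typical for this recipe; the exact set
   depends on which stages you have run):

   .. code-block:: bash

      $ ls data
      # e.g. fbank/  lang_bpe_500/  lm/  manifests/  musan/
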
We provide the following YouTube video showing how to run ``./prepare.sh``.

.. note::

   To get the latest news about `next-gen Kaldi <https://github.com/k2-fsa>`_, please subscribe to
   the following YouTube channel by `Nadira Povey <https://www.youtube.com/channel/UC_VaumpkmINz1pNkFXAN9mw>`_:

   `<https://www.youtube.com/channel/UC_VaumpkmINz1pNkFXAN9mw>`_

.. youtube:: ofEIoJL-mGM

Codebook index preparation
--------------------------

Here, we prepare the necessary data for MVQ-KD. This requires the generation
of codebook indexes (please read our `paper <https://arxiv.org/abs/2211.00508>`_
if you are interested in the details). In this tutorial, we use the pre-computed
codebook indexes for convenience. The only thing you need to do is to
run ``./distillation_with_hubert.sh``.

.. note::

   There are 5 stages in total. The first and second stages will be skipped
   automatically if you choose to download the codebook indexes prepared by
   `icefall`_. Of course, you can also extract and compute the codebook
   indexes yourself, but this requires you to download a HuBERT-XL model,
   and extracting the codebook indexes can take a while.
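
   A rough map of the stages, as we read them from this tutorial (the last
   stage runs decoding; see the sections below):

   .. code-block:: bash

      # stages 0-1: download HuBERT-XL and extract embeddings
      #             (skipped when use_extracted_codebook=True)
      # stage 2   : download/compute codebook indexes and build the
      #             MVQ-augmented training manifests
      # stage 3   : knowledge distillation training
      # stage 4   : decoding
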
As usual, you can control the stages you want to run by specifying the following
two options:

- ``--stage``
- ``--stop-stage``

For example,

.. code-block:: bash

   $ cd egs/librispeech/ASR
   $ ./distillation_with_hubert.sh --stage 0 --stop-stage 0 # run only stage 0
   $ ./distillation_with_hubert.sh --stage 2 --stop-stage 4 # run from stage 2 to stage 4

Here are a few options in ``./distillation_with_hubert.sh``
that you need to know about before you proceed:

- ``--full_libri`` If True, use the full 960h of data. Otherwise, only ``train-clean-100`` will be used.
- ``--use_extracted_codebook`` If True, the first two stages will be skipped, and the codebook
  indexes uploaded by us will be downloaded.

Since we are using the pre-computed codebook indexes, we set
``use_extracted_codebook=True``. If you want to run the full `LibriSpeech`_
experiments, please set ``full_libri=True``.
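
For example, this tutorial's setting can be passed on the command line like so
(a sketch, assuming the options listed above are accepted as command-line flags
in the same way as ``--stage`` and ``--stop-stage``):

.. code-block:: bash

   $ ./distillation_with_hubert.sh \
       --full_libri False \
       --use_extracted_codebook True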

The following command downloads the pre-computed codebook indexes
and prepares the MVQ-augmented training manifests.

.. code-block:: bash

   $ ./distillation_with_hubert.sh --stage 2 --stop-stage 2 # run only stage 2

Please see the following screenshot for the output of an example execution.

.. figure:: ./images/distillation_codebook.png
   :width: 800
   :alt: Downloading codebook indexes and preparing training manifest.
   :align: center

   Downloading codebook indexes and preparing the training manifest.

.. hint::

   The codebook indexes we prepared for you in this tutorial
   are extracted from the 36th layer of a fine-tuned HuBERT-XL model
   with 8 codebooks. If you want to try other configurations, please
   set ``use_extracted_codebook=False`` and set ``embedding_layer`` and
   ``num_codebooks`` yourself.
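
   For instance, a sketch of the variables you would edit near stage 2 of
   ``./distillation_with_hubert.sh`` (the values shown are this tutorial's
   defaults):

   .. code-block:: bash

      use_extracted_codebook=False
      embedding_layer=36   # which HuBERT layer to take embeddings from
      num_codebooks=8      # number of codebooks for vector quantization
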
Now, you should see the following files under the directory ``./data/vq_fbank_layer36_cb8``.

.. figure:: ./images/distillation_directory.png
   :width: 800
   :alt: MVQ-augmented training manifests
   :align: center

   MVQ-augmented training manifests.

Voilà! You are now ready to perform knowledge distillation training!

Training
--------

To perform training, please run stage 3 by executing the following command.

.. code-block:: bash

   $ ./distillation_with_hubert.sh --stage 3 --stop-stage 3 # run MVQ training

Here is the code snippet for training:

.. code-block:: bash

   # Derive the number of GPUs to use from CUDA_VISIBLE_DEVICES,
   # e.g. CUDA_VISIBLE_DEVICES="0,1,2,3" gives WORLD_SIZE=4.
   WORLD_SIZE=$(echo ${CUDA_VISIBLE_DEVICES} | awk '{n=split($1, _, ","); print n}')

   # $full_libri and $exp_dir are set near the top of
   # ./distillation_with_hubert.sh, from which this snippet is taken.
   ./pruned_transducer_stateless6/train.py \
     --manifest-dir ./data/vq_fbank_layer36_cb8 \
     --master-port 12359 \
     --full-libri $full_libri \
     --spec-aug-time-warp-factor -1 \
     --max-duration 300 \
     --world-size ${WORLD_SIZE} \
     --num-epochs 30 \
     --exp-dir $exp_dir \
     --enable-distillation True \
     --codebook-loss-scale 0.01

A few of the arguments in the training command above deserve attention:

- ``--enable-distillation`` If True, knowledge distillation training is enabled.
- ``--codebook-loss-scale`` The scale of the knowledge distillation loss.
- ``--manifest-dir`` The path to the MVQ-augmented manifests.
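
For a side-by-side comparison, you could also train a baseline without
distillation. A sketch (not from the recipe docs; same setup as above, with a
separate ``--exp-dir`` so that checkpoints are not mixed up, and with
``--manifest-dir`` omitted since the MVQ-augmented manifests are only needed
for distillation):

.. code-block:: bash

   ./pruned_transducer_stateless6/train.py \
     --master-port 12359 \
     --full-libri $full_libri \
     --spec-aug-time-warp-factor -1 \
     --max-duration 300 \
     --world-size ${WORLD_SIZE} \
     --num-epochs 30 \
     --exp-dir ./pruned_transducer_stateless6/exp_baseline \
     --enable-distillation False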

Decoding
--------

After training is finished, you can test the performance using
the following command.

.. code-block:: bash

   export CUDA_VISIBLE_DEVICES=0
   ./pruned_transducer_stateless6/decode.py \
     --decoding-method "modified_beam_search" \
     --epoch 30 \
     --avg 10 \
     --max-duration 200 \
     --exp-dir $exp_dir \
     --enable-distillation True

You should get results similar to those listed `here <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS-100hours.md#distillation-with-hubert>`_.
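
You can also try the recipe's other decoding methods, e.g. greedy search
(a sketch; the remaining arguments are the same as above):

.. code-block:: bash

   export CUDA_VISIBLE_DEVICES=0
   ./pruned_transducer_stateless6/decode.py \
     --decoding-method "greedy_search" \
     --epoch 30 \
     --avg 10 \
     --max-duration 200 \
     --exp-dir $exp_dir \
     --enable-distillation True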

That's all! Feel free to experiment with your own setups and report your results.
If you encounter any problems during training, please open an issue `here <https://github.com/k2-fsa/icefall/issues>`_.

@@ -9,3 +9,4 @@ LibriSpeech
    pruned_transducer_stateless
    zipformer_mmi
    zipformer_ctc_blankskip
+   distillation

@@ -150,7 +150,7 @@ if [ $stage -le 2 ] && [ $stop_stage -ge 2 ]; then
   num_codebooks=8

   mkdir -p $exp_dir/vq
-  codebook_dir=$exp_dir/vq/${teacher_model_id}_layer${embedding_layer}_cb${num_codebooks}
+  codebook_dir=$exp_dir/vq/${teacher_model_id}
   mkdir -p codebook_dir
   codebook_download_dir=$exp_dir/download_codebook
   if [ -d $codebook_download_dir ]; then
@@ -180,9 +180,9 @@ if [ $stage -le 2 ] && [ $stop_stage -ge 2 ]; then
   ./pruned_transducer_stateless6/extract_codebook_index.py \
     --full-libri $full_libri \
     --exp-dir $exp_dir \
-    --embedding-layer 36 \
+    --embedding-layer $embedding_layer \
     --num-utts 1000 \
-    --num-codebooks 8 \
+    --num-codebooks $num_codebooks \
     --max-duration 100 \
     --teacher-model-id $teacher_model_id \
     --use-extracted-codebook $use_extracted_codebook