deploy: 142420b3afa7b07c95f733c2e72ee80078364a44

This commit is contained in:
marcoyang1998 2023-01-11 08:46:03 +00:00
parent 28468e226c
commit c811b26c11
20 changed files with 574 additions and 11 deletions

Binary file not shown (new image, 56 KiB).

Binary file not shown (new image, 43 KiB).

View File

@ -0,0 +1,220 @@
Distillation with HuBERT
========================
This tutorial shows you how to perform knowledge distillation in ``icefall``
with the `LibriSpeech <https://www.openslr.org/12>`_ dataset. The distillation method
used here is called "Multi Vector Quantization Knowledge Distillation" (MVQ-KD).
Please have a look at our paper `Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation <https://arxiv.org/abs/2211.00508>`_
for more details about MVQ-KD.
.. note::
This tutorial is based on the recipe
`pruned_transducer_stateless4 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless4>`_.
Currently, we only implement MVQ-KD in this recipe; however, MVQ-KD is in principle applicable to all recipes
with only minor changes. Feel free to try out MVQ-KD in different recipes. If you
encounter any problems, please open an issue in `icefall <https://github.com/k2-fsa/icefall/issues>`_.
.. note::
We assume you have read the page :ref:`install icefall` and have set up
the environment for ``icefall``.
.. hint::
We recommend using one or more GPUs to run this recipe.
Data preparation
----------------
We first prepare the necessary training data for ``LibriSpeech``.
This is the same as in `Pruned_transducer_statelessX <./pruned_transducer_stateless.rst>`_.
.. hint::
The data preparation is the same as in other recipes for the LibriSpeech dataset;
if you have already finished this step, you can skip directly to ``Codebook index preparation``.
.. code-block:: bash
$ cd egs/librispeech/ASR
$ ./prepare.sh
The script ``./prepare.sh`` handles the data preparation for you, **automagically**.
All you need to do is run it.
The data preparation consists of several stages. You can use the following two
options:
- ``--stage``
- ``--stop-stage``
to control which stage(s) should be run. By default, all stages are executed.
For example,
.. code-block:: bash
$ cd egs/librispeech/ASR
$ ./prepare.sh --stage 0 --stop-stage 0 # run only stage 0
$ ./prepare.sh --stage 2 --stop-stage 5 # run from stage 2 to stage 5
.. hint::
If you have pre-downloaded the `LibriSpeech <https://www.openslr.org/12>`_
dataset and the `musan <http://www.openslr.org/17/>`_ dataset, say,
they are saved in ``/tmp/LibriSpeech`` and ``/tmp/musan``, you can modify
the ``dl_dir`` variable in ``./prepare.sh`` to point to ``/tmp`` so that
``./prepare.sh`` won't re-download them.
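A minimal sketch of that edit (assuming the default in your copy of
``./prepare.sh`` is ``dl_dir=$PWD/download``; check before changing it):
.. code-block:: bash
# inside ./prepare.sh
dl_dir=/tmp  # was: dl_dir=$PWD/download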
.. note::
All files generated by ``./prepare.sh``, e.g., features, the lexicon, etc.,
are saved in the ``./data`` directory.
We provide the following YouTube video showing how to run ``./prepare.sh``.
.. note::
To get the latest news of `next-gen Kaldi <https://github.com/k2-fsa>`_, please subscribe to
the following YouTube channel by `Nadira Povey <https://www.youtube.com/channel/UC_VaumpkmINz1pNkFXAN9mw>`_:
`<https://www.youtube.com/channel/UC_VaumpkmINz1pNkFXAN9mw>`_
.. youtube:: ofEIoJL-mGM
Codebook index preparation
--------------------------
Here, we prepare the necessary data for MVQ-KD. This requires the generation
of codebook indexes (please read our `paper <https://arxiv.org/abs/2211.00508>`_
if you are interested in the details). In this tutorial, we use pre-computed
codebook indexes for convenience. The only thing you need to do is
run ``./distillation_with_hubert.sh``.
.. note::
There are 5 stages in total; the first and second stages are automatically skipped
when you choose to download the codebook indexes prepared by `icefall`_.
Of course, you can extract and compute the codebook indexes yourself. This
requires you to download a HuBERT-XL model, and the extraction of codebook
indexes can take a while.
As usual, you can control the stages you want to run by specifying the following
two options:
- ``--stage``
- ``--stop-stage``
For example,
.. code-block:: bash
$ cd egs/librispeech/ASR
$ ./distillation_with_hubert.sh --stage 0 --stop-stage 0 # run only stage 0
$ ./distillation_with_hubert.sh --stage 2 --stop-stage 4 # run from stage 2 to stage 4
Here are a few options in ``./distillation_with_hubert.sh``
that you should know about before proceeding.
- ``--full_libri`` If True, use the full 960h of training data. Otherwise, only ``train-clean-100`` will be used.
- ``--use_extracted_codebook`` If True, the first two stages will be skipped and the codebook
indexes uploaded by us will be downloaded.
Since we are using the pre-computed codebook indexes, we set
``use_extracted_codebook=True``. If you want to do full `LibriSpeech`_
experiments, please set ``full_libri=True``.
The following command downloads the pre-computed codebook indexes
and prepares MVQ-augmented training manifests.
.. code-block:: bash
$ ./distillation_with_hubert.sh --stage 2 --stop-stage 2 # run only stage 2
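If you prefer to make these settings explicit on the command line, you can pass the
flags directly (a sketch; the flag spellings and values follow the option list and
prose above):
.. code-block:: bash
$ ./distillation_with_hubert.sh \
    --full_libri False \
    --use_extracted_codebook True \
    --stage 2 --stop-stage 2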
Please see the
following screenshot for the output of an example execution.
.. figure:: ./images/distillation_codebook.png
:width: 800
:alt: Downloading codebook indexes and preparing training manifest.
:align: center
Downloading codebook indexes and preparing training manifest.
.. hint::
The codebook indexes we prepared for you in this tutorial
are extracted from the 36th layer of a fine-tuned HuBERT-XL model
with 8 codebooks. If you want to try other configurations, please
set ``use_extracted_codebook=False`` and set ``embedding_layer`` and
``num_codebooks`` by yourself.
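For example, to distill from a different layer you might edit the corresponding
variables near the top of the script (a sketch; the variable names are taken from
the hint above, so check your copy of the script before relying on them):
.. code-block:: bash
# inside ./distillation_with_hubert.sh (hypothetical values)
use_extracted_codebook=False
embedding_layer=36   # which HuBERT layer to extract embeddings from
num_codebooks=8      # number of codebooks used for quantization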
Now, you should see the following files under the directory ``./data/vq_fbank_layer36_cb8``.
.. figure:: ./images/distillation_directory.png
:width: 800
:alt: MVQ-augmented training manifests
:align: center
MVQ-augmented training manifests.
Voilà! You are now ready to perform knowledge distillation training!
Training
--------
To perform training, please run stage 3 by executing the following command.
.. code-block:: bash
$ ./distillation_with_hubert.sh --stage 3 --stop-stage 3 # run MVQ training
Here is the code snippet for training:
.. code-block:: bash
WORLD_SIZE=$(echo ${CUDA_VISIBLE_DEVICES} | awk '{n=split($1, _, ","); print n}')
./pruned_transducer_stateless6/train.py \
--manifest-dir ./data/vq_fbank_layer36_cb8 \
--master-port 12359 \
--full-libri $full_libri \
--spec-aug-time-warp-factor -1 \
--max-duration 300 \
--world-size ${WORLD_SIZE} \
--num-epochs 30 \
--exp-dir $exp_dir \
--enable-distillation True \
--codebook-loss-scale 0.01
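The ``WORLD_SIZE`` line simply counts the comma-separated device ids in
``CUDA_VISIBLE_DEVICES``, so that one training process is launched per visible GPU.
For example:
.. code-block:: bash
$ export CUDA_VISIBLE_DEVICES="0,1,2,3"
$ echo ${CUDA_VISIBLE_DEVICES} | awk '{n=split($1, _, ","); print n}'
4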
A few training arguments in the above command deserve
attention:
- ``--enable-distillation`` If True, knowledge distillation training is enabled.
- ``--codebook-loss-scale`` The scale of the knowledge distillation loss.
- ``--manifest-dir`` The path to the MVQ-augmented manifest.
Decoding
--------
After training finishes, you can test the model's performance using
the following command.
.. code-block:: bash
export CUDA_VISIBLE_DEVICES=0
./pruned_transducer_stateless6/decode.py \
--decoding-method "modified_beam_search" \
--epoch 30 \
--avg 10 \
--max-duration 200 \
--exp-dir $exp_dir \
--enable-distillation True
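.. note::
Here ``--epoch 30 --avg 10`` means that the checkpoints of the last 10 epochs
(i.e., epochs 21 through 30) are averaged before decoding. Adjust these two
values to match your own training run.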
You should get similar results as `here <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS-100hours.md#distillation-with-hubert>`_.
That's all! Feel free to experiment with your own setups and report your results.
If you encounter any problems during training, please open an issue `here <https://github.com/k2-fsa/icefall/issues>`_.

View File

@ -9,3 +9,4 @@ LibriSpeech
pruned_transducer_stateless
zipformer_mmi
zipformer_ctc_blankskip
distillation

View File

@ -115,7 +115,7 @@ $ pre-commit install
<div><figure class="align-center" id="id2">
<a class="reference internal image-reference" href="../_images/pre-commit-check.png"><img alt="../_images/pre-commit-check.png" src="../_images/pre-commit-check.png" style="width: 600px;" /></a>
<figcaption>
<p><span class="caption-number">Fig. 10 </span><span class="caption-text">pre-commit hooks invoked by <code class="docutils literal notranslate"><span class="pre">git</span> <span class="pre">commit</span></code> (Failed).</span><a class="headerlink" href="#id2" title="Permalink to this image"></a></p>
<p><span class="caption-number">Fig. 12 </span><span class="caption-text">pre-commit hooks invoked by <code class="docutils literal notranslate"><span class="pre">git</span> <span class="pre">commit</span></code> (Failed).</span><a class="headerlink" href="#id2" title="Permalink to this image"></a></p>
</figcaption>
</figure>
</div></blockquote>
@ -134,7 +134,7 @@ it should succeed this time:</p>
<div><figure class="align-center" id="id3">
<a class="reference internal image-reference" href="../_images/pre-commit-check-success.png"><img alt="../_images/pre-commit-check-success.png" src="../_images/pre-commit-check-success.png" style="width: 600px;" /></a>
<figcaption>
<p><span class="caption-number">Fig. 11 </span><span class="caption-text">pre-commit hooks invoked by <code class="docutils literal notranslate"><span class="pre">git</span> <span class="pre">commit</span></code> (Succeeded).</span><a class="headerlink" href="#id3" title="Permalink to this image"></a></p>
<p><span class="caption-number">Fig. 13 </span><span class="caption-text">pre-commit hooks invoked by <code class="docutils literal notranslate"><span class="pre">git</span> <span class="pre">commit</span></code> (Succeeded).</span><a class="headerlink" href="#id3" title="Permalink to this image"></a></p>
</figcaption>
</figure>
</div></blockquote>

View File

@ -123,7 +123,7 @@ the following:</p>
<div><figure class="align-center" id="id1">
<a class="reference internal image-reference" href="../_images/doc-contrib.png"><img alt="../_images/doc-contrib.png" src="../_images/doc-contrib.png" style="width: 600px;" /></a>
<figcaption>
<p><span class="caption-number">Fig. 9 </span><span class="caption-text">View generated documentation locally with <code class="docutils literal notranslate"><span class="pre">python3</span> <span class="pre">-m</span> <span class="pre">http.server</span></code>.</span><a class="headerlink" href="#id1" title="Permalink to this image"></a></p>
<p><span class="caption-number">Fig. 11 </span><span class="caption-text">View generated documentation locally with <code class="docutils literal notranslate"><span class="pre">python3</span> <span class="pre">-m</span> <span class="pre">http.server</span></code>.</span><a class="headerlink" href="#id1" title="Permalink to this image"></a></p>
</figcaption>
</figure>
</div></blockquote>

Binary file not shown.

View File

@ -104,6 +104,7 @@
<li class="toctree-l2"><a class="reference internal" href="librispeech/pruned_transducer_stateless.html">Pruned transducer statelessX</a></li>
<li class="toctree-l2"><a class="reference internal" href="librispeech/zipformer_mmi.html">Zipformer MMI</a></li>
<li class="toctree-l2"><a class="reference internal" href="librispeech/zipformer_ctc_blankskip.html">Zipformer CTC Blank Skip</a></li>
<li class="toctree-l2"><a class="reference internal" href="librispeech/distillation.html">Distillation with HuBERT</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="timit/index.html">TIMIT</a><ul>

View File

@ -55,6 +55,7 @@
<li class="toctree-l4"><a class="reference internal" href="pruned_transducer_stateless.html">Pruned transducer statelessX</a></li>
<li class="toctree-l4"><a class="reference internal" href="zipformer_mmi.html">Zipformer MMI</a></li>
<li class="toctree-l4"><a class="reference internal" href="zipformer_ctc_blankskip.html">Zipformer CTC Blank Skip</a></li>
<li class="toctree-l4"><a class="reference internal" href="distillation.html">Distillation with HuBERT</a></li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>

View File

@ -0,0 +1,334 @@
<!DOCTYPE html>
<html class="writer-html5" lang="en" >
<head>
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Distillation with HuBERT &mdash; icefall 0.1 documentation</title>
<link rel="stylesheet" href="../../../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../../../_static/css/theme.css" type="text/css" />
<!--[if lt IE 9]>
<script src="../../../_static/js/html5shiv.min.js"></script>
<![endif]-->
<script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
<script src="../../../_static/jquery.js"></script>
<script src="../../../_static/underscore.js"></script>
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="../../../_static/doctools.js"></script>
<script src="../../../_static/sphinx_highlight.js"></script>
<script src="../../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../../search.html" />
<link rel="next" title="TIMIT" href="../timit/index.html" />
<link rel="prev" title="Zipformer CTC Blank Skip" href="zipformer_ctc_blankskip.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="../../../index.html" class="icon icon-home"> icefall
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../faqs.html">Frequently Asked Questions (FAQs)</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../model-export/index.html">Model export</a></li>
</ul>
<ul class="current">
<li class="toctree-l1 current"><a class="reference internal" href="../../index.html">Recipes</a><ul class="current">
<li class="toctree-l2 current"><a class="reference internal" href="../index.html">Non Streaming ASR</a><ul class="current">
<li class="toctree-l3"><a class="reference internal" href="../aishell/index.html">aishell</a></li>
<li class="toctree-l3 current"><a class="reference internal" href="index.html">LibriSpeech</a><ul class="current">
<li class="toctree-l4"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li>
<li class="toctree-l4"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li>
<li class="toctree-l4"><a class="reference internal" href="pruned_transducer_stateless.html">Pruned transducer statelessX</a></li>
<li class="toctree-l4"><a class="reference internal" href="zipformer_mmi.html">Zipformer MMI</a></li>
<li class="toctree-l4"><a class="reference internal" href="zipformer_ctc_blankskip.html">Zipformer CTC Blank Skip</a></li>
<li class="toctree-l4 current"><a class="current reference internal" href="#">Distillation with HuBERT</a></li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>
<li class="toctree-l3"><a class="reference internal" href="../yesno/index.html">YesNo</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
</ul>
</li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../../index.html">icefall</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs">
<li><a href="../../../index.html" class="icon icon-home"></a></li>
<li class="breadcrumb-item"><a href="../../index.html">Recipes</a></li>
<li class="breadcrumb-item"><a href="../index.html">Non Streaming ASR</a></li>
<li class="breadcrumb-item"><a href="index.html">LibriSpeech</a></li>
<li class="breadcrumb-item active">Distillation with HuBERT</li>
<li class="wy-breadcrumbs-aside">
<a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/Non-streaming-ASR/librispeech/distillation.rst" class="fa fa-github"> Edit on GitHub</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<section id="distillation-with-hubert">
<h1>Distillation with HuBERT<a class="headerlink" href="#distillation-with-hubert" title="Permalink to this heading"></a></h1>
<p>This tutorial shows you how to perform knowledge distillation in <code class="docutils literal notranslate"><span class="pre">icefall</span></code>
with the <a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a> dataset. The distillation method
used here is called “Multi Vector Quantization Knowledge Distillation” (MVQ-KD).
Please have a look at our paper <a class="reference external" href="https://arxiv.org/abs/2211.00508">Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation</a>
for more details about MVQ-KD.</p>
<div class="admonition note">
<p class="admonition-title">Note</p>
<dl class="simple">
<dt>This tutorial is based on the recipe</dt><dd><p><a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless4">pruned_transducer_stateless4</a>.</p>
</dd>
</dl>
<p>Currently, we only implement MVQ-KD in this recipe; however, MVQ-KD is in principle applicable to all recipes
with only minor changes. Feel free to try out MVQ-KD in different recipes. If you
encounter any problems, please open an issue in <a class="reference external" href="https://github.com/k2-fsa/icefall/issues">icefall</a>.</p>
</div>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>We assume you have read the page <a class="reference internal" href="../../../installation/index.html#install-icefall"><span class="std std-ref">Installation</span></a> and have set up
the environment for <code class="docutils literal notranslate"><span class="pre">icefall</span></code>.</p>
</div>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>We recommend using one or more GPUs to run this recipe.</p>
</div>
<section id="data-preparation">
<h2>Data preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading"></a></h2>
<p>We first prepare the necessary training data for <code class="docutils literal notranslate"><span class="pre">LibriSpeech</span></code>.
This is the same as in <a class="reference external" href="./pruned_transducer_stateless.rst">Pruned_transducer_statelessX</a>.</p>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>The data preparation is the same as in other recipes for the LibriSpeech dataset;
if you have already finished this step, you can skip directly to <code class="docutils literal notranslate"><span class="pre">Codebook</span> <span class="pre">index</span> <span class="pre">preparation</span></code>.</p>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh
</pre></div>
</div>
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
All you need to do is run it.</p>
<p>The data preparation consists of several stages. You can use the following two
options:</p>
<blockquote>
<div><ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">--stage</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">--stop-stage</span></code></p></li>
</ul>
</div></blockquote>
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
<p>For example,</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">0</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="c1"># run only stage 0</span>
$<span class="w"> </span>./prepare.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">2</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">5</span><span class="w"> </span><span class="c1"># run from stage 2 to stage 5</span>
</pre></div>
</div>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>If you have pre-downloaded the <a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a>
dataset and the <a class="reference external" href="http://www.openslr.org/17/">musan</a> dataset, say,
they are saved in <code class="docutils literal notranslate"><span class="pre">/tmp/LibriSpeech</span></code> and <code class="docutils literal notranslate"><span class="pre">/tmp/musan</span></code>, you can modify
the <code class="docutils literal notranslate"><span class="pre">dl_dir</span></code> variable in <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> to point to <code class="docutils literal notranslate"><span class="pre">/tmp</span></code> so that
<code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> won’t re-download them.</p>
</div>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>All files generated by <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code>, e.g., features, the lexicon, etc.,
are saved in the <code class="docutils literal notranslate"><span class="pre">./data</span></code> directory.</p>
</div>
<p>We provide the following YouTube video showing how to run <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code>.</p>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>To get the latest news of <a class="reference external" href="https://github.com/k2-fsa">next-gen Kaldi</a>, please subscribe to
the following YouTube channel by <a class="reference external" href="https://www.youtube.com/channel/UC_VaumpkmINz1pNkFXAN9mw">Nadira Povey</a>:</p>
<blockquote>
<div><p><a class="reference external" href="https://www.youtube.com/channel/UC_VaumpkmINz1pNkFXAN9mw">https://www.youtube.com/channel/UC_VaumpkmINz1pNkFXAN9mw</a></p>
</div></blockquote>
</div>
<div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://www.youtube.com/embed/ofEIoJL-mGM" style="border: 0; height: 345px; width: 560px">
</iframe></div></section>
<section id="codebook-index-preparation">
<h2>Codebook index preparation<a class="headerlink" href="#codebook-index-preparation" title="Permalink to this heading"></a></h2>
<p>Here, we prepare the necessary data for MVQ-KD. This requires the generation
of codebook indexes (please read our <a class="reference external" href="https://arxiv.org/abs/2211.00508">paper</a>
if you are interested in the details). In this tutorial, we use pre-computed
codebook indexes for convenience. The only thing you need to do is
run <code class="docutils literal notranslate"><span class="pre">./distillation_with_hubert.sh</span></code>.</p>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>There are 5 stages in total; the first and second stages are automatically skipped
when you choose to download the codebook indexes prepared by <a class="reference external" href="https://github.com/k2-fsa/icefall/issues">icefall</a>.
Of course, you can extract and compute the codebook indexes yourself. This
requires you to download a HuBERT-XL model, and the extraction of codebook
indexes can take a while.</p>
</div>
<p>As usual, you can control the stages you want to run by specifying the following
two options:</p>
<blockquote>
<div><ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">--stage</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">--stop-stage</span></code></p></li>
</ul>
</div></blockquote>
<p>For example,</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>egs/librispeech/ASR
$<span class="w"> </span>./distillation_with_hubert.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">0</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="c1"># run only stage 0</span>
$<span class="w"> </span>./distillation_with_hubert.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">2</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="c1"># run from stage 2 to stage 4</span>
</pre></div>
</div>
<p>Here are a few options in <code class="docutils literal notranslate"><span class="pre">./distillation_with_hubert.sh</span></code>
that you should know about before proceeding.</p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">--full_libri</span></code> If True, use the full 960h of training data. Otherwise, only <code class="docutils literal notranslate"><span class="pre">train-clean-100</span></code> will be used.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">--use_extracted_codebook</span></code> If True, the first two stages will be skipped and the codebook
indexes uploaded by us will be downloaded.</p></li>
</ul>
<p>Since we are using the pre-computed codebook indexes, we set
<code class="docutils literal notranslate"><span class="pre">use_extracted_codebook=True</span></code>. If you want to do full <a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a>
experiments, please set <code class="docutils literal notranslate"><span class="pre">full_libri=True</span></code>.</p>
<p>The following command downloads the pre-computed codebook indexes
and prepares MVQ-augmented training manifests.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./distillation_with_hubert.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">2</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="c1"># run only stage 2</span>
</pre></div>
</div>
<p>Please see the
following screenshot for the output of an example execution.</p>
<figure class="align-center" id="id3">
<a class="reference internal image-reference" href="../../../_images/distillation_codebook.png"><img alt="Downloading codebook indexes and preparing training manifest." src="../../../_images/distillation_codebook.png" style="width: 800px;" /></a>
<figcaption>
<p><span class="caption-number">Fig. 6 </span><span class="caption-text">Downloading codebook indexes and preparing training manifest.</span><a class="headerlink" href="#id3" title="Permalink to this image"></a></p>
</figcaption>
</figure>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>The codebook indexes we prepared for you in this tutorial
are extracted from the 36th layer of a fine-tuned HuBERT-XL model
with 8 codebooks. If you want to try other configurations, please
set <code class="docutils literal notranslate"><span class="pre">use_extracted_codebook=False</span></code> and set <code class="docutils literal notranslate"><span class="pre">embedding_layer</span></code> and
<code class="docutils literal notranslate"><span class="pre">num_codebooks</span></code> by yourself.</p>
</div>
<p>Now, you should see the following files under the directory <code class="docutils literal notranslate"><span class="pre">./data/vq_fbank_layer36_cb8</span></code>.</p>
<figure class="align-center" id="id4">
<a class="reference internal image-reference" href="../../../_images/distillation_directory.png"><img alt="MVQ-augmented training manifests" src="../../../_images/distillation_directory.png" style="width: 800px;" /></a>
<figcaption>
<p><span class="caption-number">Fig. 7 </span><span class="caption-text">MVQ-augmented training manifests.</span><a class="headerlink" href="#id4" title="Permalink to this image"></a></p>
</figcaption>
</figure>
<p>Voilà! You are now ready to perform knowledge distillation training!</p>
</section>
<section id="training">
<h2>Training<a class="headerlink" href="#training" title="Permalink to this heading"></a></h2>
<p>To perform training, please run stage 3 by executing the following command.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>./distillation_with_hubert.sh<span class="w"> </span>--stage<span class="w"> </span><span class="m">3</span><span class="w"> </span>--stop-stage<span class="w"> </span><span class="m">3</span><span class="w"> </span><span class="c1"># run MVQ training</span>
</pre></div>
</div>
<p>Here is the code snippet for training:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nv">WORLD_SIZE</span><span class="o">=</span><span class="k">$(</span><span class="nb">echo</span><span class="w"> </span><span class="si">${</span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="si">}</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>awk<span class="w"> </span><span class="s1">&#39;{n=split($1, _, &quot;,&quot;); print n}&#39;</span><span class="k">)</span>
./pruned_transducer_stateless6/train.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--manifest-dir<span class="w"> </span>./data/vq_fbank_layer36_cb8<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--master-port<span class="w"> </span><span class="m">12359</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--full-libri<span class="w"> </span><span class="nv">$full_libri</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--spec-aug-time-warp-factor<span class="w"> </span>-1<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">300</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--world-size<span class="w"> </span><span class="si">${</span><span class="nv">WORLD_SIZE</span><span class="si">}</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--num-epochs<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span><span class="nv">$exp_dir</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--enable-distillation<span class="w"> </span>True<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--codebook-loss-scale<span class="w"> </span><span class="m">0</span>.01
</pre></div>
</div>
<p>A few training arguments in the above command deserve
attention:</p>
<blockquote>
<div><ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">--enable-distillation</span></code> If True, knowledge distillation training is enabled.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">--codebook-loss-scale</span></code> The scale of the knowledge distillation loss.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">--manifest-dir</span></code> The path to the MVQ-augmented manifest.</p></li>
</ul>
</div></blockquote>
</section>
<section id="decoding">
<h2>Decoding<a class="headerlink" href="#decoding" title="Permalink to this heading"></a></h2>
<p>After training finishes, you can test the model’s performance using
the following command.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span><span class="w"> </span><span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="m">0</span>
./pruned_transducer_stateless6/decode.py<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--decoding-method<span class="w"> </span><span class="s2">&quot;modified_beam_search&quot;</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">30</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--avg<span class="w"> </span><span class="m">10</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">200</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--exp-dir<span class="w"> </span><span class="nv">$exp_dir</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--enable-distillation<span class="w"> </span>True
</pre></div>
</div>
<p>You should get similar results as <a class="reference external" href="https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS-100hours.md#distillation-with-hubert">here</a>.</p>
<p>That’s all! Feel free to experiment with your own setups and report your results.
If you encounter any problems during training, please open an issue <a class="reference external" href="https://github.com/k2-fsa/icefall/issues">here</a>.</p>
</section>
</section>
</div>
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="zipformer_ctc_blankskip.html" class="btn btn-neutral float-left" title="Zipformer CTC Blank Skip" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="../timit/index.html" class="btn btn-neutral float-right" title="TIMIT" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>
<hr/>
<div role="contentinfo">
<p>&#169; Copyright 2021, icefall development team.</p>
</div>
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script>
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>

View File

@ -55,6 +55,7 @@
<li class="toctree-l4"><a class="reference internal" href="pruned_transducer_stateless.html">Pruned transducer statelessX</a></li>
<li class="toctree-l4"><a class="reference internal" href="zipformer_mmi.html">Zipformer MMI</a></li>
<li class="toctree-l4"><a class="reference internal" href="zipformer_ctc_blankskip.html">Zipformer CTC Blank Skip</a></li>
<li class="toctree-l4"><a class="reference internal" href="distillation.html">Distillation with HuBERT</a></li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>
@ -105,6 +106,7 @@
<li class="toctree-l1"><a class="reference internal" href="pruned_transducer_stateless.html">Pruned transducer statelessX</a></li>
<li class="toctree-l1"><a class="reference internal" href="zipformer_mmi.html">Zipformer MMI</a></li>
<li class="toctree-l1"><a class="reference internal" href="zipformer_ctc_blankskip.html">Zipformer CTC Blank Skip</a></li>
<li class="toctree-l1"><a class="reference internal" href="distillation.html">Distillation with HuBERT</a></li>
</ul>
</div>
</section>

View File

@ -55,6 +55,7 @@
<li class="toctree-l4 current"><a class="current reference internal" href="#">Pruned transducer statelessX</a></li>
<li class="toctree-l4"><a class="reference internal" href="zipformer_mmi.html">Zipformer MMI</a></li>
<li class="toctree-l4"><a class="reference internal" href="zipformer_ctc_blankskip.html">Zipformer CTC Blank Skip</a></li>
<li class="toctree-l4"><a class="reference internal" href="distillation.html">Distillation with HuBERT</a></li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>

View File

@ -55,6 +55,7 @@
<li class="toctree-l4"><a class="reference internal" href="pruned_transducer_stateless.html">Pruned transducer statelessX</a></li>
<li class="toctree-l4"><a class="reference internal" href="zipformer_mmi.html">Zipformer MMI</a></li>
<li class="toctree-l4"><a class="reference internal" href="zipformer_ctc_blankskip.html">Zipformer CTC Blank Skip</a></li>
<li class="toctree-l4"><a class="reference internal" href="distillation.html">Distillation with HuBERT</a></li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>

View File

@ -20,7 +20,7 @@
<script src="../../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../../search.html" />
<link rel="next" title="TIMIT" href="../timit/index.html" />
<link rel="next" title="Distillation with HuBERT" href="distillation.html" />
<link rel="prev" title="Zipformer MMI" href="zipformer_mmi.html" />
</head>
@ -55,6 +55,7 @@
<li class="toctree-l4"><a class="reference internal" href="pruned_transducer_stateless.html">Pruned transducer statelessX</a></li>
<li class="toctree-l4"><a class="reference internal" href="zipformer_mmi.html">Zipformer MMI</a></li>
<li class="toctree-l4 current"><a class="current reference internal" href="#">Zipformer CTC Blank Skip</a></li>
<li class="toctree-l4"><a class="reference internal" href="distillation.html">Distillation with HuBERT</a></li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>
@ -526,7 +527,7 @@ for the details of the above pretrained models</p>
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="zipformer_mmi.html" class="btn btn-neutral float-left" title="Zipformer MMI" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="../timit/index.html" class="btn btn-neutral float-right" title="TIMIT" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
<a href="distillation.html" class="btn btn-neutral float-right" title="Distillation with HuBERT" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>
<hr/>

View File

@ -55,6 +55,7 @@
<li class="toctree-l4"><a class="reference internal" href="pruned_transducer_stateless.html">Pruned transducer statelessX</a></li>
<li class="toctree-l4 current"><a class="current reference internal" href="#">Zipformer MMI</a></li>
<li class="toctree-l4"><a class="reference internal" href="zipformer_ctc_blankskip.html">Zipformer CTC Blank Skip</a></li>
<li class="toctree-l4"><a class="reference internal" href="distillation.html">Distillation with HuBERT</a></li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>

View File

@ -21,7 +21,7 @@
<link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../../search.html" />
<link rel="next" title="TDNN-LiGRU-CTC" href="tdnn_ligru_ctc.html" />
<link rel="prev" title="Zipformer CTC Blank Skip" href="../librispeech/zipformer_ctc_blankskip.html" />
<link rel="prev" title="Distillation with HuBERT" href="../librispeech/distillation.html" />
</head>
<body class="wy-body-for-nav">
@ -107,7 +107,7 @@
</div>
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="../librispeech/zipformer_ctc_blankskip.html" class="btn btn-neutral float-left" title="Zipformer CTC Blank Skip" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="../librispeech/distillation.html" class="btn btn-neutral float-left" title="Distillation with HuBERT" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="tdnn_ligru_ctc.html" class="btn btn-neutral float-right" title="TDNN-LiGRU-CTC" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>

View File

@ -286,7 +286,7 @@ the following screenshot:</p>
<div><figure class="align-center" id="id1">
<a class="reference external image-reference" href="https://tensorboard.dev/experiment/yKUbhb5wRmOSXYkId1z9eg/"><img alt="TensorBoard screenshot" src="../../../_images/tdnn-tensorboard-log.png" style="width: 600px;" /></a>
<figcaption>
<p><span class="caption-number">Fig. 6 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id1" title="Permalink to this image"></a></p>
<p><span class="caption-number">Fig. 8 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id1" title="Permalink to this image"></a></p>
</figcaption>
</figure>
</div></blockquote>

View File

@ -380,7 +380,7 @@ the following screenshot:</p>
<div><figure class="align-center" id="id3">
<a class="reference external image-reference" href="https://tensorboard.dev/experiment/lzGnETjwRxC3yghNMd4kPw/"><img alt="TensorBoard screenshot" src="../../../_images/librispeech-lstm-transducer-tensorboard-log.png" style="width: 600px;" /></a>
<figcaption>
<p><span class="caption-number">Fig. 8 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id3" title="Permalink to this image"></a></p>
<p><span class="caption-number">Fig. 10 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id3" title="Permalink to this image"></a></p>
</figcaption>
</figure>
</div></blockquote>

View File

@ -400,7 +400,7 @@ the following screenshot:</p>
<div><figure class="align-center" id="id7">
<a class="reference external image-reference" href="https://tensorboard.dev/experiment/97VKXf80Ru61CnP2ALWZZg/"><img alt="TensorBoard screenshot" src="../../../_images/streaming-librispeech-pruned-transducer-tensorboard-log.jpg" style="width: 600px;" /></a>
<figcaption>
<p><span class="caption-number">Fig. 7 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id7" title="Permalink to this image"></a></p>
<p><span class="caption-number">Fig. 9 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id7" title="Permalink to this image"></a></p>
</figcaption>
</figure>
</div></blockquote>

File diff suppressed because one or more lines are too long