From 717f98cc83bfa544fc29802d567b630255519496 Mon Sep 17 00:00:00 2001 From: csukuangfj Date: Wed, 16 Aug 2023 15:06:23 +0000 Subject: [PATCH] deploy: fc2df07841b3edbd7bffddfcc2e016515aa75247 --- _sources/for-dummies/data-preparation.rst.txt | 180 ++++++++ _sources/for-dummies/decoding.rst.txt | 39 ++ .../for-dummies/environment-setup.rst.txt | 121 ++++++ _sources/for-dummies/index.rst.txt | 34 ++ _sources/for-dummies/model-export.rst.txt | 310 +++++++++++++ _sources/for-dummies/training.rst.txt | 39 ++ _sources/index.rst.txt | 1 + contributing/code-style.html | 1 + contributing/doc.html | 1 + contributing/how-to-create-a-recipe.html | 1 + contributing/index.html | 1 + decoding-with-langugage-models/LODR.html | 1 + decoding-with-langugage-models/index.html | 1 + decoding-with-langugage-models/rescoring.html | 1 + .../shallow-fusion.html | 1 + docker/index.html | 1 + docker/intro.html | 1 + faqs.html | 1 + for-dummies/data-preparation.html | 299 +++++++++++++ for-dummies/decoding.html | 163 +++++++ for-dummies/environment-setup.html | 238 ++++++++++ for-dummies/index.html | 181 ++++++++ for-dummies/model-export.html | 410 ++++++++++++++++++ for-dummies/training.html | 159 +++++++ genindex.html | 1 + huggingface/index.html | 1 + huggingface/pretrained-models.html | 1 + huggingface/spaces.html | 1 + index.html | 13 +- installation/index.html | 5 +- model-export/export-model-state-dict.html | 1 + model-export/export-ncnn-conv-emformer.html | 1 + model-export/export-ncnn-lstm.html | 1 + model-export/export-ncnn-zipformer.html | 1 + model-export/export-ncnn.html | 1 + model-export/export-onnx.html | 1 + .../export-with-torch-jit-script.html | 1 + model-export/export-with-torch-jit-trace.html | 1 + model-export/index.html | 1 + objects.inv | Bin 1682 -> 1860 bytes .../aishell/conformer_ctc.html | 1 + recipes/Non-streaming-ASR/aishell/index.html | 1 + .../aishell/stateless_transducer.html | 1 + .../aishell/tdnn_lstm_ctc.html | 1 + recipes/Non-streaming-ASR/index.html | 1 + .../librispeech/conformer_ctc.html | 5 +- .../librispeech/distillation.html | 9 +- .../Non-streaming-ASR/librispeech/index.html | 1 + .../pruned_transducer_stateless.html | 5 +- .../librispeech/tdnn_lstm_ctc.html | 1 + .../librispeech/zipformer_ctc_blankskip.html | 1 + .../librispeech/zipformer_mmi.html | 1 + recipes/Non-streaming-ASR/timit/index.html | 1 + .../timit/tdnn_ligru_ctc.html | 1 + .../timit/tdnn_lstm_ctc.html | 1 + recipes/Non-streaming-ASR/yesno/index.html | 1 + recipes/Non-streaming-ASR/yesno/tdnn.html | 1 + recipes/Streaming-ASR/index.html | 1 + recipes/Streaming-ASR/introduction.html | 1 + recipes/Streaming-ASR/librispeech/index.html | 1 + .../lstm_pruned_stateless_transducer.html | 5 +- .../pruned_transducer_stateless.html | 5 +- .../librispeech/zipformer_transducer.html | 1 + recipes/index.html | 1 + search.html | 1 + searchindex.js | 2 +- 66 files changed, 2250 insertions(+), 17 deletions(-) create mode 100644 _sources/for-dummies/data-preparation.rst.txt create mode 100644 _sources/for-dummies/decoding.rst.txt create mode 100644 _sources/for-dummies/environment-setup.rst.txt create mode 100644 _sources/for-dummies/index.rst.txt create mode 100644 _sources/for-dummies/model-export.rst.txt create mode 100644 _sources/for-dummies/training.rst.txt create mode 100644 for-dummies/data-preparation.html create mode 100644 for-dummies/decoding.html create mode 100644 for-dummies/environment-setup.html create mode 100644 for-dummies/index.html create mode 100644 for-dummies/model-export.html create mode 100644 
for-dummies/training.html diff --git a/_sources/for-dummies/data-preparation.rst.txt b/_sources/for-dummies/data-preparation.rst.txt new file mode 100644 index 000000000..f03d44e79 --- /dev/null +++ b/_sources/for-dummies/data-preparation.rst.txt @@ -0,0 +1,180 @@ +.. _dummies_tutorial_data_preparation: + +Data Preparation +================ + +After :ref:`dummies_tutorial_environment_setup`, we can start preparing the +data for training and decoding. + +The first step is to run the provided script +`prepare.sh `_, +which prepares everything required for training and decoding. + +.. code-block:: bash + + cd /tmp/icefall + export PYTHONPATH=/tmp/icefall:$PYTHONPATH + cd egs/yesno/ASR + + ./prepare.sh + +Note that every recipe in `icefall`_ contains a file ``prepare.sh``, +which you should run before anything else. + +That is all you need for data preparation. + +For the more curious +-------------------- + +If you are wondering how to prepare your own dataset, please refer to the following +URLs for more details: + + - ``_ + + It contains recipes for a variety of datasets. If you want to add your own + dataset, please read the recipes in this folder first. + + - ``_ + + The `yesno`_ recipe in `lhotse`_. + +If you already have a `Kaldi`_ dataset directory, which contains files like +``wav.scp`` and ``feats.scp``, you can refer to ``_. + +A quick look at the generated files +----------------------------------- + +``./prepare.sh`` puts generated files into two directories: + + - ``download`` + - ``data`` + +download +^^^^^^^^ + +The ``download`` directory contains downloaded dataset files: + +.. code-block:: bash + + tree -L 1 ./download/ + + ./download/ + |-- waves_yesno + `-- waves_yesno.tar.gz + +.. hint:: + + Please refer to ``_ + for how the data is downloaded and extracted. + +data +^^^^ + +.. code-block:: bash + + tree ./data/ + + ./data/ + |-- fbank + | |-- yesno_cuts_test.jsonl.gz + | |-- yesno_cuts_train.jsonl.gz + | |-- yesno_feats_test.lca + | `-- yesno_feats_train.lca + |-- lang_phone + | |-- HLG.pt + | |-- L.pt + | |-- L_disambig.pt + | |-- Linv.pt + | |-- lexicon.txt + | |-- lexicon_disambig.txt + | |-- tokens.txt + | `-- words.txt + |-- lm + | |-- G.arpa + | `-- G.fst.txt + `-- manifests + |-- yesno_recordings_test.jsonl.gz + |-- yesno_recordings_train.jsonl.gz + |-- yesno_supervisions_test.jsonl.gz + `-- yesno_supervisions_train.jsonl.gz + + 4 directories, 18 files + +**data/manifests**: + + This directory contains manifests. They are used to generate files in + ``data/fbank``. + + To give you an idea of what it contains, we examine the first few lines of + the manifests related to the ``train`` dataset. + + .. code-block:: bash + + cd data/manifests + gunzip -c yesno_recordings_train.jsonl.gz | head -n 3 + + The output is given below: + + .. 
code-block:: bash + + {"id": "0_0_0_0_1_1_1_1", "sources": [{"type": "file", "channels": [0], "source": "/tmp/icefall/egs/yesno/ASR/download/waves_yesno/0_0_0_0_1_1_1_1.wav"}], "sampling_rate": 8000, "num_samples": 50800, "duration": 6.35, "channel_ids": [0]} + {"id": "0_0_0_1_0_1_1_0", "sources": [{"type": "file", "channels": [0], "source": "/tmp/icefall/egs/yesno/ASR/download/waves_yesno/0_0_0_1_0_1_1_0.wav"}], "sampling_rate": 8000, "num_samples": 48880, "duration": 6.11, "channel_ids": [0]} + {"id": "0_0_1_0_0_1_1_0", "sources": [{"type": "file", "channels": [0], "source": "/tmp/icefall/egs/yesno/ASR/download/waves_yesno/0_0_1_0_0_1_1_0.wav"}], "sampling_rate": 8000, "num_samples": 48160, "duration": 6.02, "channel_ids": [0]} + + Please refer to ``_ + for the meaning of each field per line. + + .. code-block:: bash + + gunzip -c yesno_supervisions_train.jsonl.gz | head -n 3 + + The output is given below: + + .. code-block:: bash + + {"id": "0_0_0_0_1_1_1_1", "recording_id": "0_0_0_0_1_1_1_1", "start": 0.0, "duration": 6.35, "channel": 0, "text": "NO NO NO NO YES YES YES YES", "language": "Hebrew"} + {"id": "0_0_0_1_0_1_1_0", "recording_id": "0_0_0_1_0_1_1_0", "start": 0.0, "duration": 6.11, "channel": 0, "text": "NO NO NO YES NO YES YES NO", "language": "Hebrew"} + {"id": "0_0_1_0_0_1_1_0", "recording_id": "0_0_1_0_0_1_1_0", "start": 0.0, "duration": 6.02, "channel": 0, "text": "NO NO YES NO NO YES YES NO", "language": "Hebrew"} + + Please refer to ``_ + for the meaning of each field per line. + +**data/fbank**: + + This directory contains everything from ``data/manifests``. It also contains the features + used for training. + + ``data/fbank/yesno_feats_train.lca`` contains the features for the train dataset. + Features are compressed using `lilcom`_. + + ``data/fbank/yesno_cuts_train.jsonl.gz`` stores the `CutSet `_, + which stores `RecordingSet `_, + `SupervisionSet `_, + and `FeatureSet `_. + + To give you an idea of what it looks like, we can run the following command: + + .. code-block:: bash + + cd data/fbank + + gunzip -c yesno_cuts_train.jsonl.gz | head -n 3 + + The output is given below: + + .. 
code-block:: bash + + {"id": "0_0_0_0_1_1_1_1-0", "start": 0, "duration": 6.35, "channel": 0, "supervisions": [{"id": "0_0_0_0_1_1_1_1", "recording_id": "0_0_0_0_1_1_1_1", "start": 0.0, "duration": 6.35, "channel": 0, "text": "NO NO NO NO YES YES YES YES", "language": "Hebrew"}], "features": {"type": "kaldi-fbank", "num_frames": 635, "num_features": 23, "frame_shift": 0.01, "sampling_rate": 8000, "start": 0, "duration": 6.35, "storage_type": "lilcom_chunky", "storage_path": "data/fbank/yesno_feats_train.lca", "storage_key": "0,13000,3570", "channels": 0}, "recording": {"id": "0_0_0_0_1_1_1_1", "sources": [{"type": "file", "channels": [0], "source": "/tmp/icefall/egs/yesno/ASR/download/waves_yesno/0_0_0_0_1_1_1_1.wav"}], "sampling_rate": 8000, "num_samples": 50800, "duration": 6.35, "channel_ids": [0]}, "type": "MonoCut"} + {"id": "0_0_0_1_0_1_1_0-1", "start": 0, "duration": 6.11, "channel": 0, "supervisions": [{"id": "0_0_0_1_0_1_1_0", "recording_id": "0_0_0_1_0_1_1_0", "start": 0.0, "duration": 6.11, "channel": 0, "text": "NO NO NO YES NO YES YES NO", "language": "Hebrew"}], "features": {"type": "kaldi-fbank", "num_frames": 611, "num_features": 23, "frame_shift": 0.01, "sampling_rate": 8000, "start": 0, "duration": 6.11, "storage_type": "lilcom_chunky", "storage_path": "data/fbank/yesno_feats_train.lca", "storage_key": "16570,12964,2929", "channels": 0}, "recording": {"id": "0_0_0_1_0_1_1_0", "sources": [{"type": "file", "channels": [0], "source": "/tmp/icefall/egs/yesno/ASR/download/waves_yesno/0_0_0_1_0_1_1_0.wav"}], "sampling_rate": 8000, "num_samples": 48880, "duration": 6.11, "channel_ids": [0]}, "type": "MonoCut"} + {"id": "0_0_1_0_0_1_1_0-2", "start": 0, "duration": 6.02, "channel": 0, "supervisions": [{"id": "0_0_1_0_0_1_1_0", "recording_id": "0_0_1_0_0_1_1_0", "start": 0.0, "duration": 6.02, "channel": 0, "text": "NO NO YES NO NO YES YES NO", "language": "Hebrew"}], "features": {"type": "kaldi-fbank", "num_frames": 602, "num_features": 23, "frame_shift": 0.01, "sampling_rate": 8000, "start": 0, "duration": 6.02, "storage_type": "lilcom_chunky", "storage_path": "data/fbank/yesno_feats_train.lca", "storage_key": "32463,12936,2696", "channels": 0}, "recording": {"id": "0_0_1_0_0_1_1_0", "sources": [{"type": "file", "channels": [0], "source": "/tmp/icefall/egs/yesno/ASR/download/waves_yesno/0_0_1_0_0_1_1_0.wav"}], "sampling_rate": 8000, "num_samples": 48160, "duration": 6.02, "channel_ids": [0]}, "type": "MonoCut"} + + Note that ``yesno_cuts_train.jsonl.gz`` stores only the information about how to read the features. + The actual features are stored separately in ``data/fbank/yesno_feats_train.lca``. + +**data/lang_phone**: + + This directory contains the lexicon. + +**data/lm**: + + This directory contains language models. diff --git a/_sources/for-dummies/decoding.rst.txt b/_sources/for-dummies/decoding.rst.txt new file mode 100644 index 000000000..3e48e8bfd --- /dev/null +++ b/_sources/for-dummies/decoding.rst.txt @@ -0,0 +1,39 @@ +.. _dummies_tutorial_decoding: + +Decoding +======== + +After :ref:`dummies_tutorial_training`, we can start decoding. + +The command to start the decoding is quite simple: + +.. code-block:: bash + + cd /tmp/icefall + export PYTHONPATH=/tmp/icefall:$PYTHONPATH + cd egs/yesno/ASR + + # We use CPU for decoding by setting the following environment variable + export CUDA_VISIBLE_DEVICES="" + + ./tdnn/decode.py + +The output logs are given below: + +.. literalinclude:: ./code/decoding-yesno.txt + +For the more curious +-------------------- + +.. 
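code-block:: bash
+
+   # A hedged illustration, not a recommendation from the recipe: it reuses
+   # the "--epoch 14 --avg 2" combination that appears later in this
+   # tutorial, decoding with the average of the epoch-13 and epoch-14
+   # checkpoints.
+   ./tdnn/decode.py --epoch 14 --avg 2
+
+The combination above is only an illustration; it is not necessarily the best
+choice for your own run. Running
+
+.. 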
code-block:: bash + + ./tdnn/decode.py --help + +will print the usage information about ``./tdnn/decode.py``. For instance, you +can specify: + + - ``--epoch`` to select which checkpoint to use for decoding + - ``--avg`` to select how many checkpoints to use for model averaging + +You usually try different combinations of ``--epoch`` and ``--avg`` and select +one that leads to the lowest WER (`Word Error Rate `_). diff --git a/_sources/for-dummies/environment-setup.rst.txt b/_sources/for-dummies/environment-setup.rst.txt new file mode 100644 index 000000000..0cb8ecc1d --- /dev/null +++ b/_sources/for-dummies/environment-setup.rst.txt @@ -0,0 +1,121 @@ +.. _dummies_tutorial_environment_setup: + +Environment setup +================= + +We will create an environment for `Next-gen Kaldi`_ that runs on ``CPU`` +in this tutorial. + +.. note:: + + Since the `yesno`_ dataset used in this tutorial is very tiny, training on + ``CPU`` works very well for it. + + If your dataset is very large, e.g., hundreds or thousands of hours of + training data, please follow :ref:`install icefall` to install `icefall`_ + that works with ``GPU``. + + +Create a virtual environment +---------------------------- + +.. code-block:: bash + + virtualenv -p python3 /tmp/icefall_env + +The above command creates a virtual environment in the directory ``/tmp/icefall_env``. +You can select any directory you want. + +The output of the above command is given below: + +.. code-block:: bash + + Already using interpreter /usr/bin/python3 + Using base prefix '/usr' + New python executable in /tmp/icefall_env/bin/python3 + Also creating executable in /tmp/icefall_env/bin/python + Installing setuptools, pkg_resources, pip, wheel...done. + +Now we can activate the environment using: + +.. code-block:: bash + + source /tmp/icefall_env/bin/activate + +Install dependencies +-------------------- + +.. warning:: + + Remember to activate your virtual environment before you continue! + +After activating the virtual environment, we can use the following command +to install dependencies of `icefall`_: + +.. hint:: + + Remember that we will run this tutorial on ``CPU``, so we install only the + dependencies required for running on ``CPU``. + +.. code-block:: bash + + # Caution: Installation order matters! + + # We use torch 2.0.0 and torchaudio 2.0.0 in this tutorial. + # Other versions should also work. + + pip install torch==2.0.0+cpu torchaudio==2.0.0+cpu -f https://download.pytorch.org/whl/torch_stable.html + + # If you are using macOS or Windows, please use the following command to install torch and torchaudio + # pip install torch==2.0.0 torchaudio==2.0.0 -f https://download.pytorch.org/whl/torch_stable.html + + # Now install k2 + # Please refer to https://k2-fsa.github.io/k2/installation/from_wheels.html#linux-cpu-example + + pip install k2==1.24.3.dev20230726+cpu.torch2.0.0 -f https://k2-fsa.github.io/k2/cpu.html + + # Install the latest version of lhotse + + pip install git+https://github.com/lhotse-speech/lhotse + + +Install icefall +--------------- + +We will put the source code of `icefall`_ into the directory ``/tmp``. +You can select any directory you want. + +.. code-block:: bash + + cd /tmp + git clone https://github.com/k2-fsa/icefall + cd icefall + pip install -r ./requirements.txt + +.. code-block:: bash + + # Anytime we want to use icefall, we have to set the following + # environment variable + + export PYTHONPATH=/tmp/icefall:$PYTHONPATH + +.. hint:: + + If you get the following error during this tutorial: + + .. 
code-block:: bash + + ModuleNotFoundError: No module named 'icefall' + + please set the above environment variable to fix it. + + +Congratulations! You have installed `icefall`_ successfully. + +For the more curious +-------------------- + +`icefall`_ is a collection of Python scripts, so you don't need to +use ``python3 setup.py install`` or ``pip install icefall`` to install it. +All you need to do is download the code and set the environment variable +``PYTHONPATH``. diff --git a/_sources/for-dummies/index.rst.txt b/_sources/for-dummies/index.rst.txt new file mode 100644 index 000000000..7c0a3d8ee --- /dev/null +++ b/_sources/for-dummies/index.rst.txt @@ -0,0 +1,34 @@ +Icefall for dummies tutorial +============================ + +This tutorial walks you step by step through creating a simple +ASR (`Automatic Speech Recognition `_) +system with `Next-gen Kaldi`_. + +We use the `yesno`_ dataset for demonstration. We selected it for two reasons: + + - It is quite tiny, containing only about 12 minutes of data. + - The training can be finished within 20 seconds on ``CPU``. + +That also means you don't need a ``GPU`` to run this tutorial. + +Let's get started! + +Please follow the items below **sequentially**. + +.. note:: + + The :ref:`dummies_tutorial_data_preparation` runs only on Linux and macOS. + All other parts run on Linux, macOS, and Windows. + + Community help with porting the :ref:`dummies_tutorial_data_preparation` + to Windows is appreciated. + +.. toctree:: + :maxdepth: 2 + + ./environment-setup.rst + ./data-preparation.rst + ./training.rst + ./decoding.rst + ./model-export.rst diff --git a/_sources/for-dummies/model-export.rst.txt b/_sources/for-dummies/model-export.rst.txt new file mode 100644 index 000000000..079ebc712 --- /dev/null +++ b/_sources/for-dummies/model-export.rst.txt @@ -0,0 +1,310 @@ +Model Export +============ + +There are three ways to export a pre-trained model. + + - Export the model parameters via `model.state_dict() `_ + - Export via `torchscript `_: either `torch.jit.script() `_ or `torch.jit.trace() `_ + - Export to `ONNX`_ via `torch.onnx.export() `_ + +Each method is explained below in detail. + +Export the model parameters via model.state_dict() +--------------------------------------------------- + +The command for this kind of export is + +.. code-block:: bash + + cd /tmp/icefall + export PYTHONPATH=/tmp/icefall:$PYTHONPATH + cd egs/yesno/ASR + + # assume that "--epoch 14 --avg 2" produces the lowest WER. + + ./tdnn/export.py --epoch 14 --avg 2 + +The output logs are given below: + +.. code-block:: bash + + 2023-08-16 20:42:03,912 INFO [export.py:76] {'exp_dir': PosixPath('tdnn/exp'), 'lang_dir': PosixPath('data/lang_phone'), 'lr': 0.01, 'feature_dim': 23, 'weight_decay': 1e-06, 'start_epoch': 0, 'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 10, 'reset_interval': 20, 'valid_interval': 10, 'beam_size': 10, 'reduction': 'sum', 'use_double_scores': True, 'epoch': 14, 'avg': 2, 'jit': False} + 2023-08-16 20:42:03,913 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt + 2023-08-16 20:42:03,950 INFO [export.py:93] averaging ['tdnn/exp/epoch-13.pt', 'tdnn/exp/epoch-14.pt'] + 2023-08-16 20:42:03,971 INFO [export.py:106] Not using torch.jit.script + 2023-08-16 20:42:03,974 INFO [export.py:111] Saved to tdnn/exp/pretrained.pt + +We can see from the logs that the exported model is saved to the file ``tdnn/exp/pretrained.pt``. 
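+
+If you are curious about what the checkpoint averaging behind ``--avg 2``
+does, the following is a minimal sketch of the idea, assuming a plain
+element-wise average of the parameters stored in the two checkpoints. It is
+not the actual code used by ``./tdnn/export.py``, which relies on icefall's
+checkpoint helpers and handles corner cases this sketch ignores:
+
+.. code-block:: python
+
+   # A hedged sketch of checkpoint averaging, not icefall's actual code.
+   import torch
+
+   s1 = torch.load("tdnn/exp/epoch-13.pt", map_location="cpu")["model"]
+   s2 = torch.load("tdnn/exp/epoch-14.pt", map_location="cpu")["model"]
+
+   avg = {}
+   for k in s1:
+       if s1[k].is_floating_point():
+           # Average weights, biases, and batchnorm running statistics.
+           avg[k] = (s1[k] + s2[k]) / 2
+       else:
+           # Integer entries such as num_batches_tracked cannot be
+           # meaningfully averaged; keep the value from the newest epoch.
+           avg[k] = s2[k]
+
+   torch.save({"model": avg}, "tdnn/exp/averaged-sketch.pt")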
+ +To give you an idea of what ``tdnn/exp/pretrained.pt`` contains, we can use the following command: + +.. code-block:: python3 + + >>> import torch + >>> m = torch.load("tdnn/exp/pretrained.pt") + >>> list(m.keys()) + ['model'] + >>> list(m["model"].keys()) + ['tdnn.0.weight', 'tdnn.0.bias', 'tdnn.2.running_mean', 'tdnn.2.running_var', 'tdnn.2.num_batches_tracked', 'tdnn.3.weight', 'tdnn.3.bias', 'tdnn.5.running_mean', 'tdnn.5.running_var', 'tdnn.5.num_batches_tracked', 'tdnn.6.weight', 'tdnn.6.bias', 'tdnn.8.running_mean', 'tdnn.8.running_var', 'tdnn.8.num_batches_tracked', 'output_linear.weight', 'output_linear.bias'] + +We can use ``tdnn/exp/pretrained.pt`` in the following way with ``./tdnn/decode.py``: + +.. code-block:: bash + + cd tdnn/exp + ln -s pretrained.pt epoch-99.pt + cd ../.. + + ./tdnn/decode.py --epoch 99 --avg 1 + +The output logs of the above command are given below: + +.. code-block:: bash + + 2023-08-16 20:45:48,089 INFO [decode.py:262] Decoding started + 2023-08-16 20:45:48,090 INFO [decode.py:263] {'exp_dir': PosixPath('tdnn/exp'), 'lang_dir': PosixPath('data/lang_phone'), 'feature_dim': 23, 'search_beam': 20, 'output_beam': 8, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'epoch': 99, 'avg': 1, 'export': False, 'feature_dir': PosixPath('data/fbank'), 'max_duration': 30.0, 'bucketing_sampler': False, 'num_buckets': 10, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': False, 'return_cuts': True, 'num_workers': 2, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': False, 'k2-git-sha1': 'ad79f1c699c684de9785ed6ca5edb805a41f78c3', 'k2-git-date': 'Wed Jul 26 09:30:42 2023', 'lhotse-version': '1.16.0.dev+git.aa073f6.clean', 'torch-version': '2.0.0', 'torch-cuda-available': False, 'torch-cuda-version': None, 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': '9a47c08-clean', 'icefall-git-date': 'Mon Aug 14 22:10:50 2023', 'icefall-path': '/private/tmp/icefall', 'k2-path': '/private/tmp/icefall_env/lib/python3.11/site-packages/k2/__init__.py', 'lhotse-path': '/private/tmp/icefall_env/lib/python3.11/site-packages/lhotse/__init__.py', 'hostname': 'fangjuns-MacBook-Pro.local', 'IP address': '127.0.0.1'}} + 2023-08-16 20:45:48,092 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt + 2023-08-16 20:45:48,103 INFO [decode.py:272] device: cpu + 2023-08-16 20:45:48,109 INFO [checkpoint.py:112] Loading checkpoint from tdnn/exp/epoch-99.pt + 2023-08-16 20:45:48,115 INFO [asr_datamodule.py:218] About to get test cuts + 2023-08-16 20:45:48,115 INFO [asr_datamodule.py:253] About to get test cuts + 2023-08-16 20:45:50,386 INFO [decode.py:203] batch 0/?, cuts processed until now is 4 + 2023-08-16 20:45:50,556 INFO [decode.py:240] The transcripts are stored in tdnn/exp/recogs-test_set.txt + 2023-08-16 20:45:50,557 INFO [utils.py:564] [test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ] + 2023-08-16 20:45:50,558 INFO [decode.py:248] Wrote detailed error stats to tdnn/exp/errs-test_set.txt + 2023-08-16 20:45:50,559 INFO [decode.py:315] Done! + +We can see that it produces the same WER as before. + +We can also use it to decode files with the following command: + +.. 
code-block:: bash + + # ./tdnn/pretrained.py requires kaldifeat + # + # Please refer to https://csukuangfj.github.io/kaldifeat/installation/from_wheels.html + # for how to install kaldifeat + + pip install kaldifeat==1.25.0.dev20230726+cpu.torch2.0.0 -f https://csukuangfj.github.io/kaldifeat/cpu.html + + ./tdnn/pretrained.py \ --checkpoint ./tdnn/exp/pretrained.pt \ --HLG ./data/lang_phone/HLG.pt \ --words-file ./data/lang_phone/words.txt \ download/waves_yesno/0_0_0_1_0_0_0_1.wav \ download/waves_yesno/0_0_1_0_0_0_1_0.wav + +The output is given below: + +.. code-block:: bash + + 2023-08-16 20:53:19,208 INFO [pretrained.py:136] {'feature_dim': 23, 'num_classes': 4, 'sample_rate': 8000, 'search_beam': 20, 'output_beam': 8, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'checkpoint': './tdnn/exp/pretrained.pt', 'words_file': './data/lang_phone/words.txt', 'HLG': './data/lang_phone/HLG.pt', 'sound_files': ['download/waves_yesno/0_0_0_1_0_0_0_1.wav', 'download/waves_yesno/0_0_1_0_0_0_1_0.wav']} + 2023-08-16 20:53:19,208 INFO [pretrained.py:142] device: cpu + 2023-08-16 20:53:19,208 INFO [pretrained.py:144] Creating model + 2023-08-16 20:53:19,212 INFO [pretrained.py:156] Loading HLG from ./data/lang_phone/HLG.pt + 2023-08-16 20:53:19,213 INFO [pretrained.py:160] Constructing Fbank computer + 2023-08-16 20:53:19,213 INFO [pretrained.py:170] Reading sound files: ['download/waves_yesno/0_0_0_1_0_0_0_1.wav', 'download/waves_yesno/0_0_1_0_0_0_1_0.wav'] + 2023-08-16 20:53:19,224 INFO [pretrained.py:176] Decoding started + 2023-08-16 20:53:19,304 INFO [pretrained.py:212] + download/waves_yesno/0_0_0_1_0_0_0_1.wav: + NO NO NO YES NO NO NO YES + + download/waves_yesno/0_0_1_0_0_0_1_0.wav: + NO NO YES NO NO NO YES NO + + + 2023-08-16 20:53:19,304 INFO [pretrained.py:214] Decoding Done + + +Export via torch.jit.script() +----------------------------- + +The command for this kind of export is + +.. code-block:: bash + + cd /tmp/icefall + export PYTHONPATH=/tmp/icefall:$PYTHONPATH + cd egs/yesno/ASR + + # assume that "--epoch 14 --avg 2" produces the lowest WER. + + ./tdnn/export.py --epoch 14 --avg 2 --jit true + +The output logs are given below: + +.. code-block:: bash + + 2023-08-16 20:47:44,666 INFO [export.py:76] {'exp_dir': PosixPath('tdnn/exp'), 'lang_dir': PosixPath('data/lang_phone'), 'lr': 0.01, 'feature_dim': 23, 'weight_decay': 1e-06, 'start_epoch': 0, 'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 10, 'reset_interval': 20, 'valid_interval': 10, 'beam_size': 10, 'reduction': 'sum', 'use_double_scores': True, 'epoch': 14, 'avg': 2, 'jit': True} + 2023-08-16 20:47:44,667 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt + 2023-08-16 20:47:44,670 INFO [export.py:93] averaging ['tdnn/exp/epoch-13.pt', 'tdnn/exp/epoch-14.pt'] + 2023-08-16 20:47:44,677 INFO [export.py:100] Using torch.jit.script + 2023-08-16 20:47:44,843 INFO [export.py:104] Saved to tdnn/exp/cpu_jit.pt + +From the output logs we can see that the generated file is saved to ``tdnn/exp/cpu_jit.pt``. + +Don't be confused by the name ``cpu_jit.pt``. The ``cpu`` part means the model was moved to +CPU before being exported. That means that when you load it with: + +.. code-block:: python + + torch.jit.load() + +you don't need to specify the argument `map_location `_ +and the model resides on CPU by default. + +To use ``tdnn/exp/cpu_jit.pt`` with `icefall`_ to decode files, we can use: + +.. 
code-block:: bash + + # ./tdnn/jit_pretrained.py requires kaldifeat + # + # Please refer to https://csukuangfj.github.io/kaldifeat/installation/from_wheels.html + # for how to install kaldifeat + + pip install kaldifeat==1.25.0.dev20230726+cpu.torch2.0.0 -f https://csukuangfj.github.io/kaldifeat/cpu.html + + + ./tdnn/jit_pretrained.py \ + --nn-model ./tdnn/exp/cpu_jit.pt \ + --HLG ./data/lang_phone/HLG.pt \ + --words-file ./data/lang_phone/words.txt \ + download/waves_yesno/0_0_0_1_0_0_0_1.wav \ + download/waves_yesno/0_0_1_0_0_0_1_0.wav + +The output is given below: + +.. code-block:: bash + + 2023-08-16 20:56:00,603 INFO [jit_pretrained.py:121] {'feature_dim': 23, 'num_classes': 4, 'sample_rate': 8000, 'search_beam': 20, 'output_beam': 8, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'nn_model': './tdnn/exp/cpu_jit.pt', 'words_file': './data/lang_phone/words.txt', 'HLG': './data/lang_phone/HLG.pt', 'sound_files': ['download/waves_yesno/0_0_0_1_0_0_0_1.wav', 'download/waves_yesno/0_0_1_0_0_0_1_0.wav']} + 2023-08-16 20:56:00,603 INFO [jit_pretrained.py:127] device: cpu + 2023-08-16 20:56:00,603 INFO [jit_pretrained.py:129] Loading torchscript model + 2023-08-16 20:56:00,640 INFO [jit_pretrained.py:134] Loading HLG from ./data/lang_phone/HLG.pt + 2023-08-16 20:56:00,641 INFO [jit_pretrained.py:138] Constructing Fbank computer + 2023-08-16 20:56:00,641 INFO [jit_pretrained.py:148] Reading sound files: ['download/waves_yesno/0_0_0_1_0_0_0_1.wav', 'download/waves_yesno/0_0_1_0_0_0_1_0.wav'] + 2023-08-16 20:56:00,642 INFO [jit_pretrained.py:154] Decoding started + 2023-08-16 20:56:00,727 INFO [jit_pretrained.py:190] + download/waves_yesno/0_0_0_1_0_0_0_1.wav: + NO NO NO YES NO NO NO YES + + download/waves_yesno/0_0_1_0_0_0_1_0.wav: + NO NO YES NO NO NO YES NO + + + 2023-08-16 20:56:00,727 INFO [jit_pretrained.py:192] Decoding Done + +.. hint:: + + We provide only code for ``torch.jit.script()``. You can try ``torch.jit.trace()`` + if you want. + +Export via torch.onnx.export() +------------------------------ + +The command for this kind of export is + +.. code-block:: bash + + cd /tmp/icefall + export PYTHONPATH=/tmp/icefall:$PYTHONPATH + cd egs/yesno/ASR + + # tdnn/export_onnx.py requires onnx and onnxruntime + pip install onnx onnxruntime + + # assume that "--epoch 14 --avg 2" produces the lowest WER. + + ./tdnn/export_onnx.py \ + --epoch 14 \ + --avg 2 + +The output logs are given below: + +.. 
code-block:: bash + + 2023-08-16 20:59:20,888 INFO [export_onnx.py:83] {'exp_dir': PosixPath('tdnn/exp'), 'lang_dir': PosixPath('data/lang_phone'), 'lr': 0.01, 'feature_dim': 23, 'weight_decay': 1e-06, 'start_epoch': 0, 'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 10, 'reset_interval': 20, 'valid_interval': 10, 'beam_size': 10, 'reduction': 'sum', 'use_double_scores': True, 'epoch': 14, 'avg': 2} + 2023-08-16 20:59:20,888 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt + 2023-08-16 20:59:20,892 INFO [export_onnx.py:100] averaging ['tdnn/exp/epoch-13.pt', 'tdnn/exp/epoch-14.pt'] + ================ Diagnostic Run torch.onnx.export version 2.0.0 ================ + verbose: False, log level: Level.ERROR + ======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ======================== + + 2023-08-16 20:59:21,047 INFO [export_onnx.py:127] Saved to tdnn/exp/model-epoch-14-avg-2.onnx + 2023-08-16 20:59:21,047 INFO [export_onnx.py:136] meta_data: {'model_type': 'tdnn', 'version': '1', 'model_author': 'k2-fsa', 'comment': 'non-streaming tdnn for the yesno recipe', 'vocab_size': 4} + 2023-08-16 20:59:21,049 INFO [export_onnx.py:140] Generate int8 quantization models + 2023-08-16 20:59:21,075 INFO [onnx_quantizer.py:538] Quantization parameters for tensor:"/Transpose_1_output_0" not specified + 2023-08-16 20:59:21,081 INFO [export_onnx.py:151] Saved to tdnn/exp/model-epoch-14-avg-2.int8.onnx + +We can see from the logs that it generates two files: + + - ``tdnn/exp/model-epoch-14-avg-2.onnx`` (ONNX model with ``float32`` weights) + - ``tdnn/exp/model-epoch-14-avg-2.int8.onnx`` (ONNX model with ``int8`` weights) + +To use the generated ONNX model files for decoding with `onnxruntime`_, we can use + +.. code-block:: bash + + # ./tdnn/onnx_pretrained.py requires kaldifeat + # + # Please refer to https://csukuangfj.github.io/kaldifeat/installation/from_wheels.html + # for how to install kaldifeat + + pip install kaldifeat==1.25.0.dev20230726+cpu.torch2.0.0 -f https://csukuangfj.github.io/kaldifeat/cpu.html + + ./tdnn/onnx_pretrained.py \ + --nn-model ./tdnn/exp/model-epoch-14-avg-2.onnx \ + --HLG ./data/lang_phone/HLG.pt \ + --words-file ./data/lang_phone/words.txt \ + download/waves_yesno/0_0_0_1_0_0_0_1.wav \ + download/waves_yesno/0_0_1_0_0_0_1_0.wav + +The output is given below: + +.. 
code-block:: bash + + 2023-08-16 21:03:24,260 INFO [onnx_pretrained.py:166] {'feature_dim': 23, 'sample_rate': 8000, 'search_beam': 20, 'output_beam': 8, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'nn_model': './tdnn/exp/model-epoch-14-avg-2.onnx', 'words_file': './data/lang_phone/words.txt', 'HLG': './data/lang_phone/HLG.pt', 'sound_files': ['download/waves_yesno/0_0_0_1_0_0_0_1.wav', 'download/waves_yesno/0_0_1_0_0_0_1_0.wav']} + 2023-08-16 21:03:24,260 INFO [onnx_pretrained.py:171] device: cpu + 2023-08-16 21:03:24,260 INFO [onnx_pretrained.py:173] Loading onnx model ./tdnn/exp/model-epoch-14-avg-2.onnx + 2023-08-16 21:03:24,267 INFO [onnx_pretrained.py:176] Loading HLG from ./data/lang_phone/HLG.pt + 2023-08-16 21:03:24,270 INFO [onnx_pretrained.py:180] Constructing Fbank computer + 2023-08-16 21:03:24,273 INFO [onnx_pretrained.py:190] Reading sound files: ['download/waves_yesno/0_0_0_1_0_0_0_1.wav', 'download/waves_yesno/0_0_1_0_0_0_1_0.wav'] + 2023-08-16 21:03:24,279 INFO [onnx_pretrained.py:196] Decoding started + 2023-08-16 21:03:24,318 INFO [onnx_pretrained.py:232] + download/waves_yesno/0_0_0_1_0_0_0_1.wav: + NO NO NO YES NO NO NO YES + + download/waves_yesno/0_0_1_0_0_0_1_0.wav: + NO NO YES NO NO NO YES NO + + + 2023-08-16 21:03:24,318 INFO [onnx_pretrained.py:234] Decoding Done + +.. note:: + + To use the ``int8`` ONNX model for decoding, please use: + + .. code-block:: bash + + ./tdnn/onnx_pretrained.py \ --nn-model ./tdnn/exp/model-epoch-14-avg-2.int8.onnx \ --HLG ./data/lang_phone/HLG.pt \ --words-file ./data/lang_phone/words.txt \ download/waves_yesno/0_0_0_1_0_0_0_1.wav \ download/waves_yesno/0_0_1_0_0_0_1_0.wav + +For the more curious +-------------------- + +If you are wondering how to deploy the model without ``torch``, please +continue reading. We will show how to use `sherpa-onnx`_ to run the +exported ONNX models, which depends only on `onnxruntime`_ and does not +depend on ``torch``. + +In this tutorial, we will only demonstrate the usage of `sherpa-onnx`_ with the +pre-trained model of the `yesno`_ recipe. There are also two other frameworks +available: + + - `sherpa`_. It works with torchscript models. + - `sherpa-ncnn`_. It works with models exported using :ref:`icefall_export_to_ncnn` with `ncnn`_. + +Please see ``_ for further details. diff --git a/_sources/for-dummies/training.rst.txt b/_sources/for-dummies/training.rst.txt new file mode 100644 index 000000000..816ef2d3b --- /dev/null +++ b/_sources/for-dummies/training.rst.txt @@ -0,0 +1,39 @@ +.. _dummies_tutorial_training: + +Training +======== + +After :ref:`dummies_tutorial_data_preparation`, we can start training. + +The command to start the training is quite simple: + +.. code-block:: bash + + cd /tmp/icefall + export PYTHONPATH=/tmp/icefall:$PYTHONPATH + cd egs/yesno/ASR + + # We use CPU for training by setting the following environment variable + export CUDA_VISIBLE_DEVICES="" + + ./tdnn/train.py + +That's it! + +You can find the training logs below: + +.. literalinclude:: ./code/train-yesno.txt + +For the more curious +-------------------- + +.. code-block:: bash + + ./tdnn/train.py --help + +will print the usage information about ``./tdnn/train.py``. For instance, you +can specify the number of epochs to train and the location to save the training +results. + +The training text logs are saved in ``tdnn/exp/log`` while the tensorboard +logs are in ``tdnn/exp/tensorboard``. 
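+
+To view the tensorboard logs, you can run `tensorboard
+`_ (assuming you have
+installed it, e.g., via ``pip install tensorboard``):
+
+.. code-block:: bash
+
+   # Point tensorboard at the directory written during training and open
+   # the printed URL (usually http://localhost:6006) in a browser.
+   tensorboard --logdir tdnn/exp/tensorboard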
diff --git a/_sources/index.rst.txt b/_sources/index.rst.txt index 0fa8fdd1c..fb539d3f2 100644 --- a/_sources/index.rst.txt +++ b/_sources/index.rst.txt @@ -20,6 +20,7 @@ speech recognition recipes using `k2 `_. :maxdepth: 2 :caption: Contents: + for-dummies/index.rst installation/index docker/index faqs diff --git a/contributing/code-style.html b/contributing/code-style.html index 603669e06..a0ce2b2dd 100644 --- a/contributing/code-style.html +++ b/contributing/code-style.html @@ -44,6 +44,7 @@
diff --git a/installation/index.html b/installation/index.html index a2dd8cd3b..05a5f1e8f 100644 --- a/installation/index.html +++ b/installation/index.html @@ -20,7 +20,7 @@ @@ -44,6 +44,7 @@ diff --git a/model-export/export-model-state-dict.html b/model-export/export-model-state-dict.html index b47f85b9f..48b763f97 100644 --- a/model-export/export-model-state-dict.html +++ b/model-export/export-model-state-dict.html @@ -44,6 +44,7 @@

Please see the following screenshot for the output of an example execution.

Fig. 6 Downloading codebook indexes and preparing training manifest.

@@ -245,10 +246,10 @@ set use_extracted_c num_codebooks by yourself.

Now, you should see the following files under the directory ./data/vq_fbank_layer36_cb8.

Fig. 7 MVQ-augmented training manifests.

Voila! You are ready to perform knowledge distillation training now!

diff --git a/recipes/Non-streaming-ASR/librispeech/index.html b/recipes/Non-streaming-ASR/librispeech/index.html index fdf26a2a0..0799c43f6 100644 --- a/recipes/Non-streaming-ASR/librispeech/index.html +++ b/recipes/Non-streaming-ASR/librispeech/index.html @@ -44,6 +44,7 @@