diff --git a/README.md b/README.md
index 140d07645..e5a6bc627 100644
--- a/README.md
+++ b/README.md
@@ -16,6 +16,7 @@ We provide three recipes at present:
 
   - [yesno][yesno]
   - [LibriSpeech][librispeech]
+  - [Aishell][aishell]
   - [TIMIT][timit]
 
 ### yesno
@@ -57,6 +58,31 @@ The WER for this model is:
 
 We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1kNmDXNMwREi0rZGAOIAOJo93REBuOTcd?usp=sharing)
 
+### Aishell
+
+We provide two models for this recipe: [conformer CTC model][Aishell_conformer_ctc]
+and [TDNN LSTM CTC model][Aishell_tdnn_lstm_ctc].
+
+#### Conformer CTC Model
+
+The best CER we currently have is:
+
+|     | test |
+|-----|------|
+| CER | 4.26 |
+
+
+We provide a Colab notebook to run a pre-trained conformer CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1WnG17io5HEZ0Gn_cnh_VzK5QYOoiiklC?usp=sharing)
+
+#### TDNN LSTM CTC Model
+
+The CER for this model is:
+
+|     | test  |
+|-----|-------|
+| CER | 10.16 |
+
+We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1qULaGvXq7PCu_P61oubfz9b53JzY4H3z?usp=sharing)
 
 ### TIMIT
 
diff --git a/docs/source/recipes/aishell/conformer_ctc.rst b/docs/source/recipes/aishell/conformer_ctc.rst
index 20967780a..c7fd91e99 100644
--- a/docs/source/recipes/aishell/conformer_ctc.rst
+++ b/docs/source/recipes/aishell/conformer_ctc.rst
@@ -18,7 +18,7 @@ In this tutorial, you will learn:
 
   - (1) How to prepare data for training and decoding
   - (2) How to start the training, either with a single GPU or multiple GPUs
-  - (3) How to do decoding after training, with 1best and attention decoder rescoring
+  - (3) How to do decoding after training, with ctc-decoding, 1best and attention decoder rescoring
   - (4) How to use a pre-trained model, provided by us
 
 Data preparation
@@ -623,3 +623,125 @@ We do provide a colab notebook for this recipe showing how to use a pre-trained
 
 **Congratulations!** You have finished the aishell ASR recipe with
 conformer CTC models in ``icefall``.
+
+
+If you want to deploy your trained model in C++, please read the following section.
+
+Deployment with C++
+-------------------
+
+This section describes how to deploy the pre-trained model in C++, without
+Python dependencies.
+
+.. HINT::
+
+  At present, it does NOT support streaming decoding.
+
+First, let us compile k2 from source:
+
+.. code-block:: bash
+
+  $ cd $HOME
+  $ git clone https://github.com/k2-fsa/k2
+  $ cd k2
+  $ git checkout v2.0-pre
+
+.. CAUTION::
+
+  You have to switch to the branch ``v2.0-pre``!
+
+.. code-block:: bash
+
+  $ mkdir build-release
+  $ cd build-release
+  $ cmake -DCMAKE_BUILD_TYPE=Release ..
+  $ make -j hlg_decode
+
+  # You will find four binaries in `./bin`, i.e. ./bin/hlg_decode,
+
+Now you are ready to go!
+
+Assume you have run:
+
+  .. code-block:: bash
+
+    $ cd k2/build-release
+    $ ln -s /path/to/icefall-asr-aishell-conformer-ctc ./
+
+To view the usage of ``./bin/hlg_decode``, run:
+
+.. code-block::
+
+  $ ./bin/hlg_decode
+
+It will show you the following message:
+
+.. code-block:: bash
+
+  Please provide --nn_model
+
+  This file implements decoding with an HLG decoding graph.
+
+  Usage:
+    ./bin/hlg_decode \
+      --use_gpu true \
+      --nn_model \
+      --hlg \
+      --word_table \
+      \
+      \
+
+
+  To see all possible options, use
+    ./bin/hlg_decode --help
+
+  Caution:
+    - Only sound files (*.wav) with single channel are supported.
+    - It assumes the model is conformer_ctc/transformer.py from icefall.
+      If you use a different model, you have to change the code
+      related to `model.forward` in this file.
+
+
+HLG decoding
+^^^^^^^^^^^^
+
+.. code-block:: bash
+
+  ./bin/hlg_decode \
+    --use_gpu true \
+    --nn_model icefall_asr_aishell_conformer_ctc/exp/cpu_jit.pt \
+    --hlg icefall_asr_aishell_conformer_ctc/data/lang_char/HLG.pt \
+    --word_table icefall_asr_aishell_conformer_ctc/data/lang_char/words.txt \
+    icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0121.wav \
+    icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0122.wav \
+    icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0123.wav
+
+The output is:
+
+.. code-block::
+
+  2021-11-18 14:48:20.89 [I] k2/torch/bin/hlg_decode.cu:115:int main(int, char**) Device: cpu
+  2021-11-18 14:48:20.89 [I] k2/torch/bin/hlg_decode.cu:124:int main(int, char**) Load wave files
+  2021-11-18 14:48:20.97 [I] k2/torch/bin/hlg_decode.cu:131:int main(int, char**) Build Fbank computer
+  2021-11-18 14:48:20.98 [I] k2/torch/bin/hlg_decode.cu:142:int main(int, char**) Compute features
+  2021-11-18 14:48:20.115 [I] k2/torch/bin/hlg_decode.cu:150:int main(int, char**) Load neural network model
+  2021-11-18 14:48:20.693 [I] k2/torch/bin/hlg_decode.cu:165:int main(int, char**) Compute nnet_output
+  2021-11-18 14:48:23.182 [I] k2/torch/bin/hlg_decode.cu:180:int main(int, char**) Load icefall_asr_aishell_conformer_ctc/data/lang_char/HLG.pt
+  2021-11-18 14:48:33.489 [I] k2/torch/bin/hlg_decode.cu:185:int main(int, char**) Decoding
+  2021-11-18 14:48:45.217 [I] k2/torch/bin/hlg_decode.cu:216:int main(int, char**)
+  Decoding result:
+
+  icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0121.wav
+  甚至 出现 交易 几乎 停止 的 情况
+
+  icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0122.wav
+  一二 线 城市 虽然 也 处于 调整 中
+
+  icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0123.wav
+  但 因为 聚集 了 过多 公共 资源
+
+There is a Colab notebook showing you how to run a torch scripted model in C++.
+Please see |aishell asr conformer ctc torch script colab notebook|
+
+.. |aishell asr conformer ctc torch script colab notebook| image:: https://colab.research.google.com/assets/colab-badge.svg
+   :target: https://colab.research.google.com/drive/1Vh7RER7saTW01DtNbvr7CY7ovNZgmfWz?usp=sharing
diff --git a/egs/librispeech/ASR/conformer_ctc/ali.py b/egs/librispeech/ASR/conformer_ctc/ali.py
index ad72a88e7..2b2967506 100755
--- a/egs/librispeech/ASR/conformer_ctc/ali.py
+++ b/egs/librispeech/ASR/conformer_ctc/ali.py
@@ -28,12 +28,12 @@ from conformer import Conformer
 from icefall.bpe_graph_compiler import BpeCtcTrainingGraphCompiler
 from icefall.checkpoint import average_checkpoints, load_checkpoint
 from icefall.decode import one_best_decoding
+from icefall.env import get_env_info
 from icefall.lexicon import Lexicon
 from icefall.utils import (
     AttributeDict,
     encode_supervisions,
     get_alignments,
-    get_env_info,
     save_alignments,
     setup_logger,
 )