You can find the tensorboard log at: <https://tensorboard.dev/experiment/D7NQc3xqTpyVmWi5FnWjrA>
### LibriSpeech BPE training results (Conformer-CTC 2)
[conformer_ctc2](./conformer_ctc2)
#### 2022-07-21
It implements a 'reworked' version of the CTC attention model.

As demonstrated by pruned_transducer_stateless2, the reworked Conformer has superior performance compared to the original Conformer.
So this modified CTC attention model uses the reworked Conformer as the encoder and the reworked Transformer as the decoder.
conformer_ctc2 also adopts the 'model averaging' idea from pruned_transducer_stateless4 (see the sketch below).
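
The following is a rough, hedged sketch of that parameter-averaging idea in plain PyTorch. It is not the actual icefall code (the real --use-averaged-model option maintains a running average of the model during training), and the checkpoint file layout below is an assumption:

```python
# Rough illustration of checkpoint parameter averaging; NOT the exact
# icefall implementation, just the basic idea behind model averaging.
import torch

def average_checkpoints(filenames):
    """Element-wise mean of the floating-point parameters in `filenames`.

    Assumes each file stores a dict with a "model" state_dict, as icefall
    checkpoints typically do.
    """
    avg = torch.load(filenames[0], map_location="cpu")["model"]
    for f in filenames[1:]:
        state = torch.load(f, map_location="cpu")["model"]
        for k, v in state.items():
            if v.is_floating_point():
                avg[k] += v
    for k, v in avg.items():
        if v.is_floating_point():
            avg[k] = v / len(filenames)
    return avg

# For example, --epoch 30 --avg 8 corresponds roughly to averaging the
# checkpoints of the last 8 epochs (hypothetical file names):
ckpts = [f"conformer_ctc2/exp/epoch-{i}.pt" for i in range(23, 31)]
# averaged_state_dict = average_checkpoints(ckpts)
```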

The WERs of this model and of a baseline model on the LibriSpeech test sets are listed below.

The baseline is the original Conformer CTC attention model trained with icefall/egs/librispeech/ASR/conformer_ctc.
It can be downloaded from <https://huggingface.co/csukuangfj/icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09>.
It has 12 Conformer encoder layers and 6 Transformer decoder layers, 109,226,120 parameters in total, and was trained for 90 epochs on the full LibriSpeech dataset.

The reworked CTC attention model has 12 reworked Conformer encoder layers and 6 reworked Transformer decoder layers, with 103,071,035 parameters.
On the full LibriSpeech dataset it was trained for **only** 30 epochs, because the reworked model converges much faster.
Please refer to <https://tensorboard.dev/experiment/GR1s6VrJRTW5rtB50jakew/#scalars> for the loss convergence curves.
The trained model is available at <https://huggingface.co/WayneWiser/icefall-asr-librispeech-conformer-ctc2-jit-bpe-500-2022-07-21> on Hugging Face.

The decoding configuration for the reworked model is --epoch 30, --avg 8, --use-averaged-model True, which gave the best results among the configurations searched; the baseline is decoded with --epoch 77, --avg 55.

In the table below, "reworked" refers to the reworked ctc attention model (conformer_ctc2), "baseline" to the original ctc attention model, and Avg is the mean of the test-clean and test-other WERs.

| decoding method         | reworked: test-clean | reworked: test-other | reworked: Avg | baseline: test-clean | baseline: test-other | baseline: Avg |
|-------------------------|-------|-------|-------|-------|-------|-------|
| ctc-greedy-search       | 2.98% | 7.14% | 5.06% | 2.90% | 7.47% | 5.19% |
| ctc-decoding            | 2.98% | 7.14% | 5.06% | 2.90% | 7.47% | 5.19% |
| 1best                   | 2.93% | 6.37% | 4.65% | 2.70% | 6.49% | 4.60% |
| nbest                   | 2.94% | 6.39% | 4.67% | 2.70% | 6.48% | 4.59% |
| nbest-rescoring         | 2.68% | 5.77% | 4.23% | 2.55% | 6.07% | 4.31% |
| whole-lattice-rescoring | 2.66% | 5.76% | 4.21% | 2.56% | 6.04% | 4.30% |
| attention-decoder       | 2.59% | 5.54% | 4.07% | 2.41% | 5.77% | 4.09% |
| nbest-oracle            | 1.53% | 3.47% | 2.50% | 1.69% | 4.02% | 2.86% |
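
For example, the Avg entry for ctc-greedy-search with the reworked model is the mean of its two test-set WERs:

$$\tfrac{1}{2}(2.98\% + 7.14\%) = 5.06\%$$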

conformer_ctc2 also implements CTC greedy search decoding; its WERs are identical to those of the ctc-decoding method (a minimal sketch of greedy search is given below).

For the other decoding methods, the average WER over the two test sets is similar for the two models; except for the 1best and nbest methods, the overall performance of the reworked model is better than that of the baseline.
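
As a rough illustration only (not the conformer_ctc2 implementation; the function name and blank id are assumptions), CTC greedy search takes the arg-max token per frame, collapses consecutive repeats, and removes blanks:

```python
# Minimal sketch of CTC greedy search for one utterance; illustrative only.
import torch

def ctc_greedy_search(log_probs: torch.Tensor, blank_id: int = 0) -> list:
    """log_probs: (num_frames, vocab_size) CTC log-probabilities.

    Returns token ids with consecutive repeats collapsed and blanks removed.
    """
    frame_ids = log_probs.argmax(dim=-1).tolist()  # best token per frame
    hyp, prev = [], None
    for t in frame_ids:
        if t != blank_id and t != prev:
            hyp.append(t)
        prev = t
    return hyp

# Toy example: with blank_id=0, the frame-wise sequence
# [0, 5, 5, 0, 7, 7, 7, 0] decodes to [5, 7].
print(ctc_greedy_search(torch.nn.functional.one_hot(
    torch.tensor([0, 5, 5, 0, 7, 7, 7, 0]), num_classes=10).float().log()))
```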

To reproduce the above results, the training command is:

```bash
WORLD_SIZE=8
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"

./conformer_ctc2/train.py \
  --manifest-dir data/fbank \
  --exp-dir conformer_ctc2/exp \
  --full-libri 1 \
  --spec-aug-time-warp-factor 80 \
  --max-duration 300 \
  --world-size ${WORLD_SIZE} \
  --start-epoch 1 \
  --num-epochs 30 \
  --att-rate 0.7 \
  --num-decoder-layers 6
```
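
The --att-rate option controls the interpolation between the attention-decoder loss and the CTC loss during training. A minimal, hedged sketch of that interpolation (the exact expression and variable names in conformer_ctc2/train.py may differ):

```python
# Hedged sketch of the hybrid CTC / attention loss with --att-rate 0.7;
# check conformer_ctc2/train.py for the exact formula.
import torch

att_rate = 0.7  # value of --att-rate in the training command above

# Dummy per-batch losses just to make the sketch runnable.
ctc_loss = torch.tensor(1.2)
att_loss = torch.tensor(0.8)

loss = att_rate * att_loss + (1.0 - att_rate) * ctc_loss
print(loss)  # tensor(0.9200)
```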

The following commands are for decoding:

```bash
for method in ctc-greedy-search ctc-decoding 1best nbest-oracle; do
  python3 ./conformer_ctc2/decode.py \
    --exp-dir conformer_ctc2/exp \
    --use-averaged-model True --epoch 30 --avg 8 --max-duration 200 --method $method
done

for method in nbest nbest-rescoring whole-lattice-rescoring attention-decoder; do
  python3 ./conformer_ctc2/decode.py \
    --exp-dir conformer_ctc2/exp \
    --use-averaged-model True --epoch 30 --avg 8 --max-duration 20 --method $method
done

rnn_dir=$(git rev-parse --show-toplevel)/icefall/rnn_lm
./conformer_ctc2/decode.py \
  --exp-dir conformer_ctc2/exp \
  --lang-dir data/lang_bpe_500 \
  --lm-dir data/lm \
  --max-duration 30 \
  --concatenate-cuts 0 \
  --bucketing-sampler 1 \
  --num-paths 1000 \
  --use-averaged-model True \
  --epoch 30 \
  --avg 8 \
  --nbest-scale 0.5 \
  --rnn-lm-exp-dir ${rnn_dir}/exp \
  --rnn-lm-epoch 29 \
  --rnn-lm-avg 3 \
  --rnn-lm-embedding-dim 2048 \
  --rnn-lm-hidden-dim 2048 \
  --rnn-lm-num-layers 3 \
  --rnn-lm-tie-weights true \
  --method rnn-lm
```
You can find the RNN-LM pre-trained model at
<https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm/tree/main>
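
For intuition, n-best rescoring with a neural LM, which the rnn-lm method above builds on, roughly amounts to adding a scaled LM score to each hypothesis and re-ranking. This is only a hedged sketch, not icefall's decode.py; the names and the scale value are assumptions:

```python
# Rough, illustrative sketch of n-best rescoring with an external language
# model; icefall's decode.py (--method rnn-lm) works on lattices and combines
# scores differently.
from typing import Callable, List, Tuple

def rescore_nbest(
    nbest: List[Tuple[List[int], float]],    # (token ids, acoustic/lattice score)
    lm_score: Callable[[List[int]], float],  # LM log-probability of the tokens
    lm_scale: float = 0.3,
) -> List[int]:
    """Return the hypothesis whose combined score is highest."""
    best_hyp, best_score = None, float("-inf")
    for tokens, am_score in nbest:
        total = am_score + lm_scale * lm_score(tokens)
        if total > best_score:
            best_hyp, best_score = tokens, total
    return best_hyp

# Toy usage: a dummy "LM" that slightly prefers shorter hypotheses.
dummy_lm = lambda tokens: -0.5 * len(tokens)
print(rescore_nbest([([5, 7, 9], -1.2), ([5, 7], -1.5)], dummy_lm))
```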
### LibriSpeech BPE training results (Conformer-CTC)
#### 2021-11-09