## Results
### Aishell2 char-based training results (Pruned Transducer 5)
#### 2022-07-11
Using the code from this PR: <https://github.com/k2-fsa/icefall/pull/465>.
When training with a context size of 1, the WERs are:
| decoding method                    | dev-ios | test-ios | comment                                 |
|------------------------------------|---------|----------|-----------------------------------------|
| greedy search                      | 5.57    | 5.89     | --epoch 25, --avg 5, --max-duration 600 |
| modified beam search (beam size 4) | 5.32    | 5.56     | --epoch 25, --avg 5, --max-duration 600 |
| fast beam search (set as default)  | 5.50    | 5.78     | --epoch 25, --avg 5, --max-duration 600 |
| fast beam search nbest             | 5.46    | 5.74     | --epoch 25, --avg 5, --max-duration 600 |
| fast beam search nbest oracle      | 1.92    | 2.20     | --epoch 25, --avg 5, --max-duration 600 |
| fast beam search nbest LG          | 5.59    | 5.93     | --epoch 25, --avg 5, --max-duration 600 |
The training command for reproducing these results is given below:
```bash
export CUDA_VISIBLE_DEVICES="0,1,2,3"

./pruned_transducer_stateless5/train.py \
  --world-size 4 \
  --lang-dir data/lang_char \
  --num-epochs 40 \
  --start-epoch 1 \
  --exp-dir /result \
  --max-duration 300 \
  --use-fp16 0 \
  --num-encoder-layers 24 \
  --dim-feedforward 1536 \
  --nhead 8 \
  --encoder-dim 384 \
  --decoder-dim 512 \
  --joiner-dim 512 \
  --context-size 1
```
The decoding command is:
```bash
for method in greedy_search modified_beam_search \
    fast_beam_search fast_beam_search_nbest \
    fast_beam_search_nbest_oracle fast_beam_search_nbest_LG; do
  ./pruned_transducer_stateless5/decode.py \
    --epoch 25 \
    --avg 5 \
    --exp-dir ./pruned_transducer_stateless5/exp \
    --max-duration 600 \
    --decoding-method $method \
    --max-sym-per-frame 1 \
    --num-encoder-layers 24 \
    --dim-feedforward 1536 \
    --nhead 8 \
    --encoder-dim 384 \
    --decoder-dim 512 \
    --joiner-dim 512 \
    --context-size 1 \
    --beam 20.0 \
    --max-contexts 8 \
    --max-states 64 \
    --num-paths 200 \
    --nbest-scale 0.5 \
    --use-averaged-model True
done
```
The tensorboard training log can be found at
<https://tensorboard.dev/experiment/RXyX4QjQQVKjBS2eQ2Qajg/#scalars>
A pre-trained model and decoding logs can be found at <https://huggingface.co/yuekai/icefall-asr-aishell2-pruned-transducer-stateless5-B-2022-07-12>
When training with a context size of 2, the WERs are:
| decoding method                    | dev-ios | test-ios | comment                                 |
|------------------------------------|---------|----------|-----------------------------------------|
| greedy search                      | 5.47    | 5.81     | --epoch 25, --avg 5, --max-duration 600 |
| modified beam search (beam size 4) | 5.38    | 5.61     | --epoch 25, --avg 5, --max-duration 600 |
| fast beam search (set as default)  | 5.36    | 5.61     | --epoch 25, --avg 5, --max-duration 600 |
| fast beam search nbest             | 5.37    | 5.60     | --epoch 25, --avg 5, --max-duration 600 |
| fast beam search nbest oracle      | 2.04    | 2.20     | --epoch 25, --avg 5, --max-duration 600 |
| fast beam search nbest LG          | 5.59    | 5.82     | --epoch 25, --avg 5, --max-duration 600 |
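As a rough comparison of the two configurations, the relative WER change for greedy search can be computed directly from the numbers in the tables above (this is just an arithmetic sanity check, not part of the recipe):

```python
# Greedy-search WERs copied from the tables above.
wer_context1 = {"dev-ios": 5.57, "test-ios": 5.89}
wer_context2 = {"dev-ios": 5.47, "test-ios": 5.81}

for split in wer_context1:
    # Relative WER reduction (in percent) going from context size 1 to 2.
    rel = (wer_context1[split] - wer_context2[split]) / wer_context1[split] * 100
    print(f"{split}: {rel:.1f}% relative WER reduction")
# dev-ios: 1.8% relative WER reduction
# test-ios: 1.4% relative WER reduction
```

Context size 2 gives a small but consistent improvement under greedy search, which matches it being made the default.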
The tensorboard training log can be found at
<https://tensorboard.dev/experiment/5AxJ8LHoSre8kDAuLp4L7Q/#scalars>
A pre-trained model and decoding logs can be found at <https://huggingface.co/yuekai/icefall-asr-aishell2-pruned-transducer-stateless5-A-2022-07-12>