minor updates

commit 428579e3ac (parent fe35141e7e)
@@ -0,0 +1,19 @@
# Introduction

This recipe includes scripts for training a Zipformer model on both English and Chinese datasets.

# Included Training Sets

1. LibriSpeech (English)
2. AiShell-2 (Chinese)
3. TAL-CSASR (code-switching, Chinese and English)

| Dataset | Number of hours | URL |
|---|---:|---|
| LibriSpeech | 960 | https://www.openslr.org/12/ |
| AiShell-2 | 1,000 | http://www.aishelltech.com/aishell_2 |
| TAL-CSASR | 587 | https://ai.100tal.com/openData/voice |
| **TOTAL** | 2,547 | --- |
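
To give a concrete picture of how the three corpora are combined at training time, here is a minimal sketch using lhotse's `CutSet.mux`; the manifest filenames are assumptions, and the recipe's own data module may organize this differently:

```
# Minimal sketch of mixing the three training sets with lhotse.
# The manifest paths below are hypothetical; prepare.sh defines the real ones.
from lhotse import CutSet, load_manifest_lazy

librispeech = load_manifest_lazy("data/fbank/librispeech_cuts_train.jsonl.gz")
aishell2 = load_manifest_lazy("data/fbank/aishell2_cuts_train.jsonl.gz")
tal_csasr = load_manifest_lazy("data/fbank/tal_csasr_cuts_train.jsonl.gz")

# mux() interleaves the sources lazily; weighting by corpus hours keeps the
# language mix close to the proportions in the table above.
train_cuts = CutSet.mux(
    librispeech, aishell2, tal_csasr,
    weights=[960, 1000, 587],
)
```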
@@ -0,0 +1,44 @@
## Results

### Zh-En datasets BPE-based training results (non-streaming) on Zipformer model

This is [pull request #1265](https://github.com/k2-fsa/icefall/pull/1265) in icefall.

#### Non-streaming (Byte-Level BPE vocab_size=2000)

Best results (number of parameters: ~69M):

The training command:
```
./zipformer/train.py \
  --world-size 4 \
  --num-epochs 35 \
  --use-fp16 1 \
  --max-duration 1000 \
  --num-workers 8
```
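
Note that `--max-duration` caps the total audio duration per batch in seconds, not the number of utterances. A rough sketch of the corresponding lhotse sampler, reusing `train_cuts` from the mixing sketch above:

```
# Hedged sketch: batching by total seconds of audio, as --max-duration does.
from lhotse.dataset.sampling import DynamicBucketingSampler

sampler = DynamicBucketingSampler(
    train_cuts,          # CutSet from the mixing sketch above
    max_duration=1000,   # at most ~1000 seconds of audio per batch
    shuffle=True,
)
```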

The decoding command:

```
for method in greedy_search modified_beam_search fast_beam_search; do
  ./zipformer/decode.py \
    --epoch 34 \
    --avg 19 \
    --decoding-method $method
done
```
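
Here `--epoch 34 --avg 19` asks decoding to average the weights of the last 19 epoch checkpoints (epochs 16 through 34). icefall ships its own helper for this; the plain-PyTorch sketch below only illustrates the idea, and the checkpoint layout it assumes (weights stored under a "model" key in `zipformer/exp/epoch-*.pt`) is an assumption:

```
# Plain-PyTorch illustration of checkpoint averaging.
import torch

def average_checkpoints(paths):
    avg = None
    for p in paths:
        # Assumed layout: each file stores the weights under the "model" key.
        state = torch.load(p, map_location="cpu")["model"]
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k in avg:
                avg[k] += state[k].float()
    return {k: v / len(paths) for k, v in avg.items()}

# Epochs 16..34 inclusive: 19 checkpoints.
paths = [f"zipformer/exp/epoch-{i}.pt" for i in range(16, 35)]
averaged_state = average_checkpoints(paths)
```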

The Word Error Rates (WERs) listed below were produced by the checkpoint of the 20th epoch, using the byte-level BPE model (2000 tokens):

| Zipformer WER (%)     | TAL-CSASR dev | TAL-CSASR test |
|-----------------------|---------------|----------------|
| greedy_search         | 6.65          | 6.69           |
| modified_beam_search  | 6.46          | 6.51           |
| fast_beam_search      | 6.57          | 6.68           |

The pre-trained model can be found at https://huggingface.co/zrjin/icefall-asr-zipformer-multi-zh-en-2023-11-22. It was trained on the LibriSpeech 960-hour training set (with speed perturbation), the TAL-CSASR training set (with speed perturbation), and AiShell-2 (without speed perturbation).
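
One minimal way to fetch that checkpoint locally is with `huggingface_hub` (a sketch; the exact contents of the repository are not listed here):

```
# Download the released model repository from Hugging Face.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("zrjin/icefall-asr-zipformer-multi-zh-en-2023-11-22")
print(local_dir)  # local directory with the released files
```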

@@ -13,7 +13,6 @@ dl_dir=$PWD/download
. shared/parse_options.sh || exit 1

vocab_sizes=(
  500
  2000
)
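
Each entry in `vocab_sizes` corresponds to a BPE model built during data preparation. As a hedged illustration only (not the recipe's actual script), a byte-level BPE model of each size could be trained with sentencepiece along these lines; the transcript path is an assumption:

```
# Illustrative sketch: training byte-level BPE models with sentencepiece.
import os
import sentencepiece as spm

for vocab_size in (500, 2000):
    out_dir = f"data/lang_bbpe_{vocab_size}"
    os.makedirs(out_dir, exist_ok=True)
    spm.SentencePieceTrainer.train(
        input="data/lang/transcript.txt",  # hypothetical transcript file
        model_prefix=f"{out_dir}/bbpe",
        vocab_size=vocab_size,
        model_type="bpe",
        character_coverage=1.0,
        byte_fallback=True,  # fall back to raw bytes for unseen characters
    )
```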