Merge branch 'dev_zipformer_cn' of https://github.com/JinZr/icefall into dev_zipformer_cn

JinZr 2023-08-14 01:26:59 +08:00
commit 6c088dfa48
5 changed files with 59 additions and 5 deletions


@@ -1,10 +1,12 @@
# Introduction
Please refer to <https://icefall.readthedocs.io/en/latest/recipes/Non-streaming-ASR/aishell/index.html> for how to run models in this recipe.
Aishell is an open-source Chinese Mandarin speech corpus published by Beijing Shell Shell Technology Co., Ltd.
400 people from different accent areas in China were invited to participate in the recording, which was conducted in a quiet indoor environment using high-fidelity microphones and downsampled to 16 kHz. The manual transcription accuracy is above 95%, thanks to professional speech annotation and strict quality inspection. The data is free for academic use. We hope to provide a moderate amount of data for new researchers in the field of speech recognition.
(From [Open Speech and Language Resources](https://www.openslr.org/33/))
# Transducers


@@ -1,7 +1,11 @@
# Introduction
This recipe contains various ASR models trained with Aishell2.
In AISHELL-2, 1000 hours of clean read-speech data from iOS devices is published, which is free for academic usage. On top of the AISHELL-2 corpus, an improved recipe is developed and released, containing key components for industrial applications, such as Chinese word segmentation, flexible vocabulary expansion, phone set transformation, etc. Pipelines support various state-of-the-art techniques, such as time-delay neural networks and the Lattice-Free MMI objective function. In addition, we also release dev and test data from other channels (Android and Mic).
(From [AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale](https://arxiv.org/abs/1808.10583))
[./RESULTS.md](./RESULTS.md) contains the latest results.


@@ -1,7 +1,11 @@
# Introduction
This recipe contains various ASR models trained with Aishell4 (including the S, M and L subsets).
AISHELL-4 is a sizable real-recorded Mandarin speech dataset collected by an 8-channel circular microphone array for speech processing in conference scenarios. The dataset consists of 211 recorded meeting sessions, each containing 4 to 8 speakers, with a total length of 120 hours. This dataset aims to bridge advanced research on multi-speaker processing and practical application scenarios in three aspects. With real recorded meetings, AISHELL-4 provides realistic acoustics and rich natural speech characteristics in conversation, such as short pauses, speech overlap, quick speaker turns and noise. Meanwhile, accurate transcription and speaker voice activity are provided for each meeting in AISHELL-4. This allows researchers to explore different aspects of meeting processing, ranging from individual tasks such as speech front-end processing, speech recognition and speaker diarization, to multi-modality modeling and joint optimization of relevant tasks.
(From [Open Speech and Language Resources](https://www.openslr.org/111/))
[./RESULTS.md](./RESULTS.md) contains the latest results.


@@ -2,6 +2,50 @@
### Aishell4 Char training results (Pruned Transducer Stateless5)
#### 2023-08-14
#### Zipformer
[./zipformer](./zipformer)
It's a reworked Zipformer with pruned RNNT loss. Note that the results below were produced by a model trained on data **without** speed perturbation applied.

**⚠️ If you prefer to have speed perturbation disabled, please manually set `--perturb-speed` to `False` for `./local/compute_fbank_aishell.py` in the `prepare.sh` script.**
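For reference, the feature-extraction step would then be invoked roughly as follows (a minimal sketch; the exact stage layout of `prepare.sh` around this call is an assumption):

```bash
# Hypothetical excerpt of the fbank stage in prepare.sh; only the
# --perturb-speed flag is the point, the surrounding layout may differ.
./local/compute_fbank_aishell.py --perturb-speed False
```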
| decoding method      | test  | comment                                |
|----------------------|-------|----------------------------------------|
| greedy search        | 40.77 | --epoch 45 --avg 6 --max-duration 200  |
| modified beam search | 40.39 | --epoch 45 --avg 6 --max-duration 200  |
| fast beam search     | 46.51 | --epoch 45 --avg 6 --max-duration 200  |
The training command is:
```bash
# prepare data and features first
./prepare.sh

export CUDA_VISIBLE_DEVICES="0,1"

# train on 2 GPUs with mixed precision (fp16)
./zipformer/train.py \
  --world-size 2 \
  --num-epochs 45 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir zipformer/exp \
  --max-duration 1000
```
The decoding command is:
```bash
for m in greedy_search modified_beam_search fast_beam_search; do
  ./zipformer/decode.py \
    --epoch 45 \
    --avg 6 \
    --exp-dir ./zipformer/exp \
    --lang-dir data/lang_char \
    --decoding-method $m
done
```
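`fast_beam_search` additionally depends on a few search-size parameters; below is a hedged sketch using values commonly seen in icefall recipes (these specific values are an assumption, not necessarily the settings behind the table above):

```bash
# Assumed fast-beam-search knobs; larger values search more broadly
# at higher cost. Adjust to taste.
./zipformer/decode.py \
  --epoch 45 \
  --avg 6 \
  --exp-dir ./zipformer/exp \
  --lang-dir data/lang_char \
  --decoding-method fast_beam_search \
  --beam 4 \
  --max-contexts 4 \
  --max-states 8
```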
#### 2022-06-13
Using the code from this PR: https://github.com/k2-fsa/icefall/pull/399.