# Introduction

This is a pseudo-labeling based semi-supervised ASR recipe for the LibriSpeech dataset. The ASR model is a Zipformer transducer. The labeled data is LibriSpeech train-clean-100. The unlabeled data is either LibriSpeech "train-clean-360 + train-other-500" for conventional semi-supervised learning, or the TedLium3 training set for unsupervised domain adaptation.

## Description of the recipe

### Preparation of data

The data required by this recipe is the same as for the LibriSpeech and TedLium3 ASR recipes, and the LibriSpeech tokenizer is used to build the model. Therefore, we can reuse the `prepare.sh` scripts from those recipes, as sketched below.
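
For example, data preparation can be run from the recipe directories (a sketch, assuming the standard icefall layout where this recipe sits under `egs/librispeech/ASR` and the TedLium3 recipe under `egs/tedlium3/ASR`; see each `prepare.sh` for the exact stages and required downloads):

```
# Prepare LibriSpeech: the labeled train-clean-100 subset plus the
# unlabeled train-clean-360 / train-other-500 subsets.
cd egs/librispeech/ASR
./prepare.sh

# Prepare TedLium3 (only needed for the domain adaptation setting).
cd ../../tedlium3/ASR
./prepare.sh
```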

### Supervised training for the seed ASR model

First, we perform supervised training on the LibriSpeech train-clean-100 subset to obtain the seed model for the subsequent pseudo-labeling based semi-supervised training:
```
export CUDA_VISIBLE_DEVICES="0,1,2,3"
./zipformer/train_seed.py \
  --world-size 4 \
  --num-epochs 70 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir zipformer/exp_seed \
  --max-duration 1000
```

For better performance of the seed model, we average the checkpoints as follows:

```
./zipformer/generate_averaged_model.py \
  --epoch 70 \
  --avg 30 \
  --exp-dir ./zipformer/exp_seed
```
The above command generates the final seed model `./zipformer/exp_seed/epoch-70-avg-30.pt`.
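
To sanity-check the seed model before the semi-supervised stage, it can be decoded in the same way as the final models (a sketch using the `decode.py` command from the Decode section below; the seed-model results in the Results section were obtained with `--epoch 70 --avg 30`):

```
./zipformer/decode.py \
  --epoch 70 \
  --avg 30 \
  --exp-dir ./zipformer/exp_seed \
  --max-duration 600 \
  --decoding-method modified_beam_search \
  --beam-size 4 \
  --dataset "librispeech"
```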
### Semi-supervised training for the final ASR model

Then, we perform semi-supervised training with the seed model as the initialization.
- Conventional semi-supervised learning setting, where the unlabeled data is "train-clean-360 + train-other-500":

```
./zipformer/train_pl.py \
  --world-size 4 \
  --num-epochs 20 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir zipformer/exp_pl_librispeech \
  --max-duration 1000 \
  --seed-model-path "zipformer/exp_seed/epoch-70-avg-30.pt" \
  --unlabeled-dataset "librispeech"
```

- Unsupervised domain adaptation setting, where the unlabeled data is the TedLium3 training set:

```
./zipformer/train_pl.py \
  --world-size 4 \
  --num-epochs 20 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir zipformer/exp_pl_tedlium \
  --max-duration 1000 \
  --seed-model-path "zipformer/exp_seed/epoch-70-avg-30.pt" \
  --unlabeled-dataset "tedlium"
```

### Decode

Finally, we decode with the ASR model to evaluate its performance.

- Evaluate on the LibriSpeech dataset:

```
./zipformer/decode.py \
  --epoch 20 \
  --avg 10 \
  --exp-dir ./zipformer/exp_pl_librispeech \
  --max-duration 600 \
  --decoding-method modified_beam_search \
  --beam-size 4 \
  --dataset "librispeech"
```

- Evaluate on the TedLium3 dataset:

```
./zipformer/decode.py \
  --epoch 20 \
  --avg 10 \
  --exp-dir ./zipformer/exp_pl_tedlium \
  --max-duration 600 \
  --decoding-method modified_beam_search \
  --beam-size 4 \
  --dataset "tedlium"
```
## Results
- Conventional semi-supervised learning (labeled: LibriSpeech 100h, unlabeled: LibriSpeech 860h), WER (%):

| Model                 | test-clean | test-other | comment             |
|-----------------------|------------|------------|---------------------|
| supervised seed model | 5.45       | 13.7       | --epoch 70 --avg 30 |
| pseudo-labeling model | 4.33       | 9.61       | --epoch 20 --avg 10 |

- Unsupervised domain adaptation (labeled: LibriSpeech 100h, unlabeled: TedLium3), WER (%):

| Model                 | tedlium3 dev | tedlium3 test | comment             |
|-----------------------|--------------|---------------|---------------------|
| supervised seed model | 18.29        | 18.16         | --epoch 70 --avg 30 |
| pseudo-labeling model | 14.97        | 14.65         | --epoch 20 --avg 10 |
## Pre-trained models and logs

You can find the pre-trained models, training logs, tensorboard logs, decoding logs, and decoding results at <https://huggingface.co/zhu-han/icefall-pl-librispeech-zipformer-medium-2023-08-06>.
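
For example, the repository can be downloaded with Git LFS (a sketch, assuming `git` and `git-lfs` are installed):

```
# Download the pre-trained models and logs from Hugging Face.
git lfs install
git clone https://huggingface.co/zhu-han/icefall-pl-librispeech-zipformer-medium-2023-08-06
```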