mirror of https://github.com/k2-fsa/icefall.git
synced 2025-08-26 10:16:14 +00:00

update README & RESULTS

parent 3505a8ec45
commit ea1d9b20a8

egs/reazonspeech/ASR/README.md (new file, +31 lines)

# Introduction
**ReazonSpeech** is an open-source dataset that contains a diverse set of natural Japanese speech, collected from terrestrial television streams. It contains more than 35,000 hours of audio.
The dataset is available on Hugging Face. For more details, please visit:
- Dataset: https://huggingface.co/datasets/reazon-research/reazonspeech
- Paper: https://research.reazon.jp/_static/reazonspeech_nlp2023.pdf
[./RESULTS.md](./RESULTS.md) contains the latest results.
# Transducers
Several folders in this directory contain `transducer` in their names. The following table lists the differences among them.
|                                          | Encoder              | Decoder            | Comment                                             |
| ---------------------------------------- | -------------------- | ------------------ | --------------------------------------------------- |
| `pruned_transducer_stateless2`           | Conformer (modified) | Embedding + Conv1d | Using k2 pruned RNN-T loss                          |
| `pruned_transducer_stateless7_streaming` | Streaming Zipformer  | Embedding + Conv1d | Streaming version of `pruned_transducer_stateless7` |
| `zipformer`                              | Upgraded Zipformer   | Embedding + Conv1d | The latest recipe                                   |
The decoder in `transducer_stateless` is modified from the paper [Rnn-Transducer with Stateless Prediction Network](https://ieeexplore.ieee.org/document/9054419/). We place an additional Conv1d layer right after the input embedding layer.
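As a rough illustration, the stateless-decoder idea (a token embedding followed by a Conv1d over a fixed left context, instead of an RNN prediction network) can be sketched as below. Class and parameter names are illustrative, not the exact icefall implementation.

```python
import torch
import torch.nn as nn

class StatelessDecoder(nn.Module):
    """Sketch of a stateless transducer decoder: embedding + Conv1d.

    The Conv1d sees only the last `context_size` tokens, so the decoder
    has a fixed, finite left context rather than an unbounded RNN state.
    """

    def __init__(self, vocab_size: int, embed_dim: int, context_size: int = 2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Conv1d over the token axis provides the fixed left context.
        self.conv = nn.Conv1d(embed_dim, embed_dim, kernel_size=context_size)

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # y: (batch, seq_len) token ids
        emb = self.embedding(y).permute(0, 2, 1)  # (B, D, T)
        out = self.conv(emb)                      # (B, D, T - context_size + 1)
        return out.permute(0, 2, 1)               # (B, T', D)

dec = StatelessDecoder(vocab_size=500, embed_dim=512, context_size=2)
tokens = torch.randint(0, 500, (4, 10))
out = dec(tokens)
```

With `kernel_size=2` and no padding, each output position depends on exactly two previous tokens, which is the "stateless" property described above.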

egs/reazonspeech/ASR/RESULTS.md (new file, +49 lines)

## Results
### Zipformer
#### Non-streaming
##### Large model, number of model parameters: 159,337,842, i.e., 159.34 M
| decoding method      | In-Distribution CER | JSUT | CommonVoice | TEDx  | comment            |
| :------------------: | :-----------------: | :--: | :---------: | :---: | :----------------: |
| greedy search        | 4.2                 | 6.7  | 7.84        | 17.9  | --epoch 39 --avg 7 |
| modified beam search | 4.13                | 6.77 | 7.69        | 17.82 | --epoch 39 --avg 7 |
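For reference, CER (character error rate) is the character-level edit distance between hypothesis and reference, divided by the reference length. A minimal sketch of the metric (not the icefall scoring code):

```python
def cer(ref: str, hyp: str) -> float:
    """Character error rate: Levenshtein distance / reference length."""
    r, h = list(ref), list(hyp)
    # dp[i][j] = edit distance between r[:i] and h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution
            )
    return dp[-1][-1] / len(r)

print(cer("kitten", "sitting"))  # 3 edits / 6 chars = 0.5
```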
The training command is:
```shell
./zipformer/train.py \
  --world-size 8 \
  --num-epochs 40 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir zipformer/exp-large \
  --causal 0 \
  --num-encoder-layers 2,2,4,5,4,2 \
  --feedforward-dim 512,768,1536,2048,1536,768 \
  --encoder-dim 192,256,512,768,512,256 \
  --encoder-unmasked-dim 192,192,256,320,256,192 \
  --lang data/lang_char \
  --max-duration 1600
```

The decoding command is:
```shell
./zipformer/decode.py \
  --epoch 40 \
  --avg 16 \
  --exp-dir zipformer/exp-large \
  --max-duration 600 \
  --causal 0 \
  --decoding-method greedy_search \
  --num-encoder-layers 2,2,4,5,4,2 \
  --feedforward-dim 512,768,1536,2048,1536,768 \
  --encoder-dim 192,256,512,768,512,256 \
  --encoder-unmasked-dim 192,192,256,320,256,192 \
  --lang data/lang_char \
  --blank-penalty 0
```
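
The `--blank-penalty` flag (0 here, i.e., disabled) biases decoding away from emitting the blank symbol by subtracting a constant from the blank score before selecting the best token. A minimal sketch of the idea for one greedy-search step, with illustrative names and values:

```python
def greedy_step(logits, blank_id=0, blank_penalty=0.0):
    """Pick the best symbol after penalizing the blank score.

    Illustrative sketch only; real RNN-T greedy search operates on
    log-probs over the full joiner output at each frame.
    """
    scores = list(logits)
    scores[blank_id] -= blank_penalty  # discourage blank emissions
    return max(range(len(scores)), key=lambda i: scores[i])

# Without a penalty, blank (id 0) wins; with penalty 1.0, token 1 wins.
print(greedy_step([2.0, 1.5, 0.0]))                     # 0
print(greedy_step([2.0, 1.5, 0.0], blank_penalty=1.0))  # 1
```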