# Introduction
This recipe includes several ASR models trained on WenetSpeech.

[./RESULTS.md](./RESULTS.md) contains the latest results.
# Transducers
Several folders in this directory have `transducer` in their names.
The following table lists the differences among them.
| Recipe | Encoder | Decoder | Comment |
|---|---|---|---|
| `pruned_transducer_stateless2` | Conformer (modified) | Embedding + Conv1d | Using k2 pruned RNN-T loss |
| `pruned_transducer_stateless5` | Conformer (modified) | Embedding + Conv1d | Using k2 pruned RNN-T loss |
The decoder in `transducer_stateless` is adapted from the paper
*Rnn-Transducer with Stateless Prediction Network*, with one change:
an additional Conv1d layer is placed right after the input embedding layer.
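To illustrate what "Embedding + Conv1d" means for a stateless decoder, here is a minimal pure-Python sketch. It is not icefall's `decoder.py`; it assumes a depthwise Conv1d (one filter per channel) with no bias and no padding, whose kernel size equals the context size (the number of previous tokens the decoder sees), followed by a ReLU. The function name `stateless_decoder` and the weight layout are hypothetical. Because the decoder has no recurrent state, its output depends only on the last few emitted tokens.

```python
from typing import List, Sequence


def stateless_decoder(
    context: Sequence[int],
    embedding: List[List[float]],
    kernel: List[List[float]],
) -> List[float]:
    """Hypothetical sketch of an Embedding + depthwise Conv1d decoder.

    context:   the last `context_size` token IDs, oldest first
    embedding: embedding table, shape (vocab_size, dim)
    kernel:    depthwise Conv1d weights, shape (context_size, dim),
               i.e. one filter of length context_size per channel
    Returns a single decoder output vector of length `dim`.
    """
    context_size = len(kernel)
    if len(context) != context_size:
        raise ValueError("context must contain exactly context_size tokens")

    dim = len(embedding[0])
    out = [0.0] * dim
    # With no padding and input length == kernel size, the depthwise
    # convolution yields one output per channel: a weighted sum over
    # the embeddings of the context tokens, channel by channel.
    for k, token in enumerate(context):
        vec = embedding[token]
        for d in range(dim):
            out[d] += kernel[k][d] * vec[d]

    # Assumed ReLU nonlinearity after the convolution.
    return [max(0.0, x) for x in out]
```

For example, with `embedding = [[0.0, 0.0], [1.0, 2.0], [3.0, -1.0]]`, `kernel = [[0.5, 1.0], [1.0, 1.0]]` (context size 2) and `context = [1, 2]`, the output is `[3.5, 1.0]`. Replacing the recurrent prediction network with a short fixed context like this is what makes the decoder "stateless".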