Mirror of https://github.com/k2-fsa/icefall.git (synced 2025-12-09 22:15:28 +00:00)
- Some AudioTransform classes produce audio signals outside the range [-1, +1].
- For example, Resample produced a value of 1.0079.
- The range [-10, +10] was chosen so that such signals can still be reliably
distinguished from a [-32k, +32k] signal...
- This is related to https://github.com/lhotse-speech/lhotse/issues/1254
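The range check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the commit: the function name `check_audio_range` and its `limit` parameter are hypothetical, and the reading of [-32k, +32k] as unnormalized 16-bit integer samples is an assumption.

```python
def check_audio_range(samples, limit=10.0):
    """Return the peak absolute sample value.

    Hypothetical helper: raises ValueError when the peak exceeds `limit`,
    which (by assumption) indicates unnormalized integer-scaled audio
    rather than a float signal that only slightly overshoots [-1, +1].
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak > limit:
        raise ValueError(
            f"peak amplitude {peak} exceeds {limit}; "
            "the signal may not be normalized to [-1, +1]"
        )
    return peak


# A small overshoot, such as the 1.0079 mentioned above, is tolerated.
peak = check_audio_range([0.25, -0.5, 1.0079])
```

A loose bound accepts small overshoots from transforms such as resampling, while a signal on the order of tens of thousands still fails the check.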
Introduction
This recipe includes several ASR models trained on WenetSpeech.
./RESULTS.md contains the latest results.
Transducers
Several folders in this directory contain transducer in their names.
The following table lists the differences among them.
| | Encoder | Decoder | Comment |
|---|---|---|---|
| pruned_transducer_stateless2 | Conformer(modified) | Embedding + Conv1d | Using k2 pruned RNN-T loss |
| pruned_transducer_stateless5 | Conformer(modified) | Embedding + Conv1d | Using k2 pruned RNN-T loss |
The decoder in transducer_stateless is modified from the paper
"RNN-Transducer with Stateless Prediction Network".
We place an additional Conv1d layer right after the input embedding layer.
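
The Embedding + Conv1d structure can be illustrated with a dependency-free sketch. It is not the recipe's implementation (which is a neural network module): the name `make_stateless_decoder`, the random weights, the zero-vector left padding, and the depthwise (per-channel) convolution are all assumptions made for illustration. The point it shows is that each output depends only on the last `context_size` tokens, so no recurrent state is needed.

```python
import random


def make_stateless_decoder(vocab_size, embed_dim, context_size, seed=0):
    """Build a toy decoder: embedding lookup followed by a 1-D convolution.

    Hypothetical sketch; weights are random and the convolution is
    depthwise (one kernel of length `context_size` per channel).
    """
    rng = random.Random(seed)
    embedding = [
        [rng.uniform(-1.0, 1.0) for _ in range(embed_dim)]
        for _ in range(vocab_size)
    ]
    kernel = [
        [rng.uniform(-1.0, 1.0) for _ in range(context_size)]
        for _ in range(embed_dim)
    ]

    def decode(tokens):
        # Left-pad with zero vectors so every position sees a full window.
        pad = [[0.0] * embed_dim for _ in range(context_size - 1)]
        embedded = pad + [embedding[t] for t in tokens]
        outputs = []
        for i in range(len(tokens)):
            window = embedded[i : i + context_size]
            outputs.append([
                sum(kernel[c][k] * window[k][c] for k in range(context_size))
                for c in range(embed_dim)
            ])
        return outputs

    return decode


decode = make_stateless_decoder(vocab_size=10, embed_dim=4, context_size=2)
# Different histories with the same last two tokens give the same final output.
same = decode([5, 1, 2])[-1] == decode([9, 8, 1, 2])[-1]
```

With `context_size=2`, the final output is determined by the last two tokens alone, which is what distinguishes this decoder from a recurrent prediction network.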