Fangjun Kuang 63563d16d3
Fix setting joiner dim (#2027)
Fixes the incorrect computation of encoder_dim when encoder_dim is a comma-separated list of integers, by ensuring that the numeric (not lexicographic) maximum is used.

Fixes #2018

- Replace int(max(params.encoder_dim.split(","))) (lexicographic max on strings) with max(_to_int_tuple(params.encoder_dim)) (numeric max).
- Apply the fix consistently across all affected training scripts.
2025-09-19 09:42:41 +08:00

Introduction

This recipe contains several ASR models trained on WenetSpeech.

./RESULTS.md contains the latest results.

Transducers

This directory contains several folders whose names include "transducer". The following table lists the differences among them.

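| Folder                       | Encoder             | Decoder            | Comment                    |
|------------------------------|---------------------|--------------------|----------------------------|
| pruned_transducer_stateless2 | Conformer(modified) | Embedding + Conv1d | Using k2 pruned RNN-T loss |
| pruned_transducer_stateless5 | Conformer(modified) | Embedding + Conv1d | Using k2 pruned RNN-T loss |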

The decoder in transducer_stateless is adapted from the paper "RNN-Transducer with Stateless Prediction Network". We add a Conv1d layer directly after the input embedding layer.
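The idea of a stateless decoder (Embedding + Conv1d) can be sketched in plain Python. In this sketch the decoder output depends only on the last context_size emitted tokens rather than on a recurrent state. Several details are assumptions and do not come from this README: the function name stateless_decoder_step, the depthwise convolution (each channel mixes only its own values), left-padding with blank at the start of an utterance, and the absence of any nonlinearity. The actual recipes implement this with PyTorch modules.

```python
from typing import List


def stateless_decoder_step(
    history: List[int],
    embedding: List[List[float]],
    conv_weight: List[List[float]],
    context_size: int,
    blank_id: int = 0,
) -> List[float]:
    """Hypothetical sketch of one stateless-decoder step.

    history: previously emitted token IDs, most recent last.
    embedding: embedding[token][c] is channel c of the token's embedding.
    conv_weight: conv_weight[c][k] is the (assumed depthwise) kernel for
        channel c at context position k (k = 0 is the oldest token).
    """
    # Only the last `context_size` tokens are used; left-pad with blank
    # when fewer tokens have been emitted so far.
    context = history[-context_size:]
    context = [blank_id] * (context_size - len(context)) + context
    dim = len(embedding[0])
    # Conv1d with kernel_size == context_size and no padding yields a single
    # output frame; per channel it is a weighted sum over the context window.
    return [
        sum(conv_weight[c][k] * embedding[tok][c] for k, tok in enumerate(context))
        for c in range(dim)
    ]


# Toy example: vocabulary of 3 tokens (0 = blank), embedding dim 2.
embedding = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]]
conv_weight = [[0.5, 1.0], [1.0, 0.0]]

print(stateless_decoder_step([1, 2], embedding, conv_weight, context_size=2))
print(stateless_decoder_step([1, 1, 2], embedding, conv_weight, context_size=2))
```

Both calls print [3.5, 2.0]: tokens older than the context window do not affect the output. This is what "stateless" means here, and it keeps the decoder cheap to evaluate during search.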