Archived

This repository has been archived on 2026-03-23. You can view files and clone it, but cannot push or open issues or pull requests.

Karel Vesely 716b82cc3a

streaming_decode.py, relax the audio range from [-1,+1] to [-10,+10] (#1448)

- some AudioTransform classes produce audio signals outside the range [-1,+1]
   - Resample produced 1.0079
   - The range [-10,+10] was chosen so that the signal can still be reliably
     distinguished from the [-32k,+32k] signal...
- this is related to: https://github.com/lhotse-speech/lhotse/issues/1254

2024-01-05 10:21:27 +08:00
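
The reasoning in this commit message can be illustrated with a small range check. This is only a sketch under stated assumptions: the function name `check_audio_range` and its `tolerance` parameter are hypothetical and are not taken from `streaming_decode.py`. It shows why a bound of 10 tolerates a slight overshoot such as 1.0079 while still rejecting samples that were left on the int16 scale.

```python
def check_audio_range(samples, tolerance=10.0):
    """Return the peak absolute amplitude, or raise if it exceeds `tolerance`.

    Hypothetical helper: normalized audio is expected to lie roughly in
    [-1, +1]; a tolerance of 10 allows small overshoots while still catching
    un-normalized int16-style samples in [-32768, +32767].
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak > tolerance:
        raise ValueError(
            f"peak amplitude {peak} exceeds {tolerance}; "
            "the audio looks un-normalized"
        )
    return peak


# A slight overshoot, like the 1.0079 reported for Resample, is accepted.
check_audio_range([0.25, -0.5, 1.0079])

# Samples still on the int16 scale are rejected.
try:
    check_audio_range([8192.0, -16384.0, 32767.0])
except ValueError as err:
    print(f"rejected: {err}")
```

A strict bound of 1.0 would reject the first signal as well, which is the situation the commit describes.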

| Name | Last commit | Date |
|------|-------------|------|
| local | minor fixes (#1240) | 2023-09-04 17:56:05 +08:00 |
| pruned_transducer_stateless2 | Use high_freq -400 in computing fbank features. (#1447) | 2024-01-04 13:59:32 +08:00 |
| pruned_transducer_stateless5 | streaming_decode.py, relax the audio range from [-1,+1] to [-10,+10] (#1448) | 2024-01-05 10:21:27 +08:00 |
| zipformer | streaming_decode.py, relax the audio range from [-1,+1] to [-10,+10] (#1448) | 2024-01-05 10:21:27 +08:00 |
| finetune.sh | Add finetuning script for aishell (#974) | 2023-03-30 17:08:46 +08:00 |
| prepare.sh | Update the parameter 'vocab-size' (#1364) | 2023-11-02 20:45:30 +08:00 |
| README.md | [WIP] Pruned-transducer-stateless5-for-WenetSpeech (offline and streaming) (#447) | 2022-07-28 12:54:27 +08:00 |
| RESULTS.md | zipformer wenetspeech (#1130) | 2023-06-26 09:33:18 +08:00 |
| shared | [Ready to merge] Pruned Transducer Stateless2 for WenetSpeech (char-based) (#349) | 2022-05-23 17:13:01 +08:00 |

README.md

Introduction

This recipe includes several ASR models trained on WenetSpeech.

./RESULTS.md contains the latest results.

Transducers

Several folders in this directory have transducer in their names. The following table lists the differences among them.

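|                                | Encoder              | Decoder            | Comment                    |
|--------------------------------|----------------------|--------------------|----------------------------|
| `pruned_transducer_stateless2` | Conformer (modified) | Embedding + Conv1d | Using k2 pruned RNN-T loss |
| `pruned_transducer_stateless5` | Conformer (modified) | Embedding + Conv1d | Using k2 pruned RNN-T loss |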

The decoder in `transducer_stateless` is modified from the paper RNN-Transducer with Stateless Prediction Network: we place an additional Conv1d layer right after the input embedding layer.
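
To make the "Embedding + Conv1d" structure concrete, here is a minimal, dependency-free sketch of a stateless decoder. It is not the recipe's implementation, which is built on PyTorch. The class name `StatelessDecoder`, the parameters `context_size` and `blank_id`, the depthwise (per-channel) form of the convolution, the left-padding with blank, and the random weights are all assumptions made for illustration.

```python
import random


class StatelessDecoder:
    """Toy stateless decoder: an embedding lookup followed by a 1-D convolution.

    Hypothetical sketch. Weights are random and the convolution is assumed to
    be depthwise (each channel has its own kernel).
    """

    def __init__(self, vocab_size, embed_dim, context_size=2, blank_id=0, seed=0):
        rng = random.Random(seed)
        self.context_size = context_size
        self.blank_id = blank_id
        # Embedding table: one vector of length embed_dim per token.
        self.embedding = [
            [rng.uniform(-1.0, 1.0) for _ in range(embed_dim)]
            for _ in range(vocab_size)
        ]
        # Convolution kernel: context_size weights for each channel.
        self.kernel = [
            [rng.uniform(-1.0, 1.0) for _ in range(context_size)]
            for _ in range(embed_dim)
        ]

    def forward(self, tokens):
        # Left-pad with blank so there is one output per input token.
        padded = [self.blank_id] * (self.context_size - 1) + list(tokens)
        embedded = [self.embedding[t] for t in padded]
        outputs = []
        for t in range(len(tokens)):
            # Only the last `context_size` tokens contribute to output t.
            window = embedded[t : t + self.context_size]
            outputs.append([
                sum(
                    self.kernel[c][k] * window[k][c]
                    for k in range(self.context_size)
                )
                for c in range(len(self.kernel))
            ])
        return outputs


decoder = StatelessDecoder(vocab_size=10, embed_dim=4)
a = decoder.forward([5, 3, 7])
b = decoder.forward([1, 3, 7])
# The histories differ only in the oldest token, which lies outside the
# two-token context, so the final outputs are identical.
print(a[-1] == b[-1])
```

Unlike a recurrent prediction network, this decoder keeps no hidden state between steps: each output is a function of a fixed-size window of previous tokens only.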