
Introduction

Please refer to https://k2-fsa.github.io/icefall/recipes/Non-streaming-ASR/aishell/index.html for how to run models in this recipe.

Aishell is an open-source Chinese Mandarin speech corpus published by Beijing Shell Shell Technology Co., Ltd. 400 people from different accent areas in China were invited to participate in the recording, which was conducted in a quiet indoor environment using a high-fidelity microphone; the audio was downsampled to 16 kHz. The manual transcription accuracy is above 95%, achieved through professional speech annotation and strict quality inspection. The data is free for academic use. We hope to provide a moderate amount of data for new researchers in the field of speech recognition.

(From Open Speech and Language Resources)

Transducers

There are various folders whose names contain the word transducer in this directory. The following table lists the differences among them.

|                                 | Encoder              | Decoder            | Comment                                                                                 |
|---------------------------------|----------------------|--------------------|-----------------------------------------------------------------------------------------|
| transducer_stateless            | Conformer            | Embedding + Conv1d | with k2.rnnt_loss                                                                       |
| transducer_stateless_modified   | Conformer            | Embedding + Conv1d | with modified transducer from optimized_transducer                                      |
| transducer_stateless_modified-2 | Conformer            | Embedding + Conv1d | with modified transducer from optimized_transducer + extra data                         |
| pruned_transducer_stateless3    | Conformer (reworked) | Embedding + Conv1d | pruned RNN-T + reworked model with random combiner + using aidatatang_200zh as extra data |
| pruned_transducer_stateless7    | Zipformer            | Embedding          | pruned RNN-T + zipformer encoder + stateless decoder with context-size 1                |

The decoder in transducer_stateless is modified from the paper Rnn-Transducer with Stateless Prediction Network. We place an additional Conv1d layer right after the input embedding layer.
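The idea above can be sketched in a few lines of PyTorch. This is a minimal illustrative model, not the exact icefall implementation: the prediction network keeps no recurrent state, only an embedding lookup followed by a causal Conv1d over the last `context_size` tokens (the class and parameter names here are hypothetical).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StatelessDecoder(nn.Module):
    """Sketch of a stateless transducer prediction network:
    token embedding + a causal Conv1d over the previous tokens.
    No RNN, so the decoder output depends only on a fixed-size
    token context."""

    def __init__(self, vocab_size: int, embedding_dim: int, context_size: int = 2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.context_size = context_size
        # Conv1d placed right after the embedding, as described above.
        self.conv = nn.Conv1d(embedding_dim, embedding_dim, kernel_size=context_size)

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # y: (N, U) token ids
        emb = self.embedding(y).permute(0, 2, 1)  # (N, D, U)
        # Left-pad so the convolution sees only past tokens (causal).
        emb = F.pad(emb, (self.context_size - 1, 0))
        out = self.conv(emb)                      # (N, D, U)
        return out.permute(0, 2, 1)               # (N, U, D)


decoder = StatelessDecoder(vocab_size=500, embedding_dim=512)
tokens = torch.randint(0, 500, (4, 10))
print(decoder(tokens).shape)  # torch.Size([4, 10, 512])
```

Because the context is fixed (e.g. context-size 1 or 2), decoding states are tiny compared to an RNN prediction network, which is what makes batched beam search in these recipes cheap.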

Whisper

Recipe for fine-tuning large pretrained models.

|         | Encoder     | Decoder     | Comment                                |
|---------|-------------|-------------|----------------------------------------|
| whisper | Transformer | Transformer | supports fine-tuning using DeepSpeed   |
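DeepSpeed training is driven by a JSON configuration file. The recipe's actual config is not reproduced here; the fragment below is a hypothetical minimal example (fp16 with ZeRO stage 1) showing the kind of settings DeepSpeed accepts for fine-tuning a large model on limited GPU memory.

```json
{
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 4,
  "fp16": { "enabled": true },
  "zero_optimization": { "stage": 1 },
  "optimizer": {
    "type": "Adam",
    "params": { "lr": 1e-5 }
  }
}
```

ZeRO partitions optimizer state across GPUs, which is what makes fine-tuning Whisper-large feasible on multiple commodity GPUs.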