icefall

mirror of https://github.com/k2-fsa/icefall.git synced 2025-08-08 09:32:20 +00:00

Author	SHA1	Message	Date
Zengwei Yao	0622dea30d	Add a TTS recipe VITS on LJSpeech dataset (#1372) * first commit * replace phonimizer with g2p * use Conformer as text encoder * modify training script, clean codes * rename directory * convert text to tokens in data preparation stage * fix tts_datamodule.py * support onnx export and testing the exported onnx model * add doc * add README.md * fix style	2023-11-29 21:28:38 +08:00
Fangjun Kuang	2318c3fbd0	Support CTC decoding on CPU using OpenFst and kaldi decoders. (#1244)	2023-09-26 16:36:19 +08:00
Zengwei Yao	f18b539fbc	Add the upgraded Zipformer model (#1058) * add the zipformer codes, copied from branch from_dan_scaled_adam_exp1119 * support model export with torch.jit.script * update RESULTS.md * support exporting streaming model with torch.jit.script * add results of streaming models, with some minor changes * update README.md * add CI test * update k2 version in requirements-ci.txt * update pyproject.toml	2023-05-19 16:47:59 +08:00
Zengwei Yao	b25c234c51	Add Zipformer-MMI (#746 ) * Minor fix to conformer-mmi * Minor fixes * Fix decode.py * add training files * train with ctc warmup * add pruned_transducer_stateless7_mmi * add zipformer_mmi/mmi_decode.py, using HP as decoding graph * add mmi_decode.py * remove pruned_transducer_stateless7_mmi * rename zipformer_mmi/train_with_ctc.py as zipformer_mmi/train.py * remove unused method * rename mmi_decode.py * add export.py pretrained.py jit_pretrained.py ... * add RESULTS.md * add CI test * add docs * add README.md Co-authored-by: pkufool <wkang.pku@gmail.com>	2022-12-11 21:30:39 +08:00
Zengwei Yao	ece728d895	Apply delay penalty on k2 ctc loss (#669) * add init files * fix bug, apply delay penalty * fix decoding code and getting timestamps * add option applying delay penalty on ctc log-prob * fix bug of streaming decoding * minor change for bpe-based case * add test_model.py * add README.md * add CI	2022-11-28 22:34:02 +08:00
Zengwei Yao	f3ad32777a	Gradient filter for training lstm model (#564 ) * init files * add gradient filter module * refact getting median value * add cutoff for grad filter * delete comments * apply gradient filter in LSTM module, to filter both input and params * fix typing and refactor * filter with soft mask * rename lstm_transducer_stateless2 to lstm_transducer_stateless3 * fix typos, and update RESULTS.md * minor fix * fix return typing * fix typo	2022-09-29 11:15:43 +08:00
Fangjun Kuang	97b3fc53aa	Add LSTM for the multi-dataset setup. (#558) * Add LSTM for the multi-dataset setup. * Add results * fix style issues * add missing file	2022-09-16 18:40:25 +08:00
Zengwei Yao	f2f5baf687	Use ScaledLSTM as streaming encoder (#479 ) * add ScaledLSTM * add RNNEncoderLayer and RNNEncoder classes in lstm.py * add RNN and Conv2dSubsampling classes in lstm.py * hardcode bidirectional=False * link from pruned_transducer_stateless2 * link scaling.py pruned_transducer_stateless2 * copy from pruned_transducer_stateless2 * modify decode.py pretrained.py test_model.py train.py * copy streaming decoding files from pruned_transducer_stateless2 * modify streaming decoding files * simplified code in ScaledLSTM * flat weights after scaling * pruned2 -> pruned4 * link __init__.py * fix style * remove add_model_arguments * modify .flake8 * fix style * fix scale value in scaling.py * add random combiner for training deeper model * add using proj_size * add scaling converter for ScaledLSTM * support jit trace * add using averaged model in export.py * modify test_model.py, test if the model can be successfully exported by jit.trace * modify pretrained.py * support streaming decoding * fix model.py * Add cut_id to recognition results * Add cut_id to recognition results * do not pad in Conv subsampling module; add tail padding during decoding. * update RESULTS.md * minor fix * fix doc * update README.md * minor change, filter infinite loss * remove the condition of raise error * modify type hint for the return value in model.py * minor change * modify RESULTS.md Co-authored-by: pkufool <wkang.pku@gmail.com>	2022-08-19 14:38:45 +08:00
Quandwang	116d0cf26d	CTC attention model with reworked Conformer encoder and reworked Transformer decoder (#462 ) * ctc attention model with reworked conformer encoder and reworked transformer decoder * remove unnecessary func * resolve flake8 conflicts * fix typos and modify the expr of ScaledEmbedding * use original beam size * minor changes to the scripts * add rnn lm decoding * minor changes * check whether q k v weight is None * check whether q k v weight is None * check whether q k v weight is None * style correction * update results * update results * upload the decoding results of rnn-lm to the RESULTS * upload the decoding results of rnn-lm to the RESULTS * Update egs/librispeech/ASR/RESULTS.md Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com> * Update egs/librispeech/ASR/RESULTS.md Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com> * Update egs/librispeech/ASR/RESULTS.md Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com> Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>	2022-07-22 15:31:25 +08:00
Zengwei Yao	bc2882ddcc	Simplified memory bank for Emformer (#440 ) * init files * use average value as memory vector for each chunk * change tail padding length from right_context_length to chunk_length * correct the files, ln -> cp * fix bug in conv_emformer_transducer_stateless2/emformer.py * fix doc in conv_emformer_transducer_stateless/emformer.py * refactor init states for stream * modify .flake8 * fix bug about memory mask when memory_size==0 * add @torch.jit.export for init_states function * update RESULTS.md * minor change * update README.md * modify doc * replace torch.div() with << * fix bug, >> -> << * use i&i-1 to judge if it is a power of 2 * minor fix * fix error in RESULTS.md	2022-07-12 19:19:58 +08:00
Zengwei Yao	53f38c01d2	Emformer with conv module and scaling mechanism (#389 ) * copy files from existing branch * add rule in .flake8 * monir style fix * fix typos * add tail padding * refactor, use fixed-length cache for batch decoding * copy from streaming branch * copy from streaming branch * modify emformer states stack and unstack, streaming decoding, to be continued * refactor Stream class * remane streaming_feature_extractor.py * refactor streaming decoding * test states stack and unstack * fix bugs, no grad, and num_proccessed_frames * add modify_beam_search, fast_beam_search * support torch.jit.export * use torch.div * copy from pruned_transducer_stateless4 * modify export.py * add author info * delete other test functions * minor fix * modify doc * fix style * minor fix doc * minor fix * minor fix doc * update RESULTS.md * fix typo * add info * fix typo * fix doc * add test function for conv module, and minor fix. * add copyright info * minor change of test_emformer.py * fix doc of stack and unstack, test case with batch_size=1 * update README.md	2022-06-13 15:09:17 +08:00
Mingshuang Luo	ec5a112831	[Ready to merge] Do some coding style checks for the latest files (#379) * style check * do changes for .flake8 * a change for compute_fbank_yesno.py	2022-05-20 19:30:38 +08:00
Guanbo Wang	48a6a9a549	GigaSpeech RNN-T experiments (#318) * Copy RNN-T recipe from librispeech * flake8 * flake8 * Update params * gigaspeech decode * black * Update results * syntax highlight * Update RESULTS.md * typo	2022-05-13 11:03:26 +08:00
Zengwei Yao	00c48ec1f3	Model average (#344 ) * First upload of model average codes. * minor fix * update decode file * update .flake8 * rename pruned_transducer_stateless3 to pruned_transducer_stateless4 * change epoch number counter starting from 1 instead of 0 * minor fix of pruned_transducer_stateless4/train.py * refactor the checkpoint.py * minor fix, update docs, and modify the epoch number to count from 1 in the pruned_transducer_stateless4/decode.py * update author info * add docs of the scaling in function average_checkpoints_with_averaged_model	2022-05-05 21:20:04 +08:00
Fangjun Kuang	ac84220de9	Modified conformer with multi datasets (#312 ) * Copy files for editing. * Use librispeech + gigaspeech with modified conformer. * Support specifying number of workers for on-the-fly feature extraction. * Feature extraction code for GigaSpeech. * Combine XL splits lazily during training. * Fix warnings in decoding. * Add decoding code for GigaSpeech. * Fix decoding the gigaspeech dataset. We have to use the decoder/joiner networks for the GigaSpeech dataset. * Disable speed perturbe for XL subset. * Compute the Nbest oracle WER for RNN-T decoding. * Minor fixes. * Minor fixes. * Add results. * Update results. * Update CI. * Update results. * Fix style issues. * Update results. * Fix style issues.	2022-04-29 15:40:30 +08:00
Wang, Guanbo	5fe58de43c	GigaSpeech recipe (#120 ) * initial commit * support download, data prep, and fbank * on-the-fly feature extraction by default * support BPE based lang * support HLG for BPE * small fix * small fix * chunked feature extraction by default * Compute features for GigaSpeech by splitting the manifest. * Fixes after review. * Split manifests into 2000 pieces. * set audio duration mismatch tolerance to 0.01 * small fix * add conformer training recipe * Add conformer.py without pre-commit checking * lazy loading and use SingleCutSampler * DynamicBucketingSampler * use KaldifeatFbank to compute fbank for musan * use pretrained language model and lexicon * use 3gram to decode, 4gram to rescore * Add decode.py * Update .flake8 * Delete compute_fbank_gigaspeech.py * Use BucketingSampler for valid and test dataloader * Update params in train.py * Use bpe_500 * update params in decode.py * Decrease num_paths while CUDA OOM * Added README * Update RESULTS * black * Decrease num_paths while CUDA OOM * Decode with post-processing * Update results * Remove lazy_load option * Use default `storage_type` * Keep the original tolerance * Use split-lazy * black * Update pretrained model Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>	2022-04-14 16:07:22 +08:00
Mingshuang Luo	93c60a9d30	Code style check for librispeech pruned transducer stateless2 (#308)	2022-04-11 22:15:18 +08:00
Zengwei Yao	0b6a2213c3	Modify icefall/__init__.py. (#287) * Modify icefall/__init__.py to import common functions defined in icefall/utils.py. * Modify icefall/__init__.py and .flake8.	2022-04-02 15:01:45 +08:00
Mingshuang Luo	ad28c8c5eb	Tedlium3 transducer stateless (#233) * add tedlium3 transducer-stateless	2022-03-18 11:39:06 +08:00
Wei Kang	b702281e90	Use k2 pruned transducer loss to train conformer-transducer model (#194) * Using k2 pruned version transducer loss to train model * Fix style * Minor fixes	2022-02-17 13:33:54 +08:00
Wei Kang	30c43b7f69	Add aishell recipe (#30) * Add aishell recipe * Remove unnecessary code and update docs * adapt to k2 v1.7, add docs and results * Update conformer ctc model * Update docs, pretrained.py & results * Fix code style * Fix code style * Fix code style * Minor fix * Minor fix * Fix pretrained.py * Update pretrained model & corresponding docs	2021-11-18 10:00:47 +08:00
Fangjun Kuang	53b79fafa7	Add MMI training with word pieces as modelling unit. (#6 ) * Fix an error in TDNN-LSTM training. * WIP: Refactoring * Refactor transformer.py * Remove unused code. * Minor fixes. * Fix decoder padding mask. * Add MMI training with word pieces. * Remove unused files. * Minor fixes. * Refactoring. * Minor fixes. * Use pre-computed alignments in LF-MMI training. * Minor fixes. * Update decoding script. * Add doc about how to check and use extracted alignments. * Fix style issues. * Fix typos. * Fix style issues. * Disable macOS tests for now.	2021-10-18 15:20:32 +08:00
Fangjun Kuang	184dbb3ea5	Add documentation about code style and creating new recipes. (#27)	2021-08-25 14:48:41 +08:00
pkufool	f4223ee110	Add TDNN-LSTM-CTC Results (#25) * Add tdnn-lstm pretrained model and results * Add docs for TDNN-LSTM-CTC * Minor fix * Fix typo * Fix style checking	2021-08-24 21:09:27 +08:00
pkufool	19c4214958	Fix code style and add copyright. (#18) * Fix style and add copyright * Minor fix * Remove duplicate lines * Reformat conformer.py by black * Reformat code style with black. * Fix github workflows * Fix lhotse installation * Install icefall requirements * Update k2 version, remove lhotse from test workflow	2021-08-23 10:43:59 +08:00
Fangjun Kuang	e005ea062c	Minor fixes after review.	2021-07-20 10:02:20 +08:00
Fangjun Kuang	71c4e29ad5	Add style check tools.	2021-07-15 17:36:48 +08:00

27 Commits