* copy files
* update train.py
* small fixes
* Add decode.py
* Fix dataloader in decode.py
* add blank penalty (see the sketch after this list)
* Add blank penalty to other decoding methods
* Minor fixes
* add zipformer2 recipe
* Minor fixes
* Remove pruned7
* export and test models
* Replace bpe with tokens in export.py and pretrain.py
* Minor fixes
* Minor fixes
* Minor fixes
* Fix export
* Update results
* Fix zipformer-ctc
* Fix ci
* Fix ci
* Fix CI
* Fix CI
---------
Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
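
The blank-penalty commits above rely on one small trick: subtract a constant from the blank token's logit before the log-softmax/argmax, which biases decoding away from emitting blank and typically reduces deletions. Below is a minimal sketch of that idea, assuming PyTorch tensors; the function name, the `blank_id` default of 0, and the shape comments are illustrative, not the recipe's actual API.

```python
import torch


def apply_blank_penalty(
    logits: torch.Tensor,        # (..., vocab_size), raw joiner/decoder output
    blank_id: int = 0,           # assumed blank index (0 is a common convention)
    blank_penalty: float = 0.0,  # e.g. 1.0; 0.0 disables the penalty
) -> torch.Tensor:
    """Discourage blank emissions by lowering the blank logit before decoding."""
    if blank_penalty == 0.0:
        return logits
    logits = logits.clone()  # don't modify the caller's tensor in place
    logits[..., blank_id] -= blank_penalty
    return logits


# Usage sketch inside a greedy-search step (hypothetical variable names):
# logits = apply_blank_penalty(joiner_out, blank_id=0, blank_penalty=1.0)
# y = logits.argmax(dim=-1)
```
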
* add CTC loss option in zipformer recipe
* add ctc_decode.py
* support CTC model export, add jit_pretrained_ctc.py, pretrained_ctc.py
* update README.md and RESULTS.md
* add CI test
* add the zipformer code, copied from branch from_dan_scaled_adam_exp1119
* support model export with torch.jit.script
* update RESULTS.md
* support exporting streaming model with torch.jit.script
* add results of streaming models, with some minor changes
* update README.md
* add CI test
* update k2 version in requirements-ci.txt
* update pyproject.toml
* support transformer LM
* show number of parameters during training
* update docstring
* testing files for ppl calculation
* add LM wrapper for RNN and transformer LM
* apply lm wrapper in lm shallow fusion
* small updates
* update decode.py to support LM fusion and LODR
* add export.py
* update CI and workflow
* update decoding results
* fix CI
* remove transformer LM from CI test
* init files
* add ctc as auxiliary loss and ctc_decode.py
* tune the scale of the HLG score for 1best, nbest and nbest-oracle
* rename to pruned_transducer_stateless7_ctc
* fix doc
* fix bug, recover the hlg scores
* modify ctc_decode.py, move out the hlg scale
* fix hlg_scale
* add export.py and pretrained.py, and so on
* upload files, update README.md and RESULTS.md
* add CI test
* Bug fix
* Change subsampling factor from 1 to 2
* Implement AttentionCombine as replacement for RandomCombine
* Decrease random_prob from 0.5 to 0.333
* Add print statement
* Apply single_prob mask, so sometimes we just get one layer as output.
* Introduce feature mask per frame
* Include changes from Liyong about padding in the conformer module.
* Reduce single_prob from 0.5 to 0.25
* Reduce feature_mask_dropout_prob from 0.25 to 0.15.
* Remove dropout from inside ConformerEncoderLayer, for adding to residuals
* Increase feature_mask_dropout_prob from 0.15 to 0.2.
* Swap random_prob and single_prob, to reduce prob of being randomized.
* Decrease feature_mask_dropout_prob back from 0.2 to 0.15, i.e. revert the 43->48 change.
* Randomize order of some modules
* Bug fix
* Stop backprop bug
* Introduce a scale dependent on the masking value
* Implement efficient layer dropout
* Simplify the learned scaling factor on the modules
* Compute valid loss on batch 0.
* Make the scaling factors more global and the randomness of dropout more random
* Bug fix
* Introduce offset in layerdrop_scales
* Remove final combination; implement layer drop that drops the final layers.
* Bug fixes
* Fix bug RE self.training
* Fix bug setting layerdrop mask
* Fix eigs call
* Add debug info
* Remove warmup
* Remove layer dropout and model-level warmup
* Don't always apply the frame mask
* Slight code cleanup/simplification
* Various fixes, finish implementing frame masking
* Remove debug info
* Don't compute validation if printing diagnostics.
* Apply layer bypass during warmup in a new way, including groups of 2 and 4 layers.
* Update checkpoint.py to deal with int params
* Revert initial_scale to previous values.
* Remove the feature where it was bypassing groups of layers.
* Implement layer dropout with probability 0.075
* Fix issue with warmup in test time
* Add warmup schedule where dropout disappears from earlier layers first.
* Have warmup that gradually removes dropout from layers; multiply initialization scales by 0.1.
* Do dropout a different way
* Fix bug in warmup
* Remove debug print
* Make the warmup mask per frame.
* Implement layer dropout (in a relatively efficient way)
* Decrease initial keep_prob to 0.25.
* Make it start warming up from the very start, and increase warmup_batches to 6k
* Change warmup schedule and increase warmup_batches from 4k to 6k
* Make the bypass scale trainable.
* Change the initial keep-prob back from 0.25 to 0.5
* Bug fix
* Limit bypass scale to >= 0.1
* Revert "Change warmup schedule and increase warmup_batches from 4k to 6k"
This reverts commit 86845bd5d859ceb6f83cd83f3719c3e6641de987.
* Do warmup by dropping out whole layers.
* Decrease frequency of logging variance_proportion
* Make layerdrop different in different processes.
* For speed, drop the same num layers per job.
* Decrease initial_layerdrop_prob from 0.75 to 0.5
* Revert also the changes in scaled_adam_exp85 regarding warmup schedule
* Remove unused code LearnedScale.
* Reintroduce batching to the optimizer
* Various fixes from debugging with nvtx, but removed the NVTX annotations.
* Only apply ActivationBalancer with prob 0.25.
* Fix s -> scaling for import.
* Increase final layerdrop prob from 0.05 to 0.075
* Fix bug where fewer layers were dropped than should be; remove unnecessary print statement.
* Fix bug in choosing layers to drop
* Refactor RelPosMultiheadAttention to have 2nd forward function and introduce more modules in conformer encoder layer
* Reduce final layerdrop_prob from 0.075 to 0.05.
* Fix issue with diagnostics if stats is None
* Remove persistent attention scores.
* Make ActivationBalancer and MaxEig more efficient.
* Cosmetic improvements
* Change scale_factor_scale from 0.5 to 0.8
* Make the ActivationBalancer regress to the data mean, not zero, when enforcing abs constraint.
* Remove unused config value
* Fix bug when channel_dim < 0
* Fix bug when channel_dim < 0
* Simplify how the positional-embedding scores work in attention (thanks to Zengwei for this concept)
* Revert dropout on attention scores to 0.0.
* This should just be a cosmetic change, regularizing how we get the warmup times from the layers.
* Reduce beta from 0.75 to 0.0.
* Reduce stats period from 10 to 4.
* Reworking of ActivationBalancer code to hopefully balance speed and effectiveness.
* Add debug code for attention weights and eigs
* Remove debug statement
* Add different debug info.
* Penalize attention-weight entropies above a limit (see the sketch at the end of this list).
* Remove debug statements
* use larger delta but only penalize if small grad norm
* Bug fixes; change debug freq
* Change cutoff for small_grad_norm
* Implement whitening of values in conformer.
* Also whiten the keys in conformer.
* Fix an issue with scaling of grad.
* Decrease whitening limit from 2.0 to 1.1.
* Fix debug stats.
* Reorganize Whiten() code; configs are not the same as before. Also remove MaxEig for self_attn module
* Bug fix RE float16
* Revert whitening_limit from 1.1 to 2.2.
* Replace MaxEig with Whiten with limit=5.0, and move it to end of ConformerEncoderLayer
* Change LR schedule to start off higher
* Simplify the dropout mask, no non-dropped-out sequences
* Make attention dims configurable, not embed_dim//2, trying 256.
* Reduce attention_dim to 192; cherry-pick scaled_adam_exp130 which is linear_pos interacting with query
* Use half the dim for values, vs. keys and queries.
* Increase initial-lr from 0.04 to 0.05, plus changes for diagnostics
* Cosmetic changes
* Changes to avoid bug in backward hooks, affecting diagnostics.
* Randomly clamp attention scores to -5..5.
* Add some random clamping in model.py
* Add reflect=0.1 to invocations of random_clamp()
* Remove in_balancer.
* Revert model.py so there are no constraints on the output.
* Implement randomized backprop for softmax.
* Reduce min_abs from 1e-03 to 1e-04
* Add RandomGrad with min_abs=1.0e-04
* Use full precision to do softmax and store ans.
* Fix bug in backprop of random_clamp()
* Get the randomized backprop for softmax in autocast mode working.
* Remove debug print
* Reduce min_abs from 1.0e-04 to 5.0e-06
* Add hard limit of attention weights to +- 50
* Use normal implementation of softmax.
* Remove use of RandomGrad
* Remove the use of random_clamp in conformer.py.
* Reduce the limit on attention weights from 50 to 25.
* Reduce min_prob of ActivationBalancer from 0.1 to 0.05.
* Penalize too large weights in softmax of AttentionDownsample()
* Also apply limit on logit in SimpleCombiner
* Increase limit on logit for SimpleCombiner to 25.0
* Add more diagnostics to debug gradient scale problems
* Changes to grad scale logging; increase grad scale more frequently if less than one.
* Add logging
* Remove comparison diagnostics, which were not that useful.
* Configuration changes: scores limit 5->10, min_prob 0.05->0.1, cur_grad_scale more aggressive increase
* Reset optimizer state when we change loss function definition.
* Make warmup period decrease scale on simple loss, leaving pruned loss scale constant.
* Cosmetic change
* Increase initial-lr from 0.05 to 0.06.
* Increase initial-lr from 0.06 to 0.075 and decrease lr-epochs from 3.5 to 3.
* Fixes to logging statements.
* Introduce warmup schedule in optimizer
* Increase grad_scale to Whiten module
* Add inf check hooks
* Renaming in optim.py; remove step() from scan_pessimistic_batches_for_oom in train.py
* Change base lr to 0.1, also rename from initial lr in train.py
* Adding activation balancers after simple_am_proj and simple_lm_proj
* Reduce max_abs on am_balancer
* Increase max_factor in final lm_balancer and am_balancer
* Use penalize_abs_values_gt, not ActivationBalancer.
* Trying to reduce grad_scale of Whiten() from 0.02 to 0.01.
* Add hooks.py, had neglected to git add it.
* don't do penalize_abs_values_gt on simple_lm_proj and simple_am_proj; reduce --base-lr from 0.1 to 0.075
* Increase probs of activation balancer and make it decay slower.
* Don't print out full non-finite tensor
* Increase default max_factor for ActivationBalancer from 0.02 to 0.04; decrease max_abs in ConvolutionModule.deriv_balancer2 from 100.0 to 20.0
* reduce initial scale in GradScaler
* Increase max_abs in ActivationBalancer of conv module from 20 to 50
* --base-lr 0.075->0.5; --lr-epochs 3->3.5
* Revert 179->180 change, i.e. change max_abs for deriv_balancer2 back from 50.0 to 20.0
* Save some memory in the autograd of DoubleSwish.
* Change the discretization of the sigmoid to be expectation preserving.
* Fix randn to rand
* Try a more exact way to round to uint8 that should prevent ever wrapping around to zero
* Make it use float16 if in amp but use clamp to avoid wrapping error
* Store only half precision output for softmax.
* More memory efficient backprop for DoubleSwish (see the sketch at the end of this list).
* Change to warmup schedule.
* Changes to more accurately estimate OOM conditions
* Reduce cutoff from 100 to 5 for estimating OOM with warmup
* Make 20 the limit for warmup_count
* Cast to float16 in DoubleSwish forward
* Hopefully make penalize_abs_values_gt more memory efficient.
* Add logging about memory used.
* Change scalar_max in optim.py from 2.0 to 5.0
* Regularize how we apply the min and max to the eps of BasicNorm
* Fix clamping of bypass scale; remove a couple unused variables.
* Increase floor on bypass_scale from 0.1 to 0.2.
* Increase bypass_scale from 0.2 to 0.4.
* Increase bypass_scale min from 0.4 to 0.5
* Rename conformer.py to zipformer.py
* Rename Conformer to Zipformer
* Update decode.py by copying from pruned_transducer_stateless5 and changing directory name
* Remove some unused variables.
* Fix clamping of epsilon
* Refactor zipformer for more flexibility so we can change number of encoder layers.
* Have a 3rd encoder, at downsampling factor of 8.
* Refactor how the downsampling is done so that it happens later, but the 1st encoder stack still operates after a subsampling of 2.
* Fix bug RE seq lengths
* Have 4 encoder stacks
* Have 6 different encoder stacks, U-shaped network.
* Reduce dim of linear positional encoding in attention layers.
* Reduce min of bypass_scale from 0.5 to 0.3, and make it not applied in test mode.
* Tuning change to num encoder layers, inspired by relative param importance.
* Make decoder group size equal to 4.
* Add skip connections as in normal U-net
* Avoid falling off the loop for weird inputs
* Apply layer-skip dropout prob
* Have warmup schedule for layer-skipping
* Rework how warmup count is produced; should not affect results.
* Add warmup schedule for zipformer encoder layer, from 1.0 -> 0.2.
* Reduce initial clamp_min for bypass_scale from 1.0 to 0.5.
* Restore the changes from scaled_adam_exp219 and scaled_adam_exp220, accidentally lost, re layer skipping
* Change to schedule of bypass_scale min: make it larger, decrease slower.
* Change schedule, as the initial loss was not promising
* Implement pooling module, add it after initial feedforward.
* Bug fix
* Introduce dropout rate to dynamic submodules of conformer.
* Introduce minimum probs in the SimpleCombiner
* Add bias in weight module
* Remove dynamic weights in SimpleCombiner
* Remove the 5th of 6 encoder stacks
* Fix some typos
* small fixes
* small fixes
* Copy files
* Update decode.py
* Add changes from the master
* Add changes from the master
* update results
* Add CI
* Small fixes
* Small fixes
Co-authored-by: Daniel Povey <dpovey@gmail.com>
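
Two of the techniques named in the long commit list above are concrete enough that short sketches may help. First, "Penalize attention-weight entropies above a limit": the per-query entropy of the softmaxed attention weights is measured, and only the excess over a threshold is penalized, nudging heads toward sharper attention without touching heads that are already sharp. This is a hedged illustration of the concept only, not the recipe's actual mechanism; the function name, threshold, and scale below are made up for the example.

```python
import torch


def attention_entropy_penalty(
    attn_weights: torch.Tensor,  # (..., num_queries, num_keys), rows sum to 1
    entropy_limit: float = 2.0,  # illustrative threshold, in nats
    scale: float = 0.01,         # illustrative weight for the penalty term
) -> torch.Tensor:
    """Scalar penalty that grows only when per-query attention entropy
    exceeds `entropy_limit`; add it to the training loss."""
    eps = 1.0e-20
    entropy = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)
    excess = (entropy - entropy_limit).relu()
    return scale * excess.mean()
```
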
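Second, the DoubleSwish memory commits ("Save some memory in the autograd of DoubleSwish", "round to uint8", "More memory efficient backprop for DoubleSwish") share one idea: do not keep the float activations for backward, keep only the derivative quantized to uint8. The sketch below assumes DoubleSwish is x * sigmoid(x - 1) and uses a simple per-tensor min/max quantization; the real implementation differs in details such as the quantization range, expectation-preserving rounding, and float16 handling.

```python
import torch


class DoubleSwishFunction(torch.autograd.Function):
    """double_swish(x) = x * sigmoid(x - 1), with a memory-saving backward:
    only the derivative, quantized to uint8, is kept for the backward pass
    (roughly 4x less activation memory than storing float32), at the cost
    of a small quantization error in the gradient."""

    @staticmethod
    def forward(ctx, x: torch.Tensor) -> torch.Tensor:
        s = torch.sigmoid(x - 1.0)
        y = x * s
        # d/dx [x * sigmoid(x - 1)] = s + x*s*(1-s) = s + y*(1-s)
        d = s + y * (1.0 - s)
        d_min, d_max = d.min(), d.max()
        scale = (d_max - d_min).clamp(min=1.0e-8)
        d_u8 = ((d - d_min) / scale * 255.0).round().to(torch.uint8)
        ctx.save_for_backward(d_u8, d_min, scale)
        return y

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor) -> torch.Tensor:
        d_u8, d_min, scale = ctx.saved_tensors
        # Dequantize the stored derivative and apply the chain rule.
        d = d_u8.to(grad_output.dtype) / 255.0 * scale + d_min
        return grad_output * d


class DoubleSwish(torch.nn.Module):
    """Module wrapper around the memory-saving double-swish above."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return DoubleSwishFunction.apply(x)
```
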