1376 Commits

Author SHA1 Message Date
Daniel Povey
c5a037b8bc Merge branch 'pradam_exp1l3' into pradam_exp1m3 2022-07-30 08:21:28 +08:00
Daniel Povey
17bc002e6e Refactoring that does not affect results. 2022-07-30 07:45:29 +08:00
Daniel Povey
3110138ab5 Smooth grad_cov with eps; add a 4th stage of smoothing, this time on Z_inv. 2022-07-30 07:30:32 +08:00
Wei Kang
2f75236c05
Support dynamic chunk streaming training in pruned_transducer_stateless5 (#454)
* support dynamic chunk streaming training

* Add simulate streaming decoding

* Support streaming decoding

* fix causal

* Minor fixes

* fix streaming decode; add results
2022-07-29 16:40:06 +08:00
Daniel Povey
ca28f46f75 Merge branch 'pradam_exp1l2' into pradam_exp1m2 2022-07-29 15:16:10 +08:00
Daniel Povey
3ad042444e More changes to reduce numerical roundoff for dims with zero grad and params. 2022-07-29 14:38:50 +08:00
Mingshuang Luo
1b478d3ac3
Add other decoding methods (nbest, nbest oracle, nbest LG) for wenetspeech pruned rnnt2 (#482)
* add other decoding methods for wenetspeech

* changes for RESULTS.md

* add ngram-lm-scale=0.35 results

* set ngram-lm-scale=0.35 as default

* Update README.md

* add nbest-scale for file name
2022-07-29 12:03:08 +08:00
Daniel Povey
a55f8c9c14 Modify scaling.py to prevent constant values 2022-07-29 11:38:05 +08:00
Lucky Wong
34b4356bad
correction for get rank id. (#507)
* Fix no attribute 'data' error.

* minor fixes

* correction for get rank id.
2022-07-29 11:28:52 +08:00
Fangjun Kuang
ec69967584
Set overwrite=True when extracting features in batches. (#487) 2022-07-29 11:17:19 +08:00
Daniel Povey
9d7af4be20 Modify scaling.py to prevent constant values 2022-07-29 09:34:13 +08:00
Daniel Povey
3c1fddaf48 Rework computation to reduce numerical roundoff 2022-07-29 06:22:17 +08:00
Mingshuang Luo
389f9c77e5
correction for prepare.sh (#506) 2022-07-28 17:01:46 +08:00
boji123
3c9e7f733b
[debug] raise remind when git-lfs not available (#504)
* [debug] raise remind when git-lfs not available

* modify comment
2022-07-28 16:17:49 +08:00
Daniel Povey
633cbd551a Increase lr_update_period from 200,4000 to 400, 5000 2022-07-28 14:45:45 +08:00
Mingshuang Luo
f26b62ac00
[WIP] Pruned-transducer-stateless5-for-WenetSpeech (offline and streaming) (#447)
* pruned-rnnt5-for-wenetspeech

* style check

* style check

* add streaming conformer

* add streaming decode

* changes codes for fast_beam_search and export cpu jit

* add modified-beam-search for streaming decoding

* add modified-beam-search for streaming decoding

* change for streaming_beam_search.py

* add README.md and RESULTS.md

* change for style_check.yml

* do some changes

* do some changes for export.py

* add some decode commands for usage

* add streaming results on README.md
2022-07-28 12:54:27 +08:00
Daniel Povey
0d038a6ea4 Remove debugging statement 2022-07-28 09:26:11 +08:00
Daniel Povey
8654a7385d Add denom_rel_eps, and set it to 1e-05 2022-07-28 09:10:20 +08:00
Daniel Povey
dc565f729b Take into account various outcomes from parameter tuning 2022-07-28 09:06:59 +08:00
Daniel Povey
daa55d5a3c Patches to make decoding work correctly at utt start, for greedy_search 2022-07-27 09:35:39 +08:00
Fangjun Kuang
385645d533
Fix get_transducer_model() for aishell. (#497)
PR #495 introduces an error. This commit fixes it.
2022-07-26 15:42:21 +08:00
Daniel Povey
e25ca74955 Use a measure of correlation for eigs that can be negative. 2022-07-26 13:40:57 +08:00
Daniel Povey
b9696878b4 Update diagnostics stats 2022-07-26 12:39:51 +08:00
Fangjun Kuang
d3fc4b031e
Support using aidatatang_200zh optionally in aishell training (#495)
* Use aidatatang_200zh optionally in aishell training.
2022-07-26 11:25:01 +08:00
Fangjun Kuang
4612b03947
Fix using G before assignment in pruned_transducer_stateless/decode.py (#494) 2022-07-26 10:37:02 +08:00
Wei Kang
b1d0956855
Add modified_beam_search for streaming decode (#489)
* Add modified_beam_search for pruned_transducer_stateless/streaming_decode.py

* refactor

* modified beam search for stateless3,4

* Fix comments

* Add real streaming CI
2022-07-25 16:53:23 +08:00
Zengwei Yao
8203d10be7
Add stats about duration and padding proportion (#485)
* add stats about duration and padding proportion

* add  for utt_duration

* add stats for other recipes

* add stats for other 2 recipes

* modify doc

* minor change
2022-07-25 16:40:43 +08:00
Fangjun Kuang
d99796898c
Update doc to add a link to Nadira Povey's YouTube channel. (#492)
* Update doc to add a link to Nadira Povey's YouTube channel.

* fix a typo
2022-07-25 12:06:40 +08:00
Daniel Povey
fe595f8772 Improve debugging output. 2022-07-25 09:02:36 +08:00
Daniel Povey
854c2965a9 Fix bug regarding G_prime being zero 2022-07-25 06:57:52 +08:00
Daniel Povey
3acdf3b395 Reworking the computation of Z to be numerically better. 2022-07-25 06:37:26 +08:00
Daniel Povey
5513f7fee5 Initial version of fixing numerical issue, will continue though 2022-07-25 06:27:01 +08:00
Daniel Povey
b0f0c6c4ab Setting lr_update_period=(200,4k) in train.py 2022-07-25 04:38:12 +08:00
Daniel Povey
06718052ec Refactoring, putting tunable values in constructor, a little cleanup 2022-07-25 04:31:42 +08:00
Daniel Povey
8efc512823 Remove some debugging code, found the mismatch 2022-07-24 11:52:10 +08:00
Daniel Povey
ba96439c76 Saving version I am trying to debug 2022-07-24 11:00:40 +08:00
Daniel Povey
962e95f119 Using a more flexible test. Moved to simpler update, tuned differently. 2022-07-24 09:20:53 +08:00
Daniel Povey
b8a9485011 Print git version for test output 2022-07-24 06:54:29 +08:00
Daniel Povey
48ac7e0bc3 Add max as well as min to G_prime 2022-07-24 06:50:05 +08:00
Daniel Povey
6290fcb535 Cleanup and refactoring 2022-07-24 05:48:38 +08:00
Daniel Povey
8a9bbb93bc Cosmetic fixes 2022-07-24 04:45:57 +08:00
Daniel Povey
966ac36cde Fixes to comments 2022-07-24 04:36:41 +08:00
Daniel Povey
33ffd17515 Some cleanup 2022-07-24 04:22:11 +08:00
Daniel Povey
ddceb7963b Interpolate between iterative estimate of scale, and original value. 2022-07-23 15:27:48 +08:00
Daniel Povey
2c4bdd0ad0 Add _update_param_scales_simple(), add documentation 2022-07-23 14:49:58 +08:00
Daniel Povey
9730352257 Reduce smoothing constant slightly 2022-07-23 13:12:31 +08:00
Daniel Povey
e1873fc0bb Tune phase2 again, from 0.005,5.0 to 0.01,40. Epoch 140 is 0.21/0.149 2022-07-23 10:10:01 +08:00
Daniel Povey
0fc58bac56 More tuning, epoch-140 results are 0.23,0.11 2022-07-23 09:52:51 +08:00
Daniel Povey
34a2d331bf Smooth in opposite orientation to G 2022-07-23 09:38:16 +08:00
Daniel Povey
a972655a70 Tuning. 2022-07-23 09:15:49 +08:00