Daniel Povey
5d69acb25b
Add max-abs-value
2022-03-13 13:15:20 +08:00
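The commit above only names the "max-abs-value" idea; one plausible reading, sketched below, is a straight-through clamp that bounds activation magnitudes in the forward pass while leaving gradients untouched. The function name and the max_abs=10.0 default are assumptions, not the repository's code.

```python
import torch

def limit_max_abs(x: torch.Tensor, max_abs: float = 10.0) -> torch.Tensor:
    """Hypothetical sketch: bound activations to [-max_abs, max_abs] with a
    straight-through gradient, i.e. the forward pass uses the clamped value
    while the backward pass behaves like the identity."""
    clamped = x.clamp(min=-max_abs, max=max_abs)
    return x + (clamped - x).detach()
```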
Daniel Povey
db7a3b6eea
Reduce initial_scale.
2022-03-12 18:50:02 +08:00
Daniel Povey
b7b2d8970b
Cosmetic change
2022-03-12 17:47:35 +08:00
Daniel Povey
a392cb9fbc
Reduce initial scaling of modules
2022-03-12 16:53:03 +08:00
Daniel Povey
ca8cf2a73b
Another rework, use scales on linear/conv
2022-03-12 15:38:13 +08:00
Daniel Povey
0abba9e7a2
Fix self.post_scale_mha
2022-03-12 11:20:44 +08:00
Daniel Povey
76a2b9d362
Add learnable post-scale for mha
2022-03-12 11:19:49 +08:00
Daniel Povey
7eb5a84cbe
Add identity pre_norm_final for diagnostics.
2022-03-11 21:00:43 +08:00
Daniel Povey
cc558faf26
Fix scale from 0.5 to 2.0, as I really intended.
2022-03-11 19:11:50 +08:00
Daniel Povey
98156711ef
Introduce in_scale=0.5 for SwishExpScale
2022-03-11 19:07:34 +08:00
Daniel Povey
5eafccb369
Change how scales are applied; fix residual bug
2022-03-11 17:46:33 +08:00
Daniel Povey
bcf417fce2
Change max_factor in DerivBalancer from 0.025 to 0.01; fix scaling code.
2022-03-11 14:47:46 +08:00
Daniel Povey
2940d3106f
Fix q*scaling logic
2022-03-11 14:44:13 +08:00
Daniel Povey
ab9a17413a
Scale up pos_bias_u and pos_bias_v before use.
2022-03-11 14:37:52 +08:00
Daniel Povey
2fa9c636a4
Use nonzero threshold in DerivBalancer
2022-03-10 23:24:55 +08:00
Daniel Povey
425e274c82
Replace norm in ConvolutionModule with a scaling factor.
2022-03-10 16:01:53 +08:00
Daniel Povey
b55472bb42
Replace most normalizations with scales (still have norm in conv)
2022-03-10 14:43:54 +08:00
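The two commits above replace normalization layers with plain learnable scales. A minimal sketch of what such a replacement could look like is below; the class name and the initial value are assumptions, and the commits near the top of this log additionally attach scales to the linear/conv layers themselves.

```python
import torch
import torch.nn as nn

class ScaleSketch(nn.Module):
    """Learnable per-channel scale used in place of LayerNorm: no statistics
    are computed, the input is simply multiplied element-wise."""

    def __init__(self, num_channels: int, initial_scale: float = 1.0):
        super().__init__()
        self.scale = nn.Parameter(torch.full((num_channels,), initial_scale))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., num_channels)
        return x * self.scale
```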
Daniel Povey
a37d98463a
Restore ConvolutionModule to its state before the changes; change all Swish and Swish(Swish) to SwishOffset.
2022-03-06 11:55:02 +08:00
Daniel Povey
8a8b81cd18
Replace ReLU with swish-squared.
2022-03-05 22:21:42 +08:00
Daniel Povey
5f2c0a09b7
Convert swish nonlinearities to ReLU
2022-03-05 16:28:24 +08:00
Daniel Povey
65b09dd5f2
Double the threshold in brelu; slightly increase max_factor.
2022-03-05 00:07:14 +08:00
Daniel Povey
6252282fd0
Add deriv-balancing code
2022-03-04 20:19:11 +08:00
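The DerivBalancer referenced throughout these commits is only named in the messages; the sketch below shows one way a derivative balancer could work: identity in the forward pass, and in the backward pass the gradient of under-active channels is nudged so their activations are pushed upward. The threshold/max_factor names echo the commit messages above, but the exact logic and defaults are assumptions.

```python
import torch

class DerivBalancerSketch(torch.autograd.Function):
    """Hypothetical simplified derivative balancer, not icefall's implementation."""

    @staticmethod
    def forward(ctx, x, channel_dim, threshold, max_factor):
        ctx.save_for_backward(x)
        ctx.channel_dim = channel_dim
        ctx.threshold = threshold
        ctx.max_factor = max_factor
        return x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        channel_dim = ctx.channel_dim % x.ndim
        dims = [d for d in range(x.ndim) if d != channel_dim]
        # Fraction of positive activations per channel, e.g. over (T, B)
        # for an input of shape (T, B, C).
        proportion_positive = (x > 0).float().mean(dim=dims, keepdim=True)
        # Channels below the threshold get a gradient biased so that
        # gradient descent increases x in those channels: positive grads
        # shrink, negative grads grow in magnitude.
        factor = ctx.max_factor * (proportion_positive < ctx.threshold).float()
        grad_input = grad_output - factor * grad_output.abs()
        return grad_input, None, None, None


# Usage sketch: balance the last dimension with a 5% threshold.
# y = DerivBalancerSketch.apply(x, -1, 0.05, 0.01)
```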
Daniel Povey
eb3ed54202
Reduce scale from 50 to 20
2022-03-04 15:56:45 +08:00
Daniel Povey
9cc5999829
Fix duplicate Swish; replace norm+swish with swish+exp-scale in convolution module
2022-03-04 15:50:51 +08:00
Daniel Povey
7e88999641
Increase scale from 20 to 50.
2022-03-04 14:31:29 +08:00
Daniel Povey
3207bd98a9
Increase scale on Scale from 4 to 20
2022-03-04 13:16:40 +08:00
Daniel Povey
cd216f50b6
Add import
2022-03-04 11:03:01 +08:00
Daniel Povey
bc6c720e25
Combine ExpScale and swish for memory reduction
2022-03-04 10:52:05 +08:00
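Combining ExpScale and swish into one custom autograd function can reduce memory because only the raw inputs are saved and the intermediates are recomputed in the backward pass. The sketch below illustrates the idea; the class name and the exact formula x * sigmoid(x) * exp(scale) are assumptions rather than the repository's code.

```python
import torch

class ExpScaleSwishSketch(torch.autograd.Function):
    """Fused swish + exponential scale: y = x * sigmoid(x) * exp(scale).
    Only x and scale are kept for backward, so no intermediate tensors are
    stored between the two operations."""

    @staticmethod
    def forward(ctx, x, scale):
        ctx.save_for_backward(x, scale)
        return x * torch.sigmoid(x) * scale.exp()

    @staticmethod
    def backward(ctx, grad_output):
        x, scale = ctx.saved_tensors
        s = torch.sigmoid(x)
        swish = x * s
        d_swish = s * (1 + x * (1 - s))   # derivative of x * sigmoid(x)
        e = scale.exp()
        grad_x = grad_output * d_swish * e
        # Reduce the scale gradient back to the (broadcast) parameter shape.
        grad_scale = (grad_output * swish * e).sum_to_size(scale.shape)
        return grad_x, grad_scale
```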
Daniel Povey
23b3aa233c
Double learning rate of exp-scale units
2022-03-04 00:42:37 +08:00
Daniel Povey
5c177fc52b
pelu_base -> expscale; add 2x ExpScale in subsampling and in feedforward units.
2022-03-03 23:52:03 +08:00
Daniel Povey
3fb559d2f0
Add baseline for the PeLU experiment, keeping only the small normalization-related changes.
2022-03-02 18:27:08 +08:00
Daniel Povey
9d1b4ae046
Add PeLU to this good-performing setup.
2022-03-02 16:33:27 +08:00
Daniel Povey
c1063def95
First version of rand-combine iterated-training-like idea.
2022-02-27 17:34:58 +08:00
Daniel Povey
63d8d935d4
Refactor/simplify ConformerEncoder
2022-02-27 13:56:15 +08:00
Daniel Povey
2af1b3af98
Remove ReLU in attention
2022-02-14 19:39:19 +08:00
Daniel Povey
a859dcb205
Remove learnable offset, use relu instead.
2022-02-07 12:14:48 +08:00
Daniel Povey
48a764eccf
Add min in q,k,v of attention
2022-02-06 21:19:37 +08:00
Fangjun Kuang
14c93add50
Remove batchnorm, weight decay, and SOS from transducer conformer encoder (#155)
* Remove batchnorm, weight decay, and SOS.
* Make --context-size configurable.
* Update results.
2021-12-27 16:01:10 +08:00
Fangjun Kuang
cb04c8a750
Limit the number of symbols per frame in RNN-T decoding. (#151)
2021-12-18 11:00:42 +08:00
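PR #151 caps how many non-blank symbols greedy RNN-T search may emit before advancing to the next encoder frame, which keeps the decoder from looping on a single frame. A minimal greedy-search sketch of that idea follows; `decoder` and `joiner` are hypothetical callables standing in for the real model components, not icefall's API.

```python
import torch

def greedy_search_sketch(encoder_out, decoder, joiner, blank_id=0,
                         context_size=2, max_sym_per_frame=3):
    """encoder_out: (T, encoder_dim).  `decoder` maps a (1, context_size)
    tensor of token ids to a decoder embedding; `joiner` maps an encoder
    frame plus that embedding to logits over the vocabulary."""
    hyp = [blank_id] * context_size
    decoder_out = decoder(torch.tensor([hyp[-context_size:]]))
    for t in range(encoder_out.size(0)):
        emitted = 0
        while emitted < max_sym_per_frame:   # hard per-frame cap
            logits = joiner(encoder_out[t], decoder_out)
            token = int(logits.argmax())
            if token == blank_id:
                break                        # blank: move to the next frame
            hyp.append(token)
            decoder_out = decoder(torch.tensor([hyp[-context_size:]]))
            emitted += 1
    return hyp[context_size:]
```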
Fangjun Kuang
1d44da845b
RNN-T Conformer training for LibriSpeech (#143)
* Begin to add RNN-T training for librispeech.
* Copy files from conformer_ctc. Will edit it.
* Use conformer/transformer model as encoder.
* Begin to add training script.
* Add training code.
* Remove long utterances to avoid OOM when a large max_duration is used.
* Begin to add decoding script.
* Add decoding script.
* Minor fixes.
* Add beam search.
* Use LSTM layers for the encoder. Needs more tuning.
* Use stateless decoder (a sketch of the idea follows this entry).
* Minor fixes to make it ready for merge.
* Fix README.
* Update RESULT.md to include RNN-T Conformer.
* Minor fixes.
* Fix tests.
* Minor fixes.
* Minor fixes.
* Fix tests.
2021-12-18 07:42:51 +08:00
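The "stateless decoder" mentioned in PR #143 replaces the recurrent prediction network with an embedding of only the last few tokens; `--context-size` (made configurable in #155 above) controls how many. The sketch below shows the general shape of such a decoder; the dimensions, the grouped 1-D convolution, and the ReLU are illustrative assumptions, not the exact module in the repository.

```python
import torch
import torch.nn as nn

class StatelessDecoderSketch(nn.Module):
    """Prediction network with no recurrent state: embed the last
    `context_size` tokens and mix them with a 1-D convolution."""

    def __init__(self, vocab_size, embed_dim, blank_id=0, context_size=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=blank_id)
        self.context_size = context_size
        if context_size > 1:
            self.conv = nn.Conv1d(embed_dim, embed_dim, kernel_size=context_size,
                                  groups=embed_dim, bias=False)

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # y: (N, context_size) token ids -> (N, 1, embed_dim)
        emb = self.embedding(y)
        if self.context_size > 1:
            emb = self.conv(emb.permute(0, 2, 1)).permute(0, 2, 1)
        return torch.relu(emb)
```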