122 Commits

Author SHA1 Message Date
Daniel Povey
e7e7560bba Implement chunking 2023-02-10 15:02:29 +08:00
Daniel Povey
167b58baa0 Make output dim of Zipformer be max dim 2023-01-14 14:29:29 +08:00
Daniel Povey
fb7a967276 Increase unmasked dims 2023-01-13 17:38:11 +08:00
Daniel Povey
bebc27f274 Increasing encoder-dim of some layers, and unmasked-dim 2023-01-13 17:36:45 +08:00
Daniel Povey
e6af583ee1 Increase encoder-dim of slowest stack from 320 to 384 2023-01-13 14:40:42 +08:00
Daniel Povey
a88587dc8a Fix comment; have 6, not 4, layers in most-downsampled stack. 2023-01-13 00:12:46 +08:00
Daniel Povey
bac72718f0 Bug fixes, config changes 2023-01-12 22:11:42 +08:00
Daniel Povey
1e04c3d892 Reduce dimension for speed, have varying dims 2023-01-12 21:15:39 +08:00
Daniel Povey
c7107ead64 Fix bug in get_adjusted_batch_count 2023-01-07 17:45:22 +08:00
Daniel Povey
9242800d42 Remove the 8x-subsampled stack 2023-01-07 12:59:57 +08:00
Daniel Povey
ef48019d6e Reduce feedforward-dims 2023-01-06 22:26:58 +08:00
Daniel Povey
6a762914bf Increase base-lr from 0.05 to 0.055 2023-01-06 13:35:57 +08:00
Daniel Povey
90c02b471c Revert base LR to 0.05 2023-01-05 16:27:43 +08:00
Daniel Povey
067b861c70 Use largest LR for printing 2023-01-05 14:46:15 +08:00
Daniel Povey
6c7fd8c046 Increase base-lr to 0.06 2023-01-05 14:23:59 +08:00
Daniel Povey
0d7161ebec Use get_parameter_groups_with_lr in train.py; bug fixes 2023-01-05 14:11:33 +08:00
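The commit above switches train.py to building optimizer parameter groups with per-group learning rates. As a rough, hedged illustration of the general idea only (the helper name and the grouping rule below are assumptions for the sketch, not the repository's actual get_parameter_groups_with_lr), grouping parameters so that, say, 1-D parameters get a scaled learning rate could look like this:

```python
import torch
import torch.nn as nn

def make_param_groups_with_lr(model: nn.Module, base_lr: float,
                              small_lr_scale: float = 0.5):
    """Split parameters into groups with different learning rates.

    Hypothetical grouping rule: 1-D parameters (biases, norm scales)
    get a scaled LR. Illustrates per-group learning rates in general,
    not the icefall implementation.
    """
    small, regular = [], []
    for _, p in model.named_parameters():
        if not p.requires_grad:
            continue
        (small if p.ndim <= 1 else regular).append(p)
    return [
        {"params": regular, "lr": base_lr},
        {"params": small, "lr": base_lr * small_lr_scale},
    ]

model = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 500))
optimizer = torch.optim.AdamW(make_param_groups_with_lr(model, base_lr=0.05))
```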
Daniel Povey
b7be18c2f8 Keep only needed changes from Liyong's branch 2023-01-05 12:23:32 +08:00
Daniel Povey
096ebeaf23 Take a couple of files from Liyong's branch 2023-01-05 12:01:42 +08:00
Daniel Povey
829e4bd4db Bug fix in save-bad-model code 2022-12-21 15:33:58 +08:00
Daniel Povey
266e71cc79 Save checkpoint on failure. 2022-12-21 15:09:16 +08:00
Daniel Povey
d2b272ab50 Add back 2 conformer layers in 1st stack. 2022-12-20 13:54:06 +08:00
Daniel Povey
28cac1c2dc Merge debugging changes to optimizer. 2022-12-20 13:01:50 +08:00
Daniel Povey
b546ac866c Merge change from 726, set batch count at start of loop for repeatability. 2022-12-20 11:48:50 +08:00
Daniel Povey
2cc5bc18be Merge branch 'scaled_adam_exp731' into scaled_adam_exp737 2022-12-20 00:04:49 +08:00
  # Conflicts:
  #   egs/librispeech/ASR/pruned_transducer_stateless7/zipformer.py
Daniel Povey
f439399ced Adjust batch count w.r.t. reference duration 2022-12-18 14:25:23 +08:00
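The commit above rescales the notional batch count by how much audio each batch carries relative to a reference duration, so that schedules keyed on batch count behave consistently across batch sizes. A hedged sketch of one way such an adjustment could be written (the formula, argument names, and default reference duration below are assumptions, not the repository's get_adjusted_batch_count):

```python
def adjusted_batch_count(batch_idx: int,
                         max_duration_s: float,
                         world_size: int,
                         ref_duration_s: float = 600.0) -> float:
    """Rescale the raw batch index so schedules see a batch count
    proportional to the amount of audio processed. Illustrative only."""
    return batch_idx * (max_duration_s * world_size) / ref_duration_s
```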
Daniel Povey
0341ff1ec5 One more convnext layer, two fewer conformer layers. 2022-12-17 22:00:58 +08:00
Daniel Povey
286b2021c2 Convert batch index to int 2022-12-17 16:31:45 +08:00
Daniel Povey
2c0cec86a3 Set batch count less frequently 2022-12-17 16:31:24 +08:00
Daniel Povey
912adfff7c Increase all ff dims by 256 2022-12-08 21:11:58 +08:00
Daniel Povey
6e598cb18d Reduce top grad_scale limit from 128 to 32. 2022-12-08 18:36:29 +08:00
Daniel Povey
3f82ee0783 Merge dropout schedule, 0.3 ... 0.1 over 20k batches 2022-12-08 18:18:46 +08:00
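The dropout schedule mentioned above ramps dropout down from 0.3 to 0.1 over the first 20k batches. A minimal standalone sketch of such a linear schedule (illustrative only, not the repository's own scheduling utility):

```python
def scheduled_dropout(batch_count: float,
                      start: float = 0.3,
                      end: float = 0.1,
                      ramp_batches: float = 20000.0) -> float:
    """Linearly interpolate dropout from `start` to `end` over the first
    `ramp_batches` batches, then hold it at `end`. Illustrative only."""
    frac = min(max(batch_count / ramp_batches, 0.0), 1.0)
    return start + frac * (end - start)

assert scheduled_dropout(0) == 0.3
assert abs(scheduled_dropout(10_000) - 0.2) < 1e-9
assert abs(scheduled_dropout(30_000) - 0.1) < 1e-9
```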
Daniel Povey
63e881f89b Pass in dropout from train.py 2022-12-05 23:49:40 +08:00
Daniel Povey
0da228c587 Restore the computation of valid stats. 2022-12-05 19:50:25 +08:00
Daniel Povey
7999dd0dbe Introduce scalar multiplication and change rules for updating gradient scale. 2022-12-05 16:15:20 +08:00
Daniel Povey
12e8c3f0fa One more layer on input 2022-11-29 16:47:24 +08:00
Daniel Povey
87ef4078d3 Add two more layers. 2022-11-28 13:56:40 +08:00
Daniel Povey
f483f1e0ef Implement attention weights sharing for successive layers, for Zipformer 2022-11-28 13:41:11 +08:00
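Attention-weight sharing, as in the commit above, means a layer can reuse the attention weights computed by the layer before it rather than recomputing them, while still applying its own value projection. A toy sketch of the idea (module name, shapes, and single-head form are assumptions for illustration, not the Zipformer code):

```python
import torch
import torch.nn as nn

class SharedAttention(nn.Module):
    """Toy self-attention that either computes attention weights or reuses
    weights passed in from a previous layer. Illustrative only."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x, attn_weights=None):
        if attn_weights is None:
            scores = self.q(x) @ self.k(x).transpose(-2, -1) / x.shape[-1] ** 0.5
            attn_weights = scores.softmax(dim=-1)
        # The values stay layer-specific; only the weights are shared.
        return attn_weights @ self.v(x), attn_weights

x = torch.randn(2, 10, 64)           # (batch, time, dim)
layer1, layer2 = SharedAttention(64), SharedAttention(64)
y1, w = layer1(x)                    # layer 1 computes the weights
y2, _ = layer2(y1, attn_weights=w)   # layer 2 reuses them
```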
Daniel Povey
a6fb9772a8 Remove 4 layers. 2022-11-27 13:29:29 +08:00
Daniel Povey
f71b1d2c3a Add 4 more layers 2022-11-26 21:18:24 +08:00
Daniel Povey
320c58401f Increase 2 feedforward dims from 1.5k to 2k. 2022-11-26 19:45:41 +08:00
Daniel Povey
1d0252d420 Merge branch 'scaled_adam_exp466' into scaled_adam_exp472. 2022-11-23 14:41:09 +08:00
  Below is a more complete list of the changes I am making, although some of
  these may already be counted in the last merge.

  The numbers XXX below correspond to branches numbered scaled_adam_expXXX.
    - From 412/413 (cherry-picked): dropout for attention in the attention_squeeze and nonlin_attention modules,
      but simplified a little to use the same dropout schedule and drop them out all together;
      also have all 3 submodules use separate heads.
    - From 460->461, which is in the history of 464: revert the part about balancing the output of the attention_squeeze module.
    - Merge from 462->467: use TanSwish instead of tanh.
    - Merge 462->465: remove whitening in the self-attention module.
    - Merge the part of 465->466 that was about diagnostics (name in the Whiten module).
Daniel Povey
6c5763fbb3 Implement subtracted momentum [0.33,0.66], and print name in Whiten module. 2022-11-22 21:57:48 +08:00
Daniel Povey
99cd9f5788 Add more layers. 2022-11-22 19:48:42 +08:00
Daniel Povey
211e3af680 Remove changes in previous merge commit that did not relate to length_factor. 2022-11-21 14:32:05 +08:00
Daniel Povey
a52ec3da28 Change feedforward dims: increase 1536->1792 for largest ff dim and move it one step later. 2022-11-20 14:24:41 +08:00
Daniel Povey
8a095c1cd1 Add SmallConvModule; decrease feedforward dims to keep about same num params. 2022-11-18 12:46:40 +08:00
Daniel Povey
e9806950f5 Reduce pos-dim from 96 to 48. 2022-11-17 23:42:39 +08:00
Daniel Povey
27f8497fea Reduce pos_dim from 128 to 96. 2022-11-17 10:39:36 +08:00
Daniel Povey
526b5e59a6 Increase pos-head-dim from 2 to 4. 2022-11-16 11:53:55 +08:00
Daniel Povey
fc74ff63fb Remove one feedforward module and give params to the other 2. 2022-11-16 11:46:05 +08:00