225 Commits

Fangjun Kuang
50d2281524
Add modified transducer loss for AIShell dataset (#219)
* Add modified transducer for aishell.

* Minor fixes.

* Add extra data in transducer training.

The extra data is from http://www.openslr.org/62/

* Update export.py and pretrained.py

* Update CI to install pretrained models with aishell.

* Update results.

* Update results.

* Update README.

* Use symlinks to avoid copies.
2022-03-02 16:02:38 +08:00
Fangjun Kuang
05cb297858
Update results for full libri + GigaSpeech using transducer_stateless. (#231) 2022-03-01 17:01:46 +08:00
Fangjun Kuang
72f838dee1
Update results for transducer_stateless after training for more epochs. (#207) 2022-03-01 16:35:02 +08:00
Daniel Povey
2ff520c800 Improvements to diagnostics (RE those with 1 dim) 2022-02-28 12:22:27 +08:00
Daniel Povey
c1063def95 First version of rand-combine iterated-training-like idea. 2022-02-27 17:34:58 +08:00
Daniel Povey
63d8d935d4 Refactor/simplify ConformerEncoder 2022-02-27 13:56:15 +08:00
Daniel Povey
581786a6d3 Adding diagnostics code... 2022-02-27 13:44:43 +08:00
Fangjun Kuang
2332ba312d
Begin to use multiple datasets in training (#213)
* Begin to use multiple datasets.

* Finish preparing training datasets.

* Minor fixes

* Copy files.

* Finish training code.

* Display losses for GigaSpeech and LibriSpeech separately.

* Fix decode.py

* Make the probability of selecting a batch from GigaSpeech configurable.

* Update results.

* Minor fixes.
2022-02-21 15:27:27 +08:00
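
The PR above interleaves LibriSpeech and GigaSpeech batches during training, with a configurable probability of drawing from GigaSpeech and separately tracked losses. A minimal sketch of that selection logic, assuming two batch iterators and a `compute_loss` callable (all names hypothetical, not icefall's actual interface):

```python
import random
from typing import Callable, Iterator

def train_one_epoch(
    libri_iter: Iterator,
    giga_iter: Iterator,
    compute_loss: Callable,
    giga_prob: float = 0.5,
) -> dict:
    """Draw each batch from GigaSpeech with probability ``giga_prob``,
    otherwise from LibriSpeech, keeping the two losses separate."""
    totals = {"libri": 0.0, "giga": 0.0}
    while True:
        source = "giga" if random.random() < giga_prob else "libri"
        try:
            batch = next(giga_iter if source == "giga" else libri_iter)
        except StopIteration:
            break  # end the epoch once either corpus is exhausted
        loss = compute_loss(batch)
        loss.backward()
        totals[source] += loss.item()  # per-corpus logging, as in the PR
    return totals
```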
Fangjun Kuang
1c35ae1dba
Reset seed at the beginning of each epoch. (#221)
* Reset seed at the beginning of each epoch.

* Use a different seed for each epoch.
2022-02-21 15:16:39 +08:00
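
A sketch of the seeding scheme from #221, using lhotse's `fix_random_seed` helper; `base_seed` and the epoch count here are placeholders:

```python
from lhotse.utils import fix_random_seed

base_seed = 42  # value hypothetical
for epoch in range(30):
    # Re-seeding at the top of each epoch makes the data shuffling of
    # epoch N reproducible when resuming from an epoch-N checkpoint,
    # while adding the epoch number gives every epoch a distinct seed.
    fix_random_seed(base_seed + epoch)
    # ... run training for this epoch ...
```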
Wei Kang
b702281e90
Use k2 pruned transducer loss to train conformer-transducer model (#194)
* Use the k2 pruned transducer loss to train the model

* Fix style

* Minor fixes
2022-02-17 13:33:54 +08:00
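
The pruned loss in #194 is computed in two passes: a cheap loss with a trivial additive joiner, whose gradients bound the pruning, then the full joiner evaluated only on the kept (t, u) lattice cells. A sketch against k2's API; tensor names are placeholders, argument lists may differ slightly across k2 versions, and icefall additionally uses separate linear projections for the two passes, omitted here:

```python
import k2

def pruned_rnnt_loss(encoder_out, decoder_out, joiner, symbols, boundary,
                     blank_id: int = 0, s_range: int = 5):
    # Pass 1: "simple" loss with a trivial additive joiner; gradients
    # w.r.t. px/py rank the (t, u) lattice cells for pruning.
    simple_loss, (px_grad, py_grad) = k2.rnnt_loss_smoothed(
        lm=decoder_out,          # (B, U+1, V)
        am=encoder_out,          # (B, T, V)
        symbols=symbols,         # (B, U)
        termination_symbol=blank_id,
        boundary=boundary,       # (B, 4): [0, 0, U, T] per utterance
        return_grad=True,
    )
    # Keep only s_range symbol positions per frame.
    ranges = k2.get_rnnt_prune_ranges(
        px_grad=px_grad, py_grad=py_grad, boundary=boundary, s_range=s_range,
    )
    am_pruned, lm_pruned = k2.do_rnnt_pruning(
        am=encoder_out, lm=decoder_out, ranges=ranges
    )
    # Pass 2: the real joiner, evaluated only on the kept cells.
    logits = joiner(am_pruned, lm_pruned)
    pruned_loss = k2.rnnt_loss_pruned(
        logits=logits, symbols=symbols, ranges=ranges,
        termination_symbol=blank_id, boundary=boundary,
    )
    return simple_loss, pruned_loss
```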
Daniel Povey
2af1b3af98 Remove ReLU in attention 2022-02-14 19:39:19 +08:00
Daniel Povey
d187ad8b73 Change max_frames from 0.2 to 0.15 2022-02-11 16:24:17 +08:00
Daniel Povey
4cd2c02fff Fix num_time_masks code; revert 0.8 to 0.9 2022-02-10 15:53:11 +08:00
Daniel Povey
c170c53006 Change p=0.9 to p=0.8 in SpecAug 2022-02-10 14:59:14 +08:00
Daniel Povey
8aa50df4f0 Change p=0.5->0.9, mask_fraction 0.3->0.2 2022-02-09 22:52:53 +08:00
Wang, Guanbo
70a3c56a18
Fix librispeech train.py (#211)
* fix librispeech train.py

* remove note
2022-02-09 16:42:28 +08:00
Daniel Povey
dd19a6a2b1 Fix the num_feature_masks bug I introduced; reduce max_frames_mask_fraction 0.4->0.3 2022-02-09 12:02:19 +08:00
Daniel Povey
bd36216e8c Use much more aggressive SpecAug setup 2022-02-08 21:55:20 +08:00
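
The run of SpecAug commits above is tuning lhotse's SpecAugment transform. A sketch showing where those knobs live; the values echo the commit messages, not a verified final configuration:

```python
from lhotse.dataset import SpecAugment

spec_augment = SpecAugment(
    time_warp_factor=80,
    num_feature_masks=2,            # the num_feature_masks bug fixed above
    features_mask_size=27,
    num_frame_masks=10,             # more masks = more aggressive setup
    frames_mask_size=100,
    max_frames_mask_fraction=0.15,  # the 0.4 -> 0.3 -> 0.2 -> 0.15 knob
    p=0.9,                          # the 0.5 -> 0.9 -> 0.8 -> 0.9 knob
)
```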
Daniel Povey
beaf5bfbab Merge specaug change from Mingshuang. 2022-02-08 19:42:23 +08:00
Daniel Povey
395065eb11 Merge branch 'spec-augment-change' of https://github.com/luomingshuang/icefall into attention_relu_specaug 2022-02-08 19:40:33 +08:00
Mingshuang Luo
3323cabf46 Experiments based on SpecAugment change 2022-02-08 14:25:31 +08:00
Fangjun Kuang
27fa5f05d3
Update git SHA-1 in RESULTS.md for transducer_stateless. (#202) 2022-02-07 18:45:45 +08:00
Fangjun Kuang
a8150021e0
Use modified transducer loss in training. (#179)
* Use modified transducer loss in training.

* Minor fix.

* Add modified beam search.

* Add modified beam search.

* Minor fixes.

* Fix typo.

* Update RESULTS.

* Fix a typo.

* Minor fixes.
2022-02-07 18:37:36 +08:00
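
The "modified" transducer loss restricts the topology to emitting at most one symbol per frame, which optimized_transducer exposes as a flag. A hedged sketch of the call; the tensor names are placeholders, and optimized_transducer expects the joiner logits packed into a single 2-D tensor:

```python
import optimized_transducer

# logits: joiner outputs packed into shape
# (sum over the batch of T_i * (U_i + 1), vocab_size).
loss = optimized_transducer.transducer_loss(
    logits=logits,
    targets=targets,                # (N, max_U), int32 token IDs
    logit_lengths=logit_lengths,    # (N,), frames per utterance
    target_lengths=target_lengths,  # (N,), symbols per utterance
    blank=blank_id,
    from_log_softmax=False,
    one_sym_per_frame=True,  # "modified": at most one symbol per frame
)
```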
Daniel Povey
a859dcb205 Remove learnable offset, use relu instead. 2022-02-07 12:14:48 +08:00
Wei Kang
35ecd7e562
Fix torch.nn.Embedding error for torch below 1.8.0 (#198) 2022-02-06 21:59:54 +08:00
Daniel Povey
48a764eccf Add min in q,k,v of attention 2022-02-06 21:19:37 +08:00
Daniel Povey
8f8ec223a7 Changes to fbank computation, use lilcom chunky writer 2022-02-06 21:18:40 +08:00
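
For the fbank/lilcom commit above, a sketch of feature extraction with lhotse's chunked lilcom storage; the paths and job count are placeholders:

```python
from lhotse import CutSet, Fbank, FbankConfig, LilcomChunkyWriter

cuts = CutSet.from_file("cuts_train.jsonl.gz")  # path hypothetical
cuts = cuts.compute_and_store_features(
    extractor=Fbank(FbankConfig(num_mel_bins=80)),
    storage_path="data/fbank/feats_train",
    # One large chunked lilcom archive instead of many small files.
    storage_type=LilcomChunkyWriter,
    num_jobs=15,
)
```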
pkufool
fcd25bdfff Fix torch.nn.Embedding error for torch below 1.8.0 2022-02-06 18:22:56 +08:00
Wei Kang
5ae80dfca7
Minor fixes (#193) 2022-01-27 18:01:17 +08:00
Piotr Żelasko
1731cc37bb Black 2022-01-24 10:20:22 -05:00
Piotr Żelasko
f92c24a73a
Merge branch 'master' into feature/libri-conformer-phone-ctc 2022-01-24 10:18:56 -05:00
Piotr Żelasko
565c1d8413 Address code review 2022-01-24 10:17:47 -05:00
Piotr Żelasko
1d5fe8afa4 flake8 2022-01-21 17:27:02 -05:00
Piotr Żelasko
f0f35e6671 black 2022-01-21 17:22:41 -05:00
Piotr Żelasko
f28951f2b6 Add an assertion 2022-01-21 17:16:49 -05:00
Piotr Żelasko
3d109b121d Remove train_phones.py and modify train.py instead 2022-01-21 17:08:53 -05:00
Fangjun Kuang
d6050eb02e Fix calling optimized_transducer after new release. (#182) 2022-01-21 08:18:50 +08:00
Fangjun Kuang
f94ff19bfe
Refactor beam search and update results. (#177) 2022-01-18 16:40:19 +08:00
Fangjun Kuang
273e5fb2f3
Update git SHA1 for transducer_stateless model. (#174) 2022-01-10 11:58:17 +08:00
Fangjun Kuang
4c1b3665ee
Use optimized_transducer to compute transducer loss. (#162)
* WIP: Use optimized_transducer to compute transducer loss.

* Minor fixes.

* Fix decoding.

* Fix decoding.

* Add RESULTS.

* Update RESULTS.

* Update CI.

* Fix sampling rate for yesno recipe.
2022-01-10 11:54:58 +08:00
Fangjun Kuang
413b2e8569
Add git sha1 to RESULTS.md for conformer encoder + stateless decoder. (#160) 2021-12-28 12:04:01 +08:00
Fangjun Kuang
14c93add50
Remove batchnorm, weight decay, and SOS from transducer conformer encoder (#155)
* Remove batchnorm, weight decay, and SOS.

* Make --context-size configurable.

* Update results.
2021-12-27 16:01:10 +08:00
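
The "stateless" decoder these transducer commits keep referring to is just an embedding over the last few output symbols, with `--context-size` controlling how many. A simplified sketch consistent with that description; icefall's actual Decoder differs in details such as grouped convolutions:

```python
import torch
import torch.nn as nn

class StatelessDecoder(nn.Module):
    """Replaces the recurrent predictor with an embedding of the last
    ``context_size`` symbols (context_size=1 ~ a bigram over labels)."""

    def __init__(self, vocab_size: int, embed_dim: int, context_size: int):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.context_size = context_size
        if context_size > 1:
            # Mix the last context_size embeddings with a causal 1-D conv.
            self.conv = nn.Conv1d(embed_dim, embed_dim,
                                  kernel_size=context_size)

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # y: (N, U) previous symbols; returns (N, U, embed_dim)
        emb = self.embedding(y)
        if self.context_size > 1:
            x = emb.permute(0, 2, 1)                     # (N, C, U)
            x = nn.functional.pad(x, (self.context_size - 1, 0))
            emb = self.conv(x).permute(0, 2, 1)
        return torch.relu(emb)
```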
Fangjun Kuang
8187d6236c
Minor fix to maximum number of symbols per frame for RNN-T decoding. (#157)
* Minor fix to maximum number of symbols per frame for RNN-T decoding.

* Minor fixes.
2021-12-24 21:48:40 +08:00
Fangjun Kuang
5b6699a835
Minor fixes to the RNN-T Conformer model (#152)
* Disable weight decay.

* Remove input feature batchnorm.

* Replace BatchNorm in the Conformer model with LayerNorm.

* Use tanh in the joint network.

* Remove sos ID.

* Reduce the number of decoder layers from 4 to 2.

* Minor fixes.

* Fix typos.
2021-12-23 13:54:25 +08:00
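
One fix in #152 replaces the joiner nonlinearity with tanh. A sketch of an additive RNN-T joint network in that style; dimensions are illustrative:

```python
import torch
import torch.nn as nn

class Joiner(nn.Module):
    def __init__(self, input_dim: int, vocab_size: int):
        super().__init__()
        self.output_linear = nn.Linear(input_dim, vocab_size)

    def forward(self, encoder_out: torch.Tensor,
                decoder_out: torch.Tensor) -> torch.Tensor:
        # encoder_out: (N, T, C), decoder_out: (N, U, C)
        logit = encoder_out.unsqueeze(2) + decoder_out.unsqueeze(1)  # (N, T, U, C)
        logit = torch.tanh(logit)  # tanh instead of the earlier ReLU
        return self.output_linear(logit)  # (N, T, U, vocab_size)
```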
Fangjun Kuang
fb6a57e9e0
Increase the size of the context in the RNN-T decoder. (#153) 2021-12-23 07:55:02 +08:00
Fangjun Kuang
cb04c8a750
Limit the number of symbols per frame in RNN-T decoding. (#151) 2021-12-18 11:00:42 +08:00
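
A sketch of the cap added in #151: greedy search where at most `max_sym_per_frame` non-blank symbols may be emitted before the time index must advance, which prevents the decoder from looping on one frame. The model interface here is assumed, not icefall's exact one:

```python
import torch

def greedy_search(model, encoder_out: torch.Tensor,
                  max_sym_per_frame: int = 3) -> list:
    """Greedy RNN-T search over one utterance (encoder_out: (T, C)).
    ``model`` is assumed to expose blank_id, context_size, a decoder
    mapping the last context_size labels to a (1, 1, C) state, and a
    joiner returning vocab logits."""
    blank_id, ctx = model.blank_id, model.context_size
    hyp = [blank_id] * ctx
    t, sym_per_frame = 0, 0
    while t < encoder_out.size(0):
        dec_in = torch.tensor([hyp[-ctx:]], device=encoder_out.device)
        dec_out = model.decoder(dec_in)                        # (1, 1, C)
        logits = model.joiner(encoder_out[t:t + 1].unsqueeze(0), dec_out)
        y = logits.argmax().item()
        if y != blank_id and sym_per_frame < max_sym_per_frame:
            hyp.append(y)          # emit a symbol, stay on this frame
            sym_per_frame += 1
        else:
            t += 1                 # blank, or cap reached: advance time
            sym_per_frame = 0
    return hyp[ctx:]
```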
Fangjun Kuang
1d44da845b
RNN-T Conformer training for LibriSpeech (#143)
* Begin to add RNN-T training for librispeech.

* Copy files from conformer_ctc.

Will edit them.

* Use conformer/transformer model as encoder.

* Begin to add training script.

* Add training code.

* Remove long utterances to avoid OOM when a large max_duration is used.

* Begin to add decoding script.

* Add decoding script.

* Minor fixes.

* Add beam search.

* Use LSTM layers for the encoder.

Needs more tuning.

* Use stateless decoder.

* Minor fixes to make it ready for merge.

* Fix README.

* Update RESULTS.md to include RNN-T Conformer.

* Minor fixes.

* Fix tests.

* Minor fixes.

* Minor fixes.

* Fix tests.
2021-12-18 07:42:51 +08:00
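
One practical detail from this PR: long utterances are dropped before training so that a large --max-duration does not trigger OOM. With a lhotse CutSet this is a one-line filter; the 20-second threshold is illustrative:

```python
# Drop utterances longer than ~20 s before building the sampler.
cuts = cuts.filter(lambda c: c.duration <= 20.0)
```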
Wei Kang
a183d5bfd7
Remove batchnorm (#147)
* Remove batch normalization

* Minor fixes

* Fix typo

* Fix comments

* Add assertion for use_feat_batchnorm
2021-12-14 08:20:03 +08:00
Fangjun Kuang
1aff64b708
Apply layer normalization to the output of each gate in LSTM/GRU. (#139)
* Apply layer normalization to the output of each gate in LSTM.

* Apply layer normalization to the output of each gate in GRU.

* Add projection support to LayerNormLSTMCell.

* Add GPU tests.

* Use typeguard.check_argument_types() to validate type annotations.

* Add typeguard as a requirement.

* Minor fixes.

* Fix CI.

* Fix CI.

* Fix test failures for torch 1.8.0

* Fix errors.
2021-12-07 18:38:03 +08:00
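
A minimal single-cell sketch of the idea in #139: LayerNorm applied per gate inside the LSTM recurrence. This is not the PR's LayerNormLSTMCell (which also supports projection), and whether the norm sits before or after the gate nonlinearity is a design choice; this sketch normalizes the pre-activations:

```python
import torch
import torch.nn as nn

class LayerNormLSTMCell(nn.Module):
    """LSTM cell with a separate LayerNorm for each of the four gates."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.ih = nn.Linear(input_size, 4 * hidden_size)
        self.hh = nn.Linear(hidden_size, 4 * hidden_size)
        # One LayerNorm per gate: input, forget, cell, output.
        self.norms = nn.ModuleList(
            nn.LayerNorm(hidden_size) for _ in range(4)
        )

    def forward(self, x, state):
        h, c = state
        gates = (self.ih(x) + self.hh(h)).chunk(4, dim=-1)
        i, f, g, o = (norm(gate) for norm, gate in zip(self.norms, gates))
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)
        c = f * c + i * g
        h = o * torch.tanh(c)
        return h, (h, c)
```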
Fangjun Kuang
ec591698b0
Associate a cut with token alignment (without repeats) (#125)
* WIP: Associate a cut with token alignment (without repeats)

* Save framewise alignments with/without repeats.

* Minor fixes.
2021-11-29 18:50:54 +08:00
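
Attaching a token alignment to a cut goes through its supervision in lhotse. A sketch using AlignmentItem; the path, symbols, and times are made up:

```python
from lhotse import CutSet
from lhotse.supervision import AlignmentItem

cuts = CutSet.from_file("cuts_train.jsonl.gz")  # path hypothetical
for cut in cuts:
    sup = cut.supervisions[0]
    # Each item is (symbol, start, duration); values illustrative.
    ali = [
        AlignmentItem(symbol="▁HELLO", start=0.30, duration=0.24),
        AlignmentItem(symbol="▁WORLD", start=0.54, duration=0.30),
    ]
    cut.supervisions[0] = sup.with_alignment(kind="symbol", alignment=ali)
```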