201 Commits

Author SHA1 Message Date
Fangjun Kuang
cbf8c18ebd
Minor fixes for aishell (#218)
* Minor fixes to aishell.

* Minor fixes.
2022-02-19 22:28:19 +08:00
Wei Kang
b702281e90
Use k2 pruned transducer loss to train conformer-transducer model (#194)
* Using k2 pruned version transducer loss to train model

* Fix style

* Minor fixes
2022-02-17 13:33:54 +08:00
Wang, Guanbo
e8eb408760
Incremental pruning threshold (#214)
* Incremental pruning threshold

* flake8

* black

* minor fix
2022-02-16 16:59:27 +08:00
Wang, Guanbo
be1c86b06c
print num_frame as %.2f (#204) 2022-02-08 14:56:58 +08:00
Piotr Żelasko
f92c24a73a
Merge branch 'master' into feature/libri-conformer-phone-ctc 2022-01-24 10:18:56 -05:00
Piotr Żelasko
f0f35e6671 black 2022-01-21 17:22:41 -05:00
Piotr Żelasko
3d109b121d Remove train_phones.py and modify train.py instead 2022-01-21 17:08:53 -05:00
huangruizhe
298faabb90
minor fixes 2022-01-02 23:38:33 -08:00
huangruizhe
7577b08bed
fixed the mistake 2022-01-02 23:32:43 -08:00
huangruizhe
82c8fac6ee
fixed a case where computing the BOW could raise ZeroDivisionError 2022-01-02 15:29:50 -08:00
huangruizhe
0a67015d63
Update make_kn_lm.py 2022-01-02 00:27:27 -08:00
huangruizhe
49aab7e658
Update make_kn_lm.py
Fixed issue #163
2022-01-02 00:14:27 -08:00
Fangjun Kuang
95af039733
RNN-T training for yesno. (#141)
* RNN-T training for yesno.

* Rename Jointer to Joiner.
2021-12-07 21:44:37 +08:00
Fangjun Kuang
ec591698b0
Associate a cut with token alignment (without repeats) (#125)
* WIP: Associate a cut with token alignment (without repeats)

* Save framewise alignments with/without repeats.

* Minor fixes.
2021-11-29 18:50:54 +08:00
Fangjun Kuang
0e541f5b5d
Print hostname and IP address to the log. (#131)
We are using multiple machines to do various experiments. It makes
life easier to know which experiment is running on which machine
if we also log the IP and hostname of the machine.
2021-11-26 11:25:59 +08:00
Piotr Żelasko
8eb94fa4a0 CTC-only phone conformer recipe for LibriSpeech 2021-11-23 15:34:46 -05:00
Wei Kang
4151cca147
Add torch script support for Aishell and update documents (#124)
* Add aishell recipe

* Remove unnecessary code and update docs

* adapt to k2 v1.7, add docs and results

* Update conformer ctc model

* Update docs, pretrained.py & results

* Fix code style

* Fix code style

* Fix code style

* Minor fix

* Minor fix

* Fix pretrained.py

* Update pretrained model & corresponding docs

* Export torch script model for Aishell

* Add C++ deployment docs

* Minor fixes

* Fix unit test

* Update Readme
2021-11-19 16:37:05 +08:00
Wei Kang
30c43b7f69
Add aishell recipe (#30)
* Add aishell recipe

* Remove unnecessary code and update docs

* adapt to k2 v1.7, add docs and results

* Update conformer ctc model

* Update docs, pretrained.py & results

* Fix code style

* Fix code style

* Fix code style

* Minor fix

* Minor fix

* Fix pretrained.py

* Update pretrained model & corresponding docs
2021-11-18 10:00:47 +08:00
Fangjun Kuang
5b10310bd1 Handle empty lattices in attention decoder rescoring. 2021-11-11 15:42:30 +08:00
Fangjun Kuang
8d679c3e74
Fix typos. (#115) 2021-11-10 14:45:30 +08:00
Fangjun Kuang
21096e99d8
Update result for the librispeech recipe using vocab size 500 and att rate 0.8 (#113)
* Update RESULTS using vocab size 500, att rate 0.8

* Update README.

* Refactoring.

Since FSAs in an Nbest object are linear in structure, we can
add the scores of a path to compute the total scores.

* Update documentation.

* Change default vocab size from 5000 to 500.
2021-11-10 14:32:52 +08:00
Fangjun Kuang
04029871b6
Fix a bug in Nbest.compute_am_scores and Nbest.compute_lm_scores. (#111) 2021-11-09 13:44:51 +08:00
Fangjun Kuang
91cfecebf2
Remove duplicated token seq in rescoring. (#108)
* Remove duplicated token seq in rescoring.

* Use a larger range for ngram_lm_scale and attention_scale
2021-11-06 08:54:45 +08:00
Fangjun Kuang
12d647d899
Add a note about the CUDA OOM error. (#94)
* Add a note about the CUDA OOM error.

Some users consider this kind of OOM as an error during decoding,
but actually it is not. This pull request clarifies that.

* Fix style issues.
2021-10-29 12:17:56 +08:00
Fangjun Kuang
8cb7f712e4
Use GPU for averaging checkpoints if possible. (#84) 2021-10-26 17:10:04 +08:00
Fangjun Kuang
53b79fafa7
Add MMI training with word pieces as modelling unit. (#6)
* Fix an error in TDNN-LSTM training.

* WIP: Refactoring

* Refactor transformer.py

* Remove unused code.

* Minor fixes.

* Fix decoder padding mask.

* Add MMI training with word pieces.

* Remove unused files.

* Minor fixes.

* Refactoring.

* Minor fixes.

* Use pre-computed alignments in LF-MMI training.

* Minor fixes.

* Update decoding script.

* Add doc about how to check and use extracted alignments.

* Fix style issues.

* Fix typos.

* Fix style issues.

* Disable macOS tests for now.
2021-10-18 15:20:32 +08:00
Fangjun Kuang
4890e27b45
Extract framewise alignment information using CTC decoding (#39)
* Use new APIs with k2.RaggedTensor

* Fix style issues.

* Update the installation doc, saying it requires at least k2 v1.7

* Extract framewise alignment information using CTC decoding.

* Print environment information.

Print information about k2, lhotse, PyTorch, and icefall.

* Fix CI.

* Fix CI.

* Compute framewise alignment information of the LibriSpeech dataset.

* Update comments for the time to compute alignments of train-960.

* Preserve cut id in mix cut transformer.

* Minor fixes.

* Add doc about how to extract framewise alignments.
2021-10-18 14:24:33 +08:00
Mingshuang Luo
597c5efdb1
Use LossRecord to record and print the loss for the training process (#62)
* Update index.rst (AS->ASR)

* Update conformer_ctc.rst (pretraind->pretrained)

* Fix some spelling errors.

* Fix some spelling errors.

* Use LossRecord to record and print loss in the training process

* Change the name "LossRecord" to "MetricsTracker"
2021-10-12 15:58:03 +08:00
Fangjun Kuang
707d7017a7
Support pure ctc decoding requiring neither a lexicon nor an n-gram LM (#58)
* Rename lattice_score_scale to nbest_scale.

* Support pure CTC decoding requiring neither a lexicon nor an n-gram LM.

* Fix style issues.

* Fix a typo.

* Minor fixes.
2021-09-26 14:21:49 +08:00
Fangjun Kuang
455693aede
Fix hasattr of AttributeDict. (#52) 2021-09-22 16:37:20 +08:00
Fangjun Kuang
a80e58e15d
Refactor decode.py to make it more readable and more modular. (#44)
* Refactor decode.py to make it more readable and more modular.

* Fix an error.

Nbest.fsa should always have token IDs as labels and
word IDs as aux_labels.

* Add nbest decoding.

* Compute edit distance with k2.

* Refactor nbest-oracle.

* Add rescore with nbest lists.

* Add whole-lattice rescoring.

* Add rescoring with attention decoder.

* Refactoring.

* Fixes after refactoring.

* Fix a typo.

* Minor fixes.

* Replace [] with () for shapes.

* Use k2 v1.9

* Use Levenshtein graphs/alignment from k2 v1.9

* [doc] Require k2 >= v1.9

* Minor fixes.
2021-09-20 15:44:54 +08:00
Fangjun Kuang
cc77cb3459
Fix decode.py to remove the correct axis. (#50)
* Fix decode.py to remove the correct axis.

* Run GitHub actions manually.
2021-09-17 16:49:03 +08:00
Wei Kang
9a6e0489c8
update api for RaggedTensor (#45)
* Fix code style

* update k2 version in CI

* fix compile hlg
2021-09-14 16:39:56 +08:00
Fangjun Kuang
abadc71415
Use new APIs with k2.RaggedTensor (#38)
* Use new APIs with k2.RaggedTensor

* Fix style issues.

* Update the installation doc, saying it requires at least k2 v1.7

* Use k2 v1.7
2021-09-08 14:55:30 +08:00
Fangjun Kuang
1bd5dcc8ac
WIP: Add doc for the LibriSpeech recipe. (#24)
* WIP: Add doc for the LibriSpeech recipe.

* Add more doc for LibriSpeech recipe.

* Add more doc for the LibriSpeech recipe.

* More doc.
2021-08-24 20:28:32 +08:00
Fangjun Kuang
6c2c9b9d74
Add recipe for the yes_no dataset. (#16)
* Add recipe for the yes_no dataset.

* Refactoring: Remove unused code.

* Add Colab notebook for the yesno dataset.

* Add GitHub actions to run yesno.

* Fix a typo.

* Minor fixes.

* Train more epochs for GitHub actions.

* Minor fixes.

* Minor fixes.

* Fix style issues.
2021-08-23 11:36:29 +08:00
pkufool
19c4214958
Fix code style and add copyright. (#18)
* Fix style and add copyright

* Minor fix

* Remove duplicate lines

* Reformat conformer.py by black

* Reformat code style with black.

* Fix github workflows

* Fix lhotse installation

* Install icefall requirements

* Update k2 version, remove lhotse from test workflow
2021-08-23 10:43:59 +08:00
Fangjun Kuang
8469f9ae0a
Refactor asr_datamodule. (#15)
* WIP: Refactor asr_datamodule.

* Fixes after review.

* Minor fixes.
2021-08-21 09:53:46 +08:00
Fangjun Kuang
9d0cc9d829
Support computing nbest oracle WER. (#10)
* Support computing nbest oracle WER.

* Add scale to all nbest based decoding/rescoring methods.

* Add script to run pretrained models.

* Use torchaudio to extract features.

* Support decoding multiple files at the same time.

Also, use kaldifeat for feature extraction.

* Support decoding with LM rescoring and attention-decoder rescoring.

* Minor fixes.

* Replace scale with lattice-score-scale.

* Add usage example with a provided pretrained model.
2021-08-20 11:53:37 +08:00
pkufool
ef233486ae
The training script produces WER of 2.57% on librispeech test-clean (#13)
* Add grad_clip and weight-decay, small fix of dataloader and masking

* Add RESULTS.md
2021-08-20 10:08:08 +08:00
Fangjun Kuang
5a0b9bcb23
Refactoring (#4)
* Fix an error in TDNN-LSTM training.

* WIP: Refactoring

* Refactor transformer.py

* Remove unused code.

* Minor fixes.
2021-08-04 14:53:02 +08:00
Fangjun Kuang
398ed80d7a Minor fixes to support DDP training. 2021-07-31 15:26:57 +08:00
Fangjun Kuang
acc63a9172 WIP: Add BPE training code. 2021-07-29 20:23:52 +08:00
Fangjun Kuang
bd69e4be32 Use attention decoder for rescoring. 2021-07-28 12:22:09 +08:00
Fangjun Kuang
f65854cca5 Add BPE decoding results. 2021-07-27 17:38:47 +08:00
Fangjun Kuang
d3101fb005 Fix loading checkpoint in DDP training. 2021-07-26 08:08:14 +08:00
Fangjun Kuang
8055bf31a0 Support DDP training. 2021-07-25 21:40:09 +08:00
Fangjun Kuang
4a66712406 Add LM rescoring. 2021-07-25 18:21:26 +08:00
Fangjun Kuang
6f9fe5b906 Refactor decoding code. 2021-07-24 22:23:50 +08:00
Fangjun Kuang
f3542c7793 Add CTC training. 2021-07-24 17:13:20 +08:00