15 Commits

Author SHA1 Message Date
Fangjun Kuang
7f1c0e07b6
Remove onnx and onnxruntime from requirements.txt (#640)
* Remove onnx and onnxruntime from requirements.txt
2022-10-31 13:44:40 +08:00
ezerhouni
9b671e1c21
Add Shallow fusion in modified_beam_search (#630)
* Add utility for shallow fusion

* test batch size == 1 without shallow fusion

* Use shallow fusion for modified-beam-search

* Modified beam search with ngram rescoring

* Fix code according to review

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2022-10-21 16:44:56 +08:00
Fangjun Kuang
95af039733
RNN-T training for yesno. (#141)
* RNN-T training for yesno.

* Rename Jointer to Joiner.
2021-12-07 21:44:37 +08:00
Fangjun Kuang
ec591698b0
Associate a cut with token alignment (without repeats) (#125)
* WIP: Associate a cut with token alignment (without repeats)

* Save framewise alignments with/without repeats.

* Minor fixes.
2021-11-29 18:50:54 +08:00
Wei Kang
4151cca147
Add torch script support for Aishell and update documents (#124)
* Add aishell recipe

* Remove unnecessary code and update docs

* adapt to k2 v1.7, add docs and results

* Update conformer ctc model

* Update docs, pretrained.py & results

* Fix code style

* Fix code style

* Fix code style

* Minor fix

* Minor fix

* Fix pretrained.py

* Update pretrained model & corresponding docs

* Export torch script model for Aishell

* Add C++ deployment docs

* Minor fixes

* Fix unit test

* Update Readme
2021-11-19 16:37:05 +08:00
Fangjun Kuang
53b79fafa7
Add MMI training with word pieces as modelling unit. (#6)
* Fix an error in TDNN-LSTM training.

* WIP: Refactoring

* Refactor transformer.py

* Remove unused code.

* Minor fixes.

* Fix decoder padding mask.

* Add MMI training with word pieces.

* Remove unused files.

* Minor fixes.

* Refactoring.

* Minor fixes.

* Use pre-computed alignments in LF-MMI training.

* Minor fixes.

* Update decoding script.

* Add doc about how to check and use extracted alignments.

* Fix style issues.

* Fix typos.

* Fix style issues.

* Disable macOS tests for now.
2021-10-18 15:20:32 +08:00
Fangjun Kuang
4890e27b45
Extract framewise alignment information using CTC decoding (#39)
* Use new APIs with k2.RaggedTensor

* Fix style issues.

* Update the installation doc, saying it requires at least k2 v1.7

* Extract framewise alignment information using CTC decoding.

* Print environment information.

Print information about k2, lhotse, PyTorch, and icefall.

* Fix CI.

* Fix CI.

* Compute framewise alignment information of the LibriSpeech dataset.

* Update comments for the time to compute alignments of train-960.

* Preserve cut id in mix cut transformer.

* Minor fixes.

* Add doc about how to extract framewise alignments.
2021-10-18 14:24:33 +08:00
Fangjun Kuang
707d7017a7
Support pure CTC decoding requiring neither a lexicon nor an n-gram LM (#58)
* Rename lattice_score_scale to nbest_scale.

* Support pure CTC decoding requiring neither a lexicon nor an n-gram LM.

* Fix style issues.

* Fix a typo.

* Minor fixes.
2021-09-26 14:21:49 +08:00
Fangjun Kuang
455693aede
Fix hasattr of AttributeDict. (#52)
2021-09-22 16:37:20 +08:00
Fangjun Kuang
a80e58e15d
Refactor decode.py to make it more readable and more modular. (#44)
* Refactor decode.py to make it more readable and more modular.

* Fix an error.

Nbest.fsa should always have token IDs as labels and
word IDs as aux_labels.

* Add nbest decoding.

* Compute edit distance with k2.

* Refactor nbest-oracle.

* Add rescore with nbest lists.

* Add whole-lattice rescoring.

* Add rescoring with attention decoder.

* Refactoring.

* Fixes after refactoring.

* Fix a typo.

* Minor fixes.

* Replace [] with () for shapes.

* Use k2 v1.9

* Use Levenshtein graphs/alignment from k2 v1.9

* [doc] Require k2 >= v1.9

* Minor fixes.
2021-09-20 15:44:54 +08:00
Fangjun Kuang
abadc71415
Use new APIs with k2.RaggedTensor (#38)
* Use new APIs with k2.RaggedTensor

* Fix style issues.

* Update the installation doc, saying it requires at least k2 v1.7

* Use k2 v1.7
2021-09-08 14:55:30 +08:00
pkufool
19c4214958
Fix code style and add copyright. (#18)
* Fix style and add copyright

* Minor fix

* Remove duplicate lines

* Reformat conformer.py by black

* Reformat code style with black.

* Fix github workflows

* Fix lhotse installation

* Install icefall requirements

* Update k2 version, remove lhotse from test workflow
2021-08-23 10:43:59 +08:00
Fangjun Kuang
acc63a9172
WIP: Add BPE training code.
2021-07-29 20:23:52 +08:00
Fangjun Kuang
2e33e24348
Add CI test.
2021-07-24 17:47:41 +08:00
Fangjun Kuang
f3542c7793
Add CTC training.
2021-07-24 17:13:20 +08:00