Fangjun Kuang
b29e4bdd03
Fix style issues.
2021-11-17 12:24:35 +08:00
Fangjun Kuang
469b665a5a
Add files from Dan.
...
See https://github.com/k2-fsa/icefall/pull/54
2021-11-17 12:21:05 +08:00
Fangjun Kuang
1ea780f203
Minor fixes.
2021-11-16 14:24:01 +08:00
Fangjun Kuang
b8037cd529
Use eval() for the masked LM model in decoding.
2021-11-16 10:55:14 +08:00
Fangjun Kuang
7f5d9a1671
Fix an error.
2021-11-15 16:03:49 +08:00
Fangjun Kuang
878fb40a12
Fixes after review.
2021-11-15 15:44:25 +08:00
Fangjun Kuang
57b9c8868b
Fix incorrect doc.
2021-11-15 12:18:16 +08:00
Fangjun Kuang
e3d7f21372
Add more documentation.
2021-11-15 10:19:42 +08:00
Fangjun Kuang
d680b56c5c
Use correct path pairs to compute log-likelihood.
2021-11-15 10:01:16 +08:00
Fangjun Kuang
cdd539e55c
First version using conformer lm for rescoring (not tested)
2021-11-03 20:59:54 +08:00
Fangjun Kuang
1ac9bb3fd7
WIP: Begin to add decoding scripts.
2021-11-01 22:01:12 +08:00
Fangjun Kuang
19828cbf22
Add files from Dan.
...
See https://github.com/k2-fsa/icefall/pull/54
2021-11-01 21:58:43 +08:00
Fangjun Kuang
3441634f34
Finish preparing the inputs for conformer lm from an nbest object.
2021-11-01 21:34:22 +08:00
Fangjun Kuang
1b9e4f0fea
WIP: Decoding scripts using conformer LM.
2021-10-27 19:54:28 +08:00
Fangjun Kuang
8cb7f712e4
Use GPU for averaging checkpoints if possible. ( #84 )
2021-10-26 17:10:04 +08:00
Fangjun Kuang
712ead8207
Fix an error when attention decoder rescoring returns None. ( #90 )
2021-10-22 19:52:25 +08:00
Piotr Żelasko
902e0b238d
Merge pull request #82 from pzelasko/feature/find-pessimistic-batches
...
Find CUDA OOM batches before starting training
2021-10-19 11:26:13 -04:00
Piotr Żelasko
3cc99d2af2
make flake8 happy
2021-10-19 11:24:54 -04:00
cdxie
d30244e28f
add a docker file for some users ( #87 )
...
* add a docker file for some users
Ubuntu18.04-pytorch1.7.1-cuda11.0-cudnn8-python3.8
* add a file describing how to use the dockerfile
give some steps to use dockerfile
2021-10-19 13:00:59 +08:00
Piotr Żelasko
86f3e0ef37
Make flake8 happy
2021-10-18 09:54:40 -04:00
Piotr Żelasko
6fbd7a287c
Refactor OOM batch scanning into a local function
2021-10-18 09:53:04 -04:00
Piotr Żelasko
d509d58f30
Merge branch 'master' into feature/find-pessimistic-batches
2021-10-18 09:47:21 -04:00
Fangjun Kuang
3effcb4225
Fix typos. ( #85 )
2021-10-18 16:17:14 +08:00
Fangjun Kuang
53b79fafa7
Add MMI training with word pieces as modelling unit. ( #6 )
...
* Fix an error in TDNN-LSTM training.
* WIP: Refactoring
* Refactor transformer.py
* Remove unused code.
* Minor fixes.
* Fix decoder padding mask.
* Add MMI training with word pieces.
* Remove unused files.
* Minor fixes.
* Refactoring.
* Minor fixes.
* Use pre-computed alignments in LF-MMI training.
* Minor fixes.
* Update decoding script.
* Add doc about how to check and use extracted alignments.
* Fix style issues.
* Fix typos.
* Fix style issues.
* Disable macOS tests for now.
2021-10-18 15:20:32 +08:00
Fangjun Kuang
4890e27b45
Extract framewise alignment information using CTC decoding ( #39 )
...
* Use new APIs with k2.RaggedTensor
* Fix style issues.
* Update the installation doc, saying it requires at least k2 v1.7
* Extract framewise alignment information using CTC decoding.
* Print environment information.
Print information about k2, lhotse, PyTorch, and icefall.
* Fix CI.
* Fix CI.
* Compute framewise alignment information of the LibriSpeech dataset.
* Update comments for the time to compute alignments of train-960.
* Preserve cut id in mix cut transformer.
* Minor fixes.
* Add doc about how to extract framewise alignments.
2021-10-18 14:24:33 +08:00
Jan "yenda" Trmal
bd7c2f7645
fix conformer typo in docs ( #83 )
2021-10-16 07:46:17 +08:00
Piotr Żelasko
403d1744ff
Introduce backprop in finding OOM batches
2021-10-15 10:05:13 -04:00
Piotr Żelasko
060117a9ff
Reformatting
2021-10-14 21:40:14 -04:00
Piotr Żelasko
1c7c79f2fc
Find CUDA OOM batches before starting training
2021-10-14 21:28:11 -04:00
Fangjun Kuang
fee1f84b20
Test pre-trained model in CI ( #80 )
...
* Add CI to run pre-trained models.
* Minor fixes.
* Install kaldifeat
* Install a CPU version of PyTorch.
* Fix CI errors.
* Disable decoder layers in pretrained.py if it is not used.
* Clone pre-trained model from GitHub.
* Minor fixes.
* Minor fixes.
* Minor fixes.
2021-10-15 00:41:33 +08:00
Mingshuang Luo
5401ce199d
Update ctc-decoding on pretrained.py and conformer_ctc.rst ( #78 )
2021-10-14 23:29:06 +08:00
Fangjun Kuang
f2387fe523
Fix a bug introduced while supporting torch script. ( #79 )
2021-10-14 20:09:38 +08:00
Fangjun Kuang
5016ee3c95
Give an informative message when users provide an unsupported decoding method ( #77 )
2021-10-14 16:20:35 +08:00
Mingshuang Luo
39bc8cae94
Add ctc decoding to pretrained.py on conformer_ctc ( #75 )
...
* Add ctc-decoding to pretrained.py
* update pretrained.py and conformer_ctc.rst
* update ctc-decoding for pretrained.py on conformer_ctc
* Update pretrained.py
* fix the style issue
* Update conformer_ctc.rst
* Update the running logs
2021-10-13 12:20:16 +08:00
Mingshuang Luo
391432b356
Update train.py ("10"--->"params.log_interval") ( #76 )
...
* Update train.py
* Update train.py
* Update train.py
2021-10-12 21:30:31 +08:00
Mingshuang Luo
597c5efdb1
Use LossRecord to record and print the loss for the training process ( #62 )
...
* Update index.rst (AS->ASR)
* Update conformer_ctc.rst (pretraind->pretrained)
* Fix some spelling errors.
* Fix some spelling errors.
* Use LossRecord to record and print loss in the training process
* Change the name "LossRecord" to "MetricsTracker"
2021-10-12 15:58:03 +08:00
Fangjun Kuang
beb54ddb61
Support torch script. ( #65 )
...
* WIP: Support torchscript.
* Minor fixes.
* Fix style issues.
* Add documentation about how to deploy a trained model.
2021-10-12 14:55:05 +08:00
Piotr Żelasko
d54828e73a
Merge pull request #73 from pzelasko/feature/bucketing-in-test
...
Use BucketingSampler for dev and test data
2021-10-09 10:58:29 -04:00
Piotr Żelasko
069ebaf9ba
Reformatting
2021-10-09 14:45:46 +00:00
Mingshuang Luo
6e43905d12
Update the documentation to include "ctc-decoding" ( #71 )
...
* Update conformer_ctc.rst
2021-10-09 11:56:25 +08:00
Piotr Żelasko
b682467e4d
Use BucketingSampler for dev and test data
2021-10-08 22:32:13 -04:00
Piotr Żelasko
adb068eb82
setup.py ( #64 )
2021-10-01 16:43:08 +08:00
Fangjun Kuang
707d7017a7
Support pure ctc decoding requiring neither a lexicon nor an n-gram LM ( #58 )
...
* Rename lattice_score_scale to nbest_scale.
* Support pure CTC decoding requiring neither a lexicon nor an n-gram LM.
* Fix style issues.
* Fix a typo.
* Minor fixes.
2021-09-26 14:21:49 +08:00
Fangjun Kuang
455693aede
Fix hasattr of AttributeDict. ( #52 )
2021-09-22 16:37:20 +08:00
Fangjun Kuang
a80e58e15d
Refactor decode.py to make it more readable and more modular. ( #44 )
...
* Refactor decode.py to make it more readable and more modular.
* Fix an error.
Nbest.fsa should always have token IDs as labels and
word IDs as aux_labels.
* Add nbest decoding.
* Compute edit distance with k2.
* Refactor nbest-oracle.
* Add rescore with nbest lists.
* Add whole-lattice rescoring.
* Add rescoring with attention decoder.
* Refactoring.
* Fixes after refactoring.
* Fix a typo.
* Minor fixes.
* Replace [] with () for shapes.
* Use k2 v1.9
* Use Levenshtein graphs/alignment from k2 v1.9
* [doc] Require k2 >= v1.9
* Minor fixes.
2021-09-20 15:44:54 +08:00
Fangjun Kuang
cc77cb3459
Fix decode.py to remove the correct axis. ( #50 )
...
* Fix decode.py to remove the correct axis.
* Run GitHub actions manually.
2021-09-17 16:49:03 +08:00
Wei Kang
9a6e0489c8
update api for RaggedTensor ( #45 )
...
* Fix code style
* update k2 version in CI
* fix compile hlg
2021-09-14 16:39:56 +08:00
Fangjun Kuang
a2be2896a9
Fix the link to k2's installation doc. ( #46 )
2021-09-14 13:39:52 +08:00
Wei Kang
24656e9749
Update docs and remove unnecessary arguments ( #42 )
...
* Fix typo in docs
* Update docs and remove unnecessary arguments
* Fix code style
2021-09-13 18:28:57 +08:00
Fangjun Kuang
f792b466bf
Change default value of lattice-score-scale from 1.0 to 0.5 ( #41 )
...
* Change the default value of lattice-score-scale from 1.0 to 0.5
* Fix CI.
2021-09-13 10:49:18 +08:00