icefall

Author	SHA1	Message	Date
Fangjun Kuang	b29e4bdd03	Fix style issues.	2021-11-17 12:24:35 +08:00
Fangjun Kuang	469b665a5a	Add files from Dan. See https://github.com/k2-fsa/icefall/pull/54	2021-11-17 12:21:05 +08:00
Fangjun Kuang	1ea780f203	Minor fixes.	2021-11-16 14:24:01 +08:00
Fangjun Kuang	b8037cd529	Use eval() for the masked LM model in decoding.	2021-11-16 10:55:14 +08:00
Fangjun Kuang	7f5d9a1671	Fix an error.	2021-11-15 16:03:49 +08:00
Fangjun Kuang	878fb40a12	Fixes after review.	2021-11-15 15:44:25 +08:00
Fangjun Kuang	57b9c8868b	Fix incorrect doc.	2021-11-15 12:18:16 +08:00
Fangjun Kuang	e3d7f21372	Add more documentation.	2021-11-15 10:19:42 +08:00
Fangjun Kuang	d680b56c5c	Use correct path pairs to compute log-likelihood.	2021-11-15 10:01:16 +08:00
Fangjun Kuang	cdd539e55c	First version using conformer lm for rescoring (not tested)	2021-11-03 20:59:54 +08:00
Fangjun Kuang	1ac9bb3fd7	WIP: Begin to add decoding scripts.	2021-11-01 22:01:12 +08:00
Fangjun Kuang	19828cbf22	Add files from Dan. See https://github.com/k2-fsa/icefall/pull/54	2021-11-01 21:58:43 +08:00
Fangjun Kuang	3441634f34	Finish preparing the inputs for conformer lm from an nbest object.	2021-11-01 21:34:22 +08:00
Fangjun Kuang	1b9e4f0fea	WIP: Decoding scripts using conformer LM.	2021-10-27 19:54:28 +08:00
Fangjun Kuang	8cb7f712e4	Use GPU for averaging checkpoints if possible. (#84)	2021-10-26 17:10:04 +08:00
Fangjun Kuang	712ead8207	Fix an error when attention decoder rescoring returns None. (#90)	2021-10-22 19:52:25 +08:00
Piotr Żelasko	902e0b238d	Merge pull request #82 from pzelasko/feature/find-pessimistic-batches Find CUDA OOM batches before starting training	2021-10-19 11:26:13 -04:00
Piotr Żelasko	3cc99d2af2	make flake8 happy	2021-10-19 11:24:54 -04:00
cdxie	d30244e28f	add a docker file for some users (#87) * add a docker file for some users Ubuntu18.04-pytorch1.7.1-cuda11.0-cudnn8-python3.8 * add a describing file of how to use dockerfile give some steps to use dockerfile	2021-10-19 13:00:59 +08:00
Piotr Żelasko	86f3e0ef37	Make flake8 happy	2021-10-18 09:54:40 -04:00
Piotr Żelasko	6fbd7a287c	Refactor OOM batch scanning into a local function	2021-10-18 09:53:04 -04:00
Piotr Żelasko	d509d58f30	Merge branch 'master' into feature/find-pessimistic-batches	2021-10-18 09:47:21 -04:00
Fangjun Kuang	3effcb4225	Fix typos. (#85)	2021-10-18 16:17:14 +08:00
Fangjun Kuang	53b79fafa7	Add MMI training with word pieces as modelling unit. (#6) * Fix an error in TDNN-LSTM training. * WIP: Refactoring * Refactor transformer.py * Remove unused code. * Minor fixes. * Fix decoder padding mask. * Add MMI training with word pieces. * Remove unused files. * Minor fixes. * Refactoring. * Minor fixes. * Use pre-computed alignments in LF-MMI training. * Minor fixes. * Update decoding script. * Add doc about how to check and use extracted alignments. * Fix style issues. * Fix typos. * Fix style issues. * Disable macOS tests for now.	2021-10-18 15:20:32 +08:00
Fangjun Kuang	4890e27b45	Extract framewise alignment information using CTC decoding (#39) * Use new APIs with k2.RaggedTensor * Fix style issues. * Update the installation doc, saying it requires at least k2 v1.7 * Extract framewise alignment information using CTC decoding. * Print environment information. Print information about k2, lhotse, PyTorch, and icefall. * Fix CI. * Fix CI. * Compute framewise alignment information of the LibriSpeech dataset. * Update comments for the time to compute alignments of train-960. * Preserve cut id in mix cut transformer. * Minor fixes. * Add doc about how to extract framewise alignments.	2021-10-18 14:24:33 +08:00
Jan "yenda" Trmal	bd7c2f7645	fix conformer typo in docs (#83)	2021-10-16 07:46:17 +08:00
Piotr Żelasko	403d1744ff	Introduce backprop in finding OOM batches	2021-10-15 10:05:13 -04:00
Piotr Żelasko	060117a9ff	Reformatting	2021-10-14 21:40:14 -04:00
Piotr Żelasko	1c7c79f2fc	Find CUDA OOM batches before starting training	2021-10-14 21:28:11 -04:00
Fangjun Kuang	fee1f84b20	Test pre-trained model in CI (#80) * Add CI to run pre-trained models. * Minor fixes. * Install kaldifeat * Install a CPU version of PyTorch. * Fix CI errors. * Disable decoder layers in pretrained.py if it is not used. * Clone pre-trained model from GitHub. * Minor fixes. * Minor fixes. * Minor fixes.	2021-10-15 00:41:33 +08:00
Mingshuang Luo	5401ce199d	Update ctc-decoding on pretrained.py and conformer_ctc.rst (#78)	2021-10-14 23:29:06 +08:00
Fangjun Kuang	f2387fe523	Fix a bug introduced while supporting torch script. (#79)	2021-10-14 20:09:38 +08:00
Fangjun Kuang	5016ee3c95	Give an informative message when users provide an unsupported decoding method (#77)	2021-10-14 16:20:35 +08:00
Mingshuang Luo	39bc8cae94	Add ctc decoding to pretrained.py on conformer_ctc (#75) * Add ctc-decoding to pretrained.py * update pretrained.py and conformer_ctc.rst * update ctc-decoding for pretrained.py on conformer_ctc * Update pretrained.py * fix the style issue * Update conformer_ctc.rst * Update the running logs	2021-10-13 12:20:16 +08:00
Mingshuang Luo	391432b356	Update train.py ("10"--->"params.log_interval") (#76) * Update train.py * Update train.py * Update train.py	2021-10-12 21:30:31 +08:00
Mingshuang Luo	597c5efdb1	Use LossRecord to record and print the loss for the training process (#62) * Update index.rst (AS->ASR) * Update conformer_ctc.rst (pretraind->pretrained) * Fix some spelling errors. * Fix some spelling errors. * Use LossRecord to record and print loss in the training process * Change the name "LossRecord" to "MetricsTracker"	2021-10-12 15:58:03 +08:00
Fangjun Kuang	beb54ddb61	Support torch script. (#65) * WIP: Support torchscript. * Minor fixes. * Fix style issues. * Add documentation about how to deploy a trained model.	2021-10-12 14:55:05 +08:00
Piotr Żelasko	d54828e73a	Merge pull request #73 from pzelasko/feature/bucketing-in-test Use BucketingSampler for dev and test data	2021-10-09 10:58:29 -04:00
Piotr Żelasko	069ebaf9ba	Reformatting	2021-10-09 14:45:46 +00:00
Mingshuang Luo	6e43905d12	Update the documentation to include "ctc-decoding" (#71) * Update conformer_ctc.rst	2021-10-09 11:56:25 +08:00
Piotr Żelasko	b682467e4d	Use BucketingSampler for dev and test data	2021-10-08 22:32:13 -04:00
Piotr Żelasko	adb068eb82	setup.py (#64)	2021-10-01 16:43:08 +08:00
Fangjun Kuang	707d7017a7	Support pure ctc decoding requiring neither a lexicon nor an n-gram LM (#58) * Rename lattice_score_scale to nbest_scale. * Support pure CTC decoding requiring neither a lexicon nor an n-gram LM. * Fix style issues. * Fix a typo. * Minor fixes.	2021-09-26 14:21:49 +08:00
Fangjun Kuang	455693aede	Fix `hasattr` of AttributeDict. (#52)	2021-09-22 16:37:20 +08:00
Fangjun Kuang	a80e58e15d	Refactor decode.py to make it more readable and more modular. (#44) * Refactor decode.py to make it more readable and more modular. * Fix an error. Nbest.fsa should always have token IDs as labels and word IDs as aux_labels. * Add nbest decoding. * Compute edit distance with k2. * Refactor nbest-oracle. * Add rescore with nbest lists. * Add whole-lattice rescoring. * Add rescoring with attention decoder. * Refactoring. * Fixes after refactoring. * Fix a typo. * Minor fixes. * Replace [] with () for shapes. * Use k2 v1.9 * Use Levenshtein graphs/alignment from k2 v1.9 * [doc] Require k2 >= v1.9 * Minor fixes.	2021-09-20 15:44:54 +08:00
Fangjun Kuang	cc77cb3459	Fix decode.py to remove the correct axis. (#50) * Fix decode.py to remove the correct axis. * Run GitHub actions manually.	2021-09-17 16:49:03 +08:00
Wei Kang	9a6e0489c8	update api for RaggedTensor (#45) * Fix code style * update k2 version in CI * fix compile hlg	2021-09-14 16:39:56 +08:00
Fangjun Kuang	a2be2896a9	Fix the link to k2's installation doc. (#46)	2021-09-14 13:39:52 +08:00
Wei Kang	24656e9749	Update docs and remove unnecessary arguments (#42) * Fix typo in docs * Update docs and remove unnecessary arguments * Fix code style	2021-09-13 18:28:57 +08:00
Fangjun Kuang	f792b466bf	Change default value of lattice-score-scale from 1.0 to 0.5 (#41) * Change the default value of lattice-score-scale from 1.0 to 0.5 * Fix CI.	2021-09-13 10:49:18 +08:00

99 Commits