* add pruned transducer stateless5 recipe for tal_csasr
* make some changes for merging
* change for conformer.py
* add wer and cer for Chinese and English respectively
* fix an error in conformer.py
* support streaming in conformer
* Add more documents
* support streaming on pruned_transducer_stateless2; add delay penalty; fixes for decode states
* Minor fixes
* streaming for pruned_transducer_stateless4
* Fix conv cache error, support async streaming decoding
* Fix style
* Fix style
* Fix style
* Add torch.jit.export
* mask the initial cache
* Cutting off invalid frames of encoder_embed output
* fix relative positional encoding in streaming decoding to save computation
* Minor fixes
* Minor fixes
* Minor fixes
* Minor fixes
* Minor fixes
* Fix jit export for torch 1.6
* Minor fixes for streaming decoding
* Minor fixes on decode stream
* move model parameters to train.py
* make states in forward streaming optional
* update pretrain to support streaming model
* update results.md
* update tensorboard and pretrained models
* fix typo
* Fix tests
* remove unused arguments
* add streaming decoding ci
* Minor fix
* Minor fix
* disable right context by default
* Add fast_beam_search_nbest.
* Fix CI errors.
* Fix CI errors.
* More fixes.
* Small fixes.
* Support using log_add in LG decoding with fast_beam_search.
* Support LG decoding in pruned_transducer_stateless
* Support LG for pruned_transducer_stateless2.
* Support LG for fast beam search.
* Minor fixes.
* Use jsonl for cutsets in the librispeech recipe.
* Use lazy cutset for all recipes.
* More fixes to use lazy CutSet.
* Remove force=True from logging to support Python < 3.8
* Minor fixes.
* Fix style issues.
* keep model_avg on cpu
* explicitly convert model_avg to cpu
* minor fix
* remove device conversion for model_avg
* modify usage of the model device in train.py
* change model.device to next(model.parameters()).device for decoding
* assert params.start_epoch>0
* assert params.start_epoch>0, params.start_epoch
* First upload of model averaging code.
* minor fix
* update decode file
* update .flake8
* rename pruned_transducer_stateless3 to pruned_transducer_stateless4
* make the epoch counter start from 1 instead of 0
* minor fix of pruned_transducer_stateless4/train.py
* refactor the checkpoint.py
* minor fix, update docs, and modify the epoch number to count from 1 in the pruned_transducer_stateless4/decode.py
* update author info
* add docs of the scaling in function average_checkpoints_with_averaged_model
* initial commit
* support download, data prep, and fbank
* on-the-fly feature extraction by default
* support BPE based lang
* support HLG for BPE
* small fix
* small fix
* chunked feature extraction by default
* Compute features for GigaSpeech by splitting the manifest.
* Fixes after review.
* Split manifests into 2000 pieces.
* set audio duration mismatch tolerance to 0.01
* small fix
* add conformer training recipe
* Add conformer.py without pre-commit checking
* lazy loading and use SingleCutSampler
* DynamicBucketingSampler
* use KaldifeatFbank to compute fbank for musan
* use pretrained language model and lexicon
* use 3gram to decode, 4gram to rescore
* Add decode.py
* Update .flake8
* Delete compute_fbank_gigaspeech.py
* Use BucketingSampler for valid and test dataloader
* Update params in train.py
* Use bpe_500
* update params in decode.py
* Decrease num_paths on CUDA OOM
* Added README
* Update RESULTS
* black
* Decrease num_paths on CUDA OOM
* Decode with post-processing
* Update results
* Remove lazy_load option
* Use default `storage_type`
* Keep the original tolerance
* Use split-lazy
* black
* Update pretrained model
Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
* update icefall/__init__.py to import more common functions.
* update icefall/__init__.py
* make imports style consistent.
* exclude black check for icefall/__init__.py in pyproject.toml.
* Adding diagnostics code...
* Move diagnostics code from local dir to the shared icefall dir
* Remove the diagnostics code in the local dir
* Update docs of arguments, and remove stats_types() function in TensorDiagnosticOptions object.
* Update docs of arguments.
* Add copyright information.
* Corrected the time in copyright information.
Co-authored-by: Daniel Povey <dpovey@gmail.com>
We are using multiple machines to do various experiments. It makes
life easier to know which experiment is running on which machine
if we also log the IP and hostname of the machine.
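
A minimal sketch of how the hostname and IP could be collected for logging,
using only the Python standard library. The function name `get_host_info`
and the dictionary keys are assumptions made for illustration; they are not
necessarily what this change uses.

```python
import logging
import socket


def get_host_info():
    """Return the hostname and IP address of the current machine.

    Hypothetical helper for illustration only.
    """
    hostname = socket.gethostname()
    try:
        # May fail if the hostname cannot be resolved on this machine.
        ip_address = socket.gethostbyname(hostname)
    except OSError:
        ip_address = ""
    return {"hostname": hostname, "IP address": ip_address}


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Log the host info once at startup so each experiment log
    # records which machine it ran on.
    logging.info(get_host_info())
```

Logging this once at the start of a run should be enough to tell from the
log file alone which machine an experiment ran on.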