Mirror of https://github.com/k2-fsa/icefall.git (synced 2025-09-18 21:44:18 +00:00)
update RESULTS.md
parent 7f94b86bb0
commit 1f5216236f
@ -1,5 +1,163 @@
## Results

### LibriSpeech BPE training results (Pruned Stateless Conv-Emformer RNN-T)

[conv_emformer_transducer_stateless](./conv_emformer_transducer_stateless)

It implements [Emformer](https://arxiv.org/abs/2010.10759) augmented with a convolution module for streaming ASR.
It is modified from [torchaudio](https://github.com/pytorch/audio).

See <https://github.com/k2-fsa/icefall/pull/389> for more details.

#### Training on full LibriSpeech

The WERs are:

|                                     | test-clean | test-other | comment             | decoding mode       |
|-------------------------------------|------------|------------|---------------------|---------------------|
| greedy search (max sym per frame 1) | 3.63       | 9.61       | --epoch 30 --avg 10 | simulated streaming |
| greedy search (max sym per frame 1) | 3.64       | 9.65       | --epoch 30 --avg 10 | streaming           |
| fast beam search                    | 3.61       | 9.4        | --epoch 30 --avg 10 | simulated streaming |
| fast beam search                    | 3.58       | 9.5        | --epoch 30 --avg 10 | streaming           |
| modified beam search                | 3.56       | 9.41       | --epoch 30 --avg 10 | simulated streaming |
| modified beam search                | 3.54       | 9.46       | --epoch 30 --avg 10 | streaming           |

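To compare the two decoding modes at a glance, the WERs above can be turned into relative differences. The following is only a small sketch: the numbers are copied from the table, while the helper names (`relative_change`, `compare`) are made up for illustration and are not part of icefall.

```python
# WERs (%) copied from the table above, as (test-clean, test-other).
wers = {
    "greedy search": {
        "simulated streaming": (3.63, 9.61),
        "streaming": (3.64, 9.65),
    },
    "fast beam search": {
        "simulated streaming": (3.61, 9.4),
        "streaming": (3.58, 9.5),
    },
    "modified beam search": {
        "simulated streaming": (3.56, 9.41),
        "streaming": (3.54, 9.46),
    },
}


def relative_change(simulated, streaming):
    """Relative WER change (%) from simulated streaming to true streaming."""
    return 100.0 * (streaming - simulated) / simulated


def compare(table):
    """Return (method, test-clean change, test-other change) per decoding method."""
    rows = []
    for method, modes in table.items():
        sim = modes["simulated streaming"]
        real = modes["streaming"]
        rows.append(
            (
                method,
                relative_change(sim[0], real[0]),
                relative_change(sim[1], real[1]),
            )
        )
    return rows


for method, clean, other in compare(wers):
    print(f"{method:22s} test-clean {clean:+.2f}%  test-other {other:+.2f}%")
```

A positive value means true streaming decoding is slightly worse than simulated streaming for that test set, a negative value means it is slightly better.
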
The training command is:

```bash
./conv_emformer_transducer_stateless/train.py \
  --world-size 6 \
  --num-epochs 30 \
  --start-epoch 1 \
  --exp-dir conv_emformer_transducer_stateless/exp \
  --full-libri 1 \
  --max-duration 300 \
  --master-port 12321 \
  --num-encoder-layers 12 \
  --chunk-length 32 \
  --cnn-module-kernel 31 \
  --left-context-length 32 \
  --right-context-length 8 \
  --memory-size 32
```

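The streaming-related flags in the command above can be translated into approximate time spans. This is a rough sketch under two assumptions that are not stated in this document: that `--chunk-length`, `--left-context-length`, and `--right-context-length` count feature frames before subsampling, and that the frame shift is 10 ms. The function name is hypothetical.

```python
FRAME_SHIFT_MS = 10  # assumption: 10 ms feature frame shift


def streaming_spans_ms(chunk_length, left_context_length, right_context_length,
                       frame_shift_ms=FRAME_SHIFT_MS):
    """Convert frame counts (assumed to be pre-subsampling) into milliseconds."""
    chunk_ms = chunk_length * frame_shift_ms
    lookahead_ms = right_context_length * frame_shift_ms
    return {
        "chunk_ms": chunk_ms,
        "left_context_ms": left_context_length * frame_shift_ms,
        "lookahead_ms": lookahead_ms,
        # A frame at the start of a chunk waits for the rest of the chunk
        # plus the right context before it can be processed.
        "max_algorithmic_delay_ms": chunk_ms + lookahead_ms,
    }


# Values taken from the training command above.
print(streaming_spans_ms(chunk_length=32, left_context_length=32,
                         right_context_length=8))
```

If either assumption does not hold for this recipe, only `FRAME_SHIFT_MS` or the interpretation of the frame counts needs to change; the arithmetic stays the same.
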
The tensorboard log can be found at
<https://tensorboard.dev/experiment/4em2FLsxRwGhmoCRQUEoDw/>

The simulated streaming decoding command using greedy search is:
```bash
./conv_emformer_transducer_stateless/decode.py \
  --epoch 30 \
  --avg 10 \
  --exp-dir conv_emformer_transducer_stateless/exp \
  --max-duration 300 \
  --num-encoder-layers 12 \
  --chunk-length 32 \
  --cnn-module-kernel 31 \
  --left-context-length 32 \
  --right-context-length 8 \
  --memory-size 32 \
  --decoding-method greedy_search \
  --use-averaged-model True
```

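The `--epoch 30 --avg 10` pair selects which checkpoints contribute to the decoded model. The sketch below illustrates one plausible selection rule and is an assumption, not something this document confirms: with `--use-averaged-model True`, two checkpoints (`epoch - avg` and `epoch`) that carry a running average are combined, covering the range from `epoch - avg` (excluded) to `epoch`; with `False`, the last `avg` epoch checkpoints are averaged directly. The function name and the `epoch-N.pt` file naming are illustrative.

```python
def checkpoints_to_load(epoch, avg, use_averaged_model):
    """Return the checkpoint file names needed under the assumed convention."""
    if avg < 1:
        raise ValueError("avg must be at least 1")
    if use_averaged_model:
        # Assumed: the average over (epoch - avg, epoch] is derived from the
        # running averages stored in the two boundary checkpoints.
        start = epoch - avg
        if start < 1:
            raise ValueError("epoch - avg must be at least 1")
        return [f"epoch-{start}.pt", f"epoch-{epoch}.pt"]
    # Assumed: plain averaging over the last `avg` epoch checkpoints.
    start = max(1, epoch - avg + 1)
    return [f"epoch-{i}.pt" for i in range(start, epoch + 1)]


print(checkpoints_to_load(30, 10, use_averaged_model=True))
print(checkpoints_to_load(30, 10, use_averaged_model=False))
```
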
The simulated streaming decoding command using fast beam search is:
```bash
./conv_emformer_transducer_stateless/decode.py \
  --epoch 30 \
  --avg 10 \
  --exp-dir conv_emformer_transducer_stateless/exp \
  --max-duration 300 \
  --num-encoder-layers 12 \
  --chunk-length 32 \
  --cnn-module-kernel 31 \
  --left-context-length 32 \
  --right-context-length 8 \
  --memory-size 32 \
  --decoding-method fast_beam_search \
  --use-averaged-model True \
  --beam 4 \
  --max-contexts 4 \
  --max-states 8
```

The simulated streaming decoding command using modified beam search is:
```bash
./conv_emformer_transducer_stateless/decode.py \
  --epoch 30 \
  --avg 10 \
  --exp-dir conv_emformer_transducer_stateless/exp \
  --max-duration 300 \
  --num-encoder-layers 12 \
  --chunk-length 32 \
  --cnn-module-kernel 31 \
  --left-context-length 32 \
  --right-context-length 8 \
  --memory-size 32 \
  --decoding-method modified_beam_search \
  --use-averaged-model True \
  --beam-size 4
```

The streaming decoding command using greedy search is:
```bash
./conv_emformer_transducer_stateless/streaming_decode.py \
  --epoch 30 \
  --avg 10 \
  --exp-dir conv_emformer_transducer_stateless/exp \
  --num-decode-streams 2000 \
  --num-encoder-layers 12 \
  --chunk-length 32 \
  --cnn-module-kernel 31 \
  --left-context-length 32 \
  --right-context-length 8 \
  --memory-size 32 \
  --decoding-method greedy_search \
  --use-averaged-model True
```

The streaming decoding command using fast beam search is:
```bash
./conv_emformer_transducer_stateless/streaming_decode.py \
  --epoch 30 \
  --avg 10 \
  --exp-dir conv_emformer_transducer_stateless/exp \
  --num-decode-streams 2000 \
  --num-encoder-layers 12 \
  --chunk-length 32 \
  --cnn-module-kernel 31 \
  --left-context-length 32 \
  --right-context-length 8 \
  --memory-size 32 \
  --decoding-method fast_beam_search \
  --use-averaged-model True \
  --beam 4 \
  --max-contexts 4 \
  --max-states 8
```

The streaming decoding command using modified beam search is:
```bash
./conv_emformer_transducer_stateless/streaming_decode.py \
  --epoch 30 \
  --avg 10 \
  --exp-dir conv_emformer_transducer_stateless/exp \
  --num-decode-streams 2000 \
  --num-encoder-layers 12 \
  --chunk-length 32 \
  --cnn-module-kernel 31 \
  --left-context-length 32 \
  --right-context-length 8 \
  --memory-size 32 \
  --decoding-method modified_beam_search \
  --use-averaged-model True \
  --beam-size 4
```

Pretrained models, training logs, decoding logs, and decoding results
are available at
<https://huggingface.co/Zengwei/icefall-asr-librispeech-conv-emformer-transducer-stateless-2022-06-11>

### LibriSpeech BPE training results (Pruned Stateless Emformer RNN-T)

[pruned_stateless_emformer_rnnt2](./pruned_stateless_emformer_rnnt2)
@ -280,12 +438,12 @@ The WERs are:

 |                                     | test-clean | test-other | comment                                       |
 |-------------------------------------|------------|------------|-----------------------------------------------|
-| greedy search (max sym per frame 1) | 2.75       | 6.74       | --epoch 30 --avg 6 --use_averaged_model False |
-| greedy search (max sym per frame 1) | 2.69       | 6.64       | --epoch 30 --avg 6 --use_averaged_model True  |
-| fast beam search                    | 2.72       | 6.67       | --epoch 30 --avg 6 --use_averaged_model False |
-| fast beam search                    | 2.66       | 6.6        | --epoch 30 --avg 6 --use_averaged_model True  |
-| modified beam search                | 2.67       | 6.68       | --epoch 30 --avg 6 --use_averaged_model False |
-| modified beam search                | 2.62       | 6.57       | --epoch 30 --avg 6 --use_averaged_model True  |
+| greedy search (max sym per frame 1) | 2.75       | 6.74       | --epoch 30 --avg 6 --use-averaged-model False |
+| greedy search (max sym per frame 1) | 2.69       | 6.64       | --epoch 30 --avg 6 --use-averaged-model True  |
+| fast beam search                    | 2.72       | 6.67       | --epoch 30 --avg 6 --use-averaged-model False |
+| fast beam search                    | 2.66       | 6.6        | --epoch 30 --avg 6 --use-averaged-model True  |
+| modified beam search                | 2.67       | 6.68       | --epoch 30 --avg 6 --use-averaged-model False |
+| modified beam search                | 2.62       | 6.57       | --epoch 30 --avg 6 --use-averaged-model True  |

 The training command is:

@ -16,7 +16,57 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Usage:
(1) greedy search
./conv_emformer_transducer_stateless/streaming_decode.py \
      --epoch 30 \
      --avg 10 \
      --exp-dir conv_emformer_transducer_stateless/exp \
      --num-decode-streams 2000 \
      --num-encoder-layers 12 \
      --chunk-length 32 \
      --cnn-module-kernel 31 \
      --left-context-length 32 \
      --right-context-length 8 \
      --memory-size 32 \
      --decoding-method greedy_search \
      --use-averaged-model True

(2) modified beam search
./conv_emformer_transducer_stateless/streaming_decode.py \
      --epoch 30 \
      --avg 10 \
      --exp-dir conv_emformer_transducer_stateless/exp \
      --num-decode-streams 2000 \
      --num-encoder-layers 12 \
      --chunk-length 32 \
      --cnn-module-kernel 31 \
      --left-context-length 32 \
      --right-context-length 8 \
      --memory-size 32 \
      --decoding-method modified_beam_search \
      --use-averaged-model True \
      --beam-size 4

(3) fast beam search
./conv_emformer_transducer_stateless/streaming_decode.py \
      --epoch 30 \
      --avg 10 \
      --exp-dir conv_emformer_transducer_stateless/exp \
      --num-decode-streams 2000 \
      --num-encoder-layers 12 \
      --chunk-length 32 \
      --cnn-module-kernel 31 \
      --left-context-length 32 \
      --right-context-length 8 \
      --memory-size 32 \
      --decoding-method fast_beam_search \
      --use-averaged-model True \
      --beam 4 \
      --max-contexts 4 \
      --max-states 8
"""
import argparse
import logging
import warnings
@ -686,8 +736,9 @@ def decode_dataset(
             )
             del streams[i]

-    key = "greedy_search"
-    if params.decoding_method == "fast_beam_search":
+    if params.decoding_method == "greedy_search":
+        key = "greedy_search"
+    elif params.decoding_method == "fast_beam_search":
         key = (
             f"beam_{params.beam}_"
             f"max_contexts_{params.max_contexts}_"
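The hunk above is cut off, so the rest of the branch is not visible here. As an illustration only, the explicit `if`/`elif` dispatch on `params.decoding_method` can be sketched as a standalone function. The `max_states` suffix, the `modified_beam_search` branch with its `beam_size_...` key, and the function name `result_key` are assumptions introduced for this sketch, not lines taken from the commit.

```python
from types import SimpleNamespace


def result_key(params):
    """Build a label for a set of decoding results from the decoding settings."""
    if params.decoding_method == "greedy_search":
        return "greedy_search"
    elif params.decoding_method == "fast_beam_search":
        return (
            f"beam_{params.beam}_"
            f"max_contexts_{params.max_contexts}_"
            # Assumption: the truncated hunk does not show this last part.
            f"max_states_{params.max_states}"
        )
    elif params.decoding_method == "modified_beam_search":
        # Assumption: hypothetical key format for this method.
        return f"beam_size_{params.beam_size}"
    raise ValueError(f"Unsupported decoding method: {params.decoding_method}")


# Settings mirror the flags used in the usage examples.
fast = SimpleNamespace(
    decoding_method="fast_beam_search", beam=4, max_contexts=4, max_states=8
)
print(result_key(fast))
```

Handling each method in its own branch, and raising on anything unknown, keeps a result set from being silently filed under another method's label.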