Mirror of https://github.com/k2-fsa/icefall.git (synced 2025-08-09 18:12:19 +00:00)
## Description

(Note: the experiments here cover language modeling only.)

`ptb` is short for Penn Treebank.

About the Penn Treebank corpus:

- This corpus is free for research purposes.
- `ptb.train.txt`: training set
- `ptb.valid.txt`: development set (use it only for tuning hyper-parameters, not for training)
- `ptb.test.txt`: test set for reporting perplexity

You can download the dataset from one of the following URLs:
- https://github.com/townie/PTB-dataset-from-Tomas-Mikolov-s-webpage
- http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz
- https://deepai.org/dataset/penn-treebank
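
After downloading and unpacking the data, it can help to confirm that all three split files are present before training. The following is a minimal sketch, not part of the recipe itself: the function names `find_ptb_files` and `corpus_stats` are hypothetical, the directory layout is not assumed (files are located by name anywhere under the given directory), and tokens are assumed to be whitespace-separated.

```python
from collections import Counter
from pathlib import Path

# File names as listed in this README; where they live on disk is an assumption,
# so they are searched for recursively by name.
PTB_SPLITS = {
    "train": "ptb.train.txt",
    "valid": "ptb.valid.txt",
    "test": "ptb.test.txt",
}


def find_ptb_files(root):
    """Return {split name: path} for the three PTB files found under root."""
    root = Path(root)
    found = {}
    for split, name in PTB_SPLITS.items():
        matches = sorted(root.rglob(name))
        if not matches:
            raise FileNotFoundError(f"{name} not found under {root}")
        # If several copies exist, take the first one in sorted order.
        found[split] = matches[0]
    return found


def corpus_stats(path):
    """Count non-empty lines, whitespace-separated tokens, and distinct words."""
    counts = Counter()
    num_sentences = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            words = line.split()
            if not words:
                continue  # skip blank lines
            num_sentences += 1
            counts.update(words)
    return {
        "sentences": num_sentences,
        "tokens": sum(counts.values()),
        "vocab": len(counts),
    }
```

Calling `find_ptb_files("path/to/download")` and passing each returned path to `corpus_stats` gives a quick per-split summary. In line with the split descriptions above, only the statistics of `ptb.train.txt` should inform vocabulary or model choices.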