

## Description

Note: the experiments here cover language modeling only.

PTB is short for Penn Treebank. About the Penn Treebank corpus:

- The corpus is free for research purposes.
- `ptb.train.txt`: training set
- `ptb.valid.txt`: development set (use it only for tuning hyper-parameters, not for training)
- `ptb.test.txt`: test set for reporting perplexity

You can download the dataset from one of the following URLs:

- https://github.com/townie/PTB-dataset-from-Tomas-Mikolov-s-webpage
- http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz
- https://deepai.org/dataset/penn-treebank
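
Once the three files are downloaded, perplexity on the test set is the exponential of the average negative log-probability per token. The sketch below illustrates that computation with an add-one-smoothed unigram model. It is only a toy baseline for checking the data, not the language model trained in this recipe. The `data` directory, the `<eos>` end-of-sentence marker, and the helper names `read_tokens` and `unigram_perplexity` are assumptions made for illustration.

```python
import math
from collections import Counter
from pathlib import Path


def read_tokens(path):
    """Read a text file with one sentence per line, split on whitespace.

    An "<eos>" marker is appended after every sentence (an assumption,
    not something the corpus files contain).
    """
    tokens = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            tokens.extend(line.split() + ["<eos>"])
    return tokens


def unigram_perplexity(train_tokens, eval_tokens):
    """Perplexity of an add-one-smoothed unigram model on eval_tokens."""
    counts = Counter(train_tokens)
    # Smooth over the union vocabulary so every eval token has non-zero mass.
    vocab_size = len(set(train_tokens) | set(eval_tokens))
    total = len(train_tokens)
    log_prob = 0.0
    for tok in eval_tokens:
        log_prob += math.log((counts[tok] + 1) / (total + vocab_size))
    # exp of the average negative log-probability per token
    return math.exp(-log_prob / len(eval_tokens))


if __name__ == "__main__":
    data_dir = Path("data")  # hypothetical location of the downloaded files
    train_path = data_dir / "ptb.train.txt"
    test_path = data_dir / "ptb.test.txt"
    if train_path.exists() and test_path.exists():
        ppl = unigram_perplexity(read_tokens(train_path), read_tokens(test_path))
        print(f"unigram test perplexity: {ppl:.2f}")
```

When tuning any setting of a real model, evaluate on `ptb.valid.txt` and run on `ptb.test.txt` only for the final reported number.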