# Readme

This recipe contains data preparation for the [VoxPopuli](https://github.com/facebookresearch/voxpopuli) dataset [(pdf)](https://aclanthology.org/2021.acl-long.80.pdf).
At the moment, it covers data preparation only; model training is not included.

## Audio per language

| Language | Size   | Hours untranscribed | Hours transcribed |
|----------|--------|---------------------|-------------------|
| bg       | 295G   | 17.6K               | -                 |
| cs       | 308G   | 18.7K               | 62                |
| da       | 233G   | 13.6K               | -                 |
| de       | 379G   | 23.2K               | 282               |
| el       | 305G   | 17.7K               | -                 |
| en       | 382G   | 24.1K               | 543               |
| es       | 362G   | 21.4K               | 166               |
| et       | 179G   | 10.6K               | 3                 |
| fi       | 236G   | 14.2K               | 27                |
| fr       | 376G   | 22.8K               | 211               |
| hr       | 132G   | 8.1K                | 43                |
| hu       | 297G   | 17.7K               | 63                |
| it       | 361G   | 21.9K               | 91                |
| lt       | 243G   | 14.4K               | 2                 |
| lv       | 217G   | 13.1K               | -                 |
| mt       | 147G   | 9.1K                | -                 |
| nl       | 322G   | 19.0K               | 53                |
| pl       | 348G   | 21.2K               | 111               |
| pt       | 300G   | 17.5K               | -                 |
| ro       | 296G   | 17.9K               | 89                |
| sk       | 201G   | 12.1K               | 35                |
| sl       | 190G   | 11.3K               | 10                |
| sv       | 272G   | 16.3K               | -                 |
|          |        |                     |                   |
| total    | 6.3T   | 384K                | 1791              |
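
As a quick sanity check on the totals row, the per-language figures can be summed. The snippet below is only an illustrative sketch and is not part of the recipe: the variable names are hypothetical, and the numbers are copied by hand from the table above.

```python
# Hours of transcribed audio per language, copied from the table above.
# Languages marked "-" in the table have no transcribed audio and are omitted.
transcribed_hours = {
    "cs": 62, "de": 282, "en": 543, "es": 166, "et": 3, "fi": 27,
    "fr": 211, "hr": 43, "hu": 63, "it": 91, "lt": 2, "nl": 53,
    "pl": 111, "ro": 89, "sk": 35, "sl": 10,
}

# Hours of untranscribed audio per language, in thousands (the "K" column).
untranscribed_khours = {
    "bg": 17.6, "cs": 18.7, "da": 13.6, "de": 23.2, "el": 17.7, "en": 24.1,
    "es": 21.4, "et": 10.6, "fi": 14.2, "fr": 22.8, "hr": 8.1, "hu": 17.7,
    "it": 21.9, "lt": 14.4, "lv": 13.1, "mt": 9.1, "nl": 19.0, "pl": 21.2,
    "pt": 17.5, "ro": 17.9, "sk": 12.1, "sl": 11.3, "sv": 16.3,
}

total_transcribed = sum(transcribed_hours.values())
total_untranscribed_k = sum(untranscribed_khours.values())

print(f"languages with transcripts: {len(transcribed_hours)}")
print(f"transcribed hours: {total_transcribed}")
print(f"untranscribed hours: {total_untranscribed_k:.1f}K")
```

The transcribed sum matches the 1791 hours in the totals row, and the untranscribed sum comes to about 383.5K hours, which the table rounds to 384K.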