Dataset: No Language Left Behind (NLLB)

Allen AI has reproduced the dataset that was used to train Meta AI's NLLB translation model. 450 GB of parallel data covering 200 languages is available on 🤗 Hugging Face: https://huggingface.co/datasets/allenai/nllb

The paper with a detailed description: https://arxiv.org/pdf/2207.04672.pdf
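
Since the full dataset is large, it may be more practical to stream a single language pair than to download everything. Below is a minimal sketch using the 🤗 `datasets` library. It makes several assumptions: that each language pair is exposed as a config named `<lang_a>-<lang_b>` (for example `ace_Latn-ban_Latn`), that the split is called `train`, and that the installed `datasets` version can still load this repository. The helper names `pair_config` and `stream_pair` are hypothetical and exist only for illustration; please check the dataset card for the exact config names.

```python
def pair_config(lang_a: str, lang_b: str) -> str:
    """Build a config name such as 'ace_Latn-ban_Latn'.

    Assumption: configs are named '<lang_a>-<lang_b>'; verify against the dataset card.
    """
    return f"{lang_a}-{lang_b}"


def stream_pair(lang_a: str, lang_b: str, split: str = "train"):
    """Return an iterable over one language pair without downloading it in full."""
    # Imported lazily so that pair_config works even without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(
        "allenai/nllb",               # repository named in the announcement
        pair_config(lang_a, lang_b),  # assumed config naming scheme
        split=split,                  # assumed split name
        streaming=True,               # read examples lazily over the network
    )
```

Calling `stream_pair("ace_Latn", "ban_Latn")` and iterating over the first few items should be enough to inspect the record layout before committing to a larger download.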