
Towards NLP

NLP: all the n-grams about text analysis. For any further questions:


4 years ago
Bloom Library: Multimodal Datasets in 300+ Languages for a Variety of Downstream Tasks

Remember the BLOOM 🌸 model? Now there are BLOOM datasets: multimodal, multilingual datasets covering 363 languages across 32 language families 💪!

Four datasets have been released:
* bloom-lm for language modeling in 351 languages;
* bloom-captioning for image-to-text or text-to-image tasks in 351 languages;
* bloom-vist for visual storytelling in 351 languages;
* bloom-speech for speech-to-text and text-to-speech tasks in 56 languages.

The original paper, with all details about the collection process and the datasets, is here.