An open, billion-scale corpus of images interleaved with text.

MultimodalC4 is a multimodal extension of C4 that interleaves millions of images with text.

GitHub: https://github.com/allenai/mmc4
Paper: https://arxiv.org/abs/2304.06939v1
Dataset: https://paperswithcode.com/dataset/c4

ai_machinelearning_big_data