🔥 Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks

Masked "language" modeling on images (Imglish), texts (English), and image-text pairs ("parallel sentences") in a unified manner.

Github: github.com/microso…ter/beit
Paper: https://arxiv.org/abs/2208.10442v1
Datasets: https://paperswithcode.com/dataset/visual-genome

@ai_machinelearning_big_data