Crossmodal-3600 — Multilingual Reference Captions for Geographically Diverse Images
Once again, it looks like image captioning is ripe. This time the public gets only a validation set, albeit a formidable one.
The riper a field is, the less gets shared. History is highly cyclical.
Just look at something like FAIR's massive NMT efforts.
First they created fairseq (perhaps to counter OpenNMT, which was in TensorFlow, maybe), then CCMatrix. And now that their NMT dataset has reached the critical scale of being competitive with off-the-shelf products, they have not shared it, which makes total sense.