Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet, arxivVery interesting idea, i think its the third paper that i saw that using batch technique 📦https://github.com/yitu-opensource/T2T-ViT📄 https://arxiv.org/abs/2101.11986