Locked-Image Tuning: Adding Language Understanding to Image Models
The most interesting work often gets neglected in favour of flashy, huge-compute-backed bullshit (often generative and cherry-picked).
I really like this series of posts on REAL zero-shot image/text classifiers.
ai.googleblog.com/2022/04…age.html
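What I like here: freezing parts of the model (in LiT, the pre-trained image encoder stays locked and only the text encoder is trained) can sometimes cut compute significantly and improve performance, as opposed to the general mantra of taking 1000 GPUs and training everything end-to-end.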
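
To make the "frozen part" idea concrete, here is a toy sketch in pure Python. It is not the LiT implementation: the "image tower" is just a fixed list of random unit vectors standing in for a locked pre-trained encoder, the "text tower" is a plain table of trainable vectors, and the loss is a simplified one-directional (image-to-text) softmax contrastive loss. All names (`contrastive_loss`, `train_text_tower`), sizes and hyperparameters are made up for illustration.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(row):
    m = max(row)
    e = [math.exp(x - m) for x in row]
    s = sum(e)
    return [x / s for x in e]


def contrastive_loss(img, txt, temperature=1.0):
    """Image-to-text cross-entropy; pair i matches text i (the diagonal)."""
    loss = 0.0
    for i, im in enumerate(img):
        probs = softmax([dot(im, t) / temperature for t in txt])
        loss -= math.log(probs[i])
    return loss / len(img)


def train_text_tower(img, txt, steps=200, lr=0.5, temperature=1.0):
    """Gradient descent on the text vectors only; img is never modified."""
    n = len(img)
    for _ in range(steps):
        grads = [[0.0] * len(t) for t in txt]
        for i, im in enumerate(img):
            probs = softmax([dot(im, t) / temperature for t in txt])
            for j in range(n):
                # d(loss)/d(txt[j]) = (p_ij - [i == j]) * img[i] / (T * n)
                coeff = (probs[j] - (1.0 if i == j else 0.0)) / (temperature * n)
                for k, x in enumerate(im):
                    grads[j][k] += coeff * x
        # Only the text side gets an update; the image side stays locked.
        txt = [[w - lr * g for w, g in zip(t, gt)] for t, gt in zip(txt, grads)]
    return txt


if __name__ == "__main__":
    random.seed(0)
    # Stand-in for the outputs of a frozen, pre-trained image encoder.
    images = [normalize([random.gauss(0, 1) for _ in range(8)]) for _ in range(4)]
    # Trainable text-side vectors, randomly initialised.
    texts = [[random.gauss(0, 0.1) for _ in range(8)] for _ in range(4)]
    print("loss before:", contrastive_loss(images, texts))
    texts = train_text_tower(images, texts)
    print("loss after: ", contrastive_loss(images, texts))
```

The point of the toy: gradients are computed and applied for the text side only, so the image side costs nothing to train. In a real setup this would also let you precompute and cache the image embeddings once.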