Adapters in Transformers
Have you ever struggled with how time- and resource-consuming it is to train Transformers? It is especially painful when you want to fine-tune not just the top layers of the model but the full model for your task. This is a real problem, so how can it be avoided?
Researchers at the Technical University of Darmstadt have been developing and promoting a new paradigm for training Transformer-based models: Adapters. What are they? Let's refer to the picture. An adapter is an additional small layer inserted into each Transformer block, right before the final Add & Norm operation. The key point is that the weights of the original model stay frozen, and only these adapters are trained! Moreover, once an adapter has been trained for a task (for instance, sentiment classification), its weights can be shared with other users!
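To make the idea more concrete, here is a minimal, dependency-free Python sketch. It assumes the bottleneck design commonly used for adapters (down-projection, ReLU, up-projection, plus an internal skip connection) and one adapter per Transformer block placed before the final Add & Norm, as described above. The names `BottleneckAdapter`, `block_output` and `adapter_params`, the sizes, and the zero initialisation of the up-projection are illustrative assumptions, not the AdapterHub implementation. The frozen sublayer is just a stand-in function, and a real setup would use a deep learning framework with gradient-based training.

```python
import random


def matvec(w, x):
    """Multiply a matrix ``w`` (list of rows) by a vector ``x``."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(v):
    return [max(0.0, t) for t in v]


def layer_norm(v, eps=1e-5):
    """Plain layer normalisation without learned scale/shift."""
    mean = sum(v) / len(v)
    var = sum((t - mean) ** 2 for t in v) / len(v)
    return [(t - mean) / (var + eps) ** 0.5 for t in v]


class BottleneckAdapter:
    """Hypothetical adapter: down-project -> ReLU -> up-project, plus a skip connection.

    In adapter training these would be the only trainable weights;
    the pretrained host model stays frozen.
    """

    def __init__(self, hidden, bottleneck, seed=0):
        rng = random.Random(seed)
        # Down-projection: hidden -> bottleneck, small random weights
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(hidden)]
                       for _ in range(bottleneck)]
        self.b_down = [0.0] * bottleneck
        # Up-projection: bottleneck -> hidden, zero-initialised (an assumption)
        # so the adapter starts out as an exact identity map.
        self.w_up = [[0.0] * bottleneck for _ in range(hidden)]
        self.b_up = [0.0] * hidden

    def __call__(self, h):
        z = relu([a + b for a, b in zip(matvec(self.w_down, h), self.b_down)])
        delta = [a + b for a, b in zip(matvec(self.w_up, z), self.b_up)]
        # Internal skip connection: output = input + learned correction
        return [hi + di for hi, di in zip(h, delta)]

    def num_params(self):
        return (sum(len(r) for r in self.w_down) + len(self.b_down)
                + sum(len(r) for r in self.w_up) + len(self.b_up))


def block_output(x, frozen_sublayer, adapter):
    """Frozen sublayer -> trainable adapter -> final Add & Norm."""
    y = adapter(frozen_sublayer(x))
    return layer_norm([xi + yi for xi, yi in zip(x, y)])


def adapter_params(hidden, bottleneck):
    """Trainable parameters of one adapter: two weight matrices plus two biases."""
    return 2 * hidden * bottleneck + bottleneck + hidden


if __name__ == "__main__":
    def frozen_ffn(v):
        # Stand-in for a frozen pretrained feed-forward sublayer
        return [2.0 * t for t in v]

    adapter = BottleneckAdapter(hidden=8, bottleneck=2)
    x = [0.1 * i for i in range(8)]
    print(block_output(x, frozen_ffn, adapter))
    print(12 * adapter_params(768, 64))  # 1189632
```

Because the adapter starts as an identity map, inserting it does not change the frozen model's behaviour before training. During fine-tuning, only `w_down`, `b_down`, `w_up` and `b_up` would be updated. With BERT-base-like dimensions (hidden size 768), a bottleneck of 64 and one adapter in each of 12 layers, that comes to 1,189,632 trainable parameters, roughly 1% of the ~110M parameters commonly cited for BERT-base. This is also why sharing an adapter is cheap: only these small matrices need to be distributed.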
There is now a dedicated AdapterHub where such adapters can be shared among all users. All in all, this looks like quite an interesting idea!
The material is taken from ALPS-2022, which is currently under way.
[paper]