Explainability for NLP
With the rise of LLMs from ClosedAI, research on explainability for NLP is more important than ever. A lot of work remains to be done in the field, but you can already experiment and try to explain your fine-tuned LLMs on a specific task. For now, most methods have been explored for text classification tasks and are adapted from methods for tabular data.
How can it be done?
1. Baseline approach: leave-one-out explanations. For instance, suppose you have a regression layer as one of the last layers in your model. You can identify the tokens with the largest weights, exclude them from the text, and check whether the model's answer has changed. If the tokens were indeed important, the answer should change dramatically, because the model can no longer rely on these words to make a correct decision.
2. Local Surrogate (LIME). This is a modification of the previous idea: now you delete each word from the sentence in turn and check the results. The "importance" of the word will be estimated based on how the model's answer differ…
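
To make the leave-one-out idea from item 1 concrete, here is a minimal sketch in Python. It is not a real LLM pipeline: `toy_positive_probability` and `TOY_WEIGHTS` are hypothetical stand-ins for your fine-tuned classifier, and `leave_one_out_importance` is an illustrative helper name. It also simplifies the procedure by removing every token in turn and measuring the change in the output directly, rather than first picking tokens by their layer weights.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical word weights standing in for a real fine-tuned model.
TOY_WEIGHTS = {"great": 2.0, "good": 1.0, "boring": -1.5, "terrible": -2.5}


def toy_positive_probability(tokens: Sequence[str]) -> float:
    """Toy classifier (an assumption): sigmoid of the summed word weights."""
    score = sum(TOY_WEIGHTS.get(t.lower(), 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))


def leave_one_out_importance(
    tokens: Sequence[str],
    predict: Callable[[Sequence[str]], float],
) -> List[Tuple[str, float]]:
    """Score each token by how much the prediction drops when it is removed."""
    base = predict(tokens)
    importances = []
    for i, token in enumerate(tokens):
        # Rebuild the input without the i-th token and re-run the model.
        reduced = list(tokens[:i]) + list(tokens[i + 1:])
        importances.append((token, base - predict(reduced)))
    return importances


if __name__ == "__main__":
    sentence = "The plot was great but the ending was boring".split()
    scores = leave_one_out_importance(sentence, toy_positive_probability)
    # Print tokens from most to least influential (by absolute change).
    for token, delta in sorted(scores, key=lambda p: abs(p[1]), reverse=True):
        print(f"{token:>8s} {delta:+.3f}")
```

With this toy model, "great" gets a positive score (removing it lowers the positive probability), "boring" gets a negative one, and words without a weight score zero. To try it on a real model, you would replace `toy_positive_probability` with a function that runs your classifier on the reduced token list and returns the probability of the class you want to explain.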