RUSSE 2022 Detoxification Competition
There were no posts in the channel for a month because we were preparing something quite interesting: the first shared task on text detoxification based on a parallel dataset! The shared task is held as part of the Dialogue-2022 conference.
So, what is going on? The task of text detoxification is quite straightforward: given a toxic text as input, you need to generate a non-toxic version of it. For example:
Well today i fucking fracking learned something. -> I have learned something new today.
Go ahead ban me, i don’t give a shit. -> It won’t matter to me if I get banned.
Interesting, right? Previously, I posted a lot of content here about detoxification and our experiments [the first Russian detoxification experiments, SOTA unsupervised English models]. However, all of that was mostly about unsupervised methods. Now we have collected a unique parallel dataset for detoxification, and you are very welcome to experiment with it! Moreover, your model's results will be evaluated manually, because we aim to find truly strong detoxification systems. What we need from you is to train, find, or create a seq2seq model that passes the human evaluation.
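To make the input and output format concrete, here is a minimal sketch of the simplest conceivable approach: deleting words found in a toxic lexicon. The lexicon `TOXIC_WORDS` and the function `detoxify_by_deletion` are hypothetical names made up for this illustration; this is an assumption-laden toy, not the official baseline of the shared task.

```python
import re

# Hypothetical toy lexicon, for illustration only; not an official resource.
TOXIC_WORDS = {"fucking", "shit"}


def detoxify_by_deletion(text: str) -> str:
    """Drop every whitespace-separated token whose word core is in TOXIC_WORDS."""
    kept = []
    for token in text.split():
        # Strip punctuation and lowercase the token before the lexicon lookup.
        core = re.sub(r"[^\w]", "", token).lower()
        if core not in TOXIC_WORDS:
            kept.append(token)
    return " ".join(kept)


if __name__ == "__main__":
    print(detoxify_by_deletion("Well today i fucking learned something."))
    print(detoxify_by_deletion("Go ahead ban me, i don't give a shit."))
```

On the second sentence, deletion leaves a broken fragment ending in "i don't give a", which hints at why rewriting the whole sentence with a seq2seq model is the more promising direction.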
This post is a teaser for the track, which starts on December 15. More details here:
https://russe.nlpub.org/2022/tox/
Telegram group for communication:
https://t.me/joinchat/Ckja7Vh00qPOU887pLonqQ
See you in two days.