Towards NLP, page 3, all channel posts

Towards NLP

4 years ago

Multiverse🌌: Multilingual Evidence for Fake News Detection

Our extended work on multilingual features for fake news detection. Nowadays, each language bubble is like a separate universe with its own biased view of an event. Our approach helps to compare news across media in different languages and gives the reader more critical information about the event. The work also contains a broad analysis of the fake news field in general, so it might be useful as an introduction to the topic of fake news detection.

[link] Full paper

Towards NLP

4 years ago

GALACTICA

The number of papers published every month, week, and even day is now overwhelming. In May 2022, an average of 516 papers per day were submitted to arXiv. How nice would it be to have a tool that helps researchers find papers for review more precisely, summarizes them, and helps to organize research better? Now it is possible💪 Researchers from Meta AI introduced a new language model, Galactica.

What makes this model so good at working with equations, chemical sequences, references, code, plain text, and other symbolic chains?
* Dataset: the Galactica Corpus. It contains 48M papers, 106B tokens from papers, reference material, encyclopedias, and other scientific sources.
* Tokenization: a special type of tokenization with separate tokens for each type of sequence: citations, mathematics, chemical sequences, and others.
* Working Memory Token: the chain-of-thought concept was introduced recently. In this work, the authors go further with a working memory token <work> that wraps the step-by-step reasoning part of the prompt.
* Prompt Pre-Training (similar to FLAN) on different tasks: QA, summarization, NER, reasoning, dialogue, and others.
* Architecture: a Transformer in a decoder-only setup.

The demo lets you search by reference, by a short description of a paper's main idea, or even by a formula, and ask for a summary. Thanks to the Twitter community, the demo is now shut down🫣 However, as always, the science presented is still interesting in itself. In the meantime, we will wait to test the model at full power again.

[link] The main page
[link] The paper about Galactica LLM

Galactica Demo

www.galactica.org

Towards NLP

4 years ago

The State of Multilingual AI

There are around 7,000 languages spoken around the world. Around 400 languages have more than 1M speakers and around 1,200 languages have more than 100k ... Reviewing papers published at ACL 2008, she found that 63% of all papers focused on English. For a recent study, we similarly reviewed papers from ACL 2021 and found that almost 70% of papers only evaluate on English. 10 years on, little thus seems to have changed.

by Sebastian Ruder:
* Status Quo;
* Recent Progress;
* Challenges and Opportunities.

https://ruder.io/state-of-multilingual-ai/

The State of Multilingual AI

This post takes a look at the state of multilingual AI. How multilingual are current models? What are recent contributions and remaining challenges?

Sebastian Ruder

Towards NLP

4 years ago

Stanford Seminar: ML Explainability

If you want an introduction to the topic of explainability, there is a cool seminar from Stanford that goes from the basics to the new horizons of research in this field.

Videos on YouTube: link
Slides: link

Machine Learning Explainability Workshop I Stanford - YouTube

YouTube
Towards NLP

4 years ago

NLP for Social Good - Daryna Dementieva | Munich NLP

Recently I was a guest at the MunichNLP seminar series. You are very welcome to watch if you want to know more about:
* the general idea of what is going on in AI and NLP for Social Good;
* fake news detection and how multilingual evidence can help to improve it;
* what is going on in the field of text detoxification.

The recorded video is available on YouTube📣
Towards NLP

4 years ago

Bloom Library: Multimodal Datasets in 300+ Languages for a Variety of Downstream Tasks

Remember the BLOOM 🌸 model? Now there are BLOOM datasets: multimodal multilingual datasets covering 363 languages across 32 language families💪! Four datasets are released:
* bloom-lm for language modeling in 351 languages;
* bloom-captioning for image-to-text or text-to-image tasks in 351 languages;
* bloom-vist for visual storytelling in 351 languages;
* bloom-speech for speech-to-text and text-to-speech tasks in 56 languages.

The original paper with all the details about the collection process and the datasets is here.

Towards NLP

4 years ago

Scaling Instruction-Finetuned Language Models

TL;DR Additional fine-tuning of T5 or PaLM models on 1k (!) tasks makes them better on evaluation tasks, lets them cover more languages, and helps them generalize better to new, unseen tasks.

The Google Brain team experimented with new methods of fine-tuning Large Language Models. The main recipes for better LLMs:
* the more fine-tuning tasks you have, the better;
* smarter prompts also help. By smarter, we mean the use of instructions and Chain-of-thought (see screenshots). Translated into human language: the more clues you give the model in the request, the more precise an answer you will receive. The Chain-of-thought concept is quite interesting; its original paper is here.

The optimal number of fine-tuning tasks is still an open research question (in their experiments, the authors jumped from 282 tasks directly to 1,836 tasks, which leaves quite a gap to explore). But, in the end, if we want to solve a new task and we write smarter prompts for it, in the same style the model was fine-tuned on, zero-shot performance will improve significantly.

The original paper has all the details and a lot of tables and examples of performance on different tasks.

🤗model cards: all variations of t5, flan-t5-base for illustration.
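To make the "smarter prompts" idea concrete, here is a minimal sketch of how one request can be written plainly, with an instruction, or with an instruction plus a chain-of-thought cue. The helper `build_prompt`, the `Q:`/`A:` layout, and the cue wording are illustrative assumptions, not the templates used in the paper.

```python
from typing import Optional


def build_prompt(question: str,
                 instruction: Optional[str] = None,
                 chain_of_thought: bool = False) -> str:
    """Assemble a zero-shot prompt (hypothetical format, for illustration only)."""
    parts = []
    if instruction:
        # An explicit task description gives the model an extra clue.
        parts.append(instruction.strip())
    parts.append(f"Q: {question.strip()}")
    if chain_of_thought:
        # Assumed cue asking the model to reason before giving the answer.
        parts.append("A: Let's think step by step.")
    else:
        parts.append("A:")
    return "\n".join(parts)


question = "A box holds 3 red and 4 blue pens. How many pens are in the box?"
plain_prompt = build_prompt(question)
cot_prompt = build_prompt(
    question,
    instruction="Answer the following arithmetic question.",
    chain_of_thought=True,
)

if __name__ == "__main__":
    print(plain_prompt)
    print("---")
    print(cot_prompt)
```

Either string could then be passed to any instruction-tuned model; which phrasing works best for a given model is an empirical question.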

google/flan-t5-base · Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

Towards NLP

4 years ago

MTEB: Massive Text Embedding Benchmark

Indeed a massive piece of work: a comparison of 33 models on 56 datasets and 112 languages💪 Now, if you are interested in some task, you can go to the leaderboard and find the best models for this task in a specific language. Or, if you have a new model, you can perform a clearer and fairer comparison.

Paper: https://arxiv.org/abs/2210.07316 (useful for reading more about the tasks, abbreviations, datasets, and models)
Github: https://github.com/embeddings-benchmark/mteb
Leaderboard at 🤗: https://huggingface.co/spaces/mteb/leaderboard

Towards NLP

4 years ago

Method for Fighting Harmful Multilingual Content

PhD work by Daryna Dementieva:
1. Fake News Detection using Multilingual Evidence
2. Texts Detoxification

www.skoltech.ru/app/dat…sis5.pdf
Towards NLP

4 years ago

DALLE-2 Without Waitlist openai.com/blog/da…waitlist

DALL·E Now Available Without Waitlist

New users can start creating straight away. Lessons learned from deployment and improvements to our safety systems make wider availability possible. Sign up Starting today, we are removing the waitlist for the DALL·E beta so users can sign up and start using it immediately. More than 1.5M users

OpenAI
Towards NLP

4 years ago

Dataset No Language Left Behind

Allen AI reproduced the dataset that was used for training the NLLB translation model by Meta AI. 450 GB of parallel data for 200 languages is available online at 🤗: https://huggingface.co/datasets/allenai/nllb

The paper with a detailed description: https://arxiv.org/pdf/2207.04672.pdf

allenai/nllb · Datasets at Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co
Towards NLP

4 years ago

Suggesty

It is very inspiring to find products that wrap advanced AI technologies into elegant solutions. Suggesty is a browser extension that shows you a quick answer to your question when the search engine fails to generate a snippet. You can get any type of answer, from general questions to recipes or even grammar fixes.

What is inside? The GPT-3 API. Multiple languages are covered.

You are welcome to test it! Download extension. Page of the product. For feedback, contact Yurii Rebryk (@rebryk)
Towards NLP

4 years ago

ALPS 2022

The Advanced Language Processing Winter School will take place again this year, 16-20 January 2023. ONLINE

I personally participated in the two previous editions of this school. Highly recommended because:
* top-level lectures from NLP professionals;
* a good opportunity to have your (maybe first) poster session and gain conference skills;
* NLP professionals usually attend the student poster sessions (!) and ask very interesting questions that you can then use in your further research steps;
* a nice opportunity to build a network with NLP PhD and Master's students from all over the world!

BUT: the deadline is already super soon! Take your chance to apply before 16th September 2022.

The website: https://lig-alps.imag.fr/

Towards NLP

4 years ago

Advanced NLP course

One more course from CMU ~~again~~. An ongoing course about Advanced NLP techniques. The lectures uploaded so far are more introductory, but later there will be topics such as multi-tasking, prompt engineering, debugging of NLP models, and adversarial attacks.

Syllabus: phontron.com/class/a…ule.html
Playlist of recorded lectures: youtube.com/playlis…playlist

CMU Advanced NLP 2022 - YouTube

YouTube
Towards NLP

4 years ago

#notOnlyNLP

A recommendation from the bottom of my heart: a video about the representation of women in IT (unfortunately, only in Russian). While the gender situation in IT is way better than a decade ago, it is still hard to call it equality. You still even need bravery to point out the issue💪

The video discusses statistics, shows personal stories, and presents a scientific perspective on the situation from sociologists and neuroscientists. It is worth a look to better understand the other side and to be inspired not to look at gender when choosing a profession.

Thanks to @E_Batanina for discovering this link, and congrats on being part of this nice project👏

So, please, watch here: https://youtu.be/qAU3pBhBXMA

P.S. This recommendation is not made to promote the Yandex company, but to promote IT for everyone.

Математическое неравенство (Mathematical Inequality)

"I'll give you an A for your pretty eyes", "wow, you're a programmer", "programming is a job for guys": these stereotypes follow women who have chosen a career in IT from childhood. Such attitudes harm not only them but also the companies that lack talented specialists in various fields. How can this be prevented? Women of different ages and professions talk about family, studies, motherhood, and the role of a manager. Together with experts, they explore how to change the way children are raised in order to change an entire industry. Director: Evgenia Kashapova. Script: Anna Kosinskaya, Olga Gladysheva. Producers: Viktoria Eke, Anna Kosinskaya, Alexandra Yudanova. Camera: Lyudmila Kuropyatnikova, Filipp Zadorozhny. Composers: Rostislav Ivanov, Anatoly Kapitonov. Editing: Vyacheslav Kuleshov, Vladislav Kostin. Production: ASAP PRODUCTION.

YouTube

Towards NLP

4 years ago

Multilingual NLI Dataset

A new dataset contains 2,730,000 NLI text pairs in 26 languages💪 It was created from a previous English dataset using the latest open-source machine translation model. The dataset can be loaded here.

Natural Language Inference (NLI) is the task of determining whether a “hypothesis” is true (entailment), false (contradiction), or undetermined (neutral) given a “premise”.
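The three NLI labels are easier to see on concrete pairs. Below is a small sketch of how premise/hypothesis pairs could be represented. The `NLIPair` class, the `label_counts` helper, and the example sentences are invented for illustration and do not come from the dataset or reflect its actual schema.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class NLILabel(Enum):
    ENTAILMENT = "entailment"        # hypothesis is true given the premise
    NEUTRAL = "neutral"              # hypothesis is undetermined given the premise
    CONTRADICTION = "contradiction"  # hypothesis is false given the premise


@dataclass(frozen=True)
class NLIPair:
    premise: str
    hypothesis: str
    label: NLILabel


# Hand-written toy examples (assumptions, not dataset rows).
EXAMPLES = [
    NLIPair("A man is playing a guitar on stage.",
            "A person is making music.", NLILabel.ENTAILMENT),
    NLIPair("A man is playing a guitar on stage.",
            "The concert is sold out.", NLILabel.NEUTRAL),
    NLIPair("A man is playing a guitar on stage.",
            "Nobody is on the stage.", NLILabel.CONTRADICTION),
]


def label_counts(pairs):
    """Count how many pairs carry each label."""
    return Counter(pair.label for pair in pairs)


if __name__ == "__main__":
    for pair in EXAMPLES:
        print(f"{pair.label.value:>13}: {pair.premise} -> {pair.hypothesis}")
```

A classifier for this task takes the premise and hypothesis as input and predicts one of the three labels.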

MoritzLaurer/multilingual-NLI-26lang-2mil7 · Datasets at Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co
Towards NLP

4 years ago

Efficient Methods for Natural Language Processing: A Survey

With the rise of big models and the big data required to train them, it is important to know which techniques can help you optimize your model cooking🍳 The paper describes efficiency methods for each step of model preparation:
1. Data collection & Preprocessing.
2. Model Design.
3. Pre-training.
4. Fine-tuning.
5. Inference.
6. Model selection.

Paper [here].
Towards NLP

4 years ago

Multilingual NLP discussion group

A nice platform: a new group on #multilinguality, so that we have a place on Telegram to discuss important new papers and share cool results during conferences, meetups, and public discussions! The group is public, feel free to share the link!

https://t.me/multilingual_nlp

By @rybolos_channel

Multilingual NLP🦾🧠🌏🌸

No AGI without Multilingual NLU! Group rules: - please keep the discussion classy - no spam/unrelated shitposting

Telegram