Digest 2021-12

# ML / Papers

- Evaluating Syntactic Abilities of Language Models - ai.googleblog.com/2021/12…-of.html
- Efficiently and effectively scaling up language model pretraining for best language representation model on GLUE and SuperGLUE - www.microsoft.com/en-us/r…uperglue
- Improving Vision Transformer Efficiency and Accuracy by Learning to Tokenize - ai.googleblog.com/2021/12…ncy.html
  - TokenLearner is a learnable module that takes an image-like tensor (i.e., input) and generates a small set of tokens.
  - Saves memory and computation by half or more w/o loss of accuracy
  - Inserting TokenLearner after the initial quarter of the network (at 1/4) achieves nearly the same accuracy as the baseline
- General and Scalable Parallelization for Neural Networks - ai.googleblog.com/2021/12…ion.html
- The Death of Feature Engineering is Greatly Exaggerated - petewarden.com/2021/12…ggerated
- A Fast WordPiece Tokenization System - ai.googleblog.com/2021/12…tem.html
  - but why?
- More Efficient In-Context Learning with GLaM - ai.googleblog.com/2021/12…ith.html
  - new 1T param MoE model
- Interpretable Deep Learning for Time Series Forecasting - ai.googleblog.com/2021/12…ime.html
- Why you should be using active learning to build ML - humanloop.com/blog/wh…learning
- Training Machine Learning Models More Efficiently with Dataset Distillation - ai.googleblog.com/2021/12…ore.html
- Farcical Self-Delusion - blog.piekniewski.info/2021/12…delusion
- How a Kalman filter works, in pictures - www.bzarg.com/p/how-a…pictures
- AI and the Future of Work: What We Know Today - thegradient.pub/artific…pectives
- WebGPT: Improving the factual accuracy of language models through web browsing - openai.com/blog/im…accuracy
- Facebook AI’s WMT21 News Translation Task Submission - http://arxiv.org/abs/2108.03265

#digest