🧩 How to Outperform GPT-3 by Combining Task Descriptions With Supervised Learning
This post explains Pattern-Exploiting Training (PET), a method for training Natural Language Processing models with fewer than 100 examples. It is based on a recent paper in which we show that PET outperforms GPT-3 on SuperGLUE, a challenging Natural Language Understanding benchmark, while requiring 99.9% fewer parameters.
✨ May 2020 NLP Papers: Synthesizer, RAG, Movement Pruning, GPT-3
This is a list of NLP papers I enjoyed reading, containing only papers published on arXiv in May 2020. Papers are loosely grouped by topic. If you find this list helpful, think a great paper is missing, or have some other comment, let me know!
✨ April 2020 NLP Papers: Longformer, LSRA, MixText, Blender
This is a list of NLP papers matching two criteria: (1) they’ve been published on arXiv in April 2020 and (2) I enjoyed reading them. Papers are loosely grouped by topic. If you find this list helpful or think that a great paper is missing, let me know!
📃 Byte Pair Encoding is Suboptimal for Language Model Pretraining
My notes from reading “Byte Pair Encoding is Suboptimal for Language Model Pretraining” by Kaj Bostrom and Greg Durrett, which compares tokenization methods for language model pretraining.