🦕 Using Big Language Models To Generate Entire Datasets From Scratch
This post discusses how DINO (Datasets from Instructions) can be used to distill the zero-shot knowledge of big language models like GPT-3 into much smaller models without requiring any labeled data or access to model internals.
-
🧩 How to Outperform GPT-3 by Combining Task Descriptions With Supervised Learning
This post explains Pattern-Exploiting Training (PET), a method that can be used to train Natural Language Processing models from fewer than 100 examples. It is based on a recent paper in which we show that PET outperforms GPT-3 on SuperGLUE, a challenging Natural Language Understanding benchmark, while requiring 99.9% fewer parameters.
-
✨ May 2020 NLP Papers: Synthesizer, RAG, Movement Pruning, GPT-3
This is a list of NLP papers I enjoyed reading, containing only papers that have been published on arXiv in May 2020. Papers are loosely grouped by topic. If you find this list helpful, think that a great paper is missing, or have some other comment, let me know!
-
✨ April 2020 NLP Papers: Longformer, LSRA, MixText, Blender
This is a list of NLP papers matching two criteria: (1) they’ve been published on arXiv in April 2020 and (2) I enjoyed reading them. Papers are loosely grouped by topic. If you find this list helpful or think that a great paper is missing, let me know!
-
📃 Byte Pair Encoding is Suboptimal for Language Model Pretraining
My notes from reading “Byte Pair Encoding is Suboptimal for Language Model Pretraining” by Kaj Bostrom and Greg Durrett, which compares tokenization methods for language model pretraining.