🦕 Using Big Language Models To Generate Entire Datasets From Scratch
This post discusses how DINO (Datasets from Instructions) can be used to distill the zero-shot knowledge of big language models like GPT-3 into much smaller models without requiring any labeled data or access to model internals.
-
🧩 How to Outperform GPT-3 by Combining Task Descriptions With Supervised Learning
This post explains Pattern-Exploiting Training (PET), a method that can be used to train Natural Language Processing models from fewer than 100 examples. It is based on a recent paper in which we show that PET outperforms GPT-3 on SuperGLUE, a challenging Natural Language Understanding benchmark, while requiring 99.9% fewer parameters.
-
✨ May 2020 NLP Papers: Synthesizer, RAG, Movement Pruning, GPT-3
This is a list of NLP papers I enjoyed reading, containing only papers that have been published on arXiv in May 2020. Papers are loosely grouped by topic. If you find this list helpful, think that a great paper is missing, or have some other comment, let me know!
-
✨ April 2020 NLP Papers: Longformer, LSRA, MixText, Blender
This is a list of NLP papers matching two criteria: (1) they’ve been published on arXiv in April 2020 and (2) I enjoyed reading them. Papers are loosely grouped by topic. If you find this list helpful or think that a great paper is missing, let me know!
-
📃 Byte Pair Encoding is Suboptimal for Language Model Pretraining
My notes from reading “Byte Pair Encoding is Suboptimal for Language Model Pretraining” by Kaj Bostrom and Greg Durrett, which compares tokenization methods for language model pretraining.