Finetuned Language Models are Zero-Shot Learners
Authors: Jason Wei, Maarten Bosma, Vincent Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V Le

Published: 28 Jan 2022, Last Modified: 12 Oct 2025

ICLR 2022 Oral

Keywords: natural language processing, zero-shot learning, language models

Abstract: This paper explores a simple method for improving the zero-shot learning abilities of language models. We show that instruction tuning (finetuning language models on a collection of datasets described via instructions) substantially improves zero-shot performance on unseen tasks. We take a 137B parameter pretrained language model and instruction tune it on over 60 NLP datasets verbalized via natural language instruction templates. We evaluate this instruction-tuned model, which we call FLAN, on unseen task types. FLAN substantially improves the performance of its unmodified counterpart and surpasses zero-shot 175B GPT-3 on 20 of 25 datasets that we evaluate. FLAN even outperforms few-shot GPT-3 by a large margin on ANLI, RTE, BoolQ, AI2-ARC, OpenbookQA, and StoryCloze. Ablation studies reveal that number of finetuning datasets, model scale, and natural language instructions are key to the success of instruction tuning.

One-sentence Summary: "Instruction tuning", which finetunes language models on a collection of tasks described via instructions, substantially boosts zero-shot performance on unseen tasks.

Supplementary Material: zip

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/finetuned-language-models-are-zero-shot/code)
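To make the phrase "datasets verbalized via natural language instruction templates" concrete, here is a minimal sketch of how a labeled example might be turned into an instruction-style (prompt, target) pair for finetuning. The abstract does not give the templates' wording, so the template strings, the `verbalize` helper, and the example record below are all hypothetical illustrations, not the authors' implementation.

```python
import random

# Hypothetical instruction templates for a natural language inference task.
# The wording is illustrative only; it is not taken from the paper.
NLI_TEMPLATES = [
    "Premise: {premise}\nHypothesis: {hypothesis}\n"
    "Does the premise entail the hypothesis?\nOPTIONS:\n{options}",
    "Read the following and decide whether the hypothesis can be inferred "
    "from the premise.\nPremise: {premise}\nHypothesis: {hypothesis}\n"
    "OPTIONS:\n{options}",
]


def verbalize(example, templates, rng):
    """Render one labeled example as an instruction prompt and a text target.

    `example` is assumed to be a dict with keys "premise", "hypothesis",
    "options" (list of answer strings), and "label" (index into "options").
    """
    template = rng.choice(templates)  # vary phrasing across examples
    options = "\n".join(f"- {opt}" for opt in example["options"])
    prompt = template.format(
        premise=example["premise"],
        hypothesis=example["hypothesis"],
        options=options,
    )
    target = example["options"][example["label"]]
    return prompt, target


# Hypothetical example record, made up for this sketch.
example = {
    "premise": "A dog is running through a field.",
    "hypothesis": "An animal is outdoors.",
    "options": ["yes", "it is not possible to tell", "no"],
    "label": 0,
}

prompt, target = verbalize(example, NLI_TEMPLATES, random.Random(0))
print(prompt)
print("Target:", target)
```

Under this setup, instruction tuning would amount to finetuning the pretrained model on many such pairs drawn from many datasets, then evaluating on task types that were held out of the finetuning mixture.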