LLM Core Papers | LLM核心论文
Large Language Model (LLM) research moves fast, with hundreds of new papers appearing on arXiv daily, yet only a few are genuinely informative, general, and deep. The selection below was distilled through extensive discussions with friends, and these papers have largely shaped how I think about NLP. I hope this list offers some help to those new to the field, and do let me know what you think of the collection!
LLM虽然是灌水天堂,每天都有百来篇论文上arXiv,但大浪淘沙,真正有信息量、而又通用深刻的工作并不多。以下这些论文是和朋友多次讨论后留下的核心工作,它们很大程度上构成了我思考NLP问题的框架。
Pre-training 预训练
- Sentiment Neuron: Learning to Generate Reviews and Discovering Sentiment
- GPT-1: Improving Language Understanding by Generative Pre-Training
- Scaling Law: Scaling Laws for Neural Language Models
- GPT-3: Language Models are Few-Shot Learners
Value Alignment 价值对齐
- InstructGPT: Training language models to follow instructions with human feedback
- Constitutional AI: Harmlessness from AI Feedback
Architecture 架构
- Transformer: Attention is All You Need
- T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Efficient Tuning 轻量微调
- LoRA: Low-Rank Adaptation of Large Language Models
Inference-time Algorithm 推理时算法
- Chain-of-Thought: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Benchmark 榜单
- MMLU: Measuring Massive Multitask Language Understanding
- MATH: Measuring Mathematical Problem Solving With the MATH Dataset
Multi-modality 多模态
- Frozen: Multimodal Few-Shot Learning with Frozen Language Models
- CLIP: Learning Transferable Visual Models From Natural Language Supervision
- Flamingo: a Visual Language Model for Few-Shot Learning
High Concept 高观点
- Pretrained Transformers as Universal Computation Engines
- Large Language Models as General Pattern Machines
- An Observation on Generalization