What you’ll learn
Fine-tune models on curated datasets using supervised fine-tuning (SFT), Low-Rank Adaptation (LoRA), and Quantised LoRA (QLoRA) without destroying the base model’s general capabilities
Apply reinforcement learning from verifiable rewards (RLVR) and modern preference optimisation methods including Direct Preference Optimisation (DPO), Odds Ratio Preference Optimisation (ORPO), and beyond, to shape model behaviour
Evaluate models rigorously: design benchmarks, detect regression, and measure quality claims that survive scrutiny
Adapt models to specialised domains — from clinical language to legal text — turning general capability into a defensible competitive advantage
Train agentic models that take sequences of actions reliably, not just models that talk about taking actions
Quantise and compress fine-tuned models for deployment without sacrificing the gains you trained for
Why this book
Written with the enterprise practitioner in mind.
The literature on post-training either focuses on small educational use cases that ignore enterprise realities, or presupposes the workflow of foundation labs. There’s nothing for the crucial middle: enterprise practitioners with real compute budgets who need to customise, align, and deploy AI at scale. This book fills that gap.
Trade-offs, not best practices.
The book treats post-training decisions as trade-offs rather than best practices, providing decision frameworks that document each technique’s costs and benefits so practitioners can match techniques to their constraints.
From principles to practice.
Combines technical depth with strategic context. Includes companion Jupyter notebooks covering practical implementation. Shows how to embed proprietary knowledge, organisational values and domain expertise into foundation models.
What’s inside
Part I: The Foundation
- Chapter 1: Post-Training Essentials: What It Is and Why It Matters
- Chapter 2: Prerequisites for Success: Before You Fine-Tune
Part II: The Tools
- Chapter 3: Supervised Fine-Tuning: The Foundation Technique
- Chapter 4: Reinforcement Learning: Better Each Time
- Chapter 5: Preference Optimization: Modern Alternatives to PPO
- Chapter 6: Evaluation Strategies: Measuring Model Quality
Part III: The Craft
- Chapter 7: Efficiency Techniques: Quantization and Compression
- Chapter 8: Domain Adaptation: Make It Yours
- Chapter 9: Agentic Models: Deeds, Not Words
- Chapter 10: Reasoning Capabilities: Training for Complex Thought
Part IV: The Frontier
- Chapter 11: Synthetic Training: Self-Play and Generated Data
- Chapter 12: Multimodal Systems: Post-Training Beyond Text
- Chapter 13: Future Directions: What Comes Next
Companion skill for Claude Code
A free skill that turns Claude Code into a post-training advisor. Ask it whether to fine-tune, which technique to use, or how to diagnose training problems — and get quick, opinionated guidance drawn from the book’s decision frameworks, with chapter references for going deeper.
A navigator, not a replacement for the book. It gives you the key insight, the main gotcha, and points you to the right chapter — like a well-read colleague who’s already highlighted the important parts.
mkdir -p ~/.claude/skills/post-training-guide
curl -o ~/.claude/skills/post-training-guide/SKILL.md \
    https://posttraining.guide/post-training-guide-skill.md
Get the book
Early Access is available now from No Starch Press. The full print edition ships Fall 2026.
Get Early Access at No Starch Press →
Want to be notified when the print edition ships? Leave your email and I’ll send a single note when it’s out.
Post-training is where models stop being impressive and start being useful.
Frequently asked questions
What is post-training?
Post-training is the process of adapting a pre-trained foundation model to specific tasks, domains, or behavioural requirements. Unlike prompting or RAG, post-training permanently modifies the model’s weights through techniques such as supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), and direct preference optimisation (DPO).
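Each of the weight-modifying techniques named above reduces to an explicit loss on the model’s outputs. As an illustrative sketch (not code from the book), the core DPO objective for a single preference pair can be computed from four log-probabilities in plain Python:

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a response under
    the policy being trained or the frozen reference model.
    """
    # Implicit reward margins: how much more the policy prefers each
    # response than the reference model does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # -log(sigmoid(beta * (chosen_margin - rejected_margin)))
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# The loss shrinks as the policy learns to prefer the chosen response
# more strongly than the reference model does (example values made up).
print(dpo_loss(-10.0, -14.0, -12.0, -12.0))
```

The point of the sketch is that DPO needs no reward model and no rollouts: the loss is a simple function of log-probabilities, which is exactly why it is cheaper to run than PPO-style RLHF.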
What is the difference between fine-tuning and post-training?
Fine-tuning (specifically supervised fine-tuning or SFT) is one technique within the broader post-training toolkit. Post-training also includes reinforcement learning methods (RLHF, RLVR, PPO), preference optimisation (DPO, ORPO, KTO), domain adaptation, agentic training, and evaluation — the full lifecycle of turning a foundation model into a production-ready system.
Who is this book for?
Enterprise AI engineers and ML practitioners who need to customise, align, and deploy language models at scale. The book assumes familiarity with Python, PyTorch, and the fundamentals of machine learning, but does not require prior experience with post-training specifically.
What post-training techniques does the book cover?
The book covers supervised fine-tuning (SFT) with LoRA and QLoRA, reinforcement learning from verifiable rewards (RLVR) with GRPO, preference optimisation (DPO, ORPO, KTO), model evaluation and benchmark design, domain adaptation, agentic model training, quantisation, multimodal alignment, and synthetic data generation.
Do I need access to GPUs to follow along?
The techniques are taught with practical Jupyter notebooks designed to work with real but manageable compute budgets. Many examples use parameter-efficient methods like LoRA that can run on a single consumer GPU. The book explicitly targets enterprise practitioners, not foundation model labs with thousands of GPUs.
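As a back-of-the-envelope illustration of why LoRA fits on a single consumer GPU (the dimensions below are generic assumptions for a 7B-class model, not figures from the book), you can compare trainable parameter counts directly:

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA replaces a full d_out x d_in weight update with two
    low-rank factors: B (d_out x rank) and A (rank x d_in)."""
    return d_out * rank + rank * d_in

# Hypothetical attention projection matrix in a 7B-class model:
d_model = 4096
full_update = d_model * d_model                               # full fine-tuning
lora_update = lora_trainable_params(d_model, d_model, rank=8)  # LoRA, rank 8

print(f"full:  {full_update:,} params")               # 16,777,216
print(f"lora:  {lora_update:,} params")               # 65,536
print(f"ratio: {full_update // lora_update}x fewer")  # 256x
```

A 256-fold reduction per adapted matrix, before quantisation, is what brings optimiser state and gradients down to single-GPU scale; QLoRA then shrinks the frozen base weights as well.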