No Starch Press · Early Access Available Now

Post-Training:
A Practical Guide for
AI Engineers and Developers

Capable by default. Reliable by design.

Post-training is the process of adapting a pre-trained foundation model to specific tasks, domains, or behavioural requirements through techniques such as supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), and direct preference optimisation (DPO). Unlike prompting or retrieval-augmented generation (RAG), post-training permanently modifies the model’s weights to embed new capabilities, domain knowledge, or alignment constraints.

If you’re a practitioner who has watched a promising AI demo fail to survive contact with production — where prompting hits its ceiling, retrieval isn’t enough, and the model still can’t be trusted with your domain — post-training is what you’ve been missing. This is a practical guide to turning foundation models into production-ready systems: reshaping behaviour, aligning to your values, and deploying with confidence. Each technique is taught concept-first, then implementation-through-code, so you understand not just what to run, but what you’re actually changing inside the model.

Get Early Access →
Early Access PDF available now · Print edition Fall 2026
§1

What you’ll learn

Fine-tune models on curated datasets using supervised fine-tuning (SFT), Low-Rank Adaptation (LoRA), and Quantised LoRA (QLoRA) without destroying the base model’s general capabilities

Apply reinforcement learning from verifiable rewards (RLVR) and modern preference optimisation methods, from Direct Preference Optimisation (DPO) to Odds Ratio Preference Optimisation (ORPO) and beyond, to shape model behaviour

Evaluate models rigorously: design benchmarks, detect regressions, and make quality claims that survive scrutiny

Adapt models to specialised domains — from clinical language to legal text — turning general capability into a defensible competitive advantage

Train agentic models that take sequences of actions reliably, not just models that talk about taking actions

Quantise and compress fine-tuned models for deployment without sacrificing the gains you trained for

§2

Why this book

Written with the enterprise practitioner in mind.

The literature on post-training either focuses on small educational use cases that ignore enterprise realities or presupposes the workflow of foundation labs. There’s nothing for the crucial middle: enterprise practitioners with real compute budgets who need to customise, align, and deploy AI at scale. This book fills that gap.

Trade-offs, not best practices.

The book treats post-training decisions as trade-offs rather than best practices, providing decision frameworks that document the costs and benefits of each technique so practitioners can match methods to their constraints.

From principles to practice.

Combines technical depth with strategic context. Includes companion Jupyter notebooks covering practical implementation. Shows how to embed proprietary knowledge, organisational values and domain expertise into foundation models.

§3

About the author

Chris von Csefalvay is a Principal at HCLTech’s AI Practice, where he leads post-training research and clinical intelligence. He has held senior data science leadership roles across major enterprises, published extensively on distributed computing for ML, and designed language models for applications ranging from pharmacovigilance to social dynamics. He is also the author of Computational Modeling of Infectious Disease (Elsevier, 2023), a monograph on computational epidemiology. He holds degrees from the University of Oxford and Cardiff University and is a Fellow of the Royal Society for Public Health and Senior Member of IEEE.

chrisvoncsefalvay.com
§4

What’s inside

Part I: The Foundation

  • Chapter 1: Post-Training Essentials: What It Is and Why It Matters
  • Chapter 2: Prerequisites for Success: Before You Fine-Tune

Part II: The Tools

  • Chapter 3: Supervised Fine-Tuning: The Foundation Technique
  • Chapter 4: Reinforcement Learning: Better Each Time
  • Chapter 5: Preference Optimization: Modern Alternatives to PPO
  • Chapter 6: Evaluation Strategies: Measuring Model Quality

Part III: The Craft

  • Chapter 7: Efficiency Techniques: Quantization and Compression
  • Chapter 8: Domain Adaptation: Make It Yours
  • Chapter 9: Agentic Models: Deeds, Not Words
  • Chapter 10: Reasoning Capabilities: Training for Complex Thought

Part IV: The Frontier

  • Chapter 11: Synthetic Training: Self-Play and Generated Data
  • Chapter 12: Multimodal Systems: Post-Training Beyond Text
  • Chapter 13: Future Directions: What Comes Next
§5

Companion skill for Claude Code

A free skill that turns Claude Code into a post-training advisor. Ask it whether to fine-tune, which technique to use, or how to diagnose training problems — and get quick, opinionated guidance drawn from the book’s decision frameworks, with chapter references for going deeper.

A navigator, not a replacement for the book. It gives you the key insight, the main gotcha, and points you to the right chapter — like a well-read colleague who’s already highlighted the important parts.

Should I fine-tune or just use RAG?
DPO vs PPO — which should I use?
My loss curve is oscillating wildly
What LoRA rank should I start with?
How do I evaluate my fine-tuned model?
I need to adapt a model for medical text
mkdir -p ~/.claude/skills/post-training-guide
curl -o ~/.claude/skills/post-training-guide/SKILL.md \
  https://posttraining.guide/post-training-guide-skill.md
Download SKILL.md ↓
Free · Works with Claude Code
§6

Get the book

Early Access is available now from No Starch Press. The full print edition ships Fall 2026.

Get Early Access at No Starch Press →

Want to be notified when the print edition ships? Leave your email and I’ll send a single note when it’s out.

Post-training is where models stop being impressive and start being useful.

§7

Frequently asked questions

What is post-training?

Post-training is the process of adapting a pre-trained foundation model to specific tasks, domains, or behavioural requirements. Unlike prompting or RAG, post-training permanently modifies the model’s weights through techniques such as supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), and direct preference optimisation (DPO).

What is the difference between fine-tuning and post-training?

Fine-tuning (specifically supervised fine-tuning or SFT) is one technique within the broader post-training toolkit. Post-training also includes reinforcement learning methods (RLHF, RLVR, PPO), preference optimisation (DPO, ORPO, KTO), domain adaptation, agentic training, and evaluation — the full lifecycle of turning a foundation model into a production-ready system.

Who is this book for?

Enterprise AI engineers and ML practitioners who need to customise, align, and deploy language models at scale. The book assumes familiarity with Python, PyTorch, and the fundamentals of machine learning, but does not require prior experience with post-training specifically.

What post-training techniques does the book cover?

The book covers supervised fine-tuning (SFT) with LoRA and QLoRA, reinforcement learning from verifiable rewards (RLVR) with GRPO, preference optimisation (DPO, ORPO, KTO), model evaluation and benchmark design, domain adaptation, agentic model training, quantisation, multimodal alignment, and synthetic data generation.
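To give a flavour of the preference-optimisation family, the core of DPO is a single loss comparing how much more the policy prefers a chosen response over a rejected one, relative to a frozen reference model. A toy sketch in plain Python with made-up log-probability values (not code from the book; real training uses batched tensors in PyTorch):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are total log-probabilities of the chosen and rejected
    responses under the policy being trained and under the frozen
    reference model. beta controls how sharply preferences are enforced.
    """
    # Implicit reward margin: how much more the policy favours the
    # chosen response (relative to the reference) than the rejected one.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)): near zero when the policy clearly
    # prefers the chosen response, large when it prefers the rejected one.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# No preference shift relative to the reference: loss is log(2) ≈ 0.693.
print(round(dpo_loss(-10.0, -10.0, -10.0, -10.0), 3))  # 0.693
```

The gradient of this loss pushes the policy's log-probabilities of chosen responses up and rejected responses down, with no reward model or PPO loop required.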

Do I need access to GPUs to follow along?

The techniques are taught with practical Jupyter notebooks designed to work with real but manageable compute budgets. Many examples use parameter-efficient methods like LoRA that can run on a single consumer GPU. The book explicitly targets enterprise practitioners, not foundation model labs with thousands of GPUs.
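Why parameter-efficient methods fit on a single GPU: LoRA freezes the base weight matrix W and trains only a small low-rank update, computing h = Wx + (α/r)·BAx. A toy pure-Python sketch with hypothetical dimensions (real implementations use libraries such as PEFT; this just shows the arithmetic):

```python
def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(x, W, A, B, alpha=16, r=2):
    """h = W x + (alpha / r) * B (A x).

    W: d_out x d_in frozen base weight (never updated).
    A: r x d_in and B: d_out x r are the only trainable matrices,
    so the trainable parameter count scales with r, not d_out * d_in.
    """
    base = matvec(W, x)                  # frozen base model path
    update = matvec(B, matvec(A, x))     # low-rank adapter path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

# Toy dimensions: d_in = d_out = 2, rank r = 1 (illustrative numbers).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, 0.5]]            # 1 x 2
B = [[0.0], [0.0]]          # 2 x 1, initialised to zero as in LoRA
x = [2.0, 3.0]

# With B = 0 the adapter is a no-op: output equals the frozen base model.
print(lora_forward(x, W, A, B, alpha=2, r=1))  # [2.0, 3.0]
```

Because B starts at zero, training begins exactly at the base model's behaviour and only gradually learns a task-specific correction, which is why LoRA tends not to destroy general capabilities.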