What you’ll learn
Fine-tune models on curated datasets using supervised fine-tuning (SFT), Low-Rank Adaptation (LoRA), and Quantised LoRA (QLoRA) without destroying the base model’s general capabilities
Apply reinforcement learning from verifiable rewards (RLVR) and modern preference optimisation methods including Direct Preference Optimisation (DPO), Odds Ratio Preference Optimisation (ORPO), and beyond, to shape model behaviour
Evaluate models rigorously: design benchmarks, detect regression, and measure quality claims that survive scrutiny
Adapt models to specialised domains — from clinical language to legal text — turning general capability into a defensible competitive advantage
Train agentic models that take sequences of actions reliably, not just models that talk about taking actions
Quantise and compress fine-tuned models for deployment without sacrificing the gains you trained for
Why this book
Written with the enterprise practitioner in mind.
The literature on post-training either focuses on small educational use cases that ignore enterprise realities, or presupposes the workflow of foundation labs. There’s nothing for the crucial middle: enterprise practitioners with real compute budgets who need to customise, align, and deploy AI at scale. This book fills that gap.
Trade-offs, not best practices.
The book treats post-training decisions as trade-offs rather than best practices, providing decision frameworks that document each technique’s costs and benefits so practitioners can match techniques to their constraints.
From principles to practice.
Combines technical depth with strategic context. Includes companion Jupyter notebooks covering practical implementation. Shows how to embed proprietary knowledge, organisational values and domain expertise into foundation models.
What’s inside
Part I: The Foundation
- Chapter 1: Post-Training Essentials: What It Is and Why It Matters
- Chapter 2: Prerequisites for Success: Before You Fine-Tune
Part II: The Tools
- Chapter 3: Supervised Fine-Tuning: The Foundation Technique
- Chapter 4: Reinforcement Learning: Better Each Time
- Chapter 5: Preference Optimization: Modern Alternatives to PPO
- Chapter 6: Evaluation Strategies: Measuring Model Quality
Part III: The Craft
- Chapter 7: Efficiency Techniques: Quantization and Compression
- Chapter 8: Domain Adaptation: Make It Yours
- Chapter 9: Agentic Models: Deeds, Not Words
- Chapter 10: Reasoning Capabilities: Training for Complex Thought
Part IV: The Frontier
- Chapter 11: Synthetic Training: Self-Play and Generated Data
- Chapter 12: Multimodal Systems: Post-Training Beyond Text
- Chapter 13: Future Directions: What Comes Next
Companion skill for Claude Code
A free skill that turns Claude Code into a post-training advisor. Ask it whether to fine-tune, which technique to use, or how to diagnose training problems — and get quick, opinionated guidance drawn from the book’s decision frameworks, with chapter references for going deeper.
A navigator, not a replacement for the book. It gives you the key insight, the main gotcha, and points you to the right chapter — like a well-read colleague who’s already highlighted the important parts.
mkdir -p ~/.claude/skills/post-training-guide
curl -o ~/.claude/skills/post-training-guide/SKILL.md \
    https://posttraining.guide/post-training-guide-skill.md
Get the book
Early Access is available now from No Starch Press. The full print edition ships Fall 2026.
Get Early Access at No Starch Press →
Want to be notified when the print edition ships? Leave your email and I’ll send a single note when it’s out.
Post-training is where models stop being impressive and start being useful.
Frequently asked questions
What is post-training?
Post-training is the process of adapting a pre-trained foundation model to specific tasks, domains, or behavioural requirements. Unlike prompting or RAG, post-training permanently modifies the model’s weights through techniques such as supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), and direct preference optimisation (DPO).
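Each of the weight-modifying techniques named above reduces to an explicit loss on the model’s outputs. As an illustrative sketch (not code from the book), the core DPO objective for a single preference pair can be computed from four log-probabilities in plain Python:

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a response under
    the policy being trained or the frozen reference model.
    """
    # Implicit reward margins: how much more the policy prefers each
    # response than the reference model does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # -log(sigmoid(beta * (chosen_margin - rejected_margin)))
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# The loss shrinks as the policy learns to prefer the chosen response
# more strongly than the reference model does (example values made up).
print(dpo_loss(-10.0, -14.0, -12.0, -12.0))
```

The point of the sketch is that DPO needs no reward model and no rollouts: the loss is a simple function of log-probabilities, which is exactly why it is cheaper to run than PPO-style RLHF.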
What is the difference between fine-tuning and post-training?
Fine-tuning (specifically supervised fine-tuning or SFT) is one technique within the broader post-training toolkit. Post-training also includes reinforcement learning methods (RLHF, RLVR, PPO), preference optimisation (DPO, ORPO, KTO), domain adaptation, agentic training, and evaluation — the full lifecycle of turning a foundation model into a production-ready system.
Who is this book for?
Enterprise AI engineers and ML practitioners who need to customise, align, and deploy language models at scale. The book assumes familiarity with Python, PyTorch, and the fundamentals of machine learning, but does not require prior experience with post-training specifically.
What post-training techniques does the book cover?
The book covers supervised fine-tuning (SFT) with LoRA and QLoRA, reinforcement learning from verifiable rewards (RLVR) with GRPO, preference optimisation (DPO, ORPO, KTO), model evaluation and benchmark design, domain adaptation, agentic model training, quantisation, multimodal alignment, and synthetic data generation.
Do I need access to GPUs to follow along?
The techniques are taught with practical Jupyter notebooks designed to work with real but manageable compute budgets. Many examples use parameter-efficient methods like LoRA that can run on a single consumer GPU. The book explicitly targets enterprise practitioners, not foundation model labs with thousands of GPUs.
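As a back-of-the-envelope illustration of why LoRA fits on a single consumer GPU (the dimensions below are generic assumptions for a 7B-class model, not figures from the book), you can compare trainable parameter counts directly:

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA replaces a full d_out x d_in weight update with two
    low-rank factors: B (d_out x rank) and A (rank x d_in)."""
    return d_out * rank + rank * d_in

# Hypothetical attention projection matrix in a 7B-class model:
d_model = 4096
full_update = d_model * d_model                               # full fine-tuning
lora_update = lora_trainable_params(d_model, d_model, rank=8)  # LoRA, rank 8

print(f"full:  {full_update:,} params")               # 16,777,216
print(f"lora:  {lora_update:,} params")               # 65,536
print(f"ratio: {full_update // lora_update}x fewer")  # 256x
```

A 256-fold reduction per adapted matrix, before quantisation, is what brings optimiser state and gradients down to single-GPU scale; QLoRA then shrinks the frozen base weights as well.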