Foundation Model Engineering is a technical textbook for readers who want to understand how modern foundation models actually work, why the stack evolved the way it did, and what engineering trade-offs appear when those ideas meet real systems.
This project is written primarily for AI engineers and research-oriented readers who want to move past surface-level API usage and build a deeper mental model of architectures, training pipelines, inference systems, retrieval stacks, evaluation loops, and agentic workflows.
The goal is not to provide scattered tips or isolated definitions. The goal is to explain the historical flow, mathematical ideas, and systems constraints that connect topics like attention, MoE, RLHF, multimodality, long-context serving, RAG, and agents into one engineering narrative.
Why read this
If you have ever wondered why the field moved from RNNs to Transformers, why some models are dense while others are sparse, why inference systems care so much about KV cache and batching, or why evaluation and alignment are product problems rather than just research topics, this book is meant to help you connect those dots.
Instead of treating each topic as an isolated trend, the book tries to show how modeling ideas, systems constraints, and product requirements shape one another. The payoff is not just more terminology. It is better engineering judgment.
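To make one of those connections concrete: the reason inference systems obsess over the KV cache is that its memory footprint grows linearly with sequence length and batch size, which directly caps how many requests can be batched together. The back-of-envelope sketch below estimates that footprint; the model configuration (32 layers, 32 KV heads, head dimension 128, fp16) is an illustrative 7B-class assumption, not the numbers for any specific model.

```python
# Back-of-envelope KV cache size for a single sequence.
# Each decoder layer stores a K tensor and a V tensor of shape
# (seq_len, n_kv_heads, head_dim), hence the factor of 2.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Illustrative 7B-class configuration (assumed): 32 layers, 32 KV heads,
# head_dim 128, fp16 (2 bytes per element), 4096-token context.
per_seq = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)
print(f"{per_seq / 2**30:.2f} GiB per 4k-token sequence")  # → 2.00 GiB
```

At roughly 2 GiB per 4k-token sequence under these assumptions, a GPU with 80 GiB of memory fits only a few dozen concurrent sequences after model weights are loaded, which is exactly why techniques like grouped-query attention (fewer KV heads) and paged KV cache management matter so much in serving.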
Who this is for
AI Engineers
Readers building or evaluating LLM applications, inference stacks, RAG pipelines, or agentic products.
Research-Oriented Readers
Readers who want a broad but technically grounded understanding of the foundation model landscape, including current architecture and systems trends.
What to expect
You will find rigorous conceptual explanations, concept-focused PyTorch examples, short quizzes for consolidation, and interactive visualizers for topics that are easier to grasp through direct manipulation. The material is designed to help you reason about quality, memory, throughput, latency, scaling, and alignment trade-offs, not just memorize terminology.
This is not a lightweight beginner introduction. If you are looking for a first overview of AI or a prompt-engineering-only guide, this book will probably feel denser than necessary. It is intentionally written for readers who want depth.
A Living Document
AI changes extremely quickly, so some details in a project like this may need revision as new papers, systems, and products appear. If you spot an outdated section, an awkward explanation, a typo, or a better reference, contributions are always welcome.
Pull requests that improve accuracy, pedagogy, examples, localization, or overall clarity are appreciated. The goal is for this to remain a useful long-term resource, not a frozen snapshot.