Training Economics

Engine Room Article 3: What Training Costs and What It Buys


The Cost Structure

Training a large language model from scratch involves substantial compute costs - tens of millions of dollars for frontier models. But compute is just one component.

You also need large, clean datasets (scarce for most domains), ML engineering talent comfortable with distributed training, infrastructure that can handle the workload, and time for iteration and debugging.

Compute costs get the headlines, but data quality, talent, and iteration time often determine whether training is viable.
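
For a rough sense of scale, a common rule of thumb puts dense-transformer training compute at about 6 × parameters × training tokens FLOPs. The sketch below turns that into a dollar figure; every number in it (model size, token count, throughput, price) is an illustrative assumption, not a quote for any real model or vendor.

```python
# Back-of-envelope training compute cost. All figures are
# illustrative assumptions, not real quotes.

params = 400e9               # assumed model parameters
tokens = 10e12               # assumed training tokens
flops = 6 * params * tokens  # ~6*N*D rule of thumb for dense transformers

sustained_flops = 300e12     # assumed sustained FLOP/s per GPU, after utilization losses
price_per_gpu_hour = 2.50    # assumed $/GPU-hour

gpu_hours = flops / sustained_flops / 3600
print(f"GPU-hours: {gpu_hours:,.0f}")                           # ~22 million
print(f"Compute cost: ${gpu_hours * price_per_gpu_hour:,.0f}")  # ~$56 million
```

Under these assumptions the compute bill alone lands in the tens of millions, before any of the data, talent, or iteration costs above.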

What Training Actually Produces

Training compresses patterns from data into model weights. The model learns statistical relationships between tokens - what tends to follow what, in what contexts.

This compression is lossy. Information gets generalized, blended, and sometimes lost. What emerges is capability within the distribution of the training data: the model is strongest on inputs that resemble what it has seen.
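
As a toy illustration of “what tends to follow what”, the snippet below builds bigram statistics from a handful of tokens. Real training learns far richer, context-dependent relationships in continuous weights, but the flavor - counting which tokens follow which - is the same.

```python
from collections import Counter, defaultdict

# Toy illustration: bigram statistics over a tiny corpus.
corpus = "the cat sat on the mat the cat ran".split()

following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

# P(next | "the"): counts normalized into probabilities.
total = sum(following["the"].values())
for word, count in following["the"].items():
    print(f"P({word!r} | 'the') = {count / total:.2f}")
# -> P('cat' | 'the') = 0.67
#    P('mat' | 'the') = 0.33
```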

The Fine-Tuning Option

Most organizations don’t need to train from scratch - they can adapt existing models through fine-tuning. This is more accessible, but comes with its own tradeoffs.

Fine-tuning adjusts weights to shift model behavior toward specific tasks. It works well for adapting style, format, and focus. It’s less effective at adding genuinely new knowledge or capabilities that weren’t in the base model.
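
A minimal sketch of what that weight adjustment looks like in practice, assuming PyTorch and the Hugging Face transformers library; the model name and the two example strings are placeholders, not recommendations:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder: stand-in for any causal LM you can fine-tune
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

examples = [  # tiny placeholder dataset in the target style/format
    "Q: What is our refund window? A: 30 days from delivery.",
    "Q: Do you ship internationally? A: Yes, to most regions.",
]

model.train()
for epoch in range(3):
    for text in examples:
        batch = tokenizer(text, return_tensors="pt")
        # For causal LMs, passing labels = input_ids makes the model
        # compute next-token cross-entropy loss internally.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

The loop nudges existing weights toward the examples’ style and format - which is exactly why it adapts behavior well but can’t conjure knowledge the base model never saw.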

The RAG Alternative

Retrieval-Augmented Generation (RAG) takes a different approach: instead of encoding knowledge in weights, it keeps knowledge external and retrieves relevant information at query time.

This is often more practical for proprietary knowledge - cheaper than training, easier to update, more controllable. But it introduces different challenges: retrieval quality, context limits, and the complexity of knowing what to retrieve.
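
A minimal sketch of the retrieve-then-generate loop; embed() and generate() are hypothetical stand-ins for whatever embedding model and LLM you actually use:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in: return a vector for `text` from an embedding model."""
    raise NotImplementedError

def generate(prompt: str) -> str:
    """Stand-in: call an LLM with the assembled prompt."""
    raise NotImplementedError

documents = ["...proprietary doc 1...", "...proprietary doc 2..."]
doc_vectors = [embed(d) for d in documents]

def answer(question: str, k: int = 2) -> str:
    q = embed(question)
    # Rank documents by cosine similarity to the query.
    scores = [float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
              for v in doc_vectors]
    top = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)[:k]
    context = "\n\n".join(documents[i] for i in top)
    # Knowledge stays external: it is injected at query time, not trained in.
    return generate(f"Answer using only this context:\n{context}\n\nQ: {question}")
```

Because the knowledge lives in `documents` rather than in weights, updating it is an edit and a re-embed, not a training run - but answer quality now hinges on retrieval ranking and on how much context fits in the prompt.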


Training economics shape what’s viable. Understanding the full cost structure - not just compute - helps identify realistic approaches.

Related: 07-source—engine-room-series