The Training Pipeline

AI training happens in stages, each building on the last.

1. Pretraining

In pretraining, the model learns the statistical structure of language through self-supervised learning: predicting missing or next words across billions of text examples. This teaches it grammar, vocabulary, and how sentences fit together.

Who does this: Usually the companies creating the models. You might occasionally help by labeling or rating raw training data.
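The self-supervised objective above can be sketched in a few lines: raw text is sliced into (context, next word) pairs, and each pair becomes one training example. This is a minimal illustration only; the function name and window size are hypothetical, not from any real training stack.

```python
# A toy sketch of next-word prediction data: slide a window over the text,
# and each window of words becomes the context for predicting the word after it.
# (Illustrative only; real pipelines work on tokens, not whole words.)

def next_word_pairs(text, context_size=3):
    """Turn raw text into (context, next word) training pairs."""
    words = text.split()
    pairs = []
    for i in range(len(words) - context_size):
        context = words[i:i + context_size]      # the words the model sees
        target = words[i + context_size]         # the word it must predict
        pairs.append((context, target))
    return pairs

for context, target in next_word_pairs("the cat sat on the mat"):
    print(context, "->", target)
```

No labels are needed: the text itself supplies both the input and the answer, which is what makes this stage "self-supervised".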

2. Fine-Tuning

Now the model learns how to respond appropriately. It's given realistic prompts, such as user questions, coding tasks, and support tickets, each paired with an example of an ideal response. Training on these pairs teaches the model to produce genuinely helpful outputs, not just statistically likely ones.

Reinforcement Learning from Human Feedback (RLHF) is crucial here. Early models are often confident but wrong. Human feedback teaches them what answers are actually helpful, safe, and accurate.

This is where you’ll spend most of your time. You might evaluate responses, rank competing answers, flag misleading or biased content, and provide feedback that nudges the model toward higher quality.
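Ranking feedback like this is often converted into a pairwise training signal for a reward model (a Bradley-Terry style loss is one common formulation). The sketch below is illustrative, assuming the reward model's scores for two responses are already known; the numbers are made up.

```python
import math

# Toy sketch of a pairwise preference loss: when a human prefers response A
# over response B, the loss is small if the reward model already scores A
# higher, and large if it scores B higher. (Illustrative values only.)

def preference_loss(score_chosen, score_rejected):
    """Negative log-sigmoid of the score gap between chosen and rejected."""
    gap = score_chosen - score_rejected
    return -math.log(1 / (1 + math.exp(-gap)))

# Model already agrees with the human ranking: small loss.
print(round(preference_loss(2.0, 0.5), 4))   # -> 0.2014
# Model prefers the rejected answer: large loss, strong correction signal.
print(round(preference_loss(0.5, 2.0), 4))   # -> 1.7014
```

The asymmetry is the point: every human ranking pushes the reward model to score helpful answers above unhelpful ones, and that reward model in turn steers the main model.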

3. Continuous Improvement

Even after deployment, models encounter new edge cases, confusing prompts, or safety issues. These become new training examples for ongoing improvement.

You might help develop challenging test cases, evaluate how models handle tricky situations, and label results for additional training.
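The loop described above can be sketched as a simple filter: logged interactions that a reviewer flagged and corrected become new training examples. Everything here, field names and example data alike, is hypothetical.

```python
# Toy sketch of turning flagged production interactions into new training
# examples. The dictionary schema is invented for illustration.

def collect_training_examples(logged_interactions):
    """Keep only flagged interactions that have a reviewer-corrected response."""
    new_examples = []
    for item in logged_interactions:
        if item["flagged"] and item.get("corrected_response"):
            new_examples.append({
                "prompt": item["prompt"],
                "ideal_response": item["corrected_response"],
            })
    return new_examples

logs = [
    {"prompt": "What's 2+2?", "flagged": False},
    {"prompt": "Is this mushroom safe to eat?", "flagged": True,
     "corrected_response": "I can't reliably identify mushrooms from text alone."},
]
print(collect_training_examples(logs))
```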

Training isn't a one-time event: it's a continuous loop of data, feedback, and improved responses.
