# RLHF

**Reinforcement learning from human feedback (RLHF)** is a widely used technique for aligning large language models (LLMs) with human values, preferences, and safety standards. Companies such as OpenAI, Anthropic, and Google use it to fine-tune their models on human judgment rather than on automated metrics alone.

## What is RLHF?

In a standard RLHF workflow:

1. A base AI model generates multiple candidate responses to a prompt.
2. Human trainers review those responses and rank or select the best one.
3. A **reward model** is trained on those human preferences.
4. The base model is updated with reinforcement learning so that it produces responses the reward model, and by extension human reviewers, would rate highly.

The intended result is a model that better matches human expectations: more helpful, more accurate, and safer.
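
Step 3 can be made concrete with a small sketch. This page does not say which training objective is used, so the example assumes one common formulation: a pairwise (Bradley-Terry style) loss that is low when the reward model scores the human-preferred response above the rejected one. The function names and the example scores are hypothetical.

```python
import math
from typing import List, Tuple


def pairwise_preference_loss(score_chosen: float, score_rejected: float) -> float:
    """Return -log(sigmoid(score_chosen - score_rejected)).

    Assumption: a Bradley-Terry style objective, one common choice for
    reward models. The scores are whatever the reward model outputs.
    """
    margin = score_chosen - score_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_loss(score_pairs: List[Tuple[float, float]]) -> float:
    """Average the loss over (chosen_score, rejected_score) pairs."""
    return sum(pairwise_preference_loss(c, r) for c, r in score_pairs) / len(score_pairs)


# Hypothetical reward-model scores for three human-labelled comparisons.
example_scores = [(2.0, 0.0), (0.5, 0.5), (-1.0, 1.0)]
print(round(mean_loss(example_scores), 4))
```

The loss shrinks when the reward model agrees with the annotator and grows when it ranks the rejected response higher, which is how human preferences become a training signal.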

## How Trainers Contribute

As a Folio trainer on RLHF projects, you mainly perform **preference annotation**: you review pairs or sets of AI responses and indicate which is better, and why.

Typical RLHF tasks include:

* **Response ranking** — Given two or more AI outputs, select the best response based on accuracy, helpfulness, and safety
* **Response rating** — Score a single response on defined criteria (e.g., 1–5 scale for medical accuracy)
* **Rationale writing** — Provide a brief written explanation for your ranking or rating
* **Failure identification** — Flag responses that are factually wrong, harmful, or incomplete
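
As an illustration of how these tasks might be recorded, the sketch below stores one annotation and expands a ranking of several responses into the (chosen, rejected) pairs that reward-model training typically consumes. The `PreferenceAnnotation` class, its field names, and `ranking_to_pairs` are hypothetical and do not describe Folio's actual data format.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Tuple


@dataclass
class PreferenceAnnotation:
    """One trainer judgment on a set of responses (hypothetical schema)."""
    prompt: str
    responses: List[str]
    ranking: List[int]                 # indices into `responses`, best first
    rationale: str = ""                # brief written explanation
    flags: List[str] = field(default_factory=list)  # e.g. "factually wrong"


def ranking_to_pairs(annotation: PreferenceAnnotation) -> List[Tuple[str, str]]:
    """Expand a full ranking into (chosen, rejected) response pairs."""
    ordered = [annotation.responses[i] for i in annotation.ranking]
    # combinations() preserves order, so the first item of each pair
    # is always the better-ranked response.
    return list(combinations(ordered, 2))


example = PreferenceAnnotation(
    prompt="Example prompt",
    responses=["Response A", "Response B", "Response C"],
    ranking=[2, 0, 1],  # C is best, then A, then B
    rationale="C includes the safety warning the others omit.",
    flags=["Response B: incomplete"],
)

for chosen, rejected in ranking_to_pairs(example):
    print(f"{chosen} > {rejected}")
```

A ranking of three responses yields three pairs, so a single ranking task can contribute more training comparisons than a single two-way choice.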

In healthcare AI contexts, your clinical expertise directly shapes the reward signal that trains the model. A physician ranking clinical summaries supplies expert judgment that automated metrics cannot readily capture.

## Why RLHF Matters in Healthcare

Healthcare AI systems carry high stakes. A model used for clinical decision support, patient education, or diagnostic assistance must produce accurate, safe responses. RLHF with domain-expert annotators is a key mechanism for reaching that standard.

Without expert human feedback, models may:

* Produce responses that sound medically plausible but are factually incorrect
* Omit critical safety warnings
* Fail to recognize rare but serious conditions
* Prioritize confident-sounding language over accuracy

Healthcare professionals on Folio are uniquely qualified to catch these failure modes.

## Course: RLHF 1

| Detail       | Info       |
| ------------ | ---------- |
| **Duration** | 15 minutes |
| **Lessons**  | 3          |
| **Format**   | Self-paced |

**RLHF 1** covers the foundational concepts of reinforcement learning from human feedback, explains how trainers fit into the RLHF pipeline, and walks through practical examples of preference annotation tasks.

### To access RLHF 1:

1. Go to **Learn** in the left sidebar.
2. Scroll to the **RLHF** section.
3. Click **Start Lesson** on the RLHF 1 card.

