# Safety and Red-Teaming

{% hint style="warning" %}
**This page is a draft.** Content is pending review and will be updated.
{% endhint %}

## Safety & Red-Teaming

The **Safety & Red-Teaming** track covers how healthcare professionals help identify and mitigate risks in AI systems, both in the content an AI system produces and in how it responds to adversarial inputs.

### Content Safety vs. Response Safety

AI safety in healthcare spans two related dimensions:

#### Content Safety

Content safety focuses on ensuring that AI outputs do not contain harmful, misleading, or inappropriate information. In a healthcare context, this includes:

* Medically inaccurate claims that could mislead patients or clinicians
* Inappropriate clinical recommendations (e.g., suggesting contraindicated treatments)
* Failure to include necessary safety disclaimers
* Outputs that could cause psychological harm (e.g., detailed self-harm content)

#### Response Safety

Response safety focuses on how an AI system behaves when prompted in unusual or adversarial ways. A safe AI system should:

* Decline to produce genuinely dangerous content
* Maintain consistent behavior regardless of how a prompt is phrased
* Not be manipulated into bypassing its own safety guidelines
* Handle sensitive healthcare topics (e.g., medications, procedures) responsibly
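
The second expectation above, consistent behavior across phrasings, is one that can be partly checked in code. The following is a minimal Python sketch under stated assumptions, not a description of any platform tooling: `model` is any function that takes a prompt and returns a response, and `REFUSAL_MARKERS`, `is_refusal`, and `inconsistent_paraphrases` are hypothetical names introduced for illustration. Matching on fixed phrases is a crude stand-in for a real refusal classifier.

```python
from typing import Callable, Iterable

# Hypothetical phrases used to guess whether a response is a refusal.
# A real evaluation would need a more reliable classifier than this.
REFUSAL_MARKERS = ("i can't help", "i cannot help", "unable to assist")


def is_refusal(response: str) -> bool:
    """Return True if the response contains any of the refusal markers."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def inconsistent_paraphrases(
    model: Callable[[str], str], paraphrases: Iterable[str]
) -> list[str]:
    """Return the paraphrases whose refuse/answer outcome differs from the majority.

    All paraphrases are assumed to express the same underlying request,
    so a safe model should treat them the same way.
    """
    outcomes = {p: is_refusal(model(p)) for p in paraphrases}
    if not outcomes:
        return []
    refusals = sum(outcomes.values())
    majority_refuses = refusals * 2 >= len(outcomes)  # ties count as "refuse"
    return [p for p, refused in outcomes.items() if refused != majority_refuses]
```

Any paraphrase returned by `inconsistent_paraphrases` is a candidate for manual review: it marks a phrasing where the model's behavior diverged from how it handled the same request elsewhere.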

### What Is Red-Teaming?

Red-teaming is a structured adversarial testing process in which trainers deliberately try to elicit unsafe, harmful, or policy-violating outputs from an AI model. The goal is to find failure modes before deployment, so that they can be mitigated.

Red-teaming tasks may include:

* Crafting prompts designed to bypass safety guidelines
* Testing model behavior with edge-case clinical scenarios
* Identifying inconsistencies in how a model handles similar sensitive topics
* Documenting failure modes with supporting rationale
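
To make the last task concrete, a documented failure mode can be thought of as a small structured record. The sketch below is illustrative only: the `RedTeamFinding` class, its field names, and the `Severity` levels are assumptions made for this example and do not reflect the actual submission format used in the courses.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(str, Enum):
    """Hypothetical severity scale for a finding."""
    LOW = "low"
    MODERATE = "moderate"
    HIGH = "high"


@dataclass
class RedTeamFinding:
    """One documented failure mode (all field names are illustrative)."""
    prompt: str          # the input that triggered the failure
    model_output: str    # what the model actually produced
    failure_mode: str    # short label, e.g. "missing safety disclaimer"
    rationale: str       # why this output is unsafe, in the reviewer's words
    severity: Severity
    tags: list[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        """A finding is usable only if every text field is non-blank."""
        required = (self.prompt, self.model_output, self.failure_mode, self.rationale)
        return all(value.strip() for value in required)
```

Keeping the rationale as a required field reflects the point above: a failure mode without supporting reasoning is hard for others to verify or act on.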

### Why Healthcare Professionals Are Essential for AI Safety Testing

General red-teamers can probe for generic harms. Healthcare professionals can probe for the specific, high-stakes failure modes that matter in clinical settings:

* **Clinical plausibility**: A response may pass a general safety check but contain a dosing error that only a pharmacist would catch.
* **Specialty-specific risks**: Radiology AI, psychiatric AI, and pediatric AI each have distinct failure modes that require specialty expertise to recognize.
* **Patient harm scenarios**: Anticipating how a patient might act on incorrect AI-generated advice requires clinical experience.
* **Regulatory awareness**: Knowing which claims an AI system can and cannot appropriately make under medical and regulatory standards requires familiarity with those standards.

### Courses in This Track

| Course       | Topics                                                | Duration |
| ------------ | ----------------------------------------------------- | -------- |
| **Safety 1** | Content safety principles, identifying unsafe outputs | TBD      |
| **Safety 2** | Red-teaming methodology, adversarial prompting        | TBD      |

#### To access Safety & Red-Teaming courses:

1. Go to **Learn** in the left sidebar.
2. Scroll to the **Safety & Red-Teaming** section.
3. Click **Start Lesson** on any course card.