# Safety and Red-Teaming

{% hint style="warning" %}
**This page is a draft.** Content is pending review and will be updated.
{% endhint %}

## Safety & Red-Teaming

The **Safety & Red-Teaming** track covers how healthcare professionals help identify and mitigate risks in AI systems — both in the content AI produces and in how AI systems respond to adversarial inputs.

### Content Safety vs. Response Safety

AI safety in healthcare spans two related dimensions:

#### Content Safety

Content safety focuses on ensuring that AI outputs do not contain harmful, misleading, or inappropriate information. In a healthcare context, this includes:

* Medically inaccurate claims that could mislead patients or clinicians
* Inappropriate clinical recommendations (e.g., suggesting contraindicated treatments)
* Failure to include necessary safety disclaimers
* Outputs that could cause psychological harm (e.g., detailed self-harm content)

#### Response Safety

Response safety focuses on how an AI system behaves when prompted in unusual or adversarial ways. A safe AI system should:

* Decline to produce genuinely dangerous content
* Maintain consistent behavior regardless of how a prompt is phrased
* Resist attempts to manipulate it into bypassing its own safety guidelines
* Handle sensitive healthcare topics (e.g., medications, procedures) responsibly

### What is Red-Teaming?

Red-teaming is a structured adversarial testing process where trainers deliberately attempt to elicit unsafe, harmful, or policy-violating outputs from an AI model. The goal is to find failure modes before deployment.

Red-teaming tasks may include:

* Crafting prompts designed to bypass safety guidelines
* Testing model behavior with edge-case clinical scenarios
* Identifying inconsistencies in how a model handles similar sensitive topics
* Documenting failure modes with supporting rationale

### Why Healthcare Professionals Are Essential for AI Safety Testing

General red-teamers can probe for generic harms. Healthcare professionals can probe for the specific, high-stakes failure modes that matter in clinical settings:

* **Clinical plausibility** — A response may pass a general safety check but contain a dosing error only a pharmacist would catch
* **Specialty-specific risks** — Radiology AI, psychiatric AI, and pediatric AI each have distinct failure modes requiring specialty expertise
* **Patient harm scenarios** — Anticipating how a patient might act on incorrect AI-generated advice requires clinical experience
* **Regulatory awareness** — Knowing which claims an AI system may and may not make under medical and regulatory standards

### Courses in This Track

| Course       | Topics                                                | Duration |
| ------------ | ----------------------------------------------------- | -------- |
| **Safety 1** | Content safety principles, identifying unsafe outputs | TBD      |
| **Safety 2** | Red-teaming methodology, adversarial prompting        | TBD      |

#### To access Safety & Red-Teaming courses:

1. Go to **Learn** in the left sidebar.
2. Scroll to the **Safety & Red-Teaming** section.
3. Click **Start Lesson** on any course card.
