AI Safety

← Back to Responsible AI

Ensuring AI systems behave as intended and do not cause harm. Covers the alignment problem (making AI pursue intended goals), reward hacking (gaming the reward signal), specification gaming (exploiting loopholes), and alignment techniques (RLHF, Constitutional AI).

RLHF and Alignment (alignment technique)
Robustness (safety requires robustness)

responsible-ai ai-safety alignment

Software Engineering KB

Explorer

AI Safety

AI Safety

Graph View

Table of Contents

Backlinks

Software Engineering KB

Explorer

AI Safety

AI Safety

Related

Graph View

Table of Contents

Backlinks