AI Safety
← Back to Responsible AI
Ensuring AI systems behave as intended and do not cause harm. Covers the alignment problem (making AI pursue intended goals), reward hacking (gaming the reward signal), specification gaming (exploiting loopholes), and alignment techniques (RLHF, Constitutional AI).
Related
- RLHF and Alignment (alignment technique)
- Robustness (safety requires robustness)