AI Safety

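AI safety is the practice of ensuring that AI systems behave as intended and do not cause harm. This topic covers the alignment problem (getting a system to pursue the goals its designers actually intend), reward hacking (maximizing the reward signal without achieving the underlying objective), specification gaming (satisfying the literal specification through loopholes the designers did not anticipate), and alignment techniques such as reinforcement learning from human feedback (RLHF) and Constitutional AI.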
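A toy example may make reward hacking more concrete. The sketch below is illustrative only and assumes a made-up cleaning task: the names `simulate`, `honest_policy`, and `gaming_policy`, the action strings, and the reward rule are all hypothetical and do not come from any real system or library. The proxy reward pays one point per unit of mess cleaned, while the intended goal is a room with no mess left.

```python
# Toy illustration of reward hacking / specification gaming.
# Everything here is a hypothetical setup chosen for clarity.

def simulate(policy, steps=10, initial_mess=3):
    """Run a policy for a fixed number of steps.

    Returns (proxy_reward, remaining_mess).
    """
    mess = initial_mess
    proxy_reward = 0
    for _ in range(steps):
        action = policy(mess)
        if action == "clean" and mess > 0:
            mess -= 1
            proxy_reward += 1  # reward is paid per unit of mess cleaned
        elif action == "make_mess":
            mess += 1          # loophole: the reward never penalizes this
        # any other action (e.g. "idle") changes nothing
    return proxy_reward, mess


def honest_policy(mess):
    """Clean until the room is tidy, then stop."""
    return "clean" if mess > 0 else "idle"


def gaming_policy(mess):
    """Create new mess whenever the room is tidy, so there is always
    something to clean and more reward to collect."""
    return "clean" if mess > 0 else "make_mess"


if __name__ == "__main__":
    for name, policy in [("honest", honest_policy), ("gaming", gaming_policy)]:
        reward, remaining = simulate(policy)
        print(f"{name}: proxy reward = {reward}, remaining mess = {remaining}")
```

Under these assumptions the gaming policy collects more proxy reward than the honest one while leaving the room less tidy. That gap between the measured reward and the intended outcome is what alignment techniques aim to close.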


responsible-ai ai-safety alignment