Reinforcement Learning
An agent learns to make decisions by interacting with an environment and receiving reward signals. Reinforcement learning is used in robotics, game playing, and LLM alignment through reinforcement learning from human feedback (RLHF).
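In the standard formulation, the agent tries to maximize the cumulative discounted reward (the return) G_t = r_{t+1} + γ r_{t+2} + γ² r_{t+3} + …, where the discount factor γ in [0, 1] weights near-term rewards more heavily than distant ones. A minimal Python sketch for one finite episode follows; the function name `discounted_return` is illustrative and not taken from any library.

```python
def discounted_return(rewards: list[float], gamma: float = 0.99) -> float:
    """Compute G_0 = r_1 + gamma*r_2 + gamma^2*r_3 + ... for one finite episode."""
    g = 0.0
    # Walk the rewards backwards so each step applies G_t = r_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))  # 1 + 0.5*0 + 0.25*2 = 1.5
```

With γ close to 1 the agent values long-term reward almost as much as immediate reward; with γ = 0 it only cares about the next reward.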
Key Properties
Core Components
- Agent — the learner/decision maker
- Environment — what the agent interacts with
- State — current situation
- Action — what the agent can do
- Reward — feedback signal
- Policy — strategy mapping states to actions
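These components fit together in a loop: the policy picks an action from the current state, the environment returns the next state and a reward, and the process repeats until the episode ends. A minimal Python sketch, assuming a hypothetical five-cell corridor environment `LineWorld` (reward 1 only on reaching the right end) and a purely random policy; its `reset`/`step` methods loosely echo the pattern popularized by Gymnasium but are not that library's API.

```python
import random


class LineWorld:
    """Hypothetical toy environment: positions 0..size-1, goal at the right end."""

    def __init__(self, size: int = 5):
        self.size = size
        self.state = 0

    def reset(self) -> int:
        # Start every episode at the left end of the corridor.
        self.state = 0
        return self.state

    def step(self, action: int) -> tuple[int, float, bool]:
        # Action is -1 (left) or +1 (right); position is clipped to the corridor.
        self.state = min(max(self.state + action, 0), self.size - 1)
        done = self.state == self.size - 1
        reward = 1.0 if done else 0.0  # reward only on reaching the goal
        return self.state, reward, done


def random_policy(state: int, rng: random.Random) -> int:
    # A policy maps states to actions; this placeholder ignores the state entirely.
    return rng.choice([-1, 1])


def run_episode(env: LineWorld, policy, rng: random.Random, max_steps: int = 100) -> float:
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state, rng)             # agent chooses an action
        state, reward, done = env.step(action)  # environment returns next state and reward
        total_reward += reward                  # agent accumulates the feedback signal
        if done:
            break
    return total_reward


print(run_episode(LineWorld(), random_policy, random.Random(0)))
```

Learning algorithms replace `random_policy` with a policy that improves from the rewards it collects.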
Key Algorithms
- Q-Learning, Deep Q-Networks (DQN)
- Policy Gradient methods (REINFORCE, PPO, A3C)
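Of the algorithms listed, tabular Q-learning is the simplest to show end to end. The sketch below assumes a hypothetical five-state corridor (start at state 0, reward 1 on reaching state 4), an epsilon-greedy behavior policy with random tie-breaking, and illustrative hyperparameters `alpha`, `gamma`, and `epsilon`. It applies the standard update Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]. DQN replaces the table with a neural network, and policy gradient methods optimize the policy directly instead, so neither is shown here.

```python
import random

N_STATES = 5       # hypothetical corridor: states 0..4, goal at state 4
ACTIONS = (-1, 1)  # move left or right


def step(state: int, action: int) -> tuple[int, float, bool]:
    # Deterministic transition, clipped to the corridor; reward 1 only at the goal.
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def greedy(q: dict, state: int, rng: random.Random) -> int:
    # Pick a highest-valued action, breaking ties randomly so early episodes explore.
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0) -> dict:
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy behavior policy: explore with probability epsilon.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q, state, rng)
            next_state, reward, done = step(state, action)
            # Bootstrap from the best next action; terminal states have no future value.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            td_target = reward + gamma * best_next
            q[(state, action)] += alpha * (td_target - q[(state, action)])
            state = next_state
    return q


q = q_learning()
print({s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)})
```

After training, the greedy action in every non-goal state should be +1 (move right), which is the optimal policy for this corridor.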
Related
- Supervised Learning (contrast: learns from labeled input-output examples rather than from reward signals)