Reinforcement Learning

In reinforcement learning, an agent learns to make decisions by interacting with an environment and receiving reward signals, with the goal of maximizing cumulative reward over time. It is used in robotics, game playing, and LLM alignment (RLHF).

Key Properties

Core Components

  • Agent — the learner/decision maker
  • Environment — what the agent interacts with
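  • State — the current situation of the environment as observed by the agent
  • Action — a choice available to the agent in a given state
  • Reward — scalar feedback signal received after an action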
  • Policy — strategy mapping states to actions
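
To make the components above concrete, here is a minimal sketch of the agent-environment loop. The Corridor environment, random_policy, and run_episode are hypothetical names invented for this illustration and do not come from any library. The environment is assumed to be a five-cell corridor where the agent starts at position 0 and earns a reward of 1 on reaching position 4.

```python
import random


class Corridor:
    """Hypothetical toy environment: positions 0..length-1, goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        # Start every episode at the left end and return the initial state.
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the position.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def random_policy(state):
    # A policy maps a state to an action; this one ignores the state entirely.
    return random.choice([0, 1])


def run_episode(env, policy, max_steps=100):
    # The agent observes a state, acts, and receives a reward until the episode ends.
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```

Calling run_episode(Corridor(), random_policy) plays one episode with the random policy. Swapping in a different function for the policy argument changes the agent's behavior without touching the environment, which is the separation the component list describes.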

Key Algorithms

  • Q-Learning, Deep Q-Networks (DQN)
  • Policy Gradient methods (REINFORCE, PPO, A3C)
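
As an illustration of the first family, below is a minimal sketch of tabular Q-learning with epsilon-greedy exploration. The five-state corridor task, the step, train, and greedy_policy functions, and the hyperparameter values (alpha, gamma, epsilon, episode count) are all assumptions chosen for this example. DQN replaces the table with a neural network and is not shown here.

```python
import random

N_STATES = 5      # assumed toy task: positions 0..4, position 4 is the goal
ACTIONS = [0, 1]  # 0 = move left, 1 = move right


def step(state, action):
    # Deterministic transition; reward 1.0 only when the goal is reached.
    next_state = min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection, breaking ties at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best next-state value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    # Best-known action for each non-terminal state.
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

After training, greedy_policy(train()) returns one action per non-terminal state. On this task the learned values should favor moving right, since that is the only way to reach the reward.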

ml reinforcement-learning rlhf