States, Actions, and Rewards

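Reinforcement learning (RL) has three core components: states, which describe the environment's current situation; actions, which are the choices available to the agent; and rewards, which are the scalar feedback the agent receives after acting. Q-learning is a value-based method: it learns an action-value function Q(s, a), an estimate of the expected return from taking action a in state s. Policy gradient methods instead optimize the policy's parameters directly.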
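To make the roles of states, actions, and rewards concrete, here is a minimal sketch of tabular Q-learning. The environment is a hypothetical five-state chain invented for this illustration: the agent starts at state 0, can move left or right, and receives a reward of 1 only on reaching the last state. The names `step`, `q_learning`, and all hyperparameter values are assumptions, not part of any library.

```python
import random

N_STATES = 5          # states 0..4; state 4 is terminal (hypothetical toy chain)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment dynamics: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0   # reward only for reaching the goal
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a table Q[state][action] with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)          # explore
            else:
                best = max(q[state])                  # exploit, random tie-break
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best next-state value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
# Greedy action for each non-terminal state.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a])
                 for s in range(N_STATES - 1)]
```

In this toy setting, `greedy_policy` should select action 1 (move right) in every non-terminal state, since that is the only way to reach the reward. A policy gradient method would skip the table and adjust the action probabilities directly.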

ml reinforcement-learning mdp