Structure of RL algorithms

The anatomy of an RL algorithm

A simple example

Another example: RL by backprop

Comparison
