Structure of RL algorithms

The anatomy of an RL algorithm

A simple example

Another example: RL by backprop

Comparison

Which parts are expensive?
