Discount factors

Infinite cases

When the horizon is infinite (continuous tasks), the undiscounted value-function target can grow without bound, so we introduce a discount factor $\gamma \in [0, 1)$ that makes rewards obtained sooner worth more than rewards obtained later. So the new target adds the discount in front of the bootstrapped value.
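A sketch of this target, in the standard notation where $\hat{V}^\pi_\phi$ is the fitted value function and $\gamma$ the discount factor (the symbols are assumed here, not quoted from the notes):

$$y_{i,t} \approx r(s_{i,t}, a_{i,t}) + \gamma \hat{V}^\pi_\phi(s_{i,t+1})$$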

Discount factors for policy gradient

In Monte Carlo policy gradients, there are two options for where the discount factor enters:

option 1: apply the discount relative to the current time step $t$, so each action is weighted by its discounted reward-to-go.

option 2: apply the discount from the start of the trajectory ($t = 1$), i.e. discount the full return.

Consider causality: applying causality to option 2 leaves an extra factor of $\gamma^{t-1}$ in front of each gradient term, so later time steps contribute less and less. Both estimators are written out below.
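As a sketch, in the standard Monte Carlo policy-gradient notation ($N$ sampled trajectories indexed by $i$, horizon $T$; this notation is assumed, not quoted from the notes), the two estimators and the causal form of option 2 are:

option 1 (discount relative to the current step):

$$\nabla_\theta J(\theta) \approx \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T} \nabla_\theta \log \pi_\theta(a_{i,t} \mid s_{i,t}) \left( \sum_{t'=t}^{T} \gamma^{t'-t}\, r(s_{i,t'}, a_{i,t'}) \right)$$

option 2 (discount from the start of the trajectory):

$$\nabla_\theta J(\theta) \approx \frac{1}{N} \sum_{i=1}^{N} \left( \sum_{t=1}^{T} \nabla_\theta \log \pi_\theta(a_{i,t} \mid s_{i,t}) \right) \left( \sum_{t'=1}^{T} \gamma^{t'-1}\, r(s_{i,t'}, a_{i,t'}) \right)$$

option 2 after applying causality, with the $\gamma^{t-1}$ factor pulled out:

$$\nabla_\theta J(\theta) \approx \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T} \gamma^{t-1}\, \nabla_\theta \log \pi_\theta(a_{i,t} \mid s_{i,t}) \left( \sum_{t'=t}^{T} \gamma^{t'-t}\, r(s_{i,t'}, a_{i,t'}) \right)$$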

In practice we use option 1. The reason we introduce the discount factor in the first place is to solve the infinity problem in continuous (infinite-horizon) cases, whereas the death model (option 2) only cares about the early steps of the episode: with $\gamma = 0.99$, the gradient term at step $t = 100$ is already scaled by $\gamma^{99} \approx 0.37$, and later steps are suppressed even further. What we actually want to approximate is the average reward without a discount; future rewards are simply more uncertain, so reducing their influence gradually also reduces the variance of the estimator.

Actor-critic algorithms (with discount)

batch version

batch actor-critic algorithm:

repeat until convergence:
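A sketch of the loop body, following the standard batch actor-critic algorithm with discount (the step size $\alpha$ and the advantage notation $\hat{A}^\pi$ are the usual ones, assumed here rather than quoted from the notes):

1. sample $\{s_i, a_i\}$ from $\pi_\theta(a \mid s)$ (run the current policy to collect trajectories)
2. fit the critic $\hat{V}^\pi_\phi(s)$ to the sampled reward sums (or to bootstrapped targets $r + \gamma \hat{V}^\pi_\phi(s')$)
3. evaluate $\hat{A}^\pi(s_i, a_i) = r(s_i, a_i) + \gamma \hat{V}^\pi_\phi(s_i') - \hat{V}^\pi_\phi(s_i)$
4. $\nabla_\theta J(\theta) \approx \sum_i \nabla_\theta \log \pi_\theta(a_i \mid s_i)\, \hat{A}^\pi(s_i, a_i)$
5. $\theta \leftarrow \theta + \alpha\, \nabla_\theta J(\theta)$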

online version

online actor-critic algorithm:

repeat until convergence:
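A sketch of the per-step loop body for the standard online version (same assumed notation as above):

1. take action $a \sim \pi_\theta(a \mid s)$, observe the transition $(s, a, s', r)$
2. update $\hat{V}^\pi_\phi$ toward the target $r + \gamma \hat{V}^\pi_\phi(s')$
3. evaluate $\hat{A}^\pi(s, a) = r + \gamma \hat{V}^\pi_\phi(s') - \hat{V}^\pi_\phi(s)$
4. $\nabla_\theta J(\theta) \approx \nabla_\theta \log \pi_\theta(a \mid s)\, \hat{A}^\pi(s, a)$
5. $\theta \leftarrow \theta + \alpha\, \nabla_\theta J(\theta)$

And a minimal runnable sketch of this loop in Python, assuming a toy 5-state chain environment, a tabular softmax actor, and a tabular critic (all illustrative choices, not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 5, 2            # toy chain MDP: action 0 = left, 1 = right
gamma = 0.99                          # discount factor
alpha_actor, alpha_critic = 0.1, 0.2  # step sizes (illustrative)

theta = np.zeros((n_states, n_actions))  # softmax policy parameters (the actor)
V = np.zeros(n_states)                   # tabular value function (the critic)

def policy(s):
    """pi_theta(a|s) as a softmax over theta[s]."""
    logits = theta[s] - theta[s].max()
    p = np.exp(logits)
    return p / p.sum()

def env_step(s, a):
    """Move left/right on the chain; the right end gives reward 1 and ends the episode."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    done = s_next == n_states - 1
    return s_next, reward, done

for episode in range(500):            # "repeat until convergence" (fixed budget here)
    s, done = 0, False
    while not done:
        # 1. take action a ~ pi_theta(a|s), observe (s, a, s', r)
        p = policy(s)
        a = rng.choice(n_actions, p=p)
        s_next, r, done = env_step(s, a)

        # 2./3. TD target and advantage estimate A(s,a) ~= r + gamma*V(s') - V(s)
        target = r + (0.0 if done else gamma * V[s_next])
        advantage = target - V[s]

        # critic update: move V(s) toward the TD target
        V[s] += alpha_critic * advantage

        # 4./5. actor update: theta[s] += alpha * grad log pi(a|s) * A(s,a)
        grad_log_pi = -p
        grad_log_pi[a] += 1.0            # softmax score function: e_a - pi(.|s)
        theta[s] += alpha_actor * grad_log_pi * advantage

        s = s_next

print("learned values:", np.round(V, 2))
print("P(right) per state:", np.round([policy(s)[1] for s in range(n_states)], 2))
```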
