Other advantages
The critic's value estimate will bring bias, and the reward sum will bring variance.
Critic
+: lower variance
-: higher bias if value is wrong (it always is)
Monte Carlo
+: no bias
-: higher variance (because single-sample estimate)
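To make the tradeoff concrete, here is a minimal sketch of both estimators for a single finished episode. The function names, the default `gamma`, and the layout of `values` (one critic estimate per visited state, plus a trailing 0 for the terminal state) are assumptions made for illustration, not something the notes specify.

```python
def monte_carlo_advantage(rewards, values, gamma=0.99):
    """Discounted reward-to-go minus the baseline V(s_t).

    Uses only observed rewards, so it is unbiased, but it is a
    single-sample estimate and therefore has high variance.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running  # reward-to-go from step t
        advantages[t] = running - values[t]
    return advantages


def critic_advantage(rewards, values, gamma=0.99):
    """One-step estimate r_t + gamma * V(s_{t+1}) - V(s_t).

    Lower variance, but biased whenever the critic's values are wrong.
    """
    return [
        rewards[t] + gamma * values[t + 1] - values[t]
        for t in range(len(rewards))
    ]


# Hypothetical 3-step episode; values[3] = 0.0 is the terminal state.
rewards = [1.0, 1.0, 1.0]
values = [0.5, 0.5, 0.5, 0.0]
mc = monte_carlo_advantage(rewards, values, gamma=0.5)
td = critic_advantage(rewards, values, gamma=0.5)
```

The two estimates agree only at the last step, where there is nothing left to bootstrap from.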
Can we combine these two, to control the bias/variance tradeoff?
Use a weighted combination of n-step returns.
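One common way to write such a combination, given here as a sketch: the notes do not state the weights, so the exponentially decaying weights below (the choice made by generalized advantage estimation, with a hypothetical parameter $\lambda \in [0, 1)$) are an assumption. $\hat{A}_n$ denotes the advantage built from an n-step return and $\hat{V}$ the critic.

```latex
\hat{A}_n(s_t, a_t) = \sum_{t'=t}^{t+n-1} \gamma^{t'-t}\, r(s_{t'}, a_{t'})
                      + \gamma^{n}\, \hat{V}(s_{t+n}) - \hat{V}(s_t)

\hat{A}_{\mathrm{GAE}}(s_t, a_t) = \sum_{n=1}^{\infty} w_n\, \hat{A}_n(s_t, a_t),
\qquad w_n = (1 - \lambda)\, \lambda^{n-1}

% Equivalent form in terms of one-step TD errors:
\hat{A}_{\mathrm{GAE}}(s_t, a_t) = \sum_{t'=t}^{\infty} (\gamma\lambda)^{t'-t}\, \delta_{t'},
\qquad \delta_{t'} = r(s_{t'}, a_{t'}) + \gamma\, \hat{V}(s_{t'+1}) - \hat{V}(s_{t'})
```

Small $\lambda$ puts most weight on short returns (closer to the critic, more bias), while $\lambda$ near 1 puts more weight on long returns (closer to Monte Carlo, more variance).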
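Rewards far in the future shrink because of the discount factor γ, so we can cut the reward sum early, after n steps, and let the critic estimate the rest.
Choosing n > 1 often works better.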
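A minimal sketch of an advantage estimate that cuts the reward sum after n steps and bootstraps from the critic. The name `n_step_advantage` and the layout of `values` (one estimate per visited state, plus a trailing 0 for the terminal state) are assumptions for illustration.

```python
def n_step_advantage(rewards, values, n, gamma=0.99):
    """Sum n discounted rewards, then bootstrap with the critic's value."""
    T = len(rewards)
    advantages = []
    for t in range(T):
        end = min(t + n, T)  # cut after n steps, or at the end of the episode
        ret = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
        ret += gamma ** (end - t) * values[end]  # critic estimates the tail
        advantages.append(ret - values[t])
    return advantages


# Hypothetical 3-step episode; values[3] = 0.0 is the terminal state.
rewards = [1.0, 1.0, 1.0]
values = [0.5, 0.5, 0.5, 0.0]
one_step = n_step_advantage(rewards, values, n=1, gamma=0.5)
two_step = n_step_advantage(rewards, values, n=2, gamma=0.5)
full = n_step_advantage(rewards, values, n=3, gamma=0.5)
```

With `n=1` this reduces to the critic's one-step estimate, and with `n` at least the episode length it reduces to the Monte Carlo estimate, so `n` moves along the bias/variance tradeoff.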
Do we have to choose just one n? We can cut everywhere all at once.
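Cutting everywhere at once can be computed in a single backward pass. The sketch below assumes exponentially decaying weights over the cut points, as in generalized advantage estimation. The parameter name `lam`, its default, and the `values` layout (trailing 0 for the terminal state) are assumptions for illustration.

```python
def gae_advantage(rewards, values, gamma=0.99, lam=0.95):
    """Exponentially weighted mix of all n-step advantages.

    Accumulates one-step TD errors backward in time with decay gamma * lam.
    """
    T = len(rewards)
    advantages = [0.0] * T
    running = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# Hypothetical 3-step episode; values[3] = 0.0 is the terminal state.
rewards = [1.0, 1.0, 1.0]
values = [0.5, 0.5, 0.5, 0.0]
critic_like = gae_advantage(rewards, values, gamma=0.5, lam=0.0)
monte_carlo_like = gae_advantage(rewards, values, gamma=0.5, lam=1.0)
```

Setting `lam=0` recovers the one-step critic estimate and `lam=1` recovers the Monte Carlo estimate; values in between trade bias against variance.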