PG in practice
Policy gradient with automatic differentiation: ignored.
Gradient has high variance and will be noisy. Because this isn't the same as supervised learning.
Consider using much larger batches.
Tweaking learning rates is very hard, and adaptive step size rules like ADAM can be OK-ish.
Last updated
Was this helpful?