PG in practice
Last updated
Last updated
Policy gradient with automatic differentiation: ignored.
Gradient has high variance and will be noisy. Because this isn't the same as supervised learning.
Consider using much larger batches.
Tweaking learning rates is very hard, and adaptive step size rules like ADAM can be OK-ish.