Intuition of PG
Comparison to maximum likelihood
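Policy gradient:
$$\nabla_\theta J(\theta) \approx \frac{1}{N} \sum_{i=1}^{N} \left( \sum_{t=1}^{T} \nabla_\theta \log \pi_\theta(a_{i,t} \mid s_{i,t}) \right) \left( \sum_{t=1}^{T} r(s_{i,t}, a_{i,t}) \right)$$
Maximum likelihood:
$$\nabla_\theta J_{\mathrm{ML}}(\theta) \approx \frac{1}{N} \sum_{i=1}^{N} \left( \sum_{t=1}^{T} \nabla_\theta \log \pi_\theta(a_{i,t} \mid s_{i,t}) \right)$$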
The only difference is that the policy gradient update weights each sampled trajectory's term by its total reward, $\sum_{t=1}^{T} r(s_{i,t}, a_{i,t})$. Good trajectories (with a high reward sum) are therefore made more likely, while bad trajectories (with a low reward sum) are made less likely. In short, the algorithm simply formalizes the notion of "trial and error".
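To make the comparison concrete, here is a minimal NumPy sketch of both estimators. It rests on assumptions that are not in the original notes: a stateless softmax policy over two actions parameterized directly by its logits, trajectories stored as lists of `(action, reward)` pairs, and the illustrative helper names `grad_log_softmax`, `ml_gradient`, and `pg_gradient`.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D array of logits.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def grad_log_softmax(logits, action):
    # Gradient of log pi(action) w.r.t. the logits: onehot(action) - pi.
    g = -softmax(logits)
    g[action] += 1.0
    return g

def ml_gradient(logits, trajectories):
    # Maximum likelihood: average over trajectories of sum_t grad log pi(a_t).
    grads = []
    for traj in trajectories:
        score = sum(grad_log_softmax(logits, a) for a, _ in traj)
        grads.append(score)
    return np.mean(grads, axis=0)

def pg_gradient(logits, trajectories):
    # Policy gradient: the same per-trajectory term, weighted by the reward sum.
    grads = []
    for traj in trajectories:
        score = sum(grad_log_softmax(logits, a) for a, _ in traj)
        total_reward = sum(r for _, r in traj)
        grads.append(score * total_reward)
    return np.mean(grads, axis=0)

# Hypothetical data: one "good" trajectory (always action 0, reward +1 per step)
# and one "bad" trajectory (always action 1, reward -1 per step).
logits = np.zeros(2)  # uniform policy: pi = [0.5, 0.5]
trajectories = [
    [(0, 1.0), (0, 1.0)],
    [(1, -1.0), (1, -1.0)],
]

ml = ml_gradient(logits, trajectories)  # [0, 0]: both trajectories are imitated equally
pg = pg_gradient(logits, trajectories)  # [2, -2]: action 0 pushed up, action 1 pushed down
print("maximum likelihood:", ml)
print("policy gradient:   ", pg)
```

With these toy samples the maximum likelihood gradient cancels out, because it tries to raise the probability of every observed action regardless of outcome, while the reward-weighted version moves the policy toward the action that appeared in the high-reward trajectory.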