Intuition of PG
Comparison to maximum likelihood
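Policy gradient:

$$\nabla_\theta J(\theta) \approx \frac{1}{N} \sum_{i=1}^{N} \left( \sum_{t=1}^{T} \nabla_\theta \log \pi_\theta(a_{i,t} \mid s_{i,t}) \right) \left( \sum_{t=1}^{T} r(s_{i,t}, a_{i,t}) \right)$$

Maximum likelihood:

$$\nabla_\theta J_{\mathrm{ML}}(\theta) \approx \frac{1}{N} \sum_{i=1}^{N} \left( \sum_{t=1}^{T} \nabla_\theta \log \pi_\theta(a_{i,t} \mid s_{i,t}) \right)$$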
The only difference is that the policy gradient update weights each trajectory's log-likelihood gradient by its reward sum $\sum_t r(s_{i,t}, a_{i,t})$, so good trajectories (high reward sum) are made more likely and bad trajectories (low reward sum) are made less likely. In short, the algorithm simply formalizes the notion of "trial and error".
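To make the comparison concrete, here is a minimal NumPy sketch of both gradient estimates. It assumes a tabular softmax policy with one logit per (state, action) pair, which is an illustrative choice and not something the text specifies. The helper names (`grad_log_pi`, `ml_gradient`, `pg_gradient`) and the toy trajectories are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array of logits.
    z = np.exp(x - np.max(x))
    return z / z.sum()

def grad_log_pi(theta, s, a):
    # Gradient of log pi_theta(a | s) w.r.t. theta for a tabular softmax
    # policy (assumed parameterization): one_hot(a) - pi(. | s) in row s.
    g = np.zeros_like(theta)
    g[s] = -softmax(theta[s])
    g[s, a] += 1.0
    return g

def ml_gradient(theta, trajectories):
    # Maximum likelihood: every trajectory counts equally.
    grads = [sum(grad_log_pi(theta, s, a) for s, a, _ in traj)
             for traj in trajectories]
    return np.mean(grads, axis=0)

def pg_gradient(theta, trajectories):
    # Policy gradient: same log-likelihood gradient, weighted by reward sum.
    grads = []
    for traj in trajectories:
        score = sum(grad_log_pi(theta, s, a) for s, a, _ in traj)
        reward_sum = sum(r for _, _, r in traj)
        grads.append(reward_sum * score)
    return np.mean(grads, axis=0)

# Toy data: one state, two actions, each trajectory is a list of (s, a, r).
theta = np.zeros((1, 2))
trajectories = [
    [(0, 0, 1.0)],   # action 0 was rewarded
    [(0, 1, -1.0)],  # action 1 was penalized
]

g_ml = ml_gradient(theta, trajectories)
g_pg = pg_gradient(theta, trajectories)
print("maximum likelihood gradient:", g_ml)
print("policy gradient:", g_pg)
```

In this toy case the maximum likelihood gradient cancels out because both actions were sampled equally often, while the policy gradient points toward action 0, the one that earned the higher reward.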