MDP Definition
Last updated
Let's begin with the Markov decision process (MDP).
At each time step $t$, we observe $o_t$ and choose an action $a_t$; the latent state then transitions from $s_t$ to $s_{t+1}$, and we receive a reward $r(s_t, a_t)$ from the environment.
Markov decision process $\mathcal{M} = \{\mathcal{S}, \mathcal{A}, \mathcal{T}, r\}$
$\mathcal{S}$: state space; states $s \in \mathcal{S}$ (discrete or continuous)
$\mathcal{A}$: action space; actions $a \in \mathcal{A}$ (discrete or continuous)
$\mathcal{T}$: transition operator, a tensor: $\mathcal{T}_{i,j,k} = p(s_{t+1} = i \mid s_t = j, a_t = k)$
$r$: reward function; $r : \mathcal{S} \times \mathcal{A} \to \mathbb{R}$
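The definition above can be sketched concretely. Below is a minimal NumPy toy (the state/action counts, the random tables, and the `step` helper are all hypothetical choices for illustration, not part of the original notes): the transition operator is a normalized tensor $\mathcal{T}_{i,j,k} = p(s_{t+1}=i \mid s_t=j, a_t=k)$ and the reward is a table over state-action pairs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MDP sizes (assumption for illustration).
n_states, n_actions = 3, 2

# Transition operator: T[i, j, k] = p(s_{t+1} = i | s_t = j, a_t = k).
T = rng.random((n_states, n_states, n_actions))
T /= T.sum(axis=0, keepdims=True)  # columns must be valid distributions over next states

# Reward function r(s, a), stored as a lookup table.
r = rng.standard_normal((n_states, n_actions))

def step(s, a):
    """Sample s_{t+1} ~ p(. | s_t = s, a_t = a); return it with reward r(s, a)."""
    s_next = rng.choice(n_states, p=T[:, s, a])
    return s_next, r[s, a]

# Roll out a short trajectory under a uniformly random policy.
s, total_reward = 0, 0.0
for t in range(5):
    a = rng.integers(n_actions)
    s, reward = step(s, a)
    total_reward += reward
```

Note the normalization axis: summing `T` over its first index (next state) must give 1 for every `(s, a)` pair, which is what makes each slice `T[:, s, a]` a valid sampling distribution.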
partially observed Markov decision process $\mathcal{M} = \{\mathcal{S}, \mathcal{A}, \mathcal{O}, \mathcal{T}, \mathcal{E}, r\}$
$\mathcal{S}$: state space; states $s \in \mathcal{S}$ (discrete or continuous)
$\mathcal{A}$: action space; actions $a \in \mathcal{A}$ (discrete or continuous)
$\mathcal{O}$: observation space; observations $o \in \mathcal{O}$ (discrete or continuous)
$\mathcal{T}$: transition operator, a tensor
$\mathcal{E}$: emission probability $p(o_t \mid s_t)$
$r$: reward function; $r : \mathcal{S} \times \mathcal{A} \to \mathbb{R}$
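The only structural change from the MDP is the emission step: the agent never sees $s_t$, only an observation $o_t \sim p(o_t \mid s_t)$. A minimal NumPy sketch of that extra step (again with hypothetical toy sizes and random tables, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy POMDP sizes (assumption for illustration).
n_states, n_actions, n_obs = 3, 2, 4

# Transition operator and reward table, as in the MDP case.
T = rng.random((n_states, n_states, n_actions))
T /= T.sum(axis=0, keepdims=True)
r = rng.standard_normal((n_states, n_actions))

# Emission probability: E[o, s] = p(o_t = o | s_t = s).
E = rng.random((n_obs, n_states))
E /= E.sum(axis=0, keepdims=True)  # each column is a distribution over observations

def step(s, a):
    """Advance the latent state, but expose only the observation to the agent."""
    s_next = rng.choice(n_states, p=T[:, s, a])
    o_next = rng.choice(n_obs, p=E[:, s_next])
    return s_next, o_next, r[s, a]

# Roll out: the agent's view of the trajectory is (o_t, a_t, r_t), not s_t.
s = 0
o = rng.choice(n_obs, p=E[:, s])
for t in range(5):
    a = rng.integers(n_actions)
    s, o, reward = step(s, a)
```

A policy for this environment would have to map observation histories (or a belief over $\mathcal{S}$) to actions, since `s` is hidden from it.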