MDP Definition
Terminology and Notation
Let's begin with the Markov decision process (MDP).

We choose an action $a_t$ at time $t$ after seeing observation $o_t$; the latent state then transitions from $s_t$ to $s_{t+1}$, and we receive a reward $r(s_t, a_t)$ from the environment.
Definitions
Fully Observed
Markov decision process $\mathcal{M} = \{\mathcal{S}, \mathcal{A}, \mathcal{T}, r\}$
$\mathcal{S}$: state space; states $s \in \mathcal{S}$ (discrete or continuous)
$\mathcal{A}$: action space; actions $a \in \mathcal{A}$ (discrete or continuous)
$\mathcal{T}$: transition operator, a tensor: $\mathcal{T}_{i,j,k} = p(s_{t+1} = i \mid s_t = j, a_t = k)$
$r$: reward function; $r: \mathcal{S} \times \mathcal{A} \rightarrow \mathbb{R}$
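To make these pieces concrete, here is a minimal sketch of a tabular, fully observed MDP in Python. The state/action counts, transition probabilities, and rewards are made-up illustration values, not taken from the notes; the point is just how $\mathcal{T}$ can be stored as a tensor and $r$ as a table over $(s, a)$.

```python
import numpy as np

# Minimal sketch of a tabular, fully observed MDP (sizes and numbers are
# illustrative assumptions, not taken from the notes).
n_states, n_actions = 2, 2

# Transition operator as a tensor: T[i, j, k] = p(s_{t+1} = i | s_t = j, a_t = k).
# Each column over i sums to 1.
T = np.zeros((n_states, n_states, n_actions))
T[:, 0, 0] = [0.9, 0.1]
T[:, 0, 1] = [0.2, 0.8]
T[:, 1, 0] = [0.5, 0.5]
T[:, 1, 1] = [0.0, 1.0]

# Reward function r: S x A -> R, stored as a table indexed by (s, a).
r = np.array([[0.0, 1.0],
              [0.5, 2.0]])

rng = np.random.default_rng(0)

def step(s, a):
    """Sample s_{t+1} ~ p(. | s_t = s, a_t = a) and return it with r(s, a)."""
    s_next = rng.choice(n_states, p=T[:, s, a])
    return s_next, r[s, a]

s = 0
s_next, reward = step(s, a=1)
print(s_next, reward)
```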
Partially Observed
partially observed Markov decision process $\mathcal{M} = \{\mathcal{S}, \mathcal{A}, \mathcal{O}, \mathcal{T}, \mathcal{E}, r\}$
$\mathcal{S}$: state space; states $s \in \mathcal{S}$ (discrete or continuous)
$\mathcal{A}$: action space; actions $a \in \mathcal{A}$ (discrete or continuous)
$\mathcal{O}$: observation space; observations $o \in \mathcal{O}$ (discrete or continuous)
$\mathcal{T}$: transition operator, a tensor (as in the fully observed case)
$\mathcal{E}$: emission probability $p(o_t \mid s_t)$
$r$: reward function; $r: \mathcal{S} \times \mathcal{A} \rightarrow \mathbb{R}$
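The partially observed case adds one ingredient on top of the sketch above: an emission distribution that generates observations from latent states. Below is an illustrative extension with made-up numbers, where $\mathcal{E}$ is stored as a matrix $E[o, s] = p(o_t = o \mid s_t = s)$ and the agent receives $o_t$ instead of $s_t$.

```python
import numpy as np

# Minimal sketch of a tabular POMDP: same S, A, T, r as above, plus an emission
# distribution E (all numbers are illustrative assumptions).
n_states, n_actions, n_obs = 2, 2, 2

T = np.zeros((n_states, n_states, n_actions))  # T[i, j, k] = p(s'=i | s=j, a=k)
T[:, 0, 0] = [0.9, 0.1]
T[:, 0, 1] = [0.2, 0.8]
T[:, 1, 0] = [0.5, 0.5]
T[:, 1, 1] = [0.0, 1.0]

r = np.array([[0.0, 1.0],                      # r[s, a]
              [0.5, 2.0]])

# Emission probability: E[o, s] = p(o_t = o | s_t = s). The agent never sees
# the latent state s_t, only an observation o_t drawn from this distribution.
E = np.array([[0.8, 0.3],
              [0.2, 0.7]])

rng = np.random.default_rng(0)

def step(s, a):
    """Advance the latent state, then emit an observation and the reward."""
    s_next = rng.choice(n_states, p=T[:, s, a])
    o_next = rng.choice(n_obs, p=E[:, s_next])
    return s_next, o_next, r[s, a]

s = 0                       # latent state, hidden from the agent
s_next, o_next, reward = step(s, a=1)
print(o_next, reward)
```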