Terminology and Notation
Let's begin with the Markov decision process (MDP).
At time step $t$, we see an observation $o_t$ and choose an action $a_t$; the latent state $s_t$ then transitions to $s_{t+1}$, and the environment returns a reward $r(s_t, a_t)$.
Markov decision process: $\mathcal{M} = \{\mathcal{S}, \mathcal{A}, \mathcal{T}, r\}$
$\mathcal{S}$ : state space; states $s \in \mathcal{S}$ (discrete or continuous)
$\mathcal{A}$ : action space; actions $a \in \mathcal{A}$ (discrete or continuous)
$\mathcal{T}$ : transition operator; for discrete states and actions it can be stored as a tensor $\mathcal{T}_{i,j,k} = p(s_{t+1} = i \mid s_t = j, a_t = k)$
$r$ : reward function; $r : \mathcal{S} \times \mathcal{A} \to \mathbb{R}$
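To make these objects concrete, here is a minimal Python sketch of a tabular MDP, with the transition operator stored as a tensor and the interaction loop from above run for a few steps. The 3-state, 2-action setup and all probabilities are made-up illustrative assumptions, not part of the definitions above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy discrete MDP: |S| = 3 states, |A| = 2 actions (sizes are illustrative).
n_states, n_actions = 3, 2

# Transition tensor T[s_next, s, a] = p(s_{t+1} = s_next | s_t = s, a_t = a).
T = rng.random((n_states, n_states, n_actions))
T /= T.sum(axis=0, keepdims=True)  # normalize so each (s, a) slice is a distribution

# Reward function r : S x A -> R, stored as a |S| x |A| table.
r = rng.standard_normal((n_states, n_actions))

# Interaction loop: observe the state, act, transition, collect reward.
s = 0
for t in range(5):
    a = rng.integers(n_actions)               # stand-in for a policy pi(a | s)
    reward = r[s, a]                          # reward r(s_t, a_t)
    s_next = rng.choice(n_states, p=T[:, s, a])  # sample s_{t+1}
    print(f"t={t}: s={s}, a={a}, r={reward:.2f}, s'={s_next}")
    s = s_next
```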
Partially Observed MDP
Partially observed Markov decision process: $\mathcal{M} = \{\mathcal{S}, \mathcal{A}, \mathcal{O}, \mathcal{T}, \mathcal{E}, r\}$
$\mathcal{S}$ : state space; states $s \in \mathcal{S}$ (discrete or continuous)
$\mathcal{A}$ : action space; actions $a \in \mathcal{A}$ (discrete or continuous)
$\mathcal{O}$ : observation space; observations $o \in \mathcal{O}$ (discrete or continuous)
$\mathcal{T}$ : transition operator, a tensor as in the fully observed case
$\mathcal{E}$ : emission probability $p(o_t \mid s_t)$
$r$ : reward function; $r : \mathcal{S} \times \mathcal{A} \to \mathbb{R}$
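The MDP sketch above extends to the partially observed case by adding an emission matrix: the agent now receives an observation $o_t$ sampled from $p(o_t \mid s_t)$ instead of the latent state itself. Sizes and distributions are again illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

n_states, n_actions, n_obs = 3, 2, 4  # illustrative sizes

# Transition tensor and reward table, as in the MDP sketch above.
T = rng.random((n_states, n_states, n_actions))
T /= T.sum(axis=0, keepdims=True)
r = rng.standard_normal((n_states, n_actions))

# Emission matrix E[o, s] = p(o_t = o | s_t = s).
E = rng.random((n_obs, n_states))
E /= E.sum(axis=0, keepdims=True)

s = 0
for t in range(5):
    o = rng.choice(n_obs, p=E[:, s])      # agent sees o_t, not the latent s_t
    a = rng.integers(n_actions)           # stand-in for a policy pi(a | o)
    reward = r[s, a]                      # reward still depends on the latent state
    s = rng.choice(n_states, p=T[:, s, a])
    print(f"t={t}: o={o}, a={a}, r={reward:.2f}")
```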