MDP Definition

Terminology and Notation

Let's begin with the Markov decision process (MDP).

MDP Notation

We choose action $a_t$ at time $t$ after seeing observation $o_t$; the latent state $s_t$ then transitions to $s_{t+1}$, and we receive reward $r(s_t, a_t)$ from the environment.
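This interaction loop can be sketched in a few lines of Python. The toy environment below is purely illustrative (its two states, trivial policy, and method names are assumptions, not part of the definition): the agent sees $o_t$, picks $a_t$, and the environment updates its latent state and emits $r(s_t, a_t)$.

```python
import random

# Toy environment with a latent state (illustrative; not a real library API).
class ToyEnv:
    def __init__(self):
        self.state = 0  # latent state s_t (hidden from the agent in a POMDP)

    def observe(self):
        # Emit observation o_t; here it simply reveals the state.
        return self.state

    def step(self, action):
        # Reward r(s_t, a_t), then latent transition s_t -> s_{t+1}.
        reward = 1.0 if action == self.state else 0.0
        self.state = random.choice([0, 1])
        return reward

env = ToyEnv()
total = 0.0
for t in range(5):
    o_t = env.observe()     # see observation o_t
    a_t = o_t               # choose action a_t (a trivial policy)
    total += env.step(a_t)  # environment transitions and returns r(s_t, a_t)
```

Because this toy policy always matches the (fully revealed) state, it collects the maximum reward at every step.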

Definitions

Fully Observed

Markov decision process $\mathcal{M}=\{\mathcal{S}, \mathcal{A}, \mathcal{T}, r\}$

  • $\mathcal{S}$ : state space; states $s\in\mathcal{S}$ (discrete or continuous)

  • $\mathcal{A}$ : action space; actions $a\in\mathcal{A}$ (discrete or continuous)

  • $\mathcal{T}$ : transition operator, a tensor encoding $p(s_{t+1}\mid s_t, a_t)$

  • $r$ : reward function; $r:\mathcal{S}\times\mathcal{A}\to \mathbb{R}$, giving reward $r(s_t, a_t)$
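For a discrete MDP, the transition operator and reward function can be written down concretely as arrays. The sketch below assumes a made-up 2-state, 2-action MDP (all probabilities and rewards are illustrative), storing $\mathcal{T}$ as a tensor `T[s_next, s, a]` $= p(s_{t+1}\mid s_t, a_t)$:

```python
import numpy as np

# Illustrative discrete MDP with |S| = 2 states and |A| = 2 actions.
# T[s_next, s, a] = p(s_{t+1} = s_next | s_t = s, a_t = a)
T = np.array([
    [[0.9, 0.2],   # p(s'=0 | s=0, a=0), p(s'=0 | s=0, a=1)
     [0.5, 0.1]],  # p(s'=0 | s=1, a=0), p(s'=0 | s=1, a=1)
    [[0.1, 0.8],   # p(s'=1 | s=0, a=0), p(s'=1 | s=0, a=1)
     [0.5, 0.9]],  # p(s'=1 | s=1, a=0), p(s'=1 | s=1, a=1)
])
# r[s, a] = r(s_t = s, a_t = a)
r = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# Summing over s_next must give 1 for every (s, a) pair.
assert np.allclose(T.sum(axis=0), 1.0)

# Sample s_{t+1} given s_t = 0, a_t = 1.
rng = np.random.default_rng(0)
s_next = rng.choice(2, p=T[:, 0, 1])
```

Storing $\mathcal{T}$ as an explicit tensor is only feasible for small discrete spaces; for continuous $\mathcal{S}$ or $\mathcal{A}$ the transition distribution is represented implicitly (e.g. by a simulator or a learned model).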

Partially Observed

Partially observed Markov decision process $\mathcal{M}=\{\mathcal{S}, \mathcal{A}, \mathcal{O}, \mathcal{T}, \mathcal{E}, r\}$

  • $\mathcal{S}$ : state space; states $s\in\mathcal{S}$ (discrete or continuous)

  • $\mathcal{A}$ : action space; actions $a\in\mathcal{A}$ (discrete or continuous)

  • $\mathcal{O}$ : observation space; observations $o\in\mathcal{O}$ (discrete or continuous)

  • $\mathcal{T}$ : transition operator, a tensor encoding $p(s_{t+1}\mid s_t, a_t)$

  • $\mathcal{E}$ : emission probability $p(o_t\mid s_t)$

  • $r$ : reward function; $r:\mathcal{S}\times\mathcal{A}\to \mathbb{R}$, giving reward $r(s_t, a_t)$
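In the discrete case, the emission probability $\mathcal{E}$ is just another stochastic matrix. A minimal sketch, with made-up numbers for a 2-state, 2-observation POMDP:

```python
import numpy as np

# Illustrative emission matrix: E[o, s] = p(o_t = o | s_t = s).
# Each column is a distribution over observations for a fixed state.
E = np.array([
    [0.8, 0.3],  # p(o=0 | s=0), p(o=0 | s=1)
    [0.2, 0.7],  # p(o=1 | s=0), p(o=1 | s=1)
])
assert np.allclose(E.sum(axis=0), 1.0)

# The agent never sees s_t directly; it only samples o_t ~ p(o_t | s_t).
rng = np.random.default_rng(1)
s_t = 1
o_t = rng.choice(2, p=E[:, s_t])
```

Noisy emissions are exactly what makes the problem partially observed: different latent states can produce the same observation, so $o_t$ alone does not determine $s_t$.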
