MDP Definition

Terminology and Notation

Let's begin with the Markov Decision Process (MDP).

Definitions

Fully Observed

In the fully observed (MDP) setting, the agent sees the state $s_t$ directly, chooses action $a_t$, and the next state is drawn from $p(s_{t+1} \mid s_t, a_t)$.

Partially Observed

In the partially observed setting, we choose action $a_t$ at time $t$ after seeing observation $o_t$; the latent state $s_t$ then transitions to $s_{t+1}$, and the environment returns reward $r(s_t, a_t)$.

Markov decision process: $\mathcal{M}=\{\mathcal{S},\mathcal{A},\mathcal{T},r\}$

$\mathcal{S}$ : state space; states $s\in\mathcal{S}$ (discrete or continuous)

$\mathcal{A}$ : action space; actions $a\in\mathcal{A}$ (discrete or continuous)

$\mathcal{T}$ : transition operator; for discrete spaces it is a tensor with entries $\mathcal{T}_{i,j,k}=p(s_{t+1}=i \mid s_t=j, a_t=k)$

$r$ : reward function; $r(s_t,a_t):\mathcal{S}\times\mathcal{A}\to\mathbb{R}$
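
As a minimal sketch (not from the course notes), and assuming a small discrete state and action space, the transition operator $\mathcal{T}$ and the reward function of an MDP could be represented with NumPy arrays like this:

```python
# Hypothetical tabular MDP M = {S, A, T, r} with discrete states and actions.
# T[s', s, a] = p(s_{t+1} = s' | s_t = s, a_t = a); r[s, a] = reward for (s_t, a_t).
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 2

# Transition tensor: normalize so each (s, a) slice is a probability distribution over s'.
T = rng.random((n_states, n_states, n_actions))
T /= T.sum(axis=0, keepdims=True)

# Reward function r(s_t, a_t): S x A -> R, stored as a lookup table.
r = rng.standard_normal((n_states, n_actions))

def step(s, a):
    """Sample s_{t+1} ~ p(. | s_t = s, a_t = a) and return it with the reward r(s, a)."""
    s_next = rng.choice(n_states, p=T[:, s, a])
    return s_next, r[s, a]

# One environment step from state 0 with action 1.
s_next, reward = step(0, 1)
```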

Partially observed Markov decision process: $\mathcal{M}=\{\mathcal{S},\mathcal{A},\mathcal{O},\mathcal{T},\mathcal{E},r\}$

$\mathcal{S}$ : state space; states $s\in\mathcal{S}$ (discrete or continuous)

$\mathcal{A}$ : action space; actions $a\in\mathcal{A}$ (discrete or continuous)

$\mathcal{O}$ : observation space; observations $o\in\mathcal{O}$ (discrete or continuous)

$\mathcal{T}$ : transition operator, a tensor

$\mathcal{E}$ : emission probability $p(o_t\mid s_t)$

$r$ : reward function; $r(s_t,a_t):\mathcal{S}\times\mathcal{A}\to\mathbb{R}$
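
The POMDP extends the sketch above with an observation space and emission probabilities; the following is again a hypothetical illustration, assuming discrete states, actions, and observations, where the agent only ever receives $o_t$, not the latent state $s_t$:

```python
# Hypothetical tabular POMDP M = {S, A, O, T, E, r}.
# E[o, s] = p(o_t = o | s_t = s) is the emission probability.
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions, n_obs = 4, 2, 3

T = rng.random((n_states, n_states, n_actions))
T /= T.sum(axis=0, keepdims=True)          # p(s' | s, a)

E = rng.random((n_obs, n_states))
E /= E.sum(axis=0, keepdims=True)          # p(o | s)

r = rng.standard_normal((n_states, n_actions))

def step(s, a):
    """Advance the latent state, then emit an observation; the agent sees only (o_next, reward)."""
    s_next = rng.choice(n_states, p=T[:, s, a])
    o_next = rng.choice(n_obs, p=E[:, s_next])
    return s_next, o_next, r[s, a]
```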

MDP Notation