Week #7: Least Squares Estimation and GraphSLAM
Least Squares Estimation
Maximum Likelihood Estimation (MLE)
Maximum a Posteriori Estimation (MAP)
Bayesian Estimation
GraphSLAM
In the occupancy grid mapping problem we wanted to compute \(p(\mathbf{m}|\mathbf{z}_{1:t}, \mathbf{x}_{1:t})\) over all possible maps.
We can see this problem as a specific instance within a category of problems where we are given data (observations) and we want to “explain” or fit the data using a parametric function.
Example: we think that the 2D data was generated by a line \(z = \theta_{0} + \theta_{1}x\) whose parameters we do not know, and was corrupted by noise.
We are given data points \(\mathbf{(x_1, z_1), \dots, (x_N, z_N)}\)
We think that the data was generated by a parametric function \(\mathbf{z = h(\theta, x)}\)
This parametric model will have a fitting error: \[e(\theta)=\sum_{i=1}^{N}||\mathbf{z}_{i}-\mathbf{h}(\theta,\mathbf{x}_{i})||^{2}\]
The least-squares estimator is: \[\theta_{LS} = \underset{\theta}{\operatorname*{argmin}} e(\theta)\]
Now suppose the data was generated by a parametric function that is linear in the parameters, \(\mathbf{z = h(\theta, x) = H_x\theta}\), where \(\mathbf{H_x}\) is a matrix whose elements depend on \(\mathbf{x}\)
This parametric model will have a fitting error:
\[\begin{align} e(\theta) &= \sum_{i=1}^{N} ||\mathbf{z}_i - \mathbf{H}_{x_i} \theta||^2 \\ &= \sum_{i=1}^{N} \mathbf{z}_i^T \mathbf{z}_i - 2\theta^T \mathbf{H}_{x_i}^T \mathbf{z}_i + \theta^T \mathbf{H}_{x_i}^T \mathbf{H}_{x_i} \theta \end{align}\]
The least-squares estimator minimizes the error:
\[\frac{\partial e(\theta)}{\partial \theta} = 0 \Leftrightarrow -2\sum_{i=1}^{N} \mathbf{H}_{x_i}^T \mathbf{z}_i + 2\sum_{i=1}^{N}\mathbf{H}_{x_i}^T \mathbf{H}_{x_i} \theta = 0 \Leftrightarrow \left[\sum_{i=1}^{N} \mathbf{H}_{x_i}^T \mathbf{H}_{x_i}\right] \theta = \sum_{i=1}^{N} \mathbf{H}_{x_i}^T \mathbf{z}_i\]
\[\theta_{LS} = \operatorname*{argmin}_{\theta} e(\theta) \Leftrightarrow \left[\sum_{i=1}^{N} \mathbf{H}_{x_i}^T \mathbf{H}_{x_i}\right] \theta_{LS} = \sum_{i=1}^{N} \mathbf{H}_{x_i}^T \mathbf{z}_i\]
Example: we think that the 2D data was generated by a line \(z = \theta_{0} + \theta_{1}x\) whose parameters we do not know
We are given data points \((x_1, z_1), \dots, (x_N, z_N)\)
We think that the data was generated by a linear parametric function \(z = h(\mathbf{\theta}, x) = \begin{bmatrix} 1 & x \end{bmatrix}\mathbf{\theta} = \theta_{0} + \theta_{1}x\)
This parametric model will have a fitting error: \[e(\theta_{0},\theta_{1})=\sum_{i=1}^{N}(z_{i}-\theta_{0}-\theta_{1}x_{i})^{2}\]
The least-squares estimator minimizes the error:
\[\theta_{LS} = \operatorname*{argmin}_{\theta_0, \theta_1} e(\theta_0, \theta_1) \Leftrightarrow \left[\sum_{i=1}^{N} \begin{bmatrix} 1 \\ x_i \end{bmatrix} \begin{bmatrix} 1 & x_i \end{bmatrix}\right] \theta_{LS} = \sum_{i=1}^{N} \begin{bmatrix} 1 \\ x_i \end{bmatrix} z_i\]
This is a linear system of 2 equations. If we have at least two data points (with distinct \(x_i\)) we can solve for \(\theta_{LS}\) to define the line.
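As a sanity check, here is a minimal numpy sketch of the line fit above, accumulating \(\sum_i \mathbf{H}_{x_i}^T\mathbf{H}_{x_i}\) and \(\sum_i \mathbf{H}_{x_i}^T z_i\) exactly as in the normal equations (the synthetic data and noise level below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from a "true" line z = 1.0 + 2.0 x, corrupted by noise.
theta_true = np.array([1.0, 2.0])
x = rng.uniform(-5, 5, size=50)
z = theta_true[0] + theta_true[1] * x + rng.normal(0, 0.3, size=50)

# Accumulate the normal equations  [sum_i H_i^T H_i] theta = sum_i H_i^T z_i
A = np.zeros((2, 2))
b = np.zeros(2)
for xi, zi in zip(x, z):
    H_i = np.array([[1.0, xi]])     # 1x2 row: h(theta, x) = H_x theta
    A += H_i.T @ H_i                # 2x2 contribution
    b += (H_i.T * zi).ravel()       # 2-vector contribution H_i^T z_i

theta_ls = np.linalg.solve(A, b)
print(theta_ls)                     # close to [1.0, 2.0]
```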
Example: we think that the 2D data was generated by a quadratic \(z = \theta_{0} + \theta_{1}x + \theta_2x^2\) whose parameters we do not know.
We are given data points \((x_1, z_1), \dots, (x_N, z_N)\)
We think that the data was generated by a linear parametric function \(z = h(\mathbf{\theta},x) = \begin{bmatrix} 1 & x & x^{2} \end{bmatrix}\mathbf{\theta}=\theta_{0}+\theta_{1}x+\theta_{2}x^{2}\)
This parametric model will have a fitting error:
\[e(\theta_0, \theta_1, \theta_2) = \sum_{i=1}^{N} (z_i - \theta_0 - \theta_1 x_i - \theta_2 x_i^2)^2\]
The least-squares estimator minimizes the error: \[\theta_{LS} = \operatorname*{argmin}_{\theta_0, \theta_1, \theta_2} e(\theta_0, \theta_1, \theta_2) \Leftrightarrow \left[\sum_{i=1}^{N} \begin{bmatrix} 1 \\ x_i \\ x_i^2 \end{bmatrix} \begin{bmatrix} 1 & x_i & x_i^2 \end{bmatrix}\right] \theta_{LS} = \sum_{i=1}^{N} \begin{bmatrix} 1 \\ x_i \\ x_i^2 \end{bmatrix} z_i\]
This is a linear system of 3 equations. If we have at least three data points (with distinct \(x_i\)) we can solve for \(\theta_{LS}\) to define the quadratic.
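The same machinery works for any model that is linear in the parameters. A sketch for the quadratic case, stacking the rows \([1 \; x_i \; x_i^2]\) into a design matrix and comparing the normal-equation solution with numpy's generic least squares solver (again on made-up synthetic data):

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true = np.array([0.5, -1.0, 0.25])            # z = 0.5 - x + 0.25 x^2

x = rng.uniform(-4, 4, size=100)
z = np.polyval(theta_true[::-1], x) + rng.normal(0, 0.2, size=100)

# One row [1, x_i, x_i^2] per data point (the H_{x_i} of the slides).
H = np.column_stack([np.ones_like(x), x, x**2])

# Normal equations: (H^T H) theta = H^T z
theta_normal = np.linalg.solve(H.T @ H, H.T @ z)

# Same answer from a generic least squares solver.
theta_lstsq, *_ = np.linalg.lstsq(H, z, rcond=None)

print(theta_normal, theta_lstsq)                    # both close to theta_true
```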
Maximum Likelihood Estimation (MLE)
We are given data points \(\mathbf{d}_{1:N} = \mathbf{d}_1, \ldots, \mathbf{d}_N\)
We think the data has been generated from a probability distribution \(p(\mathbf{d}_{1:N}|\mathbf{\theta})\)
We want to find the parameter of the model that maximizes the likelihood function of the data \[L(\mathbf{\theta}) = p(\mathbf{d}_{1:N}|\mathbf{\theta})\]
which is a function of theta, not a probability distribution. \[\theta_{MLE} = \underset{\theta}{\operatorname*{argmax}} p(\mathbf{d}_{1:N} | \theta)\]
Example: assume we know that 1D data points were generated independently from a Gaussian distribution \(\mathcal{N}(\mu, \sigma^2)\), but we don’t know the mean and variance. The likelihood function of the data is
\[L(\mu, \sigma) = p(\mathbf{d}_{1:N}|\mu, \sigma) = \prod_{i=1}^{N} p(d_i|\mu, \sigma) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp(-0.5(d_i - \mu)^2/\sigma^2)\]
And the maximum-likelihood parameter estimates are
\[(\mu, \sigma)_{MLE} = \operatorname*{argmax}_{\mu,\sigma} p(\mathbf{d}_{1:N}|\mu, \sigma) = \operatorname*{argmax}_{\mu,\sigma} \log p(\mathbf{d}_{1:N}|\mu, \sigma) = \operatorname*{argmax}_{\mu,\sigma} \sum_{i=1}^{N} \log p(d_i|\mu, \sigma)\]
\[(\mu, \sigma)_{MLE} = \operatorname*{argmax}_{\mu,\sigma} \sum_{i=1}^{N} \log p(d_i|\mu, \sigma) = \operatorname*{argmax}_{\mu,\sigma} \left[ -N\log(\sqrt{2\pi}\sigma) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(d_i - \mu)^2 \right]\]
Set partial derivatives w.r.t. \(\mu\) and \(\sigma\) to zero
\[\mu_{MLE}=\sum_{i=1}^{N}d_{i}/N\]
\[\sigma_{MLE}^{2}=\frac{1}{N}\sum_{i=1}^{N}(d_{i}-\mu_{MLE})^{2}\]
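A quick numerical check of these closed-form estimates on synthetic data (the true mean and standard deviation below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
mu_true, sigma_true = 3.0, 1.5
d = rng.normal(mu_true, sigma_true, size=10_000)

# Closed-form MLE from setting the partial derivatives to zero.
mu_mle = d.sum() / len(d)
sigma2_mle = np.mean((d - mu_mle) ** 2)     # note: 1/N, not the 1/(N-1) sample variance

print(mu_mle, np.sqrt(sigma2_mle))          # close to (3.0, 1.5)
```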
Maximum a Posteriori Estimation (MAP)
Likelihood: \(p(d|\theta)\)
Prior: \(p(\theta)\)
Posterior: \(p(\theta|d) \propto p(d|\theta)\,p(\theta)\)
\[\begin{align} \theta_{MAP} &= \operatorname*{argmax}_{\theta} p(\theta|\mathbf{d}_{1:N}) \\ &= \operatorname*{argmax}_{\theta} \left[ \frac{p(\mathbf{d}_{1:N}|\theta)p(\theta)}{p(\mathbf{d}_{1:N})} \right] \\ &= \operatorname*{argmax}_{\theta} \left[ p(\mathbf{d}_{1:N}|\theta)p(\theta) \right] \end{align}\]
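The slides do not work a MAP example, but a minimal sketch is easy: assume a Gaussian likelihood with known \(\sigma\) and a Gaussian prior on the unknown mean, and compare the argmax of the likelihood with the argmax of likelihood times prior on a grid (all numbers below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical setup: Gaussian likelihood with known sigma, Gaussian prior on the mean.
sigma = 2.0
d = rng.normal(4.0, sigma, size=5)          # only a few data points
mu0, tau = 0.0, 1.0                         # prior: mu ~ N(0, 1)

mu_grid = np.linspace(-5, 10, 2001)

# Log likelihood and log prior on the grid (additive constants dropped).
log_lik = np.array([np.sum(-0.5 * (d - mu) ** 2 / sigma**2) for mu in mu_grid])
log_prior = -0.5 * (mu_grid - mu0) ** 2 / tau**2

mu_mle = mu_grid[np.argmax(log_lik)]                # argmax of the likelihood
mu_map = mu_grid[np.argmax(log_lik + log_prior)]    # argmax of likelihood * prior

print(mu_mle, mu_map)    # the MAP estimate is pulled toward the prior mean 0
```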
Bayesian Estimation
In the occupancy grid mapping problem we wanted to compute \(p(\mathbf{m}|\mathbf{z}_{1:t}, \mathbf{x}_{1:t})\) over all possible maps.
We can see this problem as a specific instance within a category of problems where we are given data (observations) and we want to “explain” or fit the data using a parametric function.
There are typically three ways to work with this type of problem.
Both MLE and MAP estimators give you a single point estimate.
But there might be many parameters that are compatible with the data.
Instead of point estimates, compute a distribution of estimates that explain the data
\[p(\theta|\mathbf{d}_{1:N}) = \frac{p(\mathbf{d}_{1:N}|\theta)p(\theta)}{p(\mathbf{d}_{1:N})}\]
The probability of the data \(p(\mathbf{d}_{1:N})\) is usually hard to compute. But it does not depend on the parameter \(\theta\), so it is treated as a normalizing factor, and we can still compute how the posterior varies with \(\theta\).
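A sketch of the Bayesian alternative for the same hypothetical Gaussian-mean example used above: instead of keeping only the argmax, normalize likelihood times prior over a grid of parameter values, so \(p(\mathbf{d}_{1:N})\) becomes just a normalizing sum:

```python
import numpy as np

rng = np.random.default_rng(4)
sigma = 2.0
d = rng.normal(4.0, sigma, size=5)                 # few observations
mu0, tau = 0.0, 1.0                                # prior: mu ~ N(0, 1)
mu_grid = np.linspace(-5, 10, 2001)
dx = mu_grid[1] - mu_grid[0]

log_lik = np.array([np.sum(-0.5 * (d - mu) ** 2 / sigma**2) for mu in mu_grid])
log_prior = -0.5 * (mu_grid - mu0) ** 2 / tau**2
log_post = log_lik + log_prior

post = np.exp(log_post - log_post.max())           # unnormalized posterior
post /= post.sum() * dx                            # p(d_{1:N}) is just this constant

# The full curve `post` is the Bayesian answer; summaries can be read off it.
post_mean = np.sum(mu_grid * post) * dx
print(post_mean)
```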
For example, in occupancy grid mapping this is the full posterior over maps: \[p(\mathbf{m}|\mathbf{z}_{1:t}, \mathbf{x}_{1:t})\]
GraphSLAM
Enable a robot to simultaneously build a map of its environment and estimate where it is in that map.
This is called SLAM (Simultaneous Localization And Mapping)
Today we are going to look at the batch version, i.e. we first collect all measurements and controls, and later form an estimate of the states and the map.
We are going to solve SLAM using least squares
MORESLAM system, McGill, 2016
Source Code: https://github.com/erik-nelson/blam
Google Cartographer: 2D and 3D laser SLAM
The full (batch) SLAM posterior is over the entire trajectory and the map: \[p(\mathbf{x}_{1:T}, \mathbf{m} \mid \mathbf{z}_{0:T}, \mathbf{u}_{0:T-1}, \mathbf{x}_{0})\]
The online (filtering) SLAM posterior only keeps the current state: \[p(\mathbf{x}_t, \mathbf{m}_t \mid \mathbf{z}_{0:t}, \mathbf{u}_{0:t-1}, \mathbf{x}_0)\]
\(p(\mathbf{x}) = \frac{1}{(2\pi)^{D/2}\det(\mathbf{\Sigma})^{1/2}}\exp\left(-0.5(\mathbf{x}-\mathbf{\mu})^T\mathbf{\Sigma}^{-1}(\mathbf{x}-\mathbf{\mu})\right)\)
\(p(\mathbf{x}) = \frac{1}{(2\pi)^{D/2}\det(\mathbf{\Sigma})^{1/2}} \exp\left(-0.5\|\mathbf{x} - \mathbf{\mu}\|_{\mathbf{\Sigma}}^2\right)\)
Shortcut notation: \(\|x\|_{\Sigma}^{2}=x^{T}\Sigma^{-1}x\)
Figure: example covariance shapes for a 2D Gaussian, from “Computer Vision: Models, Learning, and Inference”, Simon Prince.
Note: the shapes of these covariances are important, you should know them well. In particular, when are \(x_1\) and \(x_2\) correlated?
\(x_1\) and \(x_2\) are correlated when the shape of the ellipse is rotated, i.e. when there are nonzero off-diagonal terms in the covariance matrix. In the figure, this is the case for panels (e) and (f).
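A small check of the shortcut notation: evaluating the Gaussian density via \(\|\mathbf{x}-\mathbf{\mu}\|^2_{\mathbf{\Sigma}}\) and comparing against scipy (the numbers are arbitrary; the nonzero off-diagonal term makes \(x_1\) and \(x_2\) correlated):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])     # nonzero off-diagonal -> correlated x1, x2
x = np.array([1.5, 1.0])

D = len(mu)
diff = x - mu
maha2 = diff @ np.linalg.inv(Sigma) @ diff                  # ||x - mu||^2_Sigma
pdf = np.exp(-0.5 * maha2) / ((2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma)))

print(pdf, multivariate_normal(mean=mu, cov=Sigma).pdf(x))  # same value
```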
If we can find a region \(R\) such that \[p(x \in R) = c\] we can then say (for example) there is a 99% probability that the true value is in \(R\).
e.g. for a univariate normal distribution \(N(\mu, \sigma^2)\)
\(p(|x - \mu| < \sigma) \approx 0.68\)
\(p(|x - \mu| < 2\sigma) \approx 0.95\)
\(p(|x - \mu| < 3\sigma) \approx 0.997\)
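These probabilities can be verified with the Gaussian error function, since \(p(|x-\mu| < k\sigma) = \operatorname{erf}(k/\sqrt{2})\):

```python
import math

# P(|x - mu| < k*sigma) for a 1D Gaussian equals erf(k / sqrt(2)).
for k in (1, 2, 3):
    print(k, math.erf(k / math.sqrt(2)))   # ~0.683, ~0.954, ~0.997
```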
\[\mathbb{E}_{x\sim p(X)}[X]=\int_{x}xp(X=x)dx\]
\(\mathbb{E}_{x \sim p(X)}[AX + b] = A\,\mathbb{E}_{x \sim p(X)}[X] + b\)
If \(X\) and \(Y\) are independent: \[\mathbb{E}_{x,y \sim p(X,Y)}[XY] = \mathbb{E}_{x \sim p(X)}[X]\, \mathbb{E}_{y \sim p(Y)}[Y]\]
\[\text{Cov}[X, Y] = E[XY] - E[X]E[Y]\]
\[\begin{align} & \text{Var}[X] = \text{Cov}[X] = \text{Cov}[X, X] = E[X^2] - E[X]^2 \\ & \text{Cov}[AX + b] = A\text{Cov}[X]A^T \\ & \text{Cov}[X + Y] = \text{Cov}[X] + \text{Cov}[Y] + 2\text{Cov}[X, Y] \end{align}\]
Entry (i,j) of the covariance matrix measures whether changes in variable \(X_i\) co-occur with changes in variable \(Y_j\)
It does not measure whether one causes the other.
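A Monte Carlo sanity check of the identity \(\text{Cov}[AX + b] = A\,\text{Cov}[X]A^T\) (the matrices below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma, size=200_000)

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
b = np.array([1.0, -1.0])
Y = X @ A.T + b

print(np.cov(Y, rowvar=False))   # empirical covariance of Y, close to ...
print(A @ Sigma @ A.T)           # ... the analytic A Cov[X] A^T
```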
For multivariate Gaussians:
\[\begin{align} E[\mathbf{x}] = \mu \\ \text{Cov}[\mathbf{x}] = \Sigma \end{align}\]
Since we have 2D examples here:
\[\begin{align} \text{Cov}[\mathbf{x}] = \boldsymbol{\Sigma} &= \begin{bmatrix} \text{Cov}[x_1, x_1] & \text{Cov}[x_1, x_2] \\ \text{Cov}[x_2, x_1] & \text{Cov}[x_2, x_2] \end{bmatrix} \\ &= \begin{bmatrix} \text{Var}[x_1] & \text{Cov}[x_1, x_2] \\ \text{Cov}[x_2, x_1] & \text{Var}[x_2] \end{bmatrix} \end{align}\]
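A quick empirical check that samples drawn from \(\mathcal{N}(\mathbf{\mu}, \mathbf{\Sigma})\) reproduce \(\mathbf{\mu}\) and \(\mathbf{\Sigma}\), and that the off-diagonal entry shows up as correlation between \(x_1\) and \(x_2\) (numbers are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.9],
                  [0.9, 2.0]])            # Cov[x1, x2] = 0.9 (rotated ellipse)

samples = rng.multivariate_normal(mu, Sigma, size=100_000)
print(samples.mean(axis=0))               # ~ mu
print(np.cov(samples, rowvar=False))      # ~ Sigma
print(np.corrcoef(samples, rowvar=False)[0, 1])   # ~ 0.9 / sqrt(1 * 2) ≈ 0.64
```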
Map \(\mathbf{m} = \{\mathbf{m}_0, \mathbf{m}_1\}\) consists of landmarks that are easily identifiable and cannot be mistaken for one another, i.e. we are avoiding the data association problem here.
Notice that the graph is sparse as long as not too many states observe the same landmark.
That implies that many of the potential dependencies between the random variables in \(p(\mathbf{x}_{1:T}, \mathbf{m} \mid \mathbf{z}_{0:T}, \mathbf{u}_{0:T-1}, \mathbf{x}_{0})\) are not needed and can be dropped.
Instead of computing the posterior \(p(\mathbf{x}_{1:T}, \mathbf{m} \mid \mathbf{z}_{0:T}, \mathbf{u}_{0:T-1}, \mathbf{x}_{0})\) we are going to compute its max
\[\mathbf{x}^*_{1:T}, \mathbf{m}^* = \underset{\mathbf{x}_{1:T},\mathbf{m}}{\operatorname{argmax}} p(\mathbf{x}_{1:T}, \mathbf{m} \mid \mathbf{z}_{0:T}, \mathbf{u}_{0:T-1}, \mathbf{x}_0)\]
(see the least squares part of this lecture)
Expanding this maximization step by step:
\[\begin{align} \mathbf{x}_{1:T}^*, \mathbf{m}^* &= \operatorname*{argmax}_{\mathbf{x}_{1:T}, \mathbf{m}} p(\mathbf{x}_{1:T}, \mathbf{m} | \mathbf{z}_{0:T}, \mathbf{u}_{0:T-1}, \mathbf{x}_0) \\ &= \operatorname*{argmax}_{\mathbf{x}_{1:T}, \mathbf{m}} \left[ \frac{p(\mathbf{x}_{1:T}, \mathbf{m}, \mathbf{z}_{0:T}, \mathbf{u}_{0:T-1}|\mathbf{x}_0)}{p(\mathbf{z}_{0:T}, \mathbf{u}_{0:T-1}|\mathbf{x}_0)} \right] && \text{by definition of conditional distribution} \\ &= \operatorname*{argmax}_{\mathbf{x}_{1:T}, \mathbf{m}} p(\mathbf{x}_{1:T}, \mathbf{m}, \mathbf{z}_{0:T}, \mathbf{u}_{0:T-1}|\mathbf{x}_0) && \text{denominator does not depend on the optimization variables} \\ &= \operatorname*{argmax}_{\mathbf{x}_{1:T}, \mathbf{m}} \left[ \prod_{t=1}^{T} p(\mathbf{x}_t|\mathbf{x}_{t-1}, \mathbf{u}_{t-1}) \prod_{t=0}^{T} \prod_{\mathbf{z}_t^{(k)} \in \mathbf{z}_t} p(\mathbf{z}_t^{(k)}|\mathbf{x}_t, \mathbf{m}_k) \right] && \text{factorization (see the claims proved below)} \\ &= \operatorname*{argmax}_{\mathbf{x}_{1:T}, \mathbf{m}} \left[ \sum_{t=1}^{T} \log p(\mathbf{x}_t|\mathbf{x}_{t-1}, \mathbf{u}_{t-1}) + \sum_{t=0}^{T} \sum_{\mathbf{z}_t^{(k)} \in \mathbf{z}_t} \log p(\mathbf{z}_t^{(k)}|\mathbf{x}_t, \mathbf{m}_k) \right] && \log \text{ is monotonically increasing} \end{align}\]
Main GraphSLAM assumptions:
1. Uncertainty in the dynamics model is Gaussian
\(\mathbf{x}_{t}=f(\mathbf{x}_{t-1}, \mathbf{u}_{t-1})+\mathbf{w}_{t}\)
\(\mathbf{w}_{t} \sim \mathcal{N}(0, \mathbf{R}_{t})\)
so
\(\mathbf{x}_t|\mathbf{x}_{t-1}, \mathbf{u}_{t-1} \sim \mathcal{N}(\mathbf{f}(\mathbf{x}_{t-1}, \mathbf{u}_{t-1}), \mathbf{R}_t)\)
2. Uncertainty in the sensor model is Gaussian
\(\mathbf{z}_t^{(k)} = \mathbf{h}(\mathbf{x}_t, \mathbf{m}_k) + \mathbf{v}_t\)
\(\mathbf{v}_t \sim \mathcal{N}(\mathbf{0}, \mathbf{Q}_t)\)
so
\(\mathbf{z}_t^{(k)} | \mathbf{x}_t, \mathbf{m}_k \sim \mathcal{N}(\mathbf{h}(\mathbf{x}_t, \mathbf{m}_k), \mathbf{Q}_t)\)
\(\mathbf{x}_1|\mathbf{x}_0, \mathbf{u}_0 \sim \mathcal{N}(\mathbf{f}(\mathbf{x}_0, \mathbf{u}_0), \mathbf{R}_0)\)
Expected to end up at \(\mathbf{x}_1 = \mathbf{f}(\mathbf{x}_0, \mathbf{u}_0)\) from \(\mathbf{x}_0\) but we might end up around it, within the ellipse defined by the covariance matrix \(\mathbf{R}_0\)
\(\mathbf{z}_1^{(0)}|\mathbf{x}_1, \mathbf{m}_0 \sim \mathcal{N}(\mathbf{h}(\mathbf{x}_1, \mathbf{m}_0), \mathbf{Q}_1)\)
Expected to get measurement \(\mathbf{h}(\mathbf{x}_1, \mathbf{m}_0)\) at state \(\mathbf{x}_1\) but it might be somewhere within the ellipse defined by the covariance matrix \(\mathbf{Q}_1\)
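To make these Gaussian assumptions concrete, here is a sketch of one possible pair of models. The unicycle dynamics and range-bearing sensor below are stand-ins (the actual models from Lectures 2 and 4 are not reproduced here), and the noise covariances are made up:

```python
import numpy as np

def f(x, u, dt=0.1):
    """Unicycle dynamics: state x = (px, py, heading), control u = (v, omega)."""
    px, py, th = x
    v, w = u
    return np.array([px + v * np.cos(th) * dt,
                     py + v * np.sin(th) * dt,
                     th + w * dt])

def h(x, m):
    """Range-bearing measurement of landmark m = (mx, my) from state x."""
    dx, dy = m[0] - x[0], m[1] - x[1]
    return np.array([np.hypot(dx, dy),
                     np.arctan2(dy, dx) - x[2]])

# Sampling the generative model of the slides:
#   x_t = f(x_{t-1}, u_{t-1}) + w_t,   w_t ~ N(0, R_t)
#   z_t = h(x_t, m_k)         + v_t,   v_t ~ N(0, Q_t)
rng = np.random.default_rng(7)
R = np.diag([0.01, 0.01, 0.003])          # dynamics noise covariance (assumed)
Q = np.diag([0.05, 0.01])                 # sensor noise covariance (assumed)

x0 = np.array([0.0, 0.0, 0.0])
x1 = f(x0, u=(1.0, 0.2)) + rng.multivariate_normal(np.zeros(3), R)
z1 = h(x1, m=np.array([2.0, 1.0])) + rng.multivariate_normal(np.zeros(2), Q)
print(x1, z1)
```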
Plugging the Gaussian dynamics and sensor models into the log-likelihood terms turns the maximization into a least squares problem:
\[\begin{align} \mathbf{x}_{1:T}^{*}, \mathbf{m}^{*} &= \underset{\mathbf{x}_{1:T}, \mathbf{m}}{\operatorname{argmax}} \left[ \sum_{t=1}^{T} \log p(\mathbf{x}_{t}|\mathbf{x}_{t-1}, \mathbf{u}_{t-1}) + \sum_{t=0}^{T} \sum_{\mathbf{z}_{t}^{(k)} \in \mathbf{z}_{t}} \log p(\mathbf{z}_{t}^{(k)}|\mathbf{x}_{t}, \mathbf{m}_{k}) \right] \\ &= \operatorname*{argmax}_{\mathbf{x}_{1:T},\mathbf{m}} \left[ - \sum_{t=1}^{T} \|\mathbf{x}_t - \mathbf{f}(\mathbf{x}_{t-1}, \mathbf{u}_{t-1})\|_{\mathbf{{R}}_t}^2 - \sum_{t=0}^{T} \sum_{z_t^{(k)} \in \mathbf{z}_t} \|\mathbf{z}_t^{(k)} - \mathbf{h}(\mathbf{x}_t, \mathbf{m}_k)\|_{\mathbf{{Q}}_t}^2 \right] \\ &= \underset{\mathbf{x}_{1:T}, \mathbf{m}}{\operatorname{argmin}} \left[ \sum_{t=1}^{T} \|\mathbf{x}_t - \mathbf{f}(\mathbf{x}_{t-1}, \mathbf{u}_{t-1})\|_{\mathbf{R}_t}^2 + \sum_{t=0}^{T} \sum_{\mathbf{z}_t^{(k)} \in \mathbf{z}_t} \|\mathbf{z}_t^{(k)} - \mathbf{h}(\mathbf{x}_t, \mathbf{m}_k)\|_{\mathbf{Q}_t}^2 \right] \end{align}\]
Notation:
\(\mathbf{x}^T \mathbf{Q}^{-1} \mathbf{x} = ||\mathbf{x}||_{\mathbf{Q}}^2\)
\(\mathbf{x}_t | \mathbf{x}_{t-1}, \mathbf{u}_{t-1} \sim \mathcal{N}(\mathbf{f}(\mathbf{x}_{t-1}, \mathbf{u}_{t-1}), \mathbf{R}_t) \qquad\qquad\qquad \mathbf{z}_{t}^{(k)}|\mathbf{x}_{t}, \mathbf{m}_{k} \sim \mathcal{N}(\mathbf{h}(\mathbf{x}_{t}, \mathbf{m}_{k}), \mathbf{Q}_{t})\)
Need to minimize the sum of the following quadratic terms:
\[\begin{align} & ||\mathbf{x}_1 - \mathbf{f}(\mathbf{x}_0, \mathbf{u}_0)||_{\mathbf{R}_1}^2 \\ & ||\mathbf{x}_2 - \mathbf{f}(\mathbf{x}_1, \mathbf{u}_1)||_{\mathbf{R}_2}^2 \\ & ||\mathbf{x}_3 - \mathbf{f}(\mathbf{x}_2, \mathbf{u}_2)||_{\mathbf{R}_3}^2 \\ & ||\mathbf{z}_0^{(0)} - \mathbf{h}(\mathbf{x}_0, \mathbf{m}_0)||_{\mathbf{Q}_0}^2 \\ & ||\mathbf{z}_1^{(0)} - \mathbf{h}(\mathbf{x}_1, \mathbf{m}_0)||_{\mathbf{Q}_1}^2 \\ & ||\mathbf{z}_1^{(1)} - \mathbf{h}(\mathbf{x}_1, \mathbf{m}_1)||_{\mathbf{Q}_1}^2 \\ & ||\mathbf{z}_2^{(1)} - \mathbf{h}(\mathbf{x}_2, \mathbf{m}_1)||_{\mathbf{Q}_2}^2 \end{align}\]
with respect to variables: \(\mathbf{x_1 \quad x_2 \quad x_3 \quad m_0 \quad m_1}\)
initial state \(\mathbf{x}_0\) is given
Covariance matrices \(\mathbf{R}_t, \mathbf{Q}_t\) are problem-dependent, but they usually do not change with time.
\(\mathbf{x}_t = \mathbf{f}(\mathbf{x}_{t-1}, \mathbf{u}_{t-1}) + \mathbf{w}_t\)
\(\mathbf{z}_{t}^{(k)} = \mathbf{h}(\mathbf{x}_{t}, \mathbf{m}_{k}) + \mathbf{v}_{t}\)
The dynamics model \(\mathbf{f}\) can be any of the dynamical systems we saw in Lecture 2.
\(\mathbf{z}_t^{(k)}\) can be any of the sensors we saw in Lecture 4:
Laser scan \(\{(r_{i},\theta_{i})\}_{i=1:K}\) where \(\mathbf{m}_k\) is an occupancy grid
Range and bearing \((r, \theta)\) to the landmark \(\mathbf{m}_k = (x_k, y_k, z_k)\)
Bearing measurements from images
Altitude/Depth
Gyroscope
Accelerometer
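Putting it together, here is a sketch of the toy problem above (poses \(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3\), landmarks \(\mathbf{m}_0, \mathbf{m}_1\), \(\mathbf{x}_0\) given) solved with scipy's nonlinear least squares. For simplicity it assumes 2D point states with additive odometry dynamics \(\mathbf{f}(\mathbf{x},\mathbf{u}) = \mathbf{x}+\mathbf{u}\), relative-position measurements \(\mathbf{h}(\mathbf{x},\mathbf{m}) = \mathbf{m}-\mathbf{x}\), and isotropic \(\mathbf{R}_t, \mathbf{Q}_t\), all of which are illustration choices rather than the lecture's models; the Mahalanobis weights are applied by whitening each residual, so every whitened residual corresponds to one quadratic term in the list above:

```python
import numpy as np
from scipy.optimize import least_squares

# Toy setup: 2D point robot, x_t = x_{t-1} + u_{t-1} + w_t (odometry-style dynamics),
# measurements z_t^(k) = (m_k - x_t) + v_t (relative landmark position).
x0 = np.array([0.0, 0.0])                       # given initial state
u = [np.array([1.0, 0.0]),                      # u_0, u_1, u_2
     np.array([1.0, 0.5]),
     np.array([0.0, 1.0])]
R_sqrt_inv = 1.0 / np.sqrt(0.01)                # isotropic R_t = 0.01 I
Q_sqrt_inv = 1.0 / np.sqrt(0.04)                # isotropic Q_t = 0.04 I

# Simulate ground truth and noisy data for the factors listed in the slides.
true_m = [np.array([2.0, -1.0]), np.array([1.0, 3.0])]
rng = np.random.default_rng(8)
xs_true = [x0]
for ut in u:
    xs_true.append(xs_true[-1] + ut + rng.normal(0, 0.1, 2))
obs = []
for (t, k) in [(0, 0), (1, 0), (1, 1), (2, 1)]:   # z_0^(0), z_1^(0), z_1^(1), z_2^(1)
    obs.append((t, k, true_m[k] - xs_true[t] + rng.normal(0, 0.2, 2)))

def unpack(v):
    xs = [x0] + [v[2 * i:2 * i + 2] for i in range(3)]   # x_0 (fixed), x_1..x_3
    ms = [v[6:8], v[8:10]]                               # m_0, m_1
    return xs, ms

def residuals(v):
    xs, ms = unpack(v)
    res = []
    for t in range(1, 4):                                # dynamics factors
        res.append(R_sqrt_inv * (xs[t] - (xs[t - 1] + u[t - 1])))
    for (t, k, z) in obs:                                # measurement factors
        res.append(Q_sqrt_inv * (z - (ms[k] - xs[t])))
    return np.concatenate(res)

sol = least_squares(residuals, np.zeros(10))
xs_est, ms_est = unpack(sol.x)
print(ms_est)        # close to the true landmark positions
```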
Claim:
\[p(\mathbf{x}_{1:T}, \mathbf{m}, \mathbf{z}_{0:T}, \mathbf{u}_{0:T-1}, \mathbf{x}_{0}) = p(\mathbf{x}_{0}) \prod_{t=1}^{T} p(\mathbf{x}_{t}|\mathbf{x}_{t-1}, \mathbf{u}_{t-1}) \prod_{t=0}^{T} \prod_{\mathbf{z}_{t}^{(k)}\in \mathbf{z}_{t}} p(\mathbf{z}_{t}^{(k)}|\mathbf{x}_{t}, \mathbf{m}_{k})\]
Proof:
\[\begin{align} p(\mathbf{x}_{1:T}, \mathbf{m}, \mathbf{z}_{0:T}, \mathbf{u}_{0:T-1}, \mathbf{x}_0) &= p(\mathbf{z}_T | \mathbf{x}_{1:T}, \mathbf{m}, \mathbf{z}_{0:T-1}, \mathbf{u}_{0:T-1}, \mathbf{x}_0) \, p(\mathbf{x}_{1:T}, \mathbf{m}, \mathbf{z}_{0:T-1}, \mathbf{u}_{0:T-1}, \mathbf{x}_0) \\ &= p(\mathbf{z}_T | \mathbf{x}_T, \mathbf{m}) \, p(\mathbf{x}_{1:T}, \mathbf{m}, \mathbf{z}_{0:T-1}, \mathbf{u}_{0:T-1}, \mathbf{x}_0) \\ &= p(\mathbf{z}_T | \mathbf{x}_T, \mathbf{m}) \, p(\mathbf{z}_{T-1} | \mathbf{x}_{1:T}, \mathbf{m}, \mathbf{z}_{0:T-2}, \mathbf{u}_{0:T-1}, \mathbf{x}_0) \, p(\mathbf{x}_{1:T}, \mathbf{m}, \mathbf{z}_{0:T-2}, \mathbf{u}_{0:T-1}, \mathbf{x}_0) \\ &= p(\mathbf{z}_T | \mathbf{x}_T, \mathbf{m}) \, p(\mathbf{z}_{T-1} | \mathbf{x}_{T-1}, \mathbf{m}) \, p(\mathbf{x}_{1:T}, \mathbf{m}, \mathbf{z}_{0:T-2}, \mathbf{u}_{0:T-1}, \mathbf{x}_0) \\ &\quad \dots \\ &= \prod_{t=0}^{T} p(\mathbf{z}_t | \mathbf{x}_t, \mathbf{m}) \, p(\mathbf{x}_{1:T}, \mathbf{m}, \mathbf{u}_{0:T-1}, \mathbf{x}_0) \\ &= \prod_{t=0}^{T} p(\mathbf{z}_t | \mathbf{x}_t, \mathbf{m}) \, p(\mathbf{x}_T | \mathbf{x}_{1:T-1}, \mathbf{m}, \mathbf{u}_{0:T-1}, \mathbf{x}_0) \, p(\mathbf{x}_{1:T-1}, \mathbf{m}, \mathbf{u}_{0:T-1}, \mathbf{x}_0) \\ &= \prod_{t=0}^{T} p(\mathbf{z}_t | \mathbf{x}_t, \mathbf{m}) \, p(\mathbf{x}_T | \mathbf{x}_{T-1}, \mathbf{u}_{T-1}) \, p(\mathbf{x}_{1:T-1}, \mathbf{m}, \mathbf{u}_{0:T-1}, \mathbf{x}_0) \\ &= \prod_{t=0}^{T} p(\mathbf{z}_t | \mathbf{x}_t, \mathbf{m}) \, p(\mathbf{x}_T | \mathbf{x}_{T-1}, \mathbf{u}_{T-1}) \, p(\mathbf{x}_{T-1} | \mathbf{x}_{T-2}, \mathbf{m}, \mathbf{u}_{0:T-1}, \mathbf{x}_0) \, p(\mathbf{x}_{1:T-2}, \mathbf{m}, \mathbf{u}_{0:T-1}, \mathbf{x}_0) \\ &\quad \dots \\ &= \left[ \prod_{t=0}^{T} p(\mathbf{z}_t | \mathbf{x}_t, \mathbf{m}) \right] p(\mathbf{x}_0) \prod_{t=1}^{T} p(\mathbf{x}_t | \mathbf{x}_{t-1}, \mathbf{u}_{t-1}) \end{align}\]
Claim: \[p(\mathbf{z}_{t}|\mathbf{x}_{t},\mathbf{m}) = \prod_{\mathbf{z}_{t}^{(k)}\in\mathbf{z}_{t}}p(\mathbf{z}_{t}^{(k)}|\mathbf{x}_{t},\mathbf{m}_{k})\]
where \(\mathbf{z}_t = \left\{ \mathbf{z}_t^{(k)} \text{ iff landmark } \mathbf{m}_k \text{ was observed} \right\}\)
\(\mathbf{m} = \{\text{landmarks } \mathbf{m}_k\}\)
Proof:
Suppose without loss of generality that \(\mathbf{z}_t = \left\{ \mathbf{z}_t^{(k)}, k = 1...K \right\}\) and \(\mathbf{m} = \{\mathbf{m}_{k}, k = 1...K\}\)
i.e. that all landmarks were observed from the state at time t. Then:
\[\begin{align} p(\mathbf{z}_t^{(1)}, \ldots, \mathbf{z}_t^{(K)}|\mathbf{x}_t, \mathbf{m}) &= p(\mathbf{z}_t^{(1)}|\mathbf{z}_t^{(2)}, \ldots, \mathbf{z}_t^{(K)}, \mathbf{x}_t, \mathbf{m}) \, p(\mathbf{z}_t^{(2)}, \ldots, \mathbf{z}_t^{(K)}|\mathbf{x}_t, \mathbf{m}) \\ &= p(\mathbf{z}_t^{(1)}|\mathbf{x}_t, \mathbf{m}_1) \, p(\mathbf{z}_t^{(2)}, \ldots, \mathbf{z}_t^{(K)}|\mathbf{x}_t, \mathbf{m}) \\ &= p(\mathbf{z}_t^{(1)}|\mathbf{x}_t, \mathbf{m}_1) \, p(\mathbf{z}_t^{(2)}|\mathbf{z}_t^{(3)}, \ldots, \mathbf{z}_t^{(K)}, \mathbf{x}_t, \mathbf{m}) \, p(\mathbf{z}_t^{(3)}, \ldots, \mathbf{z}_t^{(K)}|\mathbf{x}_t, \mathbf{m}) \\ &= p(\mathbf{z}_t^{(1)}|\mathbf{x}_t, \mathbf{m}_1) \, p(\mathbf{z}_t^{(2)}|\mathbf{x}_t, \mathbf{m}_2) \, p(\mathbf{z}_t^{(3)}, \ldots, \mathbf{z}_t^{(K)}|\mathbf{x}_t, \mathbf{m}) \\ &\quad \vdots \\ &= \prod_{k=1}^{K} p(\mathbf{z}_t^{(k)}|\mathbf{x}_t, \mathbf{m}_k) \end{align}\]