Week #5: Optimal Control and the Linear Quadratic Regulator (LQR)
\[ \begin{align} \operatorname*{minimize}_{\pi_0, \ldots, \pi_{T-1}} \quad &\mathbb{E}_{e_t} \left[ \sum_{t=0}^{T} c(\mathbf{x}_t, \mathbf{u}_t) \right] \\ \text{subject to} \quad &\mathbf{x}_{t+1} = f_t(\mathbf{x}_t, \mathbf{u}_t, e_t) \quad \color{red}{\text{known dynamics}} \\ &\mathbf{u}_t = \pi_t(\mathbf{x}_{0:t}, \mathbf{u}_{0:t-1}) \\ & \color{red}\text{control law / policy} \end{align} \]
Pieter Abbeel, Helicopter Aerobatics
State \(\mathbf{x}_t \in \mathbb{R}^d\), controls \(\mathbf{u}_t \in \mathbb{R}^k\), with linear dynamics \(\mathbf{x}_{t+1} = A\mathbf{x}_t + B\mathbf{u}_t\), where \(A \in \mathbb{R}^{d \times d}\) and \(B \in \mathbb{R}^{d \times k}\).
Example: an omnidirectional robot with state \(\mathbf{x}_t = (x_t, y_t, \theta_t)\) and controls \(\mathbf{u}_t = (v_x, v_y, \omega_z)\) is a linear system:
\[ \begin{align} x_{t+1} &= x_t + v_x(t)\delta t & \mathbf{x}_{t+1} &= I\mathbf{x}_t + \delta t I \mathbf{u}_t \\ y_{t+1} &= y_t + v_y(t)\delta t \qquad \Rightarrow & A &= I \\ \theta_{t+1} &= \theta_t + \omega_z(t)\delta t & B &= \delta t I \end{align} \]
A unicycle-like robot with controls \(\mathbf{u}_t = (v_x, \omega_z)\), on the other hand, is not a linear system, because its \(B\) matrix depends on the state through \(\theta_t\):
\[ \begin{align} x_{t+1} &= x_t + v_x(t)\cos(\theta_t)\delta t & \mathbf{x}_{t+1} &= I\mathbf{x}_t + \begin{bmatrix} \delta t\cos(\theta_t) & 0 \\ \delta t\sin(\theta_t) & 0 \\ 0 & \delta t \end{bmatrix} \mathbf{u}_t \\ y_{t+1} &= y_t + v_x(t)\sin(\theta_t)\delta t \qquad \Rightarrow & A &= I \\ \theta_{t+1} &= \theta_t + \omega_z(t)\delta t & B &= B(\mathbf{x}_t) \end{align} \]
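As a minimal sketch of these two models (numpy assumed; the function names are mine, not from the course repo), written as one-step update functions:

```python
import numpy as np

def omni_step(x, u, dt):
    """Omnidirectional robot: x = (x, y, theta), u = (v_x, v_y, omega_z)."""
    A = np.eye(3)          # A = I
    B = dt * np.eye(3)     # B = dt * I, constant: the system is linear
    return A @ x + B @ u

def unicycle_step(x, u, dt):
    """Unicycle: x = (x, y, theta), u = (v_x, omega_z)."""
    theta = x[2]
    # B depends on the state through theta, so the system is NOT linear.
    B = np.array([[dt * np.cos(theta), 0.0],
                  [dt * np.sin(theta), 0.0],
                  [0.0,                dt ]])
    return x + B @ u
```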
Goal: stabilize the system around the state \(\mathbf{x}_t = 0\) with control \(\mathbf{u}_t = 0\). Then \(\mathbf{x}_{t+1} = 0\) and the system will remain at zero forever. If we want to stabilize around another state \(\mathbf{x}^*\), we let \(\mathbf{x} - \mathbf{x}^*\) be the state.
You know the dynamics model of the system
It is linear: \(x_{t+1} = Ax_t + Bu_t\)
Quadratic state cost \(\mathbf{x}^T Q\mathbf{x}\): penalizes deviation from the zero vector.
Quadratic control cost \(\mathbf{u}^T R\mathbf{u}\): penalizes high control signals.
Square matrices Q and R must be positive definite:
\(Q = Q^T \quad \text{and} \quad \forall \mathbf{x} \neq 0,\ \mathbf{x}^T Q\mathbf{x} > 0\)
\(R = R^T \quad \text{and} \quad \forall \mathbf{u} \neq 0,\ \mathbf{u}^T R\mathbf{u} > 0\)
i.e. positive cost for ANY nonzero state and control vector
\[ \begin{align} u_0^*, \ldots, u_{N-1}^* &= \operatorname*{argmin}_{u_0,\ldots,u_{N-1}} \quad \sum_{t=0}^{N} c(\mathbf{x}_t, \mathbf{u}_t) \\ \text{s.t.} \quad \mathbf{x}_1 &= A\mathbf{x}_0 + B\mathbf{u}_0 \\ \mathbf{x}_2 &= A\mathbf{x}_1 + B\mathbf{u}_1 \\ &\vdots \\ \mathbf{x}_N &= A\mathbf{x}_{N-1} + B\mathbf{u}_{N-1} \end{align} \]
We could solve this as a constrained nonlinear optimization problem. But there is a better way: we can find a closed-form solution. Note that solving the problem this way yields an open-loop plan: given the first state, we compute the entire action sequence in advance, as in the sketch below.
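For illustration, here is a minimal sketch of that direct-optimization route using cvxpy (an assumed dependency; the function name and horizon handling are mine, not the course's code). It returns the open-loop action sequence discussed above:

```python
import cvxpy as cp
import numpy as np

def lqr_as_qp(A, B, Q, R, x0, N):
    """Solve finite-horizon LQR directly as a QP (open-loop plan)."""
    d, k = B.shape
    X = cp.Variable((d, N + 1))              # states x_0 ... x_N
    U = cp.Variable((k, N))                  # controls u_0 ... u_{N-1}
    cost = cp.quad_form(X[:, N], Q)          # terminal cost x_N^T Q x_N
    constraints = [X[:, 0] == x0]
    for t in range(N):
        cost += cp.quad_form(X[:, t], Q) + cp.quad_form(U[:, t], R)
        constraints.append(X[:, t + 1] == A @ X[:, t] + B @ U[:, t])
    cp.Problem(cp.Minimize(cost), constraints).solve()
    return U.value                           # open-loop action sequence
```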
Why not PID? The gains of a PID controller are good only for a small region of the state space. We would need to tune PID gains for every control variable, and for different regions of the state space, while guaranteeing smooth gain transitions.
Let \(J_n(\mathbf{x})\) denote the cumulative cost-to-go starting from state x and moving for n time steps.
I.e., the cumulative future cost from now until n steps into the future.
\(J_0(\mathbf{x}) = \mathbf{x}^T Q\mathbf{x}\) is the terminal cost of ending up at state x, with no actions left to perform. Recall that \(c(\mathbf{x}, \mathbf{u}) = \mathbf{x}^T Q\mathbf{x} + \mathbf{u}^T R\mathbf{u}\)
Q: What is the optimal cumulative cost-to-go function with 1 time step left?
\[ J_0(\mathbf{x}) = \mathbf{x}^T Q \mathbf{x} = \mathbf{x}^T P_0 \mathbf{x} \]
where we define \(P_0 = Q\) for notational convenience later on.
\[ \begin{align} J_0(\mathbf{x}) &= \mathbf{x}^T P_0 \mathbf{x} \\ J_{1}(\mathbf{x}) &= \min_{\mathbf{u}} \underbrace{\left[ \mathbf{x}^T Q \mathbf{x} + \mathbf{u}^T R \mathbf{u} + J_{0}(A\mathbf{x} + B\mathbf{u}) \right]}_{\color{red}\text{ In RL this would be the state-action value function}} \end{align} \]
This is a Bellman update: the recursion behind dynamic programming and value iteration.
\[ \begin{align} J_1(\mathbf{x}) &= \min_{\mathbf{u}} [\mathbf{x}^T Q\mathbf{x} + \mathbf{u}^T R\mathbf{u} + (A\mathbf{x} + B\mathbf{u})^T P_0 (A\mathbf{x} + B\mathbf{u})] \\ &= \mathbf{x}^T Q\mathbf{x} + \min_{\mathbf{u}} [\mathbf{u}^T R\mathbf{u} + (A\mathbf{x} + B\mathbf{u})^T P_0 (A\mathbf{x} + B\mathbf{u})] \\ &= \mathbf{x}^T Q\mathbf{x} + \mathbf{x}^T A^T P_0 A\mathbf{x} + \min_{\mathbf{u}} [\mathbf{u}^T R\mathbf{u} + 2\mathbf{u}^T B^T P_0 A\mathbf{x} + \mathbf{u}^T B^T P_0 B\mathbf{u}] \end{align} \]
Q: How do we optimize a multivariable function with respect to some variables (in our case, the controls)?
A: Take the partial derivative w.r.t. the controls and set it to zero. That gives a critical point; since the objective is a convex quadratic in \(\mathbf{u}\) (both \(R\) and \(P_0\) are positive definite), this critical point is the global minimum.
The minimum is attained at:
\(2R\mathbf{u} + 2B^T P_0 A\mathbf{x} + 2B^T P_0 B\mathbf{u} = 0\)
\((R + B^T P_0 B)\mathbf{u} = -B^T P_0 A\mathbf{x}\)
Q: Is this matrix invertible? Recall that \(R\) and \(P_0\) are positive definite matrices.
From calculus/algebra:
\[ \frac{\partial}{\partial \mathbf{u}} (\mathbf{u}^T M\mathbf{u}) = (M + M^T)\mathbf{u} \]
\[ \frac{\partial}{\partial \mathbf{u}} (\mathbf{u}^T M\mathbf{b}) = M\mathbf{b} \]
If M is symmetric:
\[ \frac{\partial}{\partial \mathbf{u}} (\mathbf{u}^T M\mathbf{u}) = 2M\mathbf{u} \]
A: \(R + B^T P_0 B\) is positive definite (it is the sum of the positive definite \(R\) and the positive semidefinite \(B^T P_0 B\)), so it is invertible.
So, the optimal control for the last time step is:
\(\mathbf{u} = -(R + B^T P_0 B)^{-1} B^T P_0 A\mathbf{x} = K_1 \mathbf{x}\)
a linear controller in terms of the state.
\(J_0(\mathbf{x}) = \mathbf{x}^T P_0 \mathbf{x}\)
\[ \begin{align} J_1(\mathbf{x}) &= \mathbf{x}^T Q\mathbf{x} + \mathbf{x}^T A^T P_0 A\mathbf{x} + \min_{\mathbf{u}} [\mathbf{u}^T R\mathbf{u} + 2\mathbf{u}^T B^T P_0 A\mathbf{x} + \mathbf{u}^T B^T P_0 B\mathbf{u}] \\ &= \mathbf{x}^T \underbrace{(Q + K_1^T R K_1 + (A + B K_1)^T P_0 (A + B K_1))}_{P_1} \mathbf{x} \end{align} \]
Q: Why is this a big deal?
A: The cost-to-go function remains quadratic after the first recursive step.
…
J remains quadratic in x throughout the recursion
\(J_n(\mathbf{x}) = \mathbf{x}^T (Q + K_n^T R K_n + (A + B K_n)^T P_{n-1} (A + B K_n)) \mathbf{x}\)
\(\qquad = \mathbf{x}^T P_n \mathbf{x}\)
…
\(\mathbf{u} = -(R + B^T P_0 B)^{-1} B^T P_0 A\mathbf{x} = K_1 \mathbf{x}\)
…
\(\mathbf{u} = -(R + B^T P_{n-1} B)^{-1} B^T P_{n-1} A\mathbf{x} = K_n \mathbf{x}\)
\(\mathbf{u}\) remains linear in \(\mathbf{x}\) throughout the recursion.
\(P_0 = Q\)
// n is the # of steps left
for n = 1…N
\(K_n = -(R + B^T P_{n-1} B)^{-1} B^T P_{n-1} A\)
\(P_n = Q + K_n^T R K_n + (A + B K_n)^T P_{n-1} (A + B K_n)\)
The optimal control at time t = N − n (i.e., with n steps left) is \(\mathbf{u}_t = K_n \mathbf{x}_t\), with cost-to-go \(J_n(\mathbf{x}) = \mathbf{x}^T P_n \mathbf{x}\), where the states evolve forward in time according to the linear dynamics.
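A minimal numpy sketch of this backward recursion (function and variable names are mine, not from the course code):

```python
import numpy as np

def lqr_backward(A, B, Q, R, N):
    """Finite-horizon LQR backward recursion.

    Returns lists K, P where K[n] is the gain with n steps left,
    so at time t = N - n the optimal control is u_t = K[n] @ x_t.
    """
    P = [Q]        # P_0 = Q: terminal cost matrix
    K = [None]     # no action left to perform at n = 0
    for n in range(1, N + 1):
        Pp = P[n - 1]
        # K_n = -(R + B^T P_{n-1} B)^{-1} B^T P_{n-1} A
        Kn = -np.linalg.solve(R + B.T @ Pp @ B, B.T @ Pp @ A)
        # P_n = Q + K_n^T R K_n + (A + B K_n)^T P_{n-1} (A + B K_n)
        Pn = Q + Kn.T @ R @ Kn + (A + B @ Kn).T @ Pp @ (A + B @ Kn)
        K.append(Kn)
        P.append(Pn)
    return K, P
```

Note that np.linalg.solve is used instead of forming the inverse explicitly, which is cheaper and numerically safer.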
Potential problem for high-dimensional systems (control dimension \(k \gg 100\)): each iteration inverts the \(k \times k\) matrix \(R + B^T P_{n-1} B\), which costs roughly \(O(k^{2.37})\) with the best known algorithms and \(O(k^3)\) with Gaussian elimination.
\[ \begin{align} u_0^*, \ldots, u_{N-1}^* &= \operatorname*{argmin}_{u_0, \ldots, u_{N-1}} \sum_{t=0}^{N} c(\mathbf{x}_t, \mathbf{u}_t) \\ \text{s.t.} \quad \mathbf{x}_1 &= f(\mathbf{x}_0, \mathbf{u}_0) \\ \mathbf{x}_2 &= f(\mathbf{x}_1, \mathbf{u}_1) \\ &\vdots \\ \mathbf{x}_N &= f(\mathbf{x}_{N-1}, \mathbf{u}_{N-1}) \end{align} \]
Arbitrary differentiable functions c, f
Idea: iteratively approximate solution by solving linearized versions of the problem via LQR
What can we do when the system and the cost are time-varying, i.e. \(\mathbf{x}_{t+1} = A_t \mathbf{x}_t + B_t \mathbf{u}_t\) and \(c_t(\mathbf{x}_t, \mathbf{u}_t) = \mathbf{x}_t^T Q_t \mathbf{x}_t + \mathbf{u}_t^T R_t \mathbf{u}_t\)?
Turns out, the proof and the algorithm are almost the same
\(P_0 = Q_N\)
// n is the # of steps left
for n = 1…N
\(K_n = -\big( R_{N-n} + B_{N-n}^T P_{n-1} B_{N-n} \big)^{-1} B_{N-n}^T P_{n-1} A_{N-n}\)
\(P_n = Q_{N-n} + K_n^T R_{N-n} K_n + (A_{N-n} + B_{N-n} K_n)^T P_{n-1} (A_{N-n} + B_{N-n} K_n)\)
The optimal control with n steps left (i.e., at time t = N − n) is \(\mathbf{u}_t = K_n \mathbf{x}_t\), with cost-to-go \(J_n(\mathbf{x}) = \mathbf{x}^T P_n \mathbf{x}\).
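The time-varying case only swaps in the matrices of the corresponding time step. A sketch, assuming A, B, R are lists indexed by t = 0…N−1 and Q has N+1 entries (the last one being the terminal cost):

```python
import numpy as np

def tv_lqr_backward(A, B, Q, R, N):
    """Time-varying LQR: A[t], B[t], R[t] for t = 0..N-1 and Q[t] for t = 0..N."""
    P = [Q[N]]     # P_0 = Q_N
    K = [None]
    for n in range(1, N + 1):
        t = N - n  # time index corresponding to n steps left
        Pp = P[n - 1]
        Kn = -np.linalg.solve(R[t] + B[t].T @ Pp @ B[t], B[t].T @ Pp @ A[t])
        Pn = (Q[t] + Kn.T @ R[t] @ Kn
              + (A[t] + B[t] @ Kn).T @ Pp @ (A[t] + B[t] @ Kn))
        K.append(Kn)
        P.append(Pn)
    return K, P
```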
Example: a planar point mass \(\mathbf{p} \in \mathbb{R}^2\) with mass \(m\) and linear drag coefficient \(\alpha\):
\[ m \mathbf{\ddot p} = \mathbf{u} - \alpha \mathbf{\dot p} \]
Set \(\dot p = v\) and then you get:
\[ m \mathbf{\dot v} = \mathbf{u} -\alpha \mathbf{v} \]
Forward-Euler discretization with time step \(\delta t\):
\[ \frac{\mathbf{p}_{t+1} - \mathbf{p}_t}{\delta t} \simeq \mathbf{v}_t, \qquad m \frac{\mathbf{v}_{t+1} - \mathbf{v}_t}{\delta t} \simeq \mathbf{u}_t - \alpha \mathbf{v}_t \]
Q: How can we express this as a linear system?
\[ \mathbf{x}_{t+1} = \begin{bmatrix} \mathbf{p}_{t+1} \\ \mathbf{v}_{t+1} \end{bmatrix} = \begin{bmatrix} \mathbf{p}_t + \delta t \mathbf{v}_t \\ \mathbf{v}_t + \frac{\delta t}{m} \mathbf{u}_t - \frac{\alpha \delta t}{m} \mathbf{v}_t \end{bmatrix} = \begin{bmatrix} 1 & 0 & \delta t & 0 \\ 0 & 1 & 0 & \delta t \\ 0 & 0 & 1 - \alpha \delta t/m & 0 \\ 0 & 0 & 0 & 1 - \alpha \delta t/m \end{bmatrix} \mathbf{x}_t + \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ \frac{\delta t}{m} & 0 \\ 0 & \frac{\delta t}{m} \end{bmatrix} \mathbf{u}_t \]
where the first matrix is \(A \in \mathbb{R}^{4 \times 4}\) and the second is \(B \in \mathbb{R}^{4 \times 2}\).
\[ \begin{align} c(\mathbf{x}, \mathbf{u}) &= \mathbf{x}^T Q \mathbf{x} + \mathbf{u}^T R \mathbf{u} \\ &= \mathbf{x}^T \mathbf{x} + \rho\, \mathbf{u}^T \mathbf{u} \\ &= \|\mathbf{x}\|^2 + \rho \|\mathbf{u}\|^2 \end{align} \]
With initial state
\[ \mathbf{x}_0 = \begin{bmatrix} 10 \\ 30 \\ 10 \\ -5 \end{bmatrix} \]
and instantaneous cost function \(c(\mathbf{x}, \mathbf{u}) = \|\mathbf{x}\|^2 + 100\|\mathbf{u}\|^2\), we simulate the closed-loop system.
Notice how both the state and the controls tend to zero; the controls tend to zero because the state does. Also note that in the current LQR framework we have not included hard constraints on the controls, i.e. upper or lower bounds; we only penalize a large control norm.
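Putting it together, a sketch of the closed-loop simulation for this point mass (x_0 and ρ = 100 come from the setup above; m, α, δt and the horizon N are illustrative guesses, and lqr_backward is the recursion sketched earlier):

```python
import numpy as np

# Point-mass parameters: x_0 and rho = 100 from the slides; the rest are guesses.
m, alpha, dt, rho, N = 1.0, 0.1, 0.1, 100.0, 500

A = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1 - alpha * dt / m, 0],
              [0, 0, 0, 1 - alpha * dt / m]])
B = np.array([[0, 0],
              [0, 0],
              [dt / m, 0],
              [0, dt / m]])
Q, R = np.eye(4), rho * np.eye(2)   # c(x, u) = ||x||^2 + rho ||u||^2

K, P = lqr_backward(A, B, Q, R, N)  # backward recursion from the earlier sketch

x = np.array([10.0, 30.0, 10.0, -5.0])
xs, us = [x], []
for t in range(N):
    u = K[N - t] @ x                # gain with n = N - t steps left
    x = A @ x + B @ u
    xs.append(x); us.append(u)      # both state and controls tend to zero
```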
Now consider trajectory following for the same point-mass system:
\[ \mathbf{x}_{t+1} = \begin{bmatrix} \mathbf{p}_{t+1} \\ \mathbf{v}_{t+1} \end{bmatrix} = \begin{bmatrix} 1 & 0 & \delta t & 0 \\ 0 & 1 & 0 & \delta t \\ 0 & 0 & 1 - \alpha \delta t/m & 0 \\ 0 & 0 & 0 & 1 - \alpha \delta t/m \end{bmatrix} \mathbf{x}_t + \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ \frac{\delta t}{m} & 0 \\ 0 & \frac{\delta t}{m} \end{bmatrix} \mathbf{u}_t \]
We are given a desired trajectory: \(\mathbf{p}_0^*, \mathbf{p}_1^*, \dots, \mathbf{p}_T^*\)
Instantaneous cost: \(c(x_t, \mathbf{u}_t) = (\mathbf{p}_t - \mathbf{p}_t^*)^T Q (\mathbf{p}_t - \mathbf{p}_t^*) + \mathbf{u}_t^T R \mathbf{u}_t\)
Define
\[ \begin{align} \bar{\mathbf{x}}_{t+1} &= \mathbf{x}_{t+1} - \mathbf{x}^{*}_{t+1} \\ &= A \mathbf{x}_t + B \mathbf{u}_t - \mathbf{x}^{*}_{t+1} \\ &= A \bar{\mathbf{x}}_t + B \mathbf{u}_t + A \mathbf{x}^{*}_t - \mathbf{x}^{*}_{t+1} \end{align} \]
We want \(\bar{\mathbf{x}}_{t+1} = \bar{A} \bar{\mathbf{x}}_t + \bar{B} \mathbf{u}_t\)
but the leftover term \(c = A\mathbf{x}^{*}_t - \mathbf{x}^{*}_{t+1}\) is a known constant at each time step.
Idea: augment the state. Redefine it as:
\[ z_{t+1} = \begin{bmatrix} \bar{\mathbf{x}}_{t+1} \\ 1 \end{bmatrix} = \begin{bmatrix} A & c \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \bar{\mathbf{x}}_t \\ 1 \end{bmatrix} + \begin{bmatrix} B \\ 0 \end{bmatrix} \mathbf{u}_t = \bar{A} z_t + \bar{B} \mathbf{u}_t \]
Redefine cost function: \(c(z_t, \mathbf{u}_t) = z_t^{T} \, \bar{Q} \, z_t + \mathbf{u}_t^{T} R \, \mathbf{u}_t\)
With initial state
\[ \mathbf{z}_0 = \begin{bmatrix} 10 \\ 30 \\ 0 \\ 0 \\ 1 \end{bmatrix} \]
and instantaneous cost function \(c(\mathbf{z}, \mathbf{u}) = \|\mathbf{z}\|^2 + \|\mathbf{u}\|^2\):
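A sketch of building the augmented system for a fixed target \(\mathbf{x}^*\), so that \(c = A\mathbf{x}^* - \mathbf{x}^*\) is constant (function name is mine; \(\bar{Q} = I\) matches the \(\|\mathbf{z}\|^2\) cost in this example):

```python
import numpy as np

def augment_for_target(A, B, x_star):
    """Augmented system z = [x - x*; 1] with z_{t+1} = Abar z_t + Bbar u_t."""
    d, k = B.shape
    c = A @ x_star - x_star                  # constant term for a fixed x*
    Abar = np.block([[A, c.reshape(d, 1)],
                     [np.zeros((1, d)), np.ones((1, 1))]])
    Bbar = np.vstack([B, np.zeros((1, k))])
    Qbar = np.eye(d + 1)                     # matches c(z, u) = ||z||^2
    return Abar, Bbar, Qbar
```

For a moving target \(\mathbf{x}^*_t\) the constant term changes at every step, so \(\bar{A}\) becomes time-varying and the time-varying recursion above applies.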
Suppose we are given a nominal trajectory of states \(\mathbf{x}_0^*, \mathbf{x}_1^*, \ldots, \mathbf{x}_N^*\) and controls \(\mathbf{u}_0^*, \mathbf{u}_1^*, \ldots, \mathbf{u}_N^*\).
Linearize the nonlinear dynamics \(\mathbf{x}_{t+1} = f(\mathbf{x}_t, \mathbf{u}_t)\) around the reference point \((\mathbf{x}_t^*, \mathbf{u}_t^*)\)
\[ \mathbf{x}_{t+1} \simeq f(\mathbf{x}_t^*, \mathbf{u}_t^*) + \frac{\partial f}{\partial \mathbf{x}}(\mathbf{x}_t^*, \mathbf{u}_t^*)(\mathbf{x}_t - \mathbf{x}_t^*) + \frac{\partial f}{\partial \mathbf{u}}(\mathbf{x}_t^*, \mathbf{u}_t^*)(\mathbf{u}_t - \mathbf{u}_t^*) \]
\[ \begin{align} & \bar{\mathbf{x}}_{t+1} \simeq A_t \bar{\mathbf{x}}_t + B_t \bar{\mathbf{u}}_t \\ & c(\mathbf{x}_t, \mathbf{u}_t) = \bar{\mathbf{x}}_t^T Q \bar{\mathbf{x}}_t + \bar{\mathbf{u}}_t^T R \bar{\mathbf{u}}_t \end{align} \]
where
\[ \begin{align} & \bar{\mathbf{x}}_t = \mathbf{x}_t - \mathbf{x}_t^*, \qquad \bar{\mathbf{u}}_t = \mathbf{u}_t - \mathbf{u}_t^* \\ & A_t = \frac{\partial f}{\partial \mathbf{x}}(\mathbf{x}_t^*, \mathbf{u}_t^*), \qquad B_t = \frac{\partial f}{\partial \mathbf{u}}(\mathbf{x}_t^*, \mathbf{u}_t^*) \end{align} \]
and the nominal trajectory is assumed dynamically feasible, i.e. \(\mathbf{x}_{t+1}^* = f(\mathbf{x}_t^*, \mathbf{u}_t^*)\), so that the constant term cancels.
Trajectory following can be implemented as a time-varying LQR approximation. Not always clear if this is the best way though.
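The Jacobians \(A_t, B_t\) can be computed analytically or numerically. A sketch using central finite differences, assuming f maps (x, u) to the next state as numpy arrays (function name and eps are my own choices):

```python
import numpy as np

def linearize(f, x_star, u_star, eps=1e-5):
    """Central finite-difference Jacobians A_t = df/dx and B_t = df/du at (x*, u*)."""
    d, k = x_star.size, u_star.size
    A = np.zeros((d, d))
    B = np.zeros((d, k))
    for i in range(d):
        dx = np.zeros(d); dx[i] = eps
        A[:, i] = (f(x_star + dx, u_star) - f(x_star - dx, u_star)) / (2 * eps)
    for j in range(k):
        du = np.zeros(k); du[j] = eps
        B[:, j] = (f(x_star, u_star + du) - f(x_star, u_star - du)) / (2 * eps)
    return A, B
```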
What can we do when \(\mathbf{x}_{t+1} = f(\mathbf{x}_t, \mathbf{u}_t)\) but the cost is quadratic \(c(\mathbf{x}_t, \mathbf{u}_t) = \mathbf{x}_t^T Q\mathbf{x}_t + \mathbf{u}_t^T R \mathbf{u}_t\) ?
We want to stabilize the system around state \(\mathbf{x}_t = 0\)
But with nonlinear dynamics we do not know if \(\mathbf{u}_t = 0\) will keep the system at the zero state.
-> Need to compute \(\mathbf{u}^*\) such that \(f(\mathbf{0}, \mathbf{u}^*) = \mathbf{0}\)
Taylor expansion: linearize the nonlinear dynamics around the point \((0, \mathbf{u}^*)\)
\[ \mathbf{x}_{t+1} \simeq f(0, \mathbf{u}^*) + \underbrace{\frac{\partial f}{\partial \mathbf{x}}(0, \mathbf{u}^*)}_{\mathbf{A}}(\mathbf{x}_t - 0) + \underbrace{\frac{\partial f}{\partial \mathbf{u}}(0, \mathbf{u}^*)}_{\mathbf{B}}(\mathbf{u}_t - \mathbf{u}^*) \]
Since \(f(0, \mathbf{u}^*) = 0\) by construction, the first term vanishes and the expansion simplifies to
\[ \mathbf{x}_{t+1} \simeq A\mathbf{x}_t + B(\mathbf{u}_t - \mathbf{u}^*) \]
Solve this via LQR, treating \(\bar{\mathbf{u}}_t = \mathbf{u}_t - \mathbf{u}^*\) as the control.
Look under csc477_fall19/lqr_examples/python