From LQR to iLQR: An Easy-to-Follow Derivation (Part 1) - Guide

LQR Basics

State transition:
$$x_{k+1} = A_k x_k + B_k u_k$$
We want to find the control sequence $U = (u_0, \dots, u_{N-1})$ that minimizes the following objective:
$$J(x_0,U) = \frac{1}{2} \sum_{k=0}^{N-1} (x_k^TQ_kx_k + u_k^TR_ku_k) + \frac{1}{2}x_N^TQ_fx_N$$

The three parts are the state cost, the control cost, and the terminal-state cost.
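As a small illustration, the objective can be evaluated by rolling the dynamics forward and accumulating the three cost parts. This is a minimal sketch, not the author's code: it assumes time-invariant matrices, stage costs summed from $k = 0$, and made-up example values; the helper name `lqr_cost` is hypothetical.

```python
import numpy as np

def lqr_cost(x0, U, A, B, Q, R, Qf):
    """Roll out x_{k+1} = A x_k + B u_k and accumulate the quadratic cost J."""
    x = np.asarray(x0, dtype=float)
    J = 0.0
    for u in U:
        J += 0.5 * (x @ Q @ x + u @ R @ u)  # state cost + control cost
        x = A @ x + B @ u                   # state transition
    J += 0.5 * x @ Qf @ x                   # terminal-state cost
    return J

# Example values (assumptions for illustration only): a double integrator.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R, Qf = np.eye(2), np.array([[0.1]]), 10.0 * np.eye(2)

# With zero controls and zero initial velocity the state never moves.
U = [np.zeros(1) for _ in range(3)]
J_zero = lqr_cost([1.0, 0.0], U, A, B, Q, R, Qf)
print(J_zero)
```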

At the terminal time $N$, the remaining cost is just the terminal penalty:
$$V_N(x_N) = \frac{1}{2} x_N^T Q_f x_N$$
To set up the recursion, we define:
$$S_N = Q_f$$
so that:
$$V_N(x_N) = \frac{1}{2} x_N^T S_N x_N$$

Optimal control law $u^*_{N-1}$ at time $k=N-1$

$$V_N(x_N) = \frac{1}{2} x_N^TQ_fx_N=\frac{1}{2}x_N^TS_Nx_N$$

By Bellman's principle of optimality, and writing the step for a generic stage $i$ (the first step is $i = N-1$) with $V_{i+1}(x_{i+1}) = \frac{1}{2} x_{i+1}^T S_{i+1} x_{i+1}$:
$$
\begin{aligned}
V_{i}(x_{i}) &= \min_{u_{i}} \Big[ V_{i+1}(x_{i+1})+ \frac{1}{2}u_{i}^TR_{i}u_{i} + \frac{1}{2}x_{i}^TQ_{i}x_{i} \Big] \\
&=\min_{u_{i}} \Big[ \frac{1}{2}u_{i}^TR_{i}u_{i} + \frac{1}{2}x_{i}^TQ_{i}x_{i} + \frac{1}{2}(A_{i}x_{i} + B_{i}u_{i})^TS_{i+1}(A_{i}x_{i} + B_{i}u_{i}) \Big] \\
&=\min_{u_{i}} \Big[ \frac{1}{2}u_{i}^TR_{i}u_{i} + \frac{1}{2}x_{i}^TQ_{i}x_{i} +\frac{1}{2} x_i^TA_i^TS_{i+1} A_ix_i + \frac{1}{2} u_i^TB_i^TS_{i+1}B_iu_i + x_i^TA_i^TS_{i+1}B_iu_i \Big] \\
&=\min_{u_{i}} \Big[ \frac{1}{2} u_i^T(R_i + B_i^TS_{i+1}B_i)u_i + \frac{1}{2} x_i^T(Q_{i} + A_i^TS_{i+1}A_i)x_i + u_i^T(B_i^TS_{i+1}A_i)x_i \Big]
\end{aligned}
$$

Next, differentiate the bracketed expression with respect to $u_i$ and set the result to zero:
$$\frac{\partial V}{\partial u} = (R_i + B_i^TS_{i+1}B_i)u_i + (B_i^TS_{i+1}A_i)x_i = 0$$

$$u_i = - (R_i + B_i^TS_{i+1}B_i)^{-1}(B_i^TS_{i+1}A_i)x_i$$
$$u_i = -K_ix_i, \quad K_i = (R_i + B_i^TS_{i+1}B_i)^{-1}(B_i^TS_{i+1}A_i)$$
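The gain formula can be checked numerically: at $u_i^* = -K_i x_i$ the gradient with respect to $u_i$ should vanish. A minimal sketch with made-up example matrices (the helper name `lqr_gain` is hypothetical); it uses a linear solve instead of forming the inverse explicitly and assumes $R_i + B_i^T S_{i+1} B_i$ is invertible.

```python
import numpy as np

def lqr_gain(A, B, R, S_next):
    """K = (R + B^T S B)^{-1} (B^T S A), computed with a linear solve."""
    return np.linalg.solve(R + B.T @ S_next @ B, B.T @ S_next @ A)

# Example values (assumptions for illustration only).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
R = np.array([[0.1]])
S_next = 10.0 * np.eye(2)   # plays the role of S_{i+1}
x = np.array([1.0, -0.5])

K = lqr_gain(A, B, R, S_next)
u_star = -K @ x

# Gradient of the stage objective with respect to u, evaluated at u_star.
grad = (R + B.T @ S_next @ B) @ u_star + (B.T @ S_next @ A) @ x
print(K, grad)
```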

Riccati

Substitute the optimal control $u_i^* = -K_i x_i$:

$$
\begin{aligned}
V_i(x_i) = &\frac{1}{2} u_i^{*T}(R_i + B_i^TS_{i+1}B_i)u_i^* + \frac{1}{2} x_i^T(Q_{i} + A_i^TS_{i+1}A_i)x_i + u_i^{*T}(B_i^TS_{i+1}A_i)x_i \\
= & \frac{1}{2} x_i^TK_i^T(R_i + B_i^TS_{i+1}B_i)K_ix_i + \frac{1}{2} x_i^T(Q_{i} + A_i^TS_{i+1}A_i)x_i - x_i^TK_i^T(B_i^TS_{i+1}A_i)x_i \\
= & \frac{1}{2} x_i^T \big(K_i^T(R_i + B_i^TS_{i+1}B_i)K_i + Q_{i} + A_i^TS_{i+1}A_i - 2K_i^T(B_i^TS_{i+1}A_i)\big) x_i
\end{aligned}
$$

Full form of the Riccati equation

Matching this with $V_i(x_i) = \frac{1}{2} x_i^T S_i x_i$ gives:

$$
\begin{aligned}
S_i = & Q_i + A_i^TS_{i+1}A_i - 2K_i^T(B_i^TS_{i+1}A_i) + K_i^TR_iK_i + K_i^T(B_i^TS_{i+1}B_i)K_i \\
= & Q_i + K_i^TR_iK_i + (A_i - B_iK_i)^TS_{i+1}(A_i - B_iK_i)
\end{aligned}
$$

The second line holds because, for the optimal $K_i$, the matrix $K_i^T B_i^T S_{i+1} A_i = A_i^T S_{i+1} B_i (R_i + B_i^T S_{i+1} B_i)^{-1} B_i^T S_{i+1} A_i$ is symmetric.

The standard discrete-time Riccati matrix equation:

Substituting $K_i$ gives the usual backward recursion (for time-invariant matrices, its fixed point $S_i = S_{i+1}$ satisfies the discrete-time algebraic Riccati equation, DARE):

$$
\begin{aligned}
S_i =& Q_i + K_i^TR_iK_i + (A_i - B_iK_i)^TS_{i+1}(A_i - B_iK_i) \\
= & Q_i + ((R_i + B_i^TS_{i+1}B_i)^{-1}(B_i^TS_{i+1}A_i)) ^T R_i (R_i + B_i^TS_{i+1}B_i)^{-1}(B_i^TS_{i+1}A_i) \\
& + (A_i - B_i(R_i + B_i^TS_{i+1}B_i)^{-1}(B_i^TS_{i+1}A_i))^TS_{i+1}(A_i - B_i(R_i + B_i^TS_{i+1}B_i)^{-1}(B_i^TS_{i+1}A_i)) \\
= & Q_i + A_i^T S_{i+1} A_i - A_i^T S_{i+1} B_i (R_i + B_i^T S_{i+1} B_i)^{-1} B_i^T S_{i+1} A_i
\end{aligned}
$$
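The whole LQR derivation can be sketched as one backward pass. The code below is an illustration under stated assumptions, not a reference implementation: the matrices are time-invariant, the example values are made up, and `lqr_backward` is a hypothetical helper name. It then checks two things from the derivation: the closed-loop cost from $x_0$ equals $\frac{1}{2} x_0^T S_0 x_0$, and the closed-loop form of $S_i$ agrees with the compact Riccati form.

```python
import numpy as np

def lqr_backward(A, B, Q, R, Qf, N):
    """Return gains K_0..K_{N-1} and cost-to-go matrices S_0..S_N."""
    S = [None] * (N + 1)
    K = [None] * N
    S[N] = Qf                                   # S_N = Q_f
    for i in range(N - 1, -1, -1):
        H = R + B.T @ S[i + 1] @ B
        K[i] = np.linalg.solve(H, B.T @ S[i + 1] @ A)
        A_cl = A - B @ K[i]                     # closed-loop dynamics
        S[i] = Q + K[i].T @ R @ K[i] + A_cl.T @ S[i + 1] @ A_cl
    return K, S

# Example values (assumptions for illustration only).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R, Qf = np.eye(2), np.array([[0.1]]), 10.0 * np.eye(2)
N = 20
K, S = lqr_backward(A, B, Q, R, Qf, N)

# Roll out the closed loop u_i = -K_i x_i and accumulate the cost.
x0 = np.array([1.0, 0.0])
x, J = x0.copy(), 0.0
for i in range(N):
    u = -K[i] @ x
    J += 0.5 * (x @ Q @ x + u @ R @ u)
    x = A @ x + B @ u
J += 0.5 * x @ Qf @ x

# Compact Riccati form for S_0, computed from S_1.
S0_compact = Q + A.T @ S[1] @ A - A.T @ S[1] @ B @ np.linalg.solve(
    R + B.T @ S[1] @ B, B.T @ S[1] @ A)
print(J, 0.5 * x0 @ S[0] @ x0)
```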

iLQR

The state-transition equation is now nonlinear:

$$x_{k+1} = f(x_k, u_k)$$

and the objective is

$$J(x_0,U) = h(x_N) + \sum_{k = 0}^{N-1} l(x_k,u_k)$$

Suppose we use iLQR to optimize a trajectory.

Assume we have a rough solution, the nominal trajectory $(\bar x, \bar u)$, and we want to compute small perturbations $\delta x, \delta u$ that improve it.

The perturbations are

$$
\begin{aligned}
\delta x_k &= x_k - \bar x_k\\
\delta u_k &= u_k - \bar u_k
\end{aligned}
$$

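We Taylor-expand the nonlinear functions around the nominal trajectory to obtain a local quadratic objective and a locally linear (affine) state-transition equation.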

Expanding the state-transition equation, where $|_{\bar k}$ denotes evaluation at $(\bar x_k, \bar u_k)$:
$$
\begin{aligned}
& x_{k+1} \approx f(\bar x_k, \bar u_k) + \frac{\partial f}{\partial x} \Big|_{\bar k} (x_k -\bar x_k) + \frac{\partial f}{\partial u} \Big|_{\bar k} (u_k - \bar u_k) \\
& A_k = \frac{\partial f}{\partial x} \Big|_{\bar k} \\
& B_k = \frac{\partial f}{\partial u} \Big|_{\bar k} \\
& c_k = f(\bar x_k, \bar u_k) - \bar x_{k+1}
\end{aligned}
$$

Subtracting $\bar x_{k+1}$ from both sides gives

$$\delta x_{k+1} = A_k\delta x_k + B_k \delta u_k + c_k$$

If the nominal trajectory was produced by rolling $\bar u$ through $f$, then $\bar x_{k+1} = f(\bar x_k, \bar u_k)$ and $c_k = 0$.
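The linearization step can be sketched with finite differences. This is an illustration only: the pendulum-like dynamics `f`, the constants `DT` and `G_OVER_L`, and the helper `linearize` are all assumptions introduced here, and an actual implementation might use analytic or automatic derivatives instead.

```python
import numpy as np

DT, G_OVER_L = 0.05, 9.81  # hypothetical time step and g/l

def f(x, u):
    """Example nonlinear dynamics: Euler-discretized pendulum, x = (theta, omega)."""
    theta, omega = x
    return np.array([theta + DT * omega,
                     omega + DT * (-G_OVER_L * np.sin(theta) + u[0])])

def linearize(f, x_bar, u_bar, eps=1e-6):
    """Central-difference Jacobians A = df/dx, B = df/du at (x_bar, u_bar)."""
    n, m = x_bar.size, u_bar.size
    A, B = np.zeros((n, n)), np.zeros((n, m))
    for j in range(n):
        d = np.zeros(n)
        d[j] = eps
        A[:, j] = (f(x_bar + d, u_bar) - f(x_bar - d, u_bar)) / (2 * eps)
    for j in range(m):
        d = np.zeros(m)
        d[j] = eps
        B[:, j] = (f(x_bar, u_bar + d) - f(x_bar, u_bar - d)) / (2 * eps)
    return A, B

x_bar, u_bar = np.array([0.3, 0.0]), np.array([0.5])
x_bar_next = f(x_bar, u_bar)           # nominal trajectory from a rollout
A, B = linearize(f, x_bar, u_bar)
c = f(x_bar, u_bar) - x_bar_next       # zero for a rollout-generated nominal

# Compare the linear prediction with the true perturbed next state.
dx, du = np.array([1e-3, -2e-3]), np.array([1e-3])
dx_next_true = f(x_bar + dx, u_bar + du) - x_bar_next
dx_next_lin = A @ dx + B @ du + c
print(dx_next_true, dx_next_lin)
```

For small perturbations the two printed vectors agree up to second-order terms in `dx` and `du`.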

Quadratizing the objective (local cost)

$$\delta J = \delta h(x_N) + \sum_{k=0}^{N-1} \delta l (x_k, u_k)$$

Running cost

Taylor-expand $l$ at the points of the rough solution:

$$l(x_k, u_k) \approx l(\bar x_k, \bar u_k) + l_x^T\delta x_k + l_u^T\delta u_k + \frac{1}{2} \delta x_k^T l_{xx} \delta x_k + \frac{1}{2} \delta u_k^T l_{uu} \delta u_k + \delta x_k^Tl_{xu}\delta u_k$$

where $l(\bar x_k, \bar u_k)$ is a constant.

Terminal cost

$$h(x_N) \approx h(\bar x_N) + h^T_x\delta x_N + \frac{1}{2} \delta x_N^T h_{xx}\delta x_N$$

where $h(\bar x_N)$ is a constant.
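A small example makes the expansion concrete. The sketch below assumes a hypothetical tracking cost $l(x,u) = \frac{1}{2}(x - x_{ref})^T W (x - x_{ref}) + \frac{1}{2}\rho\, u^T u$ (the names `W`, `RHO`, `X_REF` and all values are invented for illustration). Because this cost is itself quadratic, the second-order model is exact, and the gradient $l_x$ is nonzero whenever the nominal state differs from the reference.

```python
import numpy as np

W = np.diag([2.0, 0.5])        # hypothetical state weight
RHO = 0.1                      # hypothetical control weight
X_REF = np.array([1.0, 0.0])   # hypothetical reference state

def l(x, u):
    """Example running cost: tracking error plus control effort."""
    e = x - X_REF
    return 0.5 * e @ W @ e + 0.5 * RHO * u @ u

def l_derivatives(x_bar, u_bar):
    """Gradients and Hessians of l at the nominal point."""
    l_x = W @ (x_bar - X_REF)
    l_u = RHO * u_bar
    l_xx = W
    l_uu = RHO * np.eye(u_bar.size)
    l_xu = np.zeros((x_bar.size, u_bar.size))
    return l_x, l_u, l_xx, l_uu, l_xu

x_bar, u_bar = np.array([0.2, -0.1]), np.array([0.4])
dx, du = np.array([0.05, 0.02]), np.array([-0.03])
l_x, l_u, l_xx, l_uu, l_xu = l_derivatives(x_bar, u_bar)

# Second-order model of l around the nominal point.
model = (l(x_bar, u_bar) + l_x @ dx + l_u @ du
         + 0.5 * dx @ l_xx @ dx + 0.5 * du @ l_uu @ du + dx @ l_xu @ du)
exact = l(x_bar + dx, u_bar + du)
print(model, exact, l_x)
```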

At this point we can see that, compared with LQR, the iLQR local problem contains linear terms in addition to the quadratic ones.

The linear-term difference between iLQR and LQR

LQR (Linear Quadratic Regulator): pure quadratic form

Core goal: drive the state to the origin $\boldsymbol{x}=\mathbf{0}$.

A. Zero reference and zero gradient

The LQR cost is a pure quadratic form whose minimizer is $\boldsymbol{x}=\mathbf{0}, \boldsymbol{u}=\mathbf{0}$.
At the origin $\boldsymbol{x}=\mathbf{0}$, the gradient (first derivative) of the cost with respect to the state $\boldsymbol{x}$ is always zero:
$$\frac{\partial l}{\partial \boldsymbol{x}}\bigg|_{\boldsymbol{x}=\mathbf{0}, \boldsymbol{u}=\mathbf{0}} = \mathbf{0}$$

B. A property of the Bellman equation: the linear terms vanish

Because the cost has no gradient at the origin, the value function $V_k$ has no gradient at $\boldsymbol{x}=\mathbf{0}$ either.
The Taylor expansion of the value function $V_k$ is:
$$V_k(\boldsymbol{x}_k) \approx V_k(\mathbf{0}) + \underbrace{\boldsymbol{v}_k^T}_{\text{gradient}} \boldsymbol{x}_k + \frac{1}{2} \boldsymbol{x}_k^T \boldsymbol{V}_{\boldsymbol{xx}} \boldsymbol{x}_k$$
Since $V_k(\mathbf{0})=0$ and $\boldsymbol{v}_k = \frac{\partial V_k}{\partial \boldsymbol{x}}\big|_{\boldsymbol{x}=\mathbf{0}} = \mathbf{0}$, the constant and linear terms both vanish.

LQR conclusion: the value function is a pure quadratic form
$$V_k(\boldsymbol{x}_k) = \frac{1}{2} \boldsymbol{x}_k^T \boldsymbol{S}_k \boldsymbol{x}_k$$


iLQR (Iterative LQR): affine-quadratic form

Core goal: optimize locally around an arbitrary nominal trajectory $(\bar{\boldsymbol{x}}, \bar{\boldsymbol{u}})$.

A. The trajectory is away from the origin, so the gradient is nonzero

The nominal trajectory $(\bar{\boldsymbol{x}}, \bar{\boldsymbol{u}})$ usually does not pass through the origin, and during the optimization it is usually not optimal.
We evaluate the local cost $\boldsymbol{\delta l}$ at the nominal point $\bar{\boldsymbol{x}}_k$. Because $\bar{\boldsymbol{x}}_k$ is not on the optimal trajectory, the derivatives of the cost $l(\boldsymbol{x}, \boldsymbol{u})$ with respect to $\boldsymbol{x}$ and $\boldsymbol{u}$ at $\bar{\boldsymbol{x}}_k$ are in general nonzero:
$$\boldsymbol{l}_{\boldsymbol{x}} = \frac{\partial l}{\partial \boldsymbol{x}}\bigg|_{\bar{k}} \neq \mathbf{0}$$
$$\boldsymbol{l}_{\boldsymbol{u}} = \frac{\partial l}{\partial \boldsymbol{u}}\bigg|_{\bar{k}} \neq \mathbf{0}$$

B. Introducing the linear terms: what drives the optimization

These nonzero gradients $\boldsymbol{l}_{\boldsymbol{x}}$ and $\boldsymbol{l}_{\boldsymbol{u}}$ introduce linear components $\boldsymbol{q}$ and $\boldsymbol{r}$ into the Taylor expansion of the Q-function $\mathcal{Q}_k$:
$$\mathcal{Q}_k \approx \dots + \underbrace{\boldsymbol{q}^T \boldsymbol{\delta x}_k}_{\text{nonzero}} + \underbrace{\boldsymbol{r}^T \boldsymbol{\delta u}_k}_{\text{nonzero}} + \frac{1}{2} \boldsymbol{\delta x}_k^T \boldsymbol{Q} \boldsymbol{\delta x}_k + \dots$$
Because $\boldsymbol{q}$ and $\boldsymbol{r}$ are nonzero, the gradient $\boldsymbol{v}_k$ of the value function $V_k$ obtained from the backward Bellman recursion is in general nonzero as well:
$$\boldsymbol{v}_k = \boldsymbol{q} - \boldsymbol{M} \boldsymbol{R}^{-1} \boldsymbol{r} \neq \mathbf{0}$$

iLQR conclusion: the value function is an affine-quadratic form (it contains a linear term):
$$V_k(\boldsymbol{\delta x}_k) \approx \text{const} + \underbrace{\boldsymbol{v}_k^T \boldsymbol{\delta x}_k}_{\text{nonzero linear term}} + \frac{1}{2} \boldsymbol{\delta x}_k^T \boldsymbol{V}_{\boldsymbol{xx}} \boldsymbol{\delta x}_k$$
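The expression for $\boldsymbol{v}_k$ can be recovered by minimizing the quadratic model of $\mathcal{Q}_k$ over $\boldsymbol{\delta u}_k$. A short sketch of that step, assuming $\boldsymbol{R}$ is symmetric positive definite and that $\mathcal{Q}_k \approx \text{const} + \boldsymbol{q}^T \boldsymbol{\delta x}_k + \boldsymbol{r}^T \boldsymbol{\delta u}_k + \frac{1}{2} \boldsymbol{\delta x}_k^T \boldsymbol{Q} \boldsymbol{\delta x}_k + \frac{1}{2} \boldsymbol{\delta u}_k^T \boldsymbol{R} \boldsymbol{\delta u}_k + \boldsymbol{\delta x}_k^T \boldsymbol{M} \boldsymbol{\delta u}_k$:

$$
\begin{aligned}
\frac{\partial \mathcal{Q}_k}{\partial \boldsymbol{\delta u}_k} &= \boldsymbol{r} + \boldsymbol{R}\,\boldsymbol{\delta u}_k + \boldsymbol{M}^T \boldsymbol{\delta x}_k = \mathbf{0}
\;\;\Rightarrow\;\; \boldsymbol{\delta u}_k^* = -\boldsymbol{R}^{-1}(\boldsymbol{r} + \boldsymbol{M}^T \boldsymbol{\delta x}_k) \\
V_k(\boldsymbol{\delta x}_k) &= \mathcal{Q}_k(\boldsymbol{\delta x}_k, \boldsymbol{\delta u}_k^*)
= \text{const} + (\boldsymbol{q} - \boldsymbol{M}\boldsymbol{R}^{-1}\boldsymbol{r})^T \boldsymbol{\delta x}_k + \frac{1}{2} \boldsymbol{\delta x}_k^T (\boldsymbol{Q} - \boldsymbol{M}\boldsymbol{R}^{-1}\boldsymbol{M}^T) \boldsymbol{\delta x}_k
\end{aligned}
$$

Reading off the coefficients gives $\boldsymbol{v}_k = \boldsymbol{q} - \boldsymbol{M}\boldsymbol{R}^{-1}\boldsymbol{r}$ and $\boldsymbol{V}_{\boldsymbol{xx}} = \boldsymbol{Q} - \boldsymbol{M}\boldsymbol{R}^{-1}\boldsymbol{M}^T$. The control correction splits into a feedforward part $-\boldsymbol{R}^{-1}\boldsymbol{r}$ and a feedback part $-\boldsymbol{R}^{-1}\boldsymbol{M}^T \boldsymbol{\delta x}_k$.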


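Summary: the role of the linear term

The nonzero linear term $\boldsymbol{v}_k^T \boldsymbol{\delta x}_k$ in iLQR means that the optimal value function has a nonzero slope at the current nominal point.

This slope is what drives the optimization: it tells us in which direction to move $\boldsymbol{\delta x}$ (through the optimal control correction $\boldsymbol{\delta u}_k^*$) to obtain the largest cost decrease, so the nominal trajectory is pushed step by step toward a local optimum.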

The optimal value function in iLQR

Terminal step:
At $k = N$, the optimal cost-to-go is the terminal cost
$$V_N(x_N) = h (x_N)$$
Taylor-expanding it and dropping the constant term gives

$$V_N(\delta x_N) \approx h^T_x\delta x_N + \frac{1}{2} \delta x^T _N h_{xx} \delta x_N$$

General step:
$$V_k(\delta x_k) = C_k + v_k^T\delta x_k + \frac{1}{2}\delta x^T_kV_{xx,k} \delta x_k$$

where, at the terminal step,
$$v_N = h_x, \quad V_{xx,N} = h_{xx}$$

Recursion step, from $k+1$ to $k$:

$$\mathcal{Q}_k(\delta x_k, \delta u_k) = l (x_k,u_k) + V_{k+1}(x_{k+1})$$

Our goal is to expand $\mathcal{Q}_k$ around the nominal trajectory point $(\bar{\boldsymbol{x}}_k, \bar{\boldsymbol{u}}_k)$ and obtain a quadratic approximation in the perturbations $(\boldsymbol{\delta x}_k, \boldsymbol{\delta u}_k)$:

$$\mathcal{Q}_k \approx \text{const} + \boldsymbol{q}^T \boldsymbol{\delta x}_k + \boldsymbol{r}^T \boldsymbol{\delta u}_k + \frac{1}{2} \boldsymbol{\delta x}_k^T \boldsymbol{Q} \boldsymbol{\delta x}_k + \frac{1}{2} \boldsymbol{\delta u}_k^T \boldsymbol{R} \boldsymbol{\delta u}_k + \boldsymbol{\delta x}_k^T \boldsymbol{M} \boldsymbol{\delta u}_k$$

We expand $l$ and $V_{k+1}$ separately and then combine their coefficients.

Known information (inputs)

In the backward recursion at time $k$, the following quantities are known:

  1. Derivatives of the instantaneous cost $l$:
    $\boldsymbol{l}_{\boldsymbol{x}}, \boldsymbol{l}_{\boldsymbol{u}}$ (gradients)
    $\boldsymbol{l}_{\boldsymbol{xx}}, \boldsymbol{l}_{\boldsymbol{uu}}, \boldsymbol{l}_{\boldsymbol{xu}}$ (Hessians)

  2. The approximation of the future cost $V_{k+1}$:
    $\boldsymbol{v}_{k+1}$ (gradient of $V_{k+1}$ with respect to $\boldsymbol{\delta x}_{k+1}$)
    $\boldsymbol{V}_{\boldsymbol{xx}, k+1}$ (Hessian of $V_{k+1}$ with respect to $\boldsymbol{\delta x}_{k+1}$)

  3. The linearized dynamics:
    $\boldsymbol{\delta x}_{k+1} = \boldsymbol{A}_k \boldsymbol{\delta x}_k + \boldsymbol{B}_k \boldsymbol{\delta u}_k + \boldsymbol{c}_k$

Expanding the instantaneous cost $l$ (direct expansion)

We first take the second-order Taylor expansion of $l(\boldsymbol{x}_k, \boldsymbol{u}_k)$ around $(\bar{\boldsymbol{x}}_k, \bar{\boldsymbol{u}}_k)$:

$$l(\boldsymbol{x}_k, \boldsymbol{u}_k) \approx l(\bar{\boldsymbol{x}}_k, \bar{\boldsymbol{u}}_k) + \boldsymbol{l}_{\boldsymbol{x}}^T \boldsymbol{\delta x}_k + \boldsymbol{l}_{\boldsymbol{u}}^T \boldsymbol{\delta u}_k + \frac{1}{2} \boldsymbol{\delta x}_k^T \boldsymbol{l}_{\boldsymbol{xx}} \boldsymbol{\delta x}_k + \frac{1}{2} \boldsymbol{\delta u}_k^T \boldsymbol{l}_{\boldsymbol{uu}} \boldsymbol{\delta u}_k + \boldsymbol{\delta x}_k^T \boldsymbol{l}_{\boldsymbol{xu}} \boldsymbol{\delta u}_k$$

Expanding the future cost $V_{k+1}$ (chain rule)

We must convert $V_{k+1}$, which is a function of $\boldsymbol{\delta x}_{k+1}$, into a function of $(\boldsymbol{\delta x}_k, \boldsymbol{\delta u}_k)$.

We start from the known approximation of $V_{k+1}$:
$$V_{k+1}(\boldsymbol{\delta x}_{k+1}) \approx \text{const} + \boldsymbol{v}_{k+1}^T \boldsymbol{\delta x}_{k+1} + \frac{1}{2} \boldsymbol{\delta x}_{k+1}^T \boldsymbol{V}_{\boldsymbol{xx}, k+1} \boldsymbol{\delta x}_{k+1}$$

Now substitute the linearized dynamics $\boldsymbol{\delta x}_{k+1} = \boldsymbol{A}_k \boldsymbol{\delta x}_k + \boldsymbol{B}_k \boldsymbol{\delta u}_k + \boldsymbol{c}_k$ into this expression.

Expanding the linear term of $V_{k+1}$

Substitute $\boldsymbol{\delta x}_{k+1}$ into $\boldsymbol{v}_{k+1}^T \boldsymbol{\delta x}_{k+1}$:
$$\boldsymbol{v}_{k+1}^T \boldsymbol{\delta x}_{k+1} = \boldsymbol{v}_{k+1}^T (\boldsymbol{A}_k \boldsymbol{\delta x}_k + \boldsymbol{B}_k \boldsymbol{\delta u}_k + \boldsymbol{c}_k)$$
$$\boldsymbol{v}_{k+1}^T \boldsymbol{\delta x}_{k+1} = \underbrace{(\boldsymbol{A}_k^T \boldsymbol{v}_{k+1})^T \boldsymbol{\delta x}_k}_{\text{linear in } \boldsymbol{\delta x}_k} + \underbrace{(\boldsymbol{B}_k^T \boldsymbol{v}_{k+1})^T \boldsymbol{\delta u}_k}_{\text{linear in } \boldsymbol{\delta u}_k} + \underbrace{\boldsymbol{v}_{k+1}^T \boldsymbol{c}_k}_{\text{constant}}$$

Expanding the quadratic term of $V_{k+1}$

Substitute $\boldsymbol{\delta x}_{k+1}$ into $\frac{1}{2} \boldsymbol{\delta x}_{k+1}^T \boldsymbol{V}_{\boldsymbol{xx}, k+1} \boldsymbol{\delta x}_{k+1}$:
$$\frac{1}{2} (\boldsymbol{A}_k \boldsymbol{\delta x}_k + \boldsymbol{B}_k \boldsymbol{\delta u}_k + \boldsymbol{c}_k)^T \boldsymbol{V}_{\boldsymbol{xx}, k+1} (\boldsymbol{A}_k \boldsymbol{\delta x}_k + \boldsymbol{B}_k \boldsymbol{\delta u}_k + \boldsymbol{c}_k)$$

Expand this quadratic form (we keep terms up to second order and drop the term quadratic in $\boldsymbol{c}_k$, because it is only a constant):

Quadratic term in $\boldsymbol{\delta x}_k$: $\frac{1}{2} (\boldsymbol{A}_k \boldsymbol{\delta x}_k)^T \boldsymbol{V}_{\boldsymbol{xx}, k+1} (\boldsymbol{A}_k \boldsymbol{\delta x}_k) = \frac{1}{2} \boldsymbol{\delta x}_k^T (\boldsymbol{A}_k^T \boldsymbol{V}_{\boldsymbol{xx}, k+1} \boldsymbol{A}_k) \boldsymbol{\delta x}_k$
Quadratic term in $\boldsymbol{\delta u}_k$: $\frac{1}{2} (\boldsymbol{B}_k \boldsymbol{\delta u}_k)^T \boldsymbol{V}_{\boldsymbol{xx}, k+1} (\boldsymbol{B}_k \boldsymbol{\delta u}_k) = \frac{1}{2} \boldsymbol{\delta u}_k^T (\boldsymbol{B}_k^T \boldsymbol{V}_{\boldsymbol{xx}, k+1} \boldsymbol{B}_k) \boldsymbol{\delta u}_k$
Cross term ($\boldsymbol{\delta x}_k, \boldsymbol{\delta u}_k$): $\boldsymbol{\delta x}_k^T (\boldsymbol{A}_k^T \boldsymbol{V}_{\boldsymbol{xx}, k+1} \boldsymbol{B}_k) \boldsymbol{\delta u}_k$
Linear terms (from $\boldsymbol{c}_k$): $\boldsymbol{\delta x}_k^T (\boldsymbol{A}_k^T \boldsymbol{V}_{\boldsymbol{xx}, k+1} \boldsymbol{c}_k) + \boldsymbol{\delta u}_k^T (\boldsymbol{B}_k^T \boldsymbol{V}_{\boldsymbol{xx}, k+1} \boldsymbol{c}_k)$

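(Note: as a simplification, iLQR ignores the second derivatives of $f$, $\boldsymbol{f}_{\boldsymbol{xx}}, \boldsymbol{f}_{\boldsymbol{uu}}$, which would otherwise appear in the expansion of $V_{k+1}$; retaining them leads to differential dynamic programming, DDP.)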

Assembling the Q-function coefficients

We now add the coefficients of like terms from the two expansions (that of $l$ and that of $V_{k+1}$) to obtain the final coefficients of $\mathcal{Q}_k$.

Gradient $\boldsymbol{q}$ (linear term in $\boldsymbol{\delta x}_k$)

$$\boldsymbol{q} = \frac{\partial \mathcal{Q}_k}{\partial \boldsymbol{\delta x}_k} = \underbrace{\boldsymbol{l}_{\boldsymbol{x}}}_{\text{from } l} + \underbrace{\boldsymbol{A}_k^T \boldsymbol{v}_{k+1}}_{\text{from the linear term of } V_{k+1}} + \underbrace{\boldsymbol{A}_k^T \boldsymbol{V}_{\boldsymbol{xx}, k+1} \boldsymbol{c}_k}_{\text{from the quadratic term of } V_{k+1} \text{ crossed with } \boldsymbol{c}_k}$$

Gradient $\boldsymbol{r}$ (linear term in $\boldsymbol{\delta u}_k$)

$$\boldsymbol{r} = \frac{\partial \mathcal{Q}_k}{\partial \boldsymbol{\delta u}_k} = \underbrace{\boldsymbol{l}_{\boldsymbol{u}}}_{\text{from } l} + \underbrace{\boldsymbol{B}_k^T \boldsymbol{v}_{k+1}}_{\text{from the linear term of } V_{k+1}} + \underbrace{\boldsymbol{B}_k^T \boldsymbol{V}_{\boldsymbol{xx}, k+1} \boldsymbol{c}_k}_{\text{from the quadratic term of } V_{k+1} \text{ crossed with } \boldsymbol{c}_k}$$

Hessian $\boldsymbol{Q}$ (quadratic term in $\boldsymbol{\delta x}_k, \boldsymbol{\delta x}_k$)

$$\boldsymbol{Q} = \frac{\partial^2 \mathcal{Q}_k}{\partial \boldsymbol{\delta x}_k^2} = \underbrace{\boldsymbol{l}_{\boldsymbol{xx}}}_{\text{from } l} + \underbrace{\boldsymbol{A}_k^T \boldsymbol{V}_{\boldsymbol{xx}, k+1} \boldsymbol{A}_k}_{\text{from the quadratic term of } V_{k+1}}$$

Hessian $\boldsymbol{R}$ (quadratic term in $\boldsymbol{\delta u}_k, \boldsymbol{\delta u}_k$)

$$\boldsymbol{R} = \frac{\partial^2 \mathcal{Q}_k}{\partial \boldsymbol{\delta u}_k^2} = \underbrace{\boldsymbol{l}_{\boldsymbol{uu}}}_{\text{from } l} + \underbrace{\boldsymbol{B}_k^T \boldsymbol{V}_{\boldsymbol{xx}, k+1} \boldsymbol{B}_k}_{\text{from the quadratic term of } V_{k+1}}$$

Hessian $\boldsymbol{M}$ (cross term in $\boldsymbol{\delta x}_k, \boldsymbol{\delta u}_k$)

$$\boldsymbol{M} = \frac{\partial^2 \mathcal{Q}_k}{\partial \boldsymbol{\delta x}_k \partial \boldsymbol{\delta u}_k} = \underbrace{\boldsymbol{l}_{\boldsymbol{xu}}}_{\text{from } l} + \underbrace{\boldsymbol{A}_k^T \boldsymbol{V}_{\boldsymbol{xx}, k+1} \boldsymbol{B}_k}_{\text{from the quadratic term of } V_{k+1}}$$
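The five coefficient formulas can be collected into one function and checked against a direct evaluation of the expanded $l + V_{k+1}$. This is only a sketch: the function name `q_coefficients` and the random test data are assumptions made for illustration, and $\boldsymbol{V}_{\boldsymbol{xx}, k+1}$ is taken to be symmetric.

```python
import numpy as np

def q_coefficients(l_x, l_u, l_xx, l_uu, l_xu, A, B, c, v_next, V_next):
    """Combine the expansion of l with the expansion of V_{k+1}."""
    q = l_x + A.T @ v_next + A.T @ V_next @ c
    r = l_u + B.T @ v_next + B.T @ V_next @ c
    Q = l_xx + A.T @ V_next @ A
    R = l_uu + B.T @ V_next @ B
    M = l_xu + A.T @ V_next @ B
    return q, r, Q, R, M

# Random example data (assumptions for illustration only).
rng = np.random.default_rng(0)
n, m = 3, 2
A, B, c = rng.normal(size=(n, n)), rng.normal(size=(n, m)), rng.normal(size=n)
l_x, l_u = rng.normal(size=n), rng.normal(size=m)
l_xx, l_uu, l_xu = np.eye(n), np.eye(m), rng.normal(size=(n, m))
G = rng.normal(size=(n, n))
v_next, V_next = rng.normal(size=n), G @ G.T   # symmetric V_xx,k+1

def q_direct(dx, du):
    """Expanded l plus expanded V_{k+1}, evaluated without collecting terms."""
    dx_next = A @ dx + B @ du + c
    cost = (l_x @ dx + l_u @ du + 0.5 * dx @ l_xx @ dx
            + 0.5 * du @ l_uu @ du + dx @ l_xu @ du)
    value = v_next @ dx_next + 0.5 * dx_next @ V_next @ dx_next
    return cost + value

q, r, Q, R, M = q_coefficients(l_x, l_u, l_xx, l_uu, l_xu, A, B, c, v_next, V_next)
dx, du = rng.normal(size=n), rng.normal(size=m)

# Subtracting the value at zero perturbation removes the constant terms.
lhs = q_direct(dx, du) - q_direct(np.zeros(n), np.zeros(m))
rhs = q @ dx + r @ du + 0.5 * dx @ Q @ dx + 0.5 * du @ R @ du + dx @ M @ du
print(lhs, rhs)
```

Both sides agree to floating-point precision, because the model is exactly quadratic once the dynamics have been linearized.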

posted @ 2025-12-15 08:06  gccbuaa