Complete Tutorial: Linear Algebra · SVD | Annoying Precision, Part 2
Note: this post reproduces English source material on linear algebra and the SVD; the accompanying translation was machine-generated and unvetted. If anything looks wrong, consult the original.
Due to CSDN's length limit, the material is split into two posts; this is the second.
- Linear Algebra · SVD | Annoying Precision, Part 1 - CSDN blog
https://blog.csdn.net/u013669912/article/details/152056616
Properties
Singular value decomposition has lots of useful properties, some of which we'll prove here. First, note that taking the transpose of a singular value decomposition $M = U \Sigma V^T$ gives another singular value decomposition
$$M^T = V \Sigma^T U^T$$
showing that $M^T$ has the same singular values as $M$, but with the left and right singular vectors swapped. This can be proven more conceptually as follows.
Key lemma #2: Write $B(u, v) = \langle u, Mv \rangle = \langle M^T u, v \rangle$. Then for every $1 \le i \le r$, the left and right singular vectors $u_i$, $v_i$ maximize the value of $B(u, v)$ subject to the constraint that $\|u\| = \|v\| = 1$, $u$ is orthogonal to $u_j$ for all $j \le i - 1$, and $v$ is orthogonal to $v_j$ for all $j \le i - 1$. This maximum value is $\sigma_i$.
Proof. At the maximum value of $B(u, v)$ subject to the above constraints, if we fix $v$ then $B(\cdot, v)$ takes its maximum value at $u$. But for fixed $v$, $B(u, v) = \langle u, Mv \rangle$ uniquely takes its maximum value when $u$ is proportional to $Mv$ (if $Mv \ne 0$), hence $u$ must in fact be equal to $\frac{Mv}{\|Mv\|}$; moreover, this is always possible thanks to key lemma #1. So we are in fact maximizing
$$\left\langle \frac{Mv}{\|Mv\|}, Mv \right\rangle = \|Mv\|$$
subject to the above constraints, and we already know the solution is given by $v = v_i$.
Left-right symmetry: Let $\sigma_i$, $u_i$, $v_i$ be the singular values, left singular vectors, and right singular vectors of $M$ as above. Then $\sigma_i$, $v_i$, $u_i$ are the singular values, left singular vectors, and right singular vectors of $M^T$. In particular, $M^T u_i = \sigma_i v_i$.
Proof. Apply key lemma #2 to $M^T$, and note that $B(u, v)$ is the same for $M$ and $M^T$, just with the roles of $u$ and $v$ switched.
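Here is a minimal NumPy sketch of this symmetry on random test data (an added illustration, not part of the original post; the names and data are arbitrary):

```python
# Minimal sketch: M and M^T share singular values, and the left/right
# singular vectors trade places (up to sign, when the singular values
# are distinct).
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 6))

U, s, Vt = np.linalg.svd(M, full_matrices=False)
U2, s2, Vt2 = np.linalg.svd(M.T, full_matrices=False)

assert np.allclose(s, s2)  # same singular values
# Left singular vectors of M^T match right singular vectors of M:
# the matrix of pairwise inner products is diagonal with entries +-1.
assert np.allclose(np.abs(U2.T @ Vt.T), np.eye(len(s)), atol=1e-8)
```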
Singular = eigen: The left singular vectors $u_i$ are the eigenvectors of $MM^T$ corresponding to its nonzero eigenvalues, which are $\sigma_i^2$ for $1 \le i \le r$. The right singular vectors $v_i$ are the eigenvectors of $M^T M$ corresponding to its nonzero eigenvalues, which are also $\sigma_i^2$ for $1 \le i \le r$.
Proof. We now know that $Mv_i = \sigma_i u_i$ and that $M^T u_i = \sigma_i v_i$, hence
$$M^T M v_i = M^T(\sigma_i u_i) = \sigma_i^2 v_i$$
and
$$MM^T u_i = M(\sigma_i v_i) = \sigma_i^2 u_i.$$
Hence $v_i$, $u_i$ are orthonormal eigenvectors of $M^T M$, $MM^T$ respectively. Moreover, these matrices have rank at most (in fact exactly) $r$, so this exhausts all eigenvectors corresponding to nonzero eigenvalues.
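Numerically, the correspondence looks like this (a minimal NumPy sketch, added as an illustration):

```python
# Minimal sketch: the nonzero eigenvalues of M^T M are the squared
# singular values of M.
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((3, 5))          # rank 3 almost surely

s = np.linalg.svd(M, compute_uv=False)   # sigma_1 >= sigma_2 >= sigma_3
eig = np.linalg.eigvalsh(M.T @ M)        # 5 eigenvalues, ascending
assert np.allclose(eig[::-1][:len(s)], s**2)  # top 3 match sigma_i^2
assert np.allclose(eig[:2], 0)                # the remaining 2 vanish
```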
This gives an alternative route to understanding singular value decomposition, which comes from writing $\|Mv\|^2$ as
$$\|Mv\|^2 = \langle Mv, Mv \rangle = \langle v, M^T M v \rangle$$
and then applying the spectral theorem (https://en.wikipedia.org/wiki/Spectral_theorem) to $M^T M$ to diagonalize it. But I think it's worth knowing that there's a route to singular value decomposition which is independent of the spectral theorem.
In addition to the above algebraic characterization of singular values, the singular values also admit the following variational characterization.
Variational characterizations of singular values (Courant-Fischer): We have
$$\sigma_k = \max_{V \subseteq \mathbb{R}^m, \ \dim V = k} \ \min_{v \in V, \ \|v\| = 1} \|Mv\|$$
and
$$\sigma_{k+1} = \min_{V \subseteq \mathbb{R}^m, \ \dim V = m - k} \ \max_{v \in V, \ \|v\| = 1} \|Mv\|$$
Proof. For the first characterization, any $k$-dimensional subspace $V$ intersects $\mathrm{span}(v_k, \dots, v_m)$ nontrivially, hence contains a unit vector of the form
$$v = \sum_{i=k}^{m} c_i v_i, \qquad \|v\|^2 = \sum_{i=k}^{m} c_i^2 = 1.$$
We compute that
$$Mv = \sum_{i=k}^{m} c_i \sigma_i u_i$$
and hence that
$$\|Mv\|^2 = \sum_{i=k}^{m} c_i^2 \sigma_i^2 \le \sigma_k^2.$$
We conclude that every $V$ contains a $v$ such that $\|Mv\| \le \sigma_k$, hence $\min_{v \in V, \|v\| = 1} \|Mv\| \le \sigma_k$. Equality is obtained when $V = \mathrm{span}(v_1, \dots, v_k)$.
The second characterization is very similar. Any $(m - k)$-dimensional subspace $V$ intersects $\mathrm{span}(v_1, \dots, v_{k+1})$ nontrivially, hence contains a unit vector of the form
$$v = \sum_{i=1}^{k+1} c_i v_i, \qquad \|v\|^2 = \sum_{i=1}^{k+1} c_i^2 = 1.$$
We compute that
$$Mv = \sum_{i=1}^{k+1} c_i \sigma_i u_i$$
and hence that
$$\|Mv\|^2 = \sum_{i=1}^{k+1} c_i^2 \sigma_i^2 \ge \sigma_{k+1}^2.$$
We conclude that every $V$ contains a vector $v$ such that $\|Mv\| \ge \sigma_{k+1}$, hence $\max_{v \in V, \|v\| = 1} \|Mv\| \ge \sigma_{k+1}$. Equality is obtained when $V = \mathrm{span}(v_{k+1}, \dots, v_m)$.
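As a sanity check of the first characterization, the following NumPy sketch (an added illustration) samples unit vectors from the extremal subspace $V = \mathrm{span}(v_1, \dots, v_k)$ and confirms $\|Mv\| \ge \sigma_k$, with equality at $v = v_k$:

```python
# Minimal sketch: on V = span(v_1, ..., v_k), every unit vector v
# satisfies ||Mv|| >= sigma_k, with equality at v = v_k.
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 7))
U, s, Vt = np.linalg.svd(M)

k = 3
Vk = Vt[:k].T                            # orthonormal columns span V
for _ in range(1000):
    c = rng.standard_normal(k)
    v = Vk @ (c / np.linalg.norm(c))     # random unit vector in V
    assert np.linalg.norm(M @ v) >= s[k - 1] - 1e-10
assert np.isclose(np.linalg.norm(M @ Vt[k - 1]), s[k - 1])
```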
The second variational characterization above can be used to prove the following important theorem.
Low rank approximation (Eckart-Young): If $M = U \Sigma V^T$ is the SVD of $M$, let $M_k = U \Sigma_k V^T$ where $\Sigma_k$ has diagonal entries $\sigma_1, \dots, \sigma_k$ and all other entries zero. Then $M_k$ is the closest matrix to $M$ in operator norm with rank at most $k$; that is, $M_k$ minimizes $\|M - X\|$ subject to the constraint that $\mathrm{rank}(X) \le k$. This minimum value is $\sigma_{k+1}$.
Proof. Suppose $X$ is a matrix of rank at most $k$. Let $W = \ker(X)$ be the nullspace of $X$, which by hypothesis has dimension at least $m - k$. By the second variational characterization above, this means that $W$ contains a unit vector $w$ such that $\|Mw\| \ge \sigma_{k+1}$, and since $Xw = 0$ this gives
$$\|(M - X)w\| = \|Mw\| \ge \sigma_{k+1}$$
and hence that $\|M - X\| \ge \sigma_{k+1}$. Equality is obtained when $X = M_k$ as defined above.
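In code, the truncation $M_k$ is a one-liner; this minimal NumPy sketch (an added illustration) checks that the operator-norm error is exactly $\sigma_{k+1}$:

```python
# Minimal sketch: the rank-k SVD truncation M_k satisfies
# ||M - M_k|| = sigma_{k+1} in operator norm.
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((6, 8))
U, s, Vt = np.linalg.svd(M, full_matrices=False)

k = 2
Mk = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]   # best rank-k approximation
assert np.linalg.matrix_rank(Mk) == k
assert np.isclose(np.linalg.norm(M - Mk, ord=2), s[k])  # = sigma_{k+1}
```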
The variational characterizations can also be used to prove the following inequality relating the singular values of two matrices and of their sum, which can be thought of as a quantitative refinement of the observation that the rank of a sum $M + N$ of two matrices is at most the sum of their ranks.
Additive perturbation (Weyl): Let $M$, $N$ be $n \times m$ matrices with singular values $\sigma_i(M)$, $\sigma_i(N)$. Then
$$\sigma_{k+\ell+1}(M + N) \le \sigma_{k+1}(M) + \sigma_{\ell+1}(N).$$
Proof. We want to bound $\sigma_{k+\ell+1}(M + N)$ in terms of the singular values of $M$ and $N$. By the second variational characterization, we have
$$\sigma_{k+\ell+1}(M + N) = \min_{V \subseteq \mathbb{R}^m, \ \dim V = m - k - \ell} \ \max_{v \in V, \ \|v\| = 1} \|(M + N)v\|.$$
To give an upper bound on a minimum value of a function, we just need to give an upper bound on some value that it takes. Let $V_M$ and $V_N$ be the subspaces of $\mathbb{R}^m$ of dimensions $m - k$, $m - \ell$ respectively which achieve the minimum values of $\max_{v \in V_M, \|v\| = 1} \|Mv\|$ and $\max_{v \in V_N, \|v\| = 1} \|Nv\|$ respectively, and let $W = V_M \cap V_N$ be their intersection. This intersection has dimension at least $m - k - \ell$, and by construction
$$\max_{v \in W, \ \|v\| = 1} \|Mv + Nv\| \le \max_{v \in W, \ \|v\| = 1} \|Mv\| + \max_{v \in W, \ \|v\| = 1} \|Nv\| \le \sigma_{k+1}(M) + \sigma_{\ell+1}(N).$$
Since $W$ has dimension at least $m - k - \ell$, the above is an upper bound on the value of $\max_{v \in V, \|v\| = 1} \|(M + N)v\|$ for any $(m - k - \ell)$-dimensional subspace $V \subseteq W$, from which the conclusion follows.
The slightly curious off-by-one indexing in the above inequality can be understood as follows: if $\sigma_{k+1}(M)$ and $\sigma_{\ell+1}(N)$ are both very small, this means that $M$ and $N$ are close to matrices of rank at most $k$ and $\ell$ respectively, and hence $M + N$ is close to a matrix of rank at most $k + \ell$, hence $\sigma_{k+\ell+1}(M + N)$ also ought to be small.
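The inequality is also easy to stress-test; the following brute-force NumPy sketch (an added illustration) checks it for small $k$, $\ell$ on random pairs:

```python
# Minimal sketch: brute-force check of
# sigma_{k+l+1}(M+N) <= sigma_{k+1}(M) + sigma_{l+1}(N).
import numpy as np

rng = np.random.default_rng(4)
for _ in range(100):
    M = rng.standard_normal((5, 7))
    N = rng.standard_normal((5, 7))
    sM = np.linalg.svd(M, compute_uv=False)
    sN = np.linalg.svd(N, compute_uv=False)
    sMN = np.linalg.svd(M + N, compute_uv=False)
    for k in range(2):
        for l in range(2):
            # arrays are 0-indexed: sMN[k + l] is sigma_{k+l+1}(M+N)
            assert sMN[k + l] <= sM[k] + sN[l] + 1e-10
```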
Setting $\ell = 0$ in the additive perturbation inequality we deduce the following corollary.
Singular values are Lipschitz: The singular values, as functions on matrices, are uniformly Lipschitz with respect to the operator norm with Lipschitz constant 1: that is,
$$\left| \sigma_k(M) - \sigma_k(N) \right| \le \|M - N\|.$$
Proof. Apply additive perturbation twice with $\ell = 0$, first to get
$$\sigma_k(M) \le \sigma_k(N) + \sigma_1(M - N)$$
(remembering that $\sigma_1$ is the operator norm), and second to get
$$\sigma_k(N) \le \sigma_k(M) + \sigma_1(N - M)$$
(remembering that $\sigma_1(N - M) = \sigma_1(M - N)$).
Combining these two inequalities gives $\left| \sigma_k(M) - \sigma_k(N) \right| \le \|M - N\|$.
This is very much not the case with eigenvalues: a small perturbation of a square matrix can have a large effect on its eigenvalues. This is explained e.g. in this blog post by Terence Tao, and is related to pseudospectra.
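A standard example of this instability is a nilpotent Jordan block; the following NumPy sketch (an added illustration) perturbs one entry by $\varepsilon = 10^{-10}$ and watches an eigenvalue jump to $\varepsilon^{1/n} = 0.1$, while the singular values move by at most $\varepsilon$:

```python
# Minimal sketch: eigenvalues of a Jordan block are wildly sensitive
# to perturbation, while singular values are 1-Lipschitz.
import numpy as np

n, eps = 10, 1e-10
J = np.diag(np.ones(n - 1), k=1)       # nilpotent: all eigenvalues 0
P = J.copy()
P[-1, 0] = eps                         # perturbation of operator norm eps

print(np.max(np.abs(np.linalg.eigvals(P))))    # ~ eps**(1/n) = 0.1
sJ = np.linalg.svd(J, compute_uv=False)
sP = np.linalg.svd(P, compute_uv=False)
print(np.max(np.abs(sP - sJ)))                 # <= eps = 1e-10
```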
Setting $\sigma_{\ell+1}(N) = 0$ (or equivalently $\mathrm{rank}(N) \le \ell$) in the additive perturbation inequality, we deduce the following corollary.
Interlacing: Suppose $M$, $N$ are matrices such that $\mathrm{rank}(M - N) \le \ell$. Then
$$\sigma_{k+\ell}(M) \le \sigma_k(N) \le \sigma_{k-\ell}(M).$$
(Here, we take $\sigma_i(A) = 0$ if $i > \mathrm{rank}(A)$ and $\sigma_i(A) = \infty$ if $i < 1$ for any matrix $A$.)
Proof. Apply additive perturbation twice, first to get
$$\sigma_{k+\ell}(M) = \sigma_{(k-1)+(\ell)+1}(M) \le \sigma_k(N) + \sigma_{\ell+1}(M - N) = \sigma_k(N)$$
(since $\sigma_{\ell+1}(M - N) = 0$ because $\mathrm{rank}(M - N) \le \ell$), and second to get
$$\sigma_k(N) \le \sigma_{k-\ell}(M) + \sigma_{\ell+1}(N - M) = \sigma_{k-\ell}(M)$$
(since $\sigma_{\ell+1}(N - M) = \sigma_{\ell+1}(M - N) = 0$).
This completes the proof.
Interlacing gives us some control over what happens to the singular values under a low-rank perturbation (as opposed to a low-norm perturbation; a low-rank perturbation may have arbitrarily high norm, and vice versa). For example, we learn that if all of the singular values of $M$ are clumped together, then a rank-$\ell$ perturbation will keep most of the singular values clumped together, except possibly for either the $\ell$ largest or $\ell$ smallest singular values. We can't expect any control over these, since in the worst case a rank-$\ell$ perturbation can make the $\ell$ largest singular values arbitrarily large, or make the $\ell$ smallest singular values arbitrarily small.
A particular special case of a low-rank perturbation is deleting a small number of rows or columns (note that a row or column which is entirely zero does not affect the singular values, so deleting a row or column is equivalent to setting all of its entries to zero), in which case the upper bound above can be tightened.
Cauchy interlacing: Suppose $M$ is a matrix and $N$ is obtained from $M$ by deleting at most $\ell$ rows. Then
$$\sigma_k(M) \ge \sigma_k(N) \ge \sigma_{k+\ell}(M).$$
Proof. The lower bound follows from interlacing (since deleting $\ell$ rows is a rank-$\ell$ perturbation). The upper bound follows from the observation that $\|Nv\| \le \|Mv\|$ for all $v$, then applying either variational characterization of the singular values.
Cauchy interlacing also applies to deleting columns, or combinations of rows and columns, because the singular values are unchanged by transposition. In particular, we learn that if $N$ is obtained from $M$ by deleting either a single row or a single column, then the singular values of $N$ interlace with the singular values of $M$; hence the name.
In particular, if all of the singular values of $M$ are clumped together then so are those of $N$, with no exceptions. Taking the contrapositive, if the singular values of $N$ are spread out, then the singular values of $M$ must be as well.
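The single-row case is easy to see numerically; here is a minimal NumPy sketch (an added illustration):

```python
# Minimal sketch: deleting one row gives the interlacing
# sigma_k(M) >= sigma_k(N) >= sigma_{k+1}(M).
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((6, 6))
N = np.delete(M, 2, axis=0)               # drop one row

sM = np.linalg.svd(M, compute_uv=False)   # 6 values
sN = np.linalg.svd(N, compute_uv=False)   # 5 values
assert np.all(sM[:5] + 1e-12 >= sN)       # sigma_k(M) >= sigma_k(N)
assert np.all(sN + 1e-12 >= sM[1:])       # sigma_k(N) >= sigma_{k+1}(M)
```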
Three special cases
Three special cases of the general singular value decomposition $M = U \Sigma V^T$ are worth pointing out.
First, if $M$ has orthogonal columns, or equivalently if $M^T M$ is diagonal, then the singular values $\sigma_i$ are the lengths of its columns, we can take the right singular vectors to be the standard basis vectors $v_i = e_i$, and we can take the left singular vectors to be the unit rescalings of its columns. This means that we can take $V = I$ to be the identity matrix, and in general suggests that $\|I - V\|$ is a measure of the extent to which the columns of $M$ fail to be orthogonal (with the caveat that $V$ is not unique, and so in general we would want to look at the $V$ closest to $I$).
Second, if $M$ has orthogonal rows, or equivalently if $MM^T$ is diagonal, then the singular values $\sigma_i$ are the lengths of its rows, we can take the left singular vectors to be the standard basis vectors $u_i = e_i$, and we can take the right singular vectors to be the unit rescalings of its rows. This means that we can take $U = I$ to be the identity matrix, and in general suggests that $\|I - U\|$ is a measure of the extent to which the rows of $M$ fail to be orthogonal (with the same caveat as above).
Finally, if $M$ is square and an orthogonal matrix, so that $M^T M = MM^T = I$, then the singular values $\sigma_i$ are all equal to 1, and an arbitrary choice of either the left or the right singular vectors uniquely determines the other. This means that we can take $\Sigma = I$ to be the identity matrix, and in general suggests that $\|I - \Sigma\|$ is a measure of the extent to which $M$ fails to be orthogonal. In fact it is possible to show that the closest orthogonal matrix to $M = U \Sigma V^T$ is given by $UV^T$, or in other words by replacing all of the singular values of $M$ with 1, so
$$\|I - \Sigma\| = \max_i \left| 1 - \sigma_i \right|$$
is precisely the distance from $M$ to the nearest orthogonal matrix. This fact can be used to solve the orthogonal Procrustes problem (https://en.wikipedia.org/wiki/Orthogonal_Procrustes_problem).
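For instance, this minimal NumPy sketch (an added illustration) builds $UV^T$ and checks both its orthogonality and the distance formula:

```python
# Minimal sketch: U V^T is orthogonal, and its distance from M in
# operator norm is max_i |1 - sigma_i|.
import numpy as np

rng = np.random.default_rng(6)
M = rng.standard_normal((4, 4))
U, s, Vt = np.linalg.svd(M)

Q = U @ Vt                                  # candidate nearest orthogonal matrix
assert np.allclose(Q.T @ Q, np.eye(4))      # Q is orthogonal
dist = np.linalg.norm(M - Q, ord=2)         # = ||U (Sigma - I) V^T||
assert np.isclose(dist, np.max(np.abs(1 - s)))
```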
In general, we should expect that the SVD of a matrix $M$ is relevant to answering any question about $M$ whose answer is invariant under left and right multiplication by orthogonal matrices. This includes, for example, the question of low-rank approximations to $M$ with respect to the operator norm which we answered above, since both rank and operator norm are invariant.
4 Responses
Ramsay
on March 18, 2017 at 11:17 am
Btw. In the proof of interlacing, I don't see the first displayed equality: $\sigma_k(N) + \sigma_{\ell+1}(M - N) = \sigma_{k+\ell}(N)$. What am I missing?
Qiaochu Yuan
on March 18, 2017 at 2:14 pm
Oops, that's a typo; it should just be $\sigma_k(N)$ on the RHS.
on March 18, 2017 at 11:14 am
Thanks for that concise and clear introduction to the SVD. I do not understand why it is often not even touched in a first class in linear algebra. It seems to me that it would make sense to introduce it even before the spectral theorem.
Regarding "weighted projections": up to a scale factor (i.e., a single weight $\sigma_1$), you can view a linear transformation $T: X \to Y$ between Euclidean spaces as an orthogonal projection. Specifically, if $X$ and $Y$ are subspaces of $\mathbb{R}^N$ of dimension $n$ and $m$ respectively, and $P_{Y|X}$ is the orthogonal projection onto $Y$ restricted to $X$, then the singular values of $P_{Y|X}$ are the cosines of the principal angles (https://en.wikipedia.org/wiki/Angles_between_flats). If $N \ge m + n$, then these singular values can take any value in $[0, 1]$. So we see that any $T: X \to Y$ can be represented as $\sigma_1(T) P_{Y|X}$ for an appropriate choice of $X$, $Y$ as subspaces of $\mathbb{R}^{m+n-1}$.
Also, regarding the best orthogonal transformation to represent $M$, it is worth pointing out that you are talking about the orthogonal factor in the polar decomposition, which is an immediate consequence of the SVD. We can always represent our matrix $M$ as a composition of an orthogonal matrix and a positive semidefinite matrix: $M = \Phi R = R' \Phi$ where $\Phi = UV^T$, $R = V \Sigma V^T$, and $R' = U \Sigma U^T$.
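A minimal NumPy sketch of this construction (an added illustration of the comment, not part of it):

```python
# Minimal sketch: polar decomposition M = Phi R = R' Phi assembled
# from the SVD.
import numpy as np

rng = np.random.default_rng(7)
M = rng.standard_normal((4, 4))
U, s, Vt = np.linalg.svd(M)

Phi = U @ Vt                     # orthogonal factor
R = Vt.T @ np.diag(s) @ Vt       # positive semidefinite (right factor)
Rp = U @ np.diag(s) @ U.T        # positive semidefinite (left factor)

assert np.allclose(M, Phi @ R)
assert np.allclose(M, Rp @ Phi)
```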
Ammar Husain
on March 15, 2017 at 2:57 am
Thinking of the QR algorithm and the Toda lattice, you gain something when you swap and refactor repeatedly, as dressing transformations. Haven't thought about what, if anything reasonable, happens when you permute the factors in a square SVD.
via:
- Singular value decomposition | Annoying Precision
https://qchu.wordpress.com/2017/03/13/singular-value-decomposition/