Linear Algebra Final Exam Review
Linear System
consistent/inconsistent, coefficient matrix, augmented matrix
Elementary row operations (ERO)
- replacement: replace one row by the sum of itself and a multiple of another row. (\(A_i\to A_i+rA_j\))
- interchange: interchange two rows. (\(A_i\to A_j,A_j\to A_i\))
- scaling: multiply a row by a nonzero scalar. (\(A_i\to rA_i\))
echelon form, reduced echelon form, pivot position
Def. A matrix is in echelon form if the following conditions hold.
- all nonzero rows are above all zero rows;
- each leading entry (the leftmost nonzero entry) of a row is in a column to the right of the leading entry of the row above it.
- (a consequence) all entries in a column below a leading entry are zero.
Def. A matrix is in reduced row echelon form if the following additional conditions are satisfied.
- the leading entry in each row is \(1\).
- each leading \(1\) is the only nonzero entry in its column.
Def. A pivot position in a matrix \(A\) is a location in \(A\) that corresponds to a leading \(1\) in the reduced echelon form of \(A\) (i.e. a leading entry of a row in the echelon form of \(A\)). A pivot column is a column of \(A\) containing a pivot position.
The row reduction algorithm to solve a linear system
Vector Spaces
field, vector space, subspace
\(F\) is called a field if it is a set equipped with two operations \(+\) and \(\cdot\) satisfying the field axioms (associativity, commutativity, distributivity, identities \(0\neq 1\), additive inverses, and multiplicative inverses for nonzero elements).
\(V\) is called a vector space over \(F\) if it is a set equipped with addition and scalar multiplication satisfying the vector space axioms.
If \(W\subseteq V\), and \(W\) is itself a vector space under the addition and scalar multiplication inherited from \(V\), then \(W\) is called a subspace of \(V\).
Thm. \(W\) is a subspace of \(V\) if and only if \(0\in W\), and \(W\) is closed under addition and scalar multiplication.
linearly dependent/linearly independent
Thm. Suppose \(S\) is linearly independent, then \(S\cup \{v\}\) is linearly dependent if and only if \(v\in \text{span}(S)\).
basis, dimension
Thm. Let \(V\) be a finite-dimensional vector space and \(W\) a subspace of \(V\). Then any linearly independent set in \(W\) can be expanded, if necessary, to a basis for \(W\).
Thm.(The basis theorem) Let \(V\) be a \(p\)-dimensional vector space, then any linearly independent set of exactly \(p\) elements in \(V\) is automatically a basis for \(V\).
Similarly, any set of exactly \(p\) elements that spans \(V\) is automatically a basis for \(V\).
Linear Transformation
linear transformation, range/image, kernel/null space, rank, nullity
Thm.(Dimension Theorem) Suppose \(V\) is finite-dimensional and \(T:V\to W\) is linear; then \(\text{nullity}(T)+\text{rank}(T)=\dim(V)\).
one-to-one, onto
Thm. Suppose \(\{v_1,v_2,\cdots,v_n\}\) is a basis for \(V\). For \(w_1,w_2,\cdots,w_n\in W\), there exists exactly one linear transformation \(T:V\to W\) such that \(T(v_i)=w_i\) for \(i=1,2,\cdots,n\).
coordinate vector of \(x\) relative to \(\beta\)
Let \(\beta=\{u_1,u_2,\cdots,u_n\}\) be an ordered basis for a finite-dimensional vector space \(V\). For \(x\in V\), let \(a_1,a_2,\cdots,a_n\) be the unique scalars such that \(x=\sum_{i=1}^{n}a_iu_i\). We define the coordinate vector of \(x\) relative to \(\beta\), denoted \([x]_{\beta}\), by \([x]_{\beta}=\begin{pmatrix}a_1&a_2&\cdots&a_n\end{pmatrix}^T\).
the vector space of all linear transformations from \(V\) to \(W\)
We denote the vector space of all linear transformations from \(V\) into \(W\) by \(\mathcal L(V,W)\). In the case that \(V=W\), we write \(\mathcal L(V)\) instead of \(\mathcal L(V,W)\).
matrix representation
Def. Suppose that \(V\) and \(W\) are finite-dimensional vector spaces with ordered bases \(\beta=\{v_1,v_2,\cdots,v_n\}\) and \(\gamma=\{w_1,w_2,\cdots,w_m\}\), respectively. Let \(T:V\to W\) be linear. Then for each \(j\), \(1\le j\le n\), there exist unique scalars \(a_{ij}\in F,1\le i\le m\), such that \(T(v_j)=\sum_{i=1}^ma_{ij}w_i\) for \(1\le j\le n\). We call the \(m\times n\) matrix \(A\) defined by \(A_{ij}=a_{ij}\) the matrix representation of \(T\) in the ordered bases \(\beta\) and \(\gamma\) and write \(A=[T]_{\beta}^{\gamma}\).
Thm. Let \(T:V\to W\) be linear, and let \(\beta,\gamma\) be ordered bases for \(V,W\), respectively. Then we have \([T(x)]_{\gamma}=[T]_{\beta}^{\gamma}[x]_{\beta}\).
Thm. Let \(T:V\to W\) and \(U:W\to Z\) be linear transformations, and let \(\alpha,\beta,\gamma\) be ordered bases for \(V,W\) and \(Z\), respectively. Then we have \([UT]_{\alpha}^{\gamma}=[U]_{\beta}^{\gamma}[T]_{\alpha}^{\beta}\).
left-multiplication transformation
Def. Let \(A\) be an \(m\times n\) matrix with entries from a field \(F\). We denote by \(L_A\) the mapping \(F^{n}\to F^{m}\) defined by \(L_A(x)=Ax\) (the matrix product of \(A\) and \(x\)) for each column vector \(x\in F^n\). We call \(L_A\) a left-multiplication transformation.
inverse, invertible
Thm. A linear transformation is invertible if and only if it’s both one-to-one and onto.
Thm. Let \(T:V\to W\) be a linear transformation between finite-dimensional vector spaces. If \(\dim(V)=\dim(W)\), then \(T\) is invertible if and only if \(\text{rank}(T)=\dim (V)\).
isomorphic, isomorphism
Def. Let \(V\) and \(W\) be vector spaces. We say \(V\) is isomorphic to \(W\) if there exists a linear transformation \(T:V\to W\) that is invertible. Such a linear transformation is called an isomorphism from \(V\) onto \(W\).
Thm. Let \(V\) and \(W\) be finite-dimensional vector spaces over the same field. Then \(V\) is isomorphic to \(W\) if and only if \(\dim(V)=\dim(W)\).
standard representation of \(V\) with respect to \(\beta\)
Def. Let \(\beta\) be an ordered basis for an \(n\)-dimensional vector space \(V\) over a field \(F\). The standard representation of \(V\) with respect to \(\beta\) is the function \(\phi_{\beta}:V\to F^n\) defined by \(\phi_{\beta}(x)=[x]_{\beta}\) for each \(x\in V\).
change of coordinate matrix
Def. Let \(\beta\) and \(\beta'\) be two ordered bases for a finite-dimensional vector space \(V\). The matrix \(Q=[I_V]_{\beta'}^{\beta}\) is called a change of coordinate matrix, and we say that \(Q\) changes \(\beta'\)-coordinates into \(\beta\)-coordinates.
Thm. Let \(T\) be a linear operator on a finite-dimensional vector space \(V\), and let \(\beta\) and \(\beta'\) be ordered bases of \(V\). Suppose that \(Q\) is the change of coordinate matrix that changes \(\beta'\)-coordinates into \(\beta\)-coordinates, then \([T]_{\beta'}=Q^{-1}[T]_{\beta}Q\).
similar
linear operator, linear functional
coordinate function, dual space, dual basis
Def. Let \(V\) be a finite-dimensional vector space and let \(\beta=\{x_1,\cdots,x_n\}\) be an ordered basis for \(V\). For each \(i=1,2,\cdots,n\), define \(f_i(x)=a_i\), where \([x]_{\beta}=\begin{pmatrix}a_1&a_2&\cdots&a_n\end{pmatrix}^T\) is the coordinate vector of \(x\) relative to \(\beta\). Then \(f_i\) is a linear functional on \(V\) called the \(i\)-th coordinate function with respect to the basis \(\beta\).
Def. For a vector space \(V\) over \(F\), we define the dual space of \(V\) to be the vector space \(\mathcal L(V,F)\), denoted by \(V^{\ast}\). We also define the double dual space \(V^{**}\) of \(V\) to be the dual space of \(V^{*}\).
Thm. Suppose that \(V\) is a finite-dimensional vector space with the ordered basis \(\beta=\{x_1,\cdots,x_n\}\). Let \(f_i\ (1\le i\le n)\) be the \(i\)-th coordinate function with respect to \(\beta\), and let \(\beta^{*}=\{f_1,\cdots,f_n\}\). Then \(\beta^{*}\) is an ordered basis for \(V^{\ast}\), and for any \(f\in V^{\ast}\), we have \(f=\sum_{i=1}^{n}f(x_i)f_i\). The ordered basis \(\beta^{*}\) is called the dual basis of \(\beta\).
Thm. Let \(V\) be a finite-dimensional vector space. For a vector \(x\in V\), we define \(\hat x:V^{*}\to F\) by \(\hat x(f)=f(x)\) for every \(f\in V^{*}\). Define \(\psi:V\to V^{**}\) by \(\psi(x)=\hat x\). Then \(\psi\) is an isomorphism.
Matrix
rank
Def. If \(A\in M_{m\times n}(F)\), we define the rank of \(A\), denoted \(\text{rank}(A)\), to be the rank of the linear transformation \(L_A:F^{n}\to F^{m}\).
Thm. Elementary row and column operations on a matrix are rank-preserving.
Every ERO can be realized as left multiplication by a matrix, and these elementary matrices all have rank \(n\).
- Interchanging rows \(i\) and \(j\): starting from the identity matrix, set the \((i,i)\) and \((j,j)\) entries to \(0\) and the \((i,j)\) and \((j,i)\) entries to \(1\).
- Adding \(k\) times row \(i\) to row \(j\): starting from the identity matrix, set the \((j,i)\) entry to \(k\).
- Scaling row \(i\) by \(k\): starting from the identity matrix, set the \((i,i)\) entry to \(k\).
Thm. The rank of any matrix equals the maximum number of its linearly independent columns; that is, the rank of a matrix is the dimension of the subspace generated by its columns.
Corollary. The rank of a matrix equals the number of pivot columns.
Thm. Let \(A\) be an \(m\times n\) matrix of rank \(r\). Then \(r\le m,r\le n\), and by means of a finite number of elementary row and column operations, \(A\) can be transformed into the matrix \(D=\begin{pmatrix}I_r&O_1\\O_2&O_3\end{pmatrix}\), where \(O_1,O_2,O_3\) are zero matrices. Thus \(D_{ii}=1\) for \(i\le r\) and \(D_{ij}=0\) otherwise.
Corollary. Let \(A\) be an \(m\times n\) matrix of rank \(r\), then there exist invertible matrices \(B\) and \(C\) of sizes \(m\times m\) and \(n\times n\), respectively, such that \(D=BAC\) and \(D\) has the form above.
The way to achieve this "diagonal" form is simply to perform row and column operations.
partitioned matrices
LU decomposition, inverse
For a linear system \(Ax=b\), if we can decompose \(A\) as \(LU\), we can divide the linear system into two linear systems \(Lc=b\) and \(Ux=c\). Here, for an \(m\times n\) matrix \(A\), \(L\) is an \(m\times m\) lower triangular matrix, and \(U\) is in echelon form.
Performing EROs on \(A\) until it reaches echelon form yields \(U\); the entry \(L_{ij}\) is the negative of the coefficient with which row \(j\) was added to row \(i\) during the process.
If \(A\) is invertible, performing EROs on \((A\mid I)\) until \(A\) becomes the identity matrix turns the whole array into \((I\mid A^{-1})\).
homogeneous system
Def. A system \(Ax=b\) of \(m\) linear equations in \(n\) unknowns is said to be homogeneous if \(b=0\). Otherwise the system is said to be nonhomogeneous.
Thm. Let \(K\) be the solution set of a system of linear equations \(Ax=b\), and let \(K_H\) be the solution set of the corresponding homogeneous system \(Ax=0\). Then for any solution \(s\) to \(Ax=b\), \(K=\{s\}+K_H=\{s+k\mid k\in K_H\}\).
Thm. Let \(Ax=b\) be a system of \(n\) linear equations in \(n\) unknowns. If \(A\) is invertible, then the system has exactly one solution, namely, \(A^{-1}b\). Conversely, if the system has exactly one solution, then \(A\) is invertible.
Determinant
Def. Let \(A\in M_{n\times n}(F)\). If \(n=1\), so that \(A=(A_{11})\), we define \(\det(A)=A_{11}\). For \(n\ge 2\), we define \(\det(A)\) recursively as \(\det(A)=\sum_{j=1}^{n}(-1)^{1+j}A_{1j}\det(\tilde A_{1j})\). Here, \(\tilde A_{ij}\) means the \((n-1)\times (n-1)\) matrix obtained from \(A\) by deleting row \(i\) and column \(j\).
The scalar \(\det(A)\) is called the determinant of \(A\) and is also denoted by \(|A|\). The scalar \(c_{ij}=(-1)^{i+j}\det(\tilde A_{ij})\) is called the cofactor of the entry of \(A\) in row \(i\), column \(j\). We can express the formula for the determinant of \(A\) as \(\det(A)=\sum_{i=1}^{n}A_{1i}c_{1i}\), and this formula is called cofactor expansion along the first row of \(A\).
Thm. The determinant of a square matrix can be evaluated by cofactor expansion along any row. That is, if \(A\in M_{n\times n}(F)\), then for any integer \(i(1\le i\le n)\), \(\det(A)=\sum_{j=1}^{n}A_{ij}c_{ij}\).
The following rules summarize the effect of an elementary row operation on the determinant of a matrix \(A\in M_{n\times n}(F)\).
- (interchange) If \(B\) is a matrix obtained by interchanging any two rows of \(A\), then \(\det(B)=-\det(A)\).
- (scaling) If \(B\) is a matrix obtained by multiplying a row of \(A\) by a nonzero scalar \(k\), then \(\det(B)=k\det(A)\).
- (replacement) If \(B\) is a matrix obtained by adding a multiple of one row of \(A\) to another row of \(A\), then \(\det(B)=\det(A)\).
These facts can be used to simplify the evaluation of a determinant. Using only interchange and replacement operations, we can transform any square matrix into an upper triangular matrix, and then its determinant is easy to evaluate, since the determinant of an upper triangular matrix is the product of its diagonal entries (with a sign flip for each interchange performed).
Thm. For any \(A,B\in M_{n\times n}(F)\), \(\det(AB)=\det(A)\cdot \det(B)\).
Corollary. A matrix \(A\in M_{n\times n}(F)\) is invertible iff \(\det(A)\neq 0\). Furthermore, if \(A\) is invertible, then \(\det(A^{-1})=\frac{1}{\det(A)}\).
Thm.(Cramer's Rule) Let \(Ax=b\) be the matrix form of a system of \(n\) linear equations in \(n\) unknowns, where \(x=(x_1,\cdots,x_n)^T\). If \(\det(A)\neq 0\), then this system has a unique solution, and for each \(k=1,2,\cdots,n\), \(x_k=\frac{\det(M_k)}{\det(A)}\), where \(M_k\) is the \(n\times n\) matrix obtained from \(A\) by replacing column \(k\) of \(A\) by \(b\).
Diagonalization
diagonalizable
eigenvector, eigenvalue, eigenspace
Thm. A linear operator \(T\) on a finite-dimensional vector space \(V\) is diagonalizable iff there exists an ordered basis \(\beta\) for \(V\) consisting of eigenvectors of \(T\). Furthermore, if \(T\) is diagonalizable and \(\beta=\{v_1,\cdots,v_n\}\) is an ordered basis of eigenvectors of \(T\), then \(D=[T]_{\beta}\) is a diagonal matrix with \(D_{jj}\) the eigenvalue corresponding to \(v_j\) for \(1\le j\le n\).
characteristic polynomial, split, (algebraic) multiplicity
To compute the eigenvalues of a matrix \(A\), find the roots of the polynomial \(\det(A-tI)\) in \(t\). To compute the eigenvectors corresponding to \(\lambda\), solve the system \((A-\lambda I)x=0\).
To compute the eigenvalues and eigenvectors of a linear operator \(T\), first choose a basis \(\beta\) and compute the eigenvalues and eigenvectors of \(A=[T]_{\beta}\). The eigenvalues of \(A\) are exactly the eigenvalues of \(T\), and the eigenvectors of \(A\) are the coordinate vectors, relative to \(\beta\), of the eigenvectors of \(T\).
Thm. Let \(T\) be a linear operator on a finite-dimensional vector space \(V\) such that the characteristic polynomial of \(T\) splits. Let \(\lambda_1,\cdots,\lambda_k\) be the distinct eigenvalues of \(T\). Then
- \(T\) is diagonalizable if and only if the multiplicity of \(\lambda_i\) is equal to \(\dim(E_{\lambda_i})\) for all \(i\).
- If \(T\) is diagonalizable and \(\beta_i\) is an ordered basis for \(E_{\lambda_i}\) for each \(i\), then \(\beta=\beta_1\cup\cdots\cup\beta_k\) is an ordered basis for \(V\) consisting of eigenvectors of \(T\).
This theorem yields the procedure for diagonalization.
The system of differential equations \(\frac{\text{d}}{\text{d}t}x=Ax\)
Consider the system of linear ordinary differential equations \(x_i'=\sum_{j=1}^{n}a_{ij}x_j,\ i=1,2,\cdots,n\), where each \(x_i=x_i(t)\) is a function of \(t\). Diagonalize \(A\) as \(Q^{-1}AQ=D\) and set \(y=Q^{-1}x\); then \(y'=Dy\). Since \(D\) is diagonal, solve for \(y\) componentwise, and then recover \(x=Qy\).
sum, direct sum
\(T\)-invariant, \(T\)-cyclic subspace of \(V\) generated by \(x\)
Let \(T\) be a linear operator on a vector space \(V\), and let \(x\) be a nonzero vector in \(V\). The subspace \(W=\text{span}(\{x,T(x),T^2(x),\cdots\})\) is called the \(T\)-cyclic subspace of \(V\) generated by \(x\).
Thm. Let \(T\) be a linear operator on a finite-dimensional vector space \(V\), and let \(W\) be a \(T\)-invariant subspace of \(V\). Then the characteristic polynomial of \(T_W\) divides the characteristic polynomial of \(T\).
Thm. Let \(T\) be a linear operator on a finite-dimensional vector space \(V\), and let \(W\) denote the \(T\)-cyclic subspace of \(V\) generated by a nonzero vector \(v\in V\). Let \(k=\dim(W)\), then
- \(\{v,T(v),T^2(v),\cdots,T^{k-1}(v)\}\) is a basis for \(W\).
- If \(a_0v+a_1T(v)+\cdots+a_{k-1}T^{k-1}(v)+T^{k}(v)=0\), then the characteristic polynomial of \(T_W\) is \(f(t)=(-1)^{k}(a_0+a_1t+\cdots +a_{k-1}t^{k-1}+t^k)\).
Thm.(Cayley-Hamilton) Let \(T\) be a linear operator on a finite-dimensional vector space \(V\), and let \(f(t)\) be the characteristic polynomial of \(T\). Then \(f(T)=T_0\), the zero transformation. That is, \(T\) "satisfies" its characteristic equation.
Corollary.(Cayley-Hamilton Theorem for Matrices) Let \(A\) be an \(n\times n\) matrix and let \(f(t)\) be the characteristic polynomial of \(A\). Then \(f(A)=O\), the \(n\times n\) zero matrix.
Thm. Let \(T\) be a linear operator on a finite-dimensional vector space \(V\), and suppose that \(V=W_1\oplus W_2\oplus\cdots\oplus W_k\), where \(W_i\) is a \(T\)-invariant subspace of \(V\) for each \(i(1\le i\le k)\). Suppose that \(f_i(t)\) is the characteristic polynomial of \(T_{W_i}(1\le i\le k)\), then \(f_1(t)f_2(t)\cdots f_k(t)\) is the characteristic polynomial of \(T\).
Thm. Let \(T\) be a linear operator on a finite-dimensional vector space \(V\), and let \(W_1,W_2,\cdots,W_k\) be \(T\)-invariant subspaces of \(V\) such that \(V=W_1\oplus W_2\oplus \cdots \oplus W_k\). For each \(i\), let \(\beta_i\) be an ordered basis for \(W_i\), and let \(\beta=\beta_1\cup \beta_2\cup\cdots\cup\beta_k\). Let \(A=[T]_{\beta}\) and \(B_i=[T_{W_i}]_{\beta_i}\) for \(i=1,2,\cdots,k\), then \(A=B_1\oplus B_2\oplus\cdots\oplus B_k\).
Inner Product Spaces
inner product
standard inner product, Frobenius inner product
complex/real inner product space
norm/length, unit vector
orthogonal/perpendicular, orthonormal
Def. Let \(V\) be an inner product space. Vectors \(x\) and \(y\) in \(V\) are orthogonal (perpendicular) if \(<x,y>=0\). A subset \(S\) of \(V\) is orthogonal if any two distinct vectors in \(S\) are orthogonal. A subset \(S\) of \(V\) is orthonormal if \(S\) is orthogonal and consists entirely of unit vectors.
Here orthogonal should be read as "orthogonal" rather than "perpendicular"; the distinction will matter in a few later places.
Def. Let \(V\) be an inner product space. A subset of \(V\) is an orthonormal basis for \(V\) if it is an ordered basis that is orthonormal.
Thm. Let \(V\) be an inner product space and \(S=\{v_1,v_2,\cdots,v_k\}\) be an orthogonal subset of \(V\) consisting of nonzero vectors. If \(y\in \text{span}(S)\), then \(y=\sum_{i=1}^{k}\frac{<y,v_i>}{\|v_i\|^2}v_i\). If, in addition to the hypotheses of this theorem, \(S\) is orthonormal and \(y\in \text{span}(S)\), then \(y=\sum_{i=1}^{k}<y,v_i>v_i\).
Gram-Schmidt process
Thm. Let \(V\) be an inner product space and \(S=\{w_1,\cdots,w_n\}\) be a linearly independent subset of \(V\). Define \(S'=\{v_1,\cdots,v_n\}\), where \(v_1=w_1\) and \(v_k=w_k-\sum_{j=1}^{k-1}\frac{<w_k,v_j>}{\|v_j\|^2}v_j\) for \(2\le k\le n\). Then \(S'\) is an orthogonal set of nonzero vectors such that \(\text{span}(S')=\text{span}(S)\). The construction of \(\{v_1,\cdots,v_n\}\) is called the Gram-Schmidt process.
Fourier coefficients
Def. Let \(\beta\) be an orthonormal subset (possibly infinite) of an inner product space \(V\), and let \(x\in V\). We define the Fourier coefficients of \(x\) relative to \(\beta\) to be the scalars \(<x,y>\), where \(y\in \beta\).
orthogonal complement, orthogonal projection
The orthogonal complement of a set \(S\) is the set of all vectors orthogonal to every vector in \(S\). It is denoted by \(S^{\perp}\); formally, \(S^{\perp}=\{x\in V\mid <x,y>=0 \text{ for all } y\in S\}\).
Thm. Let \(W\) be a finite-dimensional subspace of an inner product space \(V\), and let \(y\in V\). Then there exist unique vectors \(u\in W\) and \(z\in W^{\perp}\) such that \(y=u+z\). Furthermore, if \(\{v_1,v_2,\cdots,v_k\}\) is an orthonormal basis for \(W\), then \(u=\sum_{i=1}^{k}<y,v_i> v_i\).
Corollary. In the notation of this theorem, the vector \(u\) is the unique vector in \(W\) that is "closest" to \(y\); that is, for any \(x\in W\), \(\|y-x\|\ge \|y-u\|\), and this inequality is an equality if and only if \(x=u\). The vector \(u\) is called the orthogonal projection of \(y\) on \(W\).
adjoint
The adjoint of an operator is defined through matrix representations: if the matrix representations of \(T\) and \(U\) with respect to some orthonormal basis are conjugate transposes of each other, then \(T\) and \(U\) are adjoints of each other. The following theorem gives an equivalent characterization of the adjoint.
Thm. Let \(V\) be a finite-dimensional inner product space, and let \(T\) be a linear operator on \(V\). Then there exists a unique function \(T^{\ast}:V\to V\) such that \(<T(x),y>=<x,T^{\ast}(y)>\) for all \(x,y\in V\).
least squares approximation
Least squares considers the problem of fitting a line to a collection of points. Formally, given \(m\) points \((t_1,y_1),(t_2,y_2),\cdots,(t_m,y_m)\), we seek a line \(y=ct+d\) minimizing \(E=\sum_{i=1}^{m}(y_i-ct_i-d)^2\). The problem can be abstracted further: setting \(A=\begin{pmatrix}t_1&1\\t_2&1\\\vdots&\vdots\\t_m&1\end{pmatrix}, x=\begin{pmatrix}c\\d\end{pmatrix}, y=\begin{pmatrix}y_1\\y_2\\\vdots\\y_m\end{pmatrix}\), we have \(E=\|y-Ax\|^2\). Dropping this particular form, we ask: for arbitrary given \(A\) and \(y\), find an \(x\) minimizing \(\|y-Ax\|\).
Thm. Let \(A\in M_{m\times n}(F)\) and \(y\in F^m\). Then there exists \(x_0\in F^n\) such that \((A^{\ast}A)x_0=A^{\ast}y\) and \(\|Ax_0-y\|\le \|Ax-y\|\) for all \(x\in F^n\). Furthermore, if \(\text{rank}(A)=n\), then \(x_0=(A^{\ast}A)^{-1}A^{\ast}y\).
\(Ax_0\) is the vector in \(R(L_A)\) closest to \(y\); if \(x_0\) has this property, then \(Ax_0-y\in R(L_A)^{\perp}\), i.e. \(<x,A^{*}(Ax_0-y)>=0\) for all \(x\). Noting further that \(\text{rank}(A)=n\) implies \(\text{rank}(A^{*}A)=n\), all of the conclusions follow directly.
minimal solution
The following theorem gives a method for finding the solution of minimal norm (the minimal solution) of a linear system.
Thm. Let \(A\in M_{m\times n}(F)\) and \(b\in F^m\). Suppose that \(Ax=b\) is consistent. Then the following statements are true.
- There exists exactly one minimal solution \(s\) of \(Ax=b\), and \(s\in R(L_{A^\ast})\).
- The vector \(s\) is the only solution to \(Ax=b\) that lies in \(R(L_{A^{\ast}})\). Moreover, if \(u\) satisfies \((AA^{\ast})u=b\), then \(s=A^{\ast}u\).
Thm.(Schur) Let \(T\) be a linear operator on a finite-dimensional inner product space \(V\). Suppose that the characteristic polynomial of \(T\) splits, then there exists an orthonormal basis \(\beta\) for \(V\) such that the matrix \([T]_{\beta}\) is upper triangular.
normal
Def. Let \(V\) be an inner product space, and let \(T\) be a linear operator on \(V\). We say that \(T\) is normal if \(T^{\ast}T=TT^{\ast}\). An \(n\times n\) real or complex matrix \(A\) is normal if \(A^{\ast}A=AA^{\ast}\).
Thm. Let \(T\) be a linear operator on a finite-dimensional complex inner product space \(V\), then \(T\) is normal if and only if there exists an orthonormal basis for \(V\) consisting of eigenvectors of \(T\).
self-adjoint(Hermitian)
Def. Let \(T\) be a linear operator on an inner product space \(V\). We say that \(T\) is self-adjoint(Hermitian) if \(T=T^{\ast}\). An \(n\times n\) real or complex matrix \(A\) is self-adjoint(Hermitian) if \(A=A^{\ast}\).
Thm. Let \(T\) be a linear operator on a finite-dimensional real inner product space \(V\). Then \(T\) is self-adjoint if and only if there exists an orthonormal basis \(\beta\) for \(V\) consisting of eigenvectors of \(T\).
unitary/orthogonal operator
Def. Let \(T\) be a linear operator on a finite-dimensional inner product space \(V\) (over \(F\)). If \(\|T(x)\|=\|x\|\) for all \(x\in V\), we call \(T\) a unitary operator if \(F=C\) and an orthogonal operator if \(F=R\).
Thm. Let \(T\) be a linear operator on a finite-dimensional inner product space \(V\). Then the following statements are equivalent.
- \(TT^{\ast}=T^{\ast}T=I\).
- \(<T(x),T(y)>=<x,y>\) for all \(x,y\in V\).
- If \(\beta\) is an orthonormal basis for \(V\), then \(T(\beta)\) is an orthonormal basis for \(V\).
- There exists an orthonormal basis \(\beta\) for \(V\) such that \(T(\beta)\) is an orthonormal basis for \(V\).
- \(\|T(x)\|=\|x\|\) for all \(x\in V\).
Def. A square matrix \(A\) is called an orthogonal matrix if \(A^tA=AA^t=I\) and unitary if \(A^{\ast}A=AA^{\ast}=I\).
Def. Two matrices \(A\) and \(B\) are unitarily equivalent [orthogonally equivalent] if there exists a unitary [orthogonal] matrix \(P\) such that \(B=P^*AP\).
Thm. Let \(A\) be a complex \(n\times n\) matrix. Then \(A\) is normal if and only if \(A\) is unitarily equivalent to a diagonal matrix.
Thm. Let \(A\) be a real \(n\times n\) matrix. Then \(A\) is symmetric if and only if \(A\) is orthogonally equivalent to a diagonal matrix.
Several notions and results differ between the real and complex fields; concretely, there are the following differences.
First, if an operator admits an orthonormal basis consisting of its eigenvectors, then over the complex field this implies the operator is normal, while over the real field it implies the operator is self-adjoint.
Second, a difference in terminology: an operator that preserves the norm/length is called a unitary operator over the complex field and an orthogonal operator over the real field, although the latter name is less common.
Third, if a matrix is unitarily diagonalizable over the complex field, it must be normal; if a matrix is orthogonally diagonalizable over the real field, it must be symmetric.
orthogonal projection
Recall that if \(V=W_1\oplus W_2\), then a linear operator \(T\) on \(V\) is the projection on \(W_1\) along \(W_2\) if, whenever \(x=x_1+x_2\), with \(x_1\in W_1\) and \(x_2\in W_2\), we have \(T(x)=x_1\).
Def. Let \(V\) be an inner product space, and let \(T:V\to V\) be a projection. We say that \(T\) is an orthogonal projection if \(R(T)^{\perp}=N(T)\) and \(N(T)^{\perp}=R(T)\).
Thm. Let \(V\) be an inner product space, and let \(T\) be a linear operator on \(V\). Then \(T\) is an orthogonal projection if and only if \(T\) has an adjoint \(T^{\ast}\) and \(T^2=T=T^\ast\).
Spectral Theorem, spectrum, resolution of the identity operator, spectral decomposition
Thm. (The Spectral Theorem) Suppose that \(T\) is a linear operator on a finite-dimensional inner product space \(V\) over \(F\) with the distinct eigenvalues \(\lambda_1,\lambda_2,\cdots,\lambda_k\). Assume that \(T\) is normal if \(F=C\) and that \(T\) is self-adjoint if \(F=R\). For each \(i(1\le i\le k)\), let \(W_i\) be the eigenspace of \(T\) corresponding to the eigenvalue \(\lambda_i\), and let \(T_i\) be the orthogonal projection of \(V\) on \(W_i\). Then the following statements are true.
- \(V=W_1\oplus W_2\oplus \cdots \oplus W_k\).
- If \(W_i'\) denotes the direct sum of the subspaces \(W_j\) for \(j\neq i\), then \(W_i^{\perp}=W_i'\).
- \(T_iT_j=\delta_{ij}T_i\) for \(1\le i,j\le k\).
- \(I=T_1+T_2+\cdots+T_k\).
- \(T=\lambda_1T_1+\lambda_2T_2+\cdots+\lambda_kT_k\).
The set \(\{\lambda_1,\lambda_2,\cdots,\lambda_k\}\) of eigenvalues of \(T\) is called the spectrum of \(T\). The sum \(I=T_1+T_2+\cdots+T_k\) in (d) is called the resolution of the identity operator, and the sum \(T=\lambda_1T_1+\lambda_2T_2+\cdots+\lambda_kT_k\) in (e) is called the spectral decomposition of \(T\).
Corollary 1. If \(F=C\), then \(T\) is normal if and only if \(T^{\ast}=g(T)\) for some polynomial \(g\).
Corollary 2. If \(F=C\), then \(T\) is unitary if and only if \(T\) is normal and \(|\lambda|=1\) for every eigenvalue of \(T\).
Corollary 3. If \(F=C\) and \(T\) is normal, then \(T\) is self-adjoint if and only if every eigenvalue of \(T\) is real.
Corollary 4. Let \(T\) be as in the spectral theorem with spectral decomposition \(T=\lambda_1T_1+\lambda_2T_2+\cdots+\lambda_kT_k\). Then each \(T_j\) is a polynomial in \(T\).
singular value decomposition, singular value
The singular value decomposition (SVD) factors an \(m\times n\) matrix \(A\) as \(U\Sigma V^*\), where \(U\) and \(V\) are \(m\times m\) and \(n\times n\) matrices respectively, and \(\Sigma\) is an \(m\times n\) matrix for which there exists an \(r\) such that only the \(r\) entries \(\Sigma_{11}\ge \Sigma_{22}\ge \cdots\ge \Sigma_{rr}\) are nonzero. We will develop this starting from linear transformations, although the exam will probably only test the computation.
Thm. (Singular Value Theorem for Linear Transformations) Let \(V\) and \(W\) be finite-dimensional inner product spaces, and let \(T:V\to W\) be a linear transformation of rank \(r\). Then there exist orthonormal bases \(\{v_1,v_2,\cdots,v_n\}\) for \(V\) and \(\{u_1,u_2,\cdots,u_m\}\) for \(W\), and positive scalars \(\sigma_1\ge \sigma_2\ge\cdots\ge\sigma_r\), such that \(T(v_i)=\begin{cases}\sigma_iu_i & \text{if } i\le r \\ 0 & \text{if } i\gt r\end{cases}\).
Furthermore, suppose the preceding conditions are satisfied. Then for \(1\le i\le n\), \(v_i\) is an eigenvector of \(T^{\ast}T\) with corresponding eigenvalue \(\sigma_i^2\) if \(1\le i\le r\) and \(0\) if \(i\gt r\). Therefore the scalars \(\sigma_1,\sigma_2,\cdots,\sigma_r\) are uniquely determined by \(T\).
Def. The unique scalars \(\sigma_1,\sigma_2,\cdots,\sigma_r\) are called the singular values of \(T\). If \(r\lt m\) and \(r\lt n\), then the term singular value is extended to include \(\sigma_{r+1}=\cdots=\sigma_{k}=0\), where \(k=\min(m,n)\).
Def. Let \(A\) be an \(m\times n\) matrix. We define the singular values of \(A\) to be the singular values of the linear transformation \(L_A\).
Thm. (Singular Value Theorem for Matrices) Let \(A\) be an \(m\times n\) matrix of rank \(r\) with singular values \(\sigma_1\ge\sigma_2\ge\cdots\ge\sigma_r\), and let \(\Sigma\) be the \(m\times n\) matrix defined by \(\Sigma_{ij}=\begin{cases}\sigma_i & \text{if } i=j\le r \\ 0 & \text{otherwise}\end{cases}\). Then there exist an \(m\times m\) unitary matrix \(U\) and an \(n\times n\) unitary matrix \(V\) such that \(A=U\Sigma V^{\ast}\).
Def. Let \(A\) be an \(m\times n\) matrix of rank \(r\) with positive singular values \(\sigma_1\ge \sigma_2\ge\cdots\ge\sigma_r\). A factorization \(A=U\Sigma V^{\ast}\), where \(U\) and \(V\) are unitary matrices and \(\Sigma\) is the \(m\times n\) matrix defined above, is called a singular value decomposition of \(A\).
Here is how to compute an SVD: unitarily diagonalizing \(A^{\ast}A\) gives \(A^{\ast}A=V(\Sigma^{\ast}\Sigma)V^{\ast}\); correspondingly, unitarily diagonalizing \(AA^{\ast}\) gives \(AA^{\ast}=U(\Sigma\Sigma^{\ast})U^{\ast}\).
Indeed, suppose \(A=U\Sigma V^{\ast}\); then \(AV=U\Sigma\). Noting that \(U^{\ast}U=V^{\ast}V=I\) and \(A^{\ast}=V\Sigma^{\ast}U^{\ast}\) (\(\Sigma\) is real, so its conjugate transpose is simply its transpose), we get \(A^{\ast}A=V(\Sigma^{\ast}\Sigma)V^{\ast}\) and \(AA^{\ast}=U(\Sigma\Sigma^{\ast})U^{\ast}\).
bilinear form, matrix representation
A bilinear form is a map \(V\times V\to F\) that is linear in each of its two arguments. We denote the set of all such maps by \(\mathcal B(V)\); with the natural addition and scalar multiplication, \(\mathcal B(V)\) is itself a vector space.
The matrix representation \(A=\psi_\beta(H)\) of a bilinear form \(H\) with respect to a basis \(\beta=\{v_1,v_2,\cdots,v_n\}\) is defined by \(A_{ij}=H(v_i,v_j)\). One checks easily that \(H(x,y)=[x]_{\beta}^{T}A[y]_{\beta}\) (where \(T\) denotes transpose). In particular, if \(V=F^n\), then there exists a matrix \(A\) such that \(H(x,y)=x^TAy\).
symmetric, diagonalizable
Symmetry and diagonalizability of a bilinear form are defined directly via its matrix representation.
Thm. Let \(V\) be a finite-dimensional vector space over a field \(F\) not of characteristic two. Then every symmetric bilinear form on \(V\) is diagonalizable. Here, \(F\) is of characteristic two if \(1+1=0\) in \(F\).
This yields a method for diagonalizing a symmetric matrix: perform elementary row and column operations in matching pairs; for example, after adding the first row to the second row, immediately add the first column to the second column.
quadratic form
Def. Let \(V\) be a vector space over \(F\). A function \(K:V\to F\) is called a quadratic form if and only if there exists a symmetric bilinear form \(H\in \mathcal B(V)\) such that \(K(x)=H(x,x)\) for all \(x\in V\).
If the field \(F\) is not of characteristic two, there is a one-to-one correspondence between symmetric bilinear forms and quadratic forms.
Thm. Let \(V\) be a finite-dimensional real inner product space, and let \(H\) be a symmetric bilinear form on \(V\). Then there exists an orthonormal basis \(\beta\) for \(V\) such that \(\psi_{\beta}(H)\) is a diagonal matrix.
Corollary. Let \(K\) be a quadratic form on a finite-dimensional real inner product space \(V\). There exists an orthonormal basis \(\beta=\{v_1,\cdots,v_n\}\) for \(V\) and scalars \(\lambda_1,\cdots,\lambda_n\) (not necessarily distinct) such that if \(x\in V\) and \(x=\sum_{i=1}^{n}s_iv_i\) with \(s_i\in R\), then \(K(x)=\sum_{i=1}^{n}\lambda_is_i^2\). In fact, if \(H\) is the symmetric bilinear form determined by \(K\), then \(\beta\) can be chosen to be any orthonormal basis for \(V\) such that \(\psi_\beta(H)\) is a diagonal matrix.
Jordan Canonical Form
First, a note: for the Jordan Canonical Form it suffices to know how to compute it. Still, the core line of reasoning is laid out here.
generalized eigenvector, generalized eigenspace
Def. Let \(T\) be a linear operator on a vector space \(V\), and let \(\lambda\) be a scalar. A nonzero vector \(x\) in \(V\) is called a generalized eigenvector of \(T\) corresponding to \(\lambda\) if \((T-\lambda I)^{p}(x)=0\) for some positive integer \(p\).
Def. Let \(T\) be a linear operator on a vector space \(V\) and let \(\lambda\) be an eigenvalue of \(T\). The generalized eigenspace of \(T\) corresponding to \(\lambda\), denoted \(K_\lambda\), is the subset of \(V\) defined by \(K_\lambda=\{x\in V\mid (T-\lambda I)^{p}(x)=0\text{ for some positive integer } p\}\).
Thm. Let \(T\) be a linear operator on a finite-dimensional vector space \(V\) such that the characteristic polynomial of \(T\) splits, and let \(\lambda_1,\lambda_2,\cdots,\lambda_k\) be the distinct eigenvalues of \(T\) with corresponding multiplicities \(m_1,m_2,\cdots,m_k\). For \(1\le i\le k\), let \(\beta_i\) be an ordered basis for \(K_{\lambda_i}\). Then the following statements are true.
- \(\beta_i\cap \beta_j=\emptyset\) for \(i\neq j\).
- \(\beta=\beta_1\cup \beta_2\cup\cdots\cup \beta_k\) is an ordered basis for \(V\).
- \(\dim(K_{\lambda_i})=m_i\) for all \(i\).
This result can be restated directly as \(V=K_{\lambda_1}\oplus K_{\lambda_2}\oplus\cdots\oplus K_{\lambda_k}\). The proof is quite involved.
cycle of generalized eigenvectors
Def. Let \(T\) be a linear operator on a finite-dimensional vector space \(V\), and let \(x\) be a generalized eigenvector of \(T\) corresponding to the eigenvalue \(\lambda\). Suppose that \(p\) is the smallest positive integer for which \((T-\lambda I)^{p}(x)=0\). Then the ordered set \(\{(T-\lambda I)^{p-1}(x),(T-\lambda I)^{p-2}(x),\cdots,(T-\lambda I)(x),x\}\) is called a cycle of generalized eigenvectors of \(T\) corresponding to \(\lambda\). The vectors \((T-\lambda I)^{p-1}(x)\) and \(x\) are called the initial vector and the end vector of the cycle, respectively. We say the length of the cycle is \(p\).
Thm. Let \(T\) be a linear operator on a vector space \(V\), and let \(\lambda\) be an eigenvalue of \(T\). Suppose that \(\gamma_1,\gamma_2,\cdots,\gamma_q\) are cycles of generalized eigenvectors of \(T\) corresponding to \(\lambda\) such that the initial vectors of the \(\gamma_i\)'s are distinct and form a linearly independent set. Then \(\gamma_i\)'s are disjoint, and their union \(\gamma=\bigcup_{i=1}^{q}\gamma_i\) is linearly independent.
Thm. Let \(T\) be a linear operator on a finite-dimensional vector space \(V\), and let \(\lambda\) be an eigenvalue of \(T\). Then \(K_{\lambda}\) has an ordered basis consisting of a union of disjoint cycles of generalized eigenvectors corresponding to \(\lambda\).
For each generalized eigenspace, there exists a basis formed as the union of several cycles.
Thm. Let \(T\) be a linear operator on a finite-dimensional vector space \(V\) whose characteristic polynomial splits, and suppose that \(\beta\) is a basis for \(V\) such that \(\beta\) is a disjoint union of cycles of generalized eigenvectors of \(T\). Then for each cycle \(\gamma\) of generalized eigenvectors contained in \(\beta\), \(W=\text{span}(\gamma)\) is \(T\)-invariant, and \([T_{W}]_{\gamma}\) is a Jordan block. Furthermore, \(\beta\) is a Jordan canonical basis for \(V\).
Each cycle contributes one Jordan block; if the union of all the cycles forms a basis, that basis is a Jordan canonical basis.
At this point we can conclude: if the characteristic polynomial of an operator splits, then the operator has a Jordan canonical form. The Jordan canonical form of a matrix is defined via the left-multiplication transformation.
dot diagram
Fix an eigenvalue \(\lambda\) and its generalized eigenspace \(K_{\lambda}\). By the theorems above we can find a basis of \(K_{\lambda}\) that is a union of cycles; concretely, write it as \(\{v_1,(T-\lambda I)(v_1),\cdots,(T-\lambda I)^{p_1-1}(v_1)\}\cup\{v_2,(T-\lambda I)(v_2),\cdots,(T-\lambda I)^{p_2-1}(v_2)\}\cup\cdots\cup\{v_k,(T-\lambda I)(v_k),\cdots,(T-\lambda I)^{p_k-1}(v_k)\}\). Assuming \(p_1\ge p_2\ge\cdots\ge p_k\), the basis can be displayed as a dot diagram: the \(i\)-th column has \(p_i\) dots, representing, from top to bottom, \((T-\lambda I)^{p_i-1}(v_i),\cdots,(T-\lambda I)(v_i),v_i\).
The dot diagram directly yields the Jordan canonical form. Suppose the distinct eigenvalues are \(\lambda_1,\lambda_2,\cdots,\lambda_k\); then the Jordan canonical form consists of \(k\) large blocks. If the basis associated with the \(i\)-th eigenvalue \(\lambda_i\) is a union of the cycles of \(v_1,v_2,\cdots,v_{n_i}\), with the cycle of \(v_j\) having length \(p_j\), then the \(i\)-th large block itself consists of \(n_i\) Jordan blocks, the \(j\)-th of which is the \(p_j\times p_j\) matrix with \(\lambda_i\) on the diagonal and \(1\)'s on the superdiagonal.
Can this be said in plain language?
The next theorem gives the shape of the dot diagram directly. Suppose the characteristic polynomial is \((t-\lambda_1)^{r_1}(t-\lambda_2)^{r_2}\cdots(t-\lambda_k)^{r_k}\); then the total number of dots in the first \(l\) rows of the \(i\)-th dot diagram is \(\text{nullity}((T-\lambda_iI)^{l})\). This is not hard to see: applying \(T-\lambda_iI\) to the dot diagram once kills the first row and shifts the remaining rows up; applying it again kills what was originally the second row, and so on.
Once the shape of the dot diagram is known, we have the Jordan canonical form as a matrix; the next step is to find a Jordan canonical basis, working one eigenvalue at a time. First compute the last row of the dot diagram: these vectors should be part of a basis of \(N((T-\lambda I)^{r})\) that lies outside \(N((T-\lambda I)^{r-1})\). Apply \(T-\lambda I\) to them once to obtain vectors in the second-to-last row. That row may still have unfilled dots; complete it with additional vectors so that the last two rows together form the part of a basis of \(N((T-\lambda I)^{r})\) lying outside \(N((T-\lambda I)^{r-2})\). Continuing in this way produces the required basis. (There is no way this is meant to be computed by hand...)