Discrete Mathematics Project
This is the project report for my Discrete Mathematics course, written to share (and chat) with classmates taking the course at the same time. I also hope it can help future students.
Discrete Mathematics Project Report: Useful Matrix Tricks in Computing
Yihao Jiang
Project Motivation
When learning relations, we saw how to represent a relation as a matrix, which lets us compute the composition of relations with matrix multiplication. In the following lectures, we also learned how to represent graphs with matrices and to interpret matrix multiplication as connecting paths in a graph. These matrix tricks are useful, and they all stem from the nature of matrix multiplication, so I decided to do a project on matrix tricks in computing to expand this idea.
Calculating Closures of Relations
The most basic use of matrices in the lectures is to represent relations. By using Boolean values as matrix elements, we can compute the composition of relations with matrix multiplication, and by adding up the powers of the relation matrix, we can compute the transitive closure of a relation. Although this method is quite slow (\(O(n^4)\)), it is very inspiring, and its idea will reappear below.
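As a concrete illustration, here is a minimal C++ sketch of this \(O(n^4)\) method (the function names are my own):

#include <vector>

using Mat = std::vector<std::vector<bool>>;

// Boolean matrix product: (AB)[i][j] is true iff some k has i->k in A and k->j in B.
Mat bool_mul(const Mat &A, const Mat &B) {
  int n = A.size();
  Mat C(n, std::vector<bool>(n, false));
  for (int i = 0; i < n; i++)
    for (int k = 0; k < n; k++)
      if (A[i][k])
        for (int j = 0; j < n; j++)
          if (B[k][j]) C[i][j] = true;
  return C;
}

// Transitive closure as the union M | M^2 | ... | M^n:
// n - 1 Boolean multiplications, each O(n^3), hence O(n^4) in total.
Mat transitive_closure(const Mat &M) {
  int n = M.size();
  Mat power = M, closure = M;
  for (int step = 1; step < n; step++) {
    power = bool_mul(power, M);
    for (int i = 0; i < n; i++)
      for (int j = 0; j < n; j++)
        if (power[i][j]) closure[i][j] = true;
  }
  return closure;
}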
Calculating Paths in Graphs
Moving from relations to graphs turns relation matrices into adjacency matrices. By taking powers of the adjacency matrix, we can count the number of walks (paths that may revisit vertices) of a given length between any pair of vertices in a graph. Now the matrix carries not only the plain relation but also a "weight" for each pair of vertices, which opens up further possibilities.
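A minimal sketch of this counting, again with names of my own choosing:

#include <vector>

using Matrix = std::vector<std::vector<long long>>;

// Integer matrix product: entry (i, j) of A * B sums over all midpoints k,
// which is exactly how walks are concatenated.
Matrix multiply(const Matrix &A, const Matrix &B) {
  int n = A.size();
  Matrix C(n, std::vector<long long>(n, 0));
  for (int i = 0; i < n; i++)
    for (int k = 0; k < n; k++)
      for (int j = 0; j < n; j++)
        C[i][j] += A[i][k] * B[k][j];
  return C;
}

// (adj^len)[u][v] counts the walks of length len from u to v.
// Counts can overflow 64 bits for long walks; reduce modulo a prime if needed.
Matrix walk_counts(const Matrix &adj, int len) {
  Matrix result = adj;
  for (int step = 1; step < len; step++) result = multiply(result, adj);
  return result;
}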
Representing Linear Recurrence Relations
Linear recurrence relations can be represented using matrices. Taking the dot product of a vector of consecutive values with a vector of coefficients gives the next value in the sequence. By stacking the coefficients into a matrix and the initial values into a vector, we can use matrix exponentiation to compute the \(n\)-th value of the sequence in \(O(m^3\log n)\) multiplications, where \(m\) is the order of the recurrence.
Here, it is assumed that the recurrence is computed in a finite field (e.g. modulo a prime) or at a fixed precision (e.g. 64-bit floating-point numbers). If the values can grow very large, we also need to account for the bit length of each value.
If the \(n\)-th value has \(f(n)\) bits (say \(f(n) = \Theta(n)\)), then the \(k\)-th squaring from the top involves numbers of roughly \(f(\frac n {2^k})\) bits and takes \(O(m^3 f(\frac n {2^k})\log f(\frac n {2^k}))\) time, so the total is a geometrically decreasing sum dominated by its largest term, giving \(O(m^3 f(n)\log f(n))\) overall.
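Here is a minimal sketch of the finite-field case for the order-2 Fibonacci recurrence; the modulus is an assumed prime of my choosing:

#include <array>
#include <cstdint>

const uint64_t P = 998244353; // an assumed prime modulus

using Mat2 = std::array<std::array<uint64_t, 2>, 2>;

// 2x2 matrix product over Z/P.
Mat2 mul(const Mat2 &A, const Mat2 &B) {
  Mat2 C{};
  for (int i = 0; i < 2; i++)
    for (int k = 0; k < 2; k++)
      for (int j = 0; j < 2; j++)
        C[i][j] = (C[i][j] + A[i][k] * B[k][j]) % P;
  return C;
}

// F(n) mod P with F(0) = 0, F(1) = 1, using O(log n) multiplications of the
// stacked-coefficient matrix [[1, 1], [1, 0]] of F(n) = F(n-1) + F(n-2).
uint64_t fibonacci(uint64_t n) {
  Mat2 result{}, base{};
  result[0][0] = result[1][1] = 1;          // identity matrix
  base[0][0] = base[0][1] = base[1][0] = 1; // [[1, 1], [1, 0]]
  for (; n > 0; n >>= 1) {                  // binary exponentiation
    if (n & 1) result = mul(result, base);
    base = mul(base, base);
  }
  return result[0][1]; // [[1,1],[1,0]]^n has F(n) at position (0, 1)
}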
This trick is very useful in computing, and it is already efficient enough in many cases. But sometimes the matrix is large, and we want to optimize further.
Optimizing with Berlekamp–Massey Algorithm
When the matrix is large, we can use the Berlekamp–Massey algorithm to reduce the order of the recurrence relation and thus get a smaller matrix.
The algorithm takes a sequence \(a_0, a_1, \dots, a_{n-1}\) and outputs the relation sequence \(r_0, r_1, \dots, r_m\) with \(r_0 = 1\) and the smallest order \(m\) such that \(\sum_{j=0}^m r_j a_{i-j} = 0\) for all \(i \ge m\). It runs in \(O(n^2)\) time.
Although the algorithm only works on finite sequences, for an infinite sequence generated by a linear recurrence relation of order \(d\), we can treat it as a finite sequence of length \(2d\), and the algorithm will return the correct relation sequence. The reason: if we view a relation as a polynomial of degree \(d\), then by the properties of polynomials, two polynomials of degree \(d\) that agree on \(d+1\) points are equal everywhere. Here we only have \(d\) points, but since \(r_0 = 1\) is fixed, that many is enough.
Berlekamp–Massey Algorithm Procedure
Let \(F_i\) be a relation \(f_{i,0}, f_{i,1}, \dots, f_{i,|F_i|-1}\) found for the prefix ending at \(a_i\), i.e. \(\sum_{j=0}^{|F_i|-1} f_{i,j}\, a_{i'-j-1} = a_{i'}\) holds for every \(|F_i| \le i' \le i\).
Initially, let \(F_0 = \{\}\).
If we have found \(F_{i-1}\), then there are two cases:
- \(F_{i-1}\) is also a relation for \(a_i\). Then let \(F_i = F_{i-1}\).
- \(F_{i-1}\) is not a relation for \(a_i\). Then we need further calculation.
If this is the first time this happens, then we can simply set \(F_i = \{0, \dots, 0\}\) (\(i+1\) zeros), since \(a_i\) is the first non-zero term.
Otherwise, let \(\Delta_i = a_i - \sum_{j=0}^{|F_{i-1}|-1} f_{i-1,j}\, a_{i-j-1}\) be the discrepancy, and let \(k\) be the most recent index before \(i\) at which a modification happened, so that \(F_{k-1}\) failed at \(a_k\) with discrepancy \(\Delta_k\).
Then if we can find a relation \(G = \{g_0, \dots, g_{m'}\}\) such that \(\sum_{j=0}^{m'} g_j a_{i-j-1} = \Delta_i\) and \(\sum_{j=0}^{m'} g_j a_{i'-j-1} = 0\) for all \(m' < i' < i\), then we can set \(F_i = F_{i-1} + G\) (added element-wise, padding the shorter one with zeros).
A construction of \(G\) is \(G = \{0, 0, \dots, 0, \frac{\Delta_i}{\Delta_k}, -\frac{\Delta_i}{\Delta_k} F_{k-1}\}\): that is, \(i-k-1\) zeros, followed by \(\frac{\Delta_i}{\Delta_k}\), and then \(-\frac{\Delta_i}{\Delta_k}\) times each element of \(F_{k-1}\). Applied at position \(i\), this evaluates to \(\frac{\Delta_i}{\Delta_k}\Delta_k = \Delta_i\), and at earlier positions the inner sum vanishes because \(F_{k-1}\) held there.
Now we have found \(F_i\). From the final relation we can construct the sequence \(r\) and use it to solve the linear recurrence. The algorithm has many more applications in linear algebra, but those are currently beyond my scope.
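For instance, once \(r\) is known, the sequence can be continued term by term using \(a_i = -\sum_{j=1}^{m} r_j a_{i-j}\). A minimal sketch (the helper name and the modulus are assumptions of mine, chosen to match the appendix code):

#include <vector>

const long long MOD = 998244353; // assumed prime modulus

// Extend a sequence to n terms using a relation r with r[0] = 1, i.e.
// a_i = -sum_{j=1}^{m} r[j] * a[i-j] (mod MOD). Assumes a already has at
// least m = r.size() - 1 terms.
std::vector<long long> extend(std::vector<long long> a,
                              const std::vector<long long> &r, int n) {
  int m = r.size() - 1;
  for (int i = a.size(); i < n; i++) {
    long long next = 0;
    for (int j = 1; j <= m; j++)
      next = (next - r[j] * a[i - j]) % MOD;
    a.push_back((next % MOD + MOD) % MOD);
  }
  return a;
}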
Kirchhoff's Theorem
Matrices are also useful in counting problems on graphs. For example, to count the number of spanning trees of a graph without self-loops, we can use Kirchhoff's theorem to translate the problem into linear algebra.
For the case of undirected graphs, let \(D(G)\) be the degree matrix of graph \(G\), with \(D_{i,i}\) being the degree of vertex \(i\) and all other elements \(0\). Let \(A(G)\) be the adjacency matrix of \(G\). Then let \(L(G) = D(G) - A(G)\) be the Laplacian (or Kirchhoff) matrix of \(G\), and let \(t(G)\) be the number of spanning trees of \(G\).
For the case of directed graphs, let \(D^{out}(G)\) be the out-degree matrix of graph \(G\), with \(D_{i,i}^{out}\) being the out-degree of vertex \(i\) and all other elements \(0\). Let \(A(G)\) be the adjacency matrix of \(G\). Then let \(L^{out}(G) = D^{out}(G) - A(G)\) be the directed Laplacian matrix of \(G\), and let \(t^{root}(G, r)\) be the number of spanning trees of \(G\) rooted at vertex \(r\), with all edges pointing toward the root.
Then by Kirchhoff's theorem, in an undirected graph \(G\): \(t(G) = \det L(G)_{[n]\setminus\{k\},[n]\setminus\{k\}}\) for any \(k\), where \([n]\) means \(\{1, 2, \dots, n\}\) and the matrix is the \((n-1)\times(n-1)\) submatrix of \(L(G)\) with row and column \(k\) deleted. Equivalently, if \(L(G)\) has eigenvalues \(\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n = 0\), then \(t(G) = \frac 1 n \prod_{i=1}^{n-1} \lambda_i\).
Similarly, in directed graph \(G\): \(t^{root}(G, r) = \det L^{out}(G)_{[n]\setminus\{r\},[n]\setminus\{r\}}\).
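A minimal sketch of the undirected case (the names and numeric tolerance are my own; plain floating-point elimination is used for brevity, while exact arithmetic would be needed for large graphs):

#include <cmath>
#include <utility>
#include <vector>

// Count spanning trees: build the Laplacian L = D - A, delete one row and
// column (here the last), and return the determinant of the remaining
// (n-1) x (n-1) matrix, computed by Gaussian elimination.
long long count_spanning_trees(int n, const std::vector<std::pair<int, int>> &edges) {
  std::vector<std::vector<double>> L(n, std::vector<double>(n, 0.0));
  for (auto [u, v] : edges) { // undirected edge {u, v}, no self-loops
    L[u][u] += 1; L[v][v] += 1;
    L[u][v] -= 1; L[v][u] -= 1;
  }
  int m = n - 1; // any cofactor gives the same value
  double det = 1.0;
  for (int col = 0; col < m; col++) {
    int pivot = col; // partial pivoting for numerical stability
    for (int row = col + 1; row < m; row++)
      if (std::fabs(L[row][col]) > std::fabs(L[pivot][col])) pivot = row;
    if (std::fabs(L[pivot][col]) < 1e-9) return 0; // singular: graph is disconnected
    if (pivot != col) { std::swap(L[pivot], L[col]); det = -det; }
    det *= L[col][col];
    for (int row = col + 1; row < m; row++) {
      double f = L[row][col] / L[col][col];
      for (int j = col; j < m; j++) L[row][j] -= f * L[col][j];
    }
  }
  return std::llround(det);
}

For instance, on the complete graph \(K_4\) (six edges) this returns \(16 = 4^{4-2}\), matching Cayley's formula.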
The proof of this theorem uses the Cauchy–Binet theorem: for two matrices \(A\) and \(B\) of sizes \(n \times m\) and \(m \times n\), we have

\[\det(AB) = \sum_{S \subseteq [m],\ |S| = n} \det(A_{[n],S}) \det(B_{S,[n]}).\]

The cases \(m \le n\) are trivial (for \(m < n\) both sides are \(0\), and for \(m = n\) the sum has the single term \(\det(A)\det(B)\)), and for \(m > n\), noticing that

\[\det(x I_m + BA) = x^{m-n} \det(x I_n + AB),\]

and with \([x^{n-k}]\det(x I_n + C)\) (the coefficient of \(x^{n-k}\) in \(\det(x I_n + C)\)) being the sum of all \(k \times k\) principal minors of matrix \(C\), we have

\[\det(AB) = [x^{m-n}]\det(x I_m + BA) = \sum_{S \subseteq [m],\ |S| = n} \det((BA)_{S,S}) = \sum_{S \subseteq [m],\ |S| = n} \det(B_{S,[n]}) \det(A_{[n],S}),\]

where the last step uses \((BA)_{S,S} = B_{S,[n]} A_{[n],S}\). Then the Cauchy–Binet theorem is proved.
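As a quick sanity check of the statement (an example of my own), take \(n = 1\), \(m = 2\), \(A = \begin{pmatrix} a_1 & a_2 \end{pmatrix}\) and \(B = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}\):

\[\det(AB) = a_1 b_1 + a_2 b_2 = \det(a_1)\det(b_1) + \det(a_2)\det(b_2),\]

with one term for each choice of the single column index \(S \subseteq \{1, 2\}\).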
To prove Kirchhoff's theorem, for a graph \(G\) with \(n\) vertices and \(m\) edges and \(w(e)\) being the weight of edge \(e = (u, v)\) (for plain counting, all weights are \(1\)), define the \(m \times n\) incidence matrix \(M^{out}\) by \(M^{out}_{i,j} = \sqrt{w(e_i)}\) if \(e_i = (u_j, v)\), i.e. edge \(e_i\) leaves vertex \(j\), and \(M^{out}_{i,j} = 0\) otherwise.
Similarly, define another \(m \times n\) matrix \(M^{in}\) by \(M^{in}_{i,j} = \sqrt{w(e_i)}\) if \(e_i = (u, v_j)\), i.e. edge \(e_i\) enters vertex \(j\), and \(M^{in}_{i,j} = 0\) otherwise.
Then we have

\[(M^{out})^T M^{out} = D^{out}(G)\]

and

\[(M^{out})^T M^{in} = A(G),\]

so \(L^{out}(G) = (M^{out})^T (M^{out} - M^{in})\).
Then we will prove another lemma: for a vertex subset \(W\) and edge subset \(S\) with \(|W| = |S| \le n\), the subgraph \(T = (V, S)\) is a spanning forest rooted at \(V \setminus W\) if and only if

\[\det(M^{out}_{S,W}) \det((M^{out} - M^{in})_{S,W}) \ne 0,\]

and if it is non-zero, then it equals \(\prod_{e \in S} w(e)\); call this \(w(T)\).
To prove it, without loss of generality, let all \(w(e) = 1\). If \(\det(M^{out}_{S,W}) \ne 0\), then each row of \(M^{out}_{S,W}\) must have exactly one non-zero element, and these elements form a permutation, which means each vertex in \(W\) has out-degree \(1\) in the subgraph \(T\).
Now if there are no cycles in \(T\), \(T\) is a spanning forest. Computing \(\det((M^{out} - M^{in})_{S,W})\) by Gaussian elimination, each elimination step corresponds to eliminating one edge of \(T\); if and only if there is a cycle, the determinant becomes \(0\), and otherwise the determinant equals \(\det(M^{out}_{S,W})\).
So \(T\) is a spanning forest if and only if \(\det(M^{out}_{S,W}) \det((M^{out} - M^{in})_{S,W}) = \det(M^{out}_{S,W})^2 = w(T) \ne 0\).
Using this lemma and the Cauchy–Binet theorem, with \(W = [n] \setminus \{r\}\), we have

\[\det(L^{out}(G)_{W,W}) = \sum_{S \subseteq [m],\ |S| = n-1} \det(M^{out}_{S,W}) \det((M^{out} - M^{in})_{S,W}) = \sum_{S \subseteq [m],\ |S| = n-1} [T = (V, S) \text{ is a spanning forest rooted at } \{r\}]\; w(T),\]

and summing up gives

\[\det(L^{out}(G)_{W,W}) = \sum_{T \text{ spanning tree of } G \text{ rooted at } r} w(T) = t^{root}(G, r)\]

(the last equality for unit weights), which is Kirchhoff's theorem for directed graphs. The undirected case follows from the directed case by replacing each undirected edge with a pair of opposite directed edges.
In this way, we can relate counting problems on graphs to computing determinants of matrices. The key feature is transforming edge connections into row eliminations, which is similar in spirit to using matrix multiplication to connect paths in graphs, but more complex. It is a good extension of the base lectures in class.
Lindström–Gessel–Viennot Lemma
Another useful matrix trick for counting problems in graphs is the Lindström–Gessel–Viennot lemma. It provides a method to count tuples of non-intersecting paths in directed acyclic graphs. But it is even harder, and I do not fully understand it yet.
Conclusion
In this project, I explored some useful matrix tricks in computing. Most of them are extracurricular, like the Berlekamp–Massey algorithm, and I learned a lot from them. The most important lesson is how to translate combination and connection operations into the language of linear algebra. I hope these tricks will be useful in the future.
Appendix
Here is a sample implementation of the Berlekamp–Massey algorithm in C++, adapted from OI Wiki (with the modulus \(p\) and a modular power helper supplied so that it compiles on its own):
#include <vector>
using std::vector;

const int p = 998244353; // the modulus: computations are in Z/p for a prime p

// Modular exponentiation; power(x, p - 2) is the inverse of x modulo the prime p.
long long power(long long base, long long exp) {
  long long result = 1;
  for (base %= p; exp > 0; exp >>= 1) {
    if (exp & 1) result = result * base % p;
    base = base * base % p;
  }
  return result;
}

vector<int> berlekamp_massey(const vector<int> &a) {
  vector<int> v, last;   // v is the current relation (0-based); last is the relation before the previous modification
  int k = -1, delta = 0; // k: index of the previous failure; delta: its discrepancy
  for (int i = 0; i < (int)a.size(); i++) {
    // Evaluate the current relation at position i.
    int tmp = 0;
    for (int j = 0; j < (int)v.size(); j++)
      tmp = (tmp + (long long)a[i - j - 1] * v[j]) % p;
    if (a[i] == tmp) continue; // the relation still holds
    if (k < 0) {               // first failure: a[i] is the first non-zero term
      k = i;
      delta = (a[i] - tmp + p) % p;
      v = vector<int>(i + 1); // i + 1 zeros
      continue;
    }
    // Add the correction G (built from the old relation) onto v.
    vector<int> u = v;
    int val = (long long)(a[i] - tmp + p) * power(delta, p - 2) % p; // Delta_i / Delta_k
    if (v.size() < last.size() + i - k) v.resize(last.size() + i - k);
    (v[i - k - 1] += val) %= p;
    for (int j = 0; j < (int)last.size(); j++) {
      v[i - k + j] = (v[i - k + j] - (long long)val * last[j]) % p;
      if (v[i - k + j] < 0) v[i - k + j] += p;
    }
    // Remember whichever failure yields shorter corrections later.
    if ((int)u.size() - i < (int)last.size() - k) {
      last = u;
      k = i;
      delta = a[i] - tmp;
      if (delta < 0) delta += p;
    }
  }
  // Convert the relation F into r: r_0 = 1 and r_j = -f_{j-1}.
  for (auto &x : v) x = (p - x) % p;
  v.insert(v.begin(), 1);
  return v; // $\forall i \ge m, \sum_{j = 0}^{m} r_j a_{i - j} = 0 \pmod p$
}
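A small usage sketch of my own: feeding the first terms of the Fibonacci sequence (which satisfies an order-2 recurrence) should recover that recurrence:

#include <cstdio>

int main() {
  // 2d = 4 terms already suffice for the order-2 Fibonacci recurrence;
  // a few extra terms do no harm.
  vector<int> a = {1, 1, 2, 3, 5, 8, 13, 21};
  vector<int> r = berlekamp_massey(a);
  // Expected output: 1 998244352 998244352,
  // i.e. a_i - a_{i-1} - a_{i-2} = 0 (mod p).
  for (int x : r) printf("%d ", x);
  printf("\n");
  return 0;
}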
