Mixing Time of the Glauber Dynamics (Part I)

Chapter 1

1.1 The Glauber dynamics

In this paper we consider the following problem:

Given a simple undirected graph \(G=(V,E)\), a set of forbidden vertices \(X\subseteq V\), and an activity parameter \(\lambda>0\), compute the partition function

\[Z(G)=\sum_{\substack{I\in\mathcal{I}(G),\,I\cap X=\varnothing}}\lambda^{|I|}, \]

where \(\mathcal{I}(G)\) denotes the family of independent sets of \(G\), and each independent set \(I\) is weighted by \(\lambda^{|I|}\).

Throughout this paper, we will assume \(V=[n]=\{1,\cdots,n\}\).


The following process is called the (single-site) Heat-Bath Glauber dynamics:

  1. Initially, set every vertex to white.
  2. Repeat the following for \(T\) steps, where \(T\) is a given number of iterations: at each step, sample a vertex \(i\in[n]\) uniformly at random and perform the single-site update:
    1. If \(i\in X\) or any neighbor of \(i\) is black, then force \(i\) to remain white.
    2. Otherwise, set \(i\) to black with probability \(\frac{\lambda}{\lambda+1}\) and to white with probability \(\frac{1}{\lambda+1}\).
  3. After the iterations, output the set \(I\) of all black vertices as an independent set.

Throughout the process, the set of black vertices is always an independent set of \(G\).
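For concreteness, here is a minimal Python sketch of this sampler (the function name and interface are our own, not from the text):

```python
import random

def glauber_sample(n, edges, X, lam, T, seed=None):
    """Heat-bath Glauber dynamics; returns the final set of black vertices.

    n: vertices are 1..n; edges: iterable of pairs (u, v);
    X: set of forbidden vertices; lam: activity lambda > 0; T: number of steps.
    """
    rng = random.Random(seed)
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    black = set()                       # step 1: start from the all-white coloring
    for _ in range(T):
        i = rng.randrange(1, n + 1)     # sample a site uniformly at random
        if i in X or adj[i] & black:
            black.discard(i)            # step 2.1: blocked, i is forced white
        elif rng.random() < lam / (lam + 1):
            black.add(i)                # step 2.2: black w.p. lambda/(lambda+1)
        else:
            black.discard(i)            # step 2.2: white w.p. 1/(lambda+1)
    return black
```

The two branches inside the loop are exactly steps 2.1 and 2.2, so the black set remains an independent set throughout the run.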


We will prove the following mixing-time result. Suppose the maximum degree of \(G\) is at most \(\Delta\) for some integer \(\Delta\geq 3\), and there is a constant slackness \(0<\alpha<1\) such that

\[\lambda=(1-\alpha)\lambda_c,\qquad \lambda_c=\frac{(\Delta-1)^{\Delta-1}}{(\Delta-2)^\Delta}, \]

then the Heat-Bath Glauber dynamics has mixing time

\[T_{\mathrm{mix}}(\varepsilon)=O\big(n\log n + n\log\varepsilon^{-1}\big). \]

Therefore, the dynamics is an efficient Monte Carlo sampler for \(\mu\).
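For concreteness, here are the first few thresholds (\(\lambda_c\) is the tree-uniqueness threshold of the hard-core model):

\[\lambda_c(3)=\frac{2^2}{1^3}=4,\qquad \lambda_c(4)=\frac{3^3}{2^4}=\frac{27}{16}\approx 1.69,\qquad \lambda_c(5)=\frac{4^4}{3^5}=\frac{256}{243}\approx 1.05. \]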


1.2 \(\mathsf{P}\) is ergodic and reversible w.r.t. \(\mu\)

Represent a coloring by \(\sigma:[n]\to\{0,1\}\), where \(\sigma(i)=1\) means vertex \(i\) is black (belongs to the independent set), and \(\sigma(i)=0\) means vertex \(i\) is white.

Define the state space

\[\Omega=\big\{\sigma:[n]\to\{0,1\}\;:\;\sigma(i)=0\text{ for all }i\in X,\text{ and }\{i:\sigma(i)=1\}\text{ is an independent set of }G\big\}. \]

That is, \(\Omega\) is the set of all colorings that satisfy "forbidden vertices are white" and "black vertices form an independent set". Such colorings are called valid. Throughout the remainder of the text, all colorings refer to valid colorings unless explicitly stated otherwise.

On \(\Omega\) define the Gibbs distribution (with activity \(\lambda\)):

\[\mu(\sigma)=\frac{\lambda^{|\sigma|}}{Z(G)},\qquad |\sigma|:=\sum_{i\in[n]}\sigma(i), \]

where the partition function is

\[Z(G)=\sum_{\sigma\in\Omega}\lambda^{|\sigma|}=\sum_{I\in\mathcal I(G),\,I\cap X=\varnothing}\lambda^{|I|}. \]


For each vertex \(i\in[n]\) define the single-site update matrix \(\mathsf{P}_i\) as the operation that resamples the color of \(i\) according to the conditional Gibbs distribution while keeping \(\sigma_{[n]\setminus\{i\}}\) fixed. More formally:

\[\mathsf{P}_i(\sigma\to\tau)=\begin{cases} \mu^{\sigma_{[n]\setminus\{i\}}}(\tau), & \text{if }\sigma_{[n]\setminus\{i\}}=\tau_{[n]\setminus\{i\}},\\ 0, & \text{otherwise,} \end{cases}\qquad\forall\sigma,\tau\in\Omega. \]

A simple calculation shows that if at some step we sample \(i\), then the algorithm's action coincides with performing the update using \(\mathsf{P}_i\).
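Indeed, write \(\sigma_0\) and \(\sigma_1\) for the two colorings that agree with \(\sigma\) off \(i\) and color \(i\) white and black, respectively. If \(i\in X\) or some neighbor of \(i\) is black, then \(\sigma_1\notin\Omega\), so the conditional distribution forces \(i\) to be white; otherwise both colorings are valid, \(|\sigma_1|=|\sigma_0|+1\), and

\[\mu^{\sigma_{[n]\setminus\{i\}}}(\sigma_1)=\frac{\lambda^{|\sigma_0|+1}}{\lambda^{|\sigma_0|}+\lambda^{|\sigma_0|+1}}=\frac{\lambda}{1+\lambda}, \]

which is exactly the probability used in step 2 of the algorithm.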

The overall chain has transition matrix \(\mathsf{P}=\frac{1}{n}\sum_{i=1}^n\mathsf{P}_i\).

Clearly \(\mathsf{P}\) is a row-stochastic matrix. We prove \(\mathsf{P}\) satisfies the detailed-balance equation with respect to \(\mu\):

\[\mu(\sigma)\mathsf{P}(\sigma\to\tau)=\mu(\tau)\mathsf{P}(\tau\to\sigma),\qquad\forall\sigma,\tau\in\Omega. \]

We verify case by case:

  1. If \(\sigma=\tau\), then both sides equal \(\mu(\sigma)\mathsf{P}(\sigma\to\sigma)\).
  2. If \(\sigma,\tau\) differ on one vertex, call it \(i\). In this case both sides are equal to \(w(\sigma,\tau)=\frac{1}{n}\cdot\frac{\mu(\sigma)\mu(\tau)}{\mu(\sigma)+\mu(\tau)}\) (see the computation after this list).
  3. If \(\sigma,\tau\) differ on two or more vertices, then \(\mathsf{P}(\sigma\to\tau)=\mathsf{P}(\tau\to\sigma)=0\).
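For case 2, note that \(\mathsf{P}_j(\sigma\to\tau)=0\) for every \(j\neq i\), and \(\sigma,\tau\) are the only two valid colorings agreeing with \(\sigma\) outside \(i\), so

\[\mathsf{P}(\sigma\to\tau)=\frac{1}{n}\,\mathsf{P}_i(\sigma\to\tau)=\frac{1}{n}\cdot\frac{\mu(\tau)}{\mu(\sigma)+\mu(\tau)}, \]

and hence

\[\mu(\sigma)\mathsf{P}(\sigma\to\tau)=\frac{1}{n}\cdot\frac{\mu(\sigma)\mu(\tau)}{\mu(\sigma)+\mu(\tau)}=\mu(\tau)\mathsf{P}(\tau\to\sigma). \]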

Hence \(\mathsf{P}\) is time-reversible with respect to \(\mu\).


Finally, it helps to visualize a graph \(\mathcal{G}\) on the \(N=|\Omega|\) distinct colorings:

  1. For each coloring \(\sigma\), put a self-loop at \(\sigma\) with weight \(\mu(\sigma)\mathsf{P}(\sigma\to\sigma)\).
  2. For each pair of colorings \(\sigma\) and \(\tau\) that differ on one vertex, put an undirected edge between \(\sigma\) and \(\tau\) with weight \(w(\sigma,\tau)\).

On this graph \(\mathcal{G}\), since \(\mu(\sigma)>0\) for all colorings \(\sigma\), all edge weights (including self-loops) are positive. Furthermore, for every coloring \(\sigma\), the weights of its self-loop and all incident edges sum to \(\mu(\sigma)\), because the row of \(\mathsf{P}\) at \(\sigma\) sums to \(1\).

We can see \(\mathsf{P}\) is irreducible and aperiodic by looking at this graph:

  1. \(\mathsf{P}\) is irreducible because every two colorings \(\sigma,\tau\) are connected in \(\mathcal{G}\), since they are both connected to the all-white coloring \(\iota\) (via the path that removes black vertices one by one).
  2. \(\mathsf{P}\) is aperiodic because every coloring \(\sigma\) has a self-loop in \(\mathcal{G}\).

Hence \(\mathsf{P}\) is irreducible and aperiodic on \(\Omega\), and therefore converges to \(\mu\) as its unique stationary distribution.


1.3 \(\mu\) is self-similar

In this paper we often examine conditional distributions of \(\mu\), so we make the following definitions:

  1. A partial coloring \(\sigma:U\to\{0,1\}\) is defined just like a coloring, but only on a subset \(U\subseteq[n]\), which we call its ground set.

    (Again, all partial colorings are valid partial colorings unless explicitly stated otherwise.)

  2. A conditional distribution \(\mu^\sigma:\Omega^\sigma\to[0,1]\) is the law of a full coloring sampled from \(\mu\) conditioned on agreeing with \(\sigma\) on its ground set \(U\).

  3. A restricted distribution \(\mu_S:\Omega_S\to[0,1]\) is the law of the coloring on \(S\) when a full coloring is drawn from \(\mu\).

  4. A restricted conditional distribution \(\mu^\sigma_S:\Omega^\sigma_S\to[0,1]\) is the law of the coloring on \(S\) when a full coloring is drawn from \(\mu^\sigma\). If \(\sigma\) has ground set \(U=[n]\setminus S\), we call this a boundary-conditioned distribution on \(S\).

In these definitions, the conditional state space \(\Omega^\sigma\) is the set of all full colorings that agree with the partial coloring \(\sigma\) on its ground set \(U\). The restricted (conditional) state spaces \(\Omega_S\) and \(\Omega^\sigma_S\) are the sets of partial colorings on \(S\) obtainable from some full coloring in \(\Omega\) and \(\Omega^\sigma\), respectively.

The (restricted) conditional distributions are well-defined (i.e., \(\Omega^\sigma\neq\varnothing\)) because extending any valid partial coloring \(\sigma\) by setting \([n]\setminus U\) to all white always yields a valid coloring.


We state the self-similarity property of \(\mu\):

For any subset \(S\subseteq[n]\) and any partial coloring \(\sigma\) with ground set \([n]\setminus S\) (a boundary condition), let \(Y\subseteq S\) be the set of vertices of \(S\) that are either in \(X\) or adjacent to at least one vertex of \([n]\setminus S\) colored black by \(\sigma\).

Then the Gibbs distribution on the induced subgraph \(G[S]\) with forbidden set \(Y\) is exactly \(\mu^\sigma_S\).
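As a sanity check of this property, here is a brute-force verification sketch in Python on a small example (the 4-cycle instance, \(\lambda=2\), and the chosen boundary condition are illustrative assumptions, not from the text):

```python
import itertools
from fractions import Fraction

# Illustrative instance: 4-cycle, no forbidden vertices, lambda = 2.
n, lam = 4, Fraction(2)
edges = {(1, 2), (2, 3), (3, 4), (1, 4)}
X = set()

def independent(s):
    return not any((u, v) in edges or (v, u) in edges
                   for u, v in itertools.combinations(sorted(s), 2))

def gibbs(vertices, forbidden):
    """Gibbs distribution over independent sets of G[vertices] avoiding `forbidden`."""
    sets = [frozenset(c) for k in range(len(vertices) + 1)
            for c in itertools.combinations(sorted(vertices), k)
            if independent(c) and not set(c) & forbidden]
    Z = sum(lam ** len(s) for s in sets)
    return {s: lam ** len(s) / Z for s in sets}

# Boundary condition sigma on U = {3, 4}: vertex 3 black, vertex 4 white.
S, sigma_black, U = {1, 2}, {3}, {3, 4}

# Left side: marginal on S of mu conditioned on agreeing with sigma on U.
mu = gibbs(set(range(1, n + 1)), X)
marg, total = {}, Fraction(0)
for I, p in mu.items():
    if I & U == frozenset(sigma_black):
        marg[I & S] = marg.get(I & S, Fraction(0)) + p
        total += p
marg = {s: p / total for s, p in marg.items()}

# Right side: Gibbs on G[S] with Y = vertices of S forbidden or blocked by sigma.
Y = {v for v in S if v in X or any((v, b) in edges or (b, v) in edges
                                   for b in sigma_black)}
assert marg == gibbs(S, Y)
print("self-similarity verified:", marg)
```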

In future discussions we sometimes need to prove that certain properties hold for all boundary-conditioned distributions. From this observation, we can save effort by proving the property only for the global Gibbs distribution \(\mu\) itself, since such a proof generalizes to all graphs and forbidden sets, including \(G[S]\) with forbidden set \(Y\).

This also justifies why the problem statement includes the forbidden set \(X\): it is precisely the structure needed to express both the original problem and every subproblem that arises under possible boundary conditions.


Chapter 2

2.1 Functional analysis

We will perform some functional analysis. Here are the definitions:

The inner product \(\langle\cdot,\cdot\rangle_\mu\) is defined by

\[\langle f,g\rangle_\mu=\sum_{x\in\Omega}\mu(x)f(x)g(x),\qquad\forall f,g:\Omega\to\mathbb{R}. \]

Let \(\nu\) be any probability distribution on \(\Omega\). Multiplying on the right by the row-stochastic matrix \(\mathsf{P}\) gives the pushforward distribution \(\nu\mathsf{P}\), defined by

\[(\nu\mathsf{P})(y)=\sum_{x\in\Omega}\nu(x)\mathsf{P}(x,y),\qquad\forall y\in\Omega. \]

(Equivalently, \((\nu\mathsf{P})(y)\) is the probability that \(I_{t+1}=y\) when the law of \(I_t\) is \(\nu\), where \(I_t\) denotes the state of the chain after \(t\) steps.)

Let \(f\) be any function on \(\Omega\). Multiplying on the left by \(\mathsf{P}\) gives the one-step averaged function \(\mathsf{P}f\), defined by

\[(\mathsf{P}f)(x)=\sum_{y\in\Omega}\mathsf{P}(x,y)f(y),\qquad\forall x\in\Omega. \]

(Equivalently, \((\mathsf{P}f)(x)\) is the expected value of \(f(I_{t+1})\) given \(I_t=x\).)

The variance \(\mathrm{Var}_\mu(\cdot)\) is defined as

\[\mathrm{Var}_\mu(f)=\mathbb{E}_{\mu}[f^2]-\mathbb{E}_\mu[f]^2=\langle f,f\rangle_\mu-\langle f,\mathbf{1}\rangle_\mu^2,\qquad\forall f:\Omega\to\mathbb{R}. \]

The following computation

\[\mathrm{Var}_\mu(f)=\sum_{x\in\Omega}\mu(x)f(x)^2-\left(\sum_{x\in\Omega}\mu(x)f(x)\right)^2=\frac{1}{2}\left(\sum_{x,y\in\Omega}\mu(x)\mu(y)(f(x)-f(y))^2\right) \]

helps view \(\mathrm{Var}_\mu(f)\) from a more local standpoint.
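To verify the second equality, expand the square and use \(\sum_{x\in\Omega}\mu(x)=1\):

\[\frac{1}{2}\sum_{x,y\in\Omega}\mu(x)\mu(y)\big(f(x)^2-2f(x)f(y)+f(y)^2\big)=\sum_{x\in\Omega}\mu(x)f(x)^2-\left(\sum_{x\in\Omega}\mu(x)f(x)\right)^2. \]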

Finally, the Dirichlet energy \(\mathfrak{D}(\cdot,\cdot)\) is defined by

\[\mathfrak{D}(f,f)=\langle f,(\mathsf{Id}-\mathsf{P})f\rangle_\mu=\langle f,f\rangle_\mu-\langle f,\mathsf{P}f\rangle_\mu,\qquad\forall f:\Omega\to\mathbb{R}. \]

The motivation behind this definition will be revealed in later discussions.

The following computation, where we set \(w(x,y):=\mu(x)\mathsf{P}(x\to y)\) (this extends the edge weights of \(\mathcal{G}\): \(w(x,x)\) is the self-loop weight, and \(w(x,y)=0\) when \(x,y\) differ on two or more vertices),

\[\mathfrak{D}(f,f)=\sum_{x\in\Omega}\mu(x)f(x)^2-\sum_{x,y\in\Omega}w(x,y)f(x)f(y)=\frac{1}{2}\left(\sum_{x,y\in\Omega}w(x,y)(f(x)-f(y))^2\right) \]

helps view \(\mathfrak{D}(f,f)\) from a more local standpoint; the last equality uses the symmetry \(w(x,y)=w(y,x)\) and the row sums \(\sum_{y\in\Omega}w(x,y)=\mu(x)\).


2.2 Poincaré inequality

The Poincaré inequality is an important tool for proving upper bounds on the mixing time.

For a nonnegative real number \(\gamma\ge 0\), we say the Poincaré inequality holds with parameter \(\gamma\) if and only if

\[\gamma\cdot\mathrm{Var}_\mu(f)\leq\mathfrak{D}(f,f),\qquad\forall f:\Omega\to\mathbb{R}. \]

A larger \(\gamma\) corresponds to a stronger statement.

In fact, the best \(\gamma\) that can appear in the Poincaré inequality is \(1-\lambda_2\), where \(\lambda_2\) is the second-largest eigenvalue of the transition matrix \(\mathsf{P}\). Equivalently,

\[1-\lambda_2=\inf_{f\ \text{not constant}}\left\{\frac{\mathfrak{D}(f,f)}{\mathrm{Var}_\mu(f)}\right\}. \]

Let \(N=|\Omega|\). Because \(\mathsf{P}\) is ergodic and reversible with respect to \(\mu\), Markov chain theory tells us that

\[1=\lambda_1>\lambda_2\ge\cdots\ge\lambda_N>-1. \]

Take an orthonormal eigenbasis \(\varphi_1,\varphi_2,\cdots,\varphi_N\) (with respect to \(\langle\cdot,\cdot\rangle_\mu\)) corresponding to the eigenvalues \(\lambda_1,\lambda_2,\cdots,\lambda_N\), with \(\varphi_1=\mathbf{1}\).

(Note that \(\varphi_1=\mathbf{1}\) because \(\mathsf{P}\mathbf{1}=\mathbf{1}\). For \(i\ge2\), \(\varphi_i\perp_\mu\varphi_1\), so \(\langle\varphi_i,\mathbf{1}\rangle_\mu=0\) and \(\mathbb{E}_\mu[\varphi_i]=0\).)

For any function \(f:\Omega\to\mathbb{R}\), decompose

\[f=\sum_{i=1}^N \alpha_i\varphi_i,\qquad \alpha_i\in\mathbb{R}. \]

Writing \(\mathrm{Var}_\mu(f)\) and \(\mathfrak{D}(f,f)\) in terms of these coefficients yields

\[\mathrm{Var}_\mu(f) =\left\langle\sum_{i=1}^N\alpha_i\varphi_i,\sum_{j=1}^N\alpha_j\varphi_j\right\rangle_\mu -\left\langle\sum_{i=1}^N\alpha_i\varphi_i,\mathbf{1}\right\rangle_\mu^2 =\sum_{i=2}^N\alpha_i^2, \]

and

\[\mathfrak{D}(f,f) =\left\langle\sum_{i=1}^N\alpha_i\varphi_i,\sum_{j=1}^N\alpha_j\varphi_j\right\rangle_\mu -\left\langle\sum_{i=1}^N\alpha_i\varphi_i,\sum_{j=1}^N\lambda_j\alpha_j\varphi_j\right\rangle_\mu =\sum_{i=2}^N(1-\lambda_i)\alpha_i^2. \]

Thus the ratio \(\frac{\mathfrak{D}(f,f)}{\mathrm{Var}_\mu(f)}\) is a weighted average of the numbers \(\{1-\lambda_i\}_{i=2,\cdots,N}\) with weights proportional to \(\alpha_i^2\). Therefore it is always at least \(1-\lambda_2\), and equality holds when \(\alpha_3=\cdots=\alpha_N=0\) but \(\alpha_2\neq0\).
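To make the variational characterization concrete, here is a minimal numerical sketch (a toy instance, a path on three vertices with \(X=\varnothing\) and \(\lambda=1\); all names are illustrative) that builds \(\mathsf{P}\), computes \(1-\lambda_2\) via the \(\mu\)-symmetrized matrix, and checks the inequality on random functions:

```python
import itertools
import numpy as np

# Toy instance: path 1-2-3, no forbidden vertices, lambda = 1.
n, lam = 3, 1.0
edges = {(1, 2), (2, 3)}

def independent(s):
    return not any((u, v) in edges or (v, u) in edges
                   for u, v in itertools.combinations(sorted(s), 2))

# Enumerate Omega (independent sets) and the Gibbs measure mu.
states = [frozenset(c) for k in range(n + 1)
          for c in itertools.combinations(range(1, n + 1), k)
          if independent(c)]
index = {s: j for j, s in enumerate(states)}
mu = np.array([lam ** len(s) for s in states])
mu /= mu.sum()

# Transition matrix P = (1/n) sum_i P_i of heat-bath single-site updates.
N = len(states)
P = np.zeros((N, N))
for j, s in enumerate(states):
    for i in range(1, n + 1):
        cands = [t for t in (s - {i}, s | {i}) if t in index]
        tot = sum(lam ** len(t) for t in cands)
        for t in cands:
            P[j, index[t]] += (lam ** len(t) / tot) / n

# Reversibility makes D^{1/2} P D^{-1/2} symmetric; it shares P's eigenvalues.
d = np.sqrt(mu)
lams = np.sort(np.linalg.eigvalsh((d[:, None] * P) / d[None, :]))[::-1]
gap = 1 - lams[1]

# Check: D(f, f) / Var(f) >= 1 - lambda_2 for random functions f.
rng = np.random.default_rng(0)
for _ in range(5):
    f = rng.normal(size=N)
    var = mu @ f**2 - (mu @ f) ** 2
    dirichlet = mu @ (f * (f - P @ f))
    assert dirichlet >= (gap - 1e-9) * var
print("spectral gap 1 - lambda_2 =", gap)
```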


2.3 PI implies rapid mixing

If the Poincaré inequality holds with parameter \(\gamma\), we prove

\[T_{\text{mix}}(\varepsilon)\leq\frac{1}{\gamma}\left(\ln\frac{1}{\mu(\iota)}+2\ln\frac{1}{2\varepsilon}\right). \]

Note that this bound is specific to the present setting: it uses that the chain starts from the all-white coloring \(\iota\), and the \(\chi^2\) contraction proved below relies on the structure of the Heat-Bath Glauber dynamics.


For \(0\le k\le n\), the \(k\)-subset residual variance is defined as

\[V^{(k)}=\mathbb{E}_{S\sim\binom{[n]}{k}}\big[\mathbb{E}_{\sigma\sim\mu_S}\big[\mathrm{Var}_{\mu^\sigma}(f)\big]\big], \]

which is the average, over a uniformly random \(k\)-subset \(S\subseteq[n]\) and a partial coloring \(\sigma\sim\mu_S\), of the conditional variance of \(f\) given the coloring on \(S\) (the remaining randomness lives on \([n]\setminus S\)). Note that \(V^{(0)}=\mathrm{Var}_\mu(f)\) and \(V^{(n)}=0\). We now show \(V^{(n-1)}=\mathfrak{D}(f,f)\):

\[\begin{aligned} V^{(n-1)} &=\mathbb{E}_{i\sim[n]}\big[\mathbb{E}_{\sigma\sim\mu_{[n]\setminus\{i\}}}\big[\text{Var}_{\mu^\sigma}(f)\big]\big]\\ &=\frac{1}{n}\sum_{i=1}^n\sum_{\substack{\sigma_0,\sigma_1\\\text{differ only on }i}}\frac{\mu(\sigma_0)\mu(\sigma_1)}{\mu(\sigma_0)+\mu(\sigma_1)}(f(\sigma_0)-f(\sigma_1))^2\\ &=\mathfrak{D}(f,f). \end{aligned} \]

Here, the second equality is the nontrivial step: it follows from the fact that the variance of the two-point distribution \(\left\{\left(f_0,\frac{\mu_0}{\mu_0+\mu_1}\right),\left(f_1,\frac{\mu_1}{\mu_0+\mu_1}\right)\right\}\) equals \(\frac{\mu_0\mu_1}{(\mu_0+\mu_1)^2}(f_0-f_1)^2\).
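Indeed, writing \(p=\frac{\mu_0}{\mu_0+\mu_1}\) and \(q=\frac{\mu_1}{\mu_0+\mu_1}\),

\[pf_0^2+qf_1^2-(pf_0+qf_1)^2=pq\,(f_0-f_1)^2=\frac{\mu_0\mu_1}{(\mu_0+\mu_1)^2}(f_0-f_1)^2, \]

and averaging over \(\sigma\sim\mu_{[n]\setminus\{i\}}\) weights this by \(\mu_{[n]\setminus\{i\}}(\sigma)=\mu(\sigma_0)+\mu(\sigma_1)\), which yields the factors \(\frac{\mu(\sigma_0)\mu(\sigma_1)}{\mu(\sigma_0)+\mu(\sigma_1)}\) above.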


The \(\chi^2\) divergence \(D_{\chi^2}(\cdot\,\|\,\cdot)\) is defined by

\[D_{\chi^2}(\nu\,\|\,\mu)=\sum_{x\in\Omega}\mu(x)\left(\frac{\nu(x)}{\mu(x)}-1\right)^2,\qquad\forall\nu,\mu\in\mathcal{P}(\Omega). \]

We wish to prove that the Poincaré inequality implies contraction of the \(\chi^2\) divergence:

\[D_{\chi^2}(\nu\mathsf{P}\,\|\,\mu)\leq(1-\gamma)D_{\chi^2}(\nu\,\|\,\mu),\qquad\forall\nu\in\mathcal{P}(\Omega). \]

Consequently,

\[D_{\chi^2}\big(\nu\mathsf{P}^t\,\big\|\,\mu\big)\leq(1-\gamma)^tD_{\chi^2}(\nu\,\|\,\mu)\leq e^{-\gamma t}D_{\chi^2}(\nu\,\|\,\mu). \]

In our chain \(I_0=\iota\), so the distribution of \(I_0\) is \(\delta_\iota\), and \(D_{\chi^2}(\delta_\iota\,\|\,\mu)=\frac{1}{\mu(\iota)}-1\). The distribution of \(I_t\) is \(\delta_\iota\mathsf{P}^t\).

By the Cauchy–Schwarz inequality,

\[\|\nu-\mu\|_{\textsf{TV}}=\frac{1}{2}\sum_{x\in\Omega}\mu(x)\left|\frac{\nu(x)}{\mu(x)}-1\right|\leq\frac{1}{2}\sqrt{\sum_{x\in\Omega}\mu(x)\left(\frac{\nu(x)}{\mu(x)}-1\right)^2}=\frac{1}{2}\sqrt{D_{\chi^2}(\nu\,\|\,\mu)}. \]

Applying this to \(\delta_\iota\mathsf{P}^t\) gives

\[\big\|\delta_\iota\mathsf{P}^t-\mu\big\|_{\textsf{TV}} \leq\frac{1}{2}\sqrt{D_{\chi^2}\big(\delta_\iota\mathsf{P}^t\,\big\|\,\mu\big)} \leq\frac{1}{2}\sqrt{e^{-\gamma t}D_{\chi^2}(\delta_\iota\,\|\,\mu)} =\frac{1}{2}e^{-\gamma t/2}\sqrt{\frac{1}{\mu(\iota)}-1}. \]

Requiring the left-hand side to be at most \(\varepsilon\) and solving for \(t\) yields the claimed mixing-time bound

\[T_{\text{mix}}(\varepsilon)\leq\frac{1}{\gamma}\left(\ln\frac{1}{\mu(\iota)}+2\ln\frac{1}{2\varepsilon}\right). \]
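Explicitly, dropping the harmless \(-1\) under the square root, it suffices that \(\frac{1}{2}e^{-\gamma t/2}\sqrt{1/\mu(\iota)}\le\varepsilon\), i.e.,

\[t\ \ge\ \frac{2}{\gamma}\left(\ln\frac{1}{2\varepsilon}+\frac{1}{2}\ln\frac{1}{\mu(\iota)}\right)=\frac{1}{\gamma}\left(\ln\frac{1}{\mu(\iota)}+2\ln\frac{1}{2\varepsilon}\right). \]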


For two distributions \(\nu,\mu\), let the density function be \(f=\frac{\nu}{\mu}\). (Then \(\mathbb{E}_\mu[f]=\langle f,\mathbf{1}\rangle_\mu=1\).) We compute:

\[\begin{aligned} D_{\chi^2}(\nu\mathsf{P}\,\|\,\mu) &= D_{\chi^2}\left(\frac{1}{n}\sum_{i\in[n]}\nu\mathsf{P}_i\,\Bigg\|\,\mu\right) &&\text{(definition of }\mathsf{P}\text{)}\\[0.3em] &\leq \frac{1}{n}\sum_{i\in[n]} D_{\chi^2}(\nu\mathsf{P}_i\,\|\,\mu) &&\text{(convexity of }\chi^2\text{ divergence)}\\[0.3em] &= \frac{1}{n}\sum_{i\in[n]}\text{Var}_\mu(\mathsf{P}_i f) &&\text{(}\mathsf{P}_i\text{ reversible, so }\nu\mathsf{P}_i\text{ has density }\mathsf{P}_if\text{, and }D_{\chi^2}(\cdot\,\|\,\mu)\text{ is the variance of the density)}\\[0.3em] &= \frac{1}{n}\sum_{i\in[n]}\text{Var}_{\sigma\sim\mu_{[n]\setminus\{i\}}}\big(\mathbb{E}_{\mu^\sigma}[f]\big) &&\text{(definition of }\mathsf{P}_i\text{)}\\[0.3em] &= \frac{1}{n}\sum_{i\in[n]}\Big(\text{Var}_\mu(f)-\mathbb{E}_{\sigma\sim\mu_{[n]\setminus\{i\}}}\big[\text{Var}_{\mu^\sigma}(f)\big]\Big) &&\text{(law of total variance)}\\[0.3em] &= V^{(0)}(f)-V^{(n-1)}(f), \end{aligned} \]

which, together with \(V^{(n-1)}=\mathfrak{D}(f,f)\) and the Poincaré inequality \(\mathfrak{D}(f,f)\geq\gamma\cdot V^{(0)}\), implies \(D_{\chi^2}(\nu\mathsf{P}\,\|\,\mu)\leq(1-\gamma)D_{\chi^2}(\nu\,\|\,\mu)\).
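As a final sanity check, the contraction can be verified numerically. The sketch below uses a generic positive measure on \(\{0,1\}^2\) with heat-bath single-site updates, so that each \(\mathsf{P}_i\) is a conditional expectation as in the proof (the instance is illustrative, not from the text):

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)

# Generic positive measure mu on {0,1}^2.
states = list(itertools.product([0, 1], repeat=2))
mu = rng.random(4)
mu /= mu.sum()

# P_i resamples coordinate i from mu conditioned on the other coordinate;
# P is their average, so each component is a conditional-expectation projection.
P = np.zeros((4, 4))
for j, x in enumerate(states):
    for i in range(2):
        block = [k for k, y in enumerate(states) if y[1 - i] == x[1 - i]]
        tot = mu[block].sum()
        for k in block:
            P[j, k] += (mu[k] / tot) / 2

# gamma = 1 - lambda_2 via the mu-symmetrized matrix.
d = np.sqrt(mu)
lams = np.sort(np.linalg.eigvalsh((d[:, None] * P) / d[None, :]))[::-1]
gamma = 1 - lams[1]

def chi2(nu):
    return np.sum(mu * (nu / mu - 1) ** 2)

# One step of P contracts the chi^2 divergence by at least gamma.
for _ in range(5):
    nu = rng.random(4)
    nu /= nu.sum()
    assert chi2(nu @ P) <= (1 - gamma) * chi2(nu) + 1e-12
print("chi^2 contraction holds with gamma =", gamma)
```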

