Probabilistic Graphical Models – The Bayesian Network Representation

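Consider a random system with $p$ variables $X_1, X_2, \ldots, X_p$ whose joint distribution is unknown, and suppose we can observe realizations of $X_1, X_2, \ldots, X_p$ on a sample. Recovering the relationships among the variables (their joint distribution) from data is the essence of many statistical problems. Take regression, which targets $E(Y|X)$: if we knew the joint distribution of $X_1, X_2, \ldots, X_p, Y$, we would also know $E(Y|X)$.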
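To make the last claim concrete, here is a minimal sketch that assumes a hypothetical discrete joint table over one binary $X$ and one binary $Y$ (the probabilities are made up). Once the joint is known, $E(Y|X=x)$ is just a weighted average over the slice $X = x$.

```python
# Hypothetical joint distribution P(X = x, Y = y) over two binary variables.
JOINT = {(0, 0): 0.2, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.5}


def conditional_expectation(joint, x):
    """Return E(Y | X = x) computed directly from the joint table."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)  # marginal P(X = x)
    if p_x == 0:
        raise ValueError("E(Y | X = x) is undefined when P(X = x) = 0")
    return sum(y * p for (xi, y), p in joint.items() if xi == x) / p_x


print(conditional_expectation(JOINT, 0), conditional_expectation(JOINT, 1))
```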

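  • Consider the simplest, univariate case. We are interested in $\mu = E(X)$ and observe an i.i.d. sample $x_1, x_2, \ldots, x_n$. If we assume $X$ follows some parametric family, we can write down the likelihood function, compute the maximum likelihood estimate (MLE) and carry out a likelihood ratio test (LRT). Under certain regularity conditions, Wilks's theorem gives the asymptotic $\chi^2$ distribution of the LRT statistic.
  • Now consider $p$ variables with $p$ large. Even the simplest joint form, the multivariate normal, needs $p + p(p+1)/2 = O(p^2)$ parameters (a mean vector plus a symmetric covariance matrix). More often, we have no idea how to specify a high-dimensional joint distribution at all.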
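The univariate MLE and LRT mentioned above can be sketched in a few lines. This is a minimal illustration, assuming $X \sim N(\mu, \sigma^2)$ with $\sigma$ known and testing $H_0: \mu = \mu_0$. In this special case $-2\log\Lambda = n(\bar{x}-\mu_0)^2/\sigma^2$, which is exactly $\chi^2_1$ under $H_0$, so Wilks's asymptotic result holds without approximation. The function names are hypothetical.

```python
import math


def normal_loglik(xs, mu, sigma=1.0):
    """Log-likelihood of an i.i.d. N(mu, sigma^2) sample."""
    n = len(xs)
    rss = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - rss / (2 * sigma ** 2)


def mle_mean(xs):
    """Under the normal model the MLE of mu is the sample mean."""
    return sum(xs) / len(xs)


def lrt_statistic(xs, mu0, sigma=1.0):
    """-2 log Lambda for H0: mu = mu0 against an unrestricted mu."""
    mu_hat = mle_mean(xs)
    return 2.0 * (normal_loglik(xs, mu_hat, sigma) - normal_loglik(xs, mu0, sigma))


xs = [1.0, 2.0, 3.0]
stat = lrt_statistic(xs, mu0=0.0)
# Compare with the 5% critical value of chi^2 with 1 degree of freedom (about 3.841).
print(mle_mean(xs), stat, stat > 3.841)
```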

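Probabilistic graphical models offer a solution that is closer to the way people reason. The basic idea is to exploit (conditional) independencies among the variables to reduce the number of independent parameters. These (conditional) independencies can be represented by a graph (directed or undirected), and this representation corresponds to a factorization of the joint distribution into a product of CPDs (conditional probability distributions), which can also be viewed as a product of local probability models.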
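The parameter savings can be made concrete by counting free parameters for discrete variables. A minimal sketch, assuming every variable is binary and using a hypothetical chain $X_1 \to X_2 \to \cdots \to X_p$ as the graph: an unrestricted joint table needs $2^p - 1$ numbers, while each CPD $P(X_i|Pa_{X_i})$ needs $|X_i| - 1$ free numbers per configuration of its parents.

```python
from math import prod


def full_joint_params(cards):
    """Free parameters of an unrestricted joint table: prod(cards) - 1."""
    return prod(cards) - 1


def bn_params(cards, parents):
    """Free parameters of a BN: (|X_i| - 1) per configuration of Pa(X_i)."""
    total = 0
    for i, card in enumerate(cards):
        n_configs = prod(cards[j] for j in parents[i])  # empty product is 1
        total += (card - 1) * n_configs
    return total


p = 10
cards = [2] * p  # all variables binary
chain = {i: ([i - 1] if i > 0 else []) for i in range(p)}  # X_{i-1} -> X_i
print(full_joint_params(cards), bn_params(cards, chain))
```

The gap grows exponentially: the full table doubles with each added variable, while the chain adds only two parameters per variable.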

Definition 3.1 – Bayesian Network Semantics

A Bayesian network structure $\mathcal{G}$ is a directed acyclic graph whose nodes represent random variables $X_1, \ldots, X_n$. Let $Pa_{X_i}^{\mathcal{G}}$ denote the parents of $X_i$ in $\mathcal{G}$, and $NonDescendants_{X_i}$ denote the variables in the graph that are not descendants of $X_i$. Then $\mathcal{G}$ encodes the following set of conditional independence assumptions, called the local independencies and denoted by $\mathcal{I}_{l}(\mathcal{G})$:

For each variable $X_i$: ($X_i \perp NonDescendants_{X_i}|Pa_{X_i}^{\mathcal{G}}$).
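The local independencies can be read off a DAG mechanically. A minimal sketch, assuming the graph is stored as a dict mapping each node to its list of children (acyclicity is assumed, not checked). For display, parents are dropped from the left-hand side, since $X_i$ is trivially independent of its parents given those same parents. The helper names and the diamond graph are hypothetical.

```python
def descendants(children, node):
    """All nodes reachable from `node` along directed edges (node excluded)."""
    seen, stack = set(), list(children[node])
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(children[v])
    return seen


def parents(children, node):
    return {u for u, ch in children.items() if node in ch}


def local_independencies(children):
    """Map X_i -> (NonDescendants minus Pa, Pa), i.e. X_i _|_ rest | Pa."""
    result = {}
    for x in children:
        pa = parents(children, x)
        non_desc = set(children) - descendants(children, x) - {x}
        result[x] = (non_desc - pa, pa)
    return result


# Hypothetical diamond-shaped DAG: A -> B, A -> C, B -> D, C -> D.
G = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
for x, (rest, pa) in local_independencies(G).items():
    print(f"({x} _|_ {sorted(rest)} | {sorted(pa)})")
```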

Remarks:

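  • A Bayesian network is simply a directed acyclic graph together with the set of conditional independencies (the local independencies) defined on it; a directed edge corresponds to a direct influence.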

Definition 3.3 – I-map

Let $\mathcal{K}$ be any graph object associated with a set of independencies $\mathcal{I}(\mathcal{K})$. We say that $\mathcal{K}$ is an I-map for a set of independencies $\mathcal{I}$ if $\mathcal{I}(\mathcal{K}) \subseteq \mathcal{I}$.
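For discrete variables, the I-map condition can be tested by checking each statement of $\mathcal{I}(\mathcal{K})$ against $P$. The following is a minimal sketch under several assumptions. The joint is a dict from full assignment tuples to probabilities and lists every assignment, including zero-probability ones. Variables are referred to by position. Each statement $(X \perp Y | Z)$ is written as a triple of index lists. The example distribution and the helper names are hypothetical.

```python
from itertools import product


def marginal(joint, idx):
    """Sum the joint table down to the variables at positions `idx`."""
    out = {}
    for assign, p in joint.items():
        key = tuple(assign[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def is_cond_indep(joint, xs, ys, zs, tol=1e-9):
    """Test X _|_ Y | Z via P(x,y,z) P(z) == P(x,z) P(y,z) for all assignments."""
    pxyz, pxz = marginal(joint, xs + ys + zs), marginal(joint, xs + zs)
    pyz, pz = marginal(joint, ys + zs), marginal(joint, zs)
    for assign in joint:
        x = tuple(assign[i] for i in xs)
        y = tuple(assign[i] for i in ys)
        z = tuple(assign[i] for i in zs)
        lhs = pxyz.get(x + y + z, 0.0) * pz.get(z, 0.0)
        rhs = pxz.get(x + z, 0.0) * pyz.get(y + z, 0.0)
        if abs(lhs - rhs) > tol:
            return False
    return True


def is_imap(joint, statements):
    """K is an I-map for P iff every statement in I(K) also holds in P."""
    return all(is_cond_indep(joint, xs, ys, zs) for xs, ys, zs in statements)


def bern(p1, v):
    return p1 if v == 1 else 1.0 - p1


# Hypothetical P(A, B, C) = P(A) P(B|A) P(C|A); positions 0, 1, 2 = A, B, C.
PB1, PC1 = {0: 0.2, 1: 0.9}, {0: 0.3, 1: 0.7}
JOINT = {(a, b, c): bern(0.5, a) * bern(PB1[a], b) * bern(PC1[a], c)
         for a, b, c in product((0, 1), repeat=3)}

FORK = [([1], [2], [0]), ([2], [1], [0])]  # local independencies of A -> B, A -> C
EMPTY = [([0], [1, 2], []), ([1], [0, 2], []), ([2], [0, 1], [])]  # graph with no edges
print(is_imap(JOINT, FORK), is_imap(JOINT, EMPTY))
```

The fork is an I-map for this $P$, but the empty graph is not: it claims full mutual independence, which $P$ violates.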

Definition 3.4 – Factorization

Let $\mathcal{G}$ be a BN graph over the variables $X_1, \ldots, X_n$. We say that a distribution $P$ over the same space factorizes according to $\mathcal{G}$ if $P$ can be expressed as a product

$P(X_1, \ldots, X_n)=\prod_{i=1}^nP(X_i|Pa_{X_i}^{\mathcal{G}})$
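A minimal sketch of evaluating a factorized joint, assuming a hypothetical v-structure $A \to C \leftarrow B$ over binary variables with made-up CPD values: each factor is looked up in its node's local table, indexed by the parents' values, and the product over all assignments sums to 1.

```python
from itertools import product

# Hypothetical v-structure A -> C <- B over binary variables.
PARENTS = {"A": (), "B": (), "C": ("A", "B")}
# P(X = 1 | parents), indexed by the tuple of parent values (made-up numbers).
P_ONE = {
    "A": {(): 0.3},
    "B": {(): 0.6},
    "C": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.7, (1, 1): 0.95},
}


def cpd(var, assignment):
    """P(var = assignment[var] | Pa_var), read off the local table."""
    pa_vals = tuple(assignment[p] for p in PARENTS[var])
    p1 = P_ONE[var][pa_vals]
    return p1 if assignment[var] == 1 else 1.0 - p1


def joint(assignment):
    """P(A, B, C) as the product of the CPDs, one factor per node."""
    prob = 1.0
    for var in PARENTS:
        prob *= cpd(var, assignment)
    return prob


all_assignments = [dict(zip("ABC", vals)) for vals in product((0, 1), repeat=3)]
total = sum(joint(a) for a in all_assignments)
print(total, joint({"A": 1, "B": 1, "C": 1}))
```

The local tables hold $1 + 1 + 4 = 6$ free numbers, compared with $2^3 - 1 = 7$ for an unrestricted table over three binary variables.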

Theorem 3.1

Let $\mathcal{G}$ be a BN graph over a set of random variables $\mathcal{X}$, and let $P$ be a joint distribution over the same space. If $\mathcal{G}$ is an I-map for $P$, then $P$ factorizes according to $\mathcal{G}$.
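The usual argument for Theorem 3.1 can be sketched as follows. It assumes the variables are indexed in a topological order of $\mathcal{G}$ (parents before children), which exists because $\mathcal{G}$ is acyclic.

```latex
\begin{align*}
P(X_1, \ldots, X_n)
  &= \prod_{i=1}^{n} P(X_i \mid X_1, \ldots, X_{i-1})
     && \text{(chain rule)} \\
  &= \prod_{i=1}^{n} P(X_i \mid Pa_{X_i}^{\mathcal{G}})
     && \text{(local independencies)}
\end{align*}
% Second step: in a topological order,
%   Pa_{X_i} \subseteq \{X_1, \ldots, X_{i-1}\} \subseteq NonDescendants_{X_i},
% so (X_i \perp NonDescendants_{X_i} \mid Pa_{X_i}) implies, by decomposition,
%   (X_i \perp \{X_1, \ldots, X_{i-1}\} \setminus Pa_{X_i} \mid Pa_{X_i}).
```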

Remarks:

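  • I-map $\Rightarrow$ Factorization (Theorem 3.1).
  • Conversely, Factorization $\Rightarrow$ I-map: if $P$ factorizes according to $\mathcal{G}$, then $\mathcal{G}$ is an I-map for $P$.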
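The direction Factorization $\Rightarrow$ I-map can be checked numerically on a small case. A minimal sketch, assuming a hypothetical chain $A \to B \to C$ over binary variables, whose only non-trivial local independence is $(C \perp A | B)$. Whatever CPDs are drawn at random, the resulting product distribution satisfies it. A joint that cannot factorize this way (here $C$ is a copy of $A$) does not. The helper names are hypothetical.

```python
import random
from itertools import product


def bern(p1, v):
    return p1 if v == 1 else 1.0 - p1


def chain_joint(rng):
    """Joint over (A, B, C) factorizing as P(A) P(B|A) P(C|B), random CPDs."""
    pa1 = rng.random()
    pb1 = {a: rng.random() for a in (0, 1)}
    pc1 = {b: rng.random() for b in (0, 1)}
    return {(a, b, c): bern(pa1, a) * bern(pb1[a], b) * bern(pc1[b], c)
            for a, b, c in product((0, 1), repeat=3)}


def c_indep_a_given_b(joint, tol=1e-12):
    """Check C _|_ A | B via P(a,b,c) P(b) == P(a,b) P(b,c) for all a, b, c."""
    p_b = {b: sum(p for (_, bb, _), p in joint.items() if bb == b) for b in (0, 1)}
    p_ab = {(a, b): joint[(a, b, 0)] + joint[(a, b, 1)]
            for a, b in product((0, 1), repeat=2)}
    p_bc = {(b, c): joint[(0, b, c)] + joint[(1, b, c)]
            for b, c in product((0, 1), repeat=2)}
    return all(abs(joint[(a, b, c)] * p_b[b] - p_ab[(a, b)] * p_bc[(b, c)]) <= tol
               for a, b, c in product((0, 1), repeat=3))


rng = random.Random(0)
print(all(c_indep_a_given_b(chain_joint(rng)) for _ in range(1000)))

# C copies A and B is independent noise: this joint violates C _|_ A | B.
COPY = {(a, b, c): (0.25 if c == a else 0.0) for a, b, c in product((0, 1), repeat=3)}
print(c_indep_a_given_b(COPY))
```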
posted @ 2013-05-23 14:11  cchen