Probabilistic Graphical Models – The Bayesian Network Representation

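Consider a random system with $p$ variables $X_1, X_2, \ldots, X_p$ whose joint distribution is unknown. Suppose we can observe realizations of $X_1, X_2, \ldots, X_p$ on a sample. How to recover the relationships among the variables (the joint distribution) from data is the essence of many statistical problems. Take regression, where the target is $E(Y|X)$: if the joint distribution of $X_1, X_2, \ldots, X_p, Y$ were known, $E(Y|X)$ would be known as well.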

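Consider the simplest case, a single variable. We are interested in $\mu = E(X)$ and observe an i.i.d. sample $x_1, x_2, \ldots, x_n$. Assuming $X$ follows some parametric family, we can write down the likelihood function, compute the maximum likelihood estimate (MLE), and carry out a likelihood ratio test (LRT); under certain regularity conditions, Wilks' theorem gives the asymptotic chi-square distribution of the LRT statistic.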
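As a minimal sketch of the single-variable workflow, the snippet below assumes $X$ is normal with unknown mean and variance, computes the MLE, and tests $H_0: \mu = \mu_0$ with the likelihood ratio statistic. The sample values, the null value `mu0`, and the function names are all hypothetical, and with so few observations the chi-square approximation from Wilks' theorem is only rough.

```python
import math

def normal_mle(xs):
    """MLE of (mean, variance) under a normal model; the variance uses 1/n."""
    n = len(xs)
    mu_hat = sum(xs) / n
    var_hat = sum((x - mu_hat) ** 2 for x in xs) / n
    return mu_hat, var_hat

def lrt_mean(xs, mu0):
    """Likelihood ratio test of H0: mu = mu0 with the variance unknown.

    For the normal model, -2 log(Lambda) = n * log(var0 / var_hat), where
    var0 is the variance MLE with the mean fixed at mu0.
    """
    n = len(xs)
    _, var_hat = normal_mle(xs)
    var0 = sum((x - mu0) ** 2 for x in xs) / n
    stat = max(0.0, n * math.log(var0 / var_hat))
    # Survival function of a chi-square with 1 degree of freedom:
    # P(Z^2 > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical data, for illustration only.
sample = [4.9, 5.3, 5.1, 4.7, 5.6, 5.0, 5.2, 4.8]
mu_hat, var_hat = normal_mle(sample)
stat, p_value = lrt_mean(sample, mu0=5.0)
print(mu_hat, var_hat, stat, p_value)
```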
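Now consider $p$ variables with $p$ large. Even the simplest distributional form, the multivariate normal, needs $O(p^2)$ parameters ($p$ means plus $p(p+1)/2$ covariance entries). More often, we simply do not know how to specify a high-dimensional joint distribution at all.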

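Probabilistic graphical models offer a solution that is closer to the way people think. The basic idea is that (conditional) independencies among the variables reduce the number of independent parameters. These (conditional) independencies can be represented by a graph (directed or undirected), and the representation corresponds to a factorization of the joint distribution into a product of CPDs (conditional probability distributions), which can also be viewed as a product of local probability models.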
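To make the saving in independent parameters concrete, here is a small sketch that counts them for discrete variables. The chain structure $X_1 \to X_2 \to \cdots \to X_p$, the choice of binary variables, and the function names are assumptions made for illustration only.

```python
from math import prod

def full_joint_params(cards):
    """Independent parameters of an unrestricted joint table."""
    return prod(cards.values()) - 1

def bn_params(cards, parents):
    """Independent parameters of a product of tabular CPDs.

    Each variable with k values contributes (k - 1) free parameters
    per joint configuration of its parents.
    """
    total = 0
    for var, k in cards.items():
        parent_configs = prod(cards[pa] for pa in parents.get(var, []))
        total += (k - 1) * parent_configs
    return total

p = 10
cards = {f"X{i}": 2 for i in range(1, p + 1)}                    # binary variables
chain_parents = {f"X{i}": [f"X{i - 1}"] for i in range(2, p + 1)}  # X1 -> X2 -> ... -> Xp

n_full = full_joint_params(cards)          # 2**10 - 1 = 1023
n_chain = bn_params(cards, chain_parents)  # 1 + 2 * 9 = 19
print(n_full, n_chain)
```

The full table grows exponentially in $p$, while the chain grows linearly; denser graphs fall somewhere in between.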

Definition 3.1 – Bayesian Network Semantics

A Bayesian network structure $\mathcal{G}$ is a directed acyclic graph whose nodes represent random variables $X_1, \ldots, X_n$. Let $Pa_{X_i}^{\mathcal{G}}$ denote the parents of $X_i$ in $\mathcal{G}$, and $NonDescendants_{X_i}$ denote the variables in the graph that are not descendants of $X_i$. Then $\mathcal{G}$ encodes the following set of conditional independence assumptions, called the local independencies and denoted by $\mathcal{I}_{l}(\mathcal{G})$:

For each variable $X_i$: ($X_i \perp NonDescendants_{X_i}|Pa_{X_i}^{\mathcal{G}}$).
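The local independencies can be read off a graph mechanically. The sketch below does this for a small illustrative DAG (D → G, I → G, I → S, G → L) that is an assumption of this example and not part of the definition; the graph is stored as a hypothetical `parents` dictionary. For readability the parents are dropped from the reported non-descendant set, since independence from them given themselves is trivial.

```python
def descendants(parents, node):
    """All nodes reachable from `node` along directed edges."""
    children = {v: set() for v in parents}
    for child, pas in parents.items():
        for pa in pas:
            children[pa].add(child)
    seen, stack = set(), [node]
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def local_independencies(parents):
    """Map each X to (non-descendants without parents, parents)."""
    result = {}
    for x in parents:
        nondesc = set(parents) - descendants(parents, x) - {x}
        result[x] = (nondesc - set(parents[x]), set(parents[x]))
    return result

# Illustrative DAG; every node must appear as a key.
parents = {"D": [], "I": [], "G": ["D", "I"], "S": ["I"], "L": ["G"]}

indeps = local_independencies(parents)
for x, (others, given) in indeps.items():
    print(f"({x} ⊥ {sorted(others)} | {sorted(given)})")
```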

Remarks:

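A Bayesian network is a directed acyclic graph together with a set of conditional independencies (the local independencies) defined on it; directed edges correspond to direct influence.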

Definition 3.3 – I-map

Let $\mathcal{K}$ be any graph object associated with a set of independencies $\mathcal{I}(\mathcal{K})$. We say that $\mathcal{K}$ is an I-map for a set of independencies $\mathcal{I}$ if $\mathcal{I}(\mathcal{K}) \subseteq \mathcal{I}$.
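A two-variable example may help fix the direction of the inclusion. In the sketch below, the joint tables and all names are hypothetical; $\mathcal{I}(P)$ is computed numerically and compared with the independencies of two graphs over $X$ and $Y$: the graph with no edge, which asserts $(X \perp Y)$, and the graph $X \to Y$, which asserts nothing.

```python
def marginals(joint):
    """Marginals of X and Y from a table {(x, y): prob} listing every pair."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py

def x_indep_y(joint, tol=1e-9):
    """True if P(x, y) = P(x) P(y) for every listed pair."""
    px, py = marginals(joint)
    return all(abs(p - px[x] * py[y]) < tol for (x, y), p in joint.items())

def independencies_of(joint):
    """I(P) for two variables: {('X', 'Y')} or the empty set."""
    return {("X", "Y")} if x_indep_y(joint) else set()

def is_imap(graph_independencies, joint):
    """K is an I-map for P if I(K) is a subset of I(P)."""
    return graph_independencies <= independencies_of(joint)

I_EMPTY_GRAPH = {("X", "Y")}  # no edge: asserts X ⊥ Y
I_EDGE_GRAPH = set()          # X -> Y: asserts no independencies

p_indep = {(0, 0): 0.12, (0, 1): 0.28, (1, 0): 0.18, (1, 1): 0.42}
p_dep = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

for name, joint in [("p_indep", p_indep), ("p_dep", p_dep)]:
    print(name, is_imap(I_EMPTY_GRAPH, joint), is_imap(I_EDGE_GRAPH, joint))
```

The graph with the edge is an I-map for both distributions, which shows that an I-map may hide independencies that hold in $P$ but never asserts one that fails.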

Definition 3.4 – Factorization

Let $\mathcal{G}$ be a BN graph over the variables $X_1, \ldots, X_n$. We say that a distribution $P$ over the same space factorizes according to $\mathcal{G}$ if $P$ can be expressed as a product

$P(X_1, \ldots, X_n)=\prod_{i=1}^nP(X_i|Pa_{X_i}^{\mathcal{G}})$
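As a minimal sketch of this product, the code below builds the joint distribution of a three-variable binary chain $X_1 \to X_2 \to X_3$ from tabular CPDs. The structure, the CPD numbers, and the dictionary layout are all assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical structure: X1 -> X2 -> X3, all variables binary.
parents = {"X1": [], "X2": ["X1"], "X3": ["X2"]}

# cpds[var][parent values] = [P(var = 0 | parents), P(var = 1 | parents)]
cpds = {
    "X1": {(): [0.6, 0.4]},
    "X2": {(0,): [0.7, 0.3], (1,): [0.2, 0.8]},
    "X3": {(0,): [0.9, 0.1], (1,): [0.5, 0.5]},
}

def joint_prob(assignment):
    """P(X1, ..., Xn) as the product of P(Xi | Pa_Xi)."""
    prob = 1.0
    for var, pas in parents.items():
        pa_values = tuple(assignment[pa] for pa in pas)
        prob *= cpds[var][pa_values][assignment[var]]
    return prob

names = list(parents)
total = sum(
    joint_prob(dict(zip(names, values)))
    for values in product((0, 1), repeat=len(names))
)
print(joint_prob({"X1": 0, "X2": 0, "X3": 0}), total)
```

Because every CPD row sums to one, the product is automatically a valid joint distribution, so `total` should equal 1 up to floating-point error.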

Theorem 3.1

Let $\mathcal{G}$ be a BN graph over a set of random variables $\mathcal{X}$, and let $P$ be a joint distribution over the same space. If $\mathcal{G}$ is an I-map for $P$, then $P$ factorizes according to $\mathcal{G}$.
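
Proof sketch (a standard argument, filled in here and not taken from the post itself). Assume the variables are numbered in a topological order of $\mathcal{G}$, so that every parent precedes its children. The chain rule gives

$P(X_1, \ldots, X_n)=\prod_{i=1}^nP(X_i|X_1, \ldots, X_{i-1})$

In a topological order, $\{X_1, \ldots, X_{i-1}\}$ contains every parent of $X_i$ and no descendant of $X_i$. Since $\mathcal{G}$ is an I-map for $P$, the local independence ($X_i \perp NonDescendants_{X_i}|Pa_{X_i}^{\mathcal{G}}$) holds in $P$, and therefore

$P(X_i|X_1, \ldots, X_{i-1})=P(X_i|Pa_{X_i}^{\mathcal{G}})$

Substituting this into the chain rule gives the product in Definition 3.4.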

Remarks:

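Theorem 3.1 states I-map $\Rightarrow$ Factorization.
The converse also holds: Factorization $\Rightarrow$ I-map, that is, if $P$ factorizes according to $\mathcal{G}$, then $\mathcal{G}$ is an I-map for $P$.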
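
The direction Factorization $\Rightarrow$ I-map can be checked numerically on a small case. The sketch below assumes a binary chain $X_1 \to X_2 \to X_3$ with made-up CPD values, builds the joint as a product of CPDs, and verifies the local independence $(X_3 \perp X_1 | X_2)$ by comparing $P(X_3|X_1, X_2)$ with $P(X_3|X_2)$. It is a sanity check on one example, not a proof.

```python
from itertools import product

# Hypothetical CPDs for the chain X1 -> X2 -> X3 (binary variables).
p_x1 = [0.6, 0.4]
p_x2_given_x1 = [[0.7, 0.3], [0.2, 0.8]]  # rows indexed by x1
p_x3_given_x2 = [[0.9, 0.1], [0.5, 0.5]]  # rows indexed by x2

# Joint distribution defined by the factorization.
joint = {
    (a, b, c): p_x1[a] * p_x2_given_x1[a][b] * p_x3_given_x2[b][c]
    for a, b, c in product((0, 1), repeat=3)
}

def marginal(joint, keep):
    """Sum out every position not listed in `keep`."""
    out = {}
    for assignment, p in joint.items():
        key = tuple(assignment[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

p_x1x2 = marginal(joint, (0, 1))
p_x2x3 = marginal(joint, (1, 2))
p_x2 = marginal(joint, (1,))

# Largest difference between P(x3 | x1, x2) and P(x3 | x2).
max_gap = max(
    abs(joint[(a, b, c)] / p_x1x2[(a, b)] - p_x2x3[(b, c)] / p_x2[(b,)])
    for a, b, c in joint
)
print(max_gap)
```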

posted @ 2013-05-23 14:11 cchen
