Deep Learning Tutorial (Translation): RBM (Part 2)
For the original English text, see http://www.deeplearning.net/tutorial/rbm.html
Implementing the RBM
We construct an RBM class whose parameters (chiefly W, hbias, vbias, and theano_rng) can either be initialized inside the constructor or passed in as arguments. This makes it convenient to use the RBM as a building block of a deep network, in which case the weight matrix W and the hidden bias b can be shared with the corresponding sigmoidal layer of an MLP. The code is as follows:
import numpy

import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams


class RBM(object):
    def __init__(
        self,
        input=None,
        n_visible=784,
        n_hidden=500,
        W=None,
        hbias=None,
        vbias=None,
        numpy_rng=None,
        theano_rng=None
    ):
        self.n_visible = n_visible
        self.n_hidden = n_hidden

        if numpy_rng is None:
            numpy_rng = numpy.random.RandomState(1234)
        if theano_rng is None:
            theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))

        if W is None:
            # W is initialized uniformly in
            # +/- 4*sqrt(6/(n_visible+n_hidden))
            initial_W = numpy.asarray(
                numpy_rng.uniform(
                    low=-4 * numpy.sqrt(6. / (n_visible + n_hidden)),
                    high=4 * numpy.sqrt(6. / (n_visible + n_hidden)),
                    size=(n_visible, n_hidden)
                ),
                dtype=theano.config.floatX
            )
            W = theano.shared(value=initial_W, name='W', borrow=True)

        if hbias is None:
            hbias = theano.shared(
                value=numpy.zeros(n_hidden, dtype=theano.config.floatX),
                name='hbias',
                borrow=True
            )

        if vbias is None:
            vbias = theano.shared(
                value=numpy.zeros(n_visible, dtype=theano.config.floatX),
                name='vbias',
                borrow=True
            )

        # initialize input layer for standalone RBM or layer0 of DBN
        self.input = input
        if not input:
            self.input = T.matrix('input')

        self.W = W
        self.hbias = hbias
        self.vbias = vbias
        self.theano_rng = theano_rng
        self.params = [self.W, self.hbias, self.vbias]
The next step is to define the functions that construct the symbolic graph associated with equations (7) and (8).
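For reference, equations (7) and (8) from the first part of this tutorial give the conditional probabilities (in the code, hbias plays the role of c and vbias the role of b):

P(h_i = 1 \mid v) = \mathrm{sigm}(c_i + W_i v)    (7)
P(v_j = 1 \mid h) = \mathrm{sigm}(b_j + W'_j h)    (8)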

The code is as follows:
    def propup(self, vis):
        '''This function propagates the visible units' activations up to
        the hidden units.

        Note that we also return the pre-sigmoid activation. This symbolic
        variable is needed when a more numerically stable computational
        graph is required (see the note below).
        '''
        pre_sigmoid_activation = T.dot(vis, self.W) + self.hbias
        return [pre_sigmoid_activation, T.nnet.sigmoid(pre_sigmoid_activation)]

    def propdown(self, hid):
        '''This function propagates the hidden units' activations down to
        the visible units.'''
        pre_sigmoid_activation = T.dot(hid, self.W.T) + self.vbias
        return [pre_sigmoid_activation, T.nnet.sigmoid(pre_sigmoid_activation)]

    def sample_h_given_v(self, v0_sample):
        '''This function infers the state of the hidden units given a
        sample of the visible units.'''
        # first compute the activation of the hidden units given the
        # visible sample
        pre_sigmoid_h1, h1_mean = self.propup(v0_sample)
        # then draw a sample of the hidden units from that activation
        h1_sample = self.theano_rng.binomial(size=h1_mean.shape,
                                             n=1, p=h1_mean,
                                             dtype=theano.config.floatX)
        return [pre_sigmoid_h1, h1_mean, h1_sample]

    def sample_v_given_h(self, h0_sample):
        '''This function infers the state of the visible units given a
        sample of the hidden units.'''
        pre_sigmoid_v1, v1_mean = self.propdown(h0_sample)
        v1_sample = self.theano_rng.binomial(size=v1_mean.shape,
                                             n=1, p=v1_mean,
                                             dtype=theano.config.floatX)
        return [pre_sigmoid_v1, v1_mean, v1_sample]
We can then use these functions to define the symbolic graph for a Gibbs sampling step. We define two functions:
- gibbs_vhv performs a step of Gibbs sampling starting from the visible units. As we shall see, this is useful for sampling from the RBM.
- gibbs_hvh performs a step of Gibbs sampling starting from the hidden units. This is useful for performing CD and PCD updates.
The code is as follows:
    def gibbs_hvh(self, h0_sample):
        '''One step of Gibbs sampling, starting from the hidden units.'''
        pre_sigmoid_v1, v1_mean, v1_sample = self.sample_v_given_h(h0_sample)
        pre_sigmoid_h1, h1_mean, h1_sample = self.sample_h_given_v(v1_sample)
        return [pre_sigmoid_v1, v1_mean, v1_sample,
                pre_sigmoid_h1, h1_mean, h1_sample]

    def gibbs_vhv(self, v0_sample):
        '''One step of Gibbs sampling, starting from the visible units.'''
        pre_sigmoid_h1, h1_mean, h1_sample = self.sample_h_given_v(v0_sample)
        pre_sigmoid_v1, v1_mean, v1_sample = self.sample_v_given_h(h1_sample)
        return [pre_sigmoid_h1, h1_mean, h1_sample,
                pre_sigmoid_v1, v1_mean, v1_sample]
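As an illustrative sketch (not part of the translated excerpt), gibbs_vhv can be used to draw samples from a trained RBM by chaining several steps with theano.scan. Here rbm is assumed to be a trained instance and test_set_x a shared variable of binarized images:

# Hypothetical usage sketch: run 1000 full Gibbs steps, starting from a
# few test examples held in a persistent shared variable.
persistent_vis_chain = theano.shared(
    numpy.asarray(test_set_x.get_value(borrow=True)[:20],
                  dtype=theano.config.floatX)
)
(
    [presig_hids, hid_mfs, hid_samples,
     presig_vis, vis_mfs, vis_samples],
    updates
) = theano.scan(
    rbm.gibbs_vhv,
    # the last entry of outputs_info feeds the visible state back in
    outputs_info=[None, None, None, None, None, persistent_vis_chain],
    n_steps=1000,
    name="gibbs_vhv"
)
# advance the persistent chain and return the final visible mean-field values
updates.update({persistent_vis_chain: vis_samples[-1]})
sample_fn = theano.function([], vis_mfs[-1], updates=updates,
                            name='sample_fn')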
Note that we also return the pre-sigmoid activation here. To understand why, you need to know a little about how Theano works. Whenever you compile a Theano function, the computational graph you pass as input is optimized for speed and stability; this is done by substituting several parts of the graph with other subgraphs. One such optimization rewrites expressions of the form log(sigmoid(x)) in terms of softplus. This is needed for the cross-entropy: for large positive x the sigmoid saturates to 1, and for large negative x to 0, so taking the log directly would yield -inf or NaN. However, if the sigmoid is applied inside scan, Theano only sees log(scan(...)) rather than log(sigmoid(...)) and cannot apply the optimization. Therefore the easiest and most efficient way is to also return the pre-sigmoid activation as an output of scan, and to apply both the log and the sigmoid outside of scan so that Theano can catch and optimize the expression.
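To make this concrete, here is how the reconstruction cost (referenced below) is defined in the original tutorial: the sigmoid and the log are applied to the pre-sigmoid activation outside of scan:

    def get_reconstruction_cost(self, updates, pre_sigmoid_nv):
        '''Approximation to the reconstruction error.

        The sigmoid is applied here, outside of scan, so that Theano can
        recognize the log(sigmoid(x)) pattern and rewrite it into a
        numerically stable softplus-based expression.
        '''
        cross_entropy = T.mean(
            T.sum(
                self.input * T.log(T.nnet.sigmoid(pre_sigmoid_nv)) +
                (1 - self.input) * T.log(1 - T.nnet.sigmoid(pre_sigmoid_nv)),
                axis=1
            )
        )
        return cross_entropy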
The class also has a function that computes the free energy of the model, which is needed when computing the gradient with respect to the parameters. We then add a get_cost_updates method that generates the symbolic gradients for a CD-k or PCD-k update. The code is as follows:

    def free_energy(self, v_sample):
        '''Computes the free energy
        F(v) = -b'v - sum_i log(1 + exp(c_i + W_i v)).'''
        wx_b = T.dot(v_sample, self.W) + self.hbias
        vbias_term = T.dot(v_sample, self.vbias)
        hidden_term = T.sum(T.log(1 + T.exp(wx_b)), axis=1)
        return -hidden_term - vbias_term

    def get_cost_updates(self, lr=0.1, persistent=None, k=1):
        '''This function implements one step of CD-k or PCD-k.

        :param lr: learning rate used to train the RBM
        :param persistent: None for CD. For PCD, a shared variable
            containing the old state of the Gibbs chain, of shape
            (batch size, number of hidden units).
        :param k: number of Gibbs steps to perform.

        :return: a proxy for the cost, plus the updates dictionary. The
            dictionary contains the updates for the weights and biases
            and, if PCD is used, the update for the shared variable
            holding the persistent chain.
        '''
        # compute the positive phase
        pre_sigmoid_ph, ph_mean, ph_sample = self.sample_h_given_v(self.input)

        # decide how to initialize the chain:
        # for CD, use the newly generated hidden sample
        # for PCD, initialize from the old state of the chain
        if persistent is None:
            chain_start = ph_sample
        else:
            chain_start = persistent

        # perform the negative phase
        # to implement CD-k/PCD-k we need to scan over the function that
        # implements one Gibbs step k times
        # scan will return the whole Gibbs chain
        (
            [
                pre_sigmoid_nvs,
                nv_means,
                nv_samples,
                pre_sigmoid_nhs,
                nh_means,
                nh_samples
            ],
            updates
        ) = theano.scan(
            self.gibbs_hvh,
            # the None entries are placeholders
            outputs_info=[None, None, None, None, None, chain_start],
            n_steps=k,
            name="gibbs_hvh"
        )

        # we only need the sample at the end of the chain
        chain_end = nv_samples[-1]

        cost = T.mean(self.free_energy(self.input)) - T.mean(
            self.free_energy(chain_end))
        # if we used T.grad naively, it would backpropagate through the
        # Gibbs chain, which is not what we want; we therefore mark
        # chain_end as a constant via consider_constant
        gparams = T.grad(cost, self.params, consider_constant=[chain_end])

        # construct the updates dictionary
        for gparam, param in zip(gparams, self.params):
            updates[param] = param - gparam * T.cast(lr, dtype=theano.config.floatX)

        if persistent:
            # for PCD, advance the persistent chain
            updates[persistent] = nh_samples[-1]
            # pseudo-likelihood is a better proxy for PCD
            monitoring_cost = self.get_pseudo_likelihood_cost(updates)
        else:
            # reconstruction cross-entropy is a better proxy for CD
            monitoring_cost = self.get_reconstruction_cost(updates,
                                                           pre_sigmoid_nvs[-1])

        return monitoring_cost, updates
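For completeness (the excerpt above references it but does not show it), the pseudo-likelihood monitoring cost is defined in the original tutorial as a stochastic approximation that flips one bit at a time:

    def get_pseudo_likelihood_cost(self, updates):
        '''Stochastic approximation to the pseudo-likelihood.'''
        # index of the bit to flip, cycled through on each update
        bit_i_idx = theano.shared(value=0, name='bit_i_idx')

        # binarize the input image by rounding to the nearest integer
        xi = T.round(self.input)
        fe_xi = self.free_energy(xi)

        # flip bit x_i of each sample and recompute the free energy
        xi_flip = T.set_subtensor(xi[:, bit_i_idx], 1 - xi[:, bit_i_idx])
        fe_xi_flip = self.free_energy(xi_flip)

        # mean of n_visible * log P(x_i | x_{\i}), with
        # P(x_i | x_{\i}) = sigmoid(FE(x_flip) - FE(x))
        cost = T.mean(self.n_visible *
                      T.log(T.nnet.sigmoid(fe_xi_flip - fe_xi)))

        # increment bit_i_idx modulo n_visible as part of the updates
        updates[bit_i_idx] = (bit_i_idx + 1) % self.n_visible

        return cost

And a minimal usage sketch for training (not part of the translated post; train_set_x and batch_size are assumed to come from the caller's data-loading code):

index = T.lscalar()                      # minibatch index
x = T.matrix('x')                        # symbolic minibatch of images

rbm = RBM(input=x, n_visible=28 * 28, n_hidden=500)

# persistent Gibbs chain for PCD, one row per example in the minibatch
persistent_chain = theano.shared(
    numpy.zeros((batch_size, rbm.n_hidden), dtype=theano.config.floatX),
    borrow=True
)

cost, updates = rbm.get_cost_updates(lr=0.1, persistent=persistent_chain, k=15)

train_rbm = theano.function(
    [index],
    cost,
    updates=updates,
    givens={x: train_set_x[index * batch_size:(index + 1) * batch_size]},
    name='train_rbm'
)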
Tracking Progress
RBMs are tricky to train. Because of the partition function Z, we cannot estimate the log-likelihood during training, so we have no direct metric for choosing optimal hyperparameters. Several options remain, described below.
Inspection of Negative Samples
The negative samples obtained during training can be visualized. As training progresses, we know that the model defined by the RBM draws closer to the true underlying distribution, so the negative samples should increasingly look like samples from the training set. Obviously bad hyperparameter settings can be discarded on this basis.
Visual Inspection of Filters
The filters learned by the model can also be visualized; each filter can be plotted as a gray-scale image of the same shape as the input, as in the sketch below.
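A hedged sketch of such a visualization, assuming the tile_raster_images helper from the tutorial's utils module and PIL (rbm is a trained instance on 28x28 MNIST inputs):

import PIL.Image
from utils import tile_raster_images   # helper from the tutorial's code

# each row of W.T is one filter; tile them into a single image
image = PIL.Image.fromarray(tile_raster_images(
    X=rbm.W.get_value(borrow=True).T,
    img_shape=(28, 28),
    tile_shape=(10, 10),
    tile_spacing=(1, 1)
))
image.save('filters.png')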
Translator's note: because the original site went offline, the translation could not be continued. Sorry.
