Exercise: Feature extraction using convolution
Reference pages:
http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution
http://deeplearning.stanford.edu/wiki/index.php/Pooling
Experiment overview:
Train a classifier on the STL-10 dataset with a convolutional network. STL-10 has 10 classes; this exercise uses only 4 of them.
Experiment steps:
1. Learn features with a linear decoder.
2. Convolve the learned features with each sample to obtain convolved feature maps.
3. Pool the convolved feature maps to reduce dimensionality.
4. Use the pooled features as input to train a softmax classifier.
Theoretical background:
The previous sections all used fairly low-resolution images, e.g. 28*28 images for handwritten digit recognition and 8*8 patches for the linear decoder. In this section we use a neural network on larger images.
Fully Connected Networks:
In sparse coding the networks we used were fully connected. For small inputs such as the 28*28 MNIST digits or the 8*8 patches of the linear decoder this is fine: with 100 hidden neurons, a 28*28 input needs 28*28*100 parameters. But for a slightly larger 96*96 image the same hidden layer already needs 96*96*100 ≈ 10^6 parameters, so the computational cost grows rapidly with image resolution.
Locally Connected Networks:
To address this, locally connected networks were proposed: each hidden neuron is connected to only part of the input layer, i.e. to a small contiguous region of the image. In the human visual system this phenomenon is known as localized receptive fields.
Convolutions:
Images have a property called stationarity: features learned on one part of an image also apply to other parts, so features learned from a small number of regions can be used over the whole image. In earlier sections we learned features from 8*8 patches sampled at random from images; these features can be used to detect structure anywhere in an image. Convolving one such 8*8 feature with a 96*96 image gives an 89*89 convolved feature map, and with 100 hidden neurons this yields 100 convolved feature maps of size 89*89.
In general, given a large r*c image, an a*b feature patch, and K learned feature detectors, convolution produces K*(r-a+1)*(c-b+1) features per image.
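As a quick sanity check of these dimensions (a toy example, not part of the exercise code), a 'valid' convolution of an 8*8 filter with a 96*96 image gives an 89*89 map:

img  = rand(96, 96);
filt = rand(8, 8);
size(conv2(img, filt, 'valid'))   % returns [89 89], i.e. [96-8+1, 96-8+1]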
Pooling:
With 96*96 images, 8*8 patches, and K = 400 features, each image yields 400*(96-8+1)*(96-8+1) = 3,168,400 features. Training a classifier directly on such a vector would easily overfit, so the dimensionality must be reduced. The locality idea from locally connected networks works well here: split each convolved feature map into blocks and take the mean or the maximum of each block. In deep learning this operation is called pooling (mean pooling or max pooling). Because each block is summarized by its mean or maximum, pooling is invariant to small translations.
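A minimal mean-pooling sketch over a single convolved map (my own toy example, not the exercise code; with poolDim = 19 an 89*89 map pools down to 4*4):

convMap = rand(89, 89);                         % stand-in for one convolved feature map
poolDim = 19;
outDim  = floor(size(convMap, 1) / poolDim);
pooled  = zeros(outDim, outDim);
for r = 1:outDim
    for c = 1:outDim
        block = convMap((r-1)*poolDim+1 : r*poolDim, (c-1)*poolDim+1 : c*poolDim);
        pooled(r, c) = mean(block(:));          % use max(block(:)) for max pooling
    end
end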
Some issues:
1. The convolution uses MATLAB's built-in conv2. When executing conv2(img, w), MATLAB first flips w, so to get the result we actually want (a cross-correlation with the learned feature), we need to flip w before calling conv2.
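A quick illustration of this flip (toy example): rotating the filter by 180 degrees before conv2 turns the convolution into the correlation we need.

w   = reshape(1:4, 2, 2);                       % toy 2x2 filter
img = magic(4);                                 % toy 4x4 image
xc  = conv2(img, rot90(w, 2), 'valid');         % flip w by 180 degrees, then convolve
% xc(1,1) equals sum(sum(img(1:2,1:2) .* w)), i.e. the 'valid' cross-correlation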
2. Because the data were preprocessed with ZCA whitening when learning features with the sparse autoencoder, the same preprocessing has to be applied during feature extraction. If T is the whitening matrix and x̄ (meanPatch) is the mean patch, the hidden activation of a patch x is sigmoid(W*T*(x - x̄) + b) = sigmoid((W*T)*x + (b - W*T*x̄)). So in the actual convolution we use W*T as the filter and add b - W*T*x̄, rather than b, as the bias.
3. How to check that the convolution and pooling code is correct. Because convolution and pooling take a long time, you do not want to run for an hour and only then find a bug in the code (by that point you would feel like dying).
For the convolution part, first compute the convolved feature values of the samples with cnnConvolve, then select 1000 patches from the samples and compute their activations directly with feedForwardAutoencoder. If the difference between the two is very small (e.g. below 1e-4) for every patch, cnnConvolve is correct; a sketch of this check is given below.
For the pooling part, compare the output of cnnPool against a hand-computed result; if they agree, cnnPool is correct.
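A hedged sketch of the convolution check (variable names such as convImages and imageDim are assumptions about the surrounding exercise script, and the autoencoder activation is computed directly here rather than through feedForwardAutoencoder):

convolvedFeatures = cnnConvolve(patchDim, hiddenSize, convImages, W, b, ZCAWhite, meanPatch);
for i = 1:1000
    featureNum = randi(hiddenSize);
    imageNum   = randi(size(convImages, 4));
    imageRow   = randi(imageDim - patchDim + 1);
    imageCol   = randi(imageDim - patchDim + 1);

    % Extract the same patch, apply the same preprocessing, and compute the
    % hidden activation directly.
    patch = convImages(imageRow:imageRow+patchDim-1, imageCol:imageCol+patchDim-1, :, imageNum);
    patch = patch(:) - meanPatch;
    act   = 1 ./ (1 + exp(-(W * (ZCAWhite * patch) + b)));

    if abs(act(featureNum) - convolvedFeatures(featureNum, imageNum, imageRow, imageCol)) > 1e-4
        error('Convolved feature does not match the directly computed activation');
    end
end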
4. This exercise uses color images, so we obviously convolve the R, G and B channels separately. But how should the three channels' convolution results be combined? Recall that in the linear decoder section we concatenated the three channels into a single vector and learned weights W; the response of a patch is W*X, which is equivalent to summing the contributions of the three channels.
So here the three channels' convolution maps should be added together. (I had not understood this originally and thought they should be averaged; it only clicked while writing this post.)
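A tiny numeric check of the argument above (my own toy example): splitting W by channel and summing the per-channel products gives exactly the single product W*X computed by the autoencoder.

W = rand(5, 3*4);                  % 5 hidden units, 3 channels of 4 pixels each
x = rand(3*4, 1);                  % one flattened R-G-B patch
perChannel = W(:,1:4)*x(1:4) + W(:,5:8)*x(5:8) + W(:,9:12)*x(9:12);
max(abs(perChannel - W*x))         % effectively zero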
5. This exercise builds on the previous ones, so the relevant code has to be copied over. I carelessly forgot to copy softmaxCost.m, which turned out to be a disaster: the convolution and pooling had finally finished after 40 minutes when I discovered the missing file and had to rerun everything. Convolutional networks really are time- and memory-hungry; my machine has less than 4 GB of RAM, which was nowhere near enough, and I had to borrow a friend's server to run this. Clearly the hardware has to keep up if you work on this. My advisor is too poor to upgrade our equipment; having a computer at all is already something.
Some functions:
conv2(A, B, SHAPE): convolves A with B;
SHAPE controls the size of the returned convolution:
'full': output larger than the original image
'same': same size as the original
'valid': smaller than the original
fliplr: flip left-right
flipud: flip up-down
So rotating a matrix by 180 degrees is flipud(fliplr(A)) (equivalently rot90(A, 2)); this is the flip applied to each feature before conv2.
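A one-line check (toy example) that the two flips are a 180-degree rotation:

A = magic(3);
isequal(flipud(fliplr(A)), rot90(A, 2))   % returns true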
squeeze:
B = squeeze(A): B has the same elements as A, but dimensions of size 1 (singleton dimensions) are removed. If A is a row or column vector or a scalar (1-by-1) value, then B = A. For example, rand(4,1,3) produces a uniformly distributed array with 3 pages of 4 rows and 1 column each; after squeeze the singleton column dimension disappears, leaving a 4-by-3 two-dimensional array. rand(4,2,3) has no singleton dimension, so squeeze leaves it unchanged.
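The same example in runnable form:

size(squeeze(rand(4, 1, 3)))   % returns [4 3]: the singleton dimension is removed
size(squeeze(rand(4, 2, 3)))   % returns [4 2 3]: nothing to remove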
Experimental results:
Features learned by the linear decoder:

Accuracy:

The accuracy can of course be improved by tuning parameters, e.g. the pooling block size, switching the pooling strategy, or increasing the number of hidden units; this is where the so-called art of parameter tuning comes from.
Main code:
cnnConvolve.m
function convolvedFeatures = cnnConvolve(patchDim, numFeatures, images, W, b, ZCAWhite, meanPatch)
%cnnConvolve Returns the convolution of the features given by W and b with
%the given images
%
% Parameters:
%  patchDim - patch (feature) dimension
%  numFeatures - number of features
%  images - large images to convolve with, matrix in the form
%           images(r, c, channel, image number)
%  W, b - W, b for features from the sparse autoencoder
%  ZCAWhite, meanPatch - ZCAWhitening and meanPatch matrices used for
%                        preprocessing
%
% Returns:
%  convolvedFeatures - matrix of convolved features in the form
%                      convolvedFeatures(featureNum, imageNum, imageRow, imageCol)

numImages = size(images, 4);
imageDim = size(images, 1);
imageChannels = size(images, 3);

convolvedFeatures = zeros(numFeatures, numImages, imageDim - patchDim + 1, imageDim - patchDim + 1);

% Instructions:
%   Convolve every feature with every large image here to produce the
%   numFeatures x numImages x (imageDim - patchDim + 1) x (imageDim - patchDim + 1)
%   matrix convolvedFeatures, such that
%   convolvedFeatures(featureNum, imageNum, imageRow, imageCol) is the
%   value of the convolved featureNum feature for the imageNum image over
%   the region (imageRow, imageCol) to (imageRow + patchDim - 1, imageCol + patchDim - 1)
%
% Expected running times:
%   Convolving with 100 images should take less than 3 minutes
%   Convolving with 5000 images should take around an hour
%   (So to save time when testing, you should convolve with less images, as
%   described earlier)

% -------------------- YOUR CODE HERE --------------------
% Precompute the matrices that will be used during the convolution. Recall
% that you need to take into account the whitening and mean subtraction
% steps
b = b - W * ZCAWhite * meanPatch;   % effective bias: b - W*T*meanPatch
W = W * ZCAWhite;                   % effective filter bank: W*T
% --------------------------------------------------------

for imageNum = 1:numImages
  for featureNum = 1:numFeatures

    % convolution of image with feature matrix for each channel
    convolvedImage = zeros(imageDim - patchDim + 1, imageDim - patchDim + 1);
    feature_tmp = W(featureNum, :);   % get the featureNum_th feature
    for channel = 1:imageChannels

      % Obtain the feature (patchDim x patchDim) needed during the convolution
      % ---- YOUR CODE HERE ----
      feature = reshape(feature_tmp((channel-1)*patchDim*patchDim+1 : channel*patchDim*patchDim), patchDim, patchDim);
      % ------------------------

      % Flip the feature matrix because of the definition of convolution, as explained later
      feature = flipud(fliplr(squeeze(feature)));

      % Obtain the image
      im = squeeze(images(:, :, channel, imageNum));

      % Convolve "feature" with "im", adding the result to convolvedImage
      % be sure to do a 'valid' convolution
      % ---- YOUR CODE HERE ----
      convolvedImage = convolvedImage + conv2(im, feature, 'valid');   % pay attention to the 'valid' parameter
      % ------------------------

    end   % add the R, G, B channels' convolution maps together

    % Subtract the bias unit (correcting for the mean subtraction as well)
    % Then, apply the sigmoid function to get the hidden activation
    % ---- YOUR CODE HERE ----
    convolvedImage = sigmoid(convolvedImage + b(featureNum));
    % ------------------------

    % The convolved feature is the sum of the convolved values for all channels
    convolvedFeatures(featureNum, imageNum, :, :) = convolvedImage;
  end
end

end

function sigm = sigmoid(x)
  sigm = 1 ./ (1 + exp(-x));
end
cnnPool.m
function pooledFeatures = cnnPool(poolDim, convolvedFeatures)
%cnnPool Pools the given convolved features
%
% Parameters:
%  poolDim - dimension of pooling region
%  convolvedFeatures - convolved features to pool (as given by cnnConvolve)
%                      convolvedFeatures(featureNum, imageNum, imageRow, imageCol)
%
% Returns:
%  pooledFeatures - matrix of pooled features in the form
%                   pooledFeatures(featureNum, imageNum, poolRow, poolCol)

numImages = size(convolvedFeatures, 2);
numFeatures = size(convolvedFeatures, 1);
convolvedDim = size(convolvedFeatures, 3);

pooledFeatures = zeros(numFeatures, numImages, floor(convolvedDim / poolDim), floor(convolvedDim / poolDim));

% -------------------- YOUR CODE HERE --------------------
% Instructions:
%   Now pool the convolved features in regions of poolDim x poolDim,
%   to obtain the
%   numFeatures x numImages x (convolvedDim/poolDim) x (convolvedDim/poolDim)
%   matrix pooledFeatures, such that
%   pooledFeatures(featureNum, imageNum, poolRow, poolCol) is the
%   value of the featureNum feature for the imageNum image pooled over the
%   corresponding (poolRow, poolCol) pooling region
%   (see http://ufldl/wiki/index.php/Pooling )
%
%   Use mean pooling here.
% -------------------- YOUR CODE HERE --------------------

pooledFeaturesDim = floor(convolvedDim / poolDim);

for featureNum = 1:numFeatures
  for imageNum = 1:numImages
    img = squeeze(convolvedFeatures(featureNum, imageNum, :, :));
    pooledImg = zeros(pooledFeaturesDim, pooledFeaturesDim);
    for rowNum = 1:pooledFeaturesDim
      for colNum = 1:pooledFeaturesDim
        % mean-pool each poolDim x poolDim block
        patch = img((rowNum-1)*poolDim+1 : rowNum*poolDim, (colNum-1)*poolDim+1 : colNum*poolDim);
        pooledImg(rowNum, colNum) = mean(patch(:));
      end
    end
    pooledFeatures(featureNum, imageNum, :, :) = pooledImg;
  end
end

end
