Naive MLP using BP Algorithm
Using MATLAB, and without relying on any existing neural network toolbox, this post implements a simple multilayer perceptron (MLP) trained with the error back-propagation (BP) algorithm. The chosen activation functions are the logistic sigmoid and the plain linear function. Unlike other naive MLP classifiers, this one replaces for loops with vector computations to improve execution efficiency: since MATLAB is a scripting language, a for loop with a complex body runs slowly, whereas vector and matrix operations exploit MATLAB's parallelized numerics and run much faster, which is the essence of the MATLAB style.
Note that the while loop body uses a batch learning scheme rather than online learning, because in actual tests the latter was very hard to get to converge. To explain the difference, here is a quote from a blog post [http://hi.baidu.com/lixinxin555518/item/28b832e09f89e90f8d3ea89e]:
Both the batch algorithm and the online learning algorithm are based on gradient descent. The difference is that the former is the more conservative choice: after processing all the observations, it takes one steepest-descent step per iteration, whereas the online algorithm looks only at the current observation, making it a stochastic gradient algorithm. The former's advantage is fast convergence, at the cost of heavy computation and no support for real-time data; the latter's advantage is a small per-step cost, but it converges slowly and can even oscillate without converging.
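To make the contrast concrete, here is a minimal MATLAB sketch of the two update rules for a toy squared-error linear model. This is my own illustration with made-up data, not code from the quoted post; X, t, w and lr are placeholders.

% toy setup: n samples, d features, linear model y = X * w
n = 100; d = 2;
X = rand(n, d);
t = X * [1; -2] + 0.1 * randn(n, 1);
w = zeros(d, 1);
lr = 0.01;
% batch step: one steepest-descent move using the gradient over ALL samples
gradBatch = X' * (X * w - t);
w = w - lr * gradBatch;
% online sweep: a stochastic step after EACH single observation
w = zeros(d, 1);  % reset before trying the online rule
for i = 1 : n
    gradOnline = X(i, :)' * (X(i, :) * w - t(i));
    w = w - lr * gradOnline;
end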
The training and test samples the code depends on are described in this post: http://www.cnblogs.com/awarrior/p/3282959.html
% Naive-MLP Classifier of Three Layers
% This classifier is realized in a simple way: it uses the logarithmic
% sigmoid function as the hidden-layer activation and a linear function
% as the output-layer activation, and employs the BP algorithm to refine
% the weights. Each sample contains two features, which are loaded from
% external text files. Unlike other naive MLP classifiers, this one
% prefers matrix operations to for-end loops, which improves the
% calculation performance.
%
% Please use a search engine or leave a comment if you can't understand
% the details of this code.
%
% Author: Justin Green
% Date: 2013.8.25
%
close all
clear
clc
trainFile = 'train.txt';
testFile = 'test.txt';
% initialize unit number in different layers
inputNum = 2;
hiddenNum = 3;
outputNum = 8;
% initialize weight
wInputToHidden = 1 / power(inputNum * hiddenNum, 1/2) * rand(inputNum, hiddenNum);
wHiddenToOutput = 1 / power(hiddenNum * outputNum, 1/2) * rand(hiddenNum, outputNum);
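% dividing by sqrt(fanIn * fanOut) only keeps the random initial weights
% small so the sigmoids start near their linear region; rand() is uniform
% on [0, 1], so all initial weights here are positive. A common zero-mean
% alternative (not used by this post) would be:
%   wInputToHidden = (rand(inputNum, hiddenNum) - 0.5) * 2 / sqrt(inputNum);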
% read training dataset
trainDS = textread(trainFile);
trainFeature = trainDS(:, 1:2);
trainClass = trainDS(:, 3);
trainSize = size(trainDS, 1);
% training data normalization
trainMAX = max(trainFeature);
trainMIN = min(trainFeature);
tmp = [size(trainFeature, 1), 1];
trainFeature = (trainFeature - repmat(trainMIN, tmp)) ./ repmat((trainMAX - trainMIN), tmp);
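% in MATLAB R2016b and later, implicit expansion makes the repmat calls
% unnecessary:
%   trainFeature = (trainFeature - trainMIN) ./ (trainMAX - trainMIN);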
% construct target matrix
targetMatrix = zeros(trainSize, outputNum);
for i = 1 : trainSize
    targetMatrix(i, trainClass(i)) = 1;
end
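% a vectorized equivalent of the one-hot loop above, in the spirit of this
% post (left commented out so the script's behavior is unchanged):
%   targetMatrix(sub2ind(size(targetMatrix), (1:trainSize)', trainClass)) = 1;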
% set parameters of classifier
errorThreshold = 0.05;
learningRate = 0.01;
momentum = 0.95;
iterateTime = 1000;
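% note: with the update rule used below (delta = learningRate * gradient
% + momentum * previousDelta), momentum = 0.95 carries over 95% of the
% previous weight change, which smooths the per-sample updates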
% -------------------------------------------------------------------------
% train classifier
deltaInputToHidden = zeros(inputNum, hiddenNum);
deltaHiddenToOutput = zeros(hiddenNum, outputNum);
iterator = 0;
tic
while iterator < iterateTime
    for i = 1 : trainSize
        % forward pass: sigmoid hidden layer, then a sigmoid output layer
        % (note: a sigmoid is applied at the output here as well, although
        % the header above describes the output layer as linear)
        net = trainFeature(i, :) * wInputToHidden;
        hiddenFunction = 1 ./ (1 + exp(-net));
        outputFunction = hiddenFunction * wHiddenToOutput;
        outputFunction = 1 ./ (1 + exp(-outputFunction));
        % squared error of the current sample
        error = targetMatrix(i, :) - outputFunction;
        errorPerSample(i) = 0.5 * error * error';
        % output-layer update: delta is simply the error (the sigmoid
        % derivative is omitted, which matches the cross-entropy gradient
        % for a sigmoid output)
        gradientOutput = hiddenFunction' * error;
        deltaHiddenToOutput = learningRate * gradientOutput + momentum * deltaHiddenToOutput;
        wHiddenToOutput = wHiddenToOutput + deltaHiddenToOutput;
        % hidden-layer update: back-propagate the error through
        % wHiddenToOutput and scale by the sigmoid derivative h .* (1 - h)
        hiddenFunctionD = hiddenFunction .* (1 - hiddenFunction);
        gradientHidden = trainFeature(i, :)' * (hiddenFunctionD .* (wHiddenToOutput * error')');
        deltaInputToHidden = learningRate * gradientHidden + momentum * deltaInputToHidden;
        wInputToHidden = wInputToHidden + deltaInputToHidden;
    end
    % mean error per sample; stop training once it drops below the threshold
    errorSum = sum(errorPerSample) / trainSize;
    if errorSum < errorThreshold
        break
    end
    % plot the training error after each epoch
    figure(1)
    plot(iterator, errorSum, 'b.')
    hold on
    grid on
    iterator = iterator + 1;
end
toc
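% (a sketch, not in the original script) the same forward pass fully
% vectorized over all training samples at once, e.g. for monitoring the
% error without the inner for loop:
%   H = 1 ./ (1 + exp(-(trainFeature * wInputToHidden)));  % trainSize x hiddenNum
%   Y = 1 ./ (1 + exp(-(H * wHiddenToOutput)));            % trainSize x outputNum
%   mse = 0.5 * sum(sum((targetMatrix - Y).^2)) / trainSize;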
% -------------------------------------------------------------------------
% read testing dataset
testDS = textread(testFile);
testFeature = testDS(:,1:2);
testClass = testDS(:,3);
testSize = size(testDS, 1);
% testing data normalization
tmp = [size(testFeature, 1), 1];
testFeature = (testFeature - repmat(trainMIN, tmp)) ./ repmat((trainMAX - trainMIN), tmp);
% -------------------------------------------------------------------------
% classify the test data using the trained weights
hiddenFunction = 1 ./ (1 + exp(-(testFeature * wInputToHidden)));
outputFunction = hiddenFunction * wHiddenToOutput;
% (the output sigmoid is omitted here; it is monotonic, so the argmax
% below is unaffected)
% calculate testing accuracy
[mv, mi] = max(outputFunction, [], 2);
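% mi(i) is the column of the largest output in row i, i.e. the predicted
% class label (classes are coded 1..outputNum in the one-hot targets)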
err = testClass - mi;
yes = length(find(err == 0));
acc = yes / testSize * 100;
disp(sprintf('Testing Accuracy: %3.3f%%', acc))
% -------------------------------------------------------------------------
% classify the training data using the trained weights
hiddenFunction = 1 ./ (1 + exp(-(trainFeature * wInputToHidden)));
outputFunction = hiddenFunction * wHiddenToOutput;
% calculate training accuracy
[mv, mi] = max(outputFunction, [], 2);
error = trainClass - mi;
match = length(find(error == 0));
accuracy = match / trainSize * 100;
disp(sprintf('Training Accuracy: %3.3f%%', accuracy))
>>>>>>>>>>>>>> Output >>>>>>>>>>>>>>
Elapsed time is 2.807000 seconds.
Testing Accuracy: 69.800%
Training Accuracy: 87.143%
The figure below shows how the training error changes over the iterations (dataset DS1):

[figure: training error vs. iteration, dataset DS1]