A MATLAB-Based Speech Emotion Recognition System
I. Key Module Implementations
1. Data Preprocessing and MFCC Feature Extraction
function features = extractFeatures(filePath)
[audio, fs] = audioread(filePath);
% Pre-emphasis (boosts high frequencies)
pre_emphasis = 0.97;
audio = filter([1, -pre_emphasis], 1, audio);
% Framing (25 ms frames, 10 ms overlap)
frame_length = round(0.025 * fs);
frame_overlap = round(0.01 * fs);
frames = buffer(audio, frame_length, frame_overlap, 'nodelay');
% Hamming window (illustrates per-frame processing; note that mfcc()
% below performs its own framing and windowing internally)
hamming_win = hamming(frame_length);
frames = frames .* hamming_win;
% MFCC feature extraction (Audio Toolbox)
coeffs = mfcc(audio, fs, 'LogEnergy', 'Ignore');
% Standardize per coefficient (element-wise ./, not matrix division /)
features = (coeffs - mean(coeffs)) ./ std(coeffs);
end
- Feature notes: MFCCs model the frequency selectivity of human hearing; each frame yields 12 cepstral coefficients plus an energy term
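The network input layers later in this article assume 39-dimensional features, which are usually obtained by extending the static coefficients with delta and delta-delta terms. A minimal sketch, assuming `coeffs` is the frames-by-13 matrix returned by `mfcc` above (a plain first-order difference stands in for the usual regression-based deltas):

```matlab
% Sketch: extend static MFCCs with delta and delta-delta features.
delta  = [zeros(1, size(coeffs, 2)); diff(coeffs)];   % frame-to-frame change
ddelta = [zeros(1, size(delta, 2));  diff(delta)];    % second-order change
features39 = [coeffs, delta, ddelta];                 % frames-by-39 feature matrix
```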
2. Multi-Feature Fusion (Improves Robustness)
function allFeatures = extractMultiFeatures(audio, fs)
% MFCC features (averaged over frames below)
coeffs = mfcc(audio, fs, 'LogEnergy', 'Ignore');
% Fundamental frequency (named f0 to avoid shadowing the pitch() function)
f0 = pitch(audio, fs, 'Range', [50, 400]);
% Short-time energy
energy = sum(abs(audio).^2) / length(audio);
% First 3 formants via LPC root finding (formantfreqs is not a built-in
% MATLAB function; this LPC-based estimate is one common substitute)
lpcCoeffs = lpc(audio .* hamming(length(audio)), 12);
rts = roots(lpcCoeffs);
rts = rts(imag(rts) > 0);                          % keep one of each conjugate pair
freqs = sort(atan2(imag(rts), real(rts)) * fs / (2*pi));
formants = freqs(1:3);
% Combine into a single row vector
allFeatures = [mean(coeffs), mean(f0), energy, formants'];
end
- Rationale for feature selection: fundamental frequency separates anger from happiness, energy helps identify sadness, and formants improve recognition of neutral speech
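A hypothetical usage of the fused features for one utterance (the file name is a placeholder):

```matlab
[audio, fs] = audioread('sample.wav');   % placeholder path
v = extractMultiFeatures(audio, fs);     % one fused row vector per utterance
% Stack these row-wise across files to build the training feature matrix
```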
II. Neural Network Models (Three Alternatives)
Option 1: CNN (suited to spectrogram-style features)
layers = [
imageInputLayer([39 100 1]) % 39-by-100 MFCC feature map
convolution2dLayer(3, 32, 'Padding', 'same')
batchNormalizationLayer
reluLayer
maxPooling2dLayer(2, 'Stride', 2)
convolution2dLayer(3, 64, 'Padding', 'same')
fullyConnectedLayer(128)
dropoutLayer(0.5)
fullyConnectedLayer(6) % 6 emotion classes
softmaxLayer
classificationLayer
];
- Input handling: the MFCC features must be reshaped into a fixed-size time-frequency matrix
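A minimal sketch of that reshaping, assuming `featMatrix` is a 39-by-T MFCC matrix and 100 frames is the chosen fixed length:

```matlab
T = 100;                                    % fixed frame count expected by the CNN
mf = featMatrix;                            % assumed 39-by-nFrames MFCC matrix
if size(mf, 2) < T
    mf = [mf, zeros(39, T - size(mf, 2))];  % zero-pad short utterances
else
    mf = mf(:, 1:T);                        % crop long utterances
end
img = reshape(mf, [39, T, 1]);              % 39-by-100-by-1 network input
```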
Option 2: LSTM (sequence modeling)
layers = [
sequenceInputLayer(39) % MFCC feature dimension
bilstmLayer(128, 'OutputMode', 'last')
fullyConnectedLayer(64)
fullyConnectedLayer(6) % number of emotion classes
softmaxLayer
classificationLayer
];
- Advantage: captures long-range temporal dependencies, well suited to speech with large variations in speaking rate
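For this architecture, `trainNetwork` expects a cell array with one features-by-time matrix per utterance. A sketch, assuming `fileList` holds the audio paths, features have been extended to 39 dimensions, and `seqLabels` is one categorical label per file:

```matlab
seqs = cell(numel(fileList), 1);
for k = 1:numel(fileList)
    seqs{k} = extractFeatures(fileList{k})';   % transpose to features-by-time (39-by-T)
end
% Sequence lengths T may differ between files; trainNetwork handles this
net = trainNetwork(seqs, seqLabels, layers, options);
```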
Option 3: BP Neural Network (lightweight)
net = feedforwardnet([20, 15]); % two hidden layers
net.trainFcn = 'trainlm'; % Levenberg-Marquardt algorithm
% configure() expects samples as columns and numeric targets, so transpose
% the feature matrix and one-hot encode the categorical labels first
net = configure(net, features', full(ind2vec(double(labels)')));
- Use case: on resource-constrained hardware it reportedly still reaches about 84% accuracy
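A minimal sketch of training and using this network, continuing from the configuration above (assumes `features` is samples-by-dims and `labels` is categorical):

```matlab
X = features';                          % dims-by-samples, as feedforwardnet expects
T = full(ind2vec(double(labels)'));     % classes-by-samples one-hot targets
net = train(net, X, T);                 % Levenberg-Marquardt training
pred = vec2ind(net(X));                 % predicted class indices
acc = mean(pred' == double(labels));    % training-set accuracy
```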
III. Training and Evaluation Pipeline
% Data preparation (example: RAVDESS dataset)
emotions = {'neutral', 'happy', 'sad', 'angry', 'fear', 'disgust'};
features = []; labels = [];
for i = 1:length(emotions)
files = dir(fullfile('RAVDESS', emotions{i}, '*.wav'));
for j = 1:length(files)
feat = extractFeatures(fullfile(files(j).folder, files(j).name));
features = [features; feat];
labels = [labels; repmat(categorical(emotions(i)), size(feat,1), 1)];
end
end
% Data split (80% train, 20% test)
cv = cvpartition(labels, 'HoldOut', 0.2);
trainData = features(cv.training,:);
testData = features(cv.test,:);
trainLabels = labels(cv.training);
testLabels = labels(cv.test);
% Training configuration
options = trainingOptions('adam', ...
'MaxEpochs', 30, ...
'MiniBatchSize', 32, ...
'ValidationData', {testData, testLabels}, ...
'ExecutionEnvironment', 'gpu'); % GPU acceleration (use 'auto' if no GPU is available)
% Train the model (layers taken from one of the options above)
net = trainNetwork(trainData, trainLabels, layers, options);
% Evaluation
predicted = classify(net, testData);
accuracy = sum(predicted == testLabels) / numel(testLabels);
confusionmat(testLabels, predicted)
Notes:
- The feature dimensionality must match the input layer (for MFCC, 39 dims are recommended: 13 coefficients, i.e. 12 cepstral + energy, plus their delta and delta-delta)
- Real-time recognition requires endpoint detection (short-time energy + zero-crossing rate)
- For cross-speaker recognition, consider a UBM-GMM adaptation strategy
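The endpoint detection mentioned above can be sketched as follows; the thresholds are illustrative assumptions, not tuned values:

```matlab
frames = buffer(audio, frame_length, frame_overlap, 'nodelay');
energy = sum(frames .^ 2);                               % short-time energy per frame
zcr = sum(abs(diff(sign(frames)))) / (2 * frame_length); % zero-crossing rate per frame
% Voiced speech has high energy; unvoiced consonants have high ZCR
isSpeech = energy > 0.1 * max(energy) | zcr > 0.3;       % assumed thresholds
speechOnly = frames(:, isSpeech);                        % keep only speech frames
```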
With the scheme above you can quickly build a speech emotion recognition system with reported accuracy above 90%. In practice, choose the model complexity to fit your hardware; the CNN + MFCC combination is a good first choice, balancing accuracy and speed.