S1 and S2 Heart Sound Recognition Using Deep Neural Networks (Literature study)

Chen, Tien-En, et al. “S1 and S2 heart sound recognition using deep neural networks.” IEEE Transactions on Biomedical Engineering 64.2 (2017): 372-380.

Objective:
Focus on recognition of the first (S1) and second (S2) heart sounds.
This paper proposes a novel acoustic-fingerprinting-based detection framework that applies only supervised classifiers for recognizing S1 and S2.

Procedure:
Mel-frequency cepstral coefficients (MFCCs) are applied for feature extraction.
The K-means algorithm is used to divide one heart-sound fragment into two groups, which are then combined to form a supervector.
The supervector is fed into a deep neural network (DNN) classifier to classify S1 and S2.

This work:
Recognizes S1 and S2 based only on acoustic fingerprinting, without incorporating additional duration and interval information;
Uses deep learning for acoustic modeling.

Overall S1 and S2 Recognition Architecture


fig1

Two parts:
Offline part: feature extraction
Online part: DNN classifier

Feature extraction

Mel-frequency cepstral coefficients (MFCC)

The MFCC feature extraction procedure comprises six operations:
(1) Pre-emphasis:
enhances the received signal to compensate for signal distortion.
(2) Windowing:
divides the given signal into a sequence of frames.
(3) Fast Fourier transform (FFT):
performs spectral analysis.
(4) Mel-filtering:
integrates the frequency components within each Mel-filter band into a single energy intensity.
(5) Nonlinear transformation:
takes the logarithm of all Mel-filter band intensities.
(6) Discrete cosine transform (DCT):
converts the transformed intensities into MFCCs.
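
A minimal extraction sketch, assuming librosa as a stand-in toolchain (the paper does not specify its implementation); the file path, frame length, and hop length below are illustrative placeholders:

```python
import librosa

# Load a heart-sound recording (the path is a placeholder).
signal, sr = librosa.load("heart_sound.wav", sr=None)

# (1) Pre-emphasis to compensate for signal distortion.
emphasized = librosa.effects.preemphasis(signal, coef=0.97)

# (2)-(6) Windowing, FFT, Mel-filtering, log compression, and DCT are all
# performed inside librosa.feature.mfcc; the frame/hop lengths here are
# illustrative, not the values used in the paper.
mfcc = librosa.feature.mfcc(y=emphasized, sr=sr, n_mfcc=13,
                            n_fft=512, hop_length=256)
print(mfcc.shape)  # (13, number_of_frames)
```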

Using differential parameters to describe temporal characteristics improves pattern-recognition performance. Therefore, a differential cepstral parameter (the slope of a cepstral parameter versus time), representing the dynamic change of the cepstral parameters, is used.
Appending velocity (vel) and acceleration (acc) features triples the dimensionality of the original features.

$$\mathrm{vel}(d,t)=\frac{\sum_{m=1}^{M_v} m\,[\,c(d,t+m)-c(d,t-m)\,]}{2\sum_{m=1}^{M_v} m^{2}}$$

$$\mathrm{acc}(d,t)=\frac{\sum_{m=1}^{M_a} m\,[\,\mathrm{vel}(d,t+m)-\mathrm{vel}(d,t-m)\,]}{2\sum_{m=1}^{M_a} m^{2}}$$

where $c(d,t)$ is the $d$-th dimension of the cepstral parameter and $t$ is the time index of the current sound frame; $M_v$ and $M_a$ are the window lengths for computing the vel and acc coefficients, respectively. In this study, $M_v=3$ and $M_a=2$.
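
A direct NumPy sketch of the two regression formulas above (my own illustration, not code from the paper); edge padding of the boundary frames and the placeholder MFCC matrix are assumptions:

```python
import numpy as np

def regression_delta(feat, M):
    """Differential coefficients per the formulas above.

    feat: (dimensions x frames) cepstral matrix; M: window length
    (M_v = 3 for velocity, M_a = 2 for acceleration).
    """
    _, T = feat.shape
    padded = np.pad(feat, ((0, 0), (M, M)), mode="edge")  # repeat edge frames
    denom = 2 * sum(m * m for m in range(1, M + 1))
    out = np.zeros_like(feat)
    for m in range(1, M + 1):
        out += m * (padded[:, M + m:M + m + T] - padded[:, M - m:M - m + T])
    return out / denom

# Placeholder MFCC matrix: 13 dimensions x 200 frames.
mfcc = np.random.default_rng(0).random((13, 200))
vel = regression_delta(mfcc, M=3)        # velocity features
acc = regression_delta(vel, M=2)         # acceleration features
features = np.vstack([mfcc, vel, acc])   # 13 -> 39 dimensions
```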

K-Means Algorithm

The K-means algorithm is used to cluster the acoustic features within each heart sound segment into two groups (K = 2). Then a population center vector is computed for each group. These two center vectors are then concatenated to form a supervector. This supervector is the final feature that represents a segment of heart sound. The supervectors are used to build classifiers and perform S1/S2 recognition.

The main goal of the K-means algorithm is to determine representative data points from a large number of data points.
Such data points are called "population centers."

The idea of K-means pattern classification is to use a small number of representative points to represent specific categories, which reduces the amount of data and mitigates the adverse effects of noise.

The calculation steps of the K-means algorithm (a minimal sketch follows the list):
1) Initialization:
Divide the training samples $v_i$, $i = 1, \dots, N$, randomly into $K$ groups and arbitrarily choose one observation from each group as the initial population center $\mu_k$, $k = 1, 2, \dots, K$.
2) Iterative calculation:
i) Assign each $v_i$ to its nearest population center:
$$k^{*}=\arg\min_{k}\, d(v_i,\mu_k),\quad i=1,\dots,N$$
where $d(\cdot,\cdot)$ denotes the distance measure; here the Euclidean distance is used.
ii) All $v_i$ assigned to the $k$-th group form a new group, and the population center $\mu_k$ is recomputed as the mean of these $v_i$.
iii) If the new set of population centers is identical to the original set, training is complete. Otherwise, the new population centers replace the original ones and step 2) is repeated.
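
A minimal sketch of the supervector construction, assuming scikit-learn's KMeans and 39-dimensional frame features; the center ordering rule and the placeholder data are my own assumptions, not specified here:

```python
import numpy as np
from sklearn.cluster import KMeans

def segment_supervector(frames, k=2, seed=0):
    """Cluster the frames of one heart-sound segment into k groups and
    concatenate the group centers into a single supervector."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(frames)
    # Order the centers by norm so the concatenation is deterministic
    # (an assumption; the ordering rule is not described here).
    order = np.argsort(np.linalg.norm(km.cluster_centers_, axis=1))
    return km.cluster_centers_[order].reshape(-1)

# Placeholder segment: 40 frames of 39-dim MFCC+vel+acc features.
frames = np.random.default_rng(0).random((40, 39))
supervector = segment_supervector(frames)   # shape: (78,)
```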

DNN classifier


fig2

Given the correct label $y$, the parameters of the DNN classifier can be estimated as follows:

$$\theta^{*}=\arg\min_{\theta}\ \big\{\,C(y,\hat{y};x,\theta)+\gamma R(W)+\eta\rho(A)\,\big\}\qquad(*)$$

where $\hat{y}$ is the DNN output, $x$ is the input data, $y$ is the label data, $\theta$ denotes the DNN parameter set, and $C(\cdot)$ is the cost function; cross-entropy is used as the cost function, as proposed in [34].

$R(W)$ is the weight-decay regularizer:

$$R(W)=\sum_{l}\|W_l\|_{F}^{2}$$

where $\|\cdot\|_F$ denotes the Frobenius norm.

$\rho(A)$ is the sparsity penalty on the hidden-layer outputs, and $\gamma$ and $\eta$ are the corresponding controlling coefficients.

The standard back-propagation algorithm is applied to estimate the parameters of the DNN model.
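
A hedged sketch of such a classifier, assuming scikit-learn's MLPClassifier as a stand-in: it trains a feed-forward network with back-propagation on a cross-entropy cost, and its `alpha` parameter plays the role of the weight-decay term $\gamma R(W)$; the sparsity penalty $\rho(A)$ has no direct equivalent here and is omitted. The supervectors and labels below are synthetic placeholders:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Placeholder training set: 500 supervectors (2 x 39 = 78 dims), 0 = S1, 1 = S2.
rng = np.random.default_rng(0)
X = rng.random((500, 78))
y = rng.integers(0, 2, 500)

dnn = MLPClassifier(hidden_layer_sizes=(100, 100),  # two hidden layers
                    activation="relu",
                    alpha=1e-4,       # L2 weight decay, analogous to gamma * R(W)
                    max_iter=500,
                    random_state=0)
dnn.fit(X, y)                 # back-propagation on a cross-entropy cost
print(dnn.predict(X[:5]))     # predicted S1/S2 labels
```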

To overcome the limitation of insufficient training data, a pretraining technique that uses unlabeled data is generally adopted.

A popular pretraining process is to use a deep belief network (DBN) with maximum-likelihood estimation.
A DBN model is formed by stacking a set of restricted Boltzmann machine (RBM) models.

RBM model [20]
DBN model

The training data are used to form a DBN; a softmax function is then added on top of the DBN model, and standard back-propagation training with the cost function in (*) is applied to estimate the DNN parameters.
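
A rough sketch of greedy layer-wise RBM pretraining, using scikit-learn's BernoulliRBM as a stand-in (not the paper's toolchain). The data are synthetic placeholders scaled to [0, 1], and the joint back-propagation fine-tuning of the full stack is simplified here to a softmax (logistic-regression) classifier trained on the pretrained features:

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_unlabeled = rng.random((1000, 78))   # unlabeled supervectors for pretraining
X_train = rng.random((500, 78))
y_train = rng.integers(0, 2, 500)      # 0 = S1, 1 = S2

# Greedy layer-wise pretraining: each RBM is trained on the hidden
# activations of the previous one, forming a DBN-like stack.
stack, h = [], X_unlabeled
for n_hidden in (100, 100):
    rbm = BernoulliRBM(n_components=n_hidden, learning_rate=0.05,
                       n_iter=20, random_state=0)
    h = rbm.fit_transform(h)
    stack.append(rbm)

# In the paper, the pretrained weights initialize the DNN, a softmax layer is
# added on top, and the whole network is fine-tuned with back-propagation.
# Here, for brevity, a softmax classifier is trained on the top-level features.
h_train = X_train
for rbm in stack:
    h_train = rbm.transform(h_train)
softmax = LogisticRegression(max_iter=1000).fit(h_train, y_train)
```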

Experiments

S1 and S2 were manually segmented and labeled.
KNN, LR, SVM, and GMM classifiers were implemented and tested for comparison.

Setup

The data were collected and divided into groups, followed by S1 and S2 segmentation.

Evaluation metrics

![fig3](https://img-blog.csdn.net/20180129201016684) precision, recall, F-measure

$$\mathrm{Precision}=\frac{T_p}{T_p+F_p}$$

$$\mathrm{Recall}=\frac{T_p}{T_p+F_n}$$

$$\mathrm{F\text{-}measure}=\frac{2\times \mathrm{Precision}\times \mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}$$

$$\mathrm{Accuracy}=\frac{T_p+T_n}{T_p+T_n+F_p+F_n}$$
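
The four metrics computed directly from the confusion counts (a small helper of my own, with hypothetical numbers for illustration):

```python
def classification_metrics(tp, fp, tn, fn):
    """Precision, recall, F-measure, and accuracy from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f_measure, accuracy

# Hypothetical counts, for illustration only.
print(classification_metrics(tp=90, fp=10, tn=85, fn=15))
```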

Experiment Results


fig4

Determining Optimal Feature Configuration and NN Structure:

A one-layer ANN model with 100 hidden neurons was used as the classifier, and 13-dimensional MFCCs served as the acoustic features. The original MFCCs were then extended from 13 to 26 dimensions (by appending 13 velocity features) and to 39 dimensions (by appending 13 velocity and 13 acceleration features). The effectiveness of the K-means algorithm was also evaluated. Fbank features, as in [40], [41], with dimensions of 24, 168, and 264, were used for comparison.


fig6

The correlation between classification performance and NN structure (the number of hidden layers and the number of neurons in each hidden layer) was investigated.


fig7

The effectiveness of pre-training and of different activation functions was further tested.


fig8

Finally, the weight decay and sparsity penalty (R(W) and ρ(A) in (*), respectively) were examined.


fig9

Comparison of DNN With Other Classifiers

For the test set, the HSAD procedure based on Shannon energy was applied to detect heart-sound segments. The KNN classifier used the Euclidean metric for the distance calculation. For the GMM classifier, eight Gaussian mixtures were used. For the SVM classifier, the Gaussian radial basis function was used as the kernel function.
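
A sketch of how the baseline classifiers described above could be configured, assuming scikit-learn equivalents (my own choices; the KNN neighbor count is a placeholder, since it is not given here):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

# KNN with the Euclidean distance metric (neighbor count is a placeholder).
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")

# SVM with a Gaussian radial basis function kernel.
svm = SVC(kernel="rbf", gamma="scale")

# One 8-mixture GMM per class (S1 and S2); at test time a segment would be
# assigned to the class whose GMM gives the higher log-likelihood.
gmm_s1 = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
gmm_s2 = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
```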


fig10

fig11

posted @ 2018-01-29 20:36  Siucaan