[Paper Writing] SCI Manuscript Revision

1. Reviewer Comments

Reviewer comments for some published articles can be found on journals' official websites and used as reference.

Category: Lack of Novelty

Comment 1: The reviewer questions the novelty of the paper, pointing out that similar work has already been done by A and B.

Thank you for pointing this out. A and B's research groups have done blablablabla. However, the focus of our work is blablablabla, which differs substantially from A and B's work, and this difference is also the major contribution of our paper. We have added the following discussion of this issue to our revised manuscript (see LOA2): "blablablabla (here, briefly review A and B's work and explain how your own work differs from theirs)"

The paper lacks scientific novelty.

Response:

We have updated the paper to make clear that we are not intending to advance the machine learning literature per se. While we do provide an additional example of machine learning for speaker identification (which remains minimal), our aim was to discuss an application of machine learning in a novel context: as a hearing-aid feature to increase ease of listening. It is the concept that is novel, and through it we hope to generate a stimulating conversation as the hearing-aid literature seeks further advances in technology and software.

A lot of technical details are missed.

Response:

We have substantially updated the methods section to include as many details as possible with respect to the preprocessing, training, validation and classification of the CNN built here. In addition, we provide in the appendix all of the code used to create the CNNs as well as to train, validate and test them. We hope that these additions now make the approach clear.

The authors should present a detailed description of the data used for train and testing.

Response:

We now provide more detail with respect to our data, training and testing.

“...

The following steps were used for feature extraction. First, we used Python's random module to pick a random audio file from the 300 audio files. Second, we used the "scipy" library to read the audio file, which returns the audio data and the sample rate (in Hz). Third, we randomly sampled a section of fixed window size from within the audio file. Fourth, we used the open-source "librosa.feature.mfcc" API (McFee et al., 2015; source code available at https://librosa.github.io/librosa/generated/librosa.feature.mfcc.html) to extract the spectral features into a matrix. The MFCC method extracts a cepstrum from the audio data; we chose a cepstrum format of 32 pixels by 32 pixels, with the parameters set as follows: sampling rate = 44100, hop_length = 700, n_mfcc = 32, n_fft = 512 (Appendix 2.0). The shape of matrix X is therefore (50450, 32, 32, 1), with each of the 50,450 cepstra representing one feature of one person. Fifth, the CNN model requires the addition of one color channel (gray) to our matrices, because CNN models are typically trained on image data, and color images have three channels (RGB). Sixth, we used one-hot encoding (Gori, 2018) to handle the classes, because our classes were not numbers but people's names. One-hot encoding lets the model train on and predict each class as 0 or 1. For example, if the voice is from Alex, and the index of Alex is 5 (starting from 0), the one-hot encoding vector is [0,0,0,0,0,1,0,0,0,0]. Finally, we separated the 50,450 samples, using 80% as training data, 10% as validation data, and the remaining 10% as testing data, as per the TensorFlow documentation (2021; https://www.tensorflow.org/tutorials/audio/simple_audio). Because we have 10 classes in this model, each class has about 4,000 training samples, 500 validation samples, and 500 testing samples.

 

In the introduction, the authors use a sufficient number of references to give the necessary background around the topic of hearing aid.  However, some of them are not recent and should be replaced.

Response:

We completed a recent literature review and now provide some more up-to-date references, including: You et al. (2017), Yadav & Rai (2018), Mohammed et al. (2020), Zhang et al. (2018), Geetha et al. (2017), Bentler (2005) and Seewald et al. (2005).

 

For discussion, the limitations and potential issues of this study should be recorded.

Response:

We now include an updated section on limitations and future directions for the work.

“Obviously, there is much to be discovered before such an endeavor is feasible in practice. For example, it remains to be seen whether hearing aids will have the computing power of modern PCs to store the features of a familiar voice. Even if storage is possible, it will likely be a few years before real-time processing and voice conversion can be realized. Furthermore, there is the nagging unknown of whether people will be able to tolerate or accept a stranger's voice that sounds similar to their partner's or spouse's, even if it does lead to easier listening and better performance. It is conceivable that this might be very strange for the listener. Nevertheless, the work outlined here was a necessary first step along this path.”

 

The authors should carefully proofread the text for grammatical errors correction.

Response:

We have completed a careful proofread to try to catch all grammatical errors.

 

posted @ 2023-04-24 17:01  学习记录本