Android Audio and Video: Describing Sound
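Sound is essentially a wave. It travels through a medium to the eardrum, and after processing by the brain it becomes the sound that a person hears.
Sound waves in air are longitudinal waves.
There is an excellent video explanation of longitudinal waves:
The formation and propagation of longitudinal waves, and their waveform graphs
Capturing audio
The microphone converts the sound wave into an analog electrical signal, which the device's analog-to-digital converter then turns into a digital signal. This digital signal is the rawest form of the data, before any encoding. Let us model it with the simplest possible waveform. Everyday sound waves are irregular, of course, but fortunately the Fourier transform can decompose an irregular waveform into simple, regular ones.
As the video above concludes, the waveform graph of a longitudinal wave looks roughly like this: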

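The curve in the figure above is drawn with the well-known Bézier curve, which has a wide range of uses.
The waveform above resembles that of the sound produced by a tuning fork.
Volume: the size of the amplitude
Timbre: related to the shape of the waveform; timbre is the most complex of the three
Pitch: related to the frequency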
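To make the link between amplitude and volume, and between frequency and pitch, concrete, here is a minimal C++ sketch that generates a pure sine tone as signed 16-bit samples. The function name `makeSineTone` and its parameters are illustrative assumptions, not part of any Android API.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper: generate `count` samples of a sine tone as signed 16-bit PCM.
// amplitude is relative to full scale in [0, 1] (volume), frequencyHz sets the pitch,
// and sampleRate is the number of samples per second.
std::vector<int16_t> makeSineTone(double frequencyHz, double amplitude,
                                  int sampleRate, int count) {
    const double kPi = 3.14159265358979323846;
    std::vector<int16_t> samples(count);
    for (int n = 0; n < count; ++n) {
        double t = static_cast<double>(n) / sampleRate;  // time of sample n in seconds
        double value = amplitude * std::sin(2.0 * kPi * frequencyHz * t);
        samples[n] = static_cast<int16_t>(std::lround(value * 32767.0));
    }
    return samples;
}
```

Doubling `amplitude` doubles every sample value (a louder tone), while doubling `frequencyHz` halves the period (a higher pitch). Timbre would correspond to adding further sine components with other frequencies.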
Sampling
Sampling is illustrated in the figure below

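This brings in a concept called the
sample rate: the number of samples taken per second; common values are 44100 and 16000.
That is, one sample is taken every \(\frac{1}{44100}\) seconds or every \(\frac{1}{16000}\) seconds.
Each sample is commonly stored as a short16 or a float32. Saving the samples in binary form gives what we usually call a PCM file (pulse-code modulation).
This shows that real PCM loses part of the information: a sample rate of \(f_s\) can only represent frequencies below \(\frac{f_s}{2}\), so the higher the sample rate, the closer the result is to the original sound.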
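The arithmetic behind sample rate and PCM size can be sketched as follows. This is a minimal illustration; the function names `pcmByteCount` and `samplePeriodMicros` are assumptions introduced here.

```cpp
#include <cstdint>

// Hypothetical helper: bytes needed to store `durationMs` milliseconds of uncompressed PCM.
int64_t pcmByteCount(int sampleRate, int channels, int bytesPerSample, int durationMs) {
    int64_t frames = static_cast<int64_t>(sampleRate) * durationMs / 1000;  // samples per channel
    return frames * channels * bytesPerSample;
}

// Hypothetical helper: time between two consecutive samples, in microseconds.
double samplePeriodMicros(int sampleRate) {
    return 1e6 / sampleRate;
}
```

For example, 40 ms of 16000 Hz mono short16 audio is `pcmByteCount(16000, 1, 2, 40)`, which is 640 samples or 1280 bytes, and at 16000 Hz one sample is taken every 62.5 microseconds.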
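Sound pressure
Definition: a sound wave is a wave of compressions and rarefactions travelling through the air, which causes tiny fluctuations in atmospheric pressure. Sound pressure is the difference between this fluctuating pressure and the static atmospheric pressure.
- Greater than 0: the instantaneous pressure is above the static atmospheric pressure (the air is compressed)
- Less than 0: the instantaneous pressure is below the static atmospheric pressure (the air is rarefied)
Unit: pascal (Pa)
Characteristics:
- It is an objective physical quantity that can be measured directly with instruments (such as a microphone or a sound level meter).
- The range of sound pressure that the human ear can hear is extremely wide: the faintest audible sound (the threshold of hearing) is about \(20\mu Pa\) (that is, 0.00002 Pa), while a jet taking off can reach 200 Pa, a ratio of ten million to one. Expressing loudness with absolute values is therefore very inconvenient.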
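Sound pressure level: the most commonly used perceptual quantity
To deal with the huge range of absolute sound pressure values, and to better model the ear's logarithmic response to sound intensity, we introduce the sound pressure level.
Definition: the sound pressure level is 20 times the base-10 logarithm of the ratio of the sound pressure to a reference sound pressure.
Formula: \(L_p = 20\log_{10}\left(\frac{p}{p_0}\right)\)
- \(L_p\): the sound pressure level, in decibels
- \(p\): the effective (RMS) value of the measured sound pressure
- \(p_0\): the reference sound pressure, usually taken as the faintest sound pressure the human ear can hear, 0.00002 Pa
Unit: decibel (dB)
Characteristics:
- It compresses the huge range of sound pressure (from 0.00002 Pa to a few hundred Pa) into a scale that is easier to handle (typically 0 dB to 130 dB)
- It is a dimensionless (unitless) relative value that expresses a ratio
- Every increase of 10 dB is perceived by the ear as roughly a doubling of loudness; for example, a 60 dB sound is about twice as loud as a 50 dB sound.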
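The definition above translates directly into code. The following is a minimal sketch that assumes the pressure passed in is a positive effective (RMS) value in pascals; the names are illustrative.

```cpp
#include <cmath>

// Reference sound pressure p0 = 20 micropascals (0.00002 Pa).
constexpr double kReferencePressurePa = 20e-6;

// Sound pressure level in dB for an effective sound pressure in Pa (must be > 0).
double soundPressureLevelDb(double pressurePa) {
    return 20.0 * std::log10(pressurePa / kReferencePressurePa);
}

// Inverse: effective sound pressure in Pa for a given level in dB.
double pressureFromLevelPa(double levelDb) {
    return kReferencePressurePa * std::pow(10.0, levelDb / 20.0);
}
```

With these, the threshold of hearing (0.00002 Pa) maps to 0 dB and the 200 Pa of a jet taking off maps to 140 dB, so a ratio of ten million collapses into a 140 dB span.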
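Sound pressure levels in everyday life
- 0 dB: threshold of hearing (just audible)
- 20 dB: a quiet library
- 60 dB: normal conversation at a typical distance
- 85 dB: busy city traffic; long-term exposure may damage hearing
- 120 dB: the front row of a rock concert, or near an aircraft engine (threshold of pain)
- 130 to 140 dB: may cause immediate hearing damage.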
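Physical meaning of a sample
Take short16 as an example. Its range is [-32768, 32767], and the sign of a PCM value reflects the deviation of the instantaneous sound pressure from atmospheric pressure
- Equal to 0: the static pressure with no sound (the baseline at atmospheric pressure)
- Greater than 0: the instantaneous pressure is above the static atmospheric pressure (positive pressure)
- Less than 0: the instantaneous pressure is below the static atmospheric pressure (negative pressure)
So the value of a sample is a relative value of the analog sound pressure signal. It differs from the true sound pressure in absolute physical units (Pa, pascal) by a scale factor determined by the microphone sensitivity and the front-end gain.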
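As a small illustration of what a sample is, the sketch below decodes little-endian 16-bit PCM bytes into signed samples and scales one sample to pascals. The calibration factor `pascalPerFullScale` is purely hypothetical: it stands in for the microphone sensitivity and front-end gain, which are not known from the PCM data itself. The conversion from `uint16_t` to `int16_t` assumes a two's-complement platform.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode little-endian 16-bit PCM: the low byte comes first, then the high byte.
std::vector<int16_t> decodePcm16Le(const std::vector<uint8_t>& bytes) {
    std::vector<int16_t> samples;
    samples.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        uint16_t raw = static_cast<uint16_t>(bytes[i] | (bytes[i + 1] << 8));
        samples.push_back(static_cast<int16_t>(raw));  // reinterpret as signed
    }
    return samples;
}

// Hypothetical calibration: pressure in Pa represented by a full-scale sample.
double sampleToPascal(int16_t sample, double pascalPerFullScale) {
    return sample / 32768.0 * pascalPerFullScale;
}
```

The bytes `FF 7F` decode to 32767 (the largest positive deviation) and `00 80` decode to -32768 (the largest negative deviation).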
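Sliding window
A sliding window holds a run of consecutive short16 values; for all the data inside the window we compute a single value, f(x1, x2, x3...) = a.
Example problem:
154. Sliding Window
Illustration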

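Solution
Minimum:
- On each iteration of the for loop, if the queue is non-empty and the element at its head has slid out of the window, remove that element from the head
- While the queue is non-empty and the current element is less than or equal to the element at the tail, remove the tail element
- Append the current element to the tail. The queue is then a monotonic queue whose values increase from head to tail, so the head is always the minimum of the window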
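#include <cstdio>
using namespace std;

const int N = 1e6 + 10;
int a[N], q[N];    // a: input values; q: monotonic queue holding indices into a
int h = 0, t = -1; // head and tail of the queue; the queue is empty when h > t

int main() {
    int n, k;
    scanf("%d%d", &n, &k);
    for (int i = 0; i < n; i++) scanf("%d", &a[i]);

    // Window minimum: the values at the queued indices increase from head to tail.
    for (int i = 0; i < n; i++) {
        if (h <= t && q[h] <= i - k) h++;        // the head index has left the window
        while (h <= t && a[i] <= a[q[t]]) t--;   // drop tail values that are not smaller than a[i]
        q[++t] = i;
        if (i >= k - 1) printf("%d ", a[q[h]]);  // the head is the window minimum
    }
    puts("");

    // Window maximum: the same procedure with the comparison reversed.
    h = 0; t = -1;
    for (int i = 0; i < n; i++) {
        if (h <= t && q[h] <= i - k) h++;
        while (h <= t && a[i] >= a[q[t]]) t--;
        q[++t] = i;
        if (i >= k - 1) printf("%d ", a[q[h]]);
    }
    return 0;
}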
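The same monotonic queue, written as a class-style solution that returns the maximum of each window:
#include <vector>
using namespace std;

class Solution {
public:
    const static int N = 1e5 + 10;
    int q[N], h = 0, t = -1; // monotonic queue of indices; h is initialized explicitly
    vector<int> res;
    vector<int> maxSlidingWindow(vector<int>& nums, int k) {
        for (int i = 0; i < (int)nums.size(); i++) {
            if (h <= t && q[h] <= i - k) h++;              // the head index has left the window
            while (h <= t && nums[i] >= nums[q[t]]) t--;   // keep the values decreasing from head to tail
            q[++t] = i;
            if (i >= k - 1) res.push_back(nums[q[h]]);     // the head is the window maximum
        }
        return res;
    }
};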
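Volume visualization
For volume visualization the only data we have is the short16 samples. What they represent is the sound pressure:
the ratio of the actual sound pressure to the short16 value is a proportional coefficient k. The louder the sound, the larger the absolute value of the sample.
So it is enough to normalize it: the closer to 0, the quieter the sound, and the closer to 1, the louder the sound.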
For example, in the waveform shown below the bars spread from the middle toward both sides:

First, write a utility class
AudioRecordManager.kt
package edu.tyut.helloktorfit.manager
import android.Manifest
import android.content.Context
import android.content.pm.PackageManager
import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaRecorder
import android.net.Uri
import android.util.Log
import androidx.core.app.ActivityCompat
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext
import okio.BufferedSink
import okio.buffer
import okio.sink
import kotlin.math.abs
import kotlin.math.max
private const val TAG: String = "AudioRecordManager"
internal class AudioRecordManager internal constructor(
private val context: Context
) {
private val channelMask: Int = AudioFormat.CHANNEL_IN_MONO
private val sampleRate = 16000
private val bufferSize: Int =
AudioRecord.getMinBufferSize(sampleRate, channelMask, AudioFormat.ENCODING_PCM_16BIT)
private val audioRecord: AudioRecord by lazy {
initAudioRecord()
}
private fun initAudioRecord(): AudioRecord {
if (ActivityCompat.checkSelfPermission(
context,
Manifest.permission.RECORD_AUDIO
) != PackageManager.PERMISSION_GRANTED
) {
throw RuntimeException("Not RECORD_AUDIO permission...")
}
return AudioRecord.Builder()
.setAudioSource(MediaRecorder.AudioSource.MIC)
.setAudioFormat(
AudioFormat.Builder().setEncoding(AudioFormat.ENCODING_PCM_16BIT)
.setSampleRate(sampleRate).setChannelMask(channelMask).build()
)
.setBufferSizeInBytes(bufferSize)
.build()
}
/**
* 0000 -> 0
* 0001 -> 1
* 0010 -> 2
* 0011 -> 3
* 0100 -> 4
* 0101 -> 5
* 0110 -> 6
* 0111 -> 7
* 1000 -> 8
* 1001 -> 9
* 1010 -> A
* 1011 -> B
* 1100 -> C
* 1101 -> D
* 1110 -> E
* 1111 -> F
*
* byte 0000 0000
* short 0000 0000 0000 0000
* callback: (percent: Float) -> Unit, invoked on a background (IO) thread
*/
internal suspend fun startRecord(uri: Uri, callback: (percent: Float) -> Unit): Unit = withContext(Dispatchers.IO){ // with a 1280-byte buffer each read holds 40 ms of audio, i.e. 25 reads per second
audioRecord.startRecording()
context.contentResolver.openOutputStream(uri)?.sink()?.buffer()
?.use { bufferedSink: BufferedSink ->
var totalLength = 0L
val bytes = ByteArray(bufferSize) // e.g. 1280 bytes -> 640 short16 samples
var length: Int
while (audioRecord.read(bytes, 0, bytes.size).also { length = it } > 0) {
bufferedSink.write(bytes, 0, length)
totalLength += length
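// Find the largest and smallest sample in this buffer to get the peak percent
var minShort: Short = Short.MAX_VALUE
var maxShort: Short = Short.MIN_VALUE
for (i in 0 until length - 1 step 2){
val low: Int = bytes[i].toInt() and 0xFF // low byte first (little-endian)
val high: Int = bytes[i + 1].toInt() shl 8 // the high byte keeps its sign
val shortValue: Short = (low or high).toShort()
maxShort = maxOf(maxShort, shortValue)
minShort = minOf(minShort, shortValue)
}
// Normalize the peak to [0, 1]; coerceIn is needed because |Short.MIN_VALUE| is 32768, one more than Short.MAX_VALUE
val percent: Float = (max(maxShort.toFloat(), abs(minShort.toFloat())) / Short.MAX_VALUE.toFloat()).coerceIn(0F, 1F)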
callback(percent)
Log.i(TAG, "startRecord -> size: ${bytes.size}, percent: ${percent}, maxShort: $maxShort, minShort: $minShort, data: ${bytes.joinToString{ it.toHexString() }}")
}
bufferedSink.flush()
Log.i(TAG, "startRecord -> recording finished, file size: $totalLength bytes")
}
}
// @RequiresPermission(value = Manifest.permission.RECORD_AUDIO)
internal fun stopRecord(){
if (ActivityCompat.checkSelfPermission(
context,
Manifest.permission.RECORD_AUDIO
) != PackageManager.PERMISSION_GRANTED
) {
throw RuntimeException("Not RECORD_AUDIO permission...")
}
if (audioRecord.recordingState == AudioRecord.RECORDSTATE_RECORDING) {
audioRecord.stop()
}
}
internal fun release(){
if (ActivityCompat.checkSelfPermission(
context,
Manifest.permission.RECORD_AUDIO
) != PackageManager.PERMISSION_GRANTED
) {
throw RuntimeException("Not RECORD_AUDIO permission...")
}
audioRecord.release()
}
}
What do you notice if you amplify the sample values? Try it yourself.
for (i in 0 until length - 1 step 2) {
    val low: Int = bytes[i].toInt() and 0xFF // low byte first (little-endian)
    val high: Int = bytes[i + 1].toInt() shl 8
    val amplified: Int = (low or high) * 10 // 10x gain, computed as Int so it cannot overflow
    // Clamp to the short16 range so that loud samples saturate instead of wrapping around
    val newShortValue: Int = amplified.coerceIn(Short.MIN_VALUE.toInt(), Short.MAX_VALUE.toInt())
    bytes[i] = newShortValue.toByte()
    bytes[i + 1] = (newShortValue shr 8).toByte()
}
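The same amplification with saturation can be sketched on already decoded samples. This is a minimal illustration with an assumed function name `applyGain`; it widens each sample to `int` before multiplying, then clamps the result back into the short16 range.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper: multiply every sample by `gain`, saturating at the short16 limits.
void applyGain(std::vector<int16_t>& samples, int gain) {
    for (int16_t& s : samples) {
        int amplified = static_cast<int>(s) * gain;              // widen before multiplying
        amplified = std::max(-32768, std::min(32767, amplified)); // saturate instead of wrapping
        s = static_cast<int16_t>(amplified);
    }
}
```

With a gain of 10, a sample of 100 becomes 1000, while 4000 saturates at 32767. Samples that hit the limit are clipped, which flattens the peaks of the waveform and is heard as distortion.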
Then draw the UI
package edu.tyut.helloktorfit.ui.screen
import android.content.Context
import android.content.pm.PackageManager
import android.net.Uri
import android.os.Environment
import android.util.Log
import androidx.activity.compose.rememberLauncherForActivityResult
import androidx.activity.result.contract.ActivityResultContracts
import androidx.compose.foundation.Canvas
import androidx.compose.foundation.background
import androidx.compose.foundation.clickable
import androidx.compose.foundation.layout.Column
import androidx.compose.foundation.layout.fillMaxSize
import androidx.compose.foundation.layout.fillMaxWidth
import androidx.compose.foundation.layout.height
import androidx.compose.foundation.layout.padding
import androidx.compose.material3.SnackbarHostState
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable
import androidx.compose.runtime.getValue
import androidx.compose.runtime.mutableStateListOf
import androidx.compose.runtime.mutableStateOf
import androidx.compose.runtime.remember
import androidx.compose.runtime.rememberCoroutineScope
import androidx.compose.ui.Modifier
import androidx.compose.ui.geometry.Offset
import androidx.compose.ui.geometry.Size
import androidx.compose.ui.graphics.Color
import androidx.compose.ui.platform.LocalContext
import androidx.compose.ui.platform.LocalDensity
import androidx.compose.ui.unit.Density
import androidx.compose.ui.unit.dp
import androidx.core.content.ContextCompat
import androidx.core.content.FileProvider
import androidx.hilt.navigation.compose.hiltViewModel
import androidx.navigation.NavHostController
import edu.tyut.helloktorfit.manager.AudioRecordManager
import edu.tyut.helloktorfit.manager.AudioTrackManager
import edu.tyut.helloktorfit.ui.theme.RoundedCornerShape10
import edu.tyut.helloktorfit.viewmodel.HelloViewModel
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.launch
import java.io.File
import kotlin.math.abs
import kotlin.random.Random
private const val TAG: String = "AudioScreen"
private const val WINDOWS_SIZE: Int = 5
private const val BAR_SIZE: Int = 10
private const val BAR_HEIGHT: Int = 200
private var windowSize: Int = 0
private var percentSum: Float = 0F
@Composable
internal fun AudioScreen(
navHostController: NavHostController,
snackBarHostState: SnackbarHostState,
helloViewModel: HelloViewModel = hiltViewModel<HelloViewModel>()
) {
val context: Context = LocalContext.current
val density: Density = LocalDensity.current
val coroutineScope: CoroutineScope = rememberCoroutineScope()
val volumes = remember {
mutableStateListOf<Float>(*FloatArray(BAR_SIZE).toTypedArray())
}
val recordManager: AudioRecordManager by remember {
mutableStateOf(value = AudioRecordManager(context = context))
}
val audioTrackManager: AudioTrackManager by remember {
mutableStateOf(value = AudioTrackManager())
}
val permissions: Array<String> = arrayOf(android.Manifest.permission.RECORD_AUDIO, android.Manifest.permission.WRITE_EXTERNAL_STORAGE)
val launcher = rememberLauncherForActivityResult(
contract = ActivityResultContracts.RequestMultiplePermissions()
) { map ->
coroutineScope.launch {
snackBarHostState.showSnackbar("Permissions granted: ${map.values.all { it }}")
}
}
Column(
modifier = Modifier.fillMaxSize()
) {
Text(
text = "Start recording",
Modifier
.padding(top = 10.dp)
.background(color = Color.Black, shape = RoundedCornerShape10)
.padding(all = 5.dp)
.clickable {
if (permissions.any {
ContextCompat.checkSelfPermission(
context,
it
) != PackageManager.PERMISSION_GRANTED
}) {
launcher.launch(permissions)
return@clickable
}
val uri: Uri = FileProvider.getUriForFile(
context, "${context.packageName}.provider", File(
Environment.getExternalStoragePublicDirectory(Environment.DIRECTORY_DOWNLOADS),
"hello1.pcm"
).apply {
Log.i(TAG, "AudioScreen path: $this")
}
)
val channel = Channel<Float>()
coroutineScope.launch {
for (percent in channel) {
// Method 2: average every WINDOWS_SIZE percent values, then push the average into the middle bar
percentSum += percent
windowSize++
if (windowSize >= WINDOWS_SIZE) {
for (i in 1..volumes.size / 2) {
volumes[i - 1] = volumes[i]
volumes[volumes.size - i] = volumes[volumes.size - 1 - i]
}
volumes[volumes.size / 2] = percentSum / WINDOWS_SIZE
Log.i(TAG, "AudioScreen -> average: ${percentSum / WINDOWS_SIZE}")
percentSum = 0F
windowSize = 0
}
}
}
coroutineScope.launch {
Log.i(TAG, "AudioScreen -> startRecord...")
recordManager.startRecord(uri = uri) { percent: Float ->
// Log.i(TAG, "AudioScreen -> percent: $percent, Thread: ${Thread.currentThread()}")
channel.trySend(percent)
}
val int: Int = -2147483648
Log.i(TAG, "AudioScreen -> endRecord -> ${abs(int.toFloat())}")
}
},
color = Color.White
)
Text(
text = "Stop recording",
Modifier
.padding(top = 10.dp)
.background(color = Color.Black, shape = RoundedCornerShape10)
.padding(all = 5.dp)
.clickable {
recordManager.stopRecord()
},
color = Color.White
)
Text(
text = "Play recording",
Modifier
.padding(top = 10.dp)
.background(color = Color.Black, shape = RoundedCornerShape10)
.padding(all = 5.dp)
.clickable {
val uri: Uri = FileProvider.getUriForFile(
context, "${context.packageName}.provider", File(
Environment.getExternalStoragePublicDirectory(Environment.DIRECTORY_DOWNLOADS),
"hello1.pcm" // the same file that the record button writes
)
)
coroutineScope.launch {
audioTrackManager.startPlay(context, uri)
}
},
color = Color.White
)
Text(
text = "Pause playback",
Modifier
.padding(top = 10.dp)
.background(color = Color.Black, shape = RoundedCornerShape10)
.padding(all = 5.dp)
.clickable {
audioTrackManager.pause()
},
color = Color.White
)
fun randomColor(): Color {
val r = Random.nextInt(0, 256)
val g = Random.nextInt(0, 256)
val b = Random.nextInt(0, 256)
return Color(r, g, b)
}
Canvas(
modifier = Modifier
.fillMaxWidth()
.height(200.dp)
.background(color = Color.Cyan)
) {
for (i in 0 until BAR_SIZE) {
drawRect(color = randomColor(), topLeft = Offset(10F + i * 25F, 0F), size = Size(20F, with(density){ ((volumes[i] * 2).coerceIn(0F, 1F) * BAR_HEIGHT).dp.toPx() }))
}
for (i in 0 until BAR_SIZE) {
drawRect(color = randomColor(), topLeft = Offset(260F + i * 25F, 0F), size = Size(20F, with(density){ ((volumes[i] * 2).coerceIn(0F, 1F) * BAR_HEIGHT).dp.toPx() }))
}
}
}
}
Result

Exploration
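We can see that most of the time the bars are very short, because the mode of percent is 0.02, so the following optimization can be made
Based on the following two formulas
Formulas: \(rms = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^2}\) and \(dBFS = 20\log_{10}\left(\frac{rms}{ref}\right)\)
Computing dBFS (Decibels relative to Full Scale)
- rms: the root mean square of the sample values \(x_i\) over the \(N\) samples of one buffer
- ref: the reference value, the largest magnitude a sample can take; for short16 it is 32768 (the code below divides by Short.MAX_VALUE, 32767, which makes no practical difference)
- dBFS: in digital audio, the level in decibels relative to the largest amplitude the digital system can represent.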
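The two formulas can be sketched on decoded samples as follows. This is a minimal illustration with assumed function names; the -60 dB floor and the -120 dB value for silence mirror the constants used in the article's Kotlin code.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Root mean square of the samples in one buffer (0 for an empty buffer).
double rmsOf(const std::vector<int16_t>& samples) {
    if (samples.empty()) return 0.0;
    double sumOfSquares = 0.0;
    for (int16_t s : samples) sumOfSquares += static_cast<double>(s) * s;
    return std::sqrt(sumOfSquares / samples.size());
}

// Level relative to full scale; silence is reported as -120 dBFS to avoid log10(0).
double toDbfs(double rms, double ref = 32768.0) {
    return rms > 0.0 ? 20.0 * std::log10(rms / ref) : -120.0;
}

// Map [minDb, maxDb] linearly onto [0, 1], clamping values outside the range.
float dbfsToPercent(double dbfs, double minDb = -60.0, double maxDb = 0.0) {
    double p = (dbfs - minDb) / (maxDb - minDb);
    return static_cast<float>(std::min(1.0, std::max(0.0, p)));
}
```

A signal at 2% of full scale is about -34 dBFS, which maps to roughly 0.43 on this scale instead of 0.02, so quiet input produces clearly visible bars.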
The optimized version:
internal suspend fun startRecord2(uri: Uri, callback: (percent: Float) -> Unit): Unit = withContext(Dispatchers.IO){ // with a 1280-byte buffer each read holds 40 ms of audio, i.e. 25 reads per second
audioRecord.startRecording()
context.contentResolver.openOutputStream(uri)?.sink()?.buffer()
?.use { bufferedSink: BufferedSink ->
var totalLength = 0L
val bytes = ByteArray(bufferSize) // e.g. 1280 bytes -> 640 short16 samples
var length: Int
while (audioRecord.read(bytes, 0, bytes.size).also { length = it } > 0) {
bufferedSink.write(bytes, 0, length)
totalLength += length
val minDb = -60F // levels at or below -60 dBFS map to 0
val maxDb = 0F // full scale maps to 1
var sum = 0.0
for (i in 0 until length - 1 step 2) {
val low: Int = bytes[i].toInt() and 0xFF // low byte first (little-endian)
val high: Int = bytes[i + 1].toInt() shl 8
val shortValue: Short = (low or high).toShort()
sum += shortValue.toDouble() * shortValue.toDouble()
}
val sampleCount = length / 2
val rms = kotlin.math.sqrt(sum / sampleCount) // fully qualified because the file's imports do not include sqrt
val cb = if (rms > 0) 20.0F * kotlin.math.log10(rms / Short.MAX_VALUE).toFloat() else -120F // level in dBFS
val percent: Float = ((cb - minDb) / (maxDb - minDb)).coerceIn(0F, 1F)
callback(percent)
Log.i(TAG, "startRecord -> sum: $sum, rms: $rms, cb: $cb")
}
bufferedSink.flush()
Log.i(TAG, "startRecord -> recording finished, file size: $totalLength bytes")
}
}
