
Voice Activity Detection (VAD)

2017-10-12 12:52

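Every audio-processing module in WebRTC is worth studying.

Leaving aside the AEC, which I personally think is the most impressive one, the VAD alone is very good.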

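The basic idea is to split the signal into 6 frequency sub-bands and, in each sub-band, use Gaussian models of noise and of speech to decide which of the two the band's feature more likely belongs to.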
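As a rough illustration of that per-band decision, the sketch below scores six sub-band features against one noise Gaussian and one speech Gaussian per band, and reports speech when any single band's log-likelihood ratio, or the sum over all bands, crosses a threshold. This is a floating-point simplification and not the WebRTC code: it assumes both Gaussians in a band share one variance, and every name and number in it (`BandModel`, `band_llr`, `is_speech`, the thresholds) is hypothetical.

```c
#include <stddef.h>

#define NUM_BANDS 6

/* Hypothetical per-band model: a noise Gaussian and a speech Gaussian
 * that share the same variance (an assumption made for this sketch). */
typedef struct {
    double noise_mean;
    double speech_mean;
    double variance;
} BandModel;

/* Log-likelihood ratio log p(x | speech) - log p(x | noise).
 * With a shared variance the normalisation terms cancel, leaving
 * ((x - noise_mean)^2 - (x - speech_mean)^2) / (2 * variance). */
double band_llr(const BandModel *m, double feature) {
    double dn = feature - m->noise_mean;
    double ds = feature - m->speech_mean;
    return (dn * dn - ds * ds) / (2.0 * m->variance);
}

/* Returns 1 if any band's ratio exceeds local_thr, or if the sum of
 * all ratios exceeds global_thr; otherwise returns 0. */
int is_speech(const BandModel models[NUM_BANDS],
              const double features[NUM_BANDS],
              double local_thr, double global_thr) {
    double sum = 0.0;
    int flag = 0;
    for (size_t i = 0; i < NUM_BANDS; ++i) {
        double llr = band_llr(&models[i], features[i]);
        if (llr > local_thr) {
            flag = 1;  /* one band alone is convincing enough */
        }
        sum += llr;
    }
    return flag || (sum > global_thr);
}
```

The two thresholds play the same roles as the local and global threshold tables in the parameter listing: one band can trigger the decision on its own, or several weaker bands can trigger it together.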

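Input at any sample rate is first brought down to 8 kHz; the module does this downsampling internally for 16, 24, 32 and 48 kHz input.

To run detection on a signal at any other sample rate, you have to downsample it to 8 kHz yourself before passing it in.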
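For the do-it-yourself case, a minimal sketch of downsampling 16-bit PCM to 8 kHz might look like the following. It assumes the input rate is an integer multiple of 8000 and uses plain block averaging as a crude low-pass filter, which is not how WebRTC resamples; the function name `downsample_to_8k` is made up for this example. Rates such as 44.1 kHz are not integer multiples of 8 kHz and would need a proper rational resampler instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Downsample `in` (sampled at in_rate Hz) to 8 kHz by averaging each
 * group of `factor` consecutive samples. Returns the number of samples
 * written to `out`, or -1 if in_rate is not a positive integer
 * multiple of 8000. `out` must have room for in_len / factor samples. */
long downsample_to_8k(const int16_t *in, size_t in_len, int in_rate,
                      int16_t *out) {
    if (in_rate <= 0 || in_rate % 8000 != 0) {
        return -1;
    }
    size_t factor = (size_t)(in_rate / 8000);
    size_t out_len = in_len / factor;  /* a trailing partial group is dropped */
    for (size_t i = 0; i < out_len; ++i) {
        int64_t acc = 0;
        for (size_t k = 0; k < factor; ++k) {
            acc += in[i * factor + k];
        }
        out[i] = (int16_t)(acc / (int64_t)factor);
    }
    return (long)out_len;
}
```

Block averaging keeps the example short but attenuates aliasing only weakly; for real audio a dedicated decimation filter is the safer choice.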

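All of the decision parameters can be tuned:

I added one more parameter set of my own, meant for speech signals that are clearly distinguishable:

Custom, as mode 4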

// Each array holds one value per frame length: 10 ms, 20 ms and 30 ms.
// Mode 0, Quality.
static const int16_t kOverHangMax1Q[3] = { 8, 4, 3 };
static const int16_t kOverHangMax2Q[3] = { 14, 7, 5 };
static const int16_t kLocalThresholdQ[3] = { 24, 21, 24 };
static const int16_t kGlobalThresholdQ[3] = { 57, 48, 57 };
// Mode 1, Low bitrate.
static const int16_t kOverHangMax1LBR[3] = { 8, 4, 3 };
static const int16_t kOverHangMax2LBR[3] = { 14, 7, 5 };
static const int16_t kLocalThresholdLBR[3] = { 37, 32, 37 };
static const int16_t kGlobalThresholdLBR[3] = { 100, 80, 100 };
// Mode 2, Aggressive.
static const int16_t kOverHangMax1AGG[3] = { 6, 3, 2 };
static const int16_t kOverHangMax2AGG[3] = { 9, 5, 3 };
static const int16_t kLocalThresholdAGG[3] = { 82, 78, 82 };
static const int16_t kGlobalThresholdAGG[3] = { 285, 260, 285 };
// Mode 3, Very aggressive.
static const int16_t kOverHangMax1VAG[3] = { 6, 3, 2 };
static const int16_t kOverHangMax2VAG[3] = { 9, 5, 3 };
static const int16_t kLocalThresholdVAG[3] = { 94, 94, 94 };
static const int16_t kGlobalThresholdVAG[3] = { 1100, 1050, 1100 };

// Mode 4, custom.
static const int16_t kOverHangMax1Cus[3] = { 6, 3, 2 };
static const int16_t kOverHangMax2Cus[3] = { 9, 5, 3 };
static const int16_t kLocalThresholdCus[3] = { 96, 96, 96 };
static const int16_t kGlobalThresholdCus[3] = { 1300, 1200, 1300 };
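To show how these tables might be consumed, here is a small table-driven sketch that returns the local and global thresholds for a given mode and frame length. The values are copied from the arrays above, with index 4 being the custom mode. The struct and function names (`VadModeParams`, `vad_thresholds`) are hypothetical, and the mapping of array positions to 10, 20 and 30 ms frames reflects my understanding of upstream WebRTC rather than anything stated in this post.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for one mode's parameters. */
typedef struct {
    int16_t over_hang_max_1[3];
    int16_t over_hang_max_2[3];
    int16_t local_threshold[3];
    int16_t global_threshold[3];
} VadModeParams;

/* Rows 0..3 are the stock modes, row 4 is the custom mode. */
static const VadModeParams kModes[5] = {
    { { 8, 4, 3 }, { 14, 7, 5 }, { 24, 21, 24 }, { 57, 48, 57 } },
    { { 8, 4, 3 }, { 14, 7, 5 }, { 37, 32, 37 }, { 100, 80, 100 } },
    { { 6, 3, 2 }, { 9, 5, 3 }, { 82, 78, 82 }, { 285, 260, 285 } },
    { { 6, 3, 2 }, { 9, 5, 3 }, { 94, 94, 94 }, { 1100, 1050, 1100 } },
    { { 6, 3, 2 }, { 9, 5, 3 }, { 96, 96, 96 }, { 1300, 1200, 1300 } },
};

/* Looks up the thresholds for `mode` (0..4) and `frame_ms` (10, 20, 30).
 * Returns 0 on success, -1 for an unknown mode or frame length. */
int vad_thresholds(int mode, int frame_ms,
                   int16_t *local_thr, int16_t *global_thr) {
    int idx;
    if (mode < 0 || mode > 4) {
        return -1;
    }
    switch (frame_ms) {
        case 10: idx = 0; break;
        case 20: idx = 1; break;
        case 30: idx = 2; break;
        default: return -1;
    }
    *local_thr = kModes[mode].local_threshold[idx];
    *global_thr = kModes[mode].global_threshold[idx];
    return 0;
}
```

Higher modes raise both thresholds, so a frame needs stronger evidence before it is labelled speech; the custom mode pushes them slightly beyond "very aggressive".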

Source code of the VAD module, extracted on its own: https://github.com/dreamno23/vad
iOS demo: https://github.com/dreamno23/VADTest
