
Speex: an open-source acoustic echo canceller (Acoustic Echo Cancellation)

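2012-09-11 14:29
A while ago I spent some time on acoustic echo cancellation (AEC). It was frustrating because I never got it to work, but I did learn something, at least on the theory side.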

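Why is acoustic echo cancellation needed? In a typical VoIP application or video-conferencing system, suppose only two people, A and B, are on a call. A's voice is sent to B and played through B's loudspeaker. B's microphone then picks up the sound coming out of the loudspeaker and sends it back to A. If the delay along this path is large enough, A hears a copy of what A has just said. That is the echo. The acoustic echo canceller runs on B's side: it processes the signal B captures, removes A's voice from it, and only then sends it to A, so A no longer hears their own words.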
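The situation above can be written as a simple signal model: what B's microphone captures is B's own speech plus a delayed, attenuated copy of A's speech. The sketch below is only that toy model (one delay and one gain, no room reverberation); `simulate_mic` and its parameters are names made up for illustration, not part of Speex.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy echo model (hypothetical helper, not a Speex API):
// mic[n] = near_end[n] + echo_gain * far_end[n - delay_samples]
std::vector<float> simulate_mic(const std::vector<float>& near_end,
                                const std::vector<float>& far_end,
                                std::size_t delay_samples,
                                float echo_gain)
{
    std::vector<float> mic(near_end.size(), 0.0f);
    for (std::size_t n = 0; n < mic.size(); ++n) {
        mic[n] = near_end[n];                      // B's own speech
        if (n >= delay_samples && n - delay_samples < far_end.size())
            mic[n] += echo_gain * far_end[n - delay_samples];  // echo of A
    }
    return mic;
}
```

An echo canceller is given `mic` and `far_end` and tries to recover `near_end`; in a real room the single delay and gain become a whole filter, which is what the adaptive filter has to learn.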

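I won't go into the theory of acoustic echo cancellation; there are plenty of documents about it online. What is missing online is implementations, so here I introduce an open-source acoustic echo canceller, in the hope that it is useful to someone. If anyone knows how to use this canceller in a VoIP application based on real-time streams, please share.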

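This acoustic echo canceller is part of Speex, a well-known audio codec. The echo canceller only works from version 1.1.9 onward; the ones in earlier versions do not. 1.1.9 is also the version I used. In my tests with the same simulated files, it performed better than the acoustic echo canceller in version 4.1 of the Intel IPP library.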

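First, building. Download the Speex 1.1.9 source code from www.speex.org, unpack it, and open libspeex.dsw in speex/win32/libspeex. The workspace contains two projects: libspeex and libspeex_dynamic. Add the file mdf.c from the libspeex directory to the libspeex project, then build.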

Below is a class I wrote by following the documentation, together with a test program:

//file name: speexEC.h

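#ifndef SPEEX_EC_H
#define SPEEX_EC_H

#include <stdio.h>
#include <stdlib.h>
#include "speex/speex_echo.h"
#include "speex/speex_preprocess.h"

// Thin wrapper around the Speex echo canceller and preprocessor.
class CSpeexEC
{
public:
    CSpeexEC();
    ~CSpeexEC();

    // frame_size and filter_length are in samples; sampling_rate is in Hz.
    void Init(int frame_size=160, int filter_length=1280, int sampling_rate=8000);

    // mic: captured frame, ref: frame that was played, out: processed frame.
    // Each buffer holds one frame (frame_size samples).
    void DoAEC(short *mic, short *ref, short *out);

protected:
    // Releases the Speex states and the noise buffer.
    void Reset();

private:
    bool m_bHasInit;
    SpeexEchoState* m_pState;
    SpeexPreprocessState* m_pPreprocessorState;
    int m_nFrameSize;
    int m_nFilterLen;
    int m_nSampleRate;
    float* m_pfNoise;
};

#endif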

//file name: speexEC.cpp

#include "speexEC.h"

CSpeexEC::CSpeexEC()
{
    m_bHasInit = false;
    m_pState = NULL;
    m_pPreprocessorState = NULL;
    m_nFrameSize = 160;
    m_nFilterLen = 160*8;
    m_nSampleRate = 8000;
    m_pfNoise = NULL;
}

CSpeexEC::~CSpeexEC()
{
    Reset();
}

void CSpeexEC::Init(int frame_size, int filter_length, int sampling_rate)
{
    Reset();

    // Fall back to the defaults if any argument is invalid.
    if (frame_size<=0 || filter_length<=0 || sampling_rate<=0)
    {
        m_nFrameSize = 160;
        m_nFilterLen = 160*8;
        m_nSampleRate = 8000;
    }
    else
    {
        m_nFrameSize = frame_size;
        m_nFilterLen = filter_length;
        m_nSampleRate = sampling_rate;
    }

    m_pState = speex_echo_state_init(m_nFrameSize, m_nFilterLen);
    m_pPreprocessorState = speex_preprocess_state_init(m_nFrameSize, m_nSampleRate);
    m_pfNoise = new float[m_nFrameSize+1];
    m_bHasInit = true;
}

void CSpeexEC::Reset()
{
    if (m_pState != NULL)
    {
        speex_echo_state_destroy(m_pState);
        m_pState = NULL;
    }
    if (m_pPreprocessorState != NULL)
    {
        speex_preprocess_state_destroy(m_pPreprocessorState);
        m_pPreprocessorState = NULL;
    }
    if (m_pfNoise != NULL)
    {
        delete []m_pfNoise;
        m_pfNoise = NULL;
    }
    m_bHasInit = false;
}

void CSpeexEC::DoAEC(short* mic, short* ref, short* out)
{
    if (!m_bHasInit)
        return;

    // Cancel the echo of ref from mic; m_pfNoise receives the canceller's noise estimate.
    speex_echo_cancel(m_pState, mic, ref, out, m_pfNoise);
    // Run the preprocessor on the result, passing it that estimate.
    speex_preprocess(m_pPreprocessorState, (__int16 *)out, m_pfNoise);
}

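As you can see, this echo canceller class is simple: initialize it and it is ready to call. Note, however, that the two signals passed to the echo canceller must be very well synchronized. On B's side, once A's speech arrives, that audio has to be handed to the echo canceller as the reference and then passed to the sound card, which plays it after some delay. B then captures audio and passes it to the echo canceller, which compares it with the reference and removes from the captured data the part whose frequency-domain content matches the reference. If the two signals are poorly synchronized, so that no matching frequency-domain content can be found between them, nothing can be cancelled.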
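One way to check how far apart the two signals are is to cross-correlate them and look for the lag with the highest score. The sketch below is a brute-force, unnormalized version written for illustration, not something taken from Speex; `estimate_delay` and `max_lag` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Returns the lag (in samples, 0..max_lag) at which ref, shifted later in
// time, correlates best with mic. Hypothetical helper, not a Speex API.
int estimate_delay(const std::vector<short>& ref,
                   const std::vector<short>& mic,
                   int max_lag)
{
    int best_lag = 0;
    double best_score = std::numeric_limits<double>::lowest();
    for (int lag = 0; lag <= max_lag; ++lag) {
        const std::size_t d = static_cast<std::size_t>(lag);
        double score = 0.0;
        // Sum mic[n] * ref[n - lag] over the overlapping samples.
        for (std::size_t n = d; n < mic.size() && n - d < ref.size(); ++n)
            score += static_cast<double>(mic[n]) * ref[n - d];
        if (score > best_score) {
            best_score = score;
            best_lag = lag;
        }
    }
    return best_lag;
}
```

With recorded files, a lag like this could be used to shift the reference before feeding both signals to `DoAEC`. Real speech gives a much less sharp peak than a test impulse, so treat the result as an estimate.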

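Test program:

#include <stdio.h>
#include "speexEC.h"

#define NN 160

int main()
{
    FILE *ref_fd, *mic_fd, *out_fd;
    short ref[NN], mic[NN], out[NN];

    ref_fd = fopen("ref.pcm", "rb");  // reference file: the sound to be cancelled
    mic_fd = fopen("mic.pcm", "rb");  // sound captured by the mic, echo included
    out_fd = fopen("echo.pcm", "wb"); // output file with the echo removed
    if (ref_fd == NULL || mic_fd == NULL || out_fd == NULL)
    {
        fprintf(stderr, "cannot open ref.pcm, mic.pcm or echo.pcm\n");
        return 1;
    }

    CSpeexEC ec;
    ec.Init();

    // Process whole frames only; stop as soon as either input runs out.
    while (fread(mic, sizeof(short), NN, mic_fd) == NN &&
           fread(ref, sizeof(short), NN, ref_fd) == NN)
    {
        ec.DoAEC(mic, ref, out);
        fwrite(out, sizeof(short), NN, out_fd);
    }

    fclose(ref_fd);
    fclose(mic_fd);
    fclose(out_fd);
    return 0;
}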

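The program above uses files to simulate the echo and the microphone, but a real-time stream is a very different matter. In a typical VoIP application, receiving the remote party's audio and sending it to the sound card for playback happens in one thread, while capturing local audio and sending it to the remote party happens in another. While cancelling echo from the captured audio, the acoustic echo canceller also needs the data from the playback thread as its reference, and synchronizing the data of these two threads is very hard. If they are even slightly out of sync, the adaptive filter inside the echo canceller diverges: it not only fails to remove the echo but also damages the captured audio, to the point where it is hard to make out. I made many attempts and never managed to synchronize the data of the two threads in software, so my implementation failed. I hope readers with experience in this area will share what they know.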
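One possible arrangement (a sketch only, not a fix for the problem described above) is to have the playback thread push each frame into a small thread-safe FIFO right before it goes to the sound card, and have the capture thread pop one frame per captured frame to use as the reference. `RefFrameQueue` is a hypothetical class; it does not account for the sound card's own buffering delay or for clock drift between playback and capture, which would still have to be compensated separately.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical FIFO of reference frames shared by two threads.
class RefFrameQueue {
public:
    explicit RefFrameQueue(std::size_t max_frames) : max_frames_(max_frames) {}

    // Playback thread: call just before handing the frame to the sound card.
    void push(const std::vector<short>& frame) {
        std::lock_guard<std::mutex> lock(mutex_);
        // Drop the oldest frame when full so the queue cannot grow without bound.
        if (!frames_.empty() && frames_.size() >= max_frames_)
            frames_.pop_front();
        frames_.push_back(frame);
    }

    // Capture thread: fetch the oldest reference frame.
    // Returns false when no reference is available yet.
    bool pop(std::vector<short>& frame) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (frames_.empty())
            return false;
        frame = frames_.front();
        frames_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::deque<std::vector<short>> frames_;
    std::size_t max_frames_;
};
```

When `pop` returns false, the capture thread would have to decide what to do with that frame, for example send it without echo cancellation.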

Sample code

This section shows sample code for encoding and decoding speech using the Speex API. The commands can be used to encode and decode a file by calling:

% sampleenc in_file.sw | sampledec out_file.sw

where both files are raw (no header) files encoded at 16 bits per sample (in the machine natural endianness).

sampleenc.c

sampleenc takes a raw 16 bits/sample file, encodes it and outputs a Speex stream to stdout. Note that the packing used is NOT compatible with that of speexenc/speexdec.

#include <speex/speex.h>
#include <stdio.h>

/*The frame size is hardcoded for this sample code but it doesn't have to be*/
#define FRAME_SIZE 160

int main(int argc, char **argv)
{
    char *inFile;
    FILE *fin;
    short in[FRAME_SIZE];
    float input[FRAME_SIZE];
    char cbits[200];
    int nbBytes;
    /*Holds the state of the encoder*/
    void *state;
    /*Holds bits so they can be read and written to by the Speex routines*/
    SpeexBits bits;
    int i, tmp;

    if (argc < 2)
    {
        fprintf(stderr, "usage: sampleenc in_file.sw\n");
        return 1;
    }
    /*Open in binary mode so the raw samples are read unchanged*/
    inFile = argv[1];
    fin = fopen(inFile, "rb");
    if (fin == NULL)
    {
        perror(inFile);
        return 1;
    }

    /*Create a new encoder state in narrowband mode*/
    state = speex_encoder_init(&speex_nb_mode);
    /*Set the quality to 8 (15 kbps)*/
    tmp=8;
    speex_encoder_ctl(state, SPEEX_SET_QUALITY, &tmp);
    /*Initialization of the structure that holds the bits*/
    speex_bits_init(&bits);
    while (1)
    {
        /*Read a 16 bits/sample audio frame*/
        fread(in, sizeof(short), FRAME_SIZE, fin);
        if (feof(fin))
            break;
        /*Copy the 16 bits values to float so Speex can work on them*/
        for (i=0;i<FRAME_SIZE;i++)
            input[i]=in[i];
        /*Flush all the bits in the struct so we can encode a new frame*/
        speex_bits_reset(&bits);
        /*Encode the frame*/
        speex_encode(state, input, &bits);
        /*Copy the bits to an array of char that can be written*/
        nbBytes = speex_bits_write(&bits, cbits, 200);
        /*Write the size of the frame first. This is what sampledec expects but
          it's likely to be different in your own application*/
        fwrite(&nbBytes, sizeof(int), 1, stdout);
        /*Write the compressed data*/
        fwrite(cbits, 1, nbBytes, stdout);
    }
    /*Destroy the encoder state*/
    speex_encoder_destroy(state);
    /*Destroy the bit-packing struct*/
    speex_bits_destroy(&bits);
    fclose(fin);
    return 0;
}


sampledec.c

sampledec reads a Speex stream from stdin, decodes it and outputs it to a raw 16 bits/sample file. Note that the packing used is NOT compatible with that of speexenc/speexdec.

#include <speex/speex.h>
#include <stdio.h>

/*The frame size is hardcoded for this sample code but it doesn't have to be*/
#define FRAME_SIZE 160

int main(int argc, char **argv)
{
    char *outFile;
    FILE *fout;
    /*Holds the audio that will be written to file (16 bits per sample)*/
    short out[FRAME_SIZE];
    /*Speex handle samples as float, so we need an array of floats*/
    float output[FRAME_SIZE];
    char cbits[200];
    int nbBytes;
    /*Holds the state of the decoder*/
    void *state;
    /*Holds bits so they can be read and written to by the Speex routines*/
    SpeexBits bits;
    int i, tmp;

    if (argc < 2)
    {
        fprintf(stderr, "usage: sampledec out_file.sw\n");
        return 1;
    }
    /*Open in binary mode so the raw samples are written unchanged*/
    outFile = argv[1];
    fout = fopen(outFile, "wb");
    if (fout == NULL)
    {
        perror(outFile);
        return 1;
    }

    /*Create a new decoder state in narrowband mode*/
    state = speex_decoder_init(&speex_nb_mode);
    /*Set the perceptual enhancement on*/
    tmp=1;
    speex_decoder_ctl(state, SPEEX_SET_ENH, &tmp);
    /*Initialization of the structure that holds the bits*/
    speex_bits_init(&bits);
    while (1)
    {
        /*Read the size encoded by sampleenc, this part will likely be
          different in your application*/
        fread(&nbBytes, sizeof(int), 1, stdin);
        if (feof(stdin))
            break;
        fprintf (stderr, "nbBytes: %d\n", nbBytes);
        /*Read the "packet" encoded by sampleenc*/
        fread(cbits, 1, nbBytes, stdin);
        /*Copy the data into the bit-stream struct*/
        speex_bits_read_from(&bits, cbits, nbBytes);
        /*Decode the data*/
        speex_decode(state, &bits, output);
        /*Copy from float to short (16 bits) for output*/
        for (i=0;i<FRAME_SIZE;i++)
            out[i]=output[i];
        /*Write the decoded audio to file*/
        fwrite(out, sizeof(short), FRAME_SIZE, fout);
    }
    /*Destroy the decoder state*/
    speex_decoder_destroy(state);
    /*Destroy the bit-stream struct*/
    speex_bits_destroy(&bits);
    fclose(fout);
    return 0;
}
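The packing used by sampleenc and sampledec is simply a native-endian `int` length followed by that many bytes of encoded data, repeated for every frame. The sketch below shows the same framing on an in-memory buffer, with no Speex calls involved; `append_packet` and `read_packet` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Append one packet: an int length (native endianness), then the payload.
void append_packet(std::vector<char>& stream, const char* payload, int nbBytes)
{
    const char* len = reinterpret_cast<const char*>(&nbBytes);
    stream.insert(stream.end(), len, len + sizeof(int));
    stream.insert(stream.end(), payload, payload + nbBytes);
}

// Read the packet that starts at offset and advance offset past it.
// Returns false if the stream ends before a whole packet is available.
bool read_packet(const std::vector<char>& stream, std::size_t& offset,
                 std::vector<char>& payload)
{
    if (offset > stream.size() || stream.size() - offset < sizeof(int))
        return false;
    int nbBytes = 0;
    std::memcpy(&nbBytes, stream.data() + offset, sizeof(int));
    const std::size_t remaining = stream.size() - offset - sizeof(int);
    if (nbBytes < 0 || remaining < static_cast<std::size_t>(nbBytes))
        return false;
    const char* p = stream.data() + offset + sizeof(int);
    payload.assign(p, p + nbBytes);
    offset += sizeof(int) + static_cast<std::size_t>(nbBytes);
    return true;
}
```

Because the length is written in the machine's own byte order and `int` size, a stream produced this way is only readable on a machine with the same layout, which is one reason a real application would use a different packing.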


Reference code showing how Speex is wrapped in the open-source H.323 library (OpenH323):



/*

* speexcodec.cxx

*

* Speex codec handler

*

* Open H323 Library

*

* Copyright (c) 2002 Equivalence Pty. Ltd.

*

* The contents of this file are subject to the Mozilla Public License

* Version 1.0 (the "License"); you may not use this file except in

* compliance with the License. You may obtain a copy of the License at

* http://www.mozilla.org/MPL/
*

* Software distributed under the License is distributed on an "AS IS"

* basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See

* the License for the specific language governing rights and limitations

* under the License.

*

* The Original Code is Open H323 Library.

*

* The Initial Developer of the Original Code is Equivalence Pty. Ltd.

*

* Contributor(s): ______________________________________.

*

* $Log: speexcodec.cxx,v $

* Revision 1.20 2002/12/08 22:59:41 rogerh

* Add XiphSpeex codec. Not yet finished.

*

* Revision 1.19 2002/12/06 10:11:54 rogerh

* Back out the Xiph Speex changes on a tempoary basis while the Speex

* spec is being redrafted.

*

* Revision 1.18 2002/12/06 03:27:47 robertj

* Fixed MSVC warnings

*

* Revision 1.17 2002/12/05 12:57:17 rogerh

* Speex now uses the manufacturer ID assigned to Xiph.Org.

* To support existing applications using Speex, applications can use the

* EquivalenceSpeex capabilities.

*

* Revision 1.16 2002/11/25 10:24:50 craigs

* Fixed problem with Speex codec names causing mismatched capabilities

* Reported by Ben Lear

*

* Revision 1.15 2002/11/09 07:08:20 robertj

* Hide speex library from OPenH323 library users.

* Made public the media format names.

* Other cosmetic changes.

*

* Revision 1.14 2002/10/24 05:33:19 robertj

* MSVC compatibility

*

* Revision 1.13 2002/10/22 11:54:32 rogerh

* Fix including of speex.h

*

* Revision 1.12 2002/10/22 11:33:04 rogerh

* Use the local speex.h header file

*

* Revision 1.11 2002/10/09 10:55:21 rogerh

* Update the bit rates to match what the codec now does

*

* Revision 1.10 2002/09/02 21:58:40 rogerh

* Update for Speex 0.8.0

*

* Revision 1.9 2002/08/21 06:49:13 rogerh

* Fix the RTP Payload size too small problem with Speex 0.7.0.

*

* Revision 1.8 2002/08/15 18:34:51 rogerh

* Fix some more bugs

*

* Revision 1.7 2002/08/14 19:06:53 rogerh

* Fix some bugs when using the speex library

*

* Revision 1.6 2002/08/14 04:35:33 craigs

* CHanged Speex names to remove spaces

*

* Revision 1.5 2002/08/14 04:30:14 craigs

* Added bit rates to Speex codecs

*

* Revision 1.4 2002/08/14 04:27:26 craigs

* Fixed name of Speex codecs

*

* Revision 1.3 2002/08/14 04:24:43 craigs

* Fixed ifdef problem

*

* Revision 1.2 2002/08/13 14:25:25 craigs

* Added trailing newlines to avoid Linux warnings

*

* Revision 1.1 2002/08/13 14:14:59 craigs

* Initial version

*

*/

#include <ptlib.h>

#ifdef __GNUC__

#pragma implementation "speexcodec.h"

#endif

#include "speexcodec.h"

#include "h323caps.h"

#include "h245.h"

#include "rtp.h"

extern "C" {

#include "speex/libspeex/speex.h"

};

#define new PNEW

#define XIPH_COUNTRY_CODE 0xB5 // (181) Country code for United States

#define XIPH_T35EXTENSION 0

#define XIPH_MANUFACTURER_CODE 0x0026 // Allocated by Delta Inc

#define EQUIVALENCE_COUNTRY_CODE 9 // Country code for Australia

#define EQUIVALENCE_T35EXTENSION 0

#define EQUIVALENCE_MANUFACTURER_CODE 61 // Allocated by Australian Communications Authority, Oct 2000

#define SAMPLES_PER_FRAME 160

#define SPEEX_BASE_NAME "Speex"

#define SPEEX_NARROW2_H323_NAME SPEEX_BASE_NAME "Narrow-5.95k{sw}"

#define SPEEX_NARROW3_H323_NAME SPEEX_BASE_NAME "Narrow-8k{sw}"

#define SPEEX_NARROW4_H323_NAME SPEEX_BASE_NAME "Narrow-11k{sw}"

#define SPEEX_NARROW5_H323_NAME SPEEX_BASE_NAME "Narrow-15k{sw}"

#define SPEEX_NARROW6_H323_NAME SPEEX_BASE_NAME "Narrow-18.2k{sw}"

H323_REGISTER_CAPABILITY(SpeexNarrow2AudioCapability, SPEEX_NARROW2_H323_NAME);

H323_REGISTER_CAPABILITY(SpeexNarrow3AudioCapability, SPEEX_NARROW3_H323_NAME);

H323_REGISTER_CAPABILITY(SpeexNarrow4AudioCapability, SPEEX_NARROW4_H323_NAME);

H323_REGISTER_CAPABILITY(SpeexNarrow5AudioCapability, SPEEX_NARROW5_H323_NAME);

H323_REGISTER_CAPABILITY(SpeexNarrow6AudioCapability, SPEEX_NARROW6_H323_NAME);

#define XIPH_SPEEX_NARROW2_H323_NAME SPEEX_BASE_NAME "Narrow-5.95k(Xiph){sw}"

#define XIPH_SPEEX_NARROW3_H323_NAME SPEEX_BASE_NAME "Narrow-8k(Xiph){sw}"

#define XIPH_SPEEX_NARROW4_H323_NAME SPEEX_BASE_NAME "Narrow-11k(Xiph){sw}"

#define XIPH_SPEEX_NARROW5_H323_NAME SPEEX_BASE_NAME "Narrow-15k(Xiph){sw}"

#define XIPH_SPEEX_NARROW6_H323_NAME SPEEX_BASE_NAME "Narrow-18.2k(Xiph){sw}"

H323_REGISTER_CAPABILITY(XiphSpeexNarrow2AudioCapability, XIPH_SPEEX_NARROW2_H323_NAME);

H323_REGISTER_CAPABILITY(XiphSpeexNarrow3AudioCapability, XIPH_SPEEX_NARROW3_H323_NAME);

H323_REGISTER_CAPABILITY(XiphSpeexNarrow4AudioCapability, XIPH_SPEEX_NARROW4_H323_NAME);

H323_REGISTER_CAPABILITY(XiphSpeexNarrow5AudioCapability, XIPH_SPEEX_NARROW5_H323_NAME);

H323_REGISTER_CAPABILITY(XiphSpeexNarrow6AudioCapability, XIPH_SPEEX_NARROW6_H323_NAME);

/////////////////////////////////////////////////////////////////////////

static int Speex_Bits_Per_Second(int mode) {

void *tmp_coder_state;

int bitrate;

tmp_coder_state = speex_encoder_init(&speex_nb_mode);

speex_encoder_ctl(tmp_coder_state, SPEEX_SET_QUALITY, &mode);

speex_encoder_ctl(tmp_coder_state, SPEEX_GET_BITRATE, &bitrate);

speex_encoder_destroy(tmp_coder_state);

return bitrate;

}

static int Speex_Bytes_Per_Frame(int mode) {

int bits_per_frame = Speex_Bits_Per_Second(mode) / 50; // (20ms frame size)

return ((bits_per_frame+7)/8); // round up

}

OpalMediaFormat const OpalSpeexNarrow_5k95(OPAL_SPEEX_NARROW_5k95,

OpalMediaFormat::DefaultAudioSessionID,

RTP_DataFrame::DynamicBase,

TRUE, // Needs jitter

Speex_Bits_Per_Second(2),

Speex_Bytes_Per_Frame(2),

SAMPLES_PER_FRAME, // 20 milliseconds

OpalMediaFormat::AudioTimeUnits);

OpalMediaFormat const OpalSpeexNarrow_8k(OPAL_SPEEX_NARROW_8k,

OpalMediaFormat::DefaultAudioSessionID,

RTP_DataFrame::DynamicBase,

TRUE, // Needs jitter

Speex_Bits_Per_Second(3),

Speex_Bytes_Per_Frame(3),

SAMPLES_PER_FRAME, // 20 milliseconds

OpalMediaFormat::AudioTimeUnits);

OpalMediaFormat const OpalSpeexNarrow_11k(OPAL_SPEEX_NARROW_11k,

OpalMediaFormat::DefaultAudioSessionID,

RTP_DataFrame::DynamicBase,

TRUE, // Needs jitter

Speex_Bits_Per_Second(4),

Speex_Bytes_Per_Frame(4),

SAMPLES_PER_FRAME, // 20 milliseconds

OpalMediaFormat::AudioTimeUnits);

OpalMediaFormat const OpalSpeexNarrow_15k(OPAL_SPEEX_NARROW_15k,

OpalMediaFormat::DefaultAudioSessionID,

RTP_DataFrame::DynamicBase,

TRUE, // Needs jitter

Speex_Bits_Per_Second(5),

Speex_Bytes_Per_Frame(5),

SAMPLES_PER_FRAME, // 20 milliseconds

OpalMediaFormat::AudioTimeUnits);

OpalMediaFormat const OpalSpeexNarrow_18k2(OPAL_SPEEX_NARROW_18k2,

OpalMediaFormat::DefaultAudioSessionID,

RTP_DataFrame::DynamicBase,

TRUE, // Needs jitter

Speex_Bits_Per_Second(6),

Speex_Bytes_Per_Frame(6),

SAMPLES_PER_FRAME, // 20 milliseconds

OpalMediaFormat::AudioTimeUnits);

/////////////////////////////////////////////////////////////////////////

SpeexNonStandardAudioCapability::SpeexNonStandardAudioCapability(int mode)

: H323NonStandardAudioCapability(1, 1,

EQUIVALENCE_COUNTRY_CODE,

EQUIVALENCE_T35EXTENSION,

EQUIVALENCE_MANUFACTURER_CODE,

NULL, 0, 0, P_MAX_INDEX)

{

PStringStream s;

s << "Speex bs" << speex_nb_mode.bitstream_version << " Narrow" << mode;

PINDEX len = s.GetLength();

memcpy(nonStandardData.GetPointer(len), (const char *)s, len);

}

/////////////////////////////////////////////////////////////////////////

SpeexNarrow2AudioCapability::SpeexNarrow2AudioCapability()

: SpeexNonStandardAudioCapability(2)

{

}

PObject * SpeexNarrow2AudioCapability::Clone() const

{

return new SpeexNarrow2AudioCapability(*this);

}

PString SpeexNarrow2AudioCapability::GetFormatName() const

{

return SPEEX_NARROW2_H323_NAME;

}

H323Codec * SpeexNarrow2AudioCapability::CreateCodec(H323Codec::Direction direction) const

{

return new SpeexCodec(OpalSpeexNarrow_5k95, 2, direction);

}

/////////////////////////////////////////////////////////////////////////

SpeexNarrow3AudioCapability::SpeexNarrow3AudioCapability()

: SpeexNonStandardAudioCapability(3)

{

}

PObject * SpeexNarrow3AudioCapability::Clone() const

{

return new SpeexNarrow3AudioCapability(*this);

}

PString SpeexNarrow3AudioCapability::GetFormatName() const

{

return SPEEX_NARROW3_H323_NAME;

}

H323Codec * SpeexNarrow3AudioCapability::CreateCodec(H323Codec::Direction direction) const

{

return new SpeexCodec(OpalSpeexNarrow_8k, 3, direction);

}

/////////////////////////////////////////////////////////////////////////

SpeexNarrow4AudioCapability::SpeexNarrow4AudioCapability()

: SpeexNonStandardAudioCapability(4)

{

}

PObject * SpeexNarrow4AudioCapability::Clone() const

{

return new SpeexNarrow4AudioCapability(*this);

}

PString SpeexNarrow4AudioCapability::GetFormatName() const

{

return SPEEX_NARROW4_H323_NAME;

}

H323Codec * SpeexNarrow4AudioCapability::CreateCodec(H323Codec::Direction direction) const

{

return new SpeexCodec(OpalSpeexNarrow_11k, 4, direction);

}

/////////////////////////////////////////////////////////////////////////

SpeexNarrow5AudioCapability::SpeexNarrow5AudioCapability()

: SpeexNonStandardAudioCapability(5)

{

}

PObject * SpeexNarrow5AudioCapability::Clone() const

{

return new SpeexNarrow5AudioCapability(*this);

}

PString SpeexNarrow5AudioCapability::GetFormatName() const

{

return SPEEX_NARROW5_H323_NAME;

}

H323Codec * SpeexNarrow5AudioCapability::CreateCodec(H323Codec::Direction direction) const

{

return new SpeexCodec(OpalSpeexNarrow_15k, 5, direction);

}

/////////////////////////////////////////////////////////////////////////

SpeexNarrow6AudioCapability::SpeexNarrow6AudioCapability()

: SpeexNonStandardAudioCapability(6)

{

}

PObject * SpeexNarrow6AudioCapability::Clone() const

{

return new SpeexNarrow6AudioCapability(*this);

}

PString SpeexNarrow6AudioCapability::GetFormatName() const

{

return SPEEX_NARROW6_H323_NAME;

}

H323Codec * SpeexNarrow6AudioCapability::CreateCodec(H323Codec::Direction direction) const

{

return new SpeexCodec(OpalSpeexNarrow_18k2, 6, direction);

}

/////////////////////////////////////////////////////////////////////////

XiphSpeexNonStandardAudioCapability::XiphSpeexNonStandardAudioCapability(int mode)

: H323NonStandardAudioCapability(1, 1,

XIPH_COUNTRY_CODE,

XIPH_T35EXTENSION,

XIPH_MANUFACTURER_CODE,

NULL, 0, 0, P_MAX_INDEX)

{

// FIXME: To be replaced by an ASN defined block of data

PStringStream s;

s << "Speex bs" << speex_nb_mode.bitstream_version << " Narrow" << mode;

PINDEX len = s.GetLength();

memcpy(nonStandardData.GetPointer(len), (const char *)s, len);

}

/////////////////////////////////////////////////////////////////////////

XiphSpeexNarrow2AudioCapability::XiphSpeexNarrow2AudioCapability()

: XiphSpeexNonStandardAudioCapability(2)

{

}

PObject * XiphSpeexNarrow2AudioCapability::Clone() const

{

return new XiphSpeexNarrow2AudioCapability(*this);

}

PString XiphSpeexNarrow2AudioCapability::GetFormatName() const

{

return XIPH_SPEEX_NARROW2_H323_NAME;

}

H323Codec * XiphSpeexNarrow2AudioCapability::CreateCodec(H323Codec::Direction direction) const

{

return new SpeexCodec(OpalSpeexNarrow_5k95, 2, direction);

}

/////////////////////////////////////////////////////////////////////////

XiphSpeexNarrow3AudioCapability::XiphSpeexNarrow3AudioCapability()

: XiphSpeexNonStandardAudioCapability(3)

{

}

PObject * XiphSpeexNarrow3AudioCapability::Clone() const

{

return new XiphSpeexNarrow3AudioCapability(*this);

}

PString XiphSpeexNarrow3AudioCapability::GetFormatName() const

{

return XIPH_SPEEX_NARROW3_H323_NAME;

}

H323Codec * XiphSpeexNarrow3AudioCapability::CreateCodec(H323Codec::Direction direction) const

{

return new SpeexCodec(OpalSpeexNarrow_8k, 3, direction);

}

/////////////////////////////////////////////////////////////////////////

XiphSpeexNarrow4AudioCapability::XiphSpeexNarrow4AudioCapability()

: XiphSpeexNonStandardAudioCapability(4)

{

}

PObject * XiphSpeexNarrow4AudioCapability::Clone() const

{

return new XiphSpeexNarrow4AudioCapability(*this);

}

PString XiphSpeexNarrow4AudioCapability::GetFormatName() const

{

return XIPH_SPEEX_NARROW4_H323_NAME;

}

H323Codec * XiphSpeexNarrow4AudioCapability::CreateCodec(H323Codec::Direction direction) const

{

return new SpeexCodec(OpalSpeexNarrow_11k, 4, direction);

}

/////////////////////////////////////////////////////////////////////////

XiphSpeexNarrow5AudioCapability::XiphSpeexNarrow5AudioCapability()

: XiphSpeexNonStandardAudioCapability(5)

{

}

PObject * XiphSpeexNarrow5AudioCapability::Clone() const

{

return new XiphSpeexNarrow5AudioCapability(*this);

}

PString XiphSpeexNarrow5AudioCapability::GetFormatName() const

{

return XIPH_SPEEX_NARROW5_H323_NAME;

}

H323Codec * XiphSpeexNarrow5AudioCapability::CreateCodec(H323Codec::Direction direction) const

{

return new SpeexCodec(OpalSpeexNarrow_15k, 5, direction);

}

/////////////////////////////////////////////////////////////////////////

XiphSpeexNarrow6AudioCapability::XiphSpeexNarrow6AudioCapability()

: XiphSpeexNonStandardAudioCapability(6)

{

}

PObject * XiphSpeexNarrow6AudioCapability::Clone() const

{

return new XiphSpeexNarrow6AudioCapability(*this);

}

PString XiphSpeexNarrow6AudioCapability::GetFormatName() const

{

return XIPH_SPEEX_NARROW6_H323_NAME;

}

H323Codec * XiphSpeexNarrow6AudioCapability::CreateCodec(H323Codec::Direction direction) const

{

return new SpeexCodec(OpalSpeexNarrow_18k2, 6, direction);

}

/////////////////////////////////////////////////////////////////////////////

const float MaxSampleValue = 32767.0;

const float MinSampleValue = -32767.0;

SpeexCodec::SpeexCodec(const char * name, int mode, Direction dir)

: H323FramedAudioCodec(name, dir)

{

PTRACE(3, "Codec\tSpeex mode " << mode << " " << (dir == Encoder ? "en" : "de")

<< "coder created");

bits = new SpeexBits;

speex_bits_init(bits);

if (direction == Encoder) {

coder_state = speex_encoder_init(&speex_nb_mode);

speex_encoder_ctl(coder_state, SPEEX_GET_FRAME_SIZE, &encoder_frame_size);

speex_encoder_ctl(coder_state, SPEEX_SET_QUALITY, &mode);

} else {

coder_state = speex_decoder_init(&speex_nb_mode);

}

}

SpeexCodec::~SpeexCodec()

{

speex_bits_destroy(bits);

delete bits;

if (direction == Encoder)

speex_encoder_destroy(coder_state);

else

speex_decoder_destroy(coder_state);

}

BOOL SpeexCodec::EncodeFrame(BYTE * buffer, unsigned & length)

{

// convert PCM to float

float floatData[SAMPLES_PER_FRAME];

PINDEX i;

for (i = 0; i < SAMPLES_PER_FRAME; i++)

floatData[i] = sampleBuffer[i];

// encode PCM data in sampleBuffer to buffer

speex_bits_reset(bits);

speex_encode(coder_state, floatData, bits);

length = speex_bits_write(bits, (char *)buffer, encoder_frame_size);

return TRUE;

}

BOOL SpeexCodec::DecodeFrame(const BYTE * buffer, unsigned length, unsigned &)

{

float floatData[SAMPLES_PER_FRAME];

// decode Speex data to floats

speex_bits_read_from(bits, (char *)buffer, length);

speex_decode(coder_state, bits, floatData);

// convert float to PCM

PINDEX i;

for (i = 0; i < SAMPLES_PER_FRAME; i++) {

float sample = floatData[i];

if (sample < MinSampleValue)

sample = MinSampleValue;

else if (sample > MaxSampleValue)

sample = MaxSampleValue;

sampleBuffer[i] = (short)sample;

}

return TRUE;

}
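The rounding done by Speex_Bytes_Per_Frame in the listing above can be checked in isolation. The sketch below is a standalone restatement that takes the bit rate as an argument instead of querying the encoder; the function name and the `frame_ms` parameter are my own, and the bit rates in the comments are the nominal figures from the capability names in the listing.

```cpp
#include <cassert>

// Bytes needed for one frame at the given bit rate, rounded up to whole bytes.
// Hypothetical helper; 20 ms frames correspond to the "/ 50" in the listing.
int bytes_per_frame(int bits_per_second, int frame_ms = 20)
{
    int bits_per_frame = bits_per_second * frame_ms / 1000;
    return (bits_per_frame + 7) / 8;   // round up
}

// Nominal rates from the capability names:
//   5950 bit/s -> 119 bits -> 15 bytes
//   8000 bit/s -> 160 bits -> 20 bytes
//  15000 bit/s -> 300 bits -> 38 bytes
```

The 15 kbps case gives 38 bytes per 20 ms frame.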

Example of compressing single-byte char data with the API in VC++:

Encoding and decoding problem in speex 1.0.4

Subject:Encoding and decoding problem in speex 1.0.4
List-id:speex-dev.xiph.org
Hi,
I am using the speex 1.0.4 library from Windows. I have posted my problem before but didn't get a solution. I am doing a VoIP project in which I am recording sound and streaming it to the peer. I wanted to encode and decode wav files, which brought me to this site.
I am recording sound in the following format:-
m_WaveFormatEx.wFormatTag          = WAVE_FORMAT_PCM;
m_WaveFormatEx.nChannels           = 1;
m_WaveFormatEx.wBitsPerSample      = 8;
m_WaveFormatEx.cbSize              = 0;
m_WaveFormatEx.nSamplesPerSec      = 8000;
m_WaveFormatEx.nBlockAlign         = 1;
m_WaveFormatEx.nAvgBytesPerSec     = 8000;
The recording is as follows:-
When the buffer (size = 2000 bytes) gets filled with sound data, a function with the body shown below is called.
LPWAVEHDR lpHdr = (LPWAVEHDR) lParam;
if(lpHdr->dwBytesRecorded==0 || lpHdr==NULL)
return ERROR_SUCCESS;
::waveInUnprepareHeader(m_hRecord, lpHdr, sizeof(WAVEHDR));
Here lpHdr->lpData contains the audio data in a character array.
Now here I want to use the Speex codec for encoding the data, so the encoding function is called (I am thankful to Tay YueWeng for the function).
char *encode(char *buffer, int &encodeSize)
{
char   *encodedBuffer = new char[RECBUFFER/2];            /*
RECBUFFER = 2000 */
short   speexShort;
float  speexFloat[RECBUFFER/2];
void   *mEncode       = speex_encoder_init(&speex_nb_mode);
/*Initialization of the structure that holds the bits*/
speex_bits_init(&mBits);
// Convert the audio to a short then to a float buffer
int    halfBufferSize = RECBUFFER/2;
for (int i = 0; i < halfBufferSize; i++)
{
memcpy(&speexShort, &buffer[i*2], sizeof(short));
speexFloat[i]     = speexShort;
}
// Encode the sound data using the float buffer
speex_bits_reset(&mBits);
speex_encode(mEncode, speexFloat, &mBits);
encodeSize            = speex_bits_write(&mBits, encodedBuffer,
RECBUFFER/2);
/*Destroy the encoder state*/
speex_encoder_destroy(mEncode);
/*Destroy the bit-stream struct*/
speex_bits_destroy(&mBits);
// Return the encoded buffer
return encodedBuffer;
}
Here I noticed that though my captured audio data is 2000 bytes, the compressed form is always 38 bytes. In the speexFloat array above I get values in the range -32767 to +32767. Is that correct? Also, after calling the 'speex_encode' function, the first 160 values in the input float array, i.e. speexFloat, are changed (why does this happen? Is anything abnormal?).
Further, after calling the above function, for testing I decode the returned encoded data immediately by calling the decoding function shown below:-
char *decode (char *buffer, int encodeSize)
{
char *decodedBuffer   = new char[RECBUFFER];
short speexShort;
float speexFloat[RECBUFFER/2];
// Decode the sound data into a float buffer
void  *mDecode        = speex_decoder_init(&speex_nb_mode);
/*Initialization of the structure that holds the bits*/
speex_bits_init(&mBits);
int    halfBufferSize = RECBUFFER/2;
speex_bits_reset(&mBits);
speex_bits_read_from(&mBits, buffer, encodeSize);
speex_decode(mDecode, &mBits, speexFloat);
// Convert from float to short to char
for (int i = 0; i < halfBufferSize; i++)
{
speexShort = speexFloat[i];
memcpy(&decodedBuffer[i*2], &speexShort, sizeof(short));
}
/*Destroy the decoder state*/
speex_encoder_destroy(mDecode);
/*Destroy the bit-stream truct*/
speex_bits_destroy(&mBits);
// Return the buffer
return decodedBuffer;
}
After decoding using the above function, only the first 160 values in the decodedBuffer array are changed, i.e. I encoded 2000 bytes of audio data to get 38 bytes of encoded audio data. On decoding the 38 bytes of audio data I get 160 bytes of decompressed data. I don't understand what's going wrong. I checked all the messages posted in this newsgroup and didn't find an answer, so I am posting this code hoping that it gets solved soon. Thanks in advance.
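Reading the code in the post, each function calls speex_encode or speex_decode once, and one call handles one narrowband frame of 160 samples, which would account for the single 38-byte packet and the 160 values that change. The recording format is also 8 bits per sample, while the loop copies the bytes as if they were 16-bit samples. A minimal sketch of the preparation step, assuming 8-bit unsigned PCM input as in the WAVEFORMATEX shown; `pcm8_to_pcm16` and `split_into_frames` are hypothetical names, and the Speex calls themselves are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Convert 8-bit unsigned PCM (128 = silence) to 16-bit signed PCM.
std::vector<short> pcm8_to_pcm16(const std::vector<unsigned char>& in)
{
    std::vector<short> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = static_cast<short>((static_cast<int>(in[i]) - 128) * 256);
    return out;
}

// Cut the samples into whole frames; a trailing partial frame is not returned
// and would be carried over to the next buffer by the caller.
std::vector<std::vector<short>> split_into_frames(const std::vector<short>& samples,
                                                  std::size_t frame_size)
{
    std::vector<std::vector<short>> frames;
    if (frame_size == 0)
        return frames;
    for (std::size_t start = 0; start + frame_size <= samples.size();
         start += frame_size)
        frames.emplace_back(samples.begin() + static_cast<std::ptrdiff_t>(start),
                            samples.begin() + static_cast<std::ptrdiff_t>(start + frame_size));
    return frames;
}
```

A 2000-byte buffer in this format holds 2000 samples, that is 12 full frames of 160 samples with 80 samples left over, so the encoder would be called once per frame, 12 times, with the encoder state kept alive between calls rather than created and destroyed for every buffer.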
