
Reading text in UTF-8 and other encodings

2007-11-01 10:14
int MultiByteToWideChar(
    UINT   CodePage,        // code page
    DWORD  dwFlags,         // character-type options
    LPCSTR lpMultiByteStr,  // string to map
    int    cbMultiByte,     // number of bytes in string
    LPWSTR lpWideCharStr,   // wide-character buffer
    int    cchWideChar      // size of buffer
);

CodePage: the encoding (code page) of the multibyte input text

CP_ACP          ANSI code page
CP_MACCP        Macintosh code page
CP_OEMCP        OEM code page
CP_SYMBOL       Windows 2000/XP: Symbol code page (42)
CP_THREAD_ACP   Windows 2000/XP: The current thread's ANSI code page
CP_UTF7         Windows 98/Me, Windows NT 4.0 and later: Translate using UTF-7
CP_UTF8         Windows 98/Me, Windows NT 4.0 and later: Translate using UTF-8
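dwFlags: usually 0, which selects the default behavior (the same as MB_PRECOMPOSED). For CP_UTF8 it must be 0 or MB_ERR_INVALID_CHARS.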

MB_PRECOMPOSED: Always use precomposed characters, that is, characters in which a base character and a nonspacing character have a single character value. This is the default translation option. Cannot be used with MB_COMPOSITE.
MB_COMPOSITE: Always use composite characters, that is, characters in which a base character and a nonspacing character have different character values. Cannot be used with MB_PRECOMPOSED.
MB_ERR_INVALID_CHARS: If the function encounters an invalid input character, it fails and GetLastError returns ERROR_NO_UNICODE_TRANSLATION.
MB_USEGLYPHCHARS: Use glyph characters instead of control characters.
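When cbMultiByte is -1, lpWideCharStr is NULL and cchWideChar is 0, the function converts nothing and returns the minimum length the converted string needs, counted in wide characters and including the terminating null.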

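CFile newFOXFile;
CString newFOXStrLog; // accumulated result
try
{
    char strBuffle[256]; // read buffer
    if (newFOXFile.Open(m_NewFOXPath.GetString(), CFile::modeRead)) // open the text file
    {
        UINT nRead;
        // Read at most sizeof-1 bytes so there is always room for a terminating 0.
        // NOTE: a UTF-8 sequence split across two reads will not convert correctly;
        // read the whole file at once, or carry the incomplete tail over to the next read.
        while ((nRead = newFOXFile.Read(strBuffle, sizeof(strBuffle) - 1)) > 0)
        {
            strBuffle[nRead] = '\0'; // terminate exactly what was read
            // First call: size needed for the wide string, in wchar_t, including the null
            int nMinSize = MultiByteToWideChar(CP_UTF8, 0, strBuffle, -1, NULL, 0);
            if (nMinSize <= 0) // conversion failed, skip this chunk
                continue;
            wchar_t *wstrBuffle = new wchar_t[nMinSize];
            // Second call: the actual conversion
            MultiByteToWideChar(CP_UTF8, 0, strBuffle, -1, wstrBuffle, nMinSize);
            newFOXStrLog += CString(wstrBuffle); // append to the result string
            delete[] wstrBuffle; // allocated with new[], so release with delete[]
        }
        newFOXFile.Close();
    }
}
catch (CFileException *e)
{
    e->Delete(); // CFile::Read reports errors by throwing CFileException*
}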
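Reading a UTF-8 file in fixed-size chunks can cut a multibyte character in half at the end of the buffer. The following is a minimal sketch of one way to find how many leading bytes of a chunk are safe to convert. CompleteUtf8Prefix is a hypothetical helper name, not part of Win32 or MFC, and it only inspects the lead/continuation bit patterns rather than fully validating the input.

```cpp
#include <cassert>
#include <cstddef>

// Returns how many leading bytes of buf[0..len) consist only of complete
// UTF-8 sequences. The remaining bytes (at most 3) are the start of a
// sequence cut off by the read and should be carried to the next chunk.
// Hypothetical helper: malformed input is passed through unchanged.
std::size_t CompleteUtf8Prefix(const char *buf, std::size_t len)
{
    assert(buf != nullptr || len == 0);
    std::size_t i = len;
    std::size_t back = 0;
    // Step back over trailing continuation bytes (10xxxxxx), at most 3.
    while (i > 0 && back < 3 &&
           (static_cast<unsigned char>(buf[i - 1]) & 0xC0) == 0x80)
    {
        --i;
        ++back;
    }
    if (i == 0)
        return len; // no lead byte in sight: nothing sensible to hold back

    const unsigned char lead = static_cast<unsigned char>(buf[i - 1]);
    std::size_t need;
    if (lead < 0x80)                need = 1; // ASCII
    else if ((lead & 0xE0) == 0xC0) need = 2; // 110xxxxx
    else if ((lead & 0xF0) == 0xE0) need = 3; // 1110xxxx
    else if ((lead & 0xF8) == 0xF0) need = 4; // 11110xxx
    else return len;                          // invalid lead byte

    const std::size_t have = back + 1; // lead byte plus continuation bytes seen
    return (have >= need) ? len : i - 1;
}
```

In a read loop like the one above, one option is to convert only the first n = CompleteUtf8Prefix(strBuffle, nRead) bytes, passing n as cbMultiByte instead of -1, and to move the remaining nRead - n bytes to the front of the buffer before the next Read. With an explicit byte count the output is not null-terminated automatically, so the terminator has to be added by hand.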

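Both MultiByteToWideChar and WideCharToMultiByte can report the length of the converted string. For MultiByteToWideChar, when cchWideChar is 0 the return value is the required length in wide characters; for WideCharToMultiByte, when cbMultiByte is 0 the return value is the required length in bytes. So when the converted length is not known in advance, the usual pattern is to call the function twice: the first call gets the length, the second does the actual conversion.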
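To make the meaning of the returned length concrete, here is a small portable sketch that counts the wide characters a UTF-8 string needs. Utf16UnitsForUtf8 is a hypothetical name, and the sketch assumes well-formed, null-terminated UTF-8 and a 16-bit wchar_t (UTF-16), as on Windows. Under those assumptions the result should match what the sizing call with cbMultiByte = -1 and cchWideChar = 0 returns, though it is an illustration and not a replacement for that call.

```cpp
#include <cassert>
#include <cstddef>

// Counts the UTF-16 code units needed to hold a well-formed, null-terminated
// UTF-8 string, including the terminating null.
// Hypothetical helper for illustration; it does not validate the input.
std::size_t Utf16UnitsForUtf8(const char *utf8)
{
    assert(utf8 != nullptr);
    std::size_t units = 1; // the terminating null
    for (const unsigned char *p = reinterpret_cast<const unsigned char *>(utf8);
         *p != 0; ++p)
    {
        if ((*p & 0xC0) == 0x80)
            continue; // continuation byte: its character was already counted
        // Lead bytes 0xF0 and above start 4-byte sequences, which become a
        // surrogate pair (2 units); every other character takes 1 unit.
        units += (*p >= 0xF0) ? 2 : 1;
    }
    return units;
}
```

One consequence is that the count never exceeds the number of input bytes plus one, so a wide buffer with as many elements as the null-terminated UTF-8 input has bytes is always large enough.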

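Code for converting wide characters to multibyte characters:
wchar_t wText[20] = L"宽字符转换实例!OK!";
// First call: required size in bytes, including the terminating null
int nNum = WideCharToMultiByte(CP_OEMCP, 0, wText, -1, NULL, 0, NULL, NULL);
if (nNum > 0)
{
    char *psText = new char[nNum]; // new throws on failure, so no NULL check is needed
    // Second call: the actual conversion
    WideCharToMultiByte(CP_OEMCP, 0, wText, -1, psText, nNum, NULL, NULL);
    // ... use psText here ...
    delete[] psText;
}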

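Garbled output from MultiByteToWideChar()
Under the emulator of the standard WinCE 4.2 or WinCE 5.0 SDK, this function does not work correctly: every converted character comes out garbled, and changing the arguments passed to MultiByteToWideChar() makes no difference.
This is not a problem in the code, however. The crux is the customized operating system image: if the default language of the OS image we build is not Chinese, the same thing happens. Because the default language of the standard SDK is English, the problem is bound to appear there. It cannot be solved simply by changing the "default language" under "Regional Options" in Control Panel; the default language has to be set to "Chinese" when the OS image is built.
The default language is chosen at build time under:
Platform -> Setting... -> locale -> default language; select "Chinese", then rebuild.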