C++: Saving a File in UTF-8 Encoding
2013-10-26 21:02
Preface
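This is my first translated article and the translation may be rough, so the original English text is kept alongside it for easier reading.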
I came across the article through the post 保存文件为UTF8格式 (Writing UTF-8 files in C++) on 天堂大鸟's programming blog.
Original English address: http://mariusbancila.ro/blog/2008/10/20/writing-utf-8-files-in-c/
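I have recently been using tinyxml to save XML and found that it does not convert the file to UTF-8, so any Chinese text comes out garbled. I searched on Baidu and found some related material. "Writing UTF-8 files in C++" seemed clear and easy to read, so I translated it.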
Main text
Let's say you need to write an XML file with this content:
    <?xml version="1.0" encoding="UTF-8"?>
    <root description="this is a naïve example">
    </root>
How do we write that in C++?
At first glance, you might be tempted to write it like this:
    #include <fstream>
    #include <string>

    int main()
    {
        std::ofstream testFile;
        testFile.open("demo.xml", std::ios::out | std::ios::binary);

        // Narrow literal: the bytes written are whatever the compiler
        // stored for this string, which is not necessarily UTF-8.
        std::string text =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            "<root description=\"this is a naïve example\">\n</root>";

        testFile << text;
        testFile.close();

        return 0;
    }
When you open the file in IE, for instance, surprise! It is not rendered correctly.
So you might be tempted to say "let's switch to wstring and wofstream".
    #include <fstream>
    #include <string>

    int main()
    {
        std::wofstream testFile;
        testFile.open("demo.xml", std::ios::out | std::ios::binary);

        // Wide literal written through a wide stream.
        std::wstring text =
            L"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            L"<root description=\"this is a naïve example\">\n</root>";

        testFile << text;
        testFile.close();

        return 0;
    }
When you run it and open the file again, nothing has changed. So where is the problem? The problem is that neither ofstream nor wofstream writes the text in UTF-8. If you want the file to really be UTF-8, you have to encode the output buffer as UTF-8 yourself. To do that we can use WideCharToMultiByte(). This Windows API maps a wide-character string to a new character string (which is not necessarily from a multibyte character set). Its first argument indicates the code page; for UTF-8 we need to specify CP_UTF8.
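To make the difference concrete, here is a minimal sketch of a structural UTF-8 check. It is not part of the original article: the name looks_like_utf8 is hypothetical, and the example assumes the first program stored ï as the single byte 0xEF, as a single-byte code page such as Windows-1252 would. The check is simplified and does not reject overlong forms or surrogate code points.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original article): returns true when
// every lead byte is followed by the right number of 10xxxxxx continuation
// bytes. Simplified: overlong forms and surrogates are not rejected.
bool looks_like_utf8(const std::string& bytes)
{
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;                        // continuation bytes expected
        if (lead < 0x80)                extra = 0;    // 0xxxxxxx (ASCII)
        else if ((lead & 0xE0) == 0xC0) extra = 1;    // 110xxxxx
        else if ((lead & 0xF0) == 0xE0) extra = 2;    // 1110xxxx
        else if ((lead & 0xF8) == 0xF0) extra = 3;    // 11110xxx
        else return false;                            // stray continuation or invalid lead

        if (extra > bytes.size() - i - 1) return false;  // sequence is truncated

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false;     // not a continuation byte
        }
        i += extra + 1;
    }
    return true;
}
```

Under that assumption, the bytes "na\xEFve" fail the check, because 0xEF announces a three-byte sequence but is followed by plain ASCII. The UTF-8 form "na\xC3\xAFve" passes, since U+00EF is encoded as the two bytes 0xC3 0xAF.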
The following helper functions encode a std::wstring as UTF-8 and return the result in a std::string.
    #include <windows.h>
    #include <string>

    // Encode `len` wide characters starting at `buffer` as UTF-8.
    std::string to_utf8(const wchar_t* buffer, int len)
    {
        // First call: ask how many bytes the UTF-8 output needs.
        int nChars = ::WideCharToMultiByte(
            CP_UTF8, 0, buffer, len, NULL, 0, NULL, NULL);
        if (nChars == 0) return "";

        std::string newbuffer;
        newbuffer.resize(nChars);

        // Second call: convert into the correctly sized buffer.
        ::WideCharToMultiByte(
            CP_UTF8, 0, buffer, len,
            &newbuffer[0], nChars, NULL, NULL);

        return newbuffer;
    }

    std::string to_utf8(const std::wstring& str)
    {
        return to_utf8(str.c_str(), (int)str.size());
    }
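For readers who are not on Windows, the sketch below shows roughly what such a conversion does at the byte level. It is an illustration rather than the article's method: the names append_utf8 and to_utf8_portable are hypothetical, the input is taken as UTF-32 (std::u32string) instead of a wide string, and it assumes every element is a valid Unicode scalar value (no surrogates, nothing above U+10FFFF).

```cpp
#include <string>

// Hypothetical helper (not from the original article): appends the UTF-8
// encoding of one code point to `out`. Assumes `cp` is a valid scalar value.
void append_utf8(std::string& out, char32_t cp)
{
    if (cp < 0x80) {                 // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 4 bytes: 11110xxx followed by three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical portable counterpart to to_utf8(): encodes UTF-32 text.
std::string to_utf8_portable(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text) {
        append_utf8(out, cp);
    }
    return out;
}
```

With this sketch, to_utf8_portable(U"na\u00EFve") yields the bytes "na\xC3\xAFve": the ï (U+00EF) becomes the two-byte sequence 0xC3 0xAF, and the ASCII letters are unchanged.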
With that in hand, all you have to do is make the following changes:
    // Uses to_utf8() from the helper above; also needs <fstream> and <string>.
    int main()
    {
        std::ofstream testFile;
        testFile.open("demo.xml", std::ios::out | std::ios::binary);

        std::wstring text =
            L"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            L"<root description=\"this is a naïve example\">\n</root>";

        // Convert to UTF-8 bytes before writing through the narrow stream.
        std::string outtext = to_utf8(text);

        testFile << outtext;
        testFile.close();

        return 0;
    }
Now when you open the file, you get what you wanted in the first place.
And that is all!