
Saving an XML file in UTF-8 format (Writing UTF-8 files in C++)

2015-12-29 14:44
Let’s say you need to write an XML file with this content:
<?xml version="1.0" encoding="UTF-8"?>
<root description="this is a naïve example">
</root>


How do we write that in C++?

At first glance, you might be tempted to write it like this:
#include <fstream>
#include <string>

int main()
{
    std::ofstream testFile;

    testFile.open("demo.xml", std::ios::out | std::ios::binary);

    // Narrow literal: the bytes stored for "ï" depend on the compiler's
    // execution character set, which is not necessarily UTF-8.
    std::string text =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        "<root description=\"this is a naïve example\">\n</root>";

    testFile << text;

    testFile.close();

    return 0;
}


When you open the file in Internet Explorer, for instance, surprise! It's not rendered correctly.
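Why does it fail? A likely explanation, assuming MSVC on a Western-European Windows system: a narrow literal such as "naïve" is stored in the compiler's execution character set, typically code page 1252, where ï is the single byte 0xEF. In UTF-8 the same character takes two bytes, 0xC3 0xAF, and 0xEF followed by `v` is not a valid UTF-8 sequence, so a parser that trusts `encoding="UTF-8"` cannot decode the document. The sketch below performs roughly the well-formedness check such a parser applies; `is_valid_utf8` is a hypothetical helper, not a library function.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `bytes` is well-formed UTF-8.
// Rejects stray continuation bytes, truncated sequences, overlong forms,
// UTF-16 surrogates (U+D800..U+DFFF) and values above U+10FFFF.
bool is_valid_utf8(const std::string& bytes)
{
    // Smallest code point allowed for 1-, 2-, 3- and 4-byte sequences,
    // indexed by the number of continuation bytes.
    static const char32_t min_for_len[4] = {0x0, 0x80, 0x800, 0x10000};

    std::size_t i = 0;
    const std::size_t n = bytes.size();
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra;  // continuation bytes expected after `lead`
        char32_t cp;        // code point being decoded

        if (lead < 0x80) {  // plain ASCII
            ++i;
            continue;
        } else if ((lead & 0xE0) == 0xC0) {
            extra = 1;
            cp = lead & 0x1F;
        } else if ((lead & 0xF0) == 0xE0) {
            extra = 2;
            cp = lead & 0x0F;
        } else if ((lead & 0xF8) == 0xF0) {
            extra = 3;
            cp = lead & 0x07;
        } else {
            return false;   // continuation byte or 0xF8..0xFF used as a lead byte
        }

        if (i + extra >= n) return false;  // sequence cut off at the end

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not of the form 10xxxxxx
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min_for_len[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return false;   // overlong, out of range, or a surrogate

        i += extra + 1;
    }
    return true;
}
```

For example, `is_valid_utf8("na\xEFve")` is false (the code page 1252 bytes), while `is_valid_utf8("na\xC3\xAFve")` is true (the UTF-8 bytes).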



So you could be tempted to say "let's switch to wstring and wofstream".
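#include <fstream>
#include <string>

int main()
{
    std::wofstream testFile;

    testFile.open("demo.xml", std::ios::out | std::ios::binary);

    std::wstring text =
        L"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        L"<root description=\"this is a naïve example\">\n</root>";

    // wofstream converts wide characters through its locale's codecvt
    // facet; with the default "C" locale this does not produce UTF-8.
    testFile << text;

    testFile.close();

    return 0;
}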


And when you run it and open the file again, nothing has changed. So where is the problem? Neither ofstream nor wofstream writes the text in UTF-8 by default: ofstream writes the bytes of the narrow string unchanged, and wofstream converts wide characters according to the stream's locale, which (with the default "C" locale) is not UTF-8. If you want the file to really be in UTF-8, you have to encode the output buffer in UTF-8 yourself. On Windows we can do that with WideCharToMultiByte(). This Windows API maps a wide-character (UTF-16) string to a new character string, which is not necessarily from a multibyte character set. The first argument indicates the code page; for UTF-8 we need to specify CP_UTF8.
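As a worked example of the encoding that CP_UTF8 asks for, this is how the standard UTF-8 bit layout (RFC 3629) turns the only non-ASCII character in the sample, ï (U+00EF), into two bytes; in code page 1252 the same character is the single byte 0xEF.

```latex
\begin{aligned}
\texttt{U+00EF} &= 1110\,1111_2, \qquad \texttt{0x80} \le \texttt{0xEF} < \texttt{0x800} \;\Rightarrow\; \text{two-byte form } 110xxxxx\;10xxxxxx \\
b_1 &= \texttt{0xC0} \mathbin{|} (\texttt{0xEF} \gg 6) = \texttt{0xC0} \mathbin{|} \texttt{0x03} = \texttt{0xC3} \\
b_2 &= \texttt{0x80} \mathbin{|} (\texttt{0xEF} \mathbin{\&} \texttt{0x3F}) = \texttt{0x80} \mathbin{|} \texttt{0x2F} = \texttt{0xAF}
\end{aligned}
```

So "naïve" must be written as the bytes `6E 61 C3 AF 76 65` in a UTF-8 file.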

The following helper functions encode a std::wstring as UTF-8 and return the resulting bytes in a std::string.
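#include <windows.h>
#include <string>

std::string to_utf8(const wchar_t* buffer, int len)
{
    // First call: ask how many bytes the UTF-8 output needs.
    // For CP_UTF8 the last two arguments must be NULL.
    int nChars = ::WideCharToMultiByte(
        CP_UTF8,
        0,
        buffer,
        len,
        NULL,
        0,
        NULL,
        NULL);
    if (nChars == 0) return "";

    // Second call: convert into a buffer of exactly that size.
    std::string newbuffer;
    newbuffer.resize(nChars);
    ::WideCharToMultiByte(
        CP_UTF8,
        0,
        buffer,
        len,
        &newbuffer[0],   // writable storage; c_str() must not be written to
        nChars,
        NULL,
        NULL);

    return newbuffer;
}

std::string to_utf8(const std::wstring& str)
{
    return to_utf8(str.c_str(), (int)str.size());
}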
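WideCharToMultiByte() only exists on Windows. As a hedged, portable sketch, the same conversion can be written by hand; this swaps the Windows API for a hand-written encoder. `to_utf8_portable` and `append_utf8` are hypothetical names, and the code assumes `wchar_t` holds UTF-16 code units when it is 2 bytes wide (as on Windows) and UTF-32 code points when it is 4 bytes wide (as on typical Linux and macOS toolchains). Replacing ill-formed input with U+FFFD is also an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Appends the UTF-8 encoding of one Unicode scalar value to `out`.
static void append_utf8(std::string& out, char32_t cp)
{
    // Precondition: callers have already replaced surrogates and
    // out-of-range values, so cp is a valid scalar value.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x80) {                         // 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {                 // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {               // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                                 // 11110xxx + three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical portable counterpart of to_utf8(const std::wstring&).
std::string to_utf8_portable(const std::wstring& str)
{
    const char32_t replacement = 0xFFFD;     // U+FFFD REPLACEMENT CHARACTER
    std::string out;
    out.reserve(str.size());                 // at least one byte per unit
    for (std::size_t i = 0; i < str.size(); ++i) {
        char32_t cp = static_cast<char32_t>(str[i]);
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF) {
            // UTF-16 high surrogate: combine it with the following low one.
            if (i + 1 < str.size()) {
                const char32_t lo = static_cast<char32_t>(str[i + 1]);
                if (lo >= 0xDC00 && lo <= 0xDFFF) {
                    cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                    ++i;
                    append_utf8(out, cp);
                    continue;
                }
            }
            cp = replacement;                // unpaired high surrogate
        }
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            cp = replacement;                // lone surrogate or out of range
        append_utf8(out, cp);
    }
    return out;
}
```

For example, `to_utf8_portable(L"na\u00EFve")` returns the bytes `"na\xC3\xAFve"`. The standard library's `std::wstring_convert` with `std::codecvt_utf8` could also do this, but both are deprecated since C++17.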


With that in hand, all you have to do is make the following changes:
#include <fstream>
#include <string>

// to_utf8() is the helper defined above.

int main()
{
    std::ofstream testFile;

    testFile.open("demo.xml", std::ios::out | std::ios::binary);

    std::wstring text =
        L"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        L"<root description=\"this is a naïve example\">\n</root>";

    // Encode to UTF-8 first, then write the resulting bytes unchanged.
    std::string outtext = to_utf8(text);

    testFile << outtext;

    testFile.close();

    return 0;
}


And now when you open the file, you get what you wanted in the first place.



And that is all!