[C] Format specifiers for wchar_t (VC, BCB, GCC, and the C99 standard)
2012-07-30 18:12
Author: zyl910
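Since wchar_t was introduced into C, string handling has become more complicated. String output, for example, now has both printf and wprintf, so when the arguments include both char strings and wchar_t strings, which format specifiers should be used? This article looks into that question.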
1. Consulting the documentation
First, look at what each compiler's documentation and the C99 standard say about format specifiers.
1.1 VC documentation
MSDN describes the format strings of printf and wprintf in "Format Specification Fields: printf and wprintf Functions" (http://msdn.microsoft.com/en-us/library/56e442dc(v=vs.110).aspx). Excerpt:
A format specification, which consists of optional and required fields, has the following form:
% [flags] [width] [.precision] [{h | l | ll | I | I32 | I64}]type
Clicking "type" first leads to the "printf Type Field Characters" page (http://msdn.microsoft.com/en-us/library/hf4y5e3w(v=vs.110).aspx). Excerpt:
printf Type Field Characters
Character | Type | Output format |
c | int or wint_t | When used with printf functions, specifies a single-byte character; when used with wprintf functions, specifies a wide character. |
C | int or wint_t | When used with printf functions, specifies a wide character; when used with wprintf functions, specifies a single-byte character. |
s | String | When used with printf functions, specifies a single-byte-character string; when used with wprintf functions, specifies a wide-character string. Characters are displayed up to the first null character or until the precision value is reached. |
S | String | When used with printf functions, specifies a wide-character string; when used with wprintf functions, specifies a single-byte-character string. Characters are displayed up to the first null character or until the precision value is reached. |
Going back and following the "Size Specification" link (http://msdn.microsoft.com/en-us/library/tcxf1dw6(v=vs.110).aspx) gives the following. Excerpt:
To specify | Use prefix | With type specifier |
Single-byte character with printf functions | h | c or C |
Single-byte character with wprintf functions | h | c or C |
Wide character with printf functions | l | c or C |
Wide character with wprintf functions | l | c or C |
Single-byte-character string with printf functions | h | s or S |
Single-byte-character string with wprintf functions | h | s or S |
Wide-character string with printf functions | l | s or S |
Wide-character string with wprintf functions | l | s or S |
Wide character | w | c |
Wide-character string | w | s |
Thus to print single-byte or wide-characters with printf functions and wprintf functions, use format specifiers as follows.
To print character as | Use function | With format specifier |
single byte | printf | c, hc, or hC |
single byte | wprintf | C, hc, or hC |
wide | wprintf | c, lc, lC, or wc |
wide | printf | C, lc, lC, or wc |
To print strings with printf functions and wprintf functions, use the prefixes h and l analogously with format type-specifiers s and S.
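That is a lot of specifiers. Sorting through them, the three most useful for strings are:
hs: a char string in both printf and wprintf.
ls: a wchar_t string in both printf and wprintf.
s: a char string in printf but a wchar_t string in wprintf, which makes it convenient to pair with TCHAR.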
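1.2 BCB documentation
Opening "C Runtime Library Reference" in the BCB6 help file and typing "printf" into the index quickly leads to the description of the format specifiers. It turns out to be compatible with VC: hs, ls, and s handle char, wchar_t, and TCHAR strings respectively.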
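1.3 GCC documentation
My machine runs Fedora 17 with GCC 4.7.0. Running "man 3 wprintf" in a terminal shows the documentation for wprintf. Note that this man page describes the C library used on Linux rather than the compiler itself. Excerpt: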
c
If no l modifier is present, the int argument is converted to a wide character by a call to the btowc(3) function, and the resulting wide character is written. If an l modifier is present, the wint_t (wide character) argument is written.
s
If no l modifier is present: The const char * argument is expected to be a pointer to an array of character type (pointer to a string) containing a multibyte character sequence beginning in the initial shift state. Characters from the array are converted to wide characters (each by a call to the mbrtowc(3) function with a conversion state starting in the initial state before the first byte). The resulting wide characters are written up to (but not including) the terminating null wide character. If a precision is specified, no more wide characters than the number specified are written. Note that the precision determines the number of wide characters written, not the number of bytes or screen positions. The array must contain a terminating null byte, unless a precision is given and it is so small that the number of converted wide characters reaches it before the end of the array is reached.
If an l modifier is present: The const wchar_t * argument is expected to be a pointer to an array of wide characters. Wide characters from the array are written up to (but not including) a terminating null wide character. If a precision is specified, no more than the number specified are written. The array must contain a terminating null wide character, unless a precision is given and it is smaller than or equal to the number of wide characters in the array.
According to this description, GCC appears to support only two format specifiers for strings:
s: a char string in both printf and wprintf.
ls: a wchar_t string in both printf and wprintf.
1.4 The C99 standard
Section "7.24.2.1 The fwprintf function" of the C99 standard describes the format specifiers of fwprintf and the other wide-character functions. Excerpt:
7 The length modifiers and their meanings are:
h
Specifies that a following d, i, o, u, x, or X conversion specifier applies to a short int or unsigned short int argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to short int or unsigned short int before printing); or that a following n conversion specifier applies to a pointer to a short int argument.
l (ell)
Specifies that a following d, i, o, u, x, or X conversion specifier applies to a long int or unsigned long int argument; that a following n conversion specifier applies to a pointer to a long int argument; that a following c conversion specifier applies to a wint_t argument; that a following s conversion specifier applies to a pointer to a wchar_t argument; or has no effect on a following a, A, e, E, f, F, g, or G conversion specifier.
……
8 The conversion specifiers and their meanings are:
c
If no l length modifier is present, the int argument is converted to a wide character as if by calling btowc and the resulting wide character is written.
If an l length modifier is present, the wint_t argument is converted to wchar_t and written.
s
If no l length modifier is present, the argument shall be a pointer to the initial element of a character array containing a multibyte character sequence beginning in the initial shift state. Characters from the array are converted as if by repeated calls to the mbrtowc function, with the conversion state described by an mbstate_t object initialized to zero before the first multibyte character is converted, and written up to (but not including) the terminating null wide character. If the precision is specified, no more than that many wide characters are written. If the precision is not specified or is greater than the size of the converted array, the converted array shall contain a null wide character.
If an l length modifier is present, the argument shall be a pointer to the initial element of an array of wchar_t type. Wide characters from the array are written up to (but not including) a terminating null wide character. If the precision is specified, no more than that many wide characters are written. If the precision is not specified or is greater than the size of the array, the array shall contain a null wide character.
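So in the C99 standard, c and s accept only the "l" length modifier: without "l" the argument is a char character or string, and with "l" it is a wide character or wchar_t string. The same paragraph of the standard states that combining a length modifier with a conversion specifier in any way not listed there is undefined behavior.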
1.5 Summary
The material above can be organized into a table:
Specifier | VC and BCB: printf | VC and BCB: wprintf | GCC and C99: printf | GCC and C99: wprintf |
s | char | wchar_t | char | char |
S | wchar_t | char | * | * |
hs | char | char | * | * |
ls | wchar_t | wchar_t | wchar_t | wchar_t |
*: undefined.
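2. Test program
With the documentation in hand, I think it is worth writing a test program to check how well each compiler actually supports the wchar_t format specifiers. The code of the test program is as follows:
#include <stdio.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

const char* psa = "CHAR";       // Single-byte string.
const wchar_t* psw = L"WCHAR";  // Wide string.
const wchar_t* pst = L"TCHAR";  // String whose type matches printf/wprintf (wide here, because wprintf is used).

int main(void)
{
    setlocale(LC_ALL, "");  // Use the system's current code page.

    // test
    wprintf(L"A:\t%hs\n", psa);
    wprintf(L"W:\t%ls\n", psw);
    wprintf(L"T:\t%s\n", pst);

    return 0;
}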
If everything works, the program should output:
A: CHAR
W: WCHAR
T: TCHAR
3. Test results
3.1 VC6 and BCB6
As expected, both VC6 and BCB6 produce the correct output:
A: CHAR
W: WCHAR
T: TCHAR
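3.2 GCC on Fedora
With GCC 4.7.0 on Fedora 17, the third line of output is wrong, which is easy to understand: both the GCC documentation and the C99 standard specify that s without "l" means a char string, whereas pst is actually a wchar_t string.
The correct output of the first line is more puzzling at first, since neither the GCC documentation nor the C99 standard lists an "h" length modifier for s. On reflection, the documentation says that s without "l" means a char string, and "hs" contains no "l", so it is treated as a char string. Strictly speaking, C99 leaves "h" combined with s undefined, so this is the behavior of the C library on this system rather than something the standard guarantees.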
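3.3 GCC on MinGW
With GCC 4.6.2 on MinGW (20120426), the result matches VC. Although MinGW also uses the GCC compiler, its format specifier rules follow VC for compatibility with the Windows environment, most likely because MinGW programs use Microsoft's C runtime for these functions by default.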
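4. Conclusion
Based on the test results above, the earlier table is revised as follows:
Specifier | VC, BCB, MinGW: printf | VC, BCB, MinGW: wprintf | GCC on Linux, C99: printf | GCC on Linux, C99: wprintf |
s | char | wchar_t | char | char |
S | wchar_t | char | * | * |
hs | char | char | char | char |
ls | wchar_t | wchar_t | wchar_t | wchar_t |
*: undefined. The hs entries in the right-hand columns reflect the Fedora test result; C99 itself does not define hs.
To sum up:
1) To output a char string, use "hs".
2) To output a wchar_t string, use "ls".
3) To output a TCHAR string, use "s"; this works only with compilers for the Windows platform, such as VC, BCB, and MinGW.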
References:
"ISO/IEC 9899:1999 (C99)". ISO/IEC, 1999. www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf
"C99标准" (The C99 Standard). yourtommy. http://blog.csdn.net/yourtommy/article/details/7495033
"[VS2012] Format Specification Fields: printf and wprintf Functions". http://msdn.microsoft.com/en-us/library/56e442dc(v=vs.110).aspx
"[VS2012] printf Type Field Characters". http://msdn.microsoft.com/en-us/library/hf4y5e3w(v=vs.110).aspx
"[VS2012] Size Specification". http://msdn.microsoft.com/en-us/library/tcxf1dw6(v=vs.110).aspx
"wprintf(3) - Linux manual page". http://www.kernel.org/doc/man-pages/online/pages/man3/wprintf.3.html
Source code download:
http://files.cnblogs.com/zyl910/wcharfmt.rar