The strlen function and the byte length of multi-byte encoded strings
2008-02-18 20:27
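Keep in mind that strlen() may return the number of characters in a string rather than the number of bytes. In a default PHP configuration strlen() counts bytes, but it counts characters when, for example, mbstring function overloading (mbstring.func_overload) replaces it with mb_strlen(). The strlen() function is often used to compute the length in bytes of a string. That is correct as long as the string is single-byte encoded; with a multi-byte character set the character count and the byte count no longer coincide. So when you need the number of bytes of an ASCII or UTF-8 encoded string, it is better to use the following function: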
/**
 * Count the number of bytes of a given string.
 * Input string is expected to be ASCII or well-formed UTF-8.
 * Warning: the function doesn't return the number of chars
 * in the string, but the number of bytes.
 * A multi-byte sequence truncated at the end of the string is
 * counted at the length announced by its lead byte.
 *
 * @param string $str The string to compute number of bytes
 *
 * @return int The length in bytes of the given string.
 */
function strBytes($str)
{
    // STRINGS ARE EXPECTED TO BE IN ASCII OR UTF-8 FORMAT
    $str = (string) $str;
    // byte offset of the current character; ends up as the byte count
    $d = 0;
    /*
     * Iterate over every character in the string, advancing by the
     * number of bytes announced by its lead byte. isset() on a string
     * offset is byte-based, so the loop does not rely on strlen().
     */
    while (isset($str[$d])) {
        $ord_var_c = ord($str[$d]);
        switch (true) {
            case ($ord_var_c <= 0x7F):
                // characters U-00000000 - U-0000007F (same as ASCII)
                $d++;
                break;
            case (($ord_var_c & 0xE0) == 0xC0):
                // characters U-00000080 - U-000007FF, mask 110XXXXX
                // see http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
                $d += 2;
                break;
            case (($ord_var_c & 0xF0) == 0xE0):
                // characters U-00000800 - U-0000FFFF, mask 1110XXXX
                $d += 3;
                break;
            case (($ord_var_c & 0xF8) == 0xF0):
                // characters U-00010000 - U-001FFFFF, mask 11110XXX
                $d += 4;
                break;
            case (($ord_var_c & 0xFC) == 0xF8):
                // characters U-00200000 - U-03FFFFFF, mask 111110XX
                $d += 5;
                break;
            case (($ord_var_c & 0xFE) == 0xFC):
                // characters U-04000000 - U-7FFFFFFF, mask 1111110X
                $d += 6;
                break;
            default:
                // stray continuation byte or invalid lead byte: count one byte
                $d++;
        }
    }
    return $d;
}
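The switch above decides how many bytes a character occupies from its lead byte alone. As a minimal sketch of that idea in isolation, the snippet below classifies a few lead bytes. The helper name utf8SequenceLength() and the sample table are hypothetical and not part of the original function. It also assumes the modern UTF-8 definition (RFC 3629), which stops at 4-byte sequences, whereas the function above also accepts the older 5- and 6-byte forms.

```php
<?php
// Hypothetical helper (an illustration, not the original author's code):
// number of bytes in the UTF-8 sequence introduced by a lead byte (0-255).
function utf8SequenceLength(int $leadByte): int
{
    if (($leadByte & 0x80) === 0x00) {
        return 1; // 0XXXXXXX, same as ASCII
    }
    if (($leadByte & 0xE0) === 0xC0) {
        return 2; // 110XXXXX
    }
    if (($leadByte & 0xF0) === 0xE0) {
        return 3; // 1110XXXX
    }
    if (($leadByte & 0xF8) === 0xF0) {
        return 4; // 11110XXX
    }
    return 1; // continuation or invalid byte: treated as a single byte
}

// Samples are written with hex escapes so the result does not depend
// on the encoding of this source file.
$samples = [
    'U+0041'  => "\x41",
    'U+00E9'  => "\xC3\xA9",
    'U+20AC'  => "\xE2\x82\xAC",
    'U+1D11E' => "\xF0\x9D\x84\x9E",
];

$lengths = [];
foreach ($samples as $name => $bytes) {
    // Only the first byte is inspected, exactly as in the switch.
    $lengths[$name] = utf8SequenceLength(ord($bytes[0]));
    echo $name, ': ', $lengths[$name], " byte(s)\n";
}
```

Summing these per-character lengths over a whole string is what yields the byte count.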
This function has been adapted from the JSON function used to convert characters to their UTF-8 representation.
With this new function we solved a problem in the JSON and PEAR/SOAP PHP libraries.
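For comparison, where the mbstring extension is available, the character count and the byte count of a string can be read side by side. The snippet below is only a sketch under that assumption: it is not what the JSON or PEAR/SOAP libraries do, and the sample string is made up for illustration. Passing '8bit' as the encoding makes mb_strlen() treat every byte as one character, so it reports the byte length.

```php
<?php
// "\xE2\x82\xAC" is the UTF-8 encoding of U+20AC (euro sign):
// one character stored in three bytes. The sample string is hypothetical.
$str = "price: \xE2\x82\xAC";

// Character count: the bytes are decoded as UTF-8.
$chars = mb_strlen($str, 'UTF-8');

// Byte count: the '8bit' encoding counts each byte as one character.
$bytes = mb_strlen($str, '8bit');

echo "characters: $chars, bytes: $bytes\n";
```

Here the seven ASCII characters of "price: " take one byte each and the euro sign takes three, which is why the two numbers differ.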