On the differences between the utf-8, utf-7, and unicode encodings
2006-10-11 23:20
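While browsing the CSDN forum today I came across a thread about the differences between utf-8, utf-7, and a few other encodings. The answers disagreed with each other, and although I use these encodings all the time, on reflection my own understanding was fuzzy. So I searched Baidu, read some related articles, and summarized them as follows (personal opinion only):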
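unicode:
Here "unicode" means UTF-16, which is what .NET calls Encoding.Unicode. Each character in the Basic Multilingual Plane (U+0000 to U+FFFF) takes 2 bytes; characters above U+FFFF take 4 bytes (a surrogate pair).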
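As a quick check of the 2-byte versus 4-byte cases, here is a minimal sketch. The class name Utf16LengthDemo and its ByteCount helper are hypothetical names made up for this illustration; Encoding.Unicode.GetByteCount is the standard .NET call.

```csharp
using System;
using System.Text;

static class Utf16LengthDemo
{
    // Hypothetical helper: number of bytes the string occupies as UTF-16.
    public static int ByteCount(string text)
    {
        return Encoding.Unicode.GetByteCount(text);
    }

    static void Main()
    {
        string bmp = "中";                    // U+4E2D, inside the Basic Multilingual Plane
        string supplementary = "\U00020000";  // U+20000, outside the BMP

        Console.WriteLine(ByteCount(bmp));            // 2
        Console.WriteLine(ByteCount(supplementary));  // 4
        Console.WriteLine(supplementary.Length);      // 2 (UTF-16 code units)
    }
}
```

So "2 bytes per character" holds for 中 and other common characters, but not for every character.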
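utf-8:
Characters that fit in 7 bits (U+0000 to U+007F, which covers English letters) use 1 byte
Characters that need 8 to 11 bits use 2 bytes
Characters that need 12 to 16 bits use 3 bytes (this includes common Chinese characters such as 中)
Characters that need 17 to 21 bits use 4 bytes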
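The utf-8 length rule can be sketched as a small function and compared with what .NET actually produces. Utf8Rules and ByteCount are hypothetical names for this illustration, and the sketch assumes the input is a valid code point (not a surrogate); Encoding.UTF8.GetByteCount and char.ConvertFromUtf32 are standard .NET calls.

```csharp
using System;
using System.Text;

static class Utf8Rules
{
    // Hypothetical helper: UTF-8 byte count for one Unicode code point.
    public static int ByteCount(int codePoint)
    {
        if (codePoint < 0x80) return 1;     // fits in 7 bits
        if (codePoint < 0x800) return 2;    // needs 8 to 11 bits
        if (codePoint < 0x10000) return 3;  // needs 12 to 16 bits
        return 4;                           // needs 17 to 21 bits
    }

    static void Main()
    {
        // 'a', 'é', '中', and one character outside the BMP.
        foreach (int cp in new[] { 0x61, 0xE9, 0x4E2D, 0x20000 })
        {
            string s = char.ConvertFromUtf32(cp);
            int actual = Encoding.UTF8.GetByteCount(s);
            Console.WriteLine($"U+{cp:X4}: predicted {ByteCount(cp)}, actual {actual}");
        }
    }
}
```

For "a中中a" this gives 1 + 3 + 3 + 1 = 8 bytes.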
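utf-7:
English letters, digits, and a small set of symbols are written directly as their 7-bit ASCII values. RFC 2152 defines the directly encoded symbols as '(),-./:? plus space, tab, CR, and LF. Other punctuation such as "&" is only optionally direct, and .NET's UTF7Encoding does not write it directly by default, which is why "&" ends up treated as an uncommon character.
Runs of any other characters are bracketed by + and - to mark where they start and end, for example "a中中a".
When the encoder reaches 中, the string is encoded as follows:
the code for a, the code for +, the UTF-16 code units of 中中 written as Base64 without padding (Ti1OLQ), the code for -, the code for a
That is a+Ti1OLQ-a, 10 bytes in total.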
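To make the + ... - bracketing concrete, here is a minimal sketch that builds the shifted part by hand instead of calling Encoding.UTF7. Utf7Sketch and EncodeShifted are hypothetical names for this illustration, and the helper only handles a run of non-direct characters; it is not a full UTF-7 encoder.

```csharp
using System;
using System.Text;

static class Utf7Sketch
{
    // Hypothetical helper: encodes one run of non-direct characters as
    // '+' + Base64(UTF-16 big-endian bytes, '=' padding removed) + '-'.
    public static string EncodeShifted(string run)
    {
        byte[] utf16be = Encoding.BigEndianUnicode.GetBytes(run);
        string base64 = Convert.ToBase64String(utf16be).TrimEnd('=');
        return "+" + base64 + "-";
    }

    static void Main()
    {
        // The ASCII letters are written directly; only 中中 is shifted.
        string encoded = "a" + EncodeShifted("中中") + "a";
        Console.WriteLine(encoded);                               // a+Ti1OLQ-a
        Console.WriteLine(Encoding.ASCII.GetByteCount(encoded));  // 10
    }
}
```

The two 中 characters are 4 bytes of UTF-16, which become 6 Base64 characters, so the whole string takes 1 + 1 + 6 + 1 + 1 = 10 bytes.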
Test code:
```csharp
using System;
using System.Text;

class Program
{
    // Prints the encoded length, then each byte value in decimal.
    static void Dump(string name, Encoding encoding, string text)
    {
        byte[] bytes = encoding.GetBytes(text);
        Console.WriteLine(name + " length: " + bytes.Length);
        foreach (byte b in bytes)
        {
            Console.Write(b + " ");
        }
        Console.WriteLine();
    }

    static void Main()
    {
        string a = "a中中a";
        Dump("UTF-8", Encoding.UTF8, a);
        // Encoding.UTF7 is marked obsolete on .NET 5 and later (warning SYSLIB0001).
        Dump("UTF-7", Encoding.UTF7, a);
        // ASCII cannot represent 中, so each one is replaced with '?' (63).
        Dump("ASCII", Encoding.ASCII, a);
        Console.ReadLine();
    }
}
```