iOS Chinese Character Encoding Conversion
2015-07-08 10:30
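Converting Unicode escapes (\uXXXX) to Chinese characters
+ (NSString *)replaceUnicode:(NSString *)unicodeStr {
    // The property list parser is used as the decoder here; the \u escapes are first rewritten to \U.
    NSString *tempStr1 = [unicodeStr stringByReplacingOccurrencesOfString:@"\\u" withString:@"\\U"];
    // Escape embedded double quotes, then wrap the text in quotes so it parses as a single plist string.
    NSString *tempStr2 = [tempStr1 stringByReplacingOccurrencesOfString:@"\"" withString:@"\\\""];
    NSString *tempStr3 = [[@"\"" stringByAppendingString:tempStr2] stringByAppendingString:@"\""];
    NSData *tempData = [tempStr3 dataUsingEncoding:NSUTF8StringEncoding];
    // propertyListWithData:options:format:error: replaces the deprecated
    // propertyListFromData:mutabilityOption:format:errorDescription:.
    NSString *returnStr = [NSPropertyListSerialization propertyListWithData:tempData
                                                                    options:NSPropertyListImmutable
                                                                     format:NULL
                                                                      error:NULL];
    // Turn literal "\r\n" sequences left in the text into real newlines.
    return [returnStr stringByReplacingOccurrencesOfString:@"\\r\\n" withString:@"\n"];
}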
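Converting between Chinese characters and percent-encoded UTF-8
// Percent-decode as UTF-8: strA is @"中国".
NSString *strA = [@"%E4%B8%AD%E5%9B%BD" stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
// Percent-encode as UTF-8: strB is @"%E4%B8%AD%E5%9B%BD".
NSString *strB = [@"中国" stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];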
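To make the byte-level mechanics of the two calls above concrete, here is a minimal sketch in plain C (which also compiles inside Objective-C sources). The function names percent_encode_utf8 and percent_decode are hypothetical, not Foundation API. The sketch also assumes a strict policy that escapes every byte outside the RFC 3986 "unreserved" set, so for ASCII punctuation its output may differ from what the Foundation methods produce.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Percent-encode every byte of a UTF-8 string that is not an ASCII
 * letter, digit, '-', '.', '_' or '~'. Writes a NUL-terminated result
 * to out and returns 0, or returns -1 if out_size is too small. */
static int percent_encode_utf8(const char *in, char *out, size_t out_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t o = 0;

    if (out_size == 0) return -1;
    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        int unreserved = (*p < 0x80 && isalnum(*p)) ||
                         *p == '-' || *p == '.' || *p == '_' || *p == '~';
        size_t need = unreserved ? 1 : 3;

        /* Always keep one byte free for the terminating NUL. */
        if (o + need + 1 > out_size) return -1;
        if (unreserved) {
            out[o++] = (char)*p;
        } else {
            out[o++] = '%';
            out[o++] = hex[*p >> 4];
            out[o++] = hex[*p & 0x0F];
        }
    }
    out[o] = '\0';
    return 0;
}

/* Decode %XX escapes. out must hold at least strlen(in) + 1 bytes.
 * Malformed escapes are copied through unchanged. */
static void percent_decode(const char *in, char *out)
{
    while (*in) {
        int hi, lo;
        if (in[0] == '%' &&
            (hi = hex_value((unsigned char)in[1])) >= 0 &&
            (lo = hex_value((unsigned char)in[2])) >= 0) {
            *out++ = (char)(hi * 16 + lo);
            in += 3;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

Passing the six UTF-8 bytes of 中国 (E4 B8 AD E5 9B BD) to percent_encode_utf8 gives the same %E4%B8%AD%E5%9B%BD string used in the snippet above, and percent_decode turns it back into those bytes.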
Percent-encoding an NSString as UTF-8
NSString *strings = [NSString stringWithFormat:@"abc"];
NSLog(@"strings : %@", strings);
// Declaration from the CoreFoundation headers, shown for reference:
// CF_EXPORT
// CFStringRef CFURLCreateStringByAddingPercentEscapes(CFAllocatorRef allocator, CFStringRef originalString, CFStringRef charactersToLeaveUnescaped, CFStringRef legalURLCharactersToBeEscaped, CFStringEncoding encoding);
// A Create function returns a +1 reference, so hand ownership to ARC with __bridge_transfer to avoid a leak.
NSString *encodedValue = (__bridge_transfer NSString *)CFURLCreateStringByAddingPercentEscapes(NULL,
    (__bridge CFStringRef)strings, NULL, (__bridge CFStringRef)@"!*'();:@&=+$,/?%#[]", kCFStringEncodingUTF8);
Converting ISO 8859-1 text with numeric character references (&#xXXXX;) to Unicode
+ (NSString *)changeISO88591StringToUnicodeString:(NSString *)iso88591String
{
    NSMutableString *srcString = [NSMutableString stringWithString:iso88591String];
    // Undo double escaping, then strip the "&#x" prefix so only hex digits and ";" separators remain.
    [srcString replaceOccurrencesOfString:@"&amp;" withString:@"&" options:NSLiteralSearch range:NSMakeRange(0, [srcString length])];
    [srcString replaceOccurrencesOfString:@"&#x" withString:@"" options:NSLiteralSearch range:NSMakeRange(0, [srcString length])];
    NSMutableString *desString = [NSMutableString string];
    NSArray *arr = [srcString componentsSeparatedByString:@";"];
    // The last component is whatever follows the final ";", so it is skipped.
    for (NSUInteger i = 0; i + 1 < [arr count]; i++) {
        NSString *v = [arr objectAtIndex:i];
        // StringUtil is the author's own helper class (not shown here).
        int value = [StringUtil changeHexStringToDecimal:v];
        // Append the UTF-16 code unit directly. Building a C string from the two bytes
        // breaks whenever either byte is zero, e.g. for any code point below U+0100.
        unichar ch = (unichar)value;
        [desString appendString:[NSString stringWithCharacters:&ch length:1]];
    }
    return desString;
}
Q: Is there a standard method to package a Unicode character so it fits an 8-Bit ASCII stream?
A: There are three or four options for making Unicode fit into an 8-bit format.
a) Use UTF-8. This preserves ASCII, but not Latin-1, because the characters >127 are different from Latin-1. UTF-8 uses the bytes in the ASCII only for ASCII characters. Therefore, it works well in any environment where ASCII characters have a significance as syntax characters, e.g. file name syntaxes, markup languages, etc., but where the all other characters may use arbitrary bytes.
Example: “Latin Small Letter s with Acute” (015B) would be encoded as two bytes: C5 9B.
b) Use Java or C style escapes, of the form \uXXXXX or \xXXXXX. This format is not standard for text files, but well defined in the framework of the languages in question, primarily for source files.
Example: The Polish word “wyjście” with character “Latin Small Letter s with Acute” (015B) in the middle (ś is one character) would look like: “wyj\u015Bcie”.
c) Use the &#xXXXX; or &#DDDDD; numeric character escapes as in HTML or XML. Again, these are not standard for plain text files, but well defined within the framework of these markup languages.
Example: “wyjście” would look like “wyj&#x015B;cie”
d) Use SCSU. This format compresses Unicode into 8-bit format, preserving most of ASCII, but using some of the control codes as commands for the decoder. However, while ASCII text will look like ASCII text after being encoded in SCSU, other characters may occasionally be encoded with the same byte values, making SCSU unsuitable for 8-bit channels that blindly interpret any of the bytes as ASCII characters.
Example: “<SC2> wyjÛcie” where <SC2> indicates the byte 0x12 and “Û” corresponds to byte 0xDB. [AF] & [KW]
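The two-byte example in option (a) can be checked by hand. The following is a minimal sketch in plain C of the standard UTF-8 bit layout; the function name utf8_encode is hypothetical and not part of any Apple framework.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 into out.
 * Returns the number of bytes written (1 to 4), or 0 if cp is a
 * surrogate (U+D800..U+DFFF) or lies above U+10FFFF. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                    /* 0xxxxxxx: ASCII stays one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) { /* surrogates are not characters */
        return 0;
    }
    if (cp < 0x10000) {                 /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {               /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For U+015B this yields the bytes C5 9B, matching the FAQ example, and for U+4E2D (中) it yields E4 B8 AD, the bytes that appear in the percent-encoded string %E4%B8%AD earlier in this post.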
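As option (c) describes, this is a "non-standard" but widely adopted practice; you could even call it a knock-off encoding :-)
So the encoding process is:
string -> Unicode code points -> &#xXXXX; or &#DDDDD;
Decoding simply reverses these steps.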
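The encode and decode steps just described can be sketched in plain C as follows. The names ncr_encode and ncr_decode are hypothetical. The sketch assumes the text is already available as an array of code points, and that the encoded form is pure ASCII apart from the references themselves.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Value of digit c in the given base (10 or 16), or -1 if invalid. */
static int digit_value(char c, int base)
{
    int v;
    if (c >= '0' && c <= '9') v = c - '0';
    else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
    else return -1;
    return v < base ? v : -1;
}

/* Encode: printable ASCII other than '&' stays literal, everything
 * else becomes a hexadecimal reference. Returns 0, or -1 if out is
 * too small. */
static int ncr_encode(const uint32_t *cps, size_t n, char *out, size_t out_size)
{
    size_t o = 0;
    if (out_size == 0) return -1;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int w;
        if (cps[i] >= 0x20 && cps[i] < 0x7F && cps[i] != '&')
            w = snprintf(out + o, out_size - o, "%c", (int)cps[i]);
        else
            w = snprintf(out + o, out_size - o, "&#x%04X;", (unsigned)cps[i]);
        if (w < 0 || (size_t)w >= out_size - o) return -1; /* truncated */
        o += (size_t)w;
    }
    return 0;
}

/* Decode "&#xHHHH;" and "&#DDDDD;" references into code points.
 * Anything that is not a well-formed reference is copied byte by byte.
 * Returns the number of code points written, or -1 if out is too small. */
static long ncr_decode(const char *in, uint32_t *out, size_t out_cap)
{
    size_t n = 0;
    while (*in) {
        uint32_t cp = (unsigned char)*in;
        const char *next = in + 1;
        if (in[0] == '&' && in[1] == '#') {
            int base = (in[2] == 'x' || in[2] == 'X') ? 16 : 10;
            const char *p = in + (base == 16 ? 3 : 2);
            uint32_t v = 0;
            int digits = 0, d;
            /* Stop accumulating once v leaves the Unicode range. */
            while ((d = digit_value(*p, base)) >= 0 && v <= 0x10FFFF) {
                v = v * (uint32_t)base + (uint32_t)d;
                p++;
                digits++;
            }
            if (digits > 0 && *p == ';' && v <= 0x10FFFF) {
                cp = v;
                next = p + 1;
            }
        }
        if (n == out_cap) return -1;
        out[n++] = cp;
        in = next;
    }
    return (long)n;
}
```

Encoding the code points of wyjście (with U+015B in the middle) produces wyj&#x015B;cie, the same form as the FAQ example, and decoding that string returns the original seven code points.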
http://unicode.org/faq/utf_bom.html#General