HTML URL Encoding (URL Encode and URL Decode)
2013-05-09 10:24
URL encoding - from %00 to %8f
| Character | URL-encoded | Character | URL-encoded | Character | URL-encoded |
|---|---|---|---|---|---|
| NUL | %00 | 0 | %30 | ` | %60 |
|  | %01 | 1 | %31 | a | %61 |
|  | %02 | 2 | %32 | b | %62 |
|  | %03 | 3 | %33 | c | %63 |
|  | %04 | 4 | %34 | d | %64 |
|  | %05 | 5 | %35 | e | %65 |
|  | %06 | 6 | %36 | f | %66 |
|  | %07 | 7 | %37 | g | %67 |
| backspace | %08 | 8 | %38 | h | %68 |
| tab | %09 | 9 | %39 | i | %69 |
| linefeed | %0a | : | %3a | j | %6a |
|  | %0b | ; | %3b | k | %6b |
|  | %0c | < | %3c | l | %6c |
| carriage return | %0d | = | %3d | m | %6d |
|  | %0e | > | %3e | n | %6e |
|  | %0f | ? | %3f | o | %6f |
|  | %10 | @ | %40 | p | %70 |
|  | %11 | A | %41 | q | %71 |
|  | %12 | B | %42 | r | %72 |
|  | %13 | C | %43 | s | %73 |
|  | %14 | D | %44 | t | %74 |
|  | %15 | E | %45 | u | %75 |
|  | %16 | F | %46 | v | %76 |
|  | %17 | G | %47 | w | %77 |
|  | %18 | H | %48 | x | %78 |
|  | %19 | I | %49 | y | %79 |
|  | %1a | J | %4a | z | %7a |
|  | %1b | K | %4b | { | %7b |
|  | %1c | L | %4c | \| | %7c |
|  | %1d | M | %4d | } | %7d |
|  | %1e | N | %4e | ~ | %7e |
|  | %1f | O | %4f |  | %7f |
| space | %20 | P | %50 | € | %80 |
| ! | %21 | Q | %51 |  | %81 |
| " | %22 | R | %52 | ‚ | %82 |
| # | %23 | S | %53 | ƒ | %83 |
| $ | %24 | T | %54 | „ | %84 |
| % | %25 | U | %55 | … | %85 |
| & | %26 | V | %56 | † | %86 |
| ' | %27 | W | %57 | ‡ | %87 |
| ( | %28 | X | %58 | ˆ | %88 |
| ) | %29 | Y | %59 | ‰ | %89 |
| * | %2a | Z | %5a | Š | %8a |
| + | %2b | [ | %5b | ‹ | %8b |
| , | %2c | \ | %5c | Œ | %8c |
| - | %2d | ] | %5d |  | %8d |
| . | %2e | ^ | %5e | Ž | %8e |
| / | %2f | _ | %5f |  | %8f |
URL encoding - from %90 to %ff
| Character | URL-encoded | Character | URL-encoded | Character | URL-encoded |
|---|---|---|---|---|---|
|  | %90 | À | %c0 | ð | %f0 |
| ‘ | %91 | Á | %c1 | ñ | %f1 |
| ’ | %92 | Â | %c2 | ò | %f2 |
| “ | %93 | Ã | %c3 | ó | %f3 |
| ” | %94 | Ä | %c4 | ô | %f4 |
| • | %95 | Å | %c5 | õ | %f5 |
| – | %96 | Æ | %c6 | ö | %f6 |
| — | %97 | Ç | %c7 | ÷ | %f7 |
| ˜ | %98 | È | %c8 | ø | %f8 |
| ™ | %99 | É | %c9 | ù | %f9 |
| š | %9a | Ê | %ca | ú | %fa |
| › | %9b | Ë | %cb | û | %fb |
| œ | %9c | Ì | %cc | ü | %fc |
|  | %9d | Í | %cd | ý | %fd |
| ž | %9e | Î | %ce | þ | %fe |
| Ÿ | %9f | Ï | %cf | ÿ | %ff |
| no-break space | %a0 | Ð | %d0 |  |  |
| ¡ | %a1 | Ñ | %d1 |  |  |
| ¢ | %a2 | Ò | %d2 |  |  |
| £ | %a3 | Ó | %d3 |  |  |
| ¤ | %a4 | Ô | %d4 |  |  |
| ¥ | %a5 | Õ | %d5 |  |  |
| ¦ | %a6 | Ö | %d6 |  |  |
| § | %a7 | × | %d7 |  |  |
| ¨ | %a8 | Ø | %d8 |  |  |
| © | %a9 | Ù | %d9 |  |  |
| ª | %aa | Ú | %da |  |  |
| « | %ab | Û | %db |  |  |
| ¬ | %ac | Ü | %dc |  |  |
| soft hyphen | %ad | Ý | %dd |  |  |
| ® | %ae | Þ | %de |  |  |
| ¯ | %af | ß | %df |  |  |
| ° | %b0 | à | %e0 |  |  |
| ± | %b1 | á | %e1 |  |  |
| ² | %b2 | â | %e2 |  |  |
| ³ | %b3 | ã | %e3 |  |  |
| ´ | %b4 | ä | %e4 |  |  |
| µ | %b5 | å | %e5 |  |  |
| ¶ | %b6 | æ | %e6 |  |  |
| · | %b7 | ç | %e7 |  |  |
| ¸ | %b8 | è | %e8 |  |  |
| ¹ | %b9 | é | %e9 |  |  |
| º | %ba | ê | %ea |  |  |
| » | %bb | ë | %eb |  |  |
| ¼ | %bc | ì | %ec |  |  |
| ½ | %bd | í | %ed |  |  |
| ¾ | %be | î | %ee |  |  |
| ¿ | %bf | ï | %ef |  |  |
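The percent codes in both tables are just the byte value written as two hexadecimal digits. A minimal Perl sketch of that mapping, using only core functions; the helper name `percent_code` is made up for this illustration and is not part of any module:

```perl
use strict;
use warnings;

# Hypothetical helper: return the %xx code for one byte value (0..255),
# in lowercase hex to match the tables above.
sub percent_code {
    my ($byte) = @_;
    return sprintf("%%%02x", $byte);
}

# Print the printable ASCII rows of the table (space through ~).
for my $byte (0x20 .. 0x7e) {
    printf "%s | %s\n", chr($byte), percent_code($byte);
}
```

The loop stops at %7e because the characters shown for higher bytes depend on which single-byte character set is assumed.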
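About URL Encode and URL Decode
Most websites encode Chinese characters that appear in URLs. Here we look at how to implement the two common kinds of encoding and decoding in Perl. Two character encodings are in common use, GBK and UTF-8, so URL encoding also comes in two forms: GBK-based and UTF-8-based.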
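1: URL encoding with GBK.
1) First GBK-encode the string. This step applies only when the string is held as Perl characters. If the Chinese text is already a GBK byte string (for example, read undecoded from a GBK terminal or file), it must not be GBK-encoded again; pass it straight to the URL-encoding function.
2) Then URL-encode the result.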
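use strict;
use warnings;
use Encode;
use URI::Escape;

while (<>) {
    chomp;
    my $gbkec  = Encode::encode("gbk", $_);        # GBK-encode the string; skip this if the input is already GBK bytes, or it gets encoded twice
    my $gbkuec = URI::Escape::uri_escape($gbkec);  # URL-encode the GBK-encoded string
    my $encode = URI::Escape::uri_escape($_);      # input that is already GBK bytes can be URL-encoded directly
    print "$_ ->[GBK]$gbkec ->[URLencode]$gbkuec\n";
    print "$_ -> [URLencode]$encode\n";
}
Test output: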
@# ->[GBK]@# ->[URLencode]%40%23
@# -> [URLencode]%40%23
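The test input above is plain ASCII, so it does not show the GBK step doing anything. A minimal sketch with Chinese input, assuming the text is held as a Perl character string (written with `\x{...}` escapes so the script does not depend on the encoding of the source file):

```perl
use strict;
use warnings;
use Encode qw(encode);
use URI::Escape qw(uri_escape);

# "测试" as a Perl character string (U+6D4B U+8BD5).
my $text = "\x{6D4B}\x{8BD5}";

# Step 1: characters -> GBK bytes. Step 2: GBK bytes -> percent-encoding.
my $gbk_bytes = encode("gbk", $text);
my $escaped   = uri_escape($gbk_bytes);

print "$escaped\n";    # %B2%E2%CA%D4
```

Each Chinese character becomes two GBK bytes, and each byte becomes one %XX group.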
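Next, URL decoding for GBK.
When decoding, likewise take care not to apply GBK decoding when the output is wanted as GBK bytes (for example, for printing to a GBK terminal). The code is as follows: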
use strict;
use warnings;
use Encode;
use URI::Escape;

while (<>) {
    chomp;
    my $gbkec  = URI::Escape::uri_unescape($_);   # URL-decode the input
    my $chstr  = Encode::decode("gbk", $gbkec);   # GBK-decode the URL-decoded bytes; skip this if GBK bytes are what you want
    my $decode = URI::Escape::uri_unescape($_);   # a URL built from GBK bytes can stop after this step
    print "$_ ->[URLdecode]$gbkec ->[GBKDecode]$chstr\n";
    print "$_ -> [URLdecode]$decode\n";
}
Test output:
%40%23
%40%23 ->[URLdecode]@# ->[GBKDecode]@#
%40%23 -> [URLdecode]@#
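As with encoding, an ASCII test string hides the GBK step. A minimal sketch of decoding a GBK-based URL string back to characters; the input `%B2%E2%CA%D4` is the GBK URL encoding of 测试, and code points are printed instead of the characters so the result does not depend on the terminal's encoding:

```perl
use strict;
use warnings;
use Encode qw(decode);
use URI::Escape qw(uri_unescape);

my $escaped = "%B2%E2%CA%D4";

# Step 1: percent-decoding yields the raw GBK bytes.
my $gbk_bytes = uri_unescape($escaped);
# Step 2: GBK bytes -> Perl character string.
my $text = decode("gbk", $gbk_bytes);

# Show the decoded characters as code points.
print join(" ", map { sprintf "U+%04X", ord } split //, $text), "\n";
# U+6D4B U+8BD5
```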
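2: URL encoding with UTF-8.
Because the Chinese input here arrives GBK-encoded, the UTF-8 form needs an explicit conversion call both before URL encoding and after URL decoding.
For UTF-8 encoding and decoding we use the functions of Perl's utf8 module (utf8::encode and utf8::decode).
The following encodes a string and then decodes it again: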
use strict;
use warnings;
use URI::Escape;

while (<>) {
    chomp;
    my $utf8str = $_;
    utf8::encode($utf8str);                                # UTF-8 encode; the function modifies the variable in place
    my $encode = URI::Escape::uri_escape($utf8str);        # URL-encode the UTF-8 encoded string
    print "UTF-8 Encode:[$_] -> [$utf8str] -> [$encode]\n";
    my $urldecode = URI::Escape::uri_unescape($encode);    # URL-decode the UTF-8 based URL string
    my $decode = $urldecode;
    utf8::decode($decode);                                 # UTF-8 decode the URL-decoded string, in place
    print "UTF-8 Decode:[$encode] -> [$urldecode] -> [$decode]\n";
}
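Test output:
测试
UTF-8 Encode:[测试] -> [???????”] -> [%C2%B2%C3%A2%C3%8A%C3%94]
UTF-8 Decode:[%C2%B2%C3%A2%C3%8A%C3%94] -> [???????”] -> [测试]
Note: %C2%B2%C3%A2%C3%8A%C3%94 is not the UTF-8 URL encoding of 测试, which would be %E6%B5%8B%E8%AF%95. The script read the input as raw GBK bytes (B2 E2 CA D4), and utf8::encode treated each byte as a separate Latin-1 character. The round trip still works because utf8::decode reverses exactly that step. To produce the real UTF-8 form, decode the input from GBK to characters first (for example with Encode::decode("gbk", ...)) and only then encode to UTF-8.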
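A minimal sketch of the UTF-8 path that converts GBK input explicitly: decode from GBK to characters, then let `uri_escape_utf8` from URI::Escape do the UTF-8 encoding and percent-encoding in one call. The hard-coded bytes are an assumption standing in for a line read from a GBK terminal.

```perl
use strict;
use warnings;
use Encode qw(decode);
use URI::Escape qw(uri_escape_utf8 uri_unescape);

# Assumed input: "测试" as raw GBK bytes, as it would arrive from a GBK terminal.
my $gbk_bytes = "\xB2\xE2\xCA\xD4";

# GBK bytes -> characters, then UTF-8 encode + percent-encode.
my $text    = decode("gbk", $gbk_bytes);
my $escaped = uri_escape_utf8($text);
print "$escaped\n";    # %E6%B5%8B%E8%AF%95

# Reverse: percent-decode to UTF-8 bytes, then UTF-8 bytes -> characters.
my $round_trip = decode("UTF-8", uri_unescape($escaped));
print $round_trip eq $text ? "round trip ok\n" : "round trip failed\n";
```

Each character takes three bytes in UTF-8 here, against two in GBK, so the UTF-8 URL form is longer.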
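Summary: at the time of writing, most websites in China use GBK for Chinese text, because it saves one encoding step. Baidu, for example, uses GBK.
Most websites outside China use UTF-8, because UTF-8 covers far more languages than GBK.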
from : http://hi.baidu.com/youzhch/item/744df75338a741948c12ed96
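HTML encoding covers 252 characters (named entities), in the format '&name;', where name is case-sensitive;
XML encoding covers only 5 characters, namely &, <, >, ', and ", in the same format as HTML;
URL encoding mainly covers ASCII control characters, non-ASCII characters, characters with special meaning in URLs (such as /, ?, &), and unsafe characters (those that could cause ambiguity). The encoding rule is a % followed by two hexadecimal digits giving the character's position in ISO-Latin. For example, a space is %20, since its position is 32.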
HTML character references
Character entity references have the format &name; where "name" is a case-sensitive alphanumeric string.
The character entity references &lt;, &gt;, &quot; and &amp; are predefined in HTML and SGML, because <, >, " and & are already used to delimit markup.
XML character references
Unlike traditional HTML with its large range of character entity references, in XML there are only five predefined character entity references. These are used to escape characters that are markup sensitive in certain contexts:[7]
&amp; → & (ampersand, U+0026)
&lt; → < (less-than sign, U+003C)
&gt; → > (greater-than sign, U+003E)
&quot; → " (quotation mark, U+0022)
&apos; → ' (apostrophe, U+0027)
&amp; has the special problem that it starts with the character to be escaped. A simple Internet search finds thousands of sequences &amp;amp; … in HTML pages for which the algorithm to replace an ampersand by the corresponding character entity reference was applied too often.
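The five predefined references and the double-escaping problem can be sketched in a few lines of Perl. The helper name `xml_escape` is hypothetical, and this is only an illustration, not a replacement for a real XML library:

```perl
use strict;
use warnings;

# Hypothetical helper: escape the five XML-predefined characters.
# "&" is replaced first; otherwise the ampersands introduced by the
# other replacements would themselves be escaped.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/'/&apos;/g;
    return $s;
}

my $once  = xml_escape(q{a < b & "c"});
my $twice = xml_escape($once);    # escaping applied once too often

print "$once\n";     # a &lt; b &amp; &quot;c&quot;
print "$twice\n";    # a &amp;lt; b &amp;amp; &amp;quot;c&amp;quot;
```

The second call shows how &amp;amp; sequences arise: every & produced by the first pass is escaped again.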
http://en.wikipedia.org/wiki/Character_encodings_in_HTML
List of XML and HTML character entity references
The XML specification defines five "predefined entities" representing special characters, and requires that all XML processors honor them.
The HTML 4 DTDs define 252 named entities, references to which act as mnemonic aliases for certain Unicode characters.
http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
URL encode
ASCII Control characters
Why: These characters are not printable.
Non-ASCII characters
Why: These are by definition not legal in URLs since they are not in the ASCII set.
"Reserved characters"
Why: URLs use some characters for special use in defining their syntax. When these characters are not used in their special role inside a URL, they need to be encoded.
"Unsafe characters"
Why: Some characters present the possibility of being misunderstood within URLs for various reasons. These characters should also always be encoded.
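As an illustration of the reserved-character case, a small sketch using URI::Escape, the module used in the Perl examples earlier. The host `example.com` and the parameter name `q` are placeholders.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape);

# A query value containing characters that have special meaning in URLs.
my $value = "a&b=c/d?";

# By default uri_escape leaves only A-Z a-z 0-9 - . _ ~ unescaped,
# so every reserved character in the value is percent-encoded.
my $escaped = uri_escape($value);
my $url     = "http://example.com/search?q=$escaped";

print "$url\n";    # http://example.com/search?q=a%26b%3Dc%2Fd%3F
```

Only the value is escaped. The ? and = that structure the URL itself are added afterwards and stay literal.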
How are characters URL encoded?
URL encoding of a character consists of a "%" symbol, followed by the two-digit hexadecimal representation (case-insensitive) of the ISO-Latin code point for the character.
Example
Space = decimal code point 32 in the ISO-Latin set.
32 decimal = 20 in hexadecimal
The URL encoded representation will be "%20"
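The rule can be written out by hand in Perl with two substitutions. The helper names below are hypothetical, and the functions assume byte strings as input; in practice a module such as URI::Escape does the same job.

```perl
use strict;
use warnings;

# Hypothetical helpers working on byte strings.
sub percent_encode {
    my ($bytes) = @_;
    # Encode every byte except letters, digits, and - . _ ~
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf("%%%02X", ord($1))/ge;
    return $bytes;
}

sub percent_decode {
    my ($s) = @_;
    # The hex digits are case-insensitive, so %2f and %2F both decode.
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

print percent_encode("a b"), "\n";            # a%20b
print percent_decode("%2f and %2F"), "\n";    # / and /
```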
XSS (Cross Site Scripting) Prevention Cheat Sheet
http://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet