
URL Handling in Python (Based on Python 2.7)

2018-07-15 09:27

Official documentation reference: https://docs.python.org/3/library/urllib.html

1. Full URL syntax:

scheme://username:password@subdomain.domain.tld:port/directory/filename.extension?parameter=value#fragment

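2. URL handling with the urlparse module
The urlparse module (named urllib.parse in Python 3) provides the main URL-handling functions: urljoin, urlsplit, urlunsplit, urlparse, and others. It models a URL as a six-tuple of the form scheme://netloc/path;parameters?query#fragment. The netloc component is further broken down into four attributes on the parse result: username, password, hostname, and port.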
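To make the netloc breakdown concrete, here is a minimal sketch that parses one URL and reads the four sub-attributes. The URL, user name, and password are made up for illustration, and the import uses the Python 3 module name.

```python
# Python 3 import; on Python 2.7 use: from urlparse import urlparse
from urllib.parse import urlparse

# Hypothetical URL chosen so that every netloc sub-attribute is present.
url = "https://alice:secret@www.example.com:8443/docs/index.html?lang=en#top"
parts = urlparse(url)

print(parts.netloc)    # alice:secret@www.example.com:8443
print(parts.username)  # alice
print(parts.password)  # secret
print(parts.hostname)  # www.example.com
print(parts.port)      # 8443 (an int, not a string)
```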

urlparse()
urlparse() parses a URL and returns the six-tuple; urlunparse() reassembles a six-tuple into a URL string.
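A short sketch of the round trip, assuming Python 3 and a made-up URL. The result is a named tuple, so a single field can be swapped with _replace() before reassembling.

```python
# Python 3 import; on Python 2.7 use: from urlparse import urlparse, urlunparse
from urllib.parse import urlparse, urlunparse

# Hypothetical URL that fills all six fields.
url = "http://www.example.com/folder/page.html;type=a?key=value#section"
parsed = urlparse(url)

print(tuple(parsed))
# ('http', 'www.example.com', '/folder/page.html', 'type=a', 'key=value', 'section')

# Swap the scheme and put the six fields back together.
rebuilt = urlunparse(parsed._replace(scheme="https"))
print(rebuilt)
# https://www.example.com/folder/page.html;type=a?key=value#section
```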
urljoin()
urljoin() combines an absolute base URL with a relative URL.

Of these functions, urljoin() is the one used most often. Some examples:

>>> from urllib.parse import urljoin  # Python 3; on Python 2.7 use: from urlparse import urljoin
>>> urljoin("http://www.chachabei.com/folder/currentpage.html", "anotherpage.html")
'http://www.chachabei.com/folder/anotherpage.html'
>>> urljoin("http://www.chachabei.com/folder/currentpage.html", "/anotherpage.html")
'http://www.chachabei.com/anotherpage.html'
>>> urljoin("http://www.chachabei.com/folder/currentpage.html", "folder2/anotherpage.html")
'http://www.chachabei.com/folder/folder2/anotherpage.html'
>>> urljoin("http://www.chachabei.com/folder/currentpage.html", "/folder2/anotherpage.html")
'http://www.chachabei.com/folder2/anotherpage.html'
>>> urljoin("http://www.chachabei.com/abc/folder/currentpage.html", "/folder2/anotherpage.html")
'http://www.chachabei.com/folder2/anotherpage.html'
>>> urljoin("http://www.chachabei.com/abc/folder/currentpage.html", "../anotherpage.html")
'http://www.chachabei.com/abc/anotherpage.html'

urlsplit()
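urlsplit() breaks a URL into its components. Unlike urlparse(), it returns a five-tuple: there is no separate parameters field, so any ";parameters" text stays in the path.
Correspondingly, urlunsplit() joins a five-tuple produced by urlsplit() back into a URL. Used together, the two functions normalize a URL's form: the scheme is lowercased, and the result may be a slightly different but equivalent URL. They do not percent-encode special characters; that is the job of the quoting functions in section 3.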
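The following sketch shows the five-tuple and the round trip, assuming Python 3 and a made-up URL with an upper-case scheme so that the normalization is visible.

```python
# Python 3 import; on Python 2.7 use: from urlparse import urlsplit, urlunsplit
from urllib.parse import urlsplit, urlunsplit

# Hypothetical URL; note the upper-case scheme and the ';type=a' parameters.
url = "HTTP://www.example.com/folder/page.html;type=a?key=value#section"
parts = urlsplit(url)

# Five fields: the parameters stay attached to the path.
print(tuple(parts))
# ('http', 'www.example.com', '/folder/page.html;type=a', 'key=value', 'section')

# Reassembling yields the same URL with the scheme lowercased.
normalized = urlunsplit(parts)
print(normalized)
# http://www.example.com/folder/page.html;type=a?key=value#section
```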

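3. URL encoding and decoding with the urllib module
The urllib module's quote_plus() function percent-encodes a string for use in a URL, including Chinese text; unquote_plus() reverses the encoding, including for Chinese text. In Python 2.7 both functions live directly in urllib; in Python 3 they are in urllib.parse.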
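A minimal sketch of the encode/decode round trip, assuming Python 3 and a made-up input string. quote_plus() turns spaces into "+" and encodes non-ASCII text as UTF-8 percent escapes by default.

```python
# Python 3 import; on Python 2.7 use: from urllib import quote_plus, unquote_plus
# (on 2.7, pass a UTF-8 encoded byte string rather than a unicode object).
from urllib.parse import quote_plus, unquote_plus

# Hypothetical input mixing Chinese text, a space, and reserved characters.
text = "中文 URL&x=1"

encoded = quote_plus(text)
print(encoded)  # %E4%B8%AD%E6%96%87+URL%26x%3D1

decoded = unquote_plus(encoded)
print(decoded == text)  # True
```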
