How a Simple Python Crawler Works
2016-04-13 22:16
#coding=utf-8
# Import the modules commonly used for crawling (Python 2: urllib2, cookielib)
import urllib
import urllib2
import cookielib
from bs4 import BeautifulSoup

# The URL to fetch
url = "http://www.baidu.com"
values = {
    'userName': 'aaaaaa',
    'password': 'bbbbbb'
}
# URL-encode the form data; it is prepared here, but the request below
# sends a plain GET (data=None). Pass data=postdata to send a POST instead.
postdata = urllib.urlencode(values)
user_agent = "Mozilla/5.0 (Windows NT 6.1; WOW64)"
headers = {"User-Agent": user_agent}
request = urllib2.Request(url, data=None, headers=headers)
try:
    response = urllib2.urlopen(request, timeout=2)
except urllib2.HTTPError, e:
    print e.code
except urllib2.URLError, e:
    print e.reason
except:
    print "Error"
else:
    # Only read and parse when the request succeeded,
    # otherwise response would be undefined here.
    data = response.read()
    soup = BeautifulSoup(data, "lxml")
    for link in soup.find_all('a'):
        print link
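The listing above is Python 2 only: urllib2 and cookielib no longer exist in Python 3, where they were folded into urllib.request, urllib.parse, and http.cookiejar. A minimal Python 3 sketch of the same request setup (the BeautifulSoup parsing step is unchanged, so it is omitted here):

```python
# Python 3 equivalent of the request setup above.
# urllib2 -> urllib.request, urllib.urlencode -> urllib.parse.urlencode,
# cookielib -> http.cookiejar.
import urllib.request
import urllib.parse
import http.cookiejar

url = "http://www.baidu.com"
values = {
    'userName': 'aaaaaa',
    'password': 'bbbbbb'
}
# In Python 3 the request body must be bytes, so the encoded form is encoded to UTF-8.
postdata = urllib.parse.urlencode(values).encode('utf-8')
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64)"}

# An opener that keeps cookies across requests, replacing cookielib.CookieJar.
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))

# Passing data makes this a POST; with data=None it would be a GET.
request = urllib.request.Request(url, data=postdata, headers=headers)
print(request.get_method())
```

To actually send the request, call `opener.open(request, timeout=2)` inside the same try/except structure as above, catching `urllib.error.HTTPError` and `urllib.error.URLError`.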
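The final loop relies on BeautifulSoup with the third-party lxml parser. When neither is installed, the link-extraction step can be approximated with the standard library's html.parser; this is a substitute technique, not what the post uses, shown here as a self-contained Python 3 sketch:

```python
# Collect the href of every <a> tag using only the standard library,
# as a BeautifulSoup-free alternative to soup.find_all('a').
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes.
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:
                self.links.append(href)

parser = LinkCollector()
parser.feed('<a href="http://www.baidu.com">Baidu</a><p>no link here</p>')
print(parser.links)   # ['http://www.baidu.com']
```

In the crawler, the string fed to `parser.feed()` would be the decoded response body instead of this inline sample.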