Python: using selenium to simulate logging in to an account-verified website and obtain its cookie
2015-12-06 21:44
1. Install the selenium package:
    sudo pip install -U selenium
If pip is not available, install pip first (from its source directory):
    sudo python setup.py install
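Before moving on, it can help to confirm that the package is actually visible to the interpreter you intend to use. The following is a minimal sketch, written for Python 3 rather than the Python 2 used in the rest of this post; the helper name `is_installed` is made up for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Prints True once `pip install -U selenium` has succeeded for this interpreter.
    print("selenium installed:", is_installed("selenium"))
```

If this prints False even though pip reported success, pip and python probably belong to different installations.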
2. Import the selenium package and create a webdriver object:
    from selenium import webdriver

    # only the name `webdriver` is imported, so call webdriver.Chrome() directly
    sel = webdriver.Chrome()
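At this step you may see an error about the Chrome path. This happens because driving the Chrome browser requires the ChromeDriver executable. It can be downloaded from:
http://chromedriver.storage.googleapis.com/index.html?path=2.7/
Download the appropriate version and extract it into the directory:
/usr/bin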
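To see whether the driver executable can be found before starting the browser, a small check along these lines may be useful. This is a sketch for Python 3 (it relies on `shutil.which`); the function name `find_driver` and its defaults are assumptions for illustration, with `/usr/bin` taken from the step above.

```python
import os
import shutil


def find_driver(name="chromedriver", extra_dir="/usr/bin"):
    """Return the full path of the driver executable, or None if it is not found.

    Searches the directories in PATH plus extra_dir.
    """
    search_path = os.environ.get("PATH", "") + os.pathsep + extra_dir
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    location = find_driver()
    if location is None:
        print("chromedriver not found; expect a Chrome path error")
    else:
        print("chromedriver found at", location)
```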
3. Open the target URL and wait for the response:
    import time

    loginurl = 'http://weibo.com/'
    # open the login page
    sel.get(loginurl)
    # give the page time to load
    time.sleep(10)
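A fixed 10-second sleep either wastes time or is too short on a slow connection. One alternative is to poll for a condition and stop as soon as it holds. The helper below is a generic sketch, not part of selenium; `wait_until` and its parameters are names chosen for this example.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value, or None if the timeout is reached first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

With the `sel` object from this post, a call such as `wait_until(lambda: sel.find_elements_by_xpath("//div[@id='pl_login_form']"))` would return once the login form is present. Selenium also provides its own `WebDriverWait` class in `selenium.webdriver.support.ui` for the same purpose.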
4. Locate the login fields via XPath, fill in the account name and password, and simulate a click on the login button:
    # enter the username
    try:
        sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[1]/div[1]/input").send_keys('yourusername')
        print('user success!')
    except Exception:
        print('user error!')
    time.sleep(1)

    # enter the password
    try:
        sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[2]/div[1]/input").send_keys('yourPW')
        print('pw success!')
    except Exception:
        print('pw error!')
    time.sleep(1)

    # click the login button
    try:
        sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[6]/a").click()
        print('click success!')
    except Exception:
        print('click error!')
    time.sleep(3)
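The XPath expressions above are positional: `div[2]` means the second `div` child, `div[1]` the first, and so on. To illustrate how such a path resolves, here is a sketch that runs the same two expressions against a small, made-up document using the standard library's `xml.etree.ElementTree`, which supports a subset of XPath. The markup in `SAMPLE` is hypothetical and only mimics the nesting the expressions imply; it is not the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup with the nesting the XPath expressions expect.
SAMPLE = """
<html><body>
<div id="pl_login_form">
  <div>
    <div>header</div>
    <div>
      <div><div><input name="username"/></div></div>
      <div><div><input name="password"/></div></div>
    </div>
  </div>
</div>
</body></html>
"""

USER_XPATH = ".//div[@id='pl_login_form']/div/div[2]/div[1]/div[1]/input"
PW_XPATH = ".//div[@id='pl_login_form']/div/div[2]/div[2]/div[1]/input"


def find_input(xpath, markup=SAMPLE):
    """Return the element matched by xpath in markup, or None if nothing matches."""
    root = ET.fromstring(markup.strip())
    return root.find(xpath)


if __name__ == "__main__":
    print(find_input(USER_XPATH).get("name"))
    print(find_input(PW_XPATH).get("name"))
```

Because the path depends on element positions, any change to the page layout can break it, which is why each lookup in the post is wrapped in its own try/except.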
5. Check whether the login succeeded. If the current URL has changed, the login is considered successful; as long as it has not, ask for the verification code and submit again:
    import sys

    curpage_url = sel.current_url
    print(curpage_url)
    while curpage_url == loginurl:
        print('please input the verify code:')
        # strip the trailing newline so it is not typed into the field
        verifycode = sys.stdin.readline().strip()
        sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[3]/div[1]/input").send_keys(verifycode)
        try:
            sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[6]/a").click()
            print('click success!')
        except Exception:
            print('click error!')
        time.sleep(3)
        curpage_url = sel.current_url
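The loop above runs until the URL changes, so it never ends if the login keeps failing. A bounded variant could look like the sketch below. It is decoupled from selenium so that it can be tried on its own: `get_current_url` and `submit_code` are hypothetical callables that would wrap `sel.current_url` and the verification-code submission, and `max_attempts` is an assumed parameter.

```python
def login_with_retries(get_current_url, submit_code, login_url, max_attempts=3):
    """Resubmit until the URL differs from login_url or the attempts run out.

    Returns True if the URL changed (login assumed successful), else False.
    """
    for _ in range(max_attempts):
        if get_current_url() != login_url:
            return True
        submit_code()
    return get_current_url() != login_url


if __name__ == "__main__":
    # Simulated browser state: the second submission succeeds.
    state = {"url": "http://weibo.com/", "tries": 0}

    def fake_submit():
        state["tries"] += 1
        if state["tries"] == 2:
            state["url"] = "http://weibo.com/home"

    print(login_with_retries(lambda: state["url"], fake_submit, "http://weibo.com/"))
```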
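6. Use the webdriver object's get_cookies() method to obtain the session cookie of the site currently being visited:
    # get the session cookie as a list of "name=value" strings
    cookie = [item["name"] + "=" + item["value"] for item in sel.get_cookies()]
    # join them into a single cookie string
    cookiestr = ';'.join(cookie)
    print(cookiestr)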
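The conversion in this step works on plain dictionaries, so it can be tried without a browser. The sketch below assumes, as the code above does, that each cookie is a dict with at least `name` and `value` keys; the sample cookies and the function name `cookies_to_header` are invented for illustration. It joins with `"; "`, the separator used in the HTTP Cookie header format, rather than the bare `";"` used above.

```python
def cookies_to_header(cookies):
    """Build a Cookie header value from a list of cookie dicts."""
    return "; ".join("{}={}".format(c["name"], c["value"]) for c in cookies)


if __name__ == "__main__":
    # Made-up cookies in the shape the step above expects.
    sample = [
        {"name": "sid", "value": "abc123", "domain": ".example.com"},
        {"name": "token", "value": "xyz"},
    ]
    print(cookies_to_header(sample))
```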
7. Once you have the cookie, you can use urllib2 to access the site and carry out tasks such as page crawling:
    import urllib2

    print('%%%using the urllib2 !!')
    homeurl = sel.current_url
    print('homeurl: %s' % homeurl)
    # send the cookie obtained from the browser session with the request
    headers = {'cookie': cookiestr}
    req = urllib2.Request(homeurl, headers=headers)
    try:
        response = urllib2.urlopen(req)
        text = response.read()
        # save the page HTML to a local file
        with open('homepage', 'w') as fd:
            fd.write(text)
        print('###get home page html success!!')
    except Exception:
        print('### get home page html error!!')
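On Python 3, urllib2 no longer exists and its functionality lives in `urllib.request`. A rough equivalent of the step above is sketched below; the function names `build_request` and `save_page` are made up for this example, and the output filename `homepage` is taken from the post.

```python
import urllib.request


def build_request(url, cookie_header):
    """Create a GET request that carries the given Cookie header."""
    return urllib.request.Request(url, headers={"Cookie": cookie_header})


def save_page(url, cookie_header, path="homepage"):
    """Fetch url with the cookie and write the raw response body to path.

    Returns the number of bytes written.
    """
    with urllib.request.urlopen(build_request(url, cookie_header)) as response:
        body = response.read()
    with open(path, "wb") as fd:
        fd.write(body)
    return len(body)
```

Building the request separately from sending it makes it possible to inspect the headers before any network traffic happens, for example with `build_request(url, cookie).get_header("Cookie")`.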