
Python: using selenium to simulate login on a site with account verification and retrieve the cookies

2015-12-06 21:44

1. Install the selenium package:

sudo pip install -U selenium

If pip is not installed, install pip first (run this from the downloaded pip source tree):

sudo python setup.py install

2. Import the selenium package and create a webdriver object:

from selenium import webdriver

sel = webdriver.Chrome()


At this step you may get an error about the chrome path. Driving the Chrome browser requires the ChromeDriver helper binary, which can be downloaded from:

http://chromedriver.storage.googleapis.com/index.html?path=2.7/

Download the version matching your browser and extract it into
/usr/bin


3. Open the target URL and wait for the response:

import time

loginurl = 'http://weibo.com/'
# open the login page
sel.get(loginurl)
time.sleep(10)
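The fixed `time.sleep(10)` above is the simplest way to wait, but it always pays the full ten seconds even when the page loads quickly. A small polling helper (a sketch, not part of the original post; the name `wait_until` is made up) retries a condition until it holds or a timeout expires:

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Poll predicate() until it returns True or timeout seconds pass.

    Returns True if the condition held, False on timeout.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False
```

With the real driver you could call `wait_until(lambda: sel.current_url != loginurl)`. Selenium also ships `WebDriverWait` for exactly this purpose, which is the preferred tool in real code.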


4. Locate the login form via XPath, fill in the username and password, and simulate a click on the login button:

# enter the username
try:
    sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[1]/div[1]/input").send_keys('yourusername')
    print 'user success!'
except Exception:
    print 'user error!'
time.sleep(1)
# enter the password
try:
    sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[2]/div[1]/input").send_keys('yourPW')
    print 'pw success!'
except Exception:
    print 'pw error!'
time.sleep(1)
# click the login button
try:
    sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[6]/a").click()
    print 'click success!'
except Exception:
    print 'click error!'
time.sleep(3)
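The three try/except blocks above all repeat the same pattern. A small helper (a sketch; the name `fill_field` is hypothetical) keeps it in one place, and because it only needs the driver's `find_element_by_xpath` method, it can be exercised with a stub object instead of a browser:

```python
def fill_field(driver, xpath, text, label='field'):
    """Type text into the element at xpath; return True on success.

    driver only needs a find_element_by_xpath() method, so any
    duck-typed object (including a test stub) works here.
    """
    try:
        driver.find_element_by_xpath(xpath).send_keys(text)
        print('%s success!' % label)
        return True
    except Exception:
        print('%s error!' % label)
        return False
```

With the real driver: `fill_field(sel, "//div[@id='pl_login_form']/div/div[2]/div[1]/div[1]/input", 'yourusername', 'user')`.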


5. Check whether the login succeeded: if the current URL has changed, treat the login as successful. Otherwise the site is probably asking for a verification code, so read one from stdin, fill it in, and retry:

import sys

curpage_url = sel.current_url
print curpage_url
while curpage_url == loginurl:
    print 'please input the verify code:'
    verifycode = sys.stdin.readline().strip()
    sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[3]/div[1]/input").send_keys(verifycode)
    try:
        sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[6]/a").click()
        print 'click success!'
    except Exception:
        print 'click error!'
    time.sleep(3)
    curpage_url = sel.current_url
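One weakness of the loop above is that it never terminates if the code keeps being rejected. A bounded variant can be sketched with the browser interactions injected as callables (all parameter names here are hypothetical, not from the original post), which also makes the retry logic testable without a browser:

```python
import time


def login_with_retries(loginurl, get_url, ask_code, enter_code, submit,
                       max_tries=3, delay=0):
    """Retry verification-code entry until the current URL leaves loginurl.

    get_url, ask_code, enter_code and submit are injected callables
    (e.g. wrappers around sel.current_url, stdin, send_keys and click),
    so the control flow can be tested with plain functions.
    Returns True on success, False after max_tries failed attempts.
    """
    for _ in range(max_tries):
        if get_url() != loginurl:
            return True
        enter_code(ask_code())
        submit()
        if delay:
            time.sleep(delay)
    return get_url() != loginurl
```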


6. Retrieve the session cookies for the current site from the webdriver object:

# get the session cookies
cookie = [item["name"] + "=" + item["value"] for item in sel.get_cookies()]
#print cookie

cookiestr = '; '.join(cookie)
print cookiestr
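`get_cookies()` returns a list of dicts with `name` and `value` keys, so turning it into a Cookie header value is plain string work. Wrapping the two lines above in a function (the name `cookies_to_header` is made up; the sample cookie names in the test are invented, not real Weibo cookies) makes the step testable without a browser:

```python
def cookies_to_header(cookies):
    """Join Selenium-style cookie dicts into a Cookie header value."""
    return '; '.join('%s=%s' % (c['name'], c['value']) for c in cookies)
```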


7. With the cookie string in hand, you can access the site with urllib2 and do things like crawling pages:

import urllib2

print '### using urllib2'
homeurl = sel.current_url
print 'homeurl: %s' % homeurl
headers = {'Cookie': cookiestr}
req = urllib2.Request(homeurl, headers=headers)
try:
    response = urllib2.urlopen(req)
    text = response.read()
    fd = open('homepage', 'w')
    fd.write(text)
    fd.close()
    print '### get home page html success!!'
except Exception:
    print '### get home page html error!!'
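Constructing the request can be checked without any network traffic. This sketch (the helper name is hypothetical) also papers over the Python 2/3 split, since `urllib2` became `urllib.request` in Python 3:

```python
try:
    import urllib2 as urlreq          # Python 2, as used in this post
except ImportError:
    from urllib import request as urlreq  # Python 3 equivalent


def make_cookie_request(url, cookiestr):
    """Build a Request carrying the saved Cookie header (no network I/O)."""
    return urlreq.Request(url, headers={'Cookie': cookiestr})
```

The returned object can then be passed to `urlopen()` exactly as in step 7.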


References:
http://splinter.readthedocs.org/en/latest/drivers/chrome.html
http://www.testwo.com/blog/6931
http://docs.seleniumhq.org/projects/
http://docs.seleniumhq.org/docs/