Scraping rendered web pages with Selenium and PhantomJS or the Chrome browser

2016-07-05 12:36
First, install Selenium with pip: pip install selenium

I. PhantomJS

1. Download the PhantomJS archive, unpack it, and add the path of its bin folder to the PATH environment variable.

2. Code
from selenium import webdriver


def getHtml(url):
    # Start PhantomJS; executable_path is only needed if phantomjs is not on PATH
    driver = webdriver.PhantomJS(executable_path='/home/lhy/phantomjs-1.9.8-linux-x86_64/bin/phantomjs')
    try:
        driver.get(url)
        html = driver.page_source  # the page's DOM serialized as HTML, after scripts have run
        # Save as UTF-8 text (replaces the Python 2 sys.setdefaultencoding hack)
        with open("phonesinfo2.txt", "w", encoding="utf-8") as fo:
            fo.write(html)
        return html
    finally:
        driver.quit()  # always shut down the phantomjs process
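Both parts depend on a driver binary being found on PATH (phantomjs in step 1 above, chromedriver in part II). Below is a minimal sketch of a pre-flight check using only the standard library; the helper name find_on_path and its extra_dir parameter are hypothetical, not part of Selenium.

```python
import os
import shutil


def find_on_path(name, extra_dir=None):
    """Return the full path of executable `name`, or None if it cannot be found.

    PATH is searched as usual; `extra_dir` (e.g. an unpacked phantomjs bin/
    folder) is searched first when given. This is a hypothetical helper.
    """
    search = os.environ.get("PATH", "")
    if extra_dir:
        # Prepend the extra directory so it takes priority over PATH entries
        search = extra_dir + os.pathsep + search
    # shutil.which only returns files that exist and are executable
    return shutil.which(name, path=search)
```

For example, if find_on_path("phantomjs") returns None, the PATH change from step 1 has not taken effect in the current shell, and Selenium will not find the binary without an explicit executable_path.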


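II. Chrome browser

1. The Chrome browser itself must be installed.

2. Download the Chrome driver, chromedriver.

3. Add the driver to the PATH environment variable (preferably by editing /etc/profile so the change is permanent).

4. Code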

from selenium import webdriver


def getHtml(url):
    # Start Chrome; chromedriver must be on PATH (step 3)
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        html = driver.page_source  # the page's DOM serialized as HTML, after scripts have run
        # Save as UTF-8 text (replaces the Python 2 sys.setdefaultencoding hack)
        with open("phonesinfo2.txt", "w", encoding="utf-8") as fo:
            fo.write(html)
        return html
    finally:
        driver.quit()  # always close the browser and stop chromedriver
Note: a Chrome browser window opens while the script is running.
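The two getHtml functions differ only in how the driver is created. Below is a minimal sketch that factors out the shared part, assuming the driver object exposes get(), page_source and quit() as Selenium's WebDriver does. The names fetch_rendered_html and Driver are hypothetical, introduced here for illustration; because the driver comes from a factory, a fake object can stand in for a real browser when testing.

```python
from typing import Callable, Protocol


class Driver(Protocol):
    """The small slice of the WebDriver interface this sketch relies on."""
    page_source: str

    def get(self, url: str) -> None: ...

    def quit(self) -> None: ...


def fetch_rendered_html(make_driver: Callable[[], Driver], url: str,
                        out_path: str = "phonesinfo2.txt") -> str:
    """Open `url` in a freshly created driver, save the rendered HTML as UTF-8
    to `out_path` and return it. The driver is shut down even on errors."""
    driver = make_driver()
    try:
        driver.get(url)
        html = driver.page_source
        with open(out_path, "w", encoding="utf-8") as fo:
            fo.write(html)
        return html
    finally:
        driver.quit()
```

With Selenium installed, the calls would look like fetch_rendered_html(webdriver.Chrome, url) or fetch_rendered_html(lambda: webdriver.PhantomJS(), url), the latter relying on phantomjs being on PATH.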