
Scraping web page data with Scrapy

2014-05-12 17:12
Tools: Firefox (recommended, easy to use)

Compared with BeautifulSoup, Scrapy feels both elegant and fast. Here is a first attempt:

from scrapy.spider import Spider
from scrapy.selector import Selector
from health.items import HealthItem
# from health.pipelines import HealthPipeline


class DmozSpider(Spider):
    name = "all"
    # Must cover the host used in start_urls.
    allowed_domains = ["xywy.com"]
    start_urls = [
        "http://zzk.xywy.com/",
    ]

    def parse(self, response):
        sel = Selector(response)
        # Outer container first, then the links nested inside it.
        sites = sel.xpath('//div[@class="shentih"]')
        results = sites.xpath('./div/div/div/*/*/a[@class="fsize14"]')
        # results = sites.xpath('./div/div/div[@id="AList"]/*/*/a[@class="fsize14"]')

        for site in results:
            # Pair each link's text with its href.
            for title, link in zip(site.xpath('text()').extract(), site.xpath('@href').extract()):
                # Build a fresh item per link so yielded items do not share state.
                item = HealthItem()
                item['title'] = title.encode('utf-8')
                item['link'] = link.encode('utf-8')
                yield item
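
To see what the two XPath steps in parse pick out without running a crawl, here is a minimal sketch that swaps Scrapy's Selector for the standard library's xml.etree.ElementTree. The SAMPLE markup, its href values, and the extract_links helper are assumptions made up for illustration; the real page structure is not shown in this post. ElementTree only accepts well-formed markup and supports a subset of XPath, so treat this as a way to reason about the path, not as a replacement for the spider.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment that mimics the nesting the
# spider's XPath expects (div.shentih > div > div > div > * > * > a).
SAMPLE = """
<html><body>
<div class="shentih">
  <div><div><div id="AList">
    <ul>
      <li><a class="fsize14" href="/a/">First</a></li>
      <li><a class="fsize14" href="/b/">Second</a></li>
      <li><a class="other" href="/c/">Ignored</a></li>
    </ul>
  </div></div></div>
</div>
</body></html>
"""


def extract_links(markup):
    """Return one dict per matching link, like the items the spider yields."""
    root = ET.fromstring(markup.strip())
    rows = []
    # Step 1: find the outer container anywhere in the document.
    for site in root.findall(".//div[@class='shentih']"):
        # Step 2: walk the fixed nesting down to the links with class fsize14.
        for a in site.findall("./div/div/div/*/*/a[@class='fsize14']"):
            # A new dict per link, so results never share state.
            rows.append({"title": a.text, "link": a.get("href")})
    return rows


if __name__ == "__main__":
    for row in extract_links(SAMPLE):
        print(row["title"], row["link"])
```

Only the two anchors with class fsize14 are returned; the third is filtered out by the attribute predicate, which is the same filtering the spider relies on.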