
Scraping Web Page Data with Scrapy

2014-05-12 17:12

Tool: Firefox (recommended, easy to use)

Compared with BeautifulSoup, Scrapy feels both elegant and fast. Here is a first attempt:

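# Current import path; the original post used the older scrapy.spider module.
from scrapy import Spider
from scrapy.selector import Selector
# HealthItem lives in this project's items.py; it is assumed to declare
# the "title" and "link" fields used below.
from health.items import HealthItem
# from health.pipelines import HealthPipeline


class DmozSpider(Spider):
    name = "all"
    # Must match the domain of start_urls (xywy.com, not xywy.org);
    # otherwise follow-up requests would be filtered as off-site.
    allowed_domains = ["xywy.com"]
    start_urls = [
        "http://zzk.xywy.com/",
    ]

    def parse(self, response):
        sel = Selector(response)
        sites = sel.xpath('//div[@class="shentih"]')
        results = sites.xpath('./div/div/div/*/*/a[@class="fsize14"]')
        # results = sites.xpath('./div/div/div[@id="AList"]/*/*/a[@class="fsize14"]')

        for site in results:
            titles = site.xpath('text()').extract()
            links = site.xpath('@href').extract()
            for title, link in zip(titles, links):
                # Create a fresh item per link so yielded items do not share state.
                item = HealthItem()
                # Encoded to UTF-8 bytes as in the original (Python 2 era);
                # on Python 3 the str values can be stored directly.
                item['title'] = title.encode('utf-8')
                item['link'] = link.encode('utf-8')
                yield item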
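
To check what the XPath expressions in parse() select without running a crawl, here is a minimal sketch. It is not Scrapy: it uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath. The SAMPLE markup and the extract_links helper are invented for illustration; the real page on zzk.xywy.com may be structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment shaped like the markup the spider's XPath expects.
SAMPLE = """
<body>
  <div class="shentih">
    <div><div><div id="AList">
      <ul>
        <li><a class="fsize14" href="/a.htm">Alpha</a></li>
        <li><a class="fsize14" href="/b.htm">Beta</a></li>
        <li><a class="other" href="/c.htm">Gamma</a></li>
      </ul>
    </div></div></div>
  </div>
</body>
"""


def extract_links(markup):
    """Return one dict per matching <a>, mirroring the spider's title/link pairs."""
    root = ET.fromstring(markup.strip())
    found = []
    # Same two-step selection as the spider: the container first,
    # then the anchors at a fixed depth below it.
    for site in root.findall(".//div[@class='shentih']"):
        for a in site.findall("./div/div/div/*/*/a[@class='fsize14']"):
            # A new dict per link, so no two results share state.
            found.append({"title": a.text, "link": a.get("href")})
    return found


if __name__ == "__main__":
    for row in extract_links(SAMPLE):
        print(row["title"], row["link"])
```

The fixed-depth path (./div/div/div/*/*/a) breaks as soon as the page adds or removes a wrapper element, so the commented-out variant that pins div[@id="AList"] is probably the more robust choice if that id is stable.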
