A crawler based on Python and the amap (高德地图) web API for searching POIs

2016-12-27 09:06
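Purpose:

Use a crawler written in Python, together with the web API provided by amap (高德地图), to retrieve POIs on the map and their associated information.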

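Method:

1. Use Python's urllib module to handle network requests and responses: send a request to amap and receive the response.

2. amap's response is an XML document; parse it with Python's DOM (xml.dom.minidom) and save the POI data.

3. For amap API usage, see http://lbs.amap.com/api/webservice/reference/search/

4. Note: to use the API you must apply to amap for a key, then replace YOURKEY in the url_amap variable with the key you obtained.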
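Steps 1 and 4 can be sketched as a small URL builder. This is a minimal Python 3 sketch, not the script's own code: the helper name build_search_url and the constant BASE_URL are hypothetical, and the parameter names and values are simply copied from the url_amap string used in the listing.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint taken from the url_amap string in the listing.
BASE_URL = "http://restapi.amap.com/v3/place/text"


def build_search_url(key, page=1, types="170300", city="120113", offset=20):
    """Return the place-search URL for one result page (hypothetical helper)."""
    params = {
        "types": types,        # 170300: factory facilities
        "city": city,          # 120113: Beichen district of Tianjin
        "citylimit": "true",
        "output": "xml",
        "offset": offset,      # results per page
        "page": page,
        "extensions": "base",
        "key": key,            # the key you applied for
    }
    return BASE_URL + "?" + urlencode(params)


url = build_search_url("YOURKEY", page=2)
print(url)
```

Building the query from a dict keeps the key and the page number in one place, so moving to the next page does not depend on string replacement.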

Implementation:

#coding:utf-8
#Python 2 script
#this module uses the amap api to get all the factories in the beichen district of tianjin
#thanks to amap. please visit http://ditu.amap.com/ for more information

import urllib
import xml.dom.minidom as minidom
import string

file_name='result.txt'              #write results to this file
url_amap=r'http://restapi.amap.com/v3/place/text?&keyword=&types=170300&city=120113&citylimit=true&&output=xml&offset=20&page=1&key=YOURKEY&extensions=base'
facility_type=r'types=170300'       #factory facilities
region=r'city=120113'               #beichen district of tianjin
each_page_rec=20                    #number of results per page
which_page=r'page=1'                #which page to display
xml_file='tmp.xml'                  #xml file name

#write one record to the output file
def log2file(file_handle,text_info):
    file_handle.write(text_info)

#fetch the response for a url and save it to the xml file
def getHtml(url):
    page = urllib.urlopen(url)
    html = page.read()

    try:
        #open the xml file and save the data to it
        with open(xml_file,'w') as xml_file_handle:
            xml_file_handle.write(html)
    except IOError as err:
        print "IO error: "+str(err)
        return -1

    return 0

#parse data from the xml file
def parseXML():
    total_rec=1                      #total number of records

    #open the xml file and read the data records
    try:
        with open(file_name,'a') as file_handle:
            dom = minidom.parse(xml_file)
            root = dom.getElementsByTagName("response") #getElementsByTagName returns a NodeList

            for node in root:
                total_rec=node.getElementsByTagName('count')[0].childNodes[0].nodeValue

                pois = node.getElementsByTagName("pois")
                for poi in pois[0].getElementsByTagName('poi'):
                    name=poi.getElementsByTagName("name")[0].childNodes[0].nodeValue
                    location=poi.getElementsByTagName("location")[0].childNodes[0].nodeValue
                    text_info=''+name+','+location+'\n'
                    print text_info
                    #save the data record
                    log2file(file_handle,text_info.encode('utf-8'))

    except IOError as err:
        print "IO error: "+str(err)

    return total_rec

if __name__=='__main__':
    if getHtml(url_amap)==0:
        print 'parsing page 1 ... ...'
        #parse the xml file and get the total record number
        total_record_str=parseXML()

        total_record=string.atoi(str(total_record_str))
        #page_number is the exclusive upper bound for range(), i.e. last page + 1
        if (total_record%each_page_rec)!=0:
            page_number=total_record/each_page_rec+2
        else:
            page_number=total_record/each_page_rec+1

        #retrieve the remaining pages
        for each_page in range(2,page_number):
            print 'parsing page '+str(each_page)+' ... ...'
            url_amap=url_amap.replace('page='+str(each_page-1),'page='+str(each_page))
            getHtml(url_amap)
            parseXML()

    else:
        print 'error: failed to get xml from amap'
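
The parsing and paging steps can also be tried without a network call or a temporary file. The following is a minimal Python 3 sketch under stated assumptions: SAMPLE_XML is made-up data shaped only after the element names the listing reads (response, count, pois, poi, name, location), not a real amap response, and the helpers parse_pois and page_count are hypothetical names.

```python
import xml.dom.minidom as minidom

# Made-up sample, NOT a real amap response; it only mirrors the
# element names that parseXML() reads in the listing.
SAMPLE_XML = """<response>
  <count>41</count>
  <pois>
    <poi><name>Factory A</name><location>117.10,39.20</location></poi>
    <poi><name>Factory B</name><location>117.20,39.30</location></poi>
  </pois>
</response>"""


def parse_pois(xml_text):
    """Return (total record count, [(name, location), ...]) from one page."""
    dom = minidom.parseString(xml_text)
    response = dom.getElementsByTagName("response")[0]
    count = int(response.getElementsByTagName("count")[0].firstChild.nodeValue)
    records = []
    for poi in response.getElementsByTagName("poi"):
        name = poi.getElementsByTagName("name")[0].firstChild.nodeValue
        location = poi.getElementsByTagName("location")[0].firstChild.nodeValue
        records.append((name, location))
    return count, records


def page_count(total_records, per_page):
    """Number of pages needed, using ceiling division."""
    return (total_records + per_page - 1) // per_page


count, records = parse_pois(SAMPLE_XML)
print(count, records)
print(page_count(count, 20))
```

With 41 records and 20 per page, page_count gives 3, which matches the listing: its page_number is 4 and range(2, 4) visits pages 2 and 3 after page 1.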