Python: Scanning Local Files into a Solr Service
2015-11-19 00:18
##scan_to_solr.py
@@Usage
scan_to_solr.py
Scans local files and submits them to the Solr fuzzy search service.
#-*-coding:utf-8-*-
import os, re, sys, time
import urllib, urllib2

# Python 2 only: make GBK the default codec for implicit str/unicode
# conversions (the script treats local file names as GBK-encoded).
reload(sys)
sys.setdefaultencoding('gbk')

gCount = 0  # files matched
gOK = 0     # files indexed successfully
gErr = 0    # files that failed to index

# Extensions worth indexing; the dot is escaped so it matches literally.
DOC_PATTERN = re.compile(r"\.(ppt|pptx|xls|txt|doc|vsd|pdf|rar|xlsx|zip|docx|tar|gz|iso|rpm|sql)$")

def getUrl(url):
    """Fetch url and return the response body, or None on failure."""
    try:
        return urllib2.urlopen(url, timeout=240).read()
    except Exception:
        print "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXload error~"
        return None

def addIndex(key, value):
    """Submit one (file name, SVN link) pair to the indexing endpoint."""
    # urlencode percent-escapes both values, so names containing
    # '&', '#' or non-ASCII characters cannot break the query string.
    query = urllib.urlencode([('ch', key.decode("gbk").encode("utf-8")),
                              ('jp', value.encode("utf-8"))])
    url = 'http://192.168.4.60/baidu/src/addServer.php?' + query
    return getUrl(url) is not None

def getList(dirname, pFunc):
    """Walk dirname recursively and call pFunc(directory, filename) per file."""
    try:
        ls = os.listdir(dirname)
    except OSError:
        print dirname, 'access denied'
        return
    for name in ls:
        temp = os.path.join(dirname, name)
        if os.path.isdir(temp):
            getList(temp, pFunc)
        else:
            pFunc(dirname, name)

def doAddFile(dirname, fname):
    """Callback: index fname if it has one of the document extensions."""
    global gCount, gOK, gErr
    if fname in ("entries", "toIndex.py", "index.txt"):
        return
    if not DOC_PATTERN.search(fname):
        return
    gCount += 1
    dirname = dirname.replace("\\", "/")
    svnUrl = 'svn://192.168.0.248:9997/DOC/' + u'软件部/'
    # dirname[11:] drops the first 11 characters of the local path so the
    # remainder can be appended to the SVN root.
    anchor = "%s%s" % (svnUrl, dirname[11:].decode("gbk"))
    print fname
    fname = fname.replace(" ", "")
    print "###############"
    print fname
    print "###############(%d)" % (gCount)
    if addIndex(fname, anchor):
        gOK += 1
        print "OOO KKK(%d)\n" % (gOK)
        time.sleep(1)
    else:
        gErr += 1
        print "NNN GGG(%d)\n" % (gErr)

# Main entry point
print "\n============================== Begin ScanDir ========================================"
getList(os.getcwd(), doAddFile)
print "ScanFile(%d)" % (gCount)
print "\n========================================================================================"
print "OK(%d) ,Error(%d)" % (gOK, gErr)
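The script above targets Python 2. For readers on Python 3, here is a rough sketch of the same two steps: walking the directory tree and building the request URL. It assumes the same addServer.php endpoint and the same `ch` / `jp` parameters. The helper names `build_add_url`, `iter_documents` and `submit` are hypothetical and are not part of the original script.

```python
import os
import urllib.parse
import urllib.request

# Values copied from the script above; adjust them for your own setup.
ADD_URL = "http://192.168.4.60/baidu/src/addServer.php"
EXTENSIONS = (".ppt", ".pptx", ".xls", ".xlsx", ".txt", ".doc", ".docx",
              ".vsd", ".pdf", ".rar", ".zip", ".tar", ".gz", ".iso",
              ".rpm", ".sql")
SKIP_NAMES = {"entries", "toIndex.py", "index.txt"}


def build_add_url(name, anchor, base=ADD_URL):
    """Return the indexing URL with both parameters percent-encoded."""
    query = urllib.parse.urlencode({"ch": name, "jp": anchor})
    return base + "?" + query


def iter_documents(root):
    """Yield (directory, filename) for every indexable file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name in SKIP_NAMES:
                continue
            # str.endswith accepts a tuple of suffixes.
            if name.endswith(EXTENSIONS):
                yield dirpath, name


def submit(name, anchor, timeout=240):
    """Send one document to the indexing endpoint; return True on success."""
    try:
        url = build_add_url(name, anchor)
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
        return True
    except OSError:  # URLError and socket timeouts are OSError subclasses
        return False
```

A caller would loop over `iter_documents(os.getcwd())`, derive the SVN link from each directory as the original does, and pass the pair to `submit`. Because `os.walk` and `urlencode` handle Unicode natively in Python 3, the `setdefaultencoding` workaround is not needed in this variant.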