
Python: preprocessing input text into a one-dimensional, single-space-separated string

2015-10-23 12:55
#!/usr/bin/python

import re

def pre_process_msg(msgIn):
    # Reject empty input: the caller gets an error string back instead of a result.
    if msgIn == "":
        return "msgIn_Input_Error: input should not be empty; expected a non-empty string"
    else:
        # 1. Trim leading and trailing whitespace.
        msg = msgIn
        msg = msg.strip()

        # 2. Replace each internal special character (\n, \r, \t) with a space.
        dst_replace_pattern1 = re.compile('[\n\r\t]')
        msg = dst_replace_pattern1.sub(" ", msg)

        # 3. Collapse every run of one or more spaces into a single space,
        #    so the string contains only single-space separators.
        result = re.sub(" +", " ", msg)
        msg = result.strip()

        # Print the result in quotes so stray edge whitespace would be visible.
        # The parenthesized form works as a statement in Python 2 and a call in Python 3.
        print("'" + msg + "'")
        return msg
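
The three steps above (trim, replace \n, \r and \t with spaces, collapse repeated spaces) can also be folded into a single substitution. The sketch below is only an illustration, not the original author's code: the name `normalize_spaces` is hypothetical, and it assumes that an empty input should simply yield an empty string rather than the error message returned by `pre_process_msg`.

```python
import re

# Hypothetical helper, not part of the original post.
# One character class covers the space plus the three special
# characters handled by pre_process_msg.
_SEPARATORS = re.compile(r"[ \n\r\t]+")

def normalize_spaces(msg):
    """Collapse runs of spaces, \\n, \\r and \\t into single spaces and trim the ends."""
    # Assumption: empty input returns "" instead of an error string.
    return _SEPARATORS.sub(" ", msg).strip()

if __name__ == "__main__":
    sample = "  hello\t\tworld \r\n foo  "
    print("'" + normalize_spaces(sample) + "'")  # prints 'hello world foo'
```

Compiling the pattern once at module level avoids rebuilding it on every call, which can matter if the function runs over many messages.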