
A personal take on the re.sub(pattern, repl, string, count=0, flags=0) method in Python

2015-08-27 12:01

In Python, the re module provides regular expression support.

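pattern: the pattern to match. It can be a pattern string or a compiled Pattern object produced by re.compile().

repl: either a string or a function.

string: the original string to be searched and replaced in.

count: the maximum number of matches to replace. The default, 0, means every match is replaced.

flags: matching flags, for example re.IGNORECASE.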
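A minimal sketch of how these parameters behave. The input string and the variable names are made up for illustration and do not come from the original example.

```python
import re

text = "cat Cat CAT cat"  # illustrative input, not from the original post

# pattern may be a plain pattern string ...
all_replaced = re.sub(r"cat", "dog", text)

# ... or a compiled pattern object, whose sub() method omits the pattern argument.
pat = re.compile(r"cat")
via_compiled = pat.sub("dog", text)

# count limits the number of replacements; 0 (the default) replaces every match.
first_only = re.sub(r"cat", "dog", text, count=1)

# flags changes how the pattern matches, here ignoring case.
any_case = re.sub(r"cat", "dog", text, flags=re.IGNORECASE)

print(all_replaced)   # dog Cat CAT dog
print(via_compiled)   # dog Cat CAT dog
print(first_only)     # dog Cat CAT cat
print(any_case)       # dog dog dog dog
```

Passing count and flags as keyword arguments avoids accidentally supplying a flag value in the count position.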

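Of these, repl deserves special attention:

When repl is a string, every substring of string that matches pattern is replaced with repl.

When repl is a function, it must take exactly one argument, a match object (MatchObject), and it must return a string.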
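The two forms of repl can be compared side by side. This is a small sketch; the input text and the helper name `double` are hypothetical.

```python
import re

text = "price: 3 apples, 12 pears"  # illustrative input

# String repl: every match is replaced by the same fixed text.
masked = re.sub(r"\d+", "N", text)

# Function repl: called once per match with the match object,
# and must return the replacement string.
def double(match):
    return str(int(match.group(0)) * 2)

doubled = re.sub(r"\d+", double, text)

print(masked)   # price: N apples, N pears
print(doubled)  # price: 6 apples, 24 pears
```

A function is the natural choice whenever the replacement depends on what was matched.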

Example:

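__author__ = 'zhoujinyu'

import fileinput
import re

# Match text enclosed in square brackets, e.g. [x = 2] or [x + 1].
field_pat = re.compile(r'\[(.+?)\]')
# Shared namespace in which the bracketed code is evaluated.
scope = {}

def replacement(match):
    # group(1) is the text captured by the first parenthesized group.
    code = match.group(1)
    try:
        # An expression is replaced by the string form of its value.
        return str(eval(code, scope))
    except SyntaxError:
        # A statement (such as an assignment) is executed and replaced by ''.
        exec(code, scope)
        return ''

lines = []
for line in fileinput.input():
    lines.append(line)
text = ''.join(lines)

print(field_pat.sub(replacement, text))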


In the last statement of the example, which prints field_pat.sub(replacement, text), the argument replacement is the replacement(match) function, and match is a MatchObject.

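sub() works iteratively: it finds each substring of text (string) that matches field_pat (pattern), one at a time, and replaces it with the string returned by replacement (repl).

So replacement(match) runs once for every match that is found. On each call, match.group(1) returns the text matched by the first parenthesized group in field_pat (pattern). Matches never overlap, because each search resumes right after the end of the previous match. The second search therefore does not pick up the previous match again; it moves on to the next substring that matches.

Continuing in this way through the text, sub() eventually finds and replaces every match.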
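The call-per-match behavior can be observed directly. The sketch below reuses field_pat and replacement from the example, but feeds them a fixed string instead of fileinput and records each call in a list named `seen`; both the sample text and `seen` are additions for illustration. Note that eval and exec run arbitrary code, so this approach is only suitable for trusted input.

```python
import re

field_pat = re.compile(r'\[(.+?)\]')
scope = {}
seen = []  # (start offset, captured text) of every call, in call order

def replacement(match):
    code = match.group(1)
    seen.append((match.start(), code))
    try:
        # Expressions evaluate to a value that becomes the replacement.
        return str(eval(code, scope))
    except SyntaxError:
        # Statements such as assignments are executed and replaced by ''.
        exec(code, scope)
        return ''

text = '[x = 2][y = 3]sum: [x + y]'  # illustrative template
result = field_pat.sub(replacement, text)

print(seen)    # [(0, 'x = 2'), (7, 'y = 3'), (19, 'x + y')]
print(result)  # sum: 5
```

The recorded offsets increase from one call to the next, which shows that each search starts after the previous match and that the function is called once per match, left to right.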

From the official documentation:

re.sub(pattern, repl, string, count=0, flags=0)

Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged. repl can be a string or a function; if it is a string, any backslash escapes in it are processed. That is, \n is converted to a single newline character, \r is converted to a carriage return, and so forth. Unknown escapes such as \j are left alone. Backreferences, such as \6, are replaced with the substring matched by group 6 in the pattern. For example:

>>> re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):',
...        r'static PyObject*\npy_\1(void)\n{',
...        'def myfunc():')
'static PyObject*\npy_myfunc(void)\n{'

If repl is a function, it is called for every non-overlapping occurrence of pattern. The function takes a single match object argument, and returns the replacement string. For example:

>>> def dashrepl(matchobj):
...     if matchobj.group(0) == '-': return ' '
...     else: return '-'
>>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
'pro--gram files'
>>> re.sub(r'\sAND\s', ' & ', 'Baked Beans And Spam', flags=re.IGNORECASE)
'Baked Beans & Spam'

The pattern may be a string or an RE object.

The optional argument count is the maximum number of pattern occurrences to be replaced; count must be a non-negative integer. If omitted or zero, all occurrences will be replaced. Empty matches for the pattern are replaced only when not adjacent to a previous match, so sub('x*', '-', 'abc') returns '-a-b-c-'.
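As a quick check of the count and empty-match behavior described in the documentation, here is a small sketch using the same 'x*' pattern; the variable names are illustrative.

```python
import re

# 'x*' matches the empty string at every position in 'abc'.
all_gaps = re.sub('x*', '-', 'abc')

# With count=2 only the first two (empty) matches are replaced.
first_two = re.sub('x*', '-', 'abc', count=2)

print(all_gaps)   # -a-b-c-
print(first_two)  # -a-bc
```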

In string-type repl arguments, in addition to the character escapes and backreferences described above, \g<name> will use the substring matched by the group named name, as defined by the (?P<name>...) syntax. \g<number> uses the corresponding group number; \g<2> is therefore equivalent to \2, but isn't ambiguous in a replacement such as \g<2>0. \20 would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character '0'. The backreference \g<0> substitutes in the entire substring matched by the RE.
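A short sketch of the \g<...> forms in a replacement string. The date pattern and the group names year and month are assumptions chosen for illustration.

```python
import re

# Two named groups; they are also reachable as groups 1 and 2.
pat = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})')

# \g<name> refers to a group by its name.
by_name = pat.sub(r'\g<month>/\g<year>', '2015-08')

# \g<2>0 is group 2 followed by a literal '0', with no ambiguity.
by_number = pat.sub(r'\g<2>0', '2015-08')

# \g<0> is the entire match.
whole = pat.sub(r'[\g<0>]', '2015-08')

print(by_name)    # 08/2015
print(by_number)  # 080
print(whole)      # [2015-08]
```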

Changed in version 2.7: Added the optional flags argument.
Tags: python, regular expressions