
Python 4-1-2: Python regular expressions summarized in one diagram, plus implementation details

2017-01-18 08:50

Keep regular expressions and Linux shell wildcards (glob patterns) strictly apart in your mind. Otherwise the two get very confusing.

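Linux shell wildcards: * (any run of characters), ? (exactly one character), [a-z] (one character from the set), {"a","x"} (brace expansion: each alternative in turn), [!a-z] (one character not in the set), \ (escape the next character)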
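Python's standard fnmatch module implements shell-style wildcards (*, ?, [seq], [!seq], but not {a,x} brace expansion). That makes it a convenient way to compare the two syntaxes side by side. This is a small sketch, and the file names are made up:

```python
import fnmatch
import re

# Shell-style wildcard: * means "any run of characters"
print(fnmatch.fnmatch("report.txt", "*.txt"))  # True

# In a regex, * only repeats the preceding item,
# so "any run of characters" has to be written as .*
print(re.fullmatch(r".*\.txt", "report.txt") is not None)  # True

# fnmatch.translate() shows the regex that a wildcard pattern becomes
regex = fnmatch.translate("*.txt")
print(regex)
print(re.match(regex, "report.txt") is not None)  # True

# [!a-z] in the shell corresponds to [^a-z] in a regex
print(fnmatch.fnmatch("7", "[!a-z]"))            # True
print(re.fullmatch(r"[^a-z]", "7") is not None)  # True
```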



# -*- coding: utf-8 -*-

import re


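1.1 re.match only tries to match at the beginning of the string. If the pattern does not match there, re.match returns None, and calling m.group() on that result raises AttributeError.

regx = re.compile("abc")
m11 = re.match(regx, "abcdefg")
print("m11 match is ", m11.group())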
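Because a failed match returns None, it is safer to check the result before calling .group(). This is a minimal sketch of that pattern. The helper name match_or_none is hypothetical:

```python
import re

def match_or_none(pattern, text):
    """Hypothetical helper: return the matched text, or None if re.match fails."""
    m = re.match(pattern, text)
    # re.match returns None on failure, so check before calling .group()
    if m is None:
        return None
    return m.group()

print(match_or_none("abc", "abcdefg"))   # abc
print(match_or_none("abc", "xabcdefg"))  # None: "abc" is not at the start
```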


1.2 . matches any character except \n. To match a newline as well, use (.|\n).

m12 = re.match("ab(.|\n)cd", "ab\ncd")
print("m12 match is ", m12.group())
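Python also provides a flag for this: with re.DOTALL (alias re.S), . matches \n too, so the (.|\n) workaround is not needed. A small sketch:

```python
import re

text = "ab\ncd"
# By default . does not match a newline
print(re.match(r"ab.cd", text))  # None
# With re.DOTALL, . matches \n as well
print(re.match(r"ab.cd", text, re.DOTALL).group() == text)  # True
```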


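1.3 \ is the escape character. A literal * can be matched either as \* or as [*].

m13 = re.match(r'\*', "*abc")
print("m13 match is ", m13.group())
m131 = re.match('[*]', "*abc")
print("m131 match is ", m131.group())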
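When a whole string has to be matched literally, escaping each special character by hand is error-prone. re.escape() adds the backslashes for you. A small sketch with an invented input:

```python
import re

literal = "1+1=2?"
# Used unescaped, + and ? act as quantifiers, so the literal text does not match
print(re.match(literal, "1+1=2? yes"))  # None

# re.escape backslash-escapes the characters that have a special meaning
pattern = re.escape(literal)
print(pattern)
m = re.match(pattern, "1+1=2? yes")
print(m.group())  # 1+1=2?
```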


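1.4 [...] is a character set: it matches any single character listed inside it. A ^ immediately after the opening bracket, as in [^abc], negates the set, so it matches any character that is NOT listed.

m14 = re.match("[^abcdef]", "zabcxyz")
print("m14 match is ", m14.group())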


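2.1 The predefined class \d is equivalent to [0-9]. For str patterns in Python 3 it also matches other Unicode decimal digits unless re.ASCII is given.

m21 = re.match(r"\d", "321a")

print("m21 match is ", m21.group())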


2.2 The predefined class \D (any non-digit) is equivalent to [^0-9].

m22 = re.match(r"\D", "a321")
print("m22 match is ", m22.group())


2.3 \s matches a whitespace character and is equivalent to [ \t\n\r\f\v]. Note the space at the start of the set.

m23 = re.match(r"\s", "    abc")
print("m23 match is ", m23.group())


2.4 \S matches a non-whitespace character and is equivalent to [^\s].

m24 = re.match(r"\S", "abc")
print("m24 match is ", m24.group())


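2.5 \w matches a word character and is equivalent to [A-Za-z0-9_], that is, letters, digits and the underscore. For str patterns in Python 3 it also covers Unicode letters and digits.

m25 = re.match(r"\w", "abc")
print("m25 match is ", m25.group())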


2.6 \W matches a non-word character and is equivalent to [^\w].

m26 = re.match(r"\W", "?abc")
print("m26 match is ", m26.group())
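With re.findall, all of the predefined classes above can be seen on a single string. This sketch uses an invented sample string. It also shows the Unicode behaviour of \d in Python 3, using ARABIC-INDIC DIGIT THREE (U+0663) as the example:

```python
import re

text = "id_42:\tok 7"
print(re.findall(r"\d", text))   # ['4', '2', '7']
print(re.findall(r"\w+", text))  # ['id_42', 'ok', '7']  (\w includes _)
print(re.findall(r"\s", text))   # ['\t', ' ']
print(re.findall(r"\W", text))   # [':', '\t', ' ']

# str patterns are Unicode-aware: \d also matches non-ASCII digits
arabic_three = "\u0663"
print(bool(re.match(r"\d", arabic_three)))            # True
print(bool(re.match(r"\d", arabic_three, re.ASCII)))  # False
```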


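3.1 * matches the preceding character zero or more times. Do not confuse it with the shell wildcard *, which stands for any sequence of characters of any length.

m31 = re.match("ab*", "abbbbbbbbbbc")
print("m31 match is ", m31.group())
m311 = re.match("ab*", "a")
print("m311 match is ", m311.group())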


3.2 + matches the preceding character one or more times.

m32 = re.match("ab+", "abbbbbbbbbbc")
print("m32 match is ", m32.group())

m321 = re.match("ab+", "ab")
print("m321 match is ", m321.group())


3.3 ? matches the preceding character zero times or once.

m33 = re.match("ab?", "a")
print("m33 match is ", m33.group())


3.4 {m} matches the preceding character exactly m times.

m34 = re.match("a{5}", "aaaaab")
print("m34 match is ", m34.group())


3.5 {m,n} matches the preceding character at least m and at most n times (m ≤ n).

m35 = re.match("a{3,5}", "aaabbb")
print("m35 match is ", m35.group())
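The quantifiers above can be compared by running them against one invented string. The sketch also shows that a quantifier applies only to the single item before it, so repeating a longer sequence needs a group:

```python
import re

text = "abbbb"
for pattern in ["ab*", "ab+", "ab?", "ab{2}", "ab{1,3}"]:
    print(pattern, "->", re.match(pattern, text).group())
# ab* -> abbbb, ab+ -> abbbb, ab? -> ab, ab{2} -> abb, ab{1,3} -> abbb

# + repeats only "b"; (ab)+ repeats the whole group
print(re.match(r"ab+", "ababab").group())    # ab
print(re.match(r"(ab)+", "ababab").group())  # ababab
```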


4.1 ^ anchors the match at the start of the string, so ^abc matches only strings that begin with abc.

m41 = re.match("^abc", "abcdef")
print("m41 is ", m41.group())


4.2 $ anchors the match at the end of the string, so c$ matches only strings that end with c.

m42 = re.search(r"c$", "2abc")
print("m42 match is ", m42.group())


4.3 \A matches only at the start of the string, so \Aa matches strings that begin with a.

m43 = re.match(r"\Aa", "abc")
print("m43 match is ", m43.group())


4.4 \Z matches only at the end of the string, so c\Z matches strings that end with c.

m44 = re.search(r"c\Z", "abc")
print("m44 match is ", m44.group())
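The practical difference between ^/$ and \A/\Z appears with the re.MULTILINE flag and with a trailing newline. A small sketch on invented strings:

```python
import re

lines = "one\ntwo\nthree"
# ^ normally anchors only at the start of the string ...
print(re.findall(r"^\w+", lines))                # ['one']
# ... but with re.MULTILINE it also anchors after every newline
print(re.findall(r"^\w+", lines, re.MULTILINE))  # ['one', 'two', 'three']
# \A ignores MULTILINE: it is always the start of the whole string
print(re.findall(r"\A\w+", lines, re.MULTILINE)) # ['one']

# $ also matches just before a trailing newline, \Z does not
print(bool(re.search(r"c$", "abc\n")))   # True
print(bool(re.search(r"c\Z", "abc\n")))  # False
```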


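4.5 \b matches the empty string at a word boundary. A word boundary is a position with a \w character on one side and either a \W character or the start/end of the string on the other. Because \b is zero-width, it consumes no characters.

m45 = re.match(r"a\b", "a?c")
print("m45 match is ", m45.group())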


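4.6 [^\b]: inside a character set, \b means the backspace character, not a word boundary. [^\b] therefore matches any character except backspace.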
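The two meanings of \b can be checked directly. In this sketch on an invented sentence, \bcat\b finds only the standalone words, while [\b] inside a set matches the backspace character \x08:

```python
import re

text = "cat concat cat_ cat."
# \b is zero-width: it sits between \w and \W (or at a string edge)
print(re.findall(r"\bcat\b", text))  # ['cat', 'cat']
# "concat" and "cat_" are skipped: there is no boundary between n|c or t|_

# Inside a character set, \b is the backspace character
print(bool(re.match(r"[\b]", "\x08")))  # True
print(bool(re.match(r"[^\b]", "a")))    # True
```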

5.1 | matches either the expression on its left or the one on its right.

m51 = re.match("abc|abd", "abdxyz")
print("m51 match is ", m51.group())
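One detail worth knowing about |: Python tries the alternatives from left to right and takes the first one that succeeds, even if a later alternative would give a longer match. | also has very low precedence, so a group is needed to limit its scope. A small sketch:

```python
import re

# The first alternative that works wins, not the longest one
print(re.match(r"ab|abc", "abcd").group())  # ab
print(re.match(r"abc|ab", "abcd").group())  # abc

# Group the alternation so that x applies to both branches
print(re.match(r"x(abc|abd)", "xabd").group())  # xabd
```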


5.2 () groups a subpattern and captures the text it matched as a numbered group.

m52 = re.match("(abc)", "abcxabc")
print("m52 match is ", m52.group())


5.3 (?P<name>...) is a named group: in addition to its usual number, the group gets the alias name.

m53 = re.match("(?P<name>123)", "123")
print("m53 match is ", m53.group())


5.4 \number is a backreference, as in r"(abc)-(\1)". It matches the same text that was captured by the group with that number.

m54 = re.match("(abc)-\\1", "abc-abc")
print("m54 match is ", m54.group())
m541 = re.match(r"(abc)-(\1)", "abc-abc")
print("m541 match is ", m541.group())


5.5 (?P<name>...)(?P=name): (?P=name) matches the same text that was captured by the group named name.

m55 = re.match("(?P<name>abc)-(?P=name)", "abc-abc")
print("m55 match is ", m55.group())
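Numbered and named groups can be read back from the Match object in several ways. This sketch uses an invented "user@host" string and the group names user and host, which are chosen only for illustration. The last two lines show that a backreference must repeat the same text, not just match the same pattern:

```python
import re

m = re.match(r"(?P<user>\w+)@(?P<host>\w+)", "ben@example")
print(m.group(0))                   # ben@example (the whole match)
print(m.group(1), m.group("user"))  # ben ben (number and name both work)
print(m.groups())                   # ('ben', 'example')
print(m.groupdict())                # {'user': 'ben', 'host': 'example'}

# \1 must match exactly the text that group 1 captured
print(re.match(r"(\w+)-\1", "abc-abc").group())  # abc-abc
print(re.match(r"(\w+)-\1", "abc-abd"))          # None
```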


6.1 (?#...) is a comment. Everything inside it is ignored.

m61 = re.match("asb(?#iambaby)123", "asb123")
print("m61 match is ", m61.group())


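6.2 (?=...) positive lookahead: matches only if ... matches next, without consuming any characters.

6.3 (?!...) negative lookahead: matches only if ... does NOT match next.

6.4 (?<=...) positive lookbehind: matches only if the current position is preceded by ... (in Python the lookbehind pattern must have a fixed width).

6.5 (?<!...) negative lookbehind: matches only if the current position is NOT preceded by ...

6.6 (?(id/name)yes-pattern|no-pattern) conditional: if the group with the given id or name has matched, try yes-pattern, otherwise try no-pattern. The |no-pattern part is optional.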
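Since 6.2 to 6.6 have no code, here is a small sketch of each construct on invented strings. The \b in the negative cases stops the engine from accepting part of a number, for example the "2" in "$12":

```python
import re

prices = "$5 and 7 and $12"
# (?<=\$) lookbehind: digits preceded by "$"
print(re.findall(r"(?<=\$)\d+", prices))    # ['5', '12']
# (?<!\$) negative lookbehind: digits NOT preceded by "$"
print(re.findall(r"(?<!\$)\b\d+", prices))  # ['7']

amounts = "10 dollars, 20 euros"
# (?=...) lookahead: the number must be followed by " dollars"
print(re.findall(r"\d+(?= dollars)", amounts))      # ['10']
# (?!...) negative lookahead: the number must NOT be followed by " dollars"
print(re.findall(r"\b\d+\b(?! dollars)", amounts))  # ['20']

# (?(1)>) conditional: require ">" only if group 1 ("<") matched
tag = r"(<)?\w+(?(1)>)"
print(re.fullmatch(tag, "<tag>") is not None)  # True
print(re.fullmatch(tag, "tag") is not None)    # True
print(re.fullmatch(tag, "<tag") is not None)   # False
```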

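7.1 Greedy mode: m71.group(1) prints 6-789-123 rather than the expected 123456-789-123. The greedy .+ consumes as much as it can, including 12345, and gives back only as many characters as (\d+-\d+-\d+) needs in order to match.

m71 = re.match(r".+(\d+-\d+-\d+)", "abcedfasdfa;lasdfjasdfkasdf::123456-789-123")
print("m71 match is ", m71.group(1))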


7.2 Non-greedy (lazy) mode: .+? *? ?? {m,n}? match as little as possible. With .+? the group therefore prints 123456-789-123.

m72 = re.match(r".+?(\d+-\d+-\d+)", "abcedfasdfa;lasdfjasdfkasdf::123456-789-123")
print("m72 match is ", m72.group(1))
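A classic way to see the greedy/lazy difference is matching tag-like text. This sketch uses an invented string:

```python
import re

html = "<a><b>"
# Greedy .+ runs to the end and backtracks only as far as needed
print(re.findall(r"<.+>", html))   # ['<a><b>']
# Lazy .+? stops at the first ">" that lets the match succeed
print(re.findall(r"<.+?>", html))  # ['<a>', '<b>']
# Lazy counted repetition takes the minimum count m
print(re.match(r"a{2,4}?", "aaaa").group())  # aa
```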


8.1 re.compile(pattern, flags=0) returns a Pattern object that can be reused.

p = re.compile("abc")
m81 = re.match(p, "abc")
print("m81 match is ", m81.group())
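A compiled Pattern object has its own match, search and findall methods, and any flags are fixed when it is compiled. A small sketch using re.IGNORECASE on invented strings:

```python
import re

# Compile once and reuse; the flag is stored in the Pattern object
p = re.compile(r"abc", re.IGNORECASE)
print(p.pattern)                  # abc
print(p.match("ABCdef").group())  # ABC
print(p.search("xxAbC").group())  # AbC
print(p.findall("abc ABC aBc"))   # ['abc', 'ABC', 'aBc']
```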


8.2 re.match(pattern, string, flags=0) tries to match only at the start of string.

m82 = re.match(r"(ben1949)-(\d{4}-\d{2}-\d{2})-(\1)", "ben1949-2017-01-18-ben1949xxxyyywww")
print("m82 match is ", m82.group(1), m82.group(2))


8.3 re.search(pattern, string, flags=0) scans the whole string and returns the first match.

m83 = re.search(r"(ben1949)-(\d{4}-\d{2}-\d{2})-(\1)", "aaaaaaben1949-2017-01-18-ben1949xxxyyywwwaaaaaa")
print("m83 match is ", m83.group(1), m83.group(2))
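To compare re.match and re.search directly, here they are on one date string. The sketch also includes re.fullmatch (Python 3.4+), which only succeeds if the pattern covers the entire string:

```python
import re

text = "2017-01-18"
print(re.match(r"\d{2}", text).group())  # 20   (anchored at the start)
print(re.match(r"01", text))             # None (not at the start)
print(re.search(r"01", text).group())    # 01   (first match anywhere)
print(re.fullmatch(r"\d{4}", text))      # None (must cover the whole string)
print(re.fullmatch(r"\d{4}-\d{2}-\d{2}", text).group())  # 2017-01-18
```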


8.4 re.split(pattern, string, maxsplit=0, flags=0) splits string at every match of pattern.

str1 = "ben1949-2017-01-18-ben1949xxxyyywww"
m84 = re.split("-", str1)
print("m84 is ", m84)
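re.split does more than str.split. It can limit the number of splits, split on several separators at once, and keep the separators when the pattern contains a capturing group. A small sketch:

```python
import re

str1 = "ben1949-2017-01-18-ben1949xxxyyywww"
# maxsplit limits the number of splits
print(re.split("-", str1, maxsplit=1))  # ['ben1949', '2017-01-18-ben1949xxxyyywww']
# A character set splits on several separators at once
print(re.split(r"[-,;]\s*", "a-b, c;d"))  # ['a', 'b', 'c', 'd']
# A capturing group keeps the separators in the result
print(re.split(r"(-)", "a-b"))  # ['a', '-', 'b']
```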


8.5 re.findall(pattern, string, flags=0) returns all non-overlapping matches as a list.

m85 = re.findall("-", str1)
print("m85 is ", m85)
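What re.findall returns depends on how many groups the pattern has. This sketch uses an invented key=value string:

```python
import re

pairs = "a=1, b=22, c=333"
# No groups: a list of whole matches
print(re.findall(r"\w=\d+", pairs))      # ['a=1', 'b=22', 'c=333']
# One group: a list of that group's text
print(re.findall(r"\w=(\d+)", pairs))    # ['1', '22', '333']
# Several groups: a list of tuples
print(re.findall(r"(\w)=(\d+)", pairs))  # [('a', '1'), ('b', '22'), ('c', '333')]
```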


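8.6 re.finditer(pattern, string, flags=0) returns an iterator of Match objects instead of strings. Printing one directly shows only its repr, for example <_sre.SRE_Match object at 0x0251F480> in Python 2, so call .group() on each.

m86 = re.finditer("-", str1)

for m in m86:
    print("m86 is ", m.group())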
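Because finditer yields Match objects, the position of each match is available as well as its text. A small sketch on an invented string:

```python
import re

for m in re.finditer(r"\d+", "a1b22c333"):
    # start(), end() and span() give the match position in the string
    print(m.group(), m.start(), m.end(), m.span())
# 1 1 2 (1, 2)
# 22 3 5 (3, 5)
# 333 6 9 (6, 9)
```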


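8.7 re.sub(pattern, repl, string, count=0, flags=0): here it raises every book price by 2.02. repl can be a function, which is called with each Match object and returns the replacement text.

def func(m):
    # group(1) is the book name plus trailing spaces, group(2) is the price
    price = float(m.group(2))
    price += 2.02
    # left-justify the name to 10 characters, then append the new price
    return "%s%s" % (m.group(1).strip().ljust(10), price)
str3 = ["english  100.0", "china       120.0"]
m87 = []

for s in str3:
    m87.append(re.sub(r"(\w+\s+)(\d+\.?\d?)", func, s))

for s in m87:
    print("m87 is %s" % s)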


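8.8 re.subn(pattern, repl, string, count=0, flags=0) works like re.sub but returns a tuple (new_string, number_of_substitutions).

m88 = []
for s in str3:
    m88.append(re.subn(r"(\w+\s+)(\d+\.?\d?)", func, s))

for new_string, count in m88:
    print("m88 is", new_string, "| substitutions:", count)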
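re.sub and re.subn also accept a count argument that limits how many matches are replaced. A short lambda works as the replacement function. A small sketch on invented strings:

```python
import re

text = "a1 b2 c3"
# count=2 replaces at most two matches
print(re.sub(r"\d", "#", text, count=2))  # a# b# c3
# re.subn returns (new_string, number_of_substitutions)
new_text, n = re.subn(r"\d", "#", text)
print(new_text, n)  # a# b# c# 3
# A lambda as repl: double every number
print(re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "x5 y12"))  # x10 y24
```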

8.9 Named groups in a replacement: use (?P<name>...) in the pattern and \g<name> in repl. For example, the pattern "(?P<year>\d{4})-(?P<month>\d{2})-(?P<date>\d{2})" with the replacement "\g<date>-\g<month>-\g<year>" turns "2017-01-18" into "18-01-2017".

print(re.sub(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<date>\d{2})", r"\g<date>-\g<month>-\g<year>", "2017-01-18"))



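print(re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\2/\3/\1", "2017-01-18"))
The replacement r"\2/\3/\1" must be a raw string. In an ordinary string literal, \2 is an escape sequence (the character with octal code 2), so re.sub would never see the backreference.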
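The effect of the missing r prefix is easy to check. This sketch compares the two literals and then uses each one as a re.sub replacement:

```python
import re

# In a normal string literal "\2" is an octal escape: the single character \x02
print("\2" == "\x02")  # True
print(len(r"\2"))      # 2: a backslash followed by "2"

# With r"..." re.sub sees backreferences; without it, it sees control characters
print(re.sub(r"(a)(b)", r"\2\1", "ab"))       # ba
print(repr(re.sub(r"(a)(b)", "\2\1", "ab")))  # '\x02\x01'
```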