
[Python] Core functions and methods of the re module

2014-07-22 20:52
Commonly used functions and methods of the re module

Function/Method
    Description

re module function only

compile(pattern, flags=0)
    Compile the RE pattern with any optional flags and return a regex object.

re module functions and regex object methods

match(pattern, string, flags=0)
    Attempt to match the RE pattern at the start of string, with optional flags; return a match object on success, None on failure.
search(pattern, string, flags=0)
    Search for the first occurrence of the RE pattern within string, with optional flags; return a match object on success, None on failure.
findall(pattern, string, flags=0)
    Look for all (non-overlapping) occurrences of pattern in string; return a list of matches.
finditer(pattern, string, flags=0)
    Same as findall(), except that it returns an iterator instead of a list; for each match, the iterator yields a match object.
split(pattern, string, maxsplit=0, flags=0)
    Split string into a list using the RE pattern as the delimiter and return the list of pieces, splitting at most maxsplit times (the default is to split on every occurrence).
sub(pattern, repl, string, count=0, flags=0)
    Replace occurrences of the RE pattern in string with repl, substituting all occurrences unless count is given (see also subn(), which additionally returns the number of substitutions made).

Match object methods

group(num=0)
    Return the entire match (or the specific subgroup num).
groups()
    Return all matching subgroups in a tuple (empty if there weren't any).

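1. Compiling regular expressions with compile()
Most re module functions are also available as methods of regex objects. Precompiling a pattern with compile() is recommended, especially when the same pattern is used repeatedly.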
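
As a minimal sketch of that recommendation (the name word_re and the sample text are only illustrative), a pattern can be compiled once and then used through the regex object's methods, which mirror the module-level functions:

```python
import re

# Compile the pattern once; the regex object can then be reused.
word_re = re.compile(r'\w+')

# The module-level function and the regex-object method give the same result.
via_module = re.findall(r'\w+', 'bite the dog')
via_object = word_re.findall('bite the dog')

# match() is likewise available as a method of the regex object.
first = word_re.match('bite the dog')

print(via_module == via_object)   # True
print(first.group())              # bite
```

The method form takes the string only, since the pattern is already part of the object.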

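2. Match objects and the group() and groups() methods

When match() or search() succeeds, it returns a match object. A match object has two main methods: group() and groups(). group() returns either the entire match or, on request, one specific subgroup. groups() simply returns a tuple containing all of the subgroups (a one-element tuple if there is only one). If the regular expression has no subgroups, groups() returns an empty tuple, while group() still returns the entire match.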
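
The behaviour described above can be sketched with a small helper. The function name describe is hypothetical and exists only for this illustration:

```python
import re

def describe(pattern, text):
    """Return (entire match, tuple of subgroups), or None when nothing matches."""
    m = re.match(pattern, text)
    if m is None:              # match() signals failure with None
        return None
    return m.group(), m.groups()

with_groups = describe(r'(\w\w\w)-(\d\d\d)', 'abc-123')
no_groups = describe(r'\w\w\w-\d\d\d', 'abc-123')
failed = describe(r'\w\w\w-\d\d\d', 'abc-xyz')

print(with_groups)   # ('abc-123', ('abc', '123'))
print(no_groups)     # ('abc-123', ())
print(failed)        # None
```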

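3. Matching strings with match()

match() tries to match the pattern starting at the beginning of the string. If the match succeeds, it returns a match object; if it fails, it returns None.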

>>> m = re.match('foo','foo')
>>> if m is not None: m.group()

'foo'
>>> m
<_sre.SRE_Match object at 0x0138F170>
>>> m = re.match('foo','bar')
>>> if m is not None: m.group()

>>> m = re.match('foo','food on the table')
>>> m.group()
'foo'
>>> re.match('foo','food on the table').group()
'foo'
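4. Finding a pattern within a string with search() (searching vs. matching)

search() works like match(), except that it looks for the pattern at any position in the string rather than only at the beginning. It returns a match object for the first occurrence found, or None if there is none.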

>>> m = re.match('foo','seafood')
>>> if m is not None: m.group()

>>> m = re.search('foo','seafood')
>>> if m is not None: m.group()

'foo'
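
The same comparison as a short script. The span() call is not covered above; it is a match object method that reports where the match was found:

```python
import re

text = 'seafood'
at_start = re.match('foo', text)     # only tries the beginning of the string
anywhere = re.search('foo', text)    # scans forward through the string

print(at_start)            # None
print(anywhere.group())    # foo
print(anywhere.span())     # (3, 6)
```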
5. Matching more than one string (|)

>>> bt = 'bat|bet|bit'
>>> m = re.match(bt,'bat')
>>> if m is not None: m.group()

'bat'
>>> m = re.match(bt,'blt')
>>> if m is not None: m.group()

>>> m = re.match(bt,'He bit me!')
>>> if m is not None: m.group()

>>> m = re.search(bt,'He bit me!')
>>> if m is not None: m.group()

'bit'
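6. Matching any single character (.)

By default, the dot matches any single character except a newline, and it cannot match a non-character (that is, the empty string).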

>>> anyend = '.end'
>>> m = re.match(anyend,'bend')
>>> if m is not None: m.group()

'bend'
>>> m = re.match(anyend,'end')
>>> if m is not None: m.group()

>>> m = re.match(anyend,'\nend')
>>> if m is not None: m.group()

>>> m = re.search('.end','The end.')
>>> if m is not None: m.group()

' end'
>>> patt314 = '3.14'
>>> pi_patt = r'3\.14'
>>> m = re.match(pi_patt,'3.14')
>>> if m is not None: m.group()

'3.14'
>>> m = re.match(patt314,'3014')
>>> if m is not None: m.group()

'3014'
>>> m = re.match(patt314,'3.14')
>>> if m is not None: m.group()

'3.14'
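
The newline restriction is only the default. As a hedged aside, the re.DOTALL flag (not used elsewhere in these notes) can be passed through the optional flags argument to let the dot match a newline as well:

```python
import re

anyend = '.end'

default = re.match(anyend, '\nend')              # the dot refuses the newline
dotall = re.match(anyend, '\nend', re.DOTALL)    # with DOTALL the dot accepts it

print(default)               # None
print(repr(dotall.group()))  # '\nend'
```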


7. Creating character sets ([])

>>> m = re.match('[cr][23][dp][o2]','c3po')
>>> if m is not None: m.group()

'c3po'
>>> m = re.match('[cr][23][dp][o2]','c2do')
>>> if m is not None: m.group()

'c2do'
>>> m = re.match('r2d2|c3po','c2do')
>>> if m is not None: m.group()

>>> m = re.match('r2d2|c3po','r2d2')
>>> if m is not None: m.group()

'r2d2'
8. Repetition, special characters, and subgroups

>>> patt = r'\w+@(\w+\.)?\w+\.com'
>>> re.match(patt,'nobody@xxx.com').group()
'nobody@xxx.com'
>>> re.match(patt,'nobody@www.xxx.com').group()
'nobody@www.xxx.com'
>>> patt = r'\w+@(\w+\.)*\w+\.com'
>>> re.match(patt,'nobody@www.xxx.yyy.zzz.com').group()
'nobody@www.xxx.yyy.zzz.com'
>>> m = re.match(r'\w\w\w-\d\d\d','abc-123')
>>> if m is not None: m.group()

'abc-123'
>>> m = re.match(r'\w\w\w-\d\d\d','abc-xyz')
>>> if m is not None: m.group()

>>> m = re.match(r'(\w\w\w)-(\d\d\d)','abc-123')
>>> m.group()
'abc-123'
>>> m.group(1)
'abc'
>>> m.group(2)
'123'
>>> m.groups()
('abc', '123')
>>> m = re.match('ab','ab') # no subgroups
>>> m.group()
'ab'
>>> m.groups()
()
>>> m = re.match('(ab)','ab')
>>> m.group()
'ab'
>>> m.group(1)
'ab'
>>> m.groups()
('ab',)
>>> m = re.match('(a)(b)','ab')
>>> m.group()
'ab'
>>> m.group(1)
'a'
>>> m.group(2)
'b'
>>> m.groups()
('a', 'b')
>>> m = re.match('(a(b))','ab')
>>> m.group()
'ab'
>>> m.group(1)
'ab'
>>> m.group(2)
'b'
>>> m.groups()
('ab', 'b')
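
Subgroups are handy for pulling fields out of a match. The sketch below uses a simplified e-mail pattern chosen only for illustration, and it also shows that nested groups are numbered by the position of their opening parenthesis:

```python
import re

# Two side-by-side groups: unpack them straight from groups().
m = re.match(r'(\w+)@(\w+)\.com', 'nobody@xxx.com')
login, domain = m.groups()

# Nested groups: the outer group opens first, so it becomes group 1.
nested = re.match(r'((\w+)@(\w+)\.com)', 'nobody@xxx.com')

print(login, domain)      # nobody xxx
print(nested.groups())    # ('nobody@xxx.com', 'nobody', 'xxx')
```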
9. Matching at the start or end of a string and on word boundaries

>>> m = re.search('^The','The end.')
>>> if m is not None: m.group()

'The'

>>> m = re.search('^The','end. The')
>>> if m is not None: m.group()

>>> m = re.search(r'\bthe','bitethe dog')
>>> if m is not None: m.group()

>>> m = re.search(r'\bthe','bite the dog')
>>> if m is not None: m.group()

'the'
>>> m = re.search(r'\Bthe','bitethe dog')
>>> if m is not None: m.group()

'the'
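
The r prefix in the \b examples matters. A small sketch of what happens without it: in an ordinary Python string literal, '\b' is the backspace character, so the pattern never reaches the regex engine as a word boundary.

```python
import re

text = 'bite the dog'

raw = re.search(r'\bthe', text)     # \b arrives at the regex engine as a word boundary
plain = re.search('\bthe', text)    # '\b' is a backspace character (\x08) here

print(raw.group())   # the
print(plain)         # None
```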
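10. Finding every occurrence with findall()

findall() searches a string for all non-overlapping occurrences of a regular expression pattern. Like search(), it scans the whole string; unlike match() and search(), it always returns a list. If nothing matches, the list is empty; otherwise it contains every matching part.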

>>> re.findall('car','car')
['car']
>>> re.findall('car','scary')
['car']
>>> re.findall('car','carry the barcardi to the car')
['car', 'car', 'car']
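
When the position of each occurrence matters, finditer() from the table above can be used in the same way. A short sketch, where start() is a match object method that gives the index at which a match begins:

```python
import re

text = 'carry the barcardi to the car'

matches = re.findall('car', text)                            # list of matched strings
positions = [m.start() for m in re.finditer('car', text)]    # one match object per hit

print(matches)     # ['car', 'car', 'car']
print(positions)   # [0, 13, 26]
```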
11. Searching and replacing with sub() and subn()

>>> re.sub('X','Mr.Smith','attn: X\n\nDear X,\n')
'attn: Mr.Smith\n\nDear Mr.Smith,\n'
>>> re.subn('X','Mr.Smith','attn: X\n\nDear X,\n')
('attn: Mr.Smith\n\nDear Mr.Smith,\n', 2)
>>> print(re.sub('X','Mr.Smith','attn: X\n\nDear X,\n'))
attn: Mr.Smith

Dear Mr.Smith,

>>> re.sub('[ae]','X','abcdef')
'XbcdXf'
>>> re.subn('[ae]','X','abcdef')
('XbcdXf', 2)
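
Both functions replace every occurrence by default. As a sketch of limiting that, the optional count argument caps the number of substitutions:

```python
import re

text = 'attn: X\n\nDear X,\n'

# Replace only the first occurrence.
only_first = re.sub('X', 'Mr.Smith', text, count=1)

# subn() reports how many substitutions were actually made.
result, n = re.subn('X', 'Mr.Smith', text, count=1)

print(repr(only_first))   # 'attn: Mr.Smith\n\nDear X,\n'
print(n)                  # 1
```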
12. Splitting with split() (on a delimiting pattern)

>>> re.split(':','str1:str2:str3')
['str1', 'str2', 'str3']
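
Two variations, given as a sketch with made-up input strings: maxsplit limits the number of splits, and the delimiter can be a real pattern rather than a fixed string.

```python
import re

# Split at most once; the remainder stays in one piece.
limited = re.split(':', 'str1:str2:str3', maxsplit=1)

# Split on a colon or a comma, each optionally followed by whitespace.
mixed = re.split(r'[:,]\s*', 'str1: str2,str3')

print(limited)   # ['str1', 'str2:str3']
print(mixed)     # ['str1', 'str2', 'str3']
```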


Data generation code for regular expression practice

from random import randint, choice
from string import ascii_lowercase
from time import ctime

doms = ('com', 'deu', 'net', 'org', 'gov')
MAX_TS = 2**31 - 1  # upper bound for timestamps (sys.maxint on 32-bit Python 2)

for i in range(randint(5, 10)):
    dtint = randint(0, MAX_TS - 1)      # random date, in seconds since the epoch
    dtstr = ctime(dtint)
    shorter = randint(4, 7)             # length of the login name

    em = ''
    for j in range(shorter):            # generate the login name
        em += choice(ascii_lowercase)

    longer = randint(shorter, 12)       # the domain is at least as long as the login

    dn = ''
    for j in range(longer):             # generate the domain name
        dn += choice(ascii_lowercase)

    print('%s::%s@%s.%s::%d-%d-%d' % (dtstr, em, dn, choice(doms), dtint, shorter, longer))

Sample output:
Wed Sep 10 01:27:06 2025::ivepup@lyduwbnwec.deu::1757438826-6-10

Thu Mar 24 10:37:16 2011::hvvyogn@wtplvnkuocfh.net::1300934236-7-12

Sun Sep 14 18:00:06 2036::uebmhs@vmcjmxjpqiul.org::2104999206-6-12

Thu Mar 28 10:46:48 1985::phsdtd@srgwdpovndy.deu::480826008-6-11

Wed Dec 12 23:31:36 2012::xhfd@qgtrtgkfja.com::1355326296-4-10

Mon Mar 14 10:20:36 2011::uynyvfm@xiimpgwkmw.gov::1300069236-7-10

Mon Oct 11 01:42:40 1976::ehqt@ntxfyu.gov::213817360-4-6

Wed Feb 27 12:07:46 1985::zwcqrlu@zyifcxsleb.com::478325266-7-10

Sun Sep 13 11:27:21 1970::mtfn@umbsfsrmrue.deu::22044441-4-11

Sun Sep 04 16:13:58 2005::qhwz@mvbgvpe.net::1125821638-4-7
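
One way to practise on this data is to pull the fields back out of a line. The pattern below is just one possible sketch, written against a line copied from the sample output above, and the variable names are illustrative:

```python
import re

line = 'Wed Dec 12 23:31:36 2012::xhfd@qgtrtgkfja.com::1355326296-4-10'

# weekday, rest of the date, login@domain.tld, then timestamp-loginlen-domainlen
patt = r'^(\w{3}) .+::(\w+)@(\w+)\.(\w+)::(\d+)-(\d+)-(\d+)$'

m = re.match(patt, line)
weekday, login, domain, tld, stamp, shorter, longer = m.groups()

print(weekday, login, domain, tld)                          # Wed xhfd qgtrtgkfja com
print(len(login) == int(shorter), len(domain) == int(longer))   # True True
```

The last two numbers on each line record the lengths of the login and domain names, so they give a quick check that the groups captured the right fields.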

REF: Core Python Programming