A Brief Look at Using Regular Expressions in Python
2016-09-29 11:39
For learning Python, see this site:
https://pythonprogramming.net/
It covers many topics, each with a video, and I find the explanations good.
Python's regular expression module is: re
Common regular expression rules:
Identifiers:
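\d = any digit
\D = anything but a digit
\s = any whitespace character (space, tab, new line, etc.)
\S = anything but a whitespace character
\w = any word character (letter, digit, or underscore)
\W = anything but a word character
. = any character, except for a new line
\b = word boundary (the empty position at the start or end of a word)
\. = a literal period. The backslash is required, because . normally means any character.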
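A minimal sketch of how these identifiers behave with re.findall(). The sample string and variable names are made up for illustration only.

```python
import re

# Hypothetical sample text, chosen only to exercise each identifier.
sample = "Room 42, floor_3: call 555-0199.\nEnd"

digits = re.findall(r"\d", sample)      # one list entry per digit
words = re.findall(r"\w+", sample)      # runs of letters, digits, underscores
dots = re.findall(r"\.", sample)        # literal periods only
lines = re.findall(r".+", sample)       # . stops at the new line

# \b keeps "call" from matching inside "recall" or "caller".
whole_word = re.findall(r"\bcall\b", "call recall caller call")

print(digits)
print(words)
print(dots)
print(lines)
print(whole_word)
```

Note that \w+ returns floor_3 as one item, because the underscore and the digit both count as word characters.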
Modifiers:
{1,3} = 1 to 3 repetitions of the preceding item, e.g. \d{1,3} matches 1 to 3 digits
+ = match 1 or more repetitions
? = match 0 or 1 repetitions
* = match 0 or more repetitions
$ = matches at the end of the string
^ = matches at the start of the string
| = matches either/or. Example: x|y will match either x or y
[] = a set or range of characters
{x} = exactly x repetitions of the preceding item
{x,y} = between x and y repetitions of the preceding item
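The modifiers are easiest to compare side by side on the same input. The following is a small sketch with invented test strings; it is not from the original tutorial.

```python
import re

# Hypothetical input: an "a" followed by 0 to 4 "b" characters.
text = "a ab abb abbb abbbb"

one_or_more = re.findall(r"ab+", text)      # at least one b
zero_or_one = re.findall(r"ab?", text)      # at most one b
zero_or_more = re.findall(r"ab*", text)     # any number of b
exactly_two = re.findall(r"ab{2}", text)    # exactly two b
two_to_three = re.findall(r"ab{2,3}", text) # two or three b

# Anchors and alternation on another made-up string.
phrase = "hello big world"
first_word = re.findall(r"^\w+", phrase)    # only at the start
last_word = re.findall(r"\w+$", phrase)     # only at the end
either = re.findall(r"cat|dog", "cat dog bird")

print(one_or_more, zero_or_one, zero_or_more)
print(exactly_two, two_to_three)
print(first_word, last_word, either)
```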
White Space Charts:
\n = new line
\s = any whitespace character
\t = tab
\e = escape (not supported by Python's re module; use \x1b instead)
\f = form feed
\r = carriage return
Characters to REMEMBER TO ESCAPE IF USED!
. + * ? [ ] $ ^ ( ) { } | \
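To illustrate why escaping matters, here is a short sketch that searches for a literal price. The price string is invented; re.escape() is the standard-library helper that adds the backslashes for you.

```python
import re

# Hypothetical text containing the special characters $ and .
price_text = "Total: $3.50 (was $4.00)"

# Unescaped: $ is read as "end of string", so nothing can match.
unescaped = re.findall(r"$3.50", price_text)

# Escaped by hand: both $ and . are treated as literal characters.
escaped = re.findall(r"\$3\.50", price_text)

# Escaped automatically with re.escape().
auto_escaped = re.findall(re.escape("$3.50"), price_text)

print(unescaped)
print(escaped)
print(auto_escaped)
```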
Brackets:
[] = match any one of the enclosed characters, e.g. quant[ia]tative will find either quantitative or quantatative.
[a-z] = any lowercase letter a-z
[1-5a-qA-Z] = any digit 1-5, lowercase letter a-q, or uppercase letter A-Z
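The three bracket patterns above can be tried out directly. This is only a sketch; the input strings are made up for the demonstration.

```python
import re

# [ia] accepts either spelling, but not a third variant.
spellings = re.findall(r"quant[ia]tative",
                       "quantitative quantatative quantotative")

# [a-z]+ skips the uppercase first letters and the digits.
lowercase_runs = re.findall(r"[a-z]+", "Hello World 123")

# Each match is a single character from one of the three ranges.
mixed = re.findall(r"[1-5a-qA-Z]", "7 3 z b Q")

print(spellings)
print(lowercase_runs)
print(mixed)
```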
Example:
import re
exampleString = '''
Jessica is 15 years old, and Daniel is 27 years old.
Edward is 97 years old, and his grandfather, Oscar, is 102.
'''
ages = re.findall(r'\d{1,3}', exampleString)
names = re.findall(r'[A-Z][a-z]*', exampleString)
print(ages)
#print is:['15', '27', '97', '102']
print(names)
#print is:['Jessica', 'Daniel', 'Edward', 'Oscar']
# Pair each name with the age found at the same position.
ageDict = {}
x = 0
for eachName in names:
    ageDict[eachName] = ages[x]
    x += 1
print(ageDict)
#print is: {'Jessica': '15', 'Daniel': '27', 'Edward': '97', 'Oscar': '102'}
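As an aside, the index-based pairing can be written more compactly with the built-in zip(). This is an alternative sketch, not the tutorial's own code; the names example_string and age_dict are chosen here for illustration.

```python
import re

example_string = '''
Jessica is 15 years old, and Daniel is 27 years old.
Edward is 97 years old, and his grandfather, Oscar, is 102.
'''

ages = re.findall(r'\d{1,3}', example_string)
names = re.findall(r'[A-Z][a-z]*', example_string)

# zip() pairs the n-th name with the n-th age; dict() builds the mapping.
age_dict = dict(zip(names, ages))
print(age_dict)
```

Both versions rely on the names and ages appearing in the same order in the text.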
The example above uses only re.findall(); the re module has many other functions.
re.findall() returns a list.
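What the list contains depends on the pattern. A brief sketch, reusing the example text: with no groups the items are strings, with two or more capturing groups they are tuples, and with no match the list is empty. The grouped pattern is an illustration written for this note.

```python
import re

example_string = '''
Jessica is 15 years old, and Daniel is 27 years old.
Edward is 97 years old, and his grandfather, Oscar, is 102.
'''

# No groups: a list of matched strings.
plain = re.findall(r'\d{1,3}', example_string)

# Two capturing groups: a list of (name, age) tuples.
# The optional comma handles "Oscar, is 102".
pairs = re.findall(r'([A-Z][a-z]+),? is (\d{1,3})', example_string)

# No match at all: an empty list, not None.
nothing = re.findall(r'xyz', example_string)

print(plain)
print(pairs)
print(nothing)
```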
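Example 2:
This one uses the re.sub() function.
re.sub() performs substitution driven by a regular expression, which is more powerful than the plain string replace() method.
Suppose the input string contains the numbers 123 and 456, and you want to replace both with 222.
That calls for re.sub, which uses a regular expression to carry out this kind of relatively complex replacement.
Real cases can of course be more complex than this example, with all sorts of special situations, and those are where re.sub is needed.
In short, what re.sub does is:
Given an input string, it uses a regular expression to perform a (relatively complex) replacement and returns the resulting string.
re.sub also accepts additional parameters, such as count, which limits how many replacements are made.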
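A minimal sketch of the replacement described above. The original input string is not shown in this post, so the string below is an assumed stand-in that simply contains 123 and 456.

```python
import re

# Assumed input; the post's actual string is not available.
input_str = "hello 123 world 456"

# One pattern covers every run of digits, whatever its value.
replaced_all = re.sub(r"\d+", "222", input_str)

# count=1 replaces only the first match.
replaced_first = re.sub(r"\d+", "222", input_str, count=1)

# With str.replace() each number has to be named explicitly.
replaced_plain = input_str.replace("123", "222").replace("456", "222")

print(replaced_all)
print(replaced_first)
print(replaced_plain)
```

The str.replace() version only works when the exact numbers are known in advance, which is the limitation re.sub removes.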