
Stage 01 - Basics - 02 - Python Crawler Basics - 21 lessons ===== 12. Getting to Know Regular Expressions

2018-04-02 20:38
I. An example: extracting a quantity and a price
 

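import re

def re_demo():
    txt = "If have purchase more than 100 sets ,the price of product A is $9.90."
    # Extract the quantity and the price.
    # \d   matches a single digit
    # +    repeats the preceding item one or more times
    # .    matches any character except a newline
    # *    repeats the preceding item zero or more times
    # \$   the backslash escapes $, so it matches a literal dollar sign
    # \.?  matches zero or one literal dot
    # \d*  matches zero or more digits
    m = re.search(r'(\d+).*\$(\d+\.?\d*)', txt)
    print(m.groups())  # ('100', '9.90')

if __name__ == "__main__":
    re_demo()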
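One thing the example above does not cover: re.search() returns None when nothing matches, so calling m.groups() on text without a price raises AttributeError. Below is a minimal sketch of a guarded version. The helper name extract_quantity_and_price and the group names qty and price are made up for illustration and are not part of the original lesson.

```python
import re

def extract_quantity_and_price(text):
    """Return (quantity, price) as strings, or None if the pattern does not match."""
    # Hypothetical helper: same pattern as the lesson, with named groups for readability.
    m = re.search(r'(?P<qty>\d+).*\$(?P<price>\d+\.?\d*)', text)
    if m is None:
        return None
    return m.group('qty'), m.group('price')

print(extract_quantity_and_price("more than 100 sets, the price is $9.90."))  # ('100', '9.90')
print(extract_quantity_and_price("no numbers here"))                          # None
```

Named groups are optional here; m.groups() would return the same two values in the same order.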


 
 
 
 
II. Introduction to the re module



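1. The difference between search() and match()
txt = "abc"  # assumed: the original omits this line, but the output below is consistent with it
print(re.search("c", txt))   # scans the whole string for the first match
print(re.match("c", txt))    # matches only at the start of the string
print(re.match(".*c", txt))  # .* allows zero or more characters before c
Result
<_sre.SRE_Match object; span=(2, 3), match='c'>    match found
None    no match
<_sre.SRE_Match object; span=(0, 3), match='abc'>    match found
(On Python 3.7 and later the object prints as <re.Match object; ...>.)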
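To make the difference concrete, here is a small sketch that puts search() and match() side by side and adds re.fullmatch(), which the lesson does not mention but which is part of the standard re module. The sample string "abc" is an assumption chosen to agree with the output shown above.

```python
import re

txt = "abc"  # assumed sample string

# search() scans forward and finds "c" at index 2.
found = re.search("c", txt)
# match() only tries position 0, where the character is "a".
at_start = re.match("c", txt)
# fullmatch() requires the pattern to cover the whole string.
whole = re.fullmatch("abc", txt)
partial = re.fullmatch("ab", txt)

print(found.span())   # (2, 3)
print(at_start)       # None
print(whole.group())  # abc
print(partial)        # None
```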
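2. split()
def c():
    txt = "this is people"
    print(re.split(r'\W', txt))  # \W matches one non-word character; split on each of them

    txt = "this & is || people"
    print(re.split(r'\W', txt))  # every single non-word character is its own separator
c()
Result
['this', 'is', 'people']
['this', '', '', 'is', '', '', '', 'people']
The words are split out and the non-word characters are gone. The empty strings appear because adjacent separators such as " & " have nothing between them.

3. findall()
def c():
    txt = "this is people"
    print(re.findall(r'\w+', txt))  # \w+ matches each run of word characters

    txt = "If have purchase more than 100 sets ,the price of product A is $9.90"
    print(re.findall(r'\d+\.?\d+', txt))  # matches numbers such as xxxx.xxx; needs at least two digits
c()
Result
['this', 'is', 'people']
['100', '9.90']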
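Two details of the patterns above are easy to trip over, so here is a short sketch of both. Splitting on \W+ instead of \W treats a run of separators as one, which avoids the empty strings. And \d+\.?\d+ needs at least two digits, so it skips single-digit numbers, while \d+\.?\d* does not. The string price_txt is invented for this illustration.

```python
import re

txt = "this & is || people"
# \W+ treats a run of non-word characters as one separator.
words = re.split(r'\W+', txt)
print(words)  # ['this', 'is', 'people']

price_txt = "product A is $5, product B is $9.90"  # made-up sample text
# \d+\.?\d+ needs at least two digits, so the lone "5" is skipped.
two_digit = re.findall(r'\d+\.?\d+', price_txt)
print(two_digit)  # ['9.90']
# \d+\.?\d* accepts a single digit as well.
any_digit = re.findall(r'\d+\.?\d*', price_txt)
print(any_digit)  # ['5', '9.90']
```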
4. finditer()
def c():
    txt = "If have purchase more than 100 sets ,the price of product A is $9.90"
    a = re.finditer(r"\d+\.?\d*", txt)  # returns an iterator of match objects
    for i in a:
        print(i.group())
c()
Result
100
9.90
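Unlike findall(), which returns plain strings, finditer() yields match objects, so the position of each hit is available as well. A minimal sketch using the same text and pattern (the variable name hits is illustrative):

```python
import re

txt = "If have purchase more than 100 sets ,the price of product A is $9.90"

# Collect the matched text together with its start and end offsets.
hits = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+\.?\d*", txt)]
for text, start, end in hits:
    # Slicing the original string with the offsets gives back the matched text.
    print(text, start, end, txt[start:end] == text)
```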
5. sub() (substitution)
def c():
    txt = "If have purchase more than 100 sets ,the price of product A is $9.90"
    a = re.sub(r"\d+\.?\d*", "999", txt)  # replace every number with 999
    print(a)

c()
Result
If have purchase more than 999 sets ,the price of product A is $999
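re.sub() also accepts a function as the replacement and a count argument that limits how many matches are replaced. The sketch below shows both on the same text. The callback name double is made up for this example.

```python
import re

txt = "If have purchase more than 100 sets ,the price of product A is $9.90"

def double(match):
    """Replacement callback: receives the match object and returns the new text."""
    return format(float(match.group()) * 2, ".2f")

doubled = re.sub(r"\d+\.?\d*", double, txt)
print(doubled)
# If have purchase more than 200.00 sets ,the price of product A is $19.80

# count=1 limits the substitution to the first match.
first_only = re.sub(r"\d+\.?\d*", "999", txt, count=1)
print(first_only)
# If have purchase more than 999 sets ,the price of product A is $9.90
```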

 
 
 
 
 
 
 
 
 
 