Extracting phone numbers and email addresses from text with Python regular expressions

2017-12-04 16:54
Code:

#! python3
# Finds phone numbers and email addresses in the clipboard text.

import re

import pyperclip

phoneregex = re.compile(r'''
    (\d{3}|\(\d{3}\))?               # area code
    (\s|-|\.)?                       # separator
    (\d{3})                          # first 3 digits
    (\s|-|\.)                        # separator
    (\d{4})                          # last 4 digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))?  # extension
    ''', re.VERBOSE)

emailregex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+                # username
    @                                # @ symbol
    [a-zA-Z0-9.-]+                   # domain name
    (\.[a-zA-Z]{2,4})                # dot-something
    )''', re.VERBOSE)

text = str(pyperclip.paste())
matches = []
print(phoneregex.findall(text))      # debug: list of group tuples
for groups in phoneregex.findall(text):
    print(groups)                    # debug: one tuple per phone number
    # Indices 0, 2 and 4 are the area code, first 3 and last 4 digits.
    phonenum = '-'.join([groups[0], groups[2], groups[4]])
    if groups[7] != '':              # index 7 holds the extension digits
        phonenum += ' x' + groups[7]
    matches.append(phonenum)
for groups in emailregex.findall(text):
    matches.append(groups[0])        # index 0 is the whole address

if len(matches) > 0:
    pyperclip.copy('\n'.join(matches))
    print('copied to clipboard:')
    print('\n'.join(matches))
else:
    print('no phone numbers or email addresses found.')

Output:

[('800', '-', '420', '-', '7240', '', '', ''), ('415', '-', '863', '-', '9900', '', '', ''), ('415', '-', '863', '-', '9950', '', '', '')]
('800', '-', '420', '-', '7240', '', '', '')
('415', '-', '863', '-', '9900', '', '', '')
('415', '-', '863', '-', '9950', '', '', '')
copied to clipboard:
800-420-7240
415-863-9900
415-863-9950
info@nostarch.com
media@nostarch.com
academic@nostarch.com
info@nostarch.com
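
The output lists info@nostarch.com twice because findall reports every occurrence in the text. If repeats are unwanted, one possible approach (not part of the original script) is to de-duplicate the list while keeping first-seen order. The sketch below is only an illustration: the matches list is a hand-copied stand-in for the script's result, and unique_matches is a name introduced here.

```python
# Stand-in for the list built by the script (copied from the output above).
matches = [
    '800-420-7240',
    '415-863-9900',
    '415-863-9950',
    'info@nostarch.com',
    'media@nostarch.com',
    'academic@nostarch.com',
    'info@nostarch.com',
]

# dict keys keep insertion order (Python 3.7+), so converting to a dict
# and back drops repeats while preserving the first-seen order.
unique_matches = list(dict.fromkeys(matches))

print('\n'.join(unique_matches))
```

In the script itself, the same line could be applied to matches just before the result is copied to the clipboard.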

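Notes:

In the book, the phone pattern has an opening parenthesis right after r''', so the whole pattern is itself a group and findall returns the complete match as the first element of each tuple. The phone regex above omits that outer group, so the area code, first three digits and last four digits sit at indices 0, 2 and 4, and the extension digits at index 7. The email regex keeps the outer group, which is why groups[0] is the full address. The outer parentheses around the extension part work the same way: findall first returns the text matched by the outer group, then the text matched by the two inner groups.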
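
The effect of an outer group on findall can be checked with a small experiment. This is a minimal sketch: the sample text and the two simplified patterns (inner_only, with_outer) are made up for illustration and are not the patterns from the script.

```python
import re

# Hypothetical sample text, not taken from the original post.
sample = 'Call 415-555-1234 ext 99 today.'

# No outer group: each tuple holds only the inner groups.
inner_only = re.compile(r'(\d{3})-(\d{3})-(\d{4})(\s*(ext|x)\s*(\d{2,5}))?')

# Same pattern wrapped in one more pair of parentheses:
# the whole match becomes element 0 and every other index shifts by one.
with_outer = re.compile(r'((\d{3})-(\d{3})-(\d{4})(\s*(ext|x)\s*(\d{2,5}))?)')

inner_result = inner_only.findall(sample)
outer_result = with_outer.findall(sample)

print(inner_result)
# [('415', '555', '1234', ' ext 99', 'ext', '99')]
print(outer_result)
# [('415-555-1234 ext 99', '415', '555', '1234', ' ext 99', 'ext', '99')]
```

Groups are numbered by the position of their opening parenthesis, so the extension's outer group (' ext 99') comes before its two inner groups ('ext' and '99') in both results.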