[Original] Using the PCRE Regular Expression Library in C
2015-07-02 14:26
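To use regular expressions in C, you can rely on the PCRE library, which works quite well.
While working on this, I first verified the regular expression in Python. When I then used the same pattern in C, the compiler reported problems, as shown below: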
regex.c:11:18: warning: unknown escape sequence: '\/' [enabled by default]
char* url_re="(https?|ftp|mms):\/\/([A-z0-9]+[_\-]?[A-z0-9]?\.)*[A-z0-9]+\-?[A-z0-9]+\.[A-z]{2,}(\/.*)?";
^
regex.c:11:18: warning: unknown escape sequence: '\/' [enabled by default]
regex.c:11:18: warning: unknown escape sequence: '\-' [enabled by default]
regex.c:11:18: warning: unknown escape sequence: '\.' [enabled by default]
regex.c:11:18: warning: unknown escape sequence: '\-' [enabled by default]
regex.c:11:18: warning: unknown escape sequence: '\.' [enabled by default]
regex.c:11:18: warning: unknown escape sequence: '\/' [enabled by default]
What was going on? The same pattern ran fine in Python. The Python test code is as follows:
#!/usr/bin/env python
import re
import sys

# Earlier variant of the pattern that also matched a trailing path:
#restr=r"(https?|ftp|mms):\/\/([A-z0-9]+[_\-]?[A-z0-9]+\.)*[A-z0-9]+\-?[A-z0-9]+\.[A-z]{2,}(\/.*)*\/?"

def geturl(url=''):
    # Raw string: every backslash reaches the re module unchanged.
    restr = r"(https?|ftp|mms):\/\/([A-z0-9]+[_\-]?[A-z0-9]?\.)*[A-z0-9]+\-?[A-z0-9]+\.[A-z]{2,}"
    pattern = re.compile(restr)
    match = re.search(pattern, url)
    if match:
        return match.group()
    return None

################# GetLine ############################
def dealUrl(fmtfile):
    i = 0
    # The with statement closes both files even if an error occurs.
    with open(fmtfile, 'r') as file, open("tmp.txt", 'w') as fo:
        for line in file:
            newline = geturl(line)
            if newline is not None:
                print(i, newline)
                fo.write(newline + '\n')
                i += 1

################# Main ##############################
if __name__ == '__main__':
    if len(sys.argv) < 2:
        filename = 'url.info'
    else:
        filename = sys.argv[1]
    dealUrl(filename)
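After some searching, the cause turned out to be C string-literal escaping: the C compiler interprets backslashes in a string literal before PCRE ever sees the pattern, so every backslash meant for the regex has to be doubled. That is, "\/" must be written as "\\/" and "\." as "\\." in the C source. Python leaves unrecognized escapes such as "\/" in the string unchanged, which is why the same pattern worked there. After adjusting url_re accordingly, the compiler warnings disappeared and the program produced the correct results.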
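The doubling rule can be checked without PCRE at all. The sketch below is only an illustration: DOT_ESCAPED and count_backslashes are hypothetical names, not part of the original code or of PCRE. It shows that the C literal "\\." occupies two bytes in memory, a backslash and a dot, which is exactly what the regex engine needs to receive.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical example fragment: written with a doubled backslash in the
 * source, stored in memory as the two characters '\' and '.'. */
static const char DOT_ESCAPED[] = "\\.";

/* Hypothetical helper: counts the backslashes that are really present in
 * the string, i.e. the ones a regex engine would see. */
static size_t count_backslashes(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (*s == '\\') {
            ++n;
        }
    }
    return n;
}
```

Here strlen(DOT_ESCAPED) is 2 and count_backslashes(DOT_ESCAPED) is 1. A literal written with a single backslash before the dot triggers the warning shown earlier, and the compiler typically drops that backslash, so the regex would receive a bare dot that matches any character.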
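#include <stdio.h>
#include <string.h>
#include <pcre.h>

/* Assumed value: the definition is not shown in the original.
 * PCRE expects the ovector size to be a multiple of 3. */
#define RE_OVERCOUNT 30

/* Copies the first URL found in str into url and returns its length,
 * or 0 if nothing matched. The url buffer must be large enough to hold
 * the match plus a terminating NUL. */
int filter(char* str, char* url)
{
    pcre *re;
    const char* error;
    int erroffset;
    int ovector[RE_OVERCOUNT];
    int rc;
    int urllen = 0;
    /* Every backslash is doubled so that PCRE receives a single one.
     * Note: [A-z] also matches [ \ ] ^ _ and `; [A-Za-z] is the stricter class. */
    const char* url_re = "(https?|ftp|mms):\\/\\/([A-z0-9]+[_\\-]?[A-z0-9]?\\.)*[A-z0-9]+\\-?[A-z0-9]+\\.[A-z]{2,}";

    if (str == NULL || url == NULL)
        return 0;
    printf("str: %s\n", str);

    re = pcre_compile(url_re, 0, &error, &erroffset, NULL);
    if (re == NULL) {
        printf("PCRE pcre_compile failed at offset %d: %s\n", erroffset, error);
        return 0;
    }

    /* rc >= 0 means a match; negative values are PCRE_ERROR_NOMATCH or another error. */
    rc = pcre_exec(re, NULL, str, (int)strlen(str), 0, 0, ovector, RE_OVERCOUNT);
    if (rc >= 0) {
        /* ovector[0] and ovector[1] are the start and end offsets of the whole match. */
        urllen = ovector[1] - ovector[0];
        memcpy(url, str + ovector[0], (size_t)urllen);
        url[urllen] = '\0';  /* terminate explicitly before printing */
        printf("urllen %d, url:%s\n", urllen, url);
    }

    pcre_free(re);  /* release the compiled pattern on every path */
    return urllen;
}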
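pcre_exec reports the overall match as a pair of byte offsets, ovector[0] (start) and ovector[1] (end), and filter() trusts the caller to supply a large enough url buffer. As a rough sketch of a more defensive variant, the helper below takes the destination size as an extra argument. The name copy_match and its signature are assumptions made for illustration; they are not part of PCRE or of the original code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a PCRE function): copies the matched substring
 * described by ovector[0]..ovector[1] from subject into dst.
 * Returns the number of bytes copied, excluding the terminating NUL,
 * or -1 if an argument is invalid or the match does not fit in dst. */
static int copy_match(char *dst, size_t dst_size,
                      const char *subject, const int *ovector)
{
    size_t len;

    if (dst == NULL || subject == NULL || ovector == NULL || dst_size == 0)
        return -1;
    if (ovector[0] < 0 || ovector[1] < ovector[0])
        return -1;

    len = (size_t)(ovector[1] - ovector[0]);
    if (len >= dst_size)          /* leave room for the terminating NUL */
        return -1;

    memcpy(dst, subject + ovector[0], len);
    dst[len] = '\0';
    return (int)len;
}
```

Inside filter(), a call such as copy_match(url, url_size, str, ovector) could replace the unchecked copy, assuming the function were extended with a hypothetical url_size parameter.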