A strange awk quirk and how to fix it: the sad story of regex ranges and locales
2013-10-30 18:37
[root@centos ~]# echo abcABC | /bin/gawk '{gsub(/([a-z])/, "x"); print $0}'
xxxxxx
[root@centos ~]# echo abcABC | /bin/gawk '{gsub(/([[:lower:]])/, "x"); print $0}'
xxxABC
Question 1: Strange! [a-z] no longer means just the lowercase letters. What is going on?
[root@centos ~]# /bin/gawk 'BEGIN{match("womasdRDfadfasKNd",/[A-Z][A-Z]+/);print RSTART,RLENGTH}'
1 3
Question 2: Why does this print 1 3? Shouldn't it be 7 2, for the "RD" that starts at position 7?
Now look at this:
[root@centos ~]# /bin/gawk --version
GNU Awk 3.1.5
Copyright (C) 1989, 1991-2005 Free Software Foundation.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
[root@centos ~]# echo $LANG
en_US.UTF-8
[root@centos ~]# echo abcABC | /bin/gawk '{gsub(/[a-z]/,"x");print $0}'
xxxxxx
[root@centos ~]# export LANG=C
[root@centos ~]# echo abcABC | /bin/gawk '{gsub(/[a-z]/,"x");print $0}'
xxxABC
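The results above are from version 3.1.5. Note that after export LANG=C, the very same [a-z] command prints xxxABC, as expected.
After a lively discussion on CU (ChinaUnix), the cause was finally found: these problems come from the sad story of how awk's regular expression ranges interact with locales. In short, gawk 3.1.5 interprets a range such as [a-z] according to the locale's collation (sort) order rather than by character code. As the outputs above imply, in en_US.UTF-8 on this system that order places each lowercase letter just before its uppercase partner (a A b B ... z Z). So [a-z] also matches A through Y, and [A-Z] matches everything from A to Z except lowercase a. That explains both questions: in "womasdRDfadfasKNd", the letters w, o and m all fall inside [A-Z], but a does not, so match() reports RSTART 1 and RLENGTH 3. For the details, see: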
http://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html
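To make the mechanism concrete, here is a small Python model of collation-based ranges. It is not gawk itself, only an imitation of the two awk calls for a single bracket range. The ordering a A b B ... z Z is an assumption inferred from the 3.1.5 outputs above (real locale collation tables are far more complex), and gsub_range, match_two_or_more and in_collation_range are hypothetical helper names introduced for this illustration.

```python
import string

# ASSUMPTION: simplified collation order where each lowercase letter sorts
# just before its uppercase partner (a A b B ... z Z), as the gawk 3.1.5
# outputs imply. Real locale collation tables are far more complex.
COLLATION = [c for pair in zip(string.ascii_lowercase, string.ascii_uppercase)
             for c in pair]
RANK = {c: i for i, c in enumerate(COLLATION)}


def in_collation_range(ch, lo, hi):
    """True if ch sorts between lo and hi (inclusive) in the model order."""
    return ch in RANK and RANK[lo] <= RANK[ch] <= RANK[hi]


def gsub_range(lo, hi, repl, text):
    """Hypothetical model of gsub(/[lo-hi]/, repl) with collation-based ranges."""
    return "".join(repl if in_collation_range(c, lo, hi) else c for c in text)


def match_two_or_more(lo, hi, text):
    """Hypothetical model of match(text, /[lo-hi][lo-hi]+/).

    Returns (RSTART, RLENGTH); RSTART is 1-based and (0, -1) means
    no match, as in awk.
    """
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and in_collation_range(text[j], lo, hi):
            j += 1
        if j - i >= 2:          # leftmost run of 2+ in-range chars, taken whole
            return i + 1, j - i
        i = max(j, i + 1)       # no match can start inside a shorter run
    return 0, -1


print(gsub_range("a", "z", "x", "abcABC"))                 # xxxxxx
print(*match_two_or_more("A", "Z", "womasdRDfadfasKNd"))   # 1 3
```

With this ordering the model reproduces both surprising gawk 3.1.5 results: [a-z] swallows A, B and C, and [A-Z] accepts "wom" but stops at "a".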
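I only grasped the gist of that English document, but it says gawk fixed this problem in version 4.0. So I downloaded a newer version to test it, and sure enough, it works: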
[root@centos ~]# /usr/local/bin/gawk --version
GNU Awk 4.1.0, API: 1.0
Copyright (C) 1989, 1991-2013 Free Software Foundation.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
[root@centos ~]# echo $LANG
en_US.UTF-8
[root@centos ~]# /usr/local/bin/gawk '{gsub(/[a-z]/,"x");print $0}'
[root@centos ~]# echo abcABC | /usr/local/bin/gawk '{gsub(/[a-z]/,"x");print $0}'
xxxABC
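The output above is from version 4.1.0, and this time it is clearly correct, even with LANG still set to en_US.UTF-8.
References: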
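For contrast, here is a minimal Python sketch of ranges compared by character code, which is what the gawk 4.1.0 output above is consistent with. Python's re module compares range endpoints by code point, so it gives the "expected" answers to both questions. Python's re has no POSIX classes such as [[:lower:]], so str.islower() stands in for that class here; treat that substitution as an assumption, not an exact equivalent.

```python
import re

text = "womasdRDfadfasKNd"

# Python's re compares range endpoints by code point, so [a-z] means exactly
# the 26 ASCII lowercase letters, whatever the current locale is.
replaced = re.sub(r"[a-z]", "x", "abcABC")
print(replaced)  # xxxABC

# awk's match() sets RSTART (1-based) and RLENGTH (-1 if no match);
# derive the same pair from Python's 0-based match object.
m = re.search(r"[A-Z][A-Z]+", text)
rstart, rlength = (m.start() + 1, m.end() - m.start()) if m else (0, -1)
print(rstart, rlength)  # 7 2

# Stand-in for gawk's [[:lower:]]: classify each character by its property
# instead of by its position in a range.
by_class = "".join("x" if c.islower() else c for c in "abcABC")
print(by_class)  # xxxABC
```

In awk scripts that must also run on old gawk versions, the class-based form ([[:lower:]], [[:upper:]]) from Question 1 avoids depending on how ranges are interpreted at all.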
1. http://bbs.chinaunix.net/thread-4065242-1-1.html
2. http://bbs.chinaunix.net/thread-4103721-1-1.html