A Few Notes on String Replacement Operations

2013-11-09 18:19
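A recent project required scraping trending posts from our school's 百合 (Baihe) site, which meant frequent use of regular expressions and String replacement operations. I ran into a few problems along the way that are worth noting down.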

Here is a snippet of one such operation:

Pattern textareaContent = Pattern.compile("(?s)(<table)(.*?)<textarea.*?class=hide>(.*?)</textarea>");

Matcher contentMatcher = textareaContent.matcher(resultHTML);
StringBuffer buff = new StringBuffer();
while (contentMatcher.find()) {
    contentMatcher.appendReplacement(buff, contentMatcher.group(1) + " style='BORDER: 2px solid;BORDER-COLOR: D0F0C0;' "
            + contentMatcher.group(2) + contentMatcher.group(3));
}
resultHTML = contentMatcher.appendTail(buff).toString();

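Because the scraped content may contain characters such as '$' and '\', passing it to appendReplacement(StringBuffer sb, String replacement) can cause errors: inside a replacement string, '$' introduces a group reference (for example $1 or ${name}) and '\' escapes the character that follows it. The JDK source shows exactly how appendReplacement processes the replacement string: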

char nextChar = replacement.charAt(cursor);
            if (nextChar == '\\') { // on a backslash, skip it and append the following char literally
                cursor++;
                nextChar = replacement.charAt(cursor);
                result.append(nextChar);
                cursor++;
            } else if (nextChar == '$') { // on '$', handling depends on the char that follows
                // Skip past $ (the '$' itself is never appended!)
                cursor++;
                // A StringIndexOutOfBoundsException is thrown if
                // this "$" is the last character in replacement
                // string in current implementation, a IAE might be
                // more appropriate.
                nextChar = replacement.charAt(cursor);
                int refNum = -1;
                if (nextChar == '{') {
                    cursor++; // skip '{'
                    StringBuilder gsb = new StringBuilder();
                    while (cursor < replacement.length()) { // collect the letters and digits after '{'
                        nextChar = replacement.charAt(cursor);
                        if (ASCII.isLower(nextChar) ||
                            ASCII.isUpper(nextChar) ||
                            ASCII.isDigit(nextChar)) {
                            gsb.append(nextChar);
                            cursor++;
                        } else {
                            break;
                        }
                    }
                    if (gsb.length() == 0) // an empty group name is an error
                        throw new IllegalArgumentException(
                            "named capturing group has 0 length name");
                    if (nextChar != '}')
                        throw new IllegalArgumentException(
                            "named capturing group is missing trailing '}'");
                    String gname = gsb.toString();
                    if (ASCII.isDigit(gname.charAt(0))) // a group name cannot start with a digit
                        throw new IllegalArgumentException(
                            "capturing group name {" + gname +
                            "} starts with digit character");
                    if (!parentPattern.namedGroups().containsKey(gname)) // look up the group name in the pattern
                        throw new IllegalArgumentException(
                            "No group with name {" + gname + "}");
                    refNum = parentPattern.namedGroups().get(gname);
                    cursor++;
                } else { // otherwise the next char must be a digit
                    // The first number is always a group
                    refNum = (int)nextChar - '0';
                    if ((refNum < 0)||(refNum > 9))
                        throw new IllegalArgumentException(
                            "Illegal group reference");
                    cursor++;
                    // Capture the largest legal group string
                    boolean done = false;
                    while (!done) {
                        if (cursor >= replacement.length()) {
                            break;
                        }
                        int nextDigit = replacement.charAt(cursor) - '0';
                        if ((nextDigit < 0)||(nextDigit > 9)) { // not a number
                            break;
                        }
                        int newRefNum = (refNum * 10) + nextDigit;
                        if (groupCount() < newRefNum) {
                            done = true;
                        } else {
                            refNum = newRefNum;
                            cursor++;
                        }
                    }
                }
                // Append group
                if (start(refNum) != -1 && end(refNum) != -1)
                    result.append(text, start(refNum), end(refNum));
            }
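
The branches above can be exercised directly. What follows is a minimal sketch and not part of the original project: the class name ReplacementDemo, the helpers expand and fails, and the sample inputs are all made up for illustration. It shows a numbered group reference, the "largest legal group" digit parsing, a backslash escape, and the rejection of a '$' that is not followed by a valid reference. The exact exception type for a trailing '$' differs between JDK versions, so the sketch only checks that some runtime exception is thrown.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplacementDemo {
    // Hypothetical helper: runs appendReplacement/appendTail over the input
    // with the given raw (unquoted) replacement string.
    static String expand(String regex, String input, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, replacement);
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Hypothetical helper: true if the replacement string is rejected
    // with a runtime exception.
    static boolean fails(String regex, String input, String replacement) {
        try {
            expand(regex, input, replacement);
            return false;
        } catch (RuntimeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // "$1" is replaced by the text captured by group 1.
        System.out.println(expand("(\\d+)", "id=42", "<$1>"));    // id=<42>
        // Only one group exists, so "$10" is group 1 followed by a literal '0'.
        System.out.println(expand("(\\d+)", "id=42", "$10"));     // id=420
        // A backslash makes the following '$' literal.
        System.out.println(expand("(\\d+)", "id=42", "\\$1"));    // id=$1
        // '$' followed by a non-digit, and a trailing '$', are both rejected.
        System.out.println(fails("(\\d+)", "id=42", "cost: $x")); // true
        System.out.println(fails("(\\d+)", "id=42", "100$"));     // true
    }
}
```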

The fix is Matcher.quoteReplacement(), which is implemented as follows:

if ((s.indexOf('\\') == -1) && (s.indexOf('$') == -1))
            return s;
        StringBuilder sb = new StringBuilder();
        for (int i=0; i<s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' || c == '$') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();


In other words, it inserts a '\' before each special character ('\' and '$').
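
Applied to the snippet at the top of this note, the fix might look like the sketch below. It assumes the whole replacement should be treated as literal text, since every part of it comes from scraped HTML; the class name QuoteDemo, the method name restyle, and the sample HTML are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteDemo {
    private static final Pattern TEXTAREA_CONTENT = Pattern.compile(
            "(?s)(<table)(.*?)<textarea.*?class=hide>(.*?)</textarea>");

    // Same rewrite as the original snippet, but the replacement is quoted so
    // that any '$' or '\' in the captured text is copied literally.
    static String restyle(String html) {
        Matcher m = TEXTAREA_CONTENT.matcher(html);
        StringBuffer buff = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(1)
                    + " style='BORDER: 2px solid;BORDER-COLOR: D0F0C0;' "
                    + m.group(2) + m.group(3);
            m.appendReplacement(buff, Matcher.quoteReplacement(replacement));
        }
        return m.appendTail(buff).toString();
    }

    public static void main(String[] args) {
        // quoteReplacement prefixes each '$' and '\' with a backslash.
        System.out.println(Matcher.quoteReplacement("a$b\\c")); // a\$b\\c

        // Made-up input whose textarea body contains both special characters.
        String html = "<table border=0><textarea id=t class=hide>price: $5 \\o/</textarea>";
        System.out.println(restyle(html));
    }
}
```

Without the quoteReplacement call, the "$5" in this sample input would be parsed as a reference to a group that does not exist, and appendReplacement would throw.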

In addition, String.replace()

public String replace(CharSequence target, CharSequence replacement) {
        return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
                this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
    }


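is implemented via Matcher.replaceAll in the JDK version quoted here: the target is compiled as a literal pattern and the replacement goes through Matcher.quoteReplacement, so neither argument is interpreted specially.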
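
The practical difference between String.replace and String.replaceAll can be seen in a small sketch. The class name ReplaceDemo and the helper safeReplaceAll are hypothetical, introduced only to show how a regex target can be combined with a literal replacement.

```java
import java.util.regex.Matcher;

public class ReplaceDemo {
    // Hypothetical helper: regex target, but the replacement is taken literally.
    static String safeReplaceAll(String s, String regex, String literalReplacement) {
        return s.replaceAll(regex, Matcher.quoteReplacement(literalReplacement));
    }

    public static void main(String[] args) {
        String s = "total: PRICE (1.5)";

        // replace(): target and replacement are both literal text.
        System.out.println(s.replace("PRICE", "$9"));      // total: $9 (1.5)
        System.out.println(s.replace(".", "$"));           // total: PRICE (1$5)

        // replaceAll(): the first argument is a regex, and '$' and '\' are
        // special in the second, so a literal '$' must be escaped by hand...
        System.out.println(s.replaceAll("PRICE", "\\$9")); // total: $9 (1.5)
        // ...or quoted via the helper.
        System.out.println(safeReplaceAll(s, "[A-Z]+", "$9")); // total: $9 (1.5)

        // An unescaped "$9" refers to a group that does not exist here.
        try {
            s.replaceAll("PRICE", "$9");
        } catch (RuntimeException e) {
            System.out.println("rejected: " + e.getClass().getSimpleName());
        }
    }
}
```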