
JDK Study Notes: AbstractStringBuilder, StringBuffer, and StringBuilder

2015-12-28 10:11
I had some free time today and read an article on codeceo about StringBuilder and StringBuffer (original link: http://www.codeceo.com/article/java-stringbuilder-performance.html). This is the part I am most grateful for:

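The initial capacity is so important that it is worth saying four times.

Internally, StringBuilder holds a char[]; each append() call simply fills more of that array.

With new StringBuilder(), the char[] has a default length of 16. So what happens when you append the 17th character?

The array is grown to roughly double its size and the contents are copied over with System.arraycopy!!!!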

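This costs an array copy, and the old char[] is wasted and left for the GC to collect. By the quoted article's count, a 129-character string goes through four copy-and-discard rounds at 16, 32, 64, and 128, allocating 496 chars of arrays in total (16 + 32 + 64 + 128 + 256). (With the value.length * 2 + 2 rule in the expandCapacity() source shown later in this post, appending one char at a time actually steps through 16, 34, 70, 142: three copies and 262 chars allocated.) Either way, in a high-performance setting this is hard to tolerate.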

So setting a sensible initial capacity really matters.
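To put rough numbers on that cost, here is a minimal sketch that simulates appending one char at a time. It assumes the growth rule old * 2 + 2 from the expandCapacity() source quoted later in this post (other JDK versions may grow differently), and capacitiesFor is a hypothetical helper written for this note, not a JDK method.

```java
import java.util.ArrayList;
import java.util.List;

class CapacityGrowthSketch {
    // Lists every array capacity the builder would pass through while
    // appending targetLength chars one by one.
    // ASSUMPTION: growth rule is old * 2 + 2 (JDK 7 expandCapacity).
    static List<Integer> capacitiesFor(int initialCapacity, int targetLength) {
        List<Integer> capacities = new ArrayList<>();
        int capacity = initialCapacity;
        capacities.add(capacity);
        for (int length = 1; length <= targetLength; length++) {
            if (length > capacity) {
                capacity = Math.max(capacity * 2 + 2, length);
                capacities.add(capacity);
            }
        }
        return capacities;
    }

    public static void main(String[] args) {
        System.out.println(capacitiesFor(16, 129));   // [16, 34, 70, 142]
        System.out.println(capacitiesFor(129, 129));  // [129]
        // The real classes expose their current capacity directly:
        System.out.println(new StringBuilder().capacity());     // 16
        System.out.println(new StringBuilder(129).capacity());  // 129
    }
}
```

Passing the expected length to the constructor, as in new StringBuilder(129), avoids all of the intermediate arrays.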

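All along I only knew that string concatenation should use StringBuilder rather than +, which is the most superficial level of understanding. Rather embarrassed, I decided to start reading the JDK source bit by bit from today, beginning with StringBuilder!

Both StringBuilder and StringBuffer extend AbstractStringBuilder. AbstractStringBuilder is an abstract class (not an interface) that implements the Appendable and CharSequence interfaces, so let's look at those two interfaces first: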

package java.lang;

public interface CharSequence {
    /**
     * Returns the length of this character sequence.  The length is the number
     * of 16-bit <code>char</code>s in the sequence.
     */
    int length();

    /**
     * Returns the <code>char</code> value at the specified index.  An index ranges from zero
     * to <tt>length() - 1</tt>.  The first <code>char</code> value of the sequence is at
     * index zero, the next at index one, and so on, as for array
     * indexing. </p>
     */
    char charAt(int index);

    /**
     * Returns a new <code>CharSequence</code> that is a subsequence of this sequence.
     * The subsequence starts with the <code>char</code> value at the specified index and
     * ends with the <code>char</code> value at index <tt>end - 1</tt>.  The length
     * (in <code>char</code>s) of the
     * returned sequence is <tt>end - start</tt>, so if <tt>start == end</tt>
     * then an empty sequence is returned. </p>
     */
    CharSequence subSequence(int start, int end);

    /**
     * Returns a string containing the characters in this sequence in the same
     * order as this sequence.  The length of the string will be the length of
     * this sequence. </p>
     */
    public String toString();
}


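I only kept some of the comments and dropped the author tags and the like; there are just too many comments! This is a character-sequence interface: it can return the length of the sequence, the char at a given position, and a subsequence, and of course there is toString().

Next is the Appendable interface: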
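As a small illustration of why the interface is useful, the sketch below counts chars using nothing but length() and charAt(), so the same method accepts a String, a StringBuilder, or a StringBuffer. The class and method names are made up for this example.

```java
class CharSequenceSketch {
    // Counts occurrences of target using only the CharSequence contract.
    static int count(CharSequence seq, char target) {
        int hits = 0;
        for (int i = 0; i < seq.length(); i++) {
            if (seq.charAt(i) == target) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(count("banana", 'a'));                     // 3
        System.out.println(count(new StringBuilder("banana"), 'n'));  // 2
        // subSequence(0, 3) is "ban", which contains a single 'a'.
        System.out.println(count(new StringBuffer("banana").subSequence(0, 3), 'a')); // 1
    }
}
```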

package java.lang;

import java.io.IOException;

/**
 * An object to which <tt>char</tt> sequences and values can be appended.  The
 * <tt>Appendable</tt> interface must be implemented by any class whose
 * instances are intended to receive formatted output from a {@link
 * java.util.Formatter}.
 */
public interface Appendable {
    /**
     * Appends the specified character sequence to this <tt>Appendable</tt>.
     *
     * <p> Depending on which class implements the character sequence
     * <tt>csq</tt>, the entire sequence may not be appended.  For
     * instance, if <tt>csq</tt> is a {@link java.nio.CharBuffer} then
     * the subsequence to append is defined by the buffer's position and limit.
     *
     * @param  csq
     *         The character sequence to append.  If <tt>csq</tt> is
     *         <tt>null</tt>, then the four characters <tt>"null"</tt> are
     *         appended to this Appendable.
     *
     * @return  A reference to this <tt>Appendable</tt>
     *
     * @throws  IOException
     *          If an I/O error occurs
     */
    Appendable append(CharSequence csq) throws IOException;

    /**
     * Appends a subsequence of the specified character sequence to this
     * <tt>Appendable</tt>.
     *
     * <p> An invocation of this method of the form <tt>out.append(csq, start,
     * end)</tt> when <tt>csq</tt> is not <tt>null</tt>, behaves in
     * exactly the same way as the invocation
     *
     * <pre>
     *     out.append(csq.subSequence(start, end)) </pre>
     *
     * @param  csq
     *         The character sequence from which a subsequence will be
     *         appended.  If <tt>csq</tt> is <tt>null</tt>, then characters
     *         will be appended as if <tt>csq</tt> contained the four
     *         characters <tt>"null"</tt>.
     *
     * @param  start
     *         The index of the first character in the subsequence
     *
     * @param  end
     *         The index of the character following the last character in the
     *         subsequence
     *
     * @return  A reference to this <tt>Appendable</tt>
     *
     * @throws  IndexOutOfBoundsException
     *          If <tt>start</tt> or <tt>end</tt> are negative, <tt>start</tt>
     *          is greater than <tt>end</tt>, or <tt>end</tt> is greater than
     *          <tt>csq.length()</tt>
     *
     * @throws  IOException
     *          If an I/O error occurs
     */
    Appendable append(CharSequence csq, int start, int end) throws IOException;

    /**
     * Appends the specified character to this <tt>Appendable</tt>.
     *
     * @param  c
     *         The character to append
     *
     * @return  A reference to this <tt>Appendable</tt>
     *
     * @throws  IOException
     *          If an I/O error occurs
     */
    Appendable append(char c) throws IOException;
}


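The comments are superbly written; I couldn't bear to delete them. In its own words, this interface is "An object to which char sequences and values can be appended". Java pulled the "append" out of "can be appended" and gave it an interface of its own, which feels very meticulous and disciplined (forgive my powers of expression). Now we finally get to AbstractStringBuilder. To understand something in Java you have to peel it apart layer by layer, since there is always a pile of things being extended and implemented!!

Let me cough up some blood first... I had meant to complain about this class's description, but that urge faded somewhere along the way through the code. Here are my notes:

It has these methods: ensureCapacity(int minimumCapacity), ensureCapacityInternal(int minimumCapacity), and expandCapacity(int minimumCapacity). Their job is to make sure the current capacity, i.e. value.length, is at least minimumCapacity. If the capacity is smaller than that argument, the internal array (value) has to be reallocated, which is what expandCapacity(int minimumCapacity) does: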
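To see what the interface buys you, here is a minimal sketch of a method that writes to any Appendable, so the same code can target a StringBuilder or a StringWriter. joinTo and joinToString are hypothetical helpers for this note. The checked IOException comes from the interface signature, even though StringBuilder never actually throws it.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

class AppendableSketch {
    // Appends the parts separated by commas to any Appendable.
    static void joinTo(Appendable out, CharSequence... parts) throws IOException {
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                out.append(',');
            }
            out.append(parts[i]);  // a null part is appended as the four chars "null"
        }
    }

    // Convenience wrapper: StringBuilder does no I/O, so the exception is only rethrown unchecked.
    static String joinToString(CharSequence... parts) {
        StringBuilder sb = new StringBuilder();
        try {
            joinTo(sb, parts);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(joinToString("a", "b", "c"));  // a,b,c
        StringWriter writer = new StringWriter();
        joinTo(writer, "x", null, "z");
        System.out.println(writer);                        // x,null,z
    }
}
```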

void expandCapacity(int minimumCapacity) {
    int newCapacity = value.length * 2 + 2;
    if (newCapacity - minimumCapacity < 0)
        newCapacity = minimumCapacity;
    if (newCapacity < 0) {
        if (minimumCapacity < 0) // overflow
            throw new OutOfMemoryError();
        newCapacity = Integer.MAX_VALUE;
    }
    value = Arrays.copyOf(value, newCapacity);
}
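To make the branches easier to follow, the sketch below lifts the capacity arithmetic out of expandCapacity into a pure function so each case can be tried with concrete numbers. newCapacity is a hypothetical helper that mirrors the listing above; it is not a JDK method.

```java
class GrowthRuleSketch {
    // Same arithmetic as the expandCapacity listing, without the array copy.
    static int newCapacity(int currentCapacity, int minimumCapacity) {
        int newCapacity = currentCapacity * 2 + 2;   // may wrap around to a negative int
        if (newCapacity - minimumCapacity < 0) {
            newCapacity = minimumCapacity;           // doubling was not enough
        }
        if (newCapacity < 0) {
            if (minimumCapacity < 0) {               // the request itself overflowed
                throw new OutOfMemoryError();
            }
            newCapacity = Integer.MAX_VALUE;         // cap at the largest int
        }
        return newCapacity;
    }

    public static void main(String[] args) {
        System.out.println(newCapacity(16, 17));    // 34
        System.out.println(newCapacity(16, 100));   // 100
        // 1_500_000_000 * 2 + 2 overflows int, so the result is capped.
        System.out.println(newCapacity(1_500_000_000, 1_500_000_001) == Integer.MAX_VALUE); // true
    }
}
```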


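It first computes the old length * 2 + 2; if that is still less than minimumCapacity, the new capacity is simply set to minimumCapacity. A negative result means the int arithmetic overflowed: the method throws OutOfMemoryError if minimumCapacity itself is negative, and otherwise caps the capacity at Integer.MAX_VALUE. The array is then grown with Arrays.copyOf(value, newCapacity). I thought I wouldn't need to dig into any other class. How naive; let me cough up some more blood. Here is the copyOf method: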

public static char[] copyOf(char[] original, int newLength) {
    char[] copy = new char[newLength];
    System.arraycopy(original, 0, copy, 0,
                     Math.min(original.length, newLength));
    return copy;
}


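This function first creates a char array of length newLength, then calls arraycopy to copy min(original.length, newLength) chars from the original array into the new array copy, and returns it. We have finally tracked down arraycopy, so let's see how it is implemented: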

public static native void arraycopy(Object src, int srcPos,Object dest, int destPos,int length);

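It turns out to be a native method. What now? No problem, let's go search Baidu!
http://www.360doc.com/content/14/0713/19/1073512_394157835.shtml
That led me to the article above. Realizing that this method is implemented in C was a small letdown, but the efficiency speaks for itself. Then the C code goes on to call assembly and, forgive me, I was dumb! found! ed!!! I had hoped to dig a little deeper...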
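Even without reading the native code, the behavior of copyOf and arraycopy is easy to check from Java. A minimal sketch, where resize is a hypothetical re-implementation of the copyOf listing above:

```java
import java.util.Arrays;

class ArrayCopySketch {
    // Same steps as the copyOf listing: allocate, then copy the overlapping prefix.
    static char[] resize(char[] original, int newLength) {
        char[] copy = new char[newLength];
        System.arraycopy(original, 0, copy, 0, Math.min(original.length, newLength));
        return copy;
    }

    public static void main(String[] args) {
        char[] abc = {'a', 'b', 'c'};
        char[] grown = resize(abc, 5);    // 'a', 'b', 'c' followed by two zero chars
        char[] shrunk = resize(abc, 2);   // truncated to 'a', 'b'
        System.out.println(grown.length);                                  // 5
        System.out.println(new String(shrunk));                            // ab
        System.out.println(Arrays.equals(grown, Arrays.copyOf(abc, 5)));   // true
    }
}
```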

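OK, on to the next set of notes (about append):

append is heavily overloaded in AbstractStringBuilder: there are versions taking a String, a StringBuffer, a CharSequence, and so on. Most of them follow the same idea. For example: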

public AbstractStringBuilder append(String str) {
    if (str == null) str = "null";
    int len = str.length();
    ensureCapacityInternal(count + len);
    str.getChars(0, len, value, count);
    count += len;
    return this;
}


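Almost all of them first check for null, then call ensureCapacityInternal to make sure the array is large enough (growing it when necessary), and then call getChars to copy str in. Let's look at getChars. (The listing below reads count and value directly, so it appears to be AbstractStringBuilder's own getChars; String.getChars performs essentially the same bounds checks followed by the same arraycopy.)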

public void getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin)
{
    if (srcBegin < 0)
        throw new StringIndexOutOfBoundsException(srcBegin);
    if ((srcEnd < 0) || (srcEnd > count))
        throw new StringIndexOutOfBoundsException(srcEnd);
    if (srcBegin > srcEnd)
        throw new StringIndexOutOfBoundsException("srcBegin > srcEnd");
    System.arraycopy(value, srcBegin, dst, dstBegin, srcEnd - srcBegin);
}


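It first checks the indices for out-of-bounds errors, then calls arraycopy to copy srcEnd - srcBegin chars from the value array, starting at srcBegin, into the dst array, starting at dstBegin.

Oh, and there is a delete function that is really interesting: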
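Putting the pieces together, here is a toy builder that follows the same null check, capacity check, and getChars sequence as the append listing. MiniBuilder is a made-up class for illustration, and it deliberately omits the overflow handling of the real expandCapacity.

```java
import java.util.Arrays;

class MiniBuilder {
    private char[] value = new char[16];  // same default capacity as StringBuilder
    private int count;                    // number of chars actually in use

    MiniBuilder append(String str) {
        if (str == null) {
            str = "null";
        }
        int len = str.length();
        if (count + len > value.length) {
            // SIMPLIFICATION: no overflow checks, unlike the JDK code.
            value = Arrays.copyOf(value, Math.max(value.length * 2 + 2, count + len));
        }
        str.getChars(0, len, value, count);  // copy str into value at offset count
        count += len;
        return this;
    }

    int capacity() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, 0, count);
    }

    public static void main(String[] args) {
        MiniBuilder b = new MiniBuilder().append("hello, ").append(null).append(" world");
        System.out.println(b);             // hello, null world
        System.out.println(b.capacity());  // 34 (17 chars no longer fit in 16)
    }
}
```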

public AbstractStringBuilder delete(int start, int end) {
    if (start < 0)
        throw new StringIndexOutOfBoundsException(start);
    if (end > count)
        end = count;
    if (start > end)
        throw new StringIndexOutOfBoundsException();
    int len = end - start;
    if (len > 0) {
        System.arraycopy(value, start+len, value, start, count-end);
        count -= len;
    }
    return this;
}


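It also just calls arraycopy, yet it really does implement deletion: the count - end chars after the deleted range are shifted left so that they start at start, overwriting the deleted chars, and count shrinks by len. Really clever (or am I just slow?). The more I learn, the more I realize how little I know.

Another heavily overloaded function in AbstractStringBuilder is insert(). Here is one of the more complex overloads; once you understand it, the others are obvious: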
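The shift is easier to see on a bare array. A minimal sketch, assuming the same logic as the listing; delete and deleteFrom are hypothetical helpers that operate on a plain char[] instead of the builder's fields:

```java
class DeleteSketch {
    // Removes the range [start, end) from the first count chars of value
    // and returns the new count. Chars beyond the new count are left as-is.
    static int delete(char[] value, int count, int start, int end) {
        if (start < 0) {
            throw new StringIndexOutOfBoundsException(start);
        }
        if (end > count) {
            end = count;  // an oversized end is clamped, not rejected
        }
        if (start > end) {
            throw new StringIndexOutOfBoundsException();
        }
        int len = end - start;
        if (len > 0) {
            // Slide the tail left over the deleted range.
            System.arraycopy(value, end, value, start, count - end);
            count -= len;
        }
        return count;
    }

    static String deleteFrom(String s, int start, int end) {
        char[] value = s.toCharArray();
        int count = delete(value, value.length, start, end);
        return new String(value, 0, count);
    }

    public static void main(String[] args) {
        System.out.println(deleteFrom("abcdefg", 2, 4));                    // abefg
        System.out.println(deleteFrom("abc", 1, 100));                      // a
        System.out.println(new StringBuilder("abcdefg").delete(2, 4));      // abefg
    }
}
```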

public AbstractStringBuilder insert(int index, char[] str, int offset,
                                    int len)
{
    if ((index < 0) || (index > length()))
        throw new StringIndexOutOfBoundsException(index);
    if ((offset < 0) || (len < 0) || (offset > str.length - len))
        throw new StringIndexOutOfBoundsException(
            "offset " + offset + ", len " + len + ", str.length "
            + str.length);
    ensureCapacityInternal(count + len);
    System.arraycopy(value, index, value, index + len, count - index);
    System.arraycopy(str, offset, value, index, len);
    count += len;
    return this;
}


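This inserts len chars of str, starting at offset, into value at position index. As elsewhere in the JDK, it first checks the bounds and then ensures the array capacity. The first arraycopy shifts the chars of value from index onward back by len positions to make room for str; the second arraycopy copies the required chars of str into the gap.

The source also has a clever reverse() function for reversing the string: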
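The two copies can be tried in isolation. The following is a minimal sketch assuming the same steps as the listing; the static insert helper is hypothetical and works on a String for convenience, allocating the larger array up front instead of calling ensureCapacityInternal.

```java
import java.util.Arrays;

class InsertSketch {
    static String insert(String target, int index, char[] str, int offset, int len) {
        int count = target.length();
        if (index < 0 || index > count) {
            throw new StringIndexOutOfBoundsException(index);
        }
        if (offset < 0 || len < 0 || offset > str.length - len) {
            throw new StringIndexOutOfBoundsException(
                "offset " + offset + ", len " + len + ", str.length " + str.length);
        }
        // Stand-in for ensureCapacityInternal(count + len).
        char[] value = Arrays.copyOf(target.toCharArray(), count + len);
        // Step 1: move the tail back by len to open a gap at index.
        System.arraycopy(value, index, value, index + len, count - index);
        // Step 2: fill the gap from str[offset .. offset + len).
        System.arraycopy(str, offset, value, index, len);
        return new String(value, 0, count + len);
    }

    public static void main(String[] args) {
        char[] source = "[ big]".toCharArray();
        System.out.println(insert("a car", 1, source, 1, 4));                  // a big car
        System.out.println(new StringBuilder("a car").insert(1, source, 1, 4)); // a big car
    }
}
```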

public AbstractStringBuilder reverse() {
    boolean hasSurrogate = false;
    int n = count - 1;
    for (int j = (n-1) >> 1; j >= 0; --j) {
        char temp = value[j];
        char temp2 = value[n - j];
        if (!hasSurrogate) {
            hasSurrogate = (temp >= Character.MIN_SURROGATE && temp <= Character.MAX_SURROGATE)
                || (temp2 >= Character.MIN_SURROGATE && temp2 <= Character.MAX_SURROGATE);
        }
        value[j] = temp2;
        value[n - j] = temp;
    }
    if (hasSurrogate) {
        // Reverse back all valid surrogate pairs
        for (int i = 0; i < count - 1; i++) {
            char c2 = value[i];
            if (Character.isLowSurrogate(c2)) {
                char c1 = value[i + 1];
                if (Character.isHighSurrogate(c1)) {
                    value[i++] = c1;
                    value[i] = c2;
                }
            }
        }
    }
    return this;
}


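The basic idea is to swap the first char with the last and work inward, but the function also checks whether each char lies between Character.MIN_SURROGATE (\ud800) and Character.MAX_SURROGATE (\udfff). If any such char is found, it makes a second pass from start to end: when value[i] satisfies Character.isLowSurrogate(), it checks whether value[i+1] satisfies Character.isHighSurrogate(), and if so it swaps the chars at i and i+1, restoring the pair to its original high-then-low order. Some may wonder why this is needed: Java chars are already Unicode, and a single char can hold a Chinese character, so why bother?

A complete Unicode character is called a code point, while a Java char is a code unit. String objects store Unicode text as UTF-16, and a character outside the Basic Multilingual Plane (for example a Chinese character from the extended set) needs two chars. This representation is called a surrogate pair: the first char is the high surrogate and the second is the low surrogate. The points to watch out for are:

To tell whether a char is in the surrogate range, use Character.isHighSurrogate()/isLowSurrogate(). To get a complete Unicode code point from a high/low surrogate pair, use Character.toCodePoint()/codePointAt().

A code point may take one char or two, so CharSequence.length() does not tell you how many characters (Chinese characters, say) a string really contains; use String.codePointCount()/Character.codePointCount() instead.

To locate the Nth character in a string, you cannot use N directly as an offset; you have to walk from the start of the string, using String/Character.offsetByCodePoints().

To step from the current character to the previous one, offset-- is not enough either; use String.codePointBefore()/Character.codePointBefore(), or String/Character.offsetByCodePoints().

To step from the current character to the next one, offset++ is not enough; work out the length of the current code point first, or use String/Character.offsetByCodePoints().

(The passage above is excerpted from http://www.jb51.net/article/37399.htm)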
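A short sketch makes the surrogate handling concrete. It uses U+1F600 (an emoji outside the BMP, chosen here only as a convenient test character) and compares StringBuilder.reverse() with a naive char-by-char reversal; naiveReverse is a hypothetical helper written for this note.

```java
class SurrogateSketch {
    // Reverses code units blindly, which splits surrogate pairs.
    static String naiveReverse(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0, j = chars.length - 1; i < j; i++, j--) {
            char t = chars[i];
            chars[i] = chars[j];
            chars[j] = t;
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        // 'a', then U+1F600 stored as high surrogate D83D + low surrogate DE00, then 'b'
        String s = "a\uD83D\uDE00b";

        System.out.println(s.length());                          // 4 code units
        System.out.println(s.codePointCount(0, s.length()));     // 3 code points
        System.out.println(s.codePointAt(1) == 0x1F600);         // true
        System.out.println(Character.toCodePoint('\uD83D', '\uDE00') == 0x1F600); // true
        System.out.println(s.offsetByCodePoints(0, 2));          // 3, the index of 'b'

        // reverse() keeps the pair in high-then-low order...
        System.out.println(new StringBuilder(s).reverse().toString().equals("b\uD83D\uDE00a")); // true
        // ...while the naive version leaves low-then-high, which is not a valid pair.
        System.out.println(naiveReverse(s).equals("b\uDE00\uD83Da")); // true
    }
}
```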

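I had planned to write up StringBuilder and StringBuffer as well, but AbstractStringBuilder alone wore me out. Hats off to the Java designers. Off to change the title; I'm done for!

Er, today I found time to look at StringBuilder and StringBuffer. It turns out that most of their methods just call super, i.e. the methods in AbstractStringBuilder. The difference between the two is that StringBuffer, to stay safe under multithreading, adds Java's built-in synchronized keyword to most of its methods. That's about it...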
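The effect of synchronized can be observed with a small experiment. In this minimal sketch (class and method names are made up), several threads append to one shared StringBuffer and no append is lost. Swapping in a StringBuilder makes the result unpredictable (appends may be lost or an exception thrown), so that variant is not shown.

```java
class SynchronizedAppendSketch {
    // Appends from several threads to one StringBuffer and returns the final length.
    static int concurrentLength(int threads, int appendsPerThread) {
        StringBuffer buffer = new StringBuffer();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < appendsPerThread; i++) {
                    buffer.append('x');  // synchronized in StringBuffer
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return buffer.length();
    }

    public static void main(String[] args) {
        System.out.println(concurrentLength(4, 10_000));  // 40000
    }
}
```

When only one thread touches the builder, StringBuilder does the same work without the locking.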