
Memory problems caused by java.lang.String's substring and split methods

2015-06-25 13:45
Most of this article is drawn from the following two posts:

http://blog.xebia.com/2007/10/04/leaking-memory-in-java/

http://www.iteye.com/topic/626801

Let's start with an extreme example showing how String's substring method can cause an OutOfMemoryError:


import java.util.ArrayList;

public class TestGC {
    // Each instance owns a 100,000-char string.
    private String large = new String(new char[100000]);

    public String getSubString() {
        return this.large.substring(0, 2);
    }

    public static void main(String[] args) {
        ArrayList<String> subStrings = new ArrayList<String>();
        for (int i = 0; i < 1000000; i++) {
            TestGC testGC = new TestGC();
            // Only the 2-char substring is kept; testGC itself becomes garbage.
            subStrings.add(testGC.getSubString());
        }
    }
}

Running this program produces:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

Why does this happen? The source of the substring method in the JDK's String class reveals the cause:


public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > count) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    if (beginIndex > endIndex) {
        throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
    }
    return ((beginIndex == 0) && (endIndex == count)) ? this :
        new String(offset + beginIndex, endIndex - beginIndex, value);
}

The last line of this method calls a package-private String constructor:


// Package private constructor which shares value array for speed.
String(int offset, int count, char value[]) {
    this.value = value;
    this.offset = offset;
    this.count = count;
}

As the constructor's access level and comment suggest, Sun wrote this constructor specifically as a performance optimization.

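To avoid a memory copy, the constructor does not allocate a new char array. It reuses the original String's char[] and identifies the new string's content only by a different offset and length. In other words, the small String returned by substring still references the large String's char[], so that array cannot be garbage-collected for as long as the small String is reachable. In the example, a million 2-character substrings each keep a 100,000-char array alive, which leads to the OutOfMemoryError. (This is the behavior of the JDK version quoted here; since Java 7 update 6, substring copies the characters instead of sharing the array.)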
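The sharing described above can be modeled outside the JDK. The following is a minimal sketch, not the real String class: `SharedSliceDemo` and its nested `Slice` type are hypothetical names invented for illustration, and `Slice` only imitates the offset/count/value layout shown in the quoted source.

```java
// Hypothetical model of the old String layout: a (offset, count) view over a char[].
class SharedSliceDemo {

    static final class Slice {
        final char[] value; // backing array, possibly much larger than the slice
        final int offset;
        final int count;

        Slice(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        // Imitates the sharing constructor: no copy, same backing array.
        Slice slice(int begin, int end) {
            return new Slice(value, offset + begin, end - begin);
        }

        // Number of chars this slice keeps reachable through its backing array.
        int retainedChars() {
            return value.length;
        }
    }

    public static void main(String[] args) {
        Slice large = new Slice(new char[100000], 0, 100000);
        Slice small = large.slice(0, 2);
        System.out.println("slice length:   " + small.count);                  // 2
        System.out.println("retained chars: " + small.retainedChars());        // 100000
        System.out.println("same array:     " + (small.value == large.value)); // true
    }
}
```

Even after `large` goes out of scope, `small` still holds the only reference needed to keep the 100,000-char array alive, which is the situation TestGC creates a million times.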

With the cause identified, change the getSubString method in the code above as follows:


public String getSubString() {
    return new String(this.large.substring(0, 2));
}

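The result of substring is now wrapped in a new String. Run the program again and the OutOfMemoryError no longer occurs. Why? Because this time the public String copy constructor is called, whose source is: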


public String(String original) {
    int size = original.count;
    char[] originalValue = original.value;
    char[] v;
    if (originalValue.length > size) {
        // The array representing the String is bigger than the new
        // String itself. Perhaps this constructor is being called
        // in order to trim the baggage, so make a copy of the array.
        int off = original.offset;
        v = Arrays.copyOfRange(originalValue, off, off+size);
    } else {
        // The array representing the String is the same
        // size as the String, so no point in making a copy.
        v = originalValue;
    }
    this.offset = 0;
    this.count = size;
    this.value = v;
}

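As the code shows, when the length of the String's value array is greater than count, a new char[] is allocated and only the characters the string actually uses are copied into it.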
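A rough worked estimate shows why the trimming copy matters for the TestGC example. This is only a sketch: the class name `RetainedEstimate` and its helper are hypothetical, and the figures count character payload only (2 bytes per char), ignoring object headers and the list itself.

```java
// Back-of-the-envelope estimate of char data kept alive by the kept substrings.
class RetainedEstimate {
    static final int BYTES_PER_CHAR = 2; // a Java char is 16 bits

    // Payload bytes retained when each kept substring pins an array of arrayLength chars.
    static long retainedBytes(long substrings, long arrayLength) {
        return substrings * arrayLength * BYTES_PER_CHAR;
    }

    public static void main(String[] args) {
        // Shared backing array: every 2-char substring pins its 100000-char array.
        long shared = retainedBytes(1_000_000, 100_000);
        // Trimmed copy: every substring owns a 2-char array.
        long copied = retainedBytes(1_000_000, 2);
        System.out.println("shared arrays:  " + shared + " bytes"); // 200000000000 bytes
        System.out.println("trimmed copies: " + copied + " bytes"); // 4000000 bytes
    }
}
```

Roughly 200 GB of retained arrays against about 4 MB of trimmed copies explains why the first version exhausts the heap and the second does not.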

Besides substring, String's split method has the same problem. The source of split is:


public String[] split(String regex, int limit) {
    return Pattern.compile(regex).split(this, limit);
}

As you can see, String's split delegates to Pattern's split method, whose source is:


public String[] split(CharSequence input, int limit) {
    int index = 0;
    boolean matchLimited = limit > 0;
    ArrayList<String> matchList = new ArrayList<String>();
    Matcher m = matcher(input);

    // Add segments before each match found
    while(m.find()) {
        if (!matchLimited || matchList.size() < limit - 1) {
            String match = input.subSequence(index, m.start()).toString();
            matchList.add(match);
            index = m.end();
        } else if (matchList.size() == limit - 1) { // last one
            String match = input.subSequence(index,
                                             input.length()).toString();
            matchList.add(match);
            index = m.end();
        }
    }

    // If no match was found, return this
    if (index == 0)
        return new String[] {input.toString()};

    // Add remaining segment
    if (!matchLimited || matchList.size() < limit)
        matchList.add(input.subSequence(index, input.length()).toString());

    // Construct result
    int resultSize = matchList.size();
    if (limit == 0)
        while (resultSize > 0 && matchList.get(resultSize-1).equals(""))
            resultSize--;
    String[] result = new String[resultSize];
    return matchList.subList(0, resultSize).toArray(result);
}

Inside the loop, the statement String match = input.subSequence(index, m.start()).toString(); calls the subSequence method of the String class (the input here is a String), whose source is:


public CharSequence subSequence(int beginIndex, int endIndex) {
    return this.substring(beginIndex, endIndex);
}

The code shows that this ultimately calls String's substring method, so the same problem exists: the small strings produced by split directly use the original String's char[].
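The same copy-constructor fix used for substring can be applied to the pieces returned by split. The sketch below assumes that approach; `DetachedSplit` and `splitDetached` are hypothetical names, not part of the JDK. On JDK versions where substring already copies its characters, the extra `new String(...)` is unnecessary but harmless.

```java
import java.util.Arrays;

class DetachedSplit {
    // Hypothetical helper: split, then copy each piece so it does not
    // depend on the backing array of the original input.
    static String[] splitDetached(String input, String regex) {
        String[] parts = input.split(regex);
        for (int i = 0; i < parts.length; i++) {
            parts[i] = new String(parts[i]); // copy constructor, as in the substring fix
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] parts = splitDetached("a,b,c", ",");
        System.out.println(Arrays.toString(parts)); // prints [a, b, c]
    }
}
```

This only matters when a few short pieces of a very large input are kept for a long time; for short-lived results the shared array is collected along with them.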

The substring methods of StringBuilder and StringBuffer, on the other hand, do not have this problem. Their source is:


public String substring(int start, int end) {
    if (start < 0)
        throw new StringIndexOutOfBoundsException(start);
    if (end > count)
        throw new StringIndexOutOfBoundsException(end);
    if (start > end)
        throw new StringIndexOutOfBoundsException(end - start);
    return new String(value, start, end - start);
}

The last line calls a public String constructor, whose source is:


public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count < 0) {
        throw new StringIndexOutOfBoundsException(count);
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    this.offset = 0;
    this.count = count;
    this.value = Arrays.copyOfRange(value, offset, offset+count);
}

Rather than using the caller's char[] directly, this constructor copies the requested range of characters into a new array.
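One observable consequence of this copy is that a string taken from a StringBuilder is unaffected by later changes to the builder. The following is a small illustrative sketch; the class and method names are hypothetical, and only standard StringBuilder calls are used.

```java
class BuilderSubstringDemo {
    // Takes the first two chars, then mutates the builder.
    static String firstTwoThenMutate(StringBuilder sb) {
        String head = sb.substring(0, 2); // characters are copied into a new String
        sb.setCharAt(0, 'X');             // later mutation does not reach the copy
        return head;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("hello");
        String head = firstTwoThenMutate(sb);
        System.out.println(head); // prints he
        System.out.println(sb);   // prints Xello
    }
}
```

Because the returned String owns its own array, it also does not keep the builder's (possibly much larger) buffer reachable.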