5 Tips for Reducing Your Java Garbage Collection Overhead
2016-01-24 15:35
What are some of the most useful tips for keeping your GC overhead low?
With the upcoming-yet-delayed-once-again release of Java 9, the G1 (“Garbage First”) garbage collector is set to become the default collector of the HotSpot JVM. From the serial garbage collector
all the way to the CMS collector, the JVM has seen many GC implementations throughout its lifetime, and the G1 collector is next in line.
As garbage collectors evolve, each generation (no pun intended) brings to the table advancements and improvements over previous ones. The parallel GC that followed the serial collector made garbage
collection multithreaded, utilizing the compute capabilities of multi-core machines. The CMS (“Concurrent Mark-Sweep”) collector that followed divided the collection into multiple phases, allowing much of the collection work to be done concurrently while the
application threads are running — resulting in much less frequent “stop-the-world” pauses. G1 adds better performance on JVMs with very large heaps, and has much more predictable and uniform pauses.
New Post: 5 Tips for Reducing Your #Java Garbage Collection Overhead https://t.co/Xe2PfiFZuM (Takipi, @takipid, December 16, 2015)
However advanced GCs get, their Achilles’ heel remains the same: redundant and unpredictable object allocations. Here are some quick, practical, lasting tips that will help you keep your GC overhead at bay, no matter which garbage collector you choose to use.
Tip #1: Predict Collection Capacities
Most standard Java collections (ArrayList, HashMap, ArrayDeque and the like), as well as most custom and extended implementations (such as Trove and Google’s Guava), use underlying arrays (either primitive- or object-based). Since an array’s size is fixed once it is allocated, adding items to a collection may in many cases cause the old underlying array to be dropped in favor of a larger, newly allocated array.
Most collection implementations try to optimize this re-allocation process and keep it to an amortized minimum, even if the expected size of the collection is not provided. However, the best results
can be achieved by providing the collection with its expected size upon construction.
Let’s take the following code as a simple example:
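A minimal sketch of the kind of method discussed here, assuming a generic static helper that reverses a list (the class name `ReverseNaive` and the method name `reverse` are illustrative, not taken from the original listing):

```java
import java.util.ArrayList;
import java.util.List;

class ReverseNaive {
    // Builds a reversed copy of the given list.
    static <T> List<T> reverse(List<? extends T> list) {
        // No capacity hint: the backing array starts small and has to be
        // re-allocated and copied several times as items are added.
        List<T> result = new ArrayList<>();

        for (int i = list.size() - 1; i >= 0; i--) {
            result.add(list.get(i));
        }

        return result;
    }
}
```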
This method allocates a new list, then fills it up with items from another list, only in reverse order.
The potentially costly spot is the line that adds items to the new list. With each addition, the list needs to make sure its underlying array has enough free slots to accommodate the new item. If it does, it simply stores the new item in the next free slot. If not, it allocates a new underlying array, copies the old array’s content into the new array, then adds the new item. This results in multiple array allocations, and the discarded arrays remain on the heap for the GC to eventually collect.
We can avoid these redundant allocations by letting the list know how many items it’s expected to hold when constructing it:
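As a sketch under the same assumptions as before (a hypothetical `reverse` helper, here in an illustrative class `ReverseSized`), passing the expected size to the `ArrayList(int initialCapacity)` constructor might look like this:

```java
import java.util.ArrayList;
import java.util.List;

class ReverseSized {
    // Builds a reversed copy of the given list.
    static <T> List<T> reverse(List<? extends T> list) {
        // Capacity hint: the backing array is allocated once, already
        // large enough for every item we are about to add.
        List<T> result = new ArrayList<>(list.size());

        for (int i = list.size() - 1; i >= 0; i--) {
            result.add(list.get(i));
        }

        return result;
    }
}
```

The only change is the constructor argument; the loop itself stays the same.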
This makes the initial allocation performed by the ArrayList constructor large enough to hold list.size() items,
which means it does not have to reallocate memory during the iteration.
Guava’s collection classes take this a step further, allowing us to initialize collections with either an exact number of expected items or an estimate: Lists.newArrayListWithCapacity(list.size()) for the former, and Lists.newArrayListWithExpectedSize(list.size()) for the latter.
The former is for cases in which we know exactly how many items the collection is going to hold, while the latter allocates some padding to account for estimation errors.
Tip #2: Process Streams Directly
When processing streams of data, such as data read from files or downloaded over the network, it’s very common to see something along the lines of:
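One common shape of this pattern, sketched with the standard `Files.readAllBytes` call (the class and method names are illustrative assumptions, and the original listing may have used a different helper):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class WholeFileRead {
    // Loads the entire file into a single byte array on the heap.
    // The array is as large as the file itself.
    static byte[] readAll(Path file) throws IOException {
        return Files.readAllBytes(file);
    }
}
```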
The resulting byte array could then be parsed into an XML document, JSON object or Protocol Buffer message, to name a few popular options.
When dealing with large files or ones of unpredictable size, this is obviously a bad idea, as it exposes us to OutOfMemoryErrors in case the JVM can’t
actually allocate a buffer the size of the whole file.
But even if the size of the data seems manageable, using the above pattern can cause significant overhead when it comes to garbage collection, as it allocates a relatively large blob on the heap
to hold the file data.
A better way to approach this is to use the appropriate InputStream (FileInputStream in this case) and feed
it directly into the parser, without first reading the whole thing into a byte array. All major libraries expose APIs to parse streams directly, for example:
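As one possible illustration using only the JDK, the sketch below hands a FileInputStream straight to the built-in XML parser via `DocumentBuilder.parse(InputStream)`. The class and method names are assumptions made for this example; JSON and Protocol Buffer libraries offer comparable stream-based entry points.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class StreamParse {
    // Parses XML directly from a stream; no intermediate byte[] is built.
    static Document parse(InputStream in)
            throws IOException, SAXException, ParserConfigurationException {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(in);
    }

    // Opens the file as a stream and feeds it to the parser,
    // closing the stream when parsing is done.
    static Document parseFile(File file)
            throws IOException, SAXException, ParserConfigurationException {
        try (InputStream in = new FileInputStream(file)) {
            return parse(in);
        }
    }
}
```

Note that a DOM parser still builds the whole document tree in memory; for very large inputs, an event-based parser (SAX or StAX) avoids that as well.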
Tip #3: Use Immutable Objects
Immutability has many, many advantages. Don’t even get me started. However, one advantage that’s rarely given the attention it deserves is its effect on garbage collection.
An immutable object is an object whose fields (and specifically non-primitive fields in our case) cannot be modified after the object has been constructed. For example:
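A minimal sketch of such a class, assuming a simple pair of object references (the name `ObjectPair` and its accessors are illustrative):

```java
final class ObjectPair {
    // Both references are assigned exactly once, in the constructor.
    private final Object first;
    private final Object second;

    ObjectPair(Object first, Object second) {
        this.first = first;
        this.second = second;
    }

    Object getFirst() {
        return first;
    }

    Object getSecond() {
        return second;
    }
}
```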
Instantiating the above class results in an immutable object: all its fields are marked final and cannot be modified past construction.
Immutability implies that all objects referenced by an immutable container have been created before the construction of the container completes. In GC terms: the container is at least as young as the youngest object it references. This means that when performing garbage collection cycles on young generations, the GC can skip immutable objects that lie in older generations, since they cannot reference anything in the generation that’s being collected.
Fewer objects to scan mean fewer memory pages to scan, and fewer memory pages to scan mean shorter GC cycles, which mean shorter GC pauses and better overall throughput.
Tip #4: Be wary of String Concatenation
Strings are probably the most prevalent non-primitive data structure in any JVM-based application. However, their implicit weight and convenience of use make them easy culprits in applications’ large memory footprints.
The problem does not lie with literal strings obviously, as these are inlined and interned, but rather with strings that are allocated and constructed at runtime. Let’s take a look at a quick example
of dynamic string construction:
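A sketch of what such a method might look like, assuming a hypothetical helper `arrayToString` that renders an array as `[a, b, c]` using the `+=` operator:

```java
class ConcatNaive {
    // Builds "[a, b, c]" from the array using string concatenation.
    static <T> String arrayToString(T[] array) {
        String result = "[";

        for (int i = 0; i < array.length; i++) {
            result += array[i];

            // Separator between items, but not after the last one
            if (i < array.length - 1) {
                result += ", ";
            }
        }

        result += "]";
        return result;
    }
}
```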
This is a nice little method that takes an array and returns a string representation of it. It is also hell in terms of object allocation.
It’s hard to see past all of this syntactic sugar, but what’s actually going on behind the scenes is this:
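The expansion can be sketched roughly as follows for a hypothetical `arrayToString` method that builds `[a, b, c]` with `+=`; each original statement is kept as a comment above its expanded form. This approximates what javac releases up to Java 8 generate (from Java 9 on, javac compiles concatenation through invokedynamic instead), so treat it as an illustration rather than exact compiler output.

```java
class ConcatDesugared {
    static <T> String arrayToString(T[] array) {
        String result = "[";

        for (int i = 0; i < array.length; i++) {
            // result += array[i];
            StringBuilder sb1 = new StringBuilder(result);
            sb1.append(array[i]);
            result = sb1.toString();

            if (i < array.length - 1) {
                // result += ", ";
                StringBuilder sb2 = new StringBuilder(result);
                sb2.append(", ");
                result = sb2.toString();
            }
        }

        // result += "]";
        StringBuilder sb3 = new StringBuilder(result);
        sb3.append("]");
        result = sb3.toString();

        return result;
    }
}
```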
Strings are immutable, which means they are not themselves modified when concatenation takes place; instead, new strings are allocated each time. In addition, the compiler uses the standard StringBuilder class to actually perform these concatenations. This leads to double trouble, as in each iteration of the loop we get both (1) implicit allocations of interim strings, and (2) implicit allocations of interim StringBuilder objects to help us construct the final result.
The best way to avoid this is to explicitly use StringBuilder and append to it directly, instead of using the somewhat naive concatenation operator (“+”). Here’s what it may look like:
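A sketch of the explicit version, again assuming a hypothetical `arrayToString` helper that renders an array as `[a, b, c]`:

```java
class ConcatBuilder {
    // Builds "[a, b, c]" with a single, explicitly managed StringBuilder.
    static <T> String arrayToString(T[] array) {
        StringBuilder sb = new StringBuilder("[");

        for (int i = 0; i < array.length; i++) {
            sb.append(array[i]);

            // Separator between items, but not after the last one
            if (i < array.length - 1) {
                sb.append(", ");
            }
        }

        sb.append("]");
        return sb.toString();
    }
}
```

If the approximate length of the result is known, passing it to the `StringBuilder(int capacity)` constructor applies Tip #1 to the builder’s own backing array.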
Here, only one StringBuilder is allocated by us at the beginning of the method. From that point on, all strings and array items are appended to that sole StringBuilder, which is eventually converted into a string only once, using its toString method, and returned.
Tip #5: Use Specialized Primitive Collections
Java’s standard collection library is convenient and generic, allowing us to use collections with semi-static type binding. This is fantastic if we want to use, for example, a set of strings (Set<String>), or a map between a pair and a list of strings (Map<Pair, List<String>>).
The real problem begins when we want to hold a list of ints, or a map with values of type double. Since generic types can’t be used with primitives, the alternative is to use the boxed types, so instead of List<int>, we need to use List<Integer>.
This is very wasteful, as an Integer is a full-fledged Object, replete with an object header of 12 bytes (on a typical 64-bit HotSpot JVM with compressed references) and an internal 4-byte int field holding its value. This sums up to 16 bytes per Integer item, four times the 4 bytes a primitive int needs, and that is before counting the reference the list must hold to each Integer. The bigger problem with this, however, is the fact that all these Integers are actually object instances that need to be accounted for during garbage collection.
In order to tackle this problem, we at Takipi use the excellent Trove collection library. Trove gives up some (but not all) generics in favor of specialized memory-efficient primitive collections. For example, instead of the wasteful Map<Integer, Double>, there is a specialized alternative in the form of TIntDoubleMap (implemented by TIntDoubleHashMap), whose put(int, double) and get(int) methods work on primitives directly.
Trove’s underlying implementation uses primitive arrays, so no boxing (int -> Integer) or unboxing (Integer -> int)
takes place while manipulating collections, and no objects are stored in place of the primitives.
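To illustrate the idea of a collection backed by a primitive array, here is a minimal sketch of a growable list of ints. It is not Trove’s code or API; the class `IntList` and its methods are hypothetical and exist only for this example.

```java
import java.util.Arrays;

class IntList {
    // Values live directly in an int[]; no Integer objects are created.
    private int[] items;
    private int size;

    IntList(int expectedSize) {
        // At least one slot, so that doubling always grows the array
        items = new int[Math.max(expectedSize, 1)];
    }

    void add(int value) {
        if (size == items.length) {
            // Grow by doubling when the backing array is full
            items = Arrays.copyOf(items, size * 2);
        }
        items[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return items[index];
    }

    int size() {
        return size;
    }
}
```

From the garbage collector’s point of view, the whole list is a single array object, regardless of how many values it holds.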
Final Thoughts
As garbage collectors continue to advance, and as run-time optimization and JIT compilers get smarter, we as developers will find ourselves caring less and less about how to write GC-friendly code. However, for the time being, and no matter how advanced G1 may be, there’s still a lot we can do in order to help the JVM.
From: http://blog.takipi.com/5-tips-for-reducing-your-java-garbage-collection-overhead/