您的位置：首页 > 其它

JVM的各种垃圾回收器

2017-09-25 12:40 197 查看

Garbage collection in HotSpot JVM

In this blog you may find few in-depth articles related to garbage collection in Oracle's HotSpot and other JVMs. But if you a novice in field of garbage collection, you may feel lost due to level of details. So I decided that high level overview of GC modes in HotSpot would add nicely to existing fairly detailed content.

Foreword

Java (and dominating majority of other modern languages) features automatic memory management aka garbage collection. Once instance of object becomes unreachable from executing program, it is classified as garbage and, eventually, its memory should be added to free pool.

One of simplest approaches to automatic memory management is reference counting. But it has serious limitation though, inability to handle cyclic links in object graph.

If we want to find all effectively unreachable objects, the only way is to find all reachable via recursive object graph traversing. It may sound like a complicated task and it really is.

Garbage collection in modern JVMs including Oracle's HotSpot is a result of lengthy evolution. Huge amounts hard work was put into them, to make them as efficient as they is now. Of cause they may have limitations, deficiencies but in most time this due to tradeoffs which was made for reason. So, please think twice before blaming JVM garbage collector being slow or stupid. Most likely you just underestimating complexity of its work.

Ok, now, let me get straight to garbage collection in HotSpot JVM circa 2011.

HotSpot JVM circa 2011

HotSpot has several GC modes. Modes are controlled with JVM command line options. Default GC mode depends on JVM version, client/server mode of JVM and hardware you are running on (JVM distinguish server and desktop grade hardware by set of heuristics).

Serial GC

JVM switch: -XX:+UseSerialGC

Serial GC is generational garage collection algorithm (if you wander what generation means, read this article).

Actually all GC modes in HotSpot are using generational approach, so won't repeat it for every GC mode.

Young collection is a copy collection. Old space is collected using an implementation of mark/sweep/compact (MSC) algorithm. Both, young and old, collections require stop-the-world (STW) pause and, as name suggests, are executed by single thread. During old space collection, all live objects are moved to beginning of space. This allows JVM to return unused memory to OS.

If you enable GC logging by -XX:+PrintGCDetails you will see following indicators of GC pauses in log:

Young collection

41.614 [GC 41.614: [DefNew: 130716K->7953K(138240K), 0.0525908 secs] 890546K->771614K(906240K), 0.0527947 secs] [Times: user=0.05 sys=0.00, real=0.05 secs]

Full (young + old + perm) collection

41.908 [GC 41.908: [DefNew: 130833K->130833K(138240K), 0.0000257 secs]41.909: [Tenured: 763660K->648667K(768000K), 1.4323505 secs] 894494K->648667K(906240K), [Perm : 1850K->1850K(12288K)], 1.4326801 secs] [Times: user=1.42 sys=0.00, real=1.43 secs]

Parallel scavenge

JVM switch: -XX:+UseParallelGC

Some phases of garbage collection could be naturally parallelized between multiple threads. Parallel processing can reduce time required for GC and thus STW pause duration by keeping multiple physical CPU cores busy. Adoption of multiprocessor/multicore hardware have made parallelization of GC a must for modern VM.

Parallel scavenge GC mode is using parallel implementation of young collection algorithm. Old space is still collected by one thread. Thus, using this mode may shorten young collection pauses (which are more frequent), but still suffers from long full collection freezes.

Log output samples for parallel scavenge.

Young collection

59.821: [GC [PSYoungGen: 147904K->4783K(148288K)] 907842K->769258K(916288K), 0.2382801 secs] [Times: user=0.31 sys=0.00, real=0.24 secs]

Full collection

60.060: [Full GC [PSYoungGen: 4783K->0K(148288K)] [PSOldGen: 764475K->660316K(768000K)] 769258K->660316K(916288K) [PSPermGen: 1850K->1850K(12288K)], 1.2817061 secs] [Times: user=1.26 sys=0.00, real=1.28 secs]

Parallel old GC

JVM switch: -XX:+UseParallelOldGC

This mode is a incremental improvement over parallel scavenge mode. It adds parallel processing (parallel mark-sweep-compact (MSC) algorithm) for old space collection. Young space is using same algorithm as mode above. Old space collection still requires quite long STW pause, but now multiple cores could be employed to make it shorter. Unlike serial MSC, parallel version does not create single continuous free memory region at the end of heap, so JVM cannot return memory to OS after full GC in this mode.

Log output samples for parallel old GC.

Young collection

65.411: [GC [PSYoungGen: 147878K->5434K(144576K)] 908129K->770314K(912576K), 0.2734699 secs] [Times: user=0.41 sys=0.00, real=0.27 secs]

Full collection

65.685: [Full GC [PSYoungGen: 5434K->0K(144576K)] [ParOldGen: 764879K->623094K(768000K)] 770314K->623094K(912576K) [PSPermGen: 1850K->1849K(12288K)], 2.5954844 secs] [Times: user=3.95 sys=0.03, real=2.60 secs]

Adaptive size policy

JVM switch: -XX:+UseAdaptiveSizePolicy

This is a special mode of parallel scavenge collector in which it can dynamically adjust configuration of young space to adapt for an application. IMHO it does not bring much benefits. Never seriously tried to use this option.

Concurrent mark sweep

JVM switch: -XX:+UseConcMarkSweepGC

While collectors above are generally called throughput collectors. Concurrent mark sweep (CMS) is low pause collector. It is designed to minimize stop-the-world JVM pauses and thus keep application responsive. For young space collection it may use either serial copy collector or parallel one (parallel alhorithm is similar to algorithm in parallel scavenge mode, but they are two totally different code bases and they may use slightly different configuration options, e.g. adaptive size policy is not implemented for CMS).

Old (and if enabled permanent) space a collected by mostly concurrently. As name suggests CMS is mark sweep algorithm (notice lack of compact in its name). CMS requires only two short pauses during each old space collection cycle. But unlike its stop-the-world counterparts, CMS cannot do compaction (relocate objects in memory) and this makes it prone to fragmentation. CMS is using some tricks to fight fragmentation, but it is still a threat.

If concurrent collector fails to reclaim memory fast enough to keep with application needs, JVM will fall back to serial stop-the-world mark- sweep-compact algorithm to defragment (and compact) old space (notice serial word, usually such pause would be 50-500 times longer than normal CMS pause).

Log output samples for CMS.

Young collection

613.154: [GC 13.154: [DefNew: 130821K->8230K(138240K), 0.0507238 secs] 507428K->388797K(906240K), 0.0509611 secs] [Times: user=0.06 sys=0.00, real=0.05 secs]

Concurrent old space collection

13.433: [GC [1 CMS-initial-mark: 384529K(768000K)] 395044K(906240K), 0.0045952 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]

13.438: [CMS-concurrent-mark-start]

...

14.345: [CMS-concurrent-mark: 0.412/0.907 secs] [Times: user=1.20 sys=0.00, real=0.91 secs]

14.345: [CMS-concurrent-preclean-start]

14.366: [CMS-concurrent-preclean: 0.020/0.021 secs] [Times: user=0.03 sys=0.00, real=0.02 secs]

14.366: [CMS-concurrent-abortable-preclean-start]

...

14.707: [CMS-concurrent-abortable-preclean: 0.064/0.340 secs] [Times: user=0.36 sys=0.02, real=0.34 secs]

14.707: [GC[YG occupancy: 77441 K (138240 K)]14.708: [Rescan (non-parallel) 14.708: [grey object rescan, 0.0058016 secs]14.714: [root rescan, 0.0424011 secs], 0.0485593 secs]14.756: [weak refs processing, 0.0000109 secs] [1 CMS-remark: 404346K(768000K)] 481787K(906240K), 0.0487607 secs] [Times: user=0.05 sys=0.00, real=0.05 secs]

14.756: [CMS-concurrent-sweep-start]

...

14.927: [CMS-concurrent-sweep: 0.116/0.171 secs] [Times: user=0.23 sys=0.02, real=0.17 secs]

14.927: [CMS-concurrent-reset-start]

14.953: [CMS-concurrent-reset: 0.026/0.026 secs] [Times: user=0.05 sys=0.00, real=0.03 secs]

Times marked with green are times of concurrent phases – CMS do its work in parallel with application. You can find out more about CMS pauses here.

CMS failure and fallback to mark-sweep-compact

557.079: [GC 557.079: [DefNew557.097: [CMS-concurrent-abortable-preclean: 0.010/0.109 secs] [Times: user=0.12 sys=0.00, real=0.11 secs]

(promotion failed) : 130817K->130813K(138240K), 0.1401674 secs]557.219: [CMS (concurrent mode failure): 731771K->584338K(768000K), 2.4659665 secs] 858916K->584338K(906240K), [CMS Perm : 1841K->1835K(12288K)], 2.6065527 secs] [Times: user=2.48 sys=0.03, real=2.61 secs]

You can read more about failures here.

CMS incremental mode

JVM switch: -XX:+CMSIncrementalMode

CMS is using one of more back ground threads to do GC in parallel with application. These thread will compete with application threads for CPU cores. Incremental mode is limiting amount of CPU time consumed by background GC thread. This helps to improve application responsiveness if you have just 1 or 2 physical cores. Of cause old space collection cycles would be longer and risk of full collection fall back higher.

G1 garbage collector

JVM switch: -XX:+UseG1GC

G1 (garbage first) is a new garbage collection mode in HotSpot JVM. It was introduced in late versions of JDK6. G1 is low pause collector implementing incremental version of mark-sweep-compact algorithm. G1 breaks heap into regions of fixed size and can collects only subset (partial collection) of them during stop-the-world (STW) pause (unlike CMS, G1 have to do most of its work during STW). Incremental approach allow G1 to employ larger number of shorter pauses instead of fewer number longer JVM freeze (cumulative amount of pauses will still be much higher compared to concurrent collector like CMS) . To be accurate, G1 also employs background threads to do heap marking concurrently with application (similar to CMS), but most of work is still done during STW.

G1 is using copy collection algorithm for its partial collections. Thus each collection produces several completely empty regions which can be returned to OS.

G1 is also exploiting generational principle. Set of regions is considered young space and treated accordingly.

G1 has a lot of hype as a garbage collection silver bullet, but I'm personally quite skeptical about it. Here are few reasons:

G1 have to maintain few additional data structures to make partial GC possible, it taxes performance.

It is still doing most of heavy lifting during STW pause unlike CMS which is mostly concurrent. That IMHO will hinder G1's ability to scale as well as CMS with growing heap sizes.

Large objects (comparable to region size) are problematic for G1 (due to fragmentation).

G1 collections in logs:

G1 young collection

[GC pause (young), 0.00176242 secs]
[Parallel Time: 1.6 ms]
[GC Worker Start Time (ms): 15751.4 15751.4]
[Update RS (ms): 0.1 0.3
Avg: 0.2, Min: 0.1, Max: 0.3]
[Processed Buffers : 2 1
Sum: 3, Avg: 1, Min: 1, Max: 2]
[Ext Root Scanning (ms): 1.0 0.9
Avg: 0.9, Min: 0.9, Max: 1.0]
[Mark Stack Scanning (ms): 0.0 0.0
Avg: 0.0, Min: 0.0, Max: 0.0]
[Scan RS (ms): 0.0 0.0
Avg: 0.0, Min: 0.0, Max: 0.0]
[Object Copy (ms): 0.3 0.3
Avg: 0.3, Min: 0.3, Max: 0.3]
[Termination (ms): 0.0 0.0
Avg: 0.0, Min: 0.0, Max: 0.0]
[Termination Attempts : 1 1
Sum: 2, Avg: 1, Min: 1, Max: 1]
[GC Worker End Time (ms): 15752.9 15752.9]
[Other: 0.1 ms]
[Clear CT: 0.0 ms]
[Other: 0.1 ms]
[Choose CSet: 0.0 ms]
[ 18M->12M(26M)]
[Times: user=0.00 sys=0.02, real=0.00 secs]

G1 partial collection

[GC pause (partial), 0.01589707 secs]
[Parallel Time: 15.6 ms]
[GC Worker Start Time (ms): 15774.1 15774.2]
[Update RS (ms): 0.0 0.0
Avg: 0.0, Min: 0.0, Max: 0.0]
[Processed Buffers : 0 3
Sum: 3, Avg: 1, Min: 0, Max: 3]
[Ext Root Scanning (ms): 1.0 0.7
Avg: 0.8, Min: 0.7, Max: 1.0]
[Mark Stack Scanning (ms): 0.0 0.0
Avg: 0.0, Min: 0.0, Max: 0.0]
[Scan RS (ms): 0.0 0.1
Avg: 0.0, Min: 0.0, Max: 0.1]
[Object Copy (ms): 14.3 14.5
Avg: 14.4, Min: 14.3, Max: 14.5]
[Termination (ms): 0.0 0.0
Avg: 0.0, Min: 0.0, Max: 0.0]
[Termination Attempts : 3 3
Sum: 6, Avg: 3, Min: 3, Max: 3]
[GC Worker End Time (ms): 15789.5 15789.5]
[Other: 0.4 ms]
[Clear CT: 0.0 ms]
[Other: 0.2 ms]
[Choose CSet: 0.0 ms]
[ 13M->12M(26M)]
[Times: user=0.03 sys=0.00, real=0.02 secs]

G1 full collection (incremental mode failure)

32.940: [Full GC 772M->578M(900M), 1.9597901 secs]
[Times: user=2.29 sys=0.08, real=1.96 secs]

Train GC

Train GC was removed from HotSpot JVM long time ago. But due to most articles about GC are fairly out dated you may find references to it sometimes. It is gone, period.

Permanent space

In case you are wondering. Permanent space is a part of old space use by JVM for internal data structures (mostly related to class loading and JIT). Permanent space is not necessary cleaned on every old space collection iteration and sometimes you may need to use additional switches to make JVM collect unused data in PermGen. Normally data in permanent space are, well ..., immortal, but ability of JVM to unload classes makes things complicated.