Understanding "GC (Allocation Failure)" Messages In the logs

Blog Post created by jerryl on Feb 12, 2016

Starting with Version 5.4.2, VMTurbo enabled options within Java to provide additional information about its internal operation.  This causes large numbers of worrying-looking lines like the following to appear in the catalina.out log file:


1124.105: [GC (Allocation Failure) [PSYoungGen: 7618041K->346918K(7738880K)] 8245087K->973972K(8662528K), 0.2008248 secs] [Times: user=1.39 sys=0.02, real=0.20 secs]


These messages are, in fact, part of normal operation.  To understand them requires a bit of background about how memory allocation in a language such as Java works.


Java requests some amount of memory from the operating system, then hands it out to the running program as needed.  The program uses the memory, and eventually has no need for it.  It isn't required to return the memory to Java; instead, it just abandons it.  It might seem that this would lead the program to simply grow indefinitely ... but there's more to the story.  When the program asks for memory and no more is available, or at certain other times, Java runs the Garbage Collector (usually known as just the GC).  The GC scans through all of memory, finds any that was abandoned, and recycles it.  It's now available for the program to reuse.


The line above describes the operation of one cycle of garbage collection.  It starts with the frightening words "GC (Allocation Failure)".  "Allocation Failure" is an explanation of why the Collector was asked to run - and "Allocation Failure" simply means that the program asked for memory and there was none available.  The first numbers after "PSYoungGen" is the amount of memory "checked out" when the Garbage Collector started:  about 7.6 GigaBytes.  The second number, after the arrow, is the amount of space that the Garbage Collector found was actually still being used:  about 3.5 GigaBytes.  So over half of memory was "abandoned" garbage.  There's nothing wrong with this:  It's part of the normal, expected operation of Java programs.  For example, when a chunk of data about a VM arrives, Operations Manager must store it in a Java object, analyze it, and record the relevant information in its long-term representation of that VM.  It then abandons the chunk of data - it's not needed any more.


The number in parentheses is the space actually available for allocation:  About 7.7 GigaBytes.  As the program runs, it may accumulate additional memory that it has to hold onto.  If that happens the Garbage Collector will ask the operating system for additional memory.  This is most obvious when the program first starts running.  The following lines cover about the first 20 seconds of a new Operations Manager's lifetime (the first number on each line is seconds since startup):



8.474: [GC (Allocation Failure) [PSYoungGen: 65536K->10503K(76288K)] 65536K->10583K(251392K), 0.0204444 secs] [Times: user=0.08 sys=0.00, real=0.02 secs]

9.164: [GC (Allocation Failure) [PSYoungGen: 76039K->10722K(76288K)] 76119K->16735K(251392K), 0.0191276 secs] [Times: user=0.05 sys=0.00, real=0.02 secs]

9.767: [GC (Allocation Failure) [PSYoungGen: 76258K->10725K(76288K)] 82271K->25363K(251392K), 0.0240821 secs] [Times: user=0.07 sys=0.00, real=0.03 secs]

10.450: [GC (Allocation Failure) [PSYoungGen: 76261K->10733K(141824K)] 90899K->34428K(316928K), 0.0218849 secs] [Times: user=0.12 sys=0.00, real=0.03 secs]

11.648: [GC (Allocation Failure) [PSYoungGen: 141805K->10727K(141824K)] 165500K->51549K(316928K), 0.0432044 secs] [Times: user=0.20 sys=0.02, real=0.05 secs]

12.441: [GC (Allocation Failure) [PSYoungGen: 141799K->30292K(288256K)] 182621K->71122K(463360K), 0.0384668 secs] [Times: user=0.16 sys=0.04, real=0.04 secs]

14.294: [GC (Allocation Failure) [PSYoungGen: 287828K->35299K(292864K)] 328658K->90575K(467968K), 0.0804071 secs] [Times: user=0.39 sys=0.04, real=0.08 secs]

15.920: [GC (Allocation Failure) [PSYoungGen: 292835K->42250K(557568K)] 348111K->99420K(732672K), 0.0592416 secs] [Times: user=0.25 sys=0.02, real=0.06 secs]

18.929: [GC (Allocation Failure) [PSYoungGen: 552202K->52712K(562688K)] 609372K->141577K(737792K), 0.0957909 secs] [Times: user=0.23 sys=0.07, real=0.10 secs]


If you look at the numbers in parentheses, you can see the Garbage Collector growing the process by getting more memory from the operating system.  This, too, is expected behavior:  The Operations Manager is loading code, initializing various data structures in memory, and may well be starting discovery (which will store information about the environment in memory).


The lines above are taken from a real customer log file.  What happened after those first 20 seconds?  The following graph summarizes garbage collection activity over about two days:




The horizontal axis is time, in seconds, from startup; the vertical axis is KiloBytes.  The red line represents memory usage before each garbage collection - the values before the arrow.  The green line represents memory usage after each garbage collection - the value after the arrow.  You can see the initial ramp-up of demand.  The red line shows some peaks - during discovery, runs of plans, collection of a diagnostic upload, and such; some of them correspond to much smaller peaks in the green line.  But over all, the green line settles down to slowly varying values.  This is a healthy Operations Manager, keeping its data center operating "in the green".