We may have noticed that the memory usage of our Java applications doesn't stay within the limit we set with the -Xmx (maximum heap size) option. That's because the JVM has more memory regions than just the heap.
To confine total memory usage, there are some additional memory settings to be aware of, so let's start with the memory structure of Java applications and sources of memory allocations.
2. Memory Structure of a Java Process
The memory of the Java Virtual Machine (JVM) is divided into two main categories: heap and non-heap.
Heap memory is the best-known part of the JVM memory. It stores objects created by the application. The JVM initializes the heap at start-up. When the application creates a new instance of an object, the object resides in the heap until the application releases the instance. Then, the garbage collector (GC) frees the memory held by the instance. Hence, the heap size varies with load, although we can configure the maximum JVM heap size using the -Xmx option.
Non-heap memory makes up the rest. It's what allows our applications to use more memory than the configured heap size. The JVM’s non-heap memory is divided into several different areas. Data like JVM code and internal structures, loaded profiler agent code, per-class structures like the constant pool, metadata of fields and methods, and the code for methods and constructors, as well as interned strings, are all classified as non-heap memory.
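To make the heap versus non-heap split concrete, here is a minimal sketch that reads both figures from the standard MemoryMXBean. The class name HeapVsNonHeap is our own, and the non-heap figure only covers the pools the JVM exposes through this API, so it's likely smaller than the real native footprint.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: prints heap and non-heap usage as reported by the JVM itself.
class HeapVsNonHeap {

    static MemoryUsage heap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    static MemoryUsage nonHeap() {
        return ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage();
    }

    static String describe(String label, MemoryUsage usage) {
        // getMax() returns -1 when no maximum is defined for the area.
        return String.format("%s: used=%d KB, committed=%d KB, max=%d",
                label, usage.getUsed() / 1024, usage.getCommitted() / 1024, usage.getMax());
    }

    public static void main(String[] args) {
        System.out.println(describe("heap", heap()));
        System.out.println(describe("non-heap", nonHeap()));
    }
}
```

Running it with different -Xmx values should change the heap maximum while leaving the non-heap line largely unaffected.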
It's worth mentioning that we can tune some areas of non-heap memory with -XX options, like -XX:MaxMetaspaceSize (the counterpart of -XX:MaxPermSize in Java 7 and earlier). We'll see more flags throughout this tutorial.
Besides the JVM itself, there are other areas where a Java process consumes memory. As an example, we have off-heap techniques that usually use direct ByteBuffers to handle big memory and keep it out of the control of the GC. Another source is the memory used by native libraries.
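As a rough illustration of off-heap allocation, the sketch below allocates a direct ByteBuffer and reads the JVM's own accounting for direct buffers through BufferPoolMXBean. The class name DirectBufferUsage and the 16 MB size are arbitrary choices, and the pool name "direct" is what HotSpot uses, so treat it as an assumption on other JVMs.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper: shows that direct buffers are accounted for outside the heap.
class DirectBufferUsage {

    // Returns the bytes used by direct buffers, or -1 if no "direct" pool is exposed.
    static long directMemoryUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long before = directMemoryUsed();
        ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024 * 1024); // native memory, not heap
        long after = directMemoryUsed();
        System.out.println("direct memory before: " + before + " bytes");
        System.out.println("direct memory after:  " + after + " bytes");
        buffer.put(0, (byte) 1); // keep the buffer reachable until here
    }
}
```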
3. JVM's Non-Heap Memory Areas
Let's continue with non-heap memory areas of the JVM.
3.1. Metaspace
Metaspace is a native memory region that stores metadata for classes. When a class is loaded, the JVM allocates the metadata of the class, which is its runtime representation, into Metaspace. Once a class loader and all of its classes are no longer reachable from the heap, the GC can free their allocations in Metaspace.
However, the released Metaspace is not necessarily returned to the OS. All or part of that memory may still be retained by the JVM to be reused for future class loading.
In Java versions older than 8, the equivalent region was called Permanent Generation (PermGen). However, unlike Metaspace, which is an area of off-heap memory, PermGen resided in a special heap area.
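To see Metaspace and the other non-heap pools from inside a running application, we can list the memory pools the JVM exposes. This is a minimal sketch: the class name NonHeapPools is ours, and the pool names it prints (for example "Metaspace" or "Compressed Class Space") depend on the JVM implementation and version.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the non-heap memory pools known to the JVM.
class NonHeapPools {

    static List<MemoryPoolMXBean> nonHeapPools() {
        List<MemoryPoolMXBean> result = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.NON_HEAP) {
                result.add(pool);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : nonHeapPools()) {
            MemoryUsage usage = pool.getUsage();
            if (usage != null) { // a pool may report no usage if it is no longer valid
                System.out.printf("%s: used=%d KB, committed=%d KB%n",
                        pool.getName(), usage.getUsed() / 1024, usage.getCommitted() / 1024);
            }
        }
    }
}
```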
3.2. Code Cache
The Just-In-Time (JIT) compiler stores its output in the code cache area. A JIT compiler compiles bytecode to native code for frequently executed sections, also known as hot spots. With tiered compilation, introduced in Java 7, the client compiler (C1) first compiles code with instrumentation, and then the server compiler (C2) uses the profiling data to recompile that code in an optimized manner.
The goal of tiered compilation is to combine the C1 and C2 compilers to get both fast startup times and good long-term performance. Tiered compilation increases the amount of code that needs to be cached in memory by up to four times. Since Java 8, it's enabled by default, although we can still disable it.
3.3. Thread
The thread stack contains all local variables for each executed method and the methods the thread has called to reach the current point of execution. The thread stack is only accessible by the thread that created it.
In theory, as the thread stack memory is a function of the number of running threads, and as there's no limit on the number of threads, the thread area is unbounded and can occupy a big portion of memory. In reality, the OS limits the number of threads, and the JVM has a default value for the size of the memory of the stack per thread based on the platform.
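Since stack memory grows with the number of threads, a back-of-the-envelope estimate is just a multiplication. The sketch below assumes a 1 MB stack per thread, which is only a typical default, and the class and method names are our own.

```java
import java.lang.management.ManagementFactory;

// Hypothetical helper: rough upper bound for memory reserved by thread stacks.
class ThreadStackEstimate {

    // Reserved stack memory is roughly (number of threads) x (stack size per thread).
    static long estimateReservedBytes(int threadCount, long stackSizeBytes) {
        return (long) threadCount * stackSizeBytes;
    }

    public static void main(String[] args) {
        int liveThreads = ManagementFactory.getThreadMXBean().getThreadCount();
        long assumedStackSize = 1024L * 1024L; // assumption: 1 MB per thread
        System.out.printf("%d live threads, about %d KB of stack reserved%n",
                liveThreads, estimateReservedBytes(liveThreads, assumedStackSize) / 1024);
    }
}
```

This only counts Java threads visible to ThreadMXBean, so the real figure is usually somewhat higher once JVM-internal threads are included.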
3.4. Garbage Collection
The JVM ships with a set of GC algorithms that can be chosen based on our application's use case. Whatever algorithm we use, some amount of native memory will be allocated to the GC process, but the amount of used memory varies depending on which garbage collector is used.
3.5. Symbol
The JVM uses the Symbol area to store symbols such as field names, method signatures, and interned strings. In the Java development kit (JDK), symbols are stored in three different tables:
- The System Dictionary contains all the loaded type information like Java classes.
- The Constant Pool uses the Symbol Table data structure to save loaded symbols for classes, methods, fields, and enumerable types. The JVM maintains a per-type constant pool called the Run-Time Constant Pool, which contains several kinds of constants, ranging from compile-time numeric literals to runtime methods and even field references.
- The String Table contains the reference to all the constant strings, also referred to as interned strings.
To understand the String Table, we need to know a bit more about the String Pool.
The String Pool is the JVM mechanism that optimizes the amount of memory allocated to a String by storing only one copy of each literal String in the pool, by a process called interning. The String Pool has two parts:
- The contents of interned strings live in the Java heap as regular String objects.
- The hash table, which is the so-called String Table, is allocated off-heap and contains the references to the interned strings.
In other words, the String Pool has both on-heap and off-heap parts. The off-heap part is the String Table. Although it's usually the smaller part, it can still take a significant amount of extra memory when we have many interned strings.
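Interning is easy to observe from application code. In this minimal sketch (class and method names are ours), a String created with new is a separate heap object, while intern() returns the single pooled instance that the literal also refers to.

```java
// Hypothetical demo: reference identity before and after interning.
class InternDemo {

    // True only when both references point to the very same object.
    static boolean sameInstance(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "jvm-memory";           // literals are interned automatically
        String copy = new String("jvm-memory");  // a distinct object on the heap

        System.out.println(sameInstance(literal, copy));          // prints false
        System.out.println(sameInstance(literal, copy.intern())); // prints true
    }
}
```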
3.6. Arena
The Arena is the JVM's own implementation of arena-based memory management, which is distinct from the arena memory management of glibc. It is used by some subsystems of the JVM, like the compiler and symbols, or when native code uses internal objects that rely on JVM arenas.
3.7. Other
Any other memory usage that doesn't fit into one of the native memory areas above falls into this category. As an example, DirectByteBuffer usage is indirectly visible in this part.
4. Memory Monitoring Tools
Now that we've seen that Java memory usage is not limited to the heap, we're going to investigate ways to track down total memory usage. We can do that with the help of profiling and memory monitoring tools, and then tweak total usage with some specific tuning.
Let's have a quick view of tools we can use for JVM memory monitoring that ship with the JDK:
- jmap is a command-line utility that prints the memory map for a running VM or core file. We can use jmap to query a process on a remote machine as well. However, since JDK 8, it is recommended to use jcmd instead of jmap for enhanced diagnostics and reduced performance overhead.
- jcmd is used to send diagnostic command requests to the JVM, where these requests are useful for controlling Java Flight Recordings, troubleshooting, and diagnosing the JVM and Java applications. jcmd does not work with remote processes. We'll see some specific usage of jcmd in this article.
- jhat visualizes a heap dump file by starting a local web server. There are several ways to create a heap dump, like using jmap -dump or jcmd <pid> GC.heap_dump <filename>. Note that jhat was removed in JDK 9.
- hprof is capable of presenting CPU usage, heap allocation statistics, and monitor contention profiles. Depending on the type of requested profiling, hprof instructs the virtual machine to collect the relevant JVM Tool Interface (JVM TI) events and processes the event data into profiling information. Like jhat, the hprof agent was removed in JDK 9.
Other than JVM-shipped tools, there are also OS-specific commands to check the memory of a process. pmap is a tool available for Linux distributions that provides a complete view of memory used by a Java process.
5. Native Memory Tracking
Native Memory Tracking (NMT) is a JVM feature we can use to track the internal memory usage of the VM. NMT does not track all native memory usage, such as memory allocated by third-party native code. However, it's sufficient for a large class of typical applications.
To start with NMT, we have to enable it for our application:
java -XX:NativeMemoryTracking=summary -jar app.jar
Other available values for -XX:NativeMemoryTracking are off and detail. Just be aware that enabling NMT has an overhead cost that will impact performance. Moreover, NMT adds two machine words to all malloced memory as a malloc header.
Then we can use jps or jcmd with no arguments to find the process id (pid) of our application.
After we find our application pid, we can continue with jcmd, which offers a long list of options to monitor. Let's ask jcmd for help to see available options:
jcmd <pid> help
And from the output, we see that jcmd supports different categories such as Compiler, GC, JFR, JVMTI, ManagementAgent, and VM. Some options like VM.metaspace, VM.native_memory can help us with memory tracking. Let's explore a few of these.
5.1. Report of Native Memory Summary
The handiest one is VM.native_memory. We can use it to see the summary of our application's VM internal native memory usage:
jcmd <pid> VM.native_memory summary
Native Memory Tracking:
Total: reserved=1779287KB, committed=503683KB
- Java Heap (reserved=307200KB, committed=307200KB)
- Class (reserved=1089000KB, committed=44824KB)
- Thread (reserved=41139KB, committed=41139KB)
- Code (reserved=248600KB, committed=17172KB)
- GC (reserved=62198KB, committed=62198KB)
- Compiler (reserved=175KB, committed=175KB)
- Internal (reserved=691KB, committed=691KB)
- Other (reserved=16KB, committed=16KB)
- Symbol (reserved=9704KB, committed=9704KB)
- Native Memory Tracking (reserved=4812KB, committed=4812KB)
- Shared class space (reserved=11136KB, committed=11136KB)
- Arena Chunk (reserved=176KB, committed=176KB)
- Logging (reserved=4KB, committed=4KB)
- Arguments (reserved=18KB, committed=18KB)
- Module (reserved=175KB, committed=175KB)
- Safepoint (reserved=8KB, committed=8KB)
- Synchronization (reserved=4235KB, committed=4235KB)
Looking at the output, we can see the summary of JVM memory areas like Java heap, GC, and thread. The term “reserved” memory means the total address range pre-mapped via malloc or mmap, so it is the maximum addressable memory for this area. The term “committed” means the memory actively in use.
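Since the summary has a regular shape, it's straightforward to post-process. The following is a minimal sketch, not part of any JDK tool: the class NmtSummaryParser and its methods are hypothetical, and the regular expression assumes the "name (reserved=...KB, committed=...KB)" line format shown above. It sums the committed size of everything except the Java heap.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts committed sizes per category from an NMT summary.
class NmtSummaryParser {

    // Matches lines such as "- Java Heap (reserved=307200KB, committed=307200KB)".
    private static final Pattern LINE =
            Pattern.compile("-\\s*(.+?)\\s*\\(reserved=(\\d+)KB, committed=(\\d+)KB\\)");

    static Map<String, Long> committedByCategory(String summary) {
        Map<String, Long> result = new LinkedHashMap<>();
        Matcher matcher = LINE.matcher(summary);
        while (matcher.find()) {
            result.put(matcher.group(1), Long.parseLong(matcher.group(3)));
        }
        return result;
    }

    // Sums the committed size of every category except the Java heap.
    static long nonHeapCommittedKb(Map<String, Long> committed) {
        long total = 0;
        for (Map.Entry<String, Long> entry : committed.entrySet()) {
            if (!"Java Heap".equals(entry.getKey())) {
                total += entry.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String sample = "- Java Heap (reserved=307200KB, committed=307200KB)\n"
                + "- Class (reserved=1089000KB, committed=44824KB)\n"
                + "- Thread (reserved=41139KB, committed=41139KB)\n";
        System.out.println(nonHeapCommittedKb(committedByCategory(sample)) + " KB"); // prints 85963 KB
    }
}
```

Applied to the full report above, the same sum comes to 196483 KB, which is the total committed size (503683 KB) minus the Java heap (307200 KB).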
A detailed explanation of each category in the output is available in the official NMT documentation. To see the changes in memory usage, we can use VM.native_memory baseline and VM.native_memory summary.diff in sequence.
5.2. Reports of Metaspace and String Table
We can try other VM options of jcmd to have an overview of some specific areas of native memory, like Metaspace, symbols, and interned strings.
Let's try Metaspace:
jcmd <pid> VM.metaspace
And our output looks like this:
Total Usage - 1072 loaders, 9474 classes (1176 shared):
Non-class space: 38.00 MB reserved, 36.67 MB ( 97%) committed
Class space: 1.00 GB reserved, 5.62 MB ( <1%) committed
Both: 1.04 GB reserved, 42.30 MB ( 4%) committed
Waste (percentages refer to total committed size 42.30 MB):
Committed unused: 192.00 KB ( <1%)
Waste in chunks in use: 2.98 KB ( <1%)
Free in chunks in use: 1.05 MB ( 2%)
Overhead in chunks in use: 232.12 KB ( <1%)
In free chunks: 77.00 KB ( <1%)
Deallocated from chunks in use: 191.62 KB ( <1%) (890 blocks)
-total-: 1.73 MB ( 4%)
CompressedClassSpaceSize: 1.00 GB
InitialBootClassLoaderMetaspaceSize: 4.00 MB
Now, let's see our application's String Table:
jcmd <pid> VM.stringtable
And let's see the output:
Number of buckets : 65536 = 524288 bytes, each 8
Number of entries : 20046 = 320736 bytes, each 16
Number of literals : 20046 = 1507448 bytes, avg 75.000
Total footprint : = 2352472 bytes
Average bucket size : 0.306
Variance of bucket size : 0.307
Std. dev. of bucket size: 0.554
Maximum bucket size : 4
6. JVM Memory Tuning
We now know that the total memory used by a Java application is the sum of its heap allocations and a number of non-heap allocations made by the JVM or third-party libraries.
Non-heap memory is less likely to change in size under load. Usually, our application has steady non-heap memory usage, once all of the in-use classes are loaded and the JIT is fully warmed up. However, there are flags we can use to instruct the JVM on how to manage memory usage in some areas.
jcmd offers a VM.flags option to see which flags our Java process already has, including default values. Hence, we can use it as a tool to inspect default configurations and get an idea of how the JVM is configured:
jcmd <pid> VM.flags
The output lists the flags in use along with their values.
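The same flag values can also be read from inside the process. This sketch uses HotSpotDiagnosticMXBean, which is specific to HotSpot-based JVMs; the class name VmFlagReader is ours, and the flags queried are just examples taken from this article.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reads HotSpot flag values at runtime (HotSpot-only API).
class VmFlagReader {

    // Throws IllegalArgumentException if the flag does not exist on this JVM.
    static VMOption flag(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"MaxHeapSize", "ThreadStackSize", "MaxMetaspaceSize"}) {
            VMOption option = flag(name);
            // The origin tells us whether the value is a default or was set explicitly.
            System.out.printf("%s = %s (origin: %s)%n",
                    option.getName(), option.getValue(), option.getOrigin());
        }
    }
}
```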
Let's have a look at some VM flags for memory tuning of different areas.
6.1. Heap
We have a handful of flags for tuning the JVM heap. To configure minimum and maximum heap sizes, we have -Xms (-XX:InitialHeapSize) and -Xmx (-XX:MaxHeapSize). If we prefer to set the heap size as a percentage of physical memory, we can use -XX:InitialRAMPercentage and -XX:MaxRAMPercentage (despite its name, -XX:MinRAMPercentage sets the maximum heap size on systems with very little physical memory). It's important to know that the JVM ignores these two when we use the -Xms and -Xmx options, respectively.
Another option that could affect the memory allocation patterns is -XX:+AlwaysPreTouch. By default, the JVM max heap is allocated in virtual memory, not physical memory. The OS might decide to not allocate memory as long as there is no write operation. To circumvent this (especially with huge DirectByteBuffers, where a reallocation can take some time rearranging OS memory pages), we can enable -XX:+AlwaysPreTouch. Pretouching writes “0” on all pages and forces the OS to allocate memory, not just reserve it. Pretouching causes a delay in JVM startup as it works in a single thread.
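As a quick sanity check of percentage-based sizing, the arithmetic can be sketched as follows. The class name HeapSizing is ours, the 8 GiB machine is a made-up example, and 25 is the default value of -XX:MaxRAMPercentage. The real JVM also applies alignment and other ergonomics, so the effective limit reported by Runtime.maxMemory() may differ slightly.

```java
// Hypothetical helper: estimates the max heap implied by -XX:MaxRAMPercentage.
class HeapSizing {

    static long expectedMaxHeapBytes(long physicalMemoryBytes, double maxRamPercentage) {
        return (long) (physicalMemoryBytes * (maxRamPercentage / 100.0));
    }

    public static void main(String[] args) {
        long eightGiB = 8L * 1024 * 1024 * 1024;  // assumed physical memory
        long expected = expectedMaxHeapBytes(eightGiB, 25.0);
        System.out.println("expected max heap: " + expected / (1024 * 1024) + " MB"); // prints 2048 MB

        // What this JVM actually allows, for comparison on the current machine.
        System.out.println("effective max heap: "
                + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MB");
    }
}
```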
6.2. Thread Stack
The thread stack is the per-thread storage of all local variables for each executed method. We use the -Xss or -XX:ThreadStackSize option to configure the stack size per thread. The default thread stack size is platform-dependent, but in most modern 64-bit operating systems, it's up to 1 MB.
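Besides the global flag, the Thread constructor accepts a requested stack size for an individual thread. The sketch below (class name StackDepthProbe is ours) recurses until the stack overflows and reports the depth reached. According to the Thread documentation, the size is only a hint that the JVM may round or ignore, so the exact numbers vary by platform and run.

```java
// Hypothetical probe: measures how deep recursion gets for a requested stack size.
class StackDepthProbe {

    static int maxDepth(long stackSizeBytes) {
        int[] depth = new int[1];
        Thread probe = new Thread(null, () -> {
            try {
                recurse(depth);
            } catch (StackOverflowError expected) {
                // Reaching the end of the stack is the point of the probe.
            }
        }, "stack-probe", stackSizeBytes);
        probe.start();
        try {
            probe.join(); // join() makes the final depth visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while probing", e);
        }
        return depth[0];
    }

    private static void recurse(int[] depth) {
        depth[0]++;
        recurse(depth);
    }

    public static void main(String[] args) {
        System.out.println("256 KB requested: depth " + maxDepth(256 * 1024));
        System.out.println("4 MB requested:   depth " + maxDepth(4 * 1024 * 1024));
    }
}
```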
6.3. Garbage Collection
We can set our application's GC algorithm with one of these flags: -XX:+UseSerialGC, -XX:+UseParallelGC, -XX:+UseParallelOldGC, -XX:+UseConcMarkSweepGC, or -XX:+UseG1GC. Not all of them are available in every JDK version. For example, the CMS collector was removed in JDK 14.
If we choose G1 as the GC, we can optionally enable string deduplication with -XX:+UseStringDeduplication. It can save a significant percentage of memory. String deduplication only applies to long-lived instances. To adjust what counts as long-lived, we can configure the age threshold with -XX:StringDeduplicationAgeThreshold. Its value is the number of GC cycles an instance has to survive before it becomes a candidate for deduplication.
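To confirm which collector a running application actually ended up with, we can ask the JVM for its garbage collector beans. This is a minimal sketch with a class name of our own; the bean names are implementation-specific (on HotSpot, for example, "G1 Young Generation" and "G1 Old Generation" for G1, or "PS Scavenge" for the parallel collector).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the garbage collectors active in this JVM.
class ActiveCollectors {

    static List<String> names() {
        List<String> result = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            result.add(gc.getName());
        }
        return result;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() returns -1 if the count is undefined for this collector.
            System.out.printf("%s: %d collections%n", gc.getName(), gc.getCollectionCount());
        }
    }
}
```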
6.4. Code Cache
The JVM segments the code cache into three areas starting with Java 9. Therefore, the JVM offers specific options to tune each of them:
- -XX:NonNMethodCodeHeapSize configures the non-method segment, which is JVM internal related code. By default, it's around 5 MB.
- -XX:ProfiledCodeHeapSize configures the profiled-code segment, which is C1 compiled code with potentially short lifetimes. The default size is around 122 MB.
- -XX:NonProfiledCodeHeapSize sets the size of the non-profiled segment, which is C2 compiled code with potentially long lifetimes. The default size is around 122 MB.
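The code cache segments are also visible as memory pools at runtime. The sketch below filters the non-heap pools by name. Matching on "code" is an assumption based on HotSpot's naming (pools such as "CodeHeap 'profiled nmethods'" with a segmented cache, or a single code cache pool otherwise), and the class name CodeCachePools is ours.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: finds the memory pools that back the JIT code cache.
class CodeCachePools {

    static List<MemoryPoolMXBean> codePools() {
        List<MemoryPoolMXBean> result = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // Assumption: code cache pools are non-heap and have "code" in their name.
            if (pool.getType() == MemoryType.NON_HEAP
                    && pool.getName().toLowerCase().contains("code")) {
                result.add(pool);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : codePools()) {
            MemoryUsage usage = pool.getUsage();
            if (usage != null) {
                // getMax() is -1 when the pool has no defined maximum.
                System.out.printf("%s: used=%d KB, max=%d%n",
                        pool.getName(), usage.getUsed() / 1024, usage.getMax());
            }
        }
    }
}
```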
6.5. Allocator
The JVM starts by reserving memory, then parts of this “reserve” are made available by modifying the memory mappings using glibc's malloc and mmap. Reserving and releasing memory chunks can cause fragmentation, and fragmentation in allocated memory can lead to a lot of unused areas in memory.
Besides malloc, there are other allocators we can use, like jemalloc or tcmalloc. jemalloc is a general-purpose malloc implementation that emphasizes fragmentation avoidance and scalable concurrency support, hence it often appears smarter than glibc's regular malloc. Furthermore, jemalloc can also be used for leak checking and heap profiling.
6.6. Metaspace
Like the heap, we also have options for configuring Metaspace size. To configure Metaspace's lower and upper bounds, we can use -XX:MetaspaceSize and -XX:MaxMetaspaceSize, respectively.
-XX:InitialBootClassLoaderMetaspaceSize is also useful to configure the initial Metaspace size for the boot class loader.
There are -XX:MinMetaspaceFreeRatio and -XX:MaxMetaspaceFreeRatio options to configure minimum and maximum percentage of class metadata capacity free after GC.
We're also able to configure the maximum size of Metaspace expansion without full GC with -XX:MaxMetaspaceExpansion.
6.7. Other Non-Heap Memory Areas
There are also flags for tuning the usages of other areas of native memory.
We can use -XX:StringTableSize to specify the map size of the String Pool, where the map size is the number of buckets in the String Table's hash table. It isn't a hard limit on the number of interned strings, but with many more strings than buckets, lookups get slower as the buckets fill up. The default is 60013 in JDK 7u40 through JDK 8 and 65536 in more recent versions.
To control the memory usage of DirectByteBuffers, we can use -XX:MaxDirectMemorySize. With this option, we limit the amount of memory that can be reserved for all DirectByteBuffers.
For applications that need to load more classes, we can use -XX:PredictedLoadedClassCount. This option is available since JDK8 and allows us to set the bucket size of the system dictionary.
7. Conclusion
In this article, we've explored the different memory areas of a Java process and a few tools to monitor memory usage. We've seen that Java memory usage is not limited to the heap, so we used jcmd to inspect and track the memory usage of the JVM. Finally, we reviewed some JVM flags that can help us tune the memory usage of our Java application.