This has been a long-standing complaint with Java, but it's largely meaningless, and usually based on looking at the wrong information. The usual phrasing is something like "Hello World on Java takes 10 megabytes! Why does it need that?" Well, here's a way to make Hello World on a 64-bit JVM claim to take over 4 gigabytes ... at least by one form of measurement.
java -Xms1024m -Xmx4096m com.example.Hello
On Linux, the top command gives you several different numbers for memory. Here's what it says about the Hello World example:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2120 kgregory 20 0 4373m 15m 7152 S 0 0.2 0:00.10 java
The situation for Windows Task Manager is a bit more complicated. Under Windows XP, there are "Memory Usage" and "Virtual Memory Size" columns, but the official documentation is silent on what they mean. Windows Vista and Windows 7 add more columns, and they're actually documented. Of these, the "Working Set" measurement is the most useful; it roughly corresponds to the sum of RES and SHR on Linux.
The virtual memory consumed by a process is the total of everything that's in the process memory map. This includes data (eg, the Java heap), but also all of the shared libraries and memory-mapped files used by the program. On Linux, you can use the pmap command to see all of the things mapped into the process space (from here on out I'm only going to refer to Linux, because it's what I use; I'm sure there are equivalent tools for Windows). Here's an excerpt from the memory map of the "Hello World" program; the entire memory map is over 100 lines long, and it's not unusual to have a thousand-line list.
0000000040000000 36K r-x-- /usr/local/java/jdk-1.6-x64/bin/java 0000000040108000 8K rwx-- /usr/local/java/jdk-1.6-x64/bin/java 0000000040eba000 676K rwx-- [ anon ] 00000006fae00000 21248K rwx-- [ anon ] 00000006fc2c0000 62720K rwx-- [ anon ] 0000000700000000 699072K rwx-- [ anon ] 000000072aab0000 2097152K rwx-- [ anon ] 00000007aaab0000 349504K rwx-- [ anon ] 00000007c0000000 1048576K rwx-- [ anon ] ... 00007fa1ed00d000 1652K r-xs- /usr/local/java/jdk-1.6-x64/jre/lib/rt.jar ... 00007fa1ed1d3000 1024K rwx-- [ anon ] 00007fa1ed2d3000 4K ----- [ anon ] 00007fa1ed2d4000 1024K rwx-- [ anon ] 00007fa1ed3d4000 4K ----- [ anon ] ... 00007fa1f20d3000 164K r-x-- /usr/local/java/jdk-1.6-x64/jre/lib/amd64/libjava.so 00007fa1f20fc000 1020K ----- /usr/local/java/jdk-1.6-x64/jre/lib/amd64/libjava.so 00007fa1f21fb000 28K rwx-- /usr/local/java/jdk-1.6-x64/jre/lib/amd64/libjava.so ... 00007fa1f34aa000 1576K r-x-- /lib/x86_64-linux-gnu/libc-2.13.so 00007fa1f3634000 2044K ----- /lib/x86_64-linux-gnu/libc-2.13.so 00007fa1f3833000 16K r-x-- /lib/x86_64-linux-gnu/libc-2.13.so 00007fa1f3837000 4K rwx-- /lib/x86_64-linux-gnu/libc-2.13.so ...
A quick explanation of the format: each row starts with the virtual memory address of the segment. This is followed by the segment size, permissions, and the source of the segment. This last item is either a file or "anon", which indicates a block of memory allocated via mmap.
Starting from the top, we have
java). This is very small; all it does is load in the shared libraries where the real JVM code is stored.
-Xmxvalue; this allows it to have a contiguous heap. The
-Xmsvalue is used internally to say how much of the heap is "in use" when the program starts, and to trigger garbage collection as that limit is approached.
The shared libraries are particularly interesting: each shared library has at least two segments: a read-only segment containing the library code, and a read-write segment that contains global per-process data for the library (I don't know what the segment with no permissions is; I've only seen it on x64 Linux). The read-only portion of the library can be shared between all processes that use the library; for example,
libc has 1.5M of virtual memory space that can be shared.
The virtual memory map contains a lot of stuff. Some of it is read-only, some of it is shared, and some of it is allocated but never touched (eg, almost all of the 4Gb of heap in this example). But the operating system is smart enough to only load what it needs, so the virtual memory size is largely irrelevant.
Where virtual memory size is important is if you're running on a 32-bit operating system, where you can only allocate 2Gb (or, in some cases, 3Gb) of process address space. In that case you're dealing with a scarce resource, and might have to make tradeoffs, such as reducing your heap size in order to memory-map a large file or create lots of threads.
But, given that 64-bit machines are ubiquitous, I don't think it will be long before Virtual Memory Size is a completely irrelevant statistic.
Resident Set size is that portion of the virtual memory space that is actually in RAM. If your RSS grows to be a significant portion of your total physical memory, it might be time to start worrying. If your RSS grows to take up all your physical memory, and your system starts swapping, it's well past time to start worrying.
But RSS is also misleading, especially on a lightly loaded machine. The operating system doesn't expend a lot of effort to reclaiming the pages used by a process. There's little benefit to be gained by doing so, and the potential for an expensive page fault if the process touches the page in the future. As a result, the RSS statistic may include lots of pages that aren't in active use.
Unless you're swapping, don't get overly concerned about what the various memory statistics are telling you. With the caveat that an ever-growing RSS may indicate some sort of memory leak.
With a Java program, it's far more important to pay attention to what's happening in the heap. The total amount of space consumed is important, and there are some steps that you can take to reduce that. More important is the amount of time that you spend in garbage collection, and which parts of the heap are getting collected.
Accessing the disk (ie, a database) is expensive, and memory is cheap. If you can trade one for the other, do so.