The top 11 uses of space in a development boot image are:
[Lorg.jikesrvm.ia32.VM_Code; 19928 instances using 15,027,238bytes
We know why compiled code is in the boot image.
[I 77046 instances using 14,978,180bytes
This looks related to MMTk's free lists, although there are some int arrays used for edge counters and reference map information
[B 54862 instances using 3,024,994bytes
Most byte arrays are going to be either bytecodes or atom values.
org.jikesrvm.classloader.VM_NormalMethod 20250 instances using 1,215,000bytes
We compile lots of methods and their data structures go in the boot image.
org.jikesrvm.util.VM_HashMap$Bucket 58576 instances using 1,171,520bytes
Moving infrequently accessed fields into hash maps means we need space for the maps
org.jikesrvm.compilers.opt.runtimesupport.VM_OptCompiledMethod 19438 instances using 1,088,528bytes
org.jikesrvm.classloader.VM_MethodReference 25674 instances using 924,264bytes
Maybe we could be lazy about generating information like the return type? This could save some bytes here, and the cost could be kept low if we processed the descriptor backwards.
[C 11889 instances using 920,498bytes
The number is just a little higher than the number of java.lang.Strings. We could have one giant char[] for all strings in the boot image, but this would take some engineering and it's not clear we'd save much. We should already be writing only interned strings into the boot image. I did a quick experiment, but generating a super string using string buffers considerably increased the boot image writer's memory requirements.
org.jikesrvm.util.VM_HashSet$Bucket 42746 instances using 683,936bytes
See the VM_HashMap$Bucket entry above; the same reasoning applies.
org.jikesrvm.classloader.VM_Atom 33970 instances using 679,400bytes
Maybe some of these could be reloaded as Daniel describes.
[J 31 instances using 459,252bytes
These appear to be statistics arrays that due to sloppy coding in MMTk are lying around in the boot image. Issue RVM-226 relates to this.
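The VM_MethodReference idea above (recovering the return type lazily by processing the descriptor backwards) could look roughly like the sketch below. It is not Jikes RVM code: the class name ReturnTypeScanner and the method returnTypeOf are hypothetical, it works on a plain String rather than a VM_Atom, and it assumes class names in the descriptor never contain ')'.

```java
// Hypothetical helper, not part of Jikes RVM: recovers the return type of a
// JVM method descriptor on demand instead of storing it per method reference.
final class ReturnTypeScanner {
    private ReturnTypeScanner() {}

    /**
     * Returns the return-type portion of a method descriptor,
     * e.g. "(I[Ljava/lang/String;)V" yields "V".
     * Assumption: no class name in the descriptor contains ')'.
     */
    static String returnTypeOf(String descriptor) {
        // Scan from the end: the return type is usually short, so this
        // normally stops after a few characters without parsing parameters.
        for (int i = descriptor.length() - 1; i >= 0; i--) {
            if (descriptor.charAt(i) == ')') {
                if (i == descriptor.length() - 1) {
                    throw new IllegalArgumentException("missing return type: " + descriptor);
                }
                return descriptor.substring(i + 1);
            }
        }
        throw new IllegalArgumentException("not a method descriptor: " + descriptor);
    }

    public static void main(String[] args) {
        System.out.println(returnTypeOf("(I[Ljava/lang/String;)V"));
        System.out.println(returnTypeOf("()Ljava/lang/Object;"));
    }
}
```

Whether this actually saves space depends on how often the return type is queried; computing it on every call trades the stored reference for a short scan each time.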
We currently keep around the bytecodes for every method in the boot image. I haven't looked at how much we might be able to save, but perhaps we could be more selective about which ones we keep?
While we need them in some cases, there might be situations where having to resort to slurping them in from a jar file would be a better solution.
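As a rough illustration of the "slurp them in from a jar file" option, here is a minimal sketch of a lazily loaded byte array. Everything in it is an assumption for illustration: LazyBytes and fromJar are hypothetical names, not Jikes RVM classes, and it reads a whole jar entry (such as a class file) rather than extracting a single method's bytecodes, which would need class-file parsing on top.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.function.Supplier;
import java.util.jar.JarFile;
import java.util.zip.ZipEntry;

// Hypothetical holder: keeps only a loader around and fetches the bytes
// the first time they are actually needed.
final class LazyBytes {
    private final Supplier<byte[]> loader;
    private byte[] cached; // null until the first call to get()

    LazyBytes(Supplier<byte[]> loader) {
        this.loader = loader;
    }

    /** Loads the bytes on first use and returns the cached copy afterwards. */
    synchronized byte[] get() {
        if (cached == null) {
            cached = loader.get();
        }
        return cached;
    }

    synchronized boolean isLoaded() {
        return cached != null;
    }

    /** Builds a holder that reads one entry from a jar file when first asked. */
    static LazyBytes fromJar(String jarPath, String entryName) {
        return new LazyBytes(() -> {
            try (JarFile jar = new JarFile(jarPath)) {
                ZipEntry entry = jar.getEntry(entryName);
                if (entry == null) {
                    throw new IllegalStateException("no such entry: " + entryName);
                }
                try (InputStream in = jar.getInputStream(entry)) {
                    return in.readAllBytes(); // requires Java 9 or later
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }
}
```

The trade-off is the usual one: the boot image shrinks by the size of the bytes that are no longer stored, at the cost of a jar lookup and read the first time a method's bytecodes are needed at run time.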