RVM-885 raised the question of how much overhead OProfile adds. Here are the results of back-to-back runs of DaCapo fop on 64-bit x86, with and without OProfile:
Without oprofile:
===== DaCapo fop starting warmup =====
===== DaCapo fop completed warmup in 12206 msec =====
===== DaCapo fop starting warmup =====
===== DaCapo fop completed warmup in 6612 msec =====
===== DaCapo fop starting warmup =====
===== DaCapo fop completed warmup in 4578 msec =====
===== DaCapo fop starting warmup =====
===== DaCapo fop completed warmup in 7280 msec =====
===== DaCapo fop starting =====
===== DaCapo fop PASSED in 4526 msec =====
With oprofile:
===== DaCapo fop starting warmup =====
===== DaCapo fop completed warmup in 12689 msec =====
===== DaCapo fop starting warmup =====
===== DaCapo fop completed warmup in 6637 msec =====
===== DaCapo fop starting warmup =====
===== DaCapo fop completed warmup in 4524 msec =====
===== DaCapo fop starting warmup =====
===== DaCapo fop completed warmup in 7290 msec =====
===== DaCapo fop starting =====
===== DaCapo fop PASSED in 4515 msec =====
OProfile must log all compilations, which causes a short but perceptible pause at boot time while the boot image is logged. Because we must map the boot image anonymously, there is also a potential disadvantage compared to the conventional lazy-load approach (although past attempts to optimise this haven't proven massively fruitful). Other than that, OProfile's overhead appears to be within the noise.
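To put numbers on "within the noise", here is a minimal Python sketch that computes the per-iteration difference between the two runs. The timings are copied from the logs above; the function name is just illustrative, and with only one run of each configuration this says nothing about run-to-run variance.

```python
# Timings in msec, copied from the two runs above
# (four warmup iterations followed by the timed iteration).
without_oprofile = [12206, 6612, 4578, 7280, 4526]
with_oprofile = [12689, 6637, 4524, 7290, 4515]


def relative_overhead(baseline, profiled):
    """Return the overhead of `profiled` over `baseline` per iteration, in percent."""
    return [100.0 * (p - b) / b for b, p in zip(baseline, profiled)]


overheads = relative_overhead(without_oprofile, with_oprofile)
for i, pct in enumerate(overheads, start=1):
    print(f"iteration {i}: {pct:+.2f}%")
```

The largest gap is on the first warmup iteration (roughly +4%), which is consistent with the boot-time logging pause mentioned above; the timed iteration is about 0.2% faster with OProfile, so the sign of the difference is not even stable.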
Commit 0743909c31ead90b66396b89d8bc0f1e832c1eb3 adds support for OProfile, which I've tested on 64-bit x86 with a prototype build; example stats are below. Currently opannotate will fail, as it forks too many objdump processes and swamps the system on which it is run.
Overflow stats not available
warning: [vdso] (tgid:28214 range:0x7fff097ff000-0x7fff09800000) could not be found.
CPU: Core 2, speed 2659.97 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %       image name  symbol name
30523    6.5961  28214.jo    org.jikesrvm.VM._assert(ZLjava/lang/String;Ljava/lang/String;)V
27091    5.8544  28214.jo    org.jikesrvm.mm.mmtk.Assert._assert(Z)V
22701    4.9057  28214.jo    org.jikesrvm.VM._assert(Z)V
14315    3.0935  28214.jo    org.mmtk.utility.alloc.Allocator.alignAllocation(Lorg/vmmagic/unboxed/Address;IIIZ)Lorg/vmmagic/unboxed/Address;
10551    2.2801  28214.jo    org.mmtk.policy.immix.Block.getDefragStateAddress(Lorg/vmmagic/unboxed/Address;)Lorg/vmmagic/unboxed/Address;
10305    2.2269  28214.jo    java.lang.String.<init>([BIII)V
9288     2.0071  28214.jo    org.mmtk.utility.heap.Map.getChunkIndex(Lorg/vmmagic/unboxed/Address;)I
9097     1.9659  28214.jo    org.jikesrvm.objectmodel.JavaHeader.getPointerInMemoryRegion(Lorg/vmmagic/unboxed/ObjectReference;)Lorg/vmmagic/unboxed/Address;
6803     1.4701  28214.jo    org.mmtk.utility.heap.Map.getSpaceForAddress(Lorg/vmmagic/unboxed/Address;)Lorg/mmtk/policy/Space;
5987     1.2938  28214.jo    org.mmtk.policy.Space.getSpaceForObject(Lorg/vmmagic/unboxed/ObjectReference;)Lorg/mmtk/policy/Space;
5785     1.2501  28214.jo    org.jikesrvm.mm.mmtk.ObjectModel.refToAddress(Lorg/vmmagic/unboxed/ObjectReference;)Lorg/vmmagic/unboxed/Address;
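For anyone wanting to tally the stats above by symbol, the following is a rough Python sketch. It assumes each row is whitespace-separated as "samples, percent, image, symbol" with the symbol on a single line (so wrapped rows need joining first). The helper names are hypothetical and this is not part of OProfile; the embedded rows are a subset copied from the output above.

```python
# Header plus a few rows copied from the opreport output above.
SAMPLE = """\
samples % image name symbol name
30523 6.5961 28214.jo org.jikesrvm.VM._assert(ZLjava/lang/String;Ljava/lang/String;)V
27091 5.8544 28214.jo org.jikesrvm.mm.mmtk.Assert._assert(Z)V
22701 4.9057 28214.jo org.jikesrvm.VM._assert(Z)V
10305 2.2269 28214.jo java.lang.String.<init>([BIII)V
"""


def parse_row(line):
    """Parse one row into (samples, percent, image, symbol), or None if it is not a data row."""
    parts = line.split(None, 3)
    if len(parts) != 4:
        return None
    samples, percent, image, symbol = parts
    try:
        return int(samples), float(percent), image, symbol
    except ValueError:
        # Header and warning lines do not start with two numbers.
        return None


def share_of(rows, needle):
    """Sum samples and percent over rows whose symbol contains `needle`."""
    matching = [r for r in rows if needle in r[3]]
    return sum(r[0] for r in matching), sum(r[1] for r in matching)


rows = [r for r in map(parse_row, SAMPLE.splitlines()) if r is not None]
assert_samples, assert_percent = share_of(rows, "._assert(")
print(f"_assert: {assert_samples} samples, {assert_percent:.2f}% of total")
```

On the rows listed, the three _assert entries together account for 80315 samples, about 17.36% of the total.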