Against X10 2.0.2
The following code (BenchmarkForEach.x10, attached) is intended as a test of the multi-threaded scaling of foreach. It does not scale as expected with X10_NTHREADS: performance is roughly the same on a multi-core system for X10_NTHREADS=1..4.
This was compiled with -O -NO_CHECKS. When run on an Intel Q6600 @ 2.40GHz (quad core), the following timings were observed:
The modified benchmark is intended to eliminate the overhead of foreach by creating only as many activities as there are threads. However, this does not scale well either. Looking at top, the CPU usage is as expected (100% for X10_NTHREADS=1, 200% for X10_NTHREADS=2, and so on), but the ops count only increases by a small fraction:
Against SVN HEAD
The base performance is better against SVN HEAD, but the scaling is similar. If the first benchmark is modified to change the foreach to an ateach statement, the performance against SVN HEAD improves markedly with the number of places.
(This cannot be said for X10 2.0.2 as it does not include Igor's fix to
All this seems to suggest a problem with the way threads of execution are utilized by foreach.
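For comparison, the chunking idea behind the modified benchmark (one activity per thread, each covering a contiguous block of iterations) can be sketched outside X10. The following is a minimal Java analogue, not the attached X10 code: the class name ChunkedLoopSketch, the work() function, and the iteration count are all hypothetical stand-ins. On a quad-core machine, a CPU-bound loop structured this way would normally be expected to get faster as the thread count goes from 1 to 4, which is the behaviour the foreach benchmarks fail to show.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedLoopSketch {
    // Hypothetical per-iteration work; stands in for the benchmark's loop body.
    static long work(int i) {
        return (long) i * i % 7;
    }

    // Runs iterations 0..n-1 using exactly nThreads tasks, one contiguous chunk each,
    // and returns the sum of work(i) over all iterations.
    static long runChunked(int n, int nThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = (n + nThreads - 1) / nThreads; // ceiling division
            for (int t = 0; t < nThreads; t++) {
                final int lo = Math.min(n, t * chunk);
                final int hi = Math.min(n, lo + chunk);
                parts.add(pool.submit(() -> {
                    long sum = 0;
                    for (int i = lo; i < hi; i++) {
                        sum += work(i);
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> f : parts) {
                total += f.get(); // wait for each chunk to finish
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int n = 1_000_000; // assumed iteration count, not taken from the benchmark
        for (int threads = 1; threads <= 4; threads++) {
            long start = System.nanoTime();
            long result = runChunked(n, threads);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(threads + " thread(s): result=" + result + ", " + ms + " ms");
        }
    }
}
```

The result is the same for every thread count, so any difference between runs is purely in elapsed time. This sketch says nothing about where the X10 runtime spends its time; it only shows the shape of the loop being measured.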