Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: X10 2.0.2, X10 2.0.3, X10 2.0.4
- Fix Version/s: X10 2.3.2
- Component/s: Implementation Limitation
- Labels: None
- Environment: Linux x86-64, Intel Q6600 @ 2.40GHz (quad core)
- Testcase included: yes
- Number of attachments:
Description
Against X10 2.0.2
The following code (benchmark BenchmarkForEach.x10 attached) is intended as a test of the multi-threaded scaling of foreach. It does not scale as expected with X10_NTHREADS: performance is roughly the same on a multi-core system for X10_NTHREADS=1..4.
val a = Array.make[Double](Dist.makeBlock([0..N-1], 0), ((i) : Point) => i as Double)
finish foreach ((i) in a) {
    a(i) += 2.0;
}
This was compiled with -O -NO_CHECKS. When run on an Intel Q6600 @ 2.40GHz (quad core), the following timings were observed:
| X10_NTHREADS | kop/s |
|---|---|
| 1 | 339 |
| 2 | 281 |
| 3 | 346 |
| 4 | 360 |
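To make the lack of scaling concrete, the timings above can be converted into speedup and parallel efficiency relative to the single-threaded run. The snippet below is only a rough sketch in Python (not X10); the helper name `scaling` is hypothetical and the figures are copied from the table.

```python
# Throughput from the table above (kop/s), keyed by X10_NTHREADS.
kops = {1: 339, 2: 281, 3: 346, 4: 360}

def scaling(throughput):
    """Return {threads: (speedup, efficiency)} relative to the 1-thread run.

    speedup    = throughput(n) / throughput(1)
    efficiency = speedup / n   (1.0 would be perfect linear scaling)
    """
    base = throughput[1]
    return {n: (v / base, v / base / n) for n, v in throughput.items()}

results = scaling(kops)
for n, (speedup, eff) in sorted(results.items()):
    print(f"X10_NTHREADS={n}: speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

With these numbers, two threads are slower than one, and four threads give only about a 1.06x speedup.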
The modified benchmark is intended to eliminate the overhead of foreach by creating only as many activities as there are threads. However, this does not scale well either. Looking at top, the CPU usage is as expected (100% for X10_NTHREADS=1, 200% for X10_NTHREADS=2 and so on), but the ops count only increases by a small fraction:
| X10_NTHREADS | Mop/s |
|---|---|
| 1 | 3.37 |
| 2 | 3.95 |
| 3 | 4.31 |
| 4 | 4.11 |
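The "one activity per thread" decomposition used by the modified benchmark might look roughly like the following. This is a Python sketch of the idea only, not the attached X10 code: `block_ranges` and `add_two_chunked` are hypothetical names, and Python threads do not provide CPU parallelism for this kind of loop, so it illustrates the partitioning rather than the performance.

```python
from concurrent.futures import ThreadPoolExecutor

def block_ranges(n, workers):
    """Split range(n) into `workers` contiguous half-open (start, end) blocks.

    The first n % workers blocks get one extra element, so block sizes
    differ by at most one.
    """
    base, extra = divmod(n, workers)
    ranges = []
    start = 0
    for w in range(workers):
        end = start + base + (1 if w < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges

def add_two_chunked(a, workers):
    """Apply a[i] += 2.0 using one task per worker instead of one per element."""
    def work(bounds):
        lo, hi = bounds
        for i in range(lo, hi):
            a[i] += 2.0

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list(...) waits for every block and re-raises any worker exception.
        list(pool.map(work, block_ranges(len(a), workers)))
    return a

data = [float(i) for i in range(10)]
add_two_chunked(data, 4)
```

Each task touches a disjoint block, so the per-element scheduling cost of the original foreach is replaced by a fixed number of tasks equal to the worker count.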
Against SVN HEAD
The base performance is better against SVN HEAD, but the scaling is similar. If the first benchmark is modified to change the foreach to an ateach statement, the performance against SVN HEAD improves markedly with the number of places.
| Places | X10_NTHREADS | kop/s |
|---|---|---|
| 1 | 1 | 561 |
| 1 | 2 | 426 |
| 1 | 3 | 483 |
| 1 | 4 | 457 |
| 2 | 1 | 846 |
| 4 | 1 | 1470 |
(This cannot be said for X10 2.0.2 as it does not include Igor's fix to XTENLANG-1143.)
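The contrast between scaling by threads and scaling by places can be read off the table by comparing runs with the same total number of workers. The following Python sketch does that arithmetic; "workers" (places times threads) and the helper name `places_vs_threads` are terms introduced here for illustration, and the figures are copied from the table.

```python
# Rows from the table above: (places, X10_NTHREADS) -> kop/s.
kops = {
    (1, 1): 561, (1, 2): 426, (1, 3): 483, (1, 4): 457,
    (2, 1): 846, (4, 1): 1470,
}

def places_vs_threads(throughput):
    """Compare n places x 1 thread against 1 place x n threads.

    Returns {n: ratio}, where ratio > 1 means the multi-place run was faster.
    Only worker counts measured in both configurations are included.
    """
    ratios = {}
    for (places, threads), v in throughput.items():
        if threads == 1 and places > 1 and (1, places) in throughput:
            ratios[places] = v / throughput[(1, places)]
    return ratios

ratios = places_vs_threads(kops)
for workers, r in sorted(ratios.items()):
    print(f"{workers} workers: places/threads throughput ratio {r:.2f}")
```

On these figures the multi-place runs are roughly 2x and 3.2x faster than the multi-threaded single-place runs at two and four workers respectively.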
All this seems to suggest a problem with the way threads of execution are utilized by foreach.
Issue Links
- relates to XTENLANG-587: Array initialization is sequential (doesn't execute the initialization closure in a separate async for each point)