jira.codehaus.org

X10 / XTENLANG-280

General sequential performance of Array library


Details

  • Type: Bug
  • Status: Closed
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: X10 1.7.2 - C++ hosted, X10 1.7.3
  • Fix Version/s: X10 2.0.5
  • Component/s: XRX Runtime
  • Labels:
    None

Description

Opening this issue for ongoing general status and discussion of issues relating to sequential performance of array-related code (point, region, dist, array).

Some preliminary measurements:

Benchmark           cpp-opt          x10-cpp-opt
SeqRail2            a: 256 Mop/s     b: 146 Mop/s
SeqPseudoArray2a    c: 256 Mop/s     d: 71.5 Mop/s
SeqPseudoArray2b                     e: 2.15 Mop/s
SeqArray2                            f: 323 kop/s

Over and above the general performance issues in the generated C++ code (a/b/d/e), there is an additional performance gap of roughly 6.7x (2.15 Mop/s vs. 323 kop/s) between the "best possible" generic array and the actual library array (e/f).
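A quick arithmetic check of the gaps in the table, with f converted to 0.323 Mop/s so every figure shares one unit. The constant and function names are illustrative only; the numbers are the ones reported above.

```cpp
#include <cassert>
#include <cstdio>

// Throughput figures from the table, all in Mop/s (f: 323 kop/s = 0.323 Mop/s).
constexpr double kRail2Cpp    = 256.0;  // a
constexpr double kRail2X10    = 146.0;  // b
constexpr double kPseudo2aCpp = 256.0;  // c
constexpr double kPseudo2aX10 = 71.5;   // d
constexpr double kPseudo2bX10 = 2.15;   // e
constexpr double kArray2X10   = 0.323;  // f

// How many times slower `slow` is than `fast` (same unit for both).
constexpr double slowdown(double fast, double slow) { return fast / slow; }

void print_slowdowns() {
    std::printf("a/b = %.2fx\n", slowdown(kRail2Cpp, kRail2X10));
    std::printf("c/d = %.2fx\n", slowdown(kPseudo2aCpp, kPseudo2aX10));
    std::printf("e/f = %.2fx\n", slowdown(kPseudo2bX10, kArray2X10));
}
```

Calling `print_slowdowns()` prints `a/b = 1.75x`, `c/d = 3.58x` and `e/f = 6.66x`.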

A known issue is that array bounds checks cannot currently be disabled. Will measure the performance impact by commenting out bounds checks locally.
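One common way to make checks removable is a compile-time switch, which turns the local experiment described above into a rebuild instead of a source edit. This is a minimal C++ sketch, assuming a hypothetical `NO_CHECKS` preprocessor macro and a hypothetical `CheckedArray` wrapper. Neither is the X10 runtime's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical wrapper: build with -DNO_CHECKS to compile the bounds check out.
// The macro name is an assumption for illustration; the real X10 switch may differ.
template <typename T>
class CheckedArray {
public:
    CheckedArray(T* data, std::size_t len) : data_(data), len_(len) {}

    T& operator[](std::size_t i) {
#ifndef NO_CHECKS
        // Checked build: reject out-of-range indices.
        if (i >= len_) throw std::out_of_range("array index out of bounds");
#endif
        return data_[i];  // unchecked build: raw access only
    }

    std::size_t size() const { return len_; }

private:
    T* data_;
    std::size_t len_;
};
```

Building the same benchmark with and without `-DNO_CHECKS` isolates the cost of the check itself.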

The array design depends heavily on inlining; will investigate whether inlining is inhibited by the same problems behind the general C++ performance issues (a/b/d/e).
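To show why inlining matters here, the following is a hypothetical C++ analogue of a layered point/region/array access path. `Point2`, `RectRegion2` and `Array2` are illustrative names, not X10 runtime classes. Member functions defined inside the class body are implicitly `inline`, but the compiler still decides whether to actually inline them. If it does not, every element access pays for several calls instead of reducing to one index computation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A 2-D index, analogous to a point.
struct Point2 { int i, j; };

// A rectangular region [0, rows) x [0, cols) with row-major layout.
struct RectRegion2 {
    int rows, cols;
    int offset(Point2 p) const { return p.i * cols + p.j; }
};

// Array over a region: a(i, j) -> operator()(Point2) -> region.offset -> raw storage.
// Fast only if these small layers are inlined down to raw[i * cols + j].
template <typename T>
struct Array2 {
    RectRegion2 region;
    std::vector<T> raw;

    explicit Array2(RectRegion2 r)
        : region(r), raw(static_cast<std::size_t>(r.rows) * r.cols) {}

    T& operator()(Point2 p) { return raw[region.offset(p)]; }
    T& operator()(int i, int j) { return (*this)(Point2{i, j}); }
};
```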

Issue Links

is depended upon by

  • XTENLANG-962 Array library redesign (Task; Priority: Major; Status: Closed)

Activity

Bruce Lucas added a comment - 09/Dec/08 2:41 PM

After commenting out array bounds checking, the performance of the actual x10.lang.Array code (f in the table above) rises to 2.13 Mop/s, essentially the same as SeqPseudoArray2b, the "best possible" generic array code (e in the table above).

In other words, there doesn't currently appear to be anything in the array library, beyond disabling bounds checking, that would improve performance. Will investigate a mechanism for disabling bounds checking.

Also, I'm a little surprised that bounds checking has such a large effect, because the check itself should be reasonably close to optimal. Will investigate.

David Grove added a comment - 09/Dec/08 2:50 PM

We should take a close look at the machine code sequence being used to implement bounds checking.

A bounds check for a 1-D, 0-indexed array should take only 3 instructions: a load of the array length, an unsigned compare, and a conditional branch. In tight loops I've seen this cost up to 10%, but more than that would be really surprising. In loops, the load of the array length should also be loop-invariant, which brings the check down to 2 instructions (compare, branch).
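The single unsigned compare works because converting a negative signed index to unsigned yields a very large value. One comparison against the length therefore rejects both `idx < 0` and `idx >= len`. This is a minimal C++ sketch (the function names are illustrative). On common targets the check typically becomes a compare plus a conditional branch, but the exact machine code is up to the compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One unsigned comparison covers both ends of the range [0, len).
inline bool in_bounds(std::int64_t idx, std::size_t len) {
    return static_cast<std::uint64_t>(idx) < len;
}

// Loop with an explicit per-element check. The length is held in a local,
// so it is loop-invariant and need not be reloaded on each iteration.
std::int64_t checked_sum(const std::int64_t* data, std::size_t len) {
    std::int64_t sum = 0;
    const std::size_t n = len;
    for (std::int64_t i = 0; i < static_cast<std::int64_t>(n); ++i) {
        if (!in_bounds(i, n)) return -1;  // stand-in for raising an exception
        sum += data[i];
    }
    return sum;
}
```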

Igor Peshansky added a comment - 18/Mar/09 6:20 PM

Retarget to 1.7.4.

Bruce Lucas added a comment - 23/Mar/09 10:55 AM

Unassigning per Igor's request.

Igor Peshansky added a comment - 11/May/09 1:47 PM

Defer to 1.7.5

David Grove added a comment - 11/Jun/09 8:35 PM

Defer performance work to 2.0.

David Grove added a comment - 25/Oct/09 9:01 PM

Move bulk of 2.0 performance items to 2.1 target.

David Grove added a comment - 01/Jun/10 3:08 PM

Defer all non-critical X10 issues to 2.1.0.

David Grove added a comment - 13/Jul/10 8:39 PM

NO_CHECKS performance of Array is now identical to that of Rail.

With checking enabled, tight loops (e.g., SeqArray2a.x10) show about a 3x slowdown, but I don't believe we can do any better (modulo array-bounds-check elimination in the common optimizer) without giving up support for Arrays defined over user-defined Regions.
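The following hypothetical C++ sketch shows why user-defined Regions limit how cheap a check can be. The names `Region`, `RectRegion`, `LowerTriangle`, `generic_check` and `rect_check` are illustrative, not the X10 API. If an array may be defined over any region, the generic check must ask the region whether it contains the index, typically through a virtual call the optimizer often cannot inline. When the region is statically known to be rectangular, the check reduces to unsigned compares that can be inlined.

```cpp
#include <cassert>

// Abstract region: any user-defined shape can implement contains().
struct Region {
    virtual ~Region() = default;
    virtual bool contains(int i, int j) const = 0;
};

// Rectangular region [0, rows) x [0, cols): two unsigned compares.
struct RectRegion : Region {
    int rows, cols;
    RectRegion(int r, int c) : rows(r), cols(c) {}
    bool contains(int i, int j) const override {
        return static_cast<unsigned>(i) < static_cast<unsigned>(rows) &&
               static_cast<unsigned>(j) < static_cast<unsigned>(cols);
    }
};

// A user-defined region: lower triangle of an n x n square, diagonal included.
struct LowerTriangle : Region {
    int n;
    explicit LowerTriangle(int n_) : n(n_) {}
    bool contains(int i, int j) const override {
        return 0 <= j && j <= i && i < n;
    }
};

// Generic check: works for every Region, at the cost of a virtual dispatch.
inline bool generic_check(const Region& r, int i, int j) {
    return r.contains(i, j);
}

// Rectangular fast path: the qualified call bypasses virtual dispatch.
inline bool rect_check(const RectRegion& r, int i, int j) {
    return r.RectRegion::contains(i, j);
}
```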

David Grove added a comment - 16/Jul/10 10:23 AM

Bulk close of 2.0.5 resolved issues.


People

  • Assignee: Unassigned
  • Reporter: Bruce Lucas

Dates

  • Created: 09/Dec/08 12:23 PM
  • Updated: 16/Jul/10 10:23 AM
  • Resolved: 13/Jul/10 8:39 PM