RVM-27

float/double to int/long conversions are overly expensive

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.2
    • Labels:
      None
    • Environment:
      All Intel
    • Number of attachments :
      0

      Description

      We generate f2i, f2l, d2i and d2l as a syscall in the baseline compiler, or as at least 4 floating point compares in the optimizing compiler. In the common case each of these should be a single floating point operation. An attempt to get this working was reverted by revision 12542; we should endeavour to get the fast code working again.

        Activity

        Ian Rogers added a comment -

        Bumping the priority on this as [df]2[il] is something we do quite a lot of. With SSE2 the conditional moves are gone, replaced by the baseline compiler's syscalls.

        Ian Rogers made changes -
        Field            Original Value    New Value
        Priority         Minor [ 4 ]       Major [ 3 ]
        Fix Version/s                      2.9.2 [ 13599 ]
        Ian Rogers added a comment -

        Using my methodology of testing whether the newer conversion code gives results identical to the old, I've found a large number of cases such as:

        class test {
            public static void main(String[] args) {
                // 0x7f8199d6 is a NaN bit pattern (exponent all ones, non-zero mantissa)
                float x = Float.intBitsToFloat(0x7f8199d6);
                System.out.println((long) x);
            }
        }

        where the result from Sun's JVM is 0 and from the Jikes RVM is -9223372036854775808. This is a NaN case, so it's likely we need to add explicit NaN tests to our current sysFloatToLong. I'm bumping the priority of this as it's a case where the Jikes RVM is not only inefficient but also incorrect.
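
The NaN fix described above can be sketched in Java as follows. This is only an illustration under stated assumptions, not the Jikes RVM code: `FloatToLongSketch`, `rawConvert` and `floatToLong` are hypothetical names, and `rawConvert` merely models a hardware conversion that returns the "integer indefinite" value (0x8000000000000000) for NaN and out-of-range inputs. The common case costs one conversion and one compare; the NaN and saturation tests run only when the raw result is the indefinite value.

```java
final class FloatToLongSketch {
    /** Value modelled as the raw result for NaN and out-of-range inputs. */
    static final long INDEFINITE = Long.MIN_VALUE;

    /** Hypothetical stand-in for the raw hardware conversion; not a Jikes RVM method. */
    static long rawConvert(float x) {
        boolean inRange = x >= -0x1p63f && x < 0x1p63f; // false for NaN
        return inRange ? (long) x : INDEFINITE;
    }

    /** f2l with Java semantics layered on top of the raw conversion. */
    static long floatToLong(float x) {
        long raw = rawConvert(x);
        if (raw != INDEFINITE) {
            return raw; // common case: one conversion, one compare
        }
        if (x != x) {
            return 0L; // explicit NaN test: NaN converts to 0
        }
        // Saturate; an input of exactly -2^63 also arrives here and maps to MIN_VALUE.
        return x > 0 ? Long.MAX_VALUE : Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        float nan = Float.intBitsToFloat(0x7f8199d6);
        System.out.println(rawConvert(nan));  // -9223372036854775808
        System.out.println(floatToLong(nan)); // 0
    }
}
```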

        Ian Rogers made changes -
        Priority Major [ 3 ] Critical [ 2 ]
        Ian Rogers made changes -
        Assignee Ian Rogers [ ianrogers ]
        Ian Rogers added a comment -

        r12916 adds support for hand-crafted f2i and f2l operations in the baseline compiler, as well as f2i in the opt compiler. f2l is more work in the opt compiler as the x87 control word needs storing during the operation. d2i and d2l still need implementing for both compilers. They are very similar to f2i and f2l (the precision of the input is merely extended), but I hope to sanity check the assembler against our current approach, as I've done with the f2[il] operations. This will take ~2^32 times longer.
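
The "~2^32 times longer" figure follows from the pattern counts: a float has 2^32 bit patterns and a double has 2^64. A rough worked example, where `ExhaustiveCostSketch` and the one-second duration for a full 32-bit sweep are purely hypothetical assumptions rather than measurements:

```java
final class ExhaustiveCostSketch {
    static final double SECONDS_PER_YEAR = 365.25 * 24 * 3600;

    /** Ratio of 64-bit to 32-bit input patterns: 2^64 / 2^32. */
    static double patternRatio() {
        return Math.pow(2, 64) / Math.pow(2, 32);
    }

    /** Years for a full 64-bit sweep, given the seconds one full 32-bit sweep takes. */
    static double yearsForDoubleSweep(double secondsForFloatSweep) {
        return patternRatio() * secondsForFloatSweep / SECONDS_PER_YEAR;
    }

    public static void main(String[] args) {
        // The binary exponent of the ratio shows it is a power of two.
        System.out.println("ratio = 2^" + Math.getExponent(patternRatio()));
        // Assumed (hypothetical) one second for the whole 32-bit sweep.
        System.out.printf("years = %.1f%n", yearsForDoubleSweep(1.0));
    }
}
```

With that assumed one-second sweep, the 64-bit sweep comes out at roughly 136 years on the same machine.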

        Daniel Frampton added a comment -

        I hope I am misunderstanding... are you suggesting we exhaustively test 64-bit patterns? If your 32-bit case takes 1 second (seems unlikely), this will only take you 100 years or so on the same machine.

        If you have working versions of the float conversions, I would suggest that you structure the double versions in the same way and test the result in two ways: first statistically, by comparing the results of random patterns, and second by testing around the boundary conditions (maxlong/int, zero, etc.).
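
The two-pronged test suggested above could look roughly like the sketch below. It assumes the `(long)` cast of a trusted JVM as the reference; `D2lCheckSketch`, `BOUNDARIES` and `countMismatches` are hypothetical names, and the candidate is passed in as a function because the real conversion under test lives in generated machine code.

```java
import java.util.Random;
import java.util.function.DoubleToLongFunction;

final class D2lCheckSketch {
    /** Boundary inputs: zeros, NaN, infinities, and values around the int/long limits. */
    static final double[] BOUNDARIES = {
        0.0, -0.0, Double.NaN, Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY,
        Integer.MAX_VALUE, Integer.MIN_VALUE, 0x1p31, -0x1p31,
        0x1p63, Math.nextDown(0x1p63), -0x1p63, Math.nextUp(-0x1p63), Math.nextDown(-0x1p63),
        Double.MIN_VALUE, Double.MAX_VALUE
    };

    /** Returns the number of inputs on which the candidate disagrees with the (long) cast. */
    static int countMismatches(DoubleToLongFunction candidate, int randomSamples, long seed) {
        int mismatches = 0;
        for (double d : BOUNDARIES) {
            if (candidate.applyAsLong(d) != (long) d) mismatches++;
        }
        Random random = new Random(seed); // fixed seed keeps failures reproducible
        for (int i = 0; i < randomSamples; i++) {
            // Random bit patterns rather than random values, so NaNs and infinities occur too.
            double d = Double.longBitsToDouble(random.nextLong());
            if (candidate.applyAsLong(d) != (long) d) mismatches++;
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // A deliberately broken candidate that mishandles NaN, to show the check firing.
        DoubleToLongFunction noNanTest = d -> d != d ? Long.MIN_VALUE : (long) d;
        System.out.println(countMismatches(d -> (long) d, 1_000_000, 42L));
        System.out.println(countMismatches(noNanTest, 1_000_000, 42L));
    }
}
```

Drawing random bit patterns instead of random numeric values spreads the samples across all exponents, which a uniform draw over a numeric range would not do.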

        Ian Rogers added a comment -

        It'd be a shame to hold up release 3.0 by 100 years. I'll see if I can sanity check on the biggest multicore I can find. Experience with f2i suggests the boundary with maxint was problematic and that using equals in the comparison was the most important fix. You're right that the code so far gives a template for fixing the doubles case, except for [fd]2l operations in the opt compiler. I'm trying to do this as a background task. Hopefully I can first get d2[il] done for the baseline compiler and d2i for the opt compiler. [fd]2l for the opt compiler needs either a saved stack slot for the fpscw or detection of SSE3 support, neither of which is easy in complex operators currently. A cheap workaround would be to reserve the stack slot in the BURS phase for use in complex operators. I'm happy if anyone wants to lend a hand.
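
One plausible reading of the maxint remark, sketched in Java: `(float) Integer.MAX_VALUE` rounds up to exactly 2^31, so a saturation guard written with a strict comparison lets the input 2^31 fall through to the raw conversion. The class and method names below are hypothetical, and `rawConvert` only models a conversion that returns 0x80000000 for out-of-range inputs; this is not the code from r12916.

```java
final class F2iBoundarySketch {
    /** Value modelled as the raw result for out-of-range inputs. */
    static final int INDEFINITE = Integer.MIN_VALUE;

    /** Hypothetical stand-in for the raw hardware conversion. */
    static int rawConvert(float x) {
        boolean inRange = x >= -0x1p31f && x < 0x1p31f;
        return inRange ? (int) x : INDEFINITE;
    }

    /** Guard with a strict comparison: misses x == 2^31. */
    static int f2iStrict(float x) {
        if (x != x) return 0;
        if (x > (float) Integer.MAX_VALUE) return Integer.MAX_VALUE;
        return rawConvert(x);
    }

    /** Guard that includes equality, so x == 2^31 saturates correctly. */
    static int f2iInclusive(float x) {
        if (x != x) return 0;
        if (x >= (float) Integer.MAX_VALUE) return Integer.MAX_VALUE;
        return rawConvert(x);
    }

    public static void main(String[] args) {
        float limit = (float) Integer.MAX_VALUE; // not representable; rounds up to 2^31
        System.out.println(limit == 0x1p31f);    // true
        System.out.println(f2iStrict(limit));    // -2147483648
        System.out.println(f2iInclusive(limit)); // 2147483647
    }
}
```

The negative side needs no such guard in this model, because the indefinite value coincides with the correct saturated result, Integer.MIN_VALUE.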

        Ian Rogers added a comment -

        r13168 implements f2l in the opt compiler for SSE using x87. Playing around with the control word is a total mess. We need 2 versions of the control word (with and without truncation) and 2 stack slots to move between FPRs and GPRs, so we're playing with a total of 4 memory slots. Intel's optimization guide explicitly states we should really be using the SSE3 FISTTP instruction, which removes all use of the control word. We should be able to use the same code, with some subtle tweaks for precision, for d2[il]. We should also be able to use this code for the x87 backend for f2[il] (the current FPR2INT code in BURS helpers is crazy: it tries to truncate using conditional moves and math, which is probably slower than a syscall).
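
The reason a second, truncating control word is needed at all: the x87 default rounding mode is round-to-nearest-even, while Java's narrowing conversions round toward zero. The difference can be seen in plain Java, using `Math.rint` as a model of the default mode. `TruncationSketch` and its methods are hypothetical names for illustration only.

```java
final class TruncationSketch {
    /** Round-to-nearest-even, modelling a store under the default x87 rounding mode. */
    static long roundToNearest(double x) {
        return (long) Math.rint(x);
    }

    /** Round toward zero, as Java's f2l/d2l require (and as FISTTP does regardless of mode). */
    static long truncate(double x) {
        return (long) x;
    }

    public static void main(String[] args) {
        double[] samples = {2.5, 2.7, -2.7, 3.5};
        for (double x : samples) {
            System.out.println(x + ": nearest=" + roundToNearest(x) + " truncate=" + truncate(x));
        }
    }
}
```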

        Ian Rogers added a comment -

        When SSE3 support is added to the configuration and assembler we should use FISTTP, but for now this issue is closed with r13171.

        Ian Rogers made changes -
        Resolution Fixed [ 1 ]
        Status Open [ 1 ] Closed [ 6 ]

          People

          • Assignee: Ian Rogers
          • Reporter: Ian Rogers
          • Votes: 0
          • Watchers: 0
