Key: RVM-341
Type: Improvement
Status: Open
Priority: Major
Assignee: Unassigned
Reporter: Ian Rogers
Votes: 0
Watchers: 0
RVM

Improved copying in VM_Memory

Created: 10/Nov/07 07:59 AM   Updated: 10/Jun/08 04:03 PM
Component/s: Instruction Architecture: Intel, Runtime
Affects Version/s: None
Fix Version/s: 1000

Time Tracking:
Not Specified


Description
r13857 improved memory copying on Intel with SSE2 so that we use 64-bit copies rather than 32-bit copies. This gave a large number of speedups:

http://jikesrvm.anu.edu.au/cattrack/results/rvmx86lnx32.anu.edu.au/perf/1790/performance_report

most notably on SpecJBB 2000. There is low-hanging fruit to improve this further, for example by using 128-bit copies and by using more than one register to do the copying.
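
As a rough illustration of that idea (not the VM_Memory code itself), a 128-bit copy loop using two XMM registers per iteration might look like the C sketch below. The function name copy128_two_regs and the requirement that the length be a multiple of 32 bytes are assumptions made for the example; when SSE2 is unavailable it simply falls back to memcpy. Whether this beats the 64-bit loop would need to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper, for illustration only.
 * Copies `bytes` bytes from src to dst, 32 bytes per iteration,
 * using two 128-bit registers. Assumes bytes is a multiple of 32
 * and that the buffers do not overlap. */
void copy128_two_regs(void *dst, const void *src, size_t bytes)
{
    assert(bytes % 32 == 0);
#if defined(__SSE2__)
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    for (size_t off = 0; off < bytes; off += 32) {
        /* Issue both loads before the stores so two registers are in flight. */
        __m128i a = _mm_loadu_si128((const __m128i *)(s + off));
        __m128i b = _mm_loadu_si128((const __m128i *)(s + off + 16));
        _mm_storeu_si128((__m128i *)(d + off), a);
        _mm_storeu_si128((__m128i *)(d + off + 16), b);
    }
#else
    /* No SSE2 at compile time: plain library copy. */
    memcpy(dst, src, bytes);
#endif
}
```

The unaligned load/store forms are used here so the sketch makes no alignment assumption; an aligned variant would need the alignment guaranteed by the caller.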



Comments
Ian Rogers added a comment - 10/Nov/07 08:16 AM
There are more specific things we can do for Intel. Looking at:

http://cdrom.amd.com/devconn/events/AMD_block_prefetch_paper.pdf

we are using a 32-bit copy loop with a performance of around 640MB/s (at 2001 bus speeds, DDR2100) whereas the best copy loop achieves 1976MB/s, roughly three times faster, and this is without using 128-bit XMM registers.


Ian Rogers added a comment - 10/Nov/07 09:41 AM
It appears the best pair of copy instructions is movq to load and movntq to store. movntq is a weakly ordered, non-temporal store, so an sfence is necessary at the end of the copy loop. Similarly, (p)xor-ing a register and then storing it with movntq is the best way to zero memory. We can assume SSE. It would be nice to know whether there's an advantage in interleaving SSE XMM register movq/movntq(s) with MMX MM register ones.
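
A minimal C sketch of the load / non-temporal store / sfence pattern, and of zeroing with a cleared register, is below. Note that it swaps in the 128-bit SSE2 counterparts (movdqu to load, movntdq to store, via _mm_loadu_si128 and _mm_stream_si128) rather than the 64-bit MMX movq/movntq pair discussed above. The names copy_nontemporal and zero_nontemporal, the 16-byte destination alignment, and the multiple-of-16 length are assumptions for the example, not anything taken from VM_Memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_loadu_si128, _mm_stream_si128, _mm_setzero_si128 */
#endif

/* Hypothetical helper: copy with non-temporal stores.
 * Assumes dst is 16-byte aligned, bytes is a multiple of 16,
 * and the buffers do not overlap. */
void copy_nontemporal(void *dst, const void *src, size_t bytes)
{
    assert(((uintptr_t)dst & 15) == 0);
    assert((bytes & 15) == 0);
#if defined(__SSE2__)
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < bytes / 16; i++) {
        __m128i v = _mm_loadu_si128(s + i);  /* ordinary (cached) load */
        _mm_stream_si128(d + i, v);          /* non-temporal store */
    }
    _mm_sfence();  /* order the weakly ordered stores before later stores */
#else
    memcpy(dst, src, bytes);  /* fallback when SSE2 is not available */
#endif
}

/* Hypothetical helper: zero memory with non-temporal stores.
 * Same alignment and length assumptions as above. */
void zero_nontemporal(void *dst, size_t bytes)
{
    assert(((uintptr_t)dst & 15) == 0);
    assert((bytes & 15) == 0);
#if defined(__SSE2__)
    __m128i *d = (__m128i *)dst;
    __m128i zero = _mm_setzero_si128();  /* typically compiled to pxor */
    for (size_t i = 0; i < bytes / 16; i++) {
        _mm_stream_si128(d + i, zero);
    }
    _mm_sfence();
#else
    memset(dst, 0, bytes);
#endif
}
```

The single sfence after the loop, rather than one per store, matches the point above that the fence is only needed once the copy loop has finished.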

Ian Rogers added a comment - 11/Nov/07 04:24 AM
There's specific coverage of using non-temporal stores and prefetching in section 9.7 of the Intel optimization manual:
http://www.intel.com/design/processor/manuals/248966.pdf