RVM

SPECjvm98: Invalid runs should not be SUCCESSes

Details

  • Type: Bug Bug
  • Status: Closed Closed
  • Priority: Critical Critical
  • Resolution: Fixed
  • Affects Version/s: None
  • Fix Version/s: 3.0.1
  • Labels:
    None
  • Number of attachments :
    0

Description

I noticed an inconsistency between the perf report and the stats report for 1903. The perf report showed that javac had failed for all 3 iterations (1, 3, 10), but the stats report and bottom line recorded them as "SUCCESS"es. These jobs were running to completion but were failing with an invalid checksum within the SPEC harness. Such runs should always be reported as failures.

http://jikesrvm.anu.edu.au/cattrack/results/rvmx86lnx32.anu.edu.au/perf/1903/production/default/perf-jvm98/javac-3

Once fixed, we need to go back and establish when this regression on javac was introduced.

Activity

Hide
Ian Rogers added a comment -

I think that the checking of the results from the tests is only as good as the harness. I've seen that if you have a clean checkout that dies in VM booting then all the basic tests report that they pass. In any case the automated tests are a godsend.

Show
Ian Rogers added a comment - I think that the checking of the results from the tests is only as good as the harness. I've seen that if you have a clean checkout that dies in VM booting then all the basic tests report that they pass. In any case the automated tests are a godsend.
Hide
Peter Donald added a comment -

It looks like when the perf tests were created they didn't match the behaviour of the original SPECjvm* tests. The statistics are now extracted using a perl script that does not do the correct thing. The original tests matched the pattern "======= @{test} Finished in (\S*) secs" and if it failed to match it would report an error. The perl script does attempt to match a pattern but it does not fail when pattern matching fails and thus bad results can flow through.

I think the best approach is to either fix the perl scripts which I will let someone else do or replace the perl scripts with normal ant tasks as per the SPECjvm98 test. ie replace

<extractFinalSpecResult tag="@{tag}" execution="@{execution}" benchmark="@{benchmark}"/>

with

<extractStatistic tag="@{tag}" execution="@{execution}" key="time" pattern="======= @{benchmark} Finished in (\S*) secs"/>

I don't have anything to test this on but will look at it on the w/e if no one gets to it before then.

Show
Peter Donald added a comment - It looks like when the perf tests were created they didn't match the behaviour of the original SPECjvm* tests. The statistics are now extracted using a perl script that does not do the correct thing. The original tests matched the pattern "======= @{test} Finished in (\S*) secs" and if it failed to match it would report an error. The perl script does attempt to match a pattern but it does not fail when pattern matching fails and thus bad results can flow through. I think the best approach is to either fix the perl scripts which I will let someone else do or replace the perl scripts with normal ant tasks as per the SPECjvm98 test. ie replace <extractFinalSpecResult tag="@{tag}" execution="@{execution}" benchmark="@{benchmark}"/> with <extractStatistic tag="@{tag}" execution="@{execution}" key="time" pattern="======= @{benchmark} Finished in (\S*) secs"/> I don't have anything to test this on but will look at it on the w/e if no one gets to it before then.
Hide
Steve Blackburn added a comment -

I don't quite follow you. It looks to me that the perf tests are doing the right thing (flagging failure on "Invalid") and the others are not. If so, the problem would be with the pre-existing code, not the perf code. Right?

Show
Steve Blackburn added a comment - I don't quite follow you. It looks to me that the perf tests are doing the right thing (flagging failure on "Invalid") and the others are not. If so, the problem would be with the pre-existing code, not the perf code. Right?
Hide
Daniel Frampton added a comment -

The perf-jvm98 tests extract results differently than the normal jvm98 tests. This is because of the difficult in extracting the 'final' SPECjvm98 result when running multiple iterations, in combination with extracting MMTk statistics.

I will look at clearing this up.

Show
Daniel Frampton added a comment - The perf-jvm98 tests extract results differently than the normal jvm98 tests. This is because of the difficult in extracting the 'final' SPECjvm98 result when running multiple iterations, in combination with extracting MMTk statistics. I will look at clearing this up.
Hide
Peter Donald added a comment -

I think we are confusing the terms. The testing code located in the directory testing/tests/perf-jvm98 is wrong and does not extract results in the same way that testing/tests/SPECjvm98 does. When one of the tests starts failing by having invalid run the perf-jvm98 will not indicate that the test has failed but will instead mark the test as being successful but give it an obviously incorrect statistic value such as time="2.976 secs *NOT VALID*". (This is the result of a failed pattern match not causing the extraction script to fail).

When imported into cattrack that statistic can not be converted into a numeric and thus does not appear in either the statistics report or the performance report. The statistic report only explicitly marks that one statistics as failed (ie see javac-1-time) whereas the performance report only reports on that so making it more obvious.

Anyhoo it is probably pretty easy bit of code hackery on the perl script to get it failing in cases of failed matches.

Show
Peter Donald added a comment - I think we are confusing the terms. The testing code located in the directory testing/tests/perf-jvm98 is wrong and does not extract results in the same way that testing/tests/SPECjvm98 does. When one of the tests starts failing by having invalid run the perf-jvm98 will not indicate that the test has failed but will instead mark the test as being successful but give it an obviously incorrect statistic value such as time="2.976 secs *NOT VALID*". (This is the result of a failed pattern match not causing the extraction script to fail). When imported into cattrack that statistic can not be converted into a numeric and thus does not appear in either the statistics report or the performance report. The statistic report only explicitly marks that one statistics as failed (ie see javac-1-time) whereas the performance report only reports on that so making it more obvious. Anyhoo it is probably pretty easy bit of code hackery on the perl script to get it failing in cases of failed matches.
Show
David Grove added a comment - We need to fix this..(verified it is still happening. for example: http://jikesrvm.anu.edu.au/cattrack/results/cayenne.anu.edu.au/sanity-harmony/5282/development/default/SPECjvm98/_213_javac/default/Output.txt)
Hide
Daniel Frampton added a comment -

New results will now be correctly marked as failures. I am looking at doing a retrospective update to the database to fix old values

Show
Daniel Frampton added a comment - New results will now be correctly marked as failures. I am looking at doing a retrospective update to the database to fix old values
Hide
Daniel Frampton added a comment -

I ran a query against cattrack to search for these NOT VALID failures and only found the following since the start of 2008. Since we will pick up new failures, I am closing this issue.

id | start_time | variant | name
-------------------------------+----------
3043 | 2008-03-09 17:06:01 | perf | jack-3

Show
Daniel Frampton added a comment - I ran a query against cattrack to search for these NOT VALID failures and only found the following since the start of 2008. Since we will pick up new failures, I am closing this issue. id | start_time | variant | name -------------------------------+---------- 3043 | 2008-03-09 17:06:01 | perf | jack-3
Hide
Daniel Frampton added a comment -

This was actually two bugs. I fixed perf-jvm98 and have now also fixed the normal SPECjvm98 issue Dave recently pointed to.

Will kick off a cattrack query now to try and find when the latter has been occuring.

Show
Daniel Frampton added a comment - This was actually two bugs. I fixed perf-jvm98 and have now also fixed the normal SPECjvm98 issue Dave recently pointed to. Will kick off a cattrack query now to try and find when the latter has been occuring.
Hide
Daniel Frampton added a comment -

Here are the failures that have occurred for sanity.

id | start_time | variant | name
-------------------------------+-----------
5286 | 2008-09-15 12:00:33 | sanity | _228_jack
5236 | 2008-09-11 12:00:35 | sanity | _228_jack
5219 | 2008-09-10 12:00:37 | sanity | _213_javac
5161 | 2008-09-06 12:00:32 | sanity | _227_mtrt
5161 | 2008-09-06 12:00:32 | sanity | _213_javac
5130 | 2008-09-04 12:00:32 | sanity | _227_mtrt
5079 | 2008-08-31 12:00:34 | sanity | _228_jack
5061 | 2008-08-29 12:00:52 | sanity | _228_jack
4878 | 2008-08-19 12:00:55 | sanity | _213_javac
4878 | 2008-08-19 12:00:55 | sanity | _227_mtrt
4878 | 2008-08-19 12:00:55 | sanity | _228_jack
4628 | 2008-08-03 14:02:24 | sanity | _228_jack
4608 | 2008-08-01 14:04:06 | sanity | _228_jack
3731 | 2008-05-11 14:00:48 | sanity | _228_jack
3494 | 2008-04-19 03:01:28 | sanity | _228_jack
3436 | 2008-04-14 03:01:25 | sanity | _228_jack
3424 | 2008-04-13 03:01:23 | sanity | _228_jack
3038 | 2008-03-09 02:00:37 | sanity | _228_jack
2918 | 2008-02-27 02:01:18 | sanity | _228_jack
2667 | 2008-02-04 02:01:22 | sanity | _228_jack
2634 | 2008-02-01 02:01:08 | sanity | _213_javac
2492 | 2008-01-19 02:01:05 | sanity | _228_jack
2481 | 2008-01-18 02:01:09 | sanity | _213_javac
2409 | 2008-01-11 02:01:09 | sanity | _213_javac
1990 | 2007-11-28 02:01:38 | sanity | SPECjvm98
1982 | 2007-11-27 02:01:12 | sanity | SPECjvm98
1973 | 2007-11-26 02:01:07 | sanity | _228_jack
1962 | 2007-11-25 02:01:06 | sanity | SPECjvm98
1951 | 2007-11-24 02:01:10 | sanity | SPECjvm98
1940 | 2007-11-23 02:01:06 | sanity | SPECjvm98
1928 | 2007-11-22 02:01:08 | sanity | SPECjvm98
1916 | 2007-11-21 02:01:06 | sanity | SPECjvm98
1904 | 2007-11-20 02:22:15 | sanity | SPECjvm98
1892 | 2007-11-19 02:01:04 | sanity | SPECjvm98
1881 | 2007-11-18 02:01:08 | sanity | SPECjvm98
1870 | 2007-11-17 02:00:52 | sanity | SPECjvm98
1802 | 2007-11-11 02:01:08 | sanity | _213_javac
1720 | 2007-11-04 02:01:37 | sanity | _228_jack
1708 | 2007-11-03 02:01:32 | sanity | _228_jack
1696 | 2007-11-02 02:01:25 | sanity | _227_mtrt
1128 | 2007-09-20 03:01:00 | sanity | _228_jack
(41 rows)

Show
Daniel Frampton added a comment - Here are the failures that have occurred for sanity. id | start_time | variant | name -------------------------------+----------- 5286 | 2008-09-15 12:00:33 | sanity | _228_jack 5236 | 2008-09-11 12:00:35 | sanity | _228_jack 5219 | 2008-09-10 12:00:37 | sanity | _213_javac 5161 | 2008-09-06 12:00:32 | sanity | _227_mtrt 5161 | 2008-09-06 12:00:32 | sanity | _213_javac 5130 | 2008-09-04 12:00:32 | sanity | _227_mtrt 5079 | 2008-08-31 12:00:34 | sanity | _228_jack 5061 | 2008-08-29 12:00:52 | sanity | _228_jack 4878 | 2008-08-19 12:00:55 | sanity | _213_javac 4878 | 2008-08-19 12:00:55 | sanity | _227_mtrt 4878 | 2008-08-19 12:00:55 | sanity | _228_jack 4628 | 2008-08-03 14:02:24 | sanity | _228_jack 4608 | 2008-08-01 14:04:06 | sanity | _228_jack 3731 | 2008-05-11 14:00:48 | sanity | _228_jack 3494 | 2008-04-19 03:01:28 | sanity | _228_jack 3436 | 2008-04-14 03:01:25 | sanity | _228_jack 3424 | 2008-04-13 03:01:23 | sanity | _228_jack 3038 | 2008-03-09 02:00:37 | sanity | _228_jack 2918 | 2008-02-27 02:01:18 | sanity | _228_jack 2667 | 2008-02-04 02:01:22 | sanity | _228_jack 2634 | 2008-02-01 02:01:08 | sanity | _213_javac 2492 | 2008-01-19 02:01:05 | sanity | _228_jack 2481 | 2008-01-18 02:01:09 | sanity | _213_javac 2409 | 2008-01-11 02:01:09 | sanity | _213_javac 1990 | 2007-11-28 02:01:38 | sanity | SPECjvm98 1982 | 2007-11-27 02:01:12 | sanity | SPECjvm98 1973 | 2007-11-26 02:01:07 | sanity | _228_jack 1962 | 2007-11-25 02:01:06 | sanity | SPECjvm98 1951 | 2007-11-24 02:01:10 | sanity | SPECjvm98 1940 | 2007-11-23 02:01:06 | sanity | SPECjvm98 1928 | 2007-11-22 02:01:08 | sanity | SPECjvm98 1916 | 2007-11-21 02:01:06 | sanity | SPECjvm98 1904 | 2007-11-20 02:22:15 | sanity | SPECjvm98 1892 | 2007-11-19 02:01:04 | sanity | SPECjvm98 1881 | 2007-11-18 02:01:08 | sanity | SPECjvm98 1870 | 2007-11-17 02:00:52 | sanity | SPECjvm98 1802 | 2007-11-11 02:01:08 | sanity | _213_javac 1720 | 2007-11-04 02:01:37 | sanity | _228_jack 1708 | 2007-11-03 02:01:32 | sanity | _228_jack 1696 | 2007-11-02 02:01:25 | sanity | _227_mtrt 1128 | 2007-09-20 03:01:00 | sanity | _228_jack (41 rows)
Hide
David Grove added a comment -

As shown by the harmony sanity run 5311, we're now correctly detecting the _213_javac failure instead of reporting it as a success. Thanks Daniel!

Show
David Grove added a comment - As shown by the harmony sanity run 5311, we're now correctly detecting the _213_javac failure instead of reporting it as a success. Thanks Daniel!

People

Vote (0)
Watch (0)

Dates

  • Created:
    Updated:
    Resolved: