RVM
  1. RVM
  2. RVM-884

MMTk stats can be hard for humans to read

    Details

    • Type: Improvement Improvement
    • Status: Open Open
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.1.4
    • Component/s: MMTk
    • Labels:
      None
    • Patch Submitted:
      Yes
    • Number of attachments :
      2

      Description

      MMTk attempts to print its statistics in a single row like so:

      {{
      ============================ MMTk Statistics Totals ============================
      GC time.mu time.gc perf.mu perf.gc refType scan finalize prepare precopy stacks root forward release init finish L1I_MISSES.mu L1I_MISSES.gc
      7 6751.76 5881.21 0 0 99.35 5038.20 22.58 0.28 0.92 4.99 713.75 0.00 0.65 0.12 0.14 2330019756995 1443109010784
      Total time: 12632.98 ms
      ------------------------------ End MMTk Statistics -----------------------------
      }}

      Hopefully the problems with this approach are clear to see from a human readers perspective: i) Headers do not always line up with values, ii) It's even harder to read with many counters as the output becomes wider than your terminal

      Attached is a trivial patch that instead prints one statistic per line, like so:

      {{
      ============================ MMTk Statistics Totals ============================
      GC: 7
      time.mu: 6872.53
      time.gc: 5854.95
      perf.mu: 0
      perf.gc: 0
      refType: 101.65
      scan: 5008.27
      finalize: 22.64
      prepare: 0.30
      precopy: 0.89
      stacks: 4.79
      root: 715.20
      forward: 0.00
      release: 0.66
      init: 0.16
      finish: 0.18
      L1I_MISSES.mu: 1569810545957(SCALED)
      L1I_MISSES.gc: 951335255621(SCALED)
      Total time: 12727.48 ms
      ------------------------------ End MMTk Statistics -----------------------------
      }}

      As a human I certainly prefer the second output, however I have no idea how many scripts this change would break. If a consensus can be reached and more human readability is desired then perhaps this patch can be applied.

      This work is motivated by another patch that I am about to submit that increases the number of statistics MMTk reports (thus increasing problems of wide output)

      Kind regards
      Laurence

      1. makeMMTkStatsHumanReadable.patch
        3 kB
        Laurence Hellyer
      2. statementOfContribution.txt
        0.5 kB
        Laurence Hellyer

        Activity

        Hide
        Andreas Sewe added a comment -

        Although I agree that the current format for MMTk statistics is not very friendly to humans, especially when hardware performance counters or Immix collectors add their totals, there is one thing that makes it IMHO superior to the proposed format: AWK-friendliness. Newlines simply don't make for very good field separators.

        Show
        Andreas Sewe added a comment - Although I agree that the current format for MMTk statistics is not very friendly to humans, especially when hardware performance counters or Immix collectors add their totals, there is one thing that makes it IMHO superior to the proposed format: AWK-friendliness. Newlines simply don't make for very good field separators.
        Hide
        Laurence Hellyer added a comment -

        Andreas makes a good point about the new format not being AWK friendly. I guess a counter-point would be increased Grep friendliness which might allow more generic scripts that could be ported from one experiment to another (there is no guarantee that the Nth column will always be a particular counter, but grep'ing for a Counter name might be safer).

        As an alternative it would be trivial to rework the patch so that a command line argument specified the output format (with the default being the old output format). Having just thought of this option, this would be the one I would vote for...

        What do others think?

        Show
        Laurence Hellyer added a comment - Andreas makes a good point about the new format not being AWK friendly. I guess a counter-point would be increased Grep friendliness which might allow more generic scripts that could be ported from one experiment to another (there is no guarantee that the Nth column will always be a particular counter, but grep'ing for a Counter name might be safer). As an alternative it would be trivial to rework the patch so that a command line argument specified the output format (with the default being the old output format). Having just thought of this option, this would be the one I would vote for... What do others think?
        Hide
        Robin Garner added a comment -

        For machine readable stats there's already the xml option, so I don't think machine readability should be a major consideration here. That being said, the existing format has definitely outgrown its usefulness - it was fine when we had 6 or so counters, but not so much any more.

        I originally added the xml format with the intention that it would become the canonical machine-readable format, and chose xml largely because people around me were moving that way (rather than some deep abiding attachment to the format). So if there's a desire to provide better machine-readable I'd prefer to rework the xml output, and make the human readable form as easy on the eye as possible.

        Show
        Robin Garner added a comment - For machine readable stats there's already the xml option, so I don't think machine readability should be a major consideration here. That being said, the existing format has definitely outgrown its usefulness - it was fine when we had 6 or so counters, but not so much any more. I originally added the xml format with the intention that it would become the canonical machine-readable format, and chose xml largely because people around me were moving that way (rather than some deep abiding attachment to the format). So if there's a desire to provide better machine-readable I'd prefer to rework the xml output, and make the human readable form as easy on the eye as possible.
        Hide
        Steve Blackburn added a comment -

        I agree that the numbers are not very human readable, and could be improved. Andreas' point is key though---machine readability is the primary concern because individual runs are not particularly meaningful on their own, so in general the user will need to be pushing this through a script and aggregating/analyzing the data. However, I agree that when debugging human readability is useful. In such cases I generally resort to 'cut'

        I'm happy to see it improved. Any more thoughts on what the best solution might look like?

        Show
        Steve Blackburn added a comment - I agree that the numbers are not very human readable, and could be improved. Andreas' point is key though---machine readability is the primary concern because individual runs are not particularly meaningful on their own, so in general the user will need to be pushing this through a script and aggregating/analyzing the data. However, I agree that when debugging human readability is useful. In such cases I generally resort to 'cut' I'm happy to see it improved. Any more thoughts on what the best solution might look like?
        Hide
        Andreas Sewe added a comment -

        Laurence raised the point that there is no guarantee that the Nth column will always be a particular counter, but grep'ing for a Counter name might be safer. Getting AWK to sanity-check the current column-oriented format is unfortunately not straight-forward. This definitly should be improved, as you don't want to AWK, Perl, and the like to interpret a string as numerical 0. Such a bug easily messes up your aggregates, but may go undetected for quite some time.

        That being said, I agree with Steve that machine-readability is the primary goal here; you can always pipe the output through a small script to prettify it. It's the other direction that's hard. So what about the following (Tabs as \t)?

        MMTk Statistics:\tGC\t7time.mu\t6872.53\ttime.gc:\t5854.95 [...] L1I_MISSES.gc\t951335255621\t(SCALED)\tTotal time\t12727.48

        As the single statistics line starts with MMTk Statistics (the space is for humans; thus, AWK needs to asked to split on tabs only) it's easy to sanitze the VM's output with plain old grep. You could then process and sanity-check this with awk -F '\t':

        {{$4 != "time.mu"

        { exit 1 }

        { mutime += $5 }

        }}

        Writing a script that turns this into output as proposed by Laurence is easy, too. The only thing that makes this task not completely straight-forward is the presence of the (SCALED) attribute, which occupies its own, out-of-order column in the above example. If, however, all statistics would get a free-text attribute (which may of course be empty, hold a unit like ms, or whatever), we had tab-separated triples all the way.

        Show
        Andreas Sewe added a comment - Laurence raised the point that there is no guarantee that the Nth column will always be a particular counter, but grep'ing for a Counter name might be safer. Getting AWK to sanity-check the current column-oriented format is unfortunately not straight-forward. This definitly should be improved, as you don't want to AWK, Perl, and the like to interpret a string as numerical 0. Such a bug easily messes up your aggregates, but may go undetected for quite some time. That being said, I agree with Steve that machine-readability is the primary goal here; you can always pipe the output through a small script to prettify it. It's the other direction that's hard. So what about the following (Tabs as \t )? MMTk Statistics:\tGC\t7time.mu\t6872.53\ttime.gc:\t5854.95 [...] L1I_MISSES.gc\t951335255621\t(SCALED)\tTotal time\t12727.48 As the single statistics line starts with MMTk Statistics (the space is for humans; thus, AWK needs to asked to split on tabs only) it's easy to sanitze the VM's output with plain old grep . You could then process and sanity-check this with awk -F '\t' : {{$4 != "time.mu" { exit 1 } { mutime += $5 } }} Writing a script that turns this into output as proposed by Laurence is easy, too. The only thing that makes this task not completely straight-forward is the presence of the (SCALED) attribute, which occupies its own, out-of-order column in the above example. If, however, all statistics would get a free-text attribute (which may of course be empty, hold a unit like ms , or whatever), we had tab-separated triples all the way.
        Hide
        Steve Blackburn added a comment -

        I don't understand the need for "(SCALED)". If that adjective is really required, it can be placed in the name of the value.

        Show
        Steve Blackburn added a comment - I don't understand the need for "(SCALED)". If that adjective is really required, it can be placed in the name of the value.
        Hide
        Laurence Hellyer added a comment -

        Hi Steve,

        If you attempt to measure more events than are simultaneously supported by the underlying hardware then the kernel perf_events subsystem will multiplex the events onto the available hardware counters. The results reported by my patch are the raw event value scaled by the time it was permitted to run for. Scaled results cannot directly be compared together (as there is no guarantee that the events were running at the same time and therefore monitoring the same program behaviour). The "(SCALED)" flag is a warning that a particular counter value has been scaled in this way.

        For example, measuring just one event we only need to report:

        L1I_MISSES.mu: 1629940088073

        However, for example on a Core 2 Duo the following three events have been multiplexed and scaled appropriately:

        L1I_MISSES.mu: 1629940088073 (SCALED)
        LLC_MISSES.mu: 1546188225840 (SCALED)
        PERF_COUNT_HW_CACHE_L1D:MISS.mu 19434083 (SCALED)

        It is feasible that some events could by multiplexed at the same time as other's being run 100% of the time therefore the "(SCALED)" flag really needs to be reported on a per counter basis.

        As Daniel commented in RVM-885 some counters can be "CONTENDED" and not run at all. Rather than report a value of 0 for a counter that does not have any results (which I think would be miss leading) I choose to report it as:

        L1I_MISSES.mu: CONTENDED

        I look forward to your thoughts

        Laurence

        Show
        Laurence Hellyer added a comment - Hi Steve, If you attempt to measure more events than are simultaneously supported by the underlying hardware then the kernel perf_events subsystem will multiplex the events onto the available hardware counters. The results reported by my patch are the raw event value scaled by the time it was permitted to run for. Scaled results cannot directly be compared together (as there is no guarantee that the events were running at the same time and therefore monitoring the same program behaviour). The "(SCALED)" flag is a warning that a particular counter value has been scaled in this way. For example, measuring just one event we only need to report: L1I_MISSES.mu: 1629940088073 However, for example on a Core 2 Duo the following three events have been multiplexed and scaled appropriately: L1I_MISSES.mu: 1629940088073 (SCALED) LLC_MISSES.mu: 1546188225840 (SCALED) PERF_COUNT_HW_CACHE_L1D:MISS.mu 19434083 (SCALED) It is feasible that some events could by multiplexed at the same time as other's being run 100% of the time therefore the "(SCALED)" flag really needs to be reported on a per counter basis. As Daniel commented in RVM-885 some counters can be "CONTENDED" and not run at all. Rather than report a value of 0 for a counter that does not have any results (which I think would be miss leading) I choose to report it as: L1I_MISSES.mu: CONTENDED I look forward to your thoughts Laurence
        Hide
        Steve Blackburn added a comment -

        Hi Laurence,

        Yeah, I understand how the perf counters work

        I'm just talking about the formatting. We currently have a descriptive field. I was suggesting that you use that descriptive field to convey this information rather than introduce a new field to the format.

        --Steve

        Show
        Steve Blackburn added a comment - Hi Laurence, Yeah, I understand how the perf counters work I'm just talking about the formatting. We currently have a descriptive field. I was suggesting that you use that descriptive field to convey this information rather than introduce a new field to the format. --Steve
        Hide
        Richard Jones added a comment -

        I agree that machine-readability is the most important factor here. But I'd hope that we can come up with some format that is also easy for people to read. I think Laurence has a good point about the current stats being hard to read sometimes. Maybe they are also slightly tricky for a program to read (e.g. it must read the heading row and then use that to interpret subsequent rows). But as Steve says, cut works as well.

        The problem with Andreas's solution is that it overloads tabs, using them to separate records and fields. I'd argue for a format that distinguished these, e.g. by using =s, commas and tabs.

        Steve asked, why introduce a new field, e.g. ("SCALED"), rather than place it in the name of the value. I'd argue against that as SCALED is just another attribute like the value.

        For example (putting spaces around \t here just to make it a little easier to read),

        GC=7 \t time.mu=6872.53 \t time.gc=5854.95 \t L1I_MISSES.gc=951335255621,SCALED \t Total time=12727.48

        This format is easily parsable, e.g. with 3 lines of perl. It's also amenable to cut (assuming that the order is fixed). It's pretty human readable. although output may flow over several lines on the screen), and names and other attributes are tightly tied together.

        Richard

        Show
        Richard Jones added a comment - I agree that machine-readability is the most important factor here. But I'd hope that we can come up with some format that is also easy for people to read. I think Laurence has a good point about the current stats being hard to read sometimes. Maybe they are also slightly tricky for a program to read (e.g. it must read the heading row and then use that to interpret subsequent rows). But as Steve says, cut works as well. The problem with Andreas's solution is that it overloads tabs, using them to separate records and fields. I'd argue for a format that distinguished these, e.g. by using =s, commas and tabs. Steve asked, why introduce a new field, e.g. ("SCALED"), rather than place it in the name of the value. I'd argue against that as SCALED is just another attribute like the value. For example (putting spaces around \t here just to make it a little easier to read), GC=7 \t time.mu=6872.53 \t time.gc=5854.95 \t L1I_MISSES.gc=951335255621,SCALED \t Total time=12727.48 This format is easily parsable, e.g. with 3 lines of perl. It's also amenable to cut (assuming that the order is fixed). It's pretty human readable. although output may flow over several lines on the screen), and names and other attributes are tightly tied together. Richard
        Hide
        David Grove added a comment -

        bulk defer open issues to 3.1.2

        Show
        David Grove added a comment - bulk defer open issues to 3.1.2
        Hide
        David Grove added a comment -

        Bulk defer to 3.1.3; not essential to address for 3.1.2.

        Show
        David Grove added a comment - Bulk defer to 3.1.3; not essential to address for 3.1.2.
        Hide
        David Grove added a comment -

        bulk defer issues to 3.1.4

        Show
        David Grove added a comment - bulk defer issues to 3.1.4

          People

          • Assignee:
            Unassigned
            Reporter:
            Laurence Hellyer
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated: