SonarQube
  1. SonarQube
  2. SONAR-3060

Add new CPD algorithm based on suffix tree

    Details

    • Type: Improvement Improvement
    • Status: Closed Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.13
    • Component/s: Duplications
    • Labels:
      None
    • Number of attachments :
      0

      Description

      Current algorithm has complexity O( |f|^2 * max(b[i]) ) , where f - size of file in blocks and b[i] - number of blocks from index with same hash as for block i from file. So in a worst case it might lead to O( f^3 ), e.g. if file contains a lot of identical blocks. In such case Sonar CPD might take a lot of time, whereas PMD CPD not.
      Suffix-tree will have complexity O( k ) , where k - total number of blocks (including from index). Also note that this is space-time tradeoff, i.e. suffix-tree will consume a bit more memory in comparison with previous algorithm, but still far less than PMD.

        Issue Links

          Activity

          Hide
          Evgeny Mandrikov added a comment -

          Algorithm was added in f0d96c1, but not used yet.

          Show
          Evgeny Mandrikov added a comment - Algorithm was added in f0d96c1 , but not used yet.
          Hide
          Evgeny Mandrikov added a comment -

          Now it's used - aef8d4f, but previous one was left in place in module sonar-duplications.
          Integration test added.

          Show
          Evgeny Mandrikov added a comment - Now it's used - aef8d4f , but previous one was left in place in module sonar-duplications. Integration test added.
          Hide
          Freddy Mallet added a comment -

          Manually tested

          Show
          Freddy Mallet added a comment - Manually tested
          Hide
          Evgeny Mandrikov added a comment -

          For the record: real-world example can be found in http://markmail.org/message/w4seaqru6mussdn6 , however should be noted that this example probably auto-generated rather than hand-written.

          Show
          Evgeny Mandrikov added a comment - For the record: real-world example can be found in http://markmail.org/message/w4seaqru6mussdn6 , however should be noted that this example probably auto-generated rather than hand-written.

            People

            • Assignee:
              Evgeny Mandrikov
              Reporter:
              Evgeny Mandrikov
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: