Maven 1.x LinkCheck Plugin
  MPLINKCHECK-23

Improve linkcheck performance (2x+) by replacing the jtidy dependency with regexps

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.4
    • Fix Version/s: 1.4
    • Labels: None
    • Number of attachments: 1

      Description

      At the moment, the linkcheck plugin uses jtidy and xpath for retrieving all links. IMHO, regexps would work much faster and better than the jtidy/xpath combination.

      The following regexp would be a replacement for the xpath expressions:

      <(?>link|a|img|script)[^>]*?(?>href|src)\s*?=\s*?[\"'](.*?)[\"'][^>]*?
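To make the idea concrete, here is a minimal Java sketch of regex-based link extraction. It uses a pattern of the same shape as the one above, written with `*` quantifiers; that form is an assumption about what was intended, and the class and method names (`RegexLinkExtractor`, `extractLinks`) are hypothetical. It is not taken from linkcheck.patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the plugin: collects href/src values with one regexp.
final class RegexLinkExtractor {

    // Assumed form of the pattern. (?>...) is an atomic group; group 1 captures
    // the attribute value between the quotes. The \" is Java string escaping.
    private static final Pattern LINK = Pattern.compile(
        "<(?>link|a|img|script)[^>]*?(?>href|src)\\s*?=\\s*?[\"'](.*?)[\"'][^>]*?");

    // Returns every captured href/src value, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<String>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"index.html\">Home</a> <img src='logo.png'/>";
        System.out.println(extractLinks(html));
    }
}
```

Note that a pattern like this also captures `mailto:` targets, so any decision to skip them would have to be made on the extracted values.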

      All tests pass with this regexp. For the ws-jaxme project, running maven-linkcheck-plugin:clearcache maven-linkcheck-plugin:report-real gives these results:

      with jtidy/xpath: Total time: 2 minutes 43 seconds
      with regexps: Total time: 1 minutes 10 seconds

      I am sure some regexp guru can improve the performance of this.

      I have a question, though. Are mailto links supposed to count as checkable? IMO no.

      PS: Also, IMO the body of the createDocument method in LinkCheck should be wrapped in a try/finally block.
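As an illustration of the try/finally suggestion, the sketch below assumes that createDocument reads a page from an input stream; the real method body is not shown in this issue, so `TryFinallySketch.readAll` is a hypothetical stand-in. The point is only that the stream is closed whether or not reading fails.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for createDocument; shows the try/finally shape only.
final class TryFinallySketch {

    // Reads the stream as single-byte characters and always closes it.
    static String readAll(InputStream in) {
        try {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            // Wrapped so callers of this sketch need no throws clause.
            throw new UncheckedIOException(e);
        } finally {
            // Runs on both the normal and the exceptional path.
            try {
                in.close();
            } catch (IOException ignored) {
                // Nothing useful can be done if close itself fails.
            }
        }
    }
}
```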

      1. linkcheck.patch
        17 kB
        Ignacio G. Mac Dowell

          Activity

          Carlos Sanchez added a comment -

          Looks good. I'll take a closer look when I have more time.

          Carlos Sanchez added a comment -

          Created a MAVEN_LINKCHECK_1_4 branch for this issue

          Arnaud Heritier added a comment -

          I'm convinced.
          I tested your patch with the new plugin, where I can view all counts.
          I compared the results and they are good: you find more links and the parsing is faster.
          With the Maven core docs I get:
          3 minutes 29 seconds without your patch
          2 minutes 5 seconds with your patch
          Furthermore, it fixes MPLINKCHECK-20.

          Arnaud Heritier added a comment -

          Applied. Thanks a lot.


            People

            • Assignee: Arnaud Heritier
            • Reporter: Ignacio G. Mac Dowell
