Maven 1.x LinkCheck Plugin

Improve linkcheck performance (2x+) by replacing the jtidy dependency with regexps

Details

  • Type: Improvement
  • Status: Closed
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: 1.3.4
  • Fix Version/s: 1.4
  • Component/s: None
  • Labels:
    None
  • Number of attachments:
    1

Description

At the moment, the linkcheck plugin uses jtidy and XPath to retrieve all links. IMHO regexps would work much faster and better than the jtidy/XPath combination.

The following regexp could replace the XPath expressions:

<(?>link|a|img|script)[^>]*?(?>href|src)\s*?=\s*?[\"'](.*?)[\"'][^>]*?

All tests pass with this regexp, and on the ws-jaxme project I get the following results for the goals maven-linkcheck-plugin:clearcache and maven-linkcheck-plugin:report-real:

with jtidy/xpath: Total time: 2 minutes 43 seconds
with regexps: Total time: 1 minutes 10 seconds
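Converting the two totals to seconds shows the speedup behind the "2x+" in the title. These are total times for the whole run, not parsing time in isolation:

```latex
t_{\text{jtidy/xpath}} = 2\,\text{min}\;43\,\text{s} = 163\,\text{s}, \qquad
t_{\text{regexp}} = 1\,\text{min}\;10\,\text{s} = 70\,\text{s}

\text{speedup} = \frac{163}{70} \approx 2.33, \qquad
\text{time saved} = \frac{163 - 70}{163} \approx 57\%
```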

I'm sure a regexp guru could improve its performance further.
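A minimal sketch of how the regexp could be applied from Java with java.util.regex. The class and method names (LinkExtractor, extractLinks) are hypothetical, not the plugin's code. Two assumptions: the pattern uses `[^>]*?` and `\s*?` so that other attributes may precede href/src, and it is compiled case-insensitively because HTML tag names are not case-sensitive. Compiling the Pattern once and reusing it is one cheap performance win, since Pattern instances are immutable and safe to share.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not the plugin's actual class) showing how the
 * link regexp from this issue could be applied with java.util.regex.
 */
public class LinkExtractor {

    // Compiled once and reused for every page; Pattern is immutable and thread-safe.
    // Assumption: "[^>]*?" (not "[^>]?") so other attributes may precede href/src.
    // Assumption: CASE_INSENSITIVE so that <A HREF=...> is matched as well.
    private static final Pattern LINK_PATTERN = Pattern.compile(
            "<(?>link|a|img|script)[^>]*?(?>href|src)\\s*?=\\s*?[\"'](.*?)[\"'][^>]*?",
            Pattern.CASE_INSENSITIVE);

    /** Returns the value of every href/src attribute on link, a, img and script tags. */
    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<String>();
        Matcher m = LINK_PATTERN.matcher(html);
        while (m.find()) {
            links.add(m.group(1)); // group 1 is the quoted attribute value
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a class=\"nav\" href=\"index.html\">Home</a>"
                + "<img src='logo.png'/>"
                + "<script type=\"text/javascript\" src=\"site.js\"></script>";
        System.out.println(extractLinks(html));
    }
}
```

Running main prints [index.html, logo.png, site.js]: the extra class and type attributes before href/src are skipped by the lazy `[^>]*?`.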

I have a question, though: should mailto links count as checkable? IMO they should not.
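If mailto links are to be skipped, the filter could be as simple as the sketch below. LinkFilter, isCheckable and checkableOnly are hypothetical names, and the case-insensitive "mailto:" prefix test is an assumption about how such links would be recognized.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical filter that drops mailto links before they are checked. */
public class LinkFilter {

    /** Returns false for mailto: links, since there is no page to fetch for them. */
    public static boolean isCheckable(String link) {
        String normalized = link.trim().toLowerCase(Locale.ROOT);
        return !normalized.startsWith("mailto:");
    }

    /** Keeps only the links that isCheckable accepts, preserving their order. */
    public static List<String> checkableOnly(List<String> links) {
        List<String> result = new ArrayList<String>();
        for (String link : links) {
            if (isCheckable(link)) {
                result.add(link);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> links = new ArrayList<String>();
        links.add("index.html");
        links.add("mailto:dev@example.org");
        links.add("MAILTO:users@example.org");
        System.out.println(checkableOnly(links)); // prints [index.html]
    }
}
```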

PS: Also, IMO the createDocument method in LinkCheck should use a try/finally block.
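A sketch of the suggested try/finally pattern. DocumentLoader and this createDocument(File) signature are hypothetical stand-ins, not the actual LinkCheck method, and the sketch reads the file into a String (assumed UTF-8) instead of building a DOM. The point is only that the stream is closed in the finally block whether reading succeeds or throws.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;

public class DocumentLoader {

    /**
     * Hypothetical stand-in for LinkCheck.createDocument: reads the file into a String.
     * The try/finally guarantees the stream is closed even if reading throws.
     */
    public static String createDocument(File file) throws IOException {
        InputStream in = new FileInputStream(file);
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toString("UTF-8"); // assumption: pages are UTF-8 encoded
        } finally {
            in.close(); // runs on normal return and on exception alike
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("linkcheck", ".html");
        tmp.deleteOnExit();
        FileWriter w = new FileWriter(tmp);
        try {
            w.write("<a href=\"index.html\">Home</a>");
        } finally {
            w.close();
        }
        System.out.println(createDocument(tmp)); // prints <a href="index.html">Home</a>
    }
}
```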


Activity

Carlos Sanchez added a comment -

Looks good. I'll take a closer look when I have more time.

Carlos Sanchez added a comment -

Created a MAVEN_LINKCHECK_1_4 branch for this issue

Arnaud Heritier added a comment -

I'm convinced.
I tested your patch with the new plugin, which shows all the counts.
I compared the results and they are good: your patch finds more links and parses faster.
With the Maven core docs I obtain:
3 minutes 29 seconds without your patch
2 minutes 5 seconds with your patch
Furthermore, it fixes MPLINKCHECK-20.
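For comparison with the "2x+" in the title, the core-doc totals quoted in this comment work out as follows (again total run times, not parsing time alone):

```latex
t_{\text{without patch}} = 3\,\text{min}\;29\,\text{s} = 209\,\text{s}, \qquad
t_{\text{with patch}} = 2\,\text{min}\;5\,\text{s} = 125\,\text{s}

\text{speedup} = \frac{209}{125} \approx 1.67, \qquad
\text{time saved} = \frac{209 - 125}{209} \approx 40\%
```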

Arnaud Heritier added a comment -

Applied. Thanks a lot.
