
Key: MPLINKCHECK-23
Type: Improvement
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Arnaud Heritier
Reporter: Ignacio G. Mac Dowell
Votes: 0
Watchers: 0
Maven 1.x LinkCheck Plugin

Improve linkcheck performance (2x+) by replacing the jtidy dependency with regexps

Created: 30/Aug/05 07:32 AM   Updated: 14/Jan/06 06:39 PM
Component/s: None
Affects Version/s: 1.3.4
Fix Version/s: 1.4

Time Tracking:
Not Specified

File Attachments: 1. Text File linkcheck.patch (17 kb)

Issue Links:
Related
 


Description
At the moment, the linkcheck plugin uses jtidy and xpath for retrieving all links. IMHO regexps would work much faster/better than the jtidy/xpath combination.

The following regexp would be a replacement for the xpath expressions:

<(?>link|a|img|script)[^>]*?(?>href|src)\s*?=\s*?[\"'](.*?)[\"'][^>]*?
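
A minimal sketch of how a pattern like this could be applied in Java follows. It is not the code from the attached patch: the names `LinkExtractor` and `extractLinks` are made up for illustration, and the pattern is assumed to use a lazy `(.*?)` capture group and a `[^>]*?` tail, since a group matching at most one character could not capture a URL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the linkcheck plugin.
final class LinkExtractor {

    // Assumed form of the pattern: group 1 lazily captures the quoted
    // href/src value of <link>, <a>, <img> and <script> tags.
    private static final Pattern LINK = Pattern.compile(
        "<(?>link|a|img|script)[^>]*?(?>href|src)\\s*?=\\s*?[\"'](.*?)[\"'][^>]*?");

    // Returns every captured href/src value, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"index.html\">Home</a> <img src='logo.png'>";
        System.out.println(extractLinks(html)); // prints [index.html, logo.png]
    }
}
```

Note that the pattern is compiled without `Pattern.CASE_INSENSITIVE`, so upper-case tags such as `<A HREF=...>` would not match in this sketch.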

All tests pass with this regexp, and in the ws-jaxme project I am getting these results for maven-linkcheck-plugin:clearcache maven-linkcheck-plugin:report-real:

with jtidy/xpath: Total time: 2 minutes 43 seconds
with regexps: Total time: 1 minute 10 seconds

I am sure some regexp guru can improve the performance of this.

I have a question, though. Are mailto links supposed to count as checkable? IMO no.
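
If mailto links were to be excluded, one possible approach is a small predicate applied to each extracted link before it is checked. This is only a sketch under that assumption: `LinkFilter` and `isCheckable` are hypothetical names, and nothing here reflects how the plugin actually decides.

```java
import java.util.Locale;

// Hypothetical filter for illustration; not part of the linkcheck plugin.
final class LinkFilter {

    // Treats a link as checkable unless it uses the mailto: scheme.
    static boolean isCheckable(String link) {
        String normalized = link.trim().toLowerCase(Locale.ROOT);
        return !normalized.startsWith("mailto:");
    }

    public static void main(String[] args) {
        System.out.println(isCheckable("mailto:dev@example.org")); // prints false
        System.out.println(isCheckable("index.html"));             // prints true
    }
}
```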

PS: Also, IMO the createDocument method in LinkCheck should use a try/finally block.
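
The body of createDocument is not shown in this issue, so the following is only a generic sketch of the try/finally shape being suggested. It assumes the method's concern is a stream that must be closed even when reading fails; `TryFinallySketch` and `readAll` are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical example of the try/finally shape; not the plugin's createDocument.
final class TryFinallySketch {

    // Reads everything from the reader and always closes it afterwards.
    static String readAll(Reader in) throws IOException {
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } finally {
            in.close(); // runs whether reading succeeded or threw
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAll(new StringReader("<html></html>"))); // prints <html></html>
    }
}
```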



Comments
Carlos Sanchez - 30/Aug/05 12:11 PM
Looks good. I'll take a closer look when I have more time.

Carlos Sanchez - 27/Oct/05 05:33 PM
Created a MAVEN_LINKCHECK_1_4 branch for this issue

Arnaud Heritier - 14/Jan/06 06:38 PM
I'm convinced.
I tested your patch with the new plugin where I can view all counts.
I compared results and it's good. You find more links and the parsing is faster.
With the Maven core docs I obtain:
3 minutes 29 seconds without your patch
2 minutes 5 seconds with your patch
Furthermore, it fixes MPLINKCHECK-20.

Arnaud Heritier - 14/Jan/06 06:39 PM
Applied. Thanks a lot.