History | Log In     View a printable version of the current page.  
Issue Details (XML | Word | Printable)

Key: MNG-3379
Type: New Feature New Feature
Status: In Progress In Progress
Priority: Major Major
Assignee: Brett Porter
Reporter: Don Brown
Votes: 64
Watchers: 36
Operations

If you were logged in you would be able to see more operations.
Maven 2

Parallel resolution of artifacts

Created: 27/Jan/08 04:43 AM   Updated: 12/Sep/08 11:07 AM
Component/s: Artifacts and Repositories
Affects Version/s: 2.0.8
Fix Version/s: 2.0.11

Time Tracking:
Not Specified

File Attachments: 1. File parallel-resolution-2.diff (9 kb)
2. Text File parallel-resolution-3.diff (9 kb)
3. Text File parallel-resolution.diff (8 kb)

Issue Links:
Duplicate
 

Patch Submitted: Yes


 Description  « Hide
Artifacts should be resolved in parallel, grouped by group id's to get around the lack of synchronization in the local repository. The patch does the following:
  • Use a ThreadPoolExecutor to parallelize artifact resolution, but takes care not to resolve multiple artifacts from the same group id simultaneously. (requires Java 5)
  • Makes the http wagon the default instead of the poor performing http-client

Disadvantages:

  • Requires Java 5, but the backport jars could be substituted pretty easily
  • Breaks some plugins due to commons-logging being in the Maven uber jar (required by commons-httpclient), notably the apt plugin (maybe more should use the isolatedRealm setting?)
  • Screws up the progress monitor as multiple threads are updating it

Advantages:

  • Much faster when combined with the http wagon (WAGON-98). I was seeing 40% improvement on some test builds.


 All   Comments   Work Log   Change History      Sort Order: Ascending order - Click to sort in descending order
Chris Custine - 28/Jan/08 11:50 AM
I like the way this is going, but if I am not mistaken it appears that the patch ignores certain settings such as -Dmaven.repo.local and mirror settings in ~/.m2/settings.xml.

Don Brown - 28/Jan/08 06:45 PM
How so? The patch simply schedules the artifact resolution differently, but the same resolution code should work.

Don Brown - 28/Jan/08 06:46 PM
Updated patch that fixes a few concurrency issues, adds support for bigger builds, updates all wagon deps to rc1-SNAPSHOT

Don Brown - 29/Jan/08 06:13 AM
Updated patch (version 3) that works with Java 1.4 and eliminates problem with commons-logging (depends on WAGONHTTP-17)

Chris Custine - 29/Jan/08 09:34 AM
Hey Don, you are right, it isn't your patch that broke this. I was testing this patch against 2.0.9-SNAPSHOT from SVN and got your changes intermingled with some recent updates from SVN which broke maven.repo.local setting.

Brett Porter - 30/Jan/08 06:04 AM
Hey Chris - have you created a new issue for the problems you are experiencing?

Brett Porter - 03/Feb/08 04:26 PM
Don, can you clarify the 3rd patch? It seems that you are using the normal wagon-http, not the standalone one from WAGONHTTP-17 - is this because you just substituted it in your local repo, or did you actually get it working with the standard dependency?

Don Brown - 03/Feb/08 04:53 PM
I substituted it in my local repo. In my latest -db release, I changed the version to be more clear.

Brett Porter - 03/Feb/08 05:02 PM
do you have a test project that fails if you use standard wagon-http, but works with your version? The ones I've tested worked without making the substitution and I want to check the shade plugin is working as expected... you mentioned the apt plugin?

Don Brown - 03/Feb/08 05:07 PM
Try struts2-core - https://svn.apache.org/repos/asf/struts/struts2/trunk/core It uses the APT plugin, which depends on commons-logging. Another one I've heard of is the xdoclet plugin.

Brett Porter - 03/Feb/08 06:07 PM
that worked. I'll be integrating this on another branch since some modifications are still needed and I noticed your own branch has moved on already

Brett Porter - 03/Feb/08 06:14 PM
branch is at https://svn.apache.org/repos/asf/maven/sandbox/branches/maven/MNG-3379

Here are the changes I'd like to review:

  • formatting
  • are RuntimeExceptions appropriate? Seems more like they should be normal artifact resolution failures and the list of missing artifacts stuck together
  • may need a different progress monitor since this shows no progress any more (one that can be done in parallel also is great)
  • I'd like to investigate controlling this by settings (pool size)
  • needs to be able to be merged to Maven trunk somehow (and check compatibility with embedder)

Don Brown - 03/Feb/08 09:44 PM
  • are RuntimeExceptions appropriate? Seems more like they should be normal artifact resolution failures and the list of missing artifacts stuck together
    Agreed.
  • may need a different progress monitor since this shows no progress any more (one that can be done in parallel also is great)
    Yeah, this is the last biggish chunk of work still to do. The progress bar just goes crazy
  • I'd like to investigate controlling this by settings (pool size)
    Makes sense, but in my testing, it didn't see to help much after 5, but that could just be due to my particular multi-core box.

Brett Porter - 03/Feb/08 09:58 PM
on the last one - it was more to be able to limit the number if you don't want to soak your bandwidth too

Don Brown - 03/Feb/08 10:13 PM
Actually, Http Client will already limit your connections to two per server, so you shouldn't have to worry. That might be a good one to make configurable as well.

Brett Porter - 26/Feb/08 06:41 PM
might want to adjust the shading mechanism and do that in the distribution rather than the wagonhttp-17 method

John Casey - 06/Mar/08 08:54 AM
do we need to be concerned at all with file-locking in the repository with this? We have an issue open for concurrent writes into the local repo, MNG-2802, but I expect this to be more about concurrent Maven processes running on the same machine, rather than concurrent retrieval of artifacts within one process. This is mainly because the resolution process will probably not cause the type of file overlap that could cause problems, though in the case of metadata I can see how it might...

Don Brown - 06/Mar/08 02:41 PM
No, you don't need to be worried about file-locking. The retrieval algorithm only retrieves artifacts from different groups in parallel, but ones from the same group in serial, which is where you would need file-locking.

Jeremy Hanna - 01/Jun/08 09:46 PM
Are there any plans to patch this into a version of maven 2? I had hoped it would make it into 2.0.9. I was really looking forward to the speed improvements as a result of parallel/asynchronous downloads of dependencies.

Don Brown - 19/Jul/08 09:21 AM
I recreated the MNG-3379 branch but without the wagon changes. I also improved the console monitor so it now properly handles multiple files being retrieved simultaneously. What more needs to be done before the changes are moved into the 2.0.x branch?

Brett Porter - 20/Jul/08 10:19 PM
the branch for this issue looks ok to me - though I see you've made additional changes on a labeled branch as well.

Is the MNG-3379 branch the complete list of changes you want to submit?


Don Brown - 20/Jul/08 10:44 PM
Yes, the changes on the other branch mostly have to do with backporting the fix to the 2.0.9 tag due to Resource not having a proper hashCode implementation back then.

Brett Porter - 20/Jul/08 10:51 PM
got it - thanks

Brian Fox - 21/Jul/08 09:27 PM
This issue passed the core ITs for 2.0.9 (and most for 2.0.10), i would like to see some more ITs testing the parallel downloads specifically and then we could consider it for .11

Brett Porter - 22/Jul/08 12:48 AM
Brian - what kind of ITs do you envisage for this? Something that downloads a large number of artifacts from a file:// repository and check they all arrive safely?

Brian Fox - 23/Jul/08 09:48 PM
Brett, yes something that would exercise this code. I've been running this branch locally for a few days and haven't noticed any issues yet. Bumping it back for consideration in 2.0.11

John Casey - 12/Sep/08 11:07 AM - edited
Hi Don,

(reposting my email to you here, since I never got a reply)

I was wondering if you could help us get this feature ready for introduction into Maven 2.1.0. We're putting together a release plan for it right now at:

http://docs.codehaus.org/display/MAVEN/Maven+2.1.0+Release+Plan

As you can see, things have changed WRT the versioning in Maven's future. The former 2.1 code - on trunk - has been renamed to 3.0-SNAPSHOT since it contains extensive changes, and we've decided to introduce some intermediary releases that will incorporate less risky new features such as parallel downloads that don't fit on the 2.0.x codeline appropriately. We're also going to be putting 2.0.x in end-of-life mode very soon.

I've heard good things about your patch for parallel downloads, but since I'm trying to push to make sure we have adequate documentation and tests for everything new that goes into Maven, we're still coming up short on this feature.

Can you provide us with a small design document just laying out the problem you solved, including scoping (for instance discussion about groupId limitations, and maybe any known issues on CI systems where multiple Maven processes may be sharing a single local repository with large numbers of parallel downloads happening simultaneously) and a short blurb describing your solution.

Also, we need to create some tests that we can use to guard against regressions in the future. Since you know this code the best, it'd be very helpful if you could help us design some test cases that might verify key pieces of the code to make sure nothing is broken (for instance, nothing that might introduce threading problems or file-locking issues).

If you can provide some of this information, it will allow us to put this feature into 2.1.0. Otherwise, I'm not sure whether we have the resource-hours available to grok the code, reconstruct the design decisions you took, and come up with adequate tests.