Maven
  1. Maven
  2. MNG-3379

Parallel resolution of artifacts

    Details

    • Type: New Feature New Feature
    • Status: Closed Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.8
    • Fix Version/s: 2.1.0
    • Labels:
      None
    • Patch Submitted:
      Yes
    • Number of attachments :
      3

      Description

      Artifacts should be resolved in parallel, grouped by group id's to get around the lack of synchronization in the local repository. The patch does the following:

      • Use a ThreadPoolExecutor to parallelize artifact resolution, but takes care not to resolve multiple artifacts from the same group id simultaneously. (requires Java 5)
      • Makes the http wagon the default instead of the poor performing http-client

      Disadvantages:

      • Requires Java 5, but the backport jars could be substituted pretty easily
      • Breaks some plugins due to commons-logging being in the Maven uber jar (required by commons-httpclient), notably the apt plugin (maybe more should use the isolatedRealm setting?)
      • Screws up the progress monitor as multiple threads are updating it

      Advantages:

      • Much faster when combined with the http wagon (WAGON-98). I was seeing 40% improvement on some test builds.
      1. parallel-resolution.diff
        8 kB
        Don Brown
      2. parallel-resolution-2.diff
        9 kB
        Don Brown
      3. parallel-resolution-3.diff
        9 kB
        Don Brown

        Issue Links

          Activity

          Hide
          Chris Custine added a comment -

          I like the way this is going, but if I am not mistaken it appears that the patch ignores certain settings such as -Dmaven.repo.local and mirror settings in ~/.m2/settings.xml.

          Show
          Chris Custine added a comment - I like the way this is going, but if I am not mistaken it appears that the patch ignores certain settings such as -Dmaven.repo.local and mirror settings in ~/.m2/settings.xml.
          Hide
          Don Brown added a comment -

          How so? The patch simply schedules the artifact resolution differently, but the same resolution code should work.

          Show
          Don Brown added a comment - How so? The patch simply schedules the artifact resolution differently, but the same resolution code should work.
          Hide
          Don Brown added a comment -

          Updated patch that fixes a few concurrency issues, adds support for bigger builds, updates all wagon deps to rc1-SNAPSHOT

          Show
          Don Brown added a comment - Updated patch that fixes a few concurrency issues, adds support for bigger builds, updates all wagon deps to rc1-SNAPSHOT
          Hide
          Don Brown added a comment -

          Updated patch (version 3) that works with Java 1.4 and eliminates problem with commons-logging (depends on WAGONHTTP-17)

          Show
          Don Brown added a comment - Updated patch (version 3) that works with Java 1.4 and eliminates problem with commons-logging (depends on WAGONHTTP-17 )
          Hide
          Chris Custine added a comment -

          Hey Don, you are right, it isn't your patch that broke this. I was testing this patch against 2.0.9-SNAPSHOT from SVN and got your changes intermingled with some recent updates from SVN which broke maven.repo.local setting.

          Show
          Chris Custine added a comment - Hey Don, you are right, it isn't your patch that broke this. I was testing this patch against 2.0.9-SNAPSHOT from SVN and got your changes intermingled with some recent updates from SVN which broke maven.repo.local setting.
          Hide
          Brett Porter added a comment -

          Hey Chris - have you created a new issue for the problems you are experiencing?

          Show
          Brett Porter added a comment - Hey Chris - have you created a new issue for the problems you are experiencing?
          Hide
          Brett Porter added a comment -

          Don, can you clarify the 3rd patch? It seems that you are using the normal wagon-http, not the standalone one from WAGONHTTP-17 - is this because you just substituted it in your local repo, or did you actually get it working with the standard dependency?

          Show
          Brett Porter added a comment - Don, can you clarify the 3rd patch? It seems that you are using the normal wagon-http, not the standalone one from WAGONHTTP-17 - is this because you just substituted it in your local repo, or did you actually get it working with the standard dependency?
          Hide
          Don Brown added a comment -

          I substituted it in my local repo. In my latest -db release, I changed the version to be more clear.

          Show
          Don Brown added a comment - I substituted it in my local repo. In my latest -db release, I changed the version to be more clear.
          Hide
          Brett Porter added a comment -

          do you have a test project that fails if you use standard wagon-http, but works with your version? The ones I've tested worked without making the substitution and I want to check the shade plugin is working as expected... you mentioned the apt plugin?

          Show
          Brett Porter added a comment - do you have a test project that fails if you use standard wagon-http, but works with your version? The ones I've tested worked without making the substitution and I want to check the shade plugin is working as expected... you mentioned the apt plugin?
          Hide
          Don Brown added a comment -

          Try struts2-core - https://svn.apache.org/repos/asf/struts/struts2/trunk/core It uses the APT plugin, which depends on commons-logging. Another one I've heard of is the xdoclet plugin.

          Show
          Don Brown added a comment - Try struts2-core - https://svn.apache.org/repos/asf/struts/struts2/trunk/core It uses the APT plugin, which depends on commons-logging. Another one I've heard of is the xdoclet plugin.
          Hide
          Brett Porter added a comment -

          that worked. I'll be integrating this on another branch since some modifications are still needed and I noticed your own branch has moved on already

          Show
          Brett Porter added a comment - that worked. I'll be integrating this on another branch since some modifications are still needed and I noticed your own branch has moved on already
          Hide
          Brett Porter added a comment -

          branch is at https://svn.apache.org/repos/asf/maven/sandbox/branches/maven/MNG-3379

          Here are the changes I'd like to review:

          • formatting
          • are RuntimeExceptions appropriate? Seems more like they should be normal artifact resolution failures and the list of missing artifacts stuck together
          • may need a different progress monitor since this shows no progress any more (one that can be done in parallel also is great)
          • I'd like to investigate controlling this by settings (pool size)
          • needs to be able to be merged to Maven trunk somehow (and check compatibility with embedder)
          Show
          Brett Porter added a comment - branch is at https://svn.apache.org/repos/asf/maven/sandbox/branches/maven/MNG-3379 Here are the changes I'd like to review: formatting are RuntimeExceptions appropriate? Seems more like they should be normal artifact resolution failures and the list of missing artifacts stuck together may need a different progress monitor since this shows no progress any more (one that can be done in parallel also is great) I'd like to investigate controlling this by settings (pool size) needs to be able to be merged to Maven trunk somehow (and check compatibility with embedder)
          Hide
          Don Brown added a comment -
          • are RuntimeExceptions appropriate? Seems more like they should be normal artifact resolution failures and the list of missing artifacts stuck together
            Agreed.
          • may need a different progress monitor since this shows no progress any more (one that can be done in parallel also is great)
            Yeah, this is the last biggish chunk of work still to do. The progress bar just goes crazy
          • I'd like to investigate controlling this by settings (pool size)
            Makes sense, but in my testing, it didn't see to help much after 5, but that could just be due to my particular multi-core box.
          Show
          Don Brown added a comment - are RuntimeExceptions appropriate? Seems more like they should be normal artifact resolution failures and the list of missing artifacts stuck together Agreed. may need a different progress monitor since this shows no progress any more (one that can be done in parallel also is great) Yeah, this is the last biggish chunk of work still to do. The progress bar just goes crazy I'd like to investigate controlling this by settings (pool size) Makes sense, but in my testing, it didn't see to help much after 5, but that could just be due to my particular multi-core box.
          Hide
          Brett Porter added a comment -

          on the last one - it was more to be able to limit the number if you don't want to soak your bandwidth too

          Show
          Brett Porter added a comment - on the last one - it was more to be able to limit the number if you don't want to soak your bandwidth too
          Hide
          Don Brown added a comment -

          Actually, Http Client will already limit your connections to two per server, so you shouldn't have to worry. That might be a good one to make configurable as well.

          Show
          Don Brown added a comment - Actually, Http Client will already limit your connections to two per server, so you shouldn't have to worry. That might be a good one to make configurable as well.
          Hide
          Brett Porter added a comment -

          might want to adjust the shading mechanism and do that in the distribution rather than the wagonhttp-17 method

          Show
          Brett Porter added a comment - might want to adjust the shading mechanism and do that in the distribution rather than the wagonhttp-17 method
          Hide
          John Casey added a comment -

          do we need to be concerned at all with file-locking in the repository with this? We have an issue open for concurrent writes into the local repo, MNG-2802, but I expect this to be more about concurrent Maven processes running on the same machine, rather than concurrent retrieval of artifacts within one process. This is mainly because the resolution process will probably not cause the type of file overlap that could cause problems, though in the case of metadata I can see how it might...

          Show
          John Casey added a comment - do we need to be concerned at all with file-locking in the repository with this? We have an issue open for concurrent writes into the local repo, MNG-2802 , but I expect this to be more about concurrent Maven processes running on the same machine, rather than concurrent retrieval of artifacts within one process. This is mainly because the resolution process will probably not cause the type of file overlap that could cause problems, though in the case of metadata I can see how it might...
          Hide
          Don Brown added a comment -

          No, you don't need to be worried about file-locking. The retrieval algorithm only retrieves artifacts from different groups in parallel, but ones from the same group in serial, which is where you would need file-locking.

          Show
          Don Brown added a comment - No, you don't need to be worried about file-locking. The retrieval algorithm only retrieves artifacts from different groups in parallel, but ones from the same group in serial, which is where you would need file-locking.
          Hide
          Jeremy Hanna added a comment -

          Are there any plans to patch this into a version of maven 2? I had hoped it would make it into 2.0.9. I was really looking forward to the speed improvements as a result of parallel/asynchronous downloads of dependencies.

          Show
          Jeremy Hanna added a comment - Are there any plans to patch this into a version of maven 2? I had hoped it would make it into 2.0.9. I was really looking forward to the speed improvements as a result of parallel/asynchronous downloads of dependencies.
          Hide
          Don Brown added a comment -

          I recreated the MNG-3379 branch but without the wagon changes. I also improved the console monitor so it now properly handles multiple files being retrieved simultaneously. What more needs to be done before the changes are moved into the 2.0.x branch?

          Show
          Don Brown added a comment - I recreated the MNG-3379 branch but without the wagon changes. I also improved the console monitor so it now properly handles multiple files being retrieved simultaneously. What more needs to be done before the changes are moved into the 2.0.x branch?
          Hide
          Brett Porter added a comment -

          the branch for this issue looks ok to me - though I see you've made additional changes on a labeled branch as well.

          Is the MNG-3379 branch the complete list of changes you want to submit?

          Show
          Brett Porter added a comment - the branch for this issue looks ok to me - though I see you've made additional changes on a labeled branch as well. Is the MNG-3379 branch the complete list of changes you want to submit?
          Hide
          Don Brown added a comment -

          Yes, the changes on the other branch mostly have to do with backporting the fix to the 2.0.9 tag due to Resource not having a proper hashCode implementation back then.

          Show
          Don Brown added a comment - Yes, the changes on the other branch mostly have to do with backporting the fix to the 2.0.9 tag due to Resource not having a proper hashCode implementation back then.
          Hide
          Brett Porter added a comment -

          got it - thanks

          Show
          Brett Porter added a comment - got it - thanks
          Hide
          brianfox brianfox added a comment -

          This issue passed the core ITs for 2.0.9 (and most for 2.0.10), i would like to see some more ITs testing the parallel downloads specifically and then we could consider it for .11

          Show
          brianfox brianfox added a comment - This issue passed the core ITs for 2.0.9 (and most for 2.0.10), i would like to see some more ITs testing the parallel downloads specifically and then we could consider it for .11
          Hide
          Brett Porter added a comment -

          Brian - what kind of ITs do you envisage for this? Something that downloads a large number of artifacts from a file:// repository and check they all arrive safely?

          Show
          Brett Porter added a comment - Brian - what kind of ITs do you envisage for this? Something that downloads a large number of artifacts from a file:// repository and check they all arrive safely?
          Hide
          brianfox brianfox added a comment -

          Brett, yes something that would exercise this code. I've been running this branch locally for a few days and haven't noticed any issues yet. Bumping it back for consideration in 2.0.11

          Show
          brianfox brianfox added a comment - Brett, yes something that would exercise this code. I've been running this branch locally for a few days and haven't noticed any issues yet. Bumping it back for consideration in 2.0.11
          Hide
          John Casey added a comment - - edited

          Hi Don,

          (reposting my email to you here, since I never got a reply)

          I was wondering if you could help us get this feature ready for introduction into Maven 2.1.0. We're putting together a release plan for it right now at:

          http://docs.codehaus.org/display/MAVEN/Maven+2.1.0+Release+Plan

          As you can see, things have changed WRT the versioning in Maven's future. The former 2.1 code - on trunk - has been renamed to 3.0-SNAPSHOT since it contains extensive changes, and we've decided to introduce some intermediary releases that will incorporate less risky new features such as parallel downloads that don't fit on the 2.0.x codeline appropriately. We're also going to be putting 2.0.x in end-of-life mode very soon.

          I've heard good things about your patch for parallel downloads, but since I'm trying to push to make sure we have adequate documentation and tests for everything new that goes into Maven, we're still coming up short on this feature.

          Can you provide us with a small design document just laying out the problem you solved, including scoping (for instance discussion about groupId limitations, and maybe any known issues on CI systems where multiple Maven processes may be sharing a single local repository with large numbers of parallel downloads happening simultaneously) and a short blurb describing your solution.

          Also, we need to create some tests that we can use to guard against regressions in the future. Since you know this code the best, it'd be very helpful if you could help us design some test cases that might verify key pieces of the code to make sure nothing is broken (for instance, nothing that might introduce threading problems or file-locking issues).

          If you can provide some of this information, it will allow us to put this feature into 2.1.0. Otherwise, I'm not sure whether we have the resource-hours available to grok the code, reconstruct the design decisions you took, and come up with adequate tests.

          Show
          John Casey added a comment - - edited Hi Don, (reposting my email to you here, since I never got a reply) I was wondering if you could help us get this feature ready for introduction into Maven 2.1.0. We're putting together a release plan for it right now at: http://docs.codehaus.org/display/MAVEN/Maven+2.1.0+Release+Plan As you can see, things have changed WRT the versioning in Maven's future. The former 2.1 code - on trunk - has been renamed to 3.0-SNAPSHOT since it contains extensive changes, and we've decided to introduce some intermediary releases that will incorporate less risky new features such as parallel downloads that don't fit on the 2.0.x codeline appropriately. We're also going to be putting 2.0.x in end-of-life mode very soon. I've heard good things about your patch for parallel downloads, but since I'm trying to push to make sure we have adequate documentation and tests for everything new that goes into Maven, we're still coming up short on this feature. Can you provide us with a small design document just laying out the problem you solved, including scoping (for instance discussion about groupId limitations, and maybe any known issues on CI systems where multiple Maven processes may be sharing a single local repository with large numbers of parallel downloads happening simultaneously) and a short blurb describing your solution. Also, we need to create some tests that we can use to guard against regressions in the future. Since you know this code the best, it'd be very helpful if you could help us design some test cases that might verify key pieces of the code to make sure nothing is broken (for instance, nothing that might introduce threading problems or file-locking issues). If you can provide some of this information, it will allow us to put this feature into 2.1.0. Otherwise, I'm not sure whether we have the resource-hours available to grok the code, reconstruct the design decisions you took, and come up with adequate tests.
          Hide
          John Casey added a comment -

          Starting consolidation of all 2.1.0-M* versions into 2.1.0-M1, which I'll then rename into 2.1.0

          Show
          John Casey added a comment - Starting consolidation of all 2.1.0-M* versions into 2.1.0-M1, which I'll then rename into 2.1.0
          Hide
          John Casey added a comment -

          When we put this in, let's make sure we've documented the opt-out setting prominently for users.

          Show
          John Casey added a comment - When we put this in, let's make sure we've documented the opt-out setting prominently for users.
          Hide
          Brett Porter added a comment -

          even better, configure number of parallel threads, just set it to 1 to turn it off?

          Show
          Brett Porter added a comment - even better, configure number of parallel threads, just set it to 1 to turn it off?
          Hide
          Benjamin Bentmann added a comment -

          +1, sounds like a natural config to me.

          Show
          Benjamin Bentmann added a comment - +1, sounds like a natural config to me.
          Hide
          Brett Porter added a comment -

          initial commit done, now adding configuration

          Show
          Brett Porter added a comment - initial commit done, now adding configuration
          Hide
          Paul Benedict added a comment -

          Way to go, Brett!

          Show
          Paul Benedict added a comment - Way to go, Brett!
          Hide
          John Casey added a comment -

          We need to follow up on the ConcurrentModificationException that's happening intermittently under stress. The discussion is here:

          http://www.nabble.com/Synchronization-issue-with-parallel-downloads-td22136382.html

          Show
          John Casey added a comment - We need to follow up on the ConcurrentModificationException that's happening intermittently under stress. The discussion is here: http://www.nabble.com/Synchronization-issue-with-parallel-downloads-td22136382.html
          Hide
          Brett Porter added a comment -

          I was able to reproduce this once, I'm looking into ways to test it more reliably, and also checking all the code that gets touched for sync issues.

          Show
          Brett Porter added a comment - I was able to reproduce this once, I'm looking into ways to test it more reliably, and also checking all the code that gets touched for sync issues.
          Hide
          Brett Porter added a comment -

          reviewing coverage details for artifact resolution, I've narrowed it down to just DefaultWagonManager that is not thread safe (but is expected to be)

          Show
          Brett Porter added a comment - reviewing coverage details for artifact resolution, I've narrowed it down to just DefaultWagonManager that is not thread safe (but is expected to be)
          Hide
          Brett Porter added a comment -

          the only potential problem I could find under current usage is the one triggered here, which is a bug in Plexus but can be externally synchronized. This wouldn't have tripped Don up as his branch was taken before the additional component configuration step was taken (for setting the HTTP headers).

          I'm working on a test case in Maven for this and will then synchronize it properly.

          Other potential problem spots: a number of configuration maps in DefaultWagonManager that are only modified on initialization (if they were modified during a multithreaded scenario there'd be a problem). Also, the repository list passed in to the resolution must not be modifiable as it isn't cloned.

          Show
          Brett Porter added a comment - the only potential problem I could find under current usage is the one triggered here, which is a bug in Plexus but can be externally synchronized. This wouldn't have tripped Don up as his branch was taken before the additional component configuration step was taken (for setting the HTTP headers). I'm working on a test case in Maven for this and will then synchronize it properly. Other potential problem spots: a number of configuration maps in DefaultWagonManager that are only modified on initialization (if they were modified during a multithreaded scenario there'd be a problem). Also, the repository list passed in to the resolution must not be modifiable as it isn't cloned.
          Hide
          Brett Porter added a comment -

          sync issue fixed. Added a test case that can reproduce in most scenarios though it does rely on timing as Plexus is not testable in that fashion at the moment

          Show
          Brett Porter added a comment - sync issue fixed. Added a test case that can reproduce in most scenarios though it does rely on timing as Plexus is not testable in that fashion at the moment
          Hide
          Brett Porter added a comment -

          Benjamin has identified an issue where exceptions in the resolution can cause the build to hang instead of fail correctly.

          I'll work on a test case and fix for this tomorrow.

          Show
          Brett Porter added a comment - Benjamin has identified an issue where exceptions in the resolution can cause the build to hang instead of fail correctly. I'll work on a test case and fix for this tomorrow.

            People

            • Assignee:
              Brett Porter
              Reporter:
              Don Brown
            • Votes:
              71 Vote for this issue
              Watchers:
              38 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: