Maven / MNG-3004

Allow build lifecycle to execute projects in parallel

    Details

    • Complexity:
      Intermediate
    • Patch Submitted:
      Yes
    • Number of attachments:
      6

      Description

      One of the great advantages of maven over scripted build environments is that it can calculate the dependencies of the build, so it could execute items that are independent of each other in parallel.

      Unfortunately it currently doesn't do this, even though doing so would be a big win over tools such as 'ant'. It also means that multicore machines have lots of idle capacity during a serial build that could be utilised.

      I had a quick shot at seeing what might be required. Bear in mind this is the first time I have looked at maven internally, and I was just trying to feel my way around and build a POC. I got some of the way there, but my build threads don't seem to have the correct classpath - I think this is something to do with plexus / classworlds - but I don't know enough.

      It'd be great to get this feature in a future version, or, in the interim, a way of running my hack (that is, figuring out why a build thread doesn't have the plexus stuff).

      1. MNG-3004.increased-testability.patch
        24 kB
        Kristian Rosenvold
      2. mng3004.patch
        7 kB
        Kristian Rosenvold
      3. MNG-3004-Resurrected-StringSearchModelInterpolatorTest.patch
        30 kB
        Kristian Rosenvold
      4. MNG3004-SSMI.patch
        8 kB
        Kristian Rosenvold
      5. mng3004v2_rev2.patch
        423 kB
        Kristian Rosenvold
      6. parallel-builds.patch
        39 kB
        Nigel Magnay

          Activity

          THURNER rupert added a comment -

          Is parallel compiling of files the main intention of this feature? Or does it go further and also start tests that can already be run?

          Nigel Magnay added a comment -

          My original intent was so that all activities could run in parallel, to the granularity of a particular project.

          I.e., if projects A, B and C all depend on project X (and nothing else), then after 'mvn install' finishes on 'X', 'mvn install' can be executed, in parallel, on each of A, B and C.
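
The A/B/C-on-X case above can be sketched in plain Java. This is only an illustration of project-level scheduling, not Maven code: the class name `ParallelReactorSketch`, the `build` method and the dependency map are hypothetical, each "build" just records the project name, and the graph is assumed to be acyclic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch, not Maven code: project names stand in for real builds.
final class ParallelReactorSketch {

    /**
     * "Builds" every project in dependsOn, starting a project only after all of
     * its dependencies have finished. Returns the order in which builds finished.
     * Assumes the dependency graph is acyclic.
     */
    static List<String> build(Map<String, List<String>> dependsOn, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<String> finished = Collections.synchronizedList(new ArrayList<>());
        Map<String, CompletableFuture<Void>> scheduled = new HashMap<>();
        try {
            for (String project : dependsOn.keySet()) {
                schedule(project, dependsOn, scheduled, finished, pool);
            }
            // Wait until every project has finished.
            CompletableFuture.allOf(scheduled.values().toArray(new CompletableFuture<?>[0])).join();
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(finished);
    }

    private static CompletableFuture<Void> schedule(
            String project,
            Map<String, List<String>> dependsOn,
            Map<String, CompletableFuture<Void>> scheduled,
            List<String> finished,
            Executor pool) {
        CompletableFuture<Void> existing = scheduled.get(project);
        if (existing != null) {
            return existing;
        }
        // Schedule the dependencies first and collect their futures.
        List<String> deps = dependsOn.getOrDefault(project, List.of());
        CompletableFuture<?>[] upstream = new CompletableFuture<?>[deps.size()];
        for (int i = 0; i < deps.size(); i++) {
            upstream[i] = schedule(deps.get(i), dependsOn, scheduled, finished, pool);
        }
        // The stand-in "build" runs on the pool once all dependencies are done.
        CompletableFuture<Void> future = CompletableFuture.allOf(upstream)
                .thenRunAsync(() -> finished.add(project), pool);
        scheduled.put(project, future);
        return future;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dependsOn = new LinkedHashMap<>();
        dependsOn.put("X", List.of());
        dependsOn.put("A", List.of("X"));
        dependsOn.put("B", List.of("X"));
        dependsOn.put("C", List.of("X"));
        // X always finishes first; A, B and C may finish in any order.
        System.out.println(build(dependsOn, 3));
    }
}
```

The sketch deliberately ignores the shared local repository, which is the hard part discussed in the rest of this thread.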

          I stopped trying to get it to work when I realised that the biggest problem is that the local repository isn't safe to be accessed from multiple threads (or even multiple processes for that matter). I think that's a known issue too - I'm guessing that the ability to somehow lock/unlock repository objects (or perhaps use some kind of smarter local repository) is a dependency to getting this to work.

          Nigel Magnay added a comment -

          See also http://docs.codehaus.org/display/MAVEN/Local+repository+separation (is this being done?)

          John Casey added a comment -

          We don't have code in place to do file locking in the local repository, nor do we have a design for this...I can't see how we'll get this done ahead of 2.2 at this point.

          Pi Song added a comment -

          Given an assumption that most artifacts aren't changed very often, can't we just say "run in parallel if all the required artifacts are in place"?
          Or, even better, say "run in parallel if two (or more) projects don't share artifacts that have to be downloaded in common".

          Hans-Peter Störr added a comment -

          Nigel: could you please explain the need for locking if you are doing this? I'd think that each of the builds of A, B and C writes to separate directories - so there is no need for locking. They will want to access directories of their dependencies, but of course those would have been built before. Where are the possible conflicts?

          Nigel Magnay added a comment -

          If A, B and C all rely on a downloadable artifact X (or X-SNAPSHOT), then building them in parallel may result in a parallel update of X.

          You need to be able to fix it in place for the duration of your build.

          Hans-Peter Störr added a comment - - edited

          Good point. But I am not sure that locking is a good way to do this - locking has its dangers. Perhaps it is possible to make parallel updates harmless, for example by downloading the files under a different name and renaming them to the right name only afterwards.
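
The download-then-rename idea can be sketched with `java.nio.file`. This is a minimal illustration under stated assumptions, not how Maven's local repository is implemented: the class `AtomicPublishSketch` and its `publish` method are hypothetical, and the "download" is just a byte array. Whether an atomic move replaces an existing target is implementation specific according to the `Files.move` documentation, so concurrent writers of the same file would still need testing on each platform.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch, not Maven code.
final class AtomicPublishSketch {

    /**
     * Writes content to a temporary file next to target and then renames it
     * into place, so readers never observe a half-written target file.
     */
    static Path publish(Path target, byte[] content) {
        Path absolute = target.toAbsolutePath();
        Path dir = absolute.getParent();
        try {
            Files.createDirectories(dir);
            // Same directory as the target, so the rename stays on one file system.
            Path tmp = Files.createTempFile(dir, absolute.getFileName().toString(), ".part");
            try {
                Files.write(tmp, content);
                Files.move(tmp, absolute, StandardCopyOption.ATOMIC_MOVE);
            } finally {
                // No-op after a successful move; cleans up after a failure.
                Files.deleteIfExists(tmp);
            }
            return absolute;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = Paths.get(System.getProperty("java.io.tmpdir"), "publish-sketch-" + System.nanoTime());
        Path jar = publish(dir.resolve("x-1.0.jar"), "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println("published " + jar);
    }
}
```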

          I think however that parallel builds can be done even now without locking / repository separation by executing a two phase strategy:
          1. Update and download all needed dependencies and plugins. (Singlethreaded, although you might do parallel downloads.)
          2. Execute a parallel build without any updates / downloads
          Perhaps a singlethreaded mvn validate and then a parallel build with maven in offline mode would do this? Maybe you could do a restricted version of parallel builds that only work in offline mode for now, and do the full implementation later when locking or something is done.

          I've heard hudson already does parallel builds with maven. Anyone knows how they handle this problem?

          Nigel Magnay added a comment -

          > I've heard hudson already does parallel builds with maven. Anyone knows how they handle this problem?

          It doesn't.

          Hudson initially only built in parallel with the native m2 support; compiling serially (like the commandline) was only added later when it was clear that for many projects, parallel compilation breaks, basically, all the time.

          What's even worse (from an m2 perspective) is that, because the local repository is not thread- or process-safe, there is a "Use Private Maven Repository" option so that each project has its own, private local repo. That fixes the problem, but for us it means something approaching 200 GB of duplicated jar files sitting around in various directories. Yuck.

          This is known about - see http://docs.codehaus.org/display/MAVEN/Local+repository+separation .

          The simple fact of the matter is that the local repository ought to be threadsafe (either inherently by design, or through some kind of use of locking) but it isn't. Hacks that don't address that fundamental issue aren't really fixing the problem, IMO.

          Wendy Smoak added a comment -

          Does the work on parallel artifact resolution in a single build (MNG-3379) help here?

          Hans-Peter Störr added a comment -

          How about implementing a restricted version of parallel builds that only works in offline mode for now? (See my suggestion above.)

          It might take a while and more than one release to get something like this right, so it might be better to do at least something now, without waiting for other stuff to happen. And it would be usable right now. If the locking / repository separation / whatever is done later, one can just remove the restriction and - boom - you are there.

          Nicolas Frenay added a comment -

          I agree with Hans-Peter Störr's idea.

          If someone is able to attack this problem now, it could be done in a way that it's only available if you're offline. When the thread-safe issue is fixed, this new feature will be ready to go.

          It also gives extra-time to test performance with parallelization, as it will probably require some "tweaking". Configurable maximum number of threads comes to my mind.

          Kristian Rosenvold added a comment -

          The attached patch fixes a concurrency issue when running with the maven 3.0 MNG3004 branch. The main issue is that parallel threads will modify the protected/private status of the fields in the class, and concurrent threads will create problems for each other.
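
To illustrate the kind of race described here: if several threads share one cached `java.lang.reflect.Field` and each toggles its accessibility around a read, one thread can switch access off while another is about to read. The sketch below shows one possible guard, serialising the toggle and the read on the shared `Field`. It is not necessarily what the attached patch does, and `DemoModel`, `AccessibleFieldSketch` and their methods are hypothetical names.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a model object with a private field.
final class DemoModel {
    private final String name = "demo";
}

// Hypothetical sketch, not the code from the patch.
final class AccessibleFieldSketch {

    // One Field instance shared by all threads, as a reflection cache would hold.
    private static final Field NAME;

    static {
        try {
            NAME = DemoModel.class.getDeclaredField("name");
        } catch (NoSuchFieldException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Reads the private field; toggle and read form one critical section. */
    static String readName(DemoModel model) throws IllegalAccessException {
        synchronized (NAME) {
            NAME.setAccessible(true);
            try {
                return (String) NAME.get(model);
            } finally {
                NAME.setAccessible(false);
            }
        }
    }

    /** Runs many concurrent reads and returns how many produced the expected value. */
    static int readConcurrently(int threads, int readsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                results.add(pool.submit(() -> {
                    DemoModel model = new DemoModel();
                    int ok = 0;
                    for (int i = 0; i < readsPerThread; i++) {
                        if ("demo".equals(readName(model))) {
                            ok++;
                        }
                    }
                    return ok;
                }));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(readConcurrently(8, 1000) + " successful reads");
    }
}
```

Removing the `synchronized` block would reintroduce the race: a read can then fail with `IllegalAccessException` when another thread has just reset the flag.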

          If I can find out how to make a failing test I will do this too.

          Kristian Rosenvold added a comment - - edited

          The enclosed patch contains the StringSearchModelInterpolatorTest and its base class, which I resurrected from the maven 2.2.X build. The test also contains an upgraded concurrency test that fails most of the time unless the patch is applied.

          There are a large number of tests in the base class AbstractModelInterpolatorTest that failed. I have commented these tests out, since I have no idea if these should work on the 3.x line.

          It would be appreciated if someone took a quick look at the commented-out tests and made a quick assessment of whether these are still relevant and how they can be made to pass again.

          I suspect that some of them fail because I added new File(".") as a don't-know value to the interpolate method.

          If MNG-3004-Resurrected-StringSearchModelInterpolatorTest.patch is applied before MNG3004-SSMI.patch, you can have a failing test.

          Kristian Rosenvold added a comment -

          After reviewing Dan Fabulich's concurrency update to M3, I extracted some of the functionality into a separate class and wrote a fairly extensive unit test for that class. This patch applies to the MNG-3004 branch.

          Kristian Rosenvold added a comment -

          Please accept this patch, that allows "weave" style execution of maven reactor builds.

          This patch is the result of a long community discussion and serves as a fully functional patch that is targeted at a specific subset of builds - typically enterprisey reactor builds.

          Kristian Rosenvold added a comment -

          The enclosed mng3004v2_rev2.patch updates javadoc and removes a little cruft that had managed to creep into the patch.

          Brett Porter added a comment -

          this is a patch against trunk.

          Arnaud Heritier added a comment -

          Nothing new since December?

          Kristian Rosenvold added a comment -

          r931884:

          MNG-3004 Concurrent building of multimodule projects

          This commit consists of a modularization of the DefaultLifecycleExecutor into smaller
          components. This is primarily done with the intention of extracting execution strategy
          somewhat from the rest of the logic.

          It also contains three different execution strategies:
          A) Linear ("Classic")
          B) Parallel (Based on Dan's original implementation)
          C) Weave (Still experimental)

          The command line -T option activates parallel running, and the argument
          is in the form -T <[nThreads, a float][C][W]|auto>
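
As a reading aid for the argument form above, here is a rough parser sketch. The commit message does not spell out the semantics, so the following are assumptions of this sketch: a trailing `C` multiplies the number by the available core count, a trailing `W` selects the weave strategy, and `auto` is only recorded as a flag because its meaning is not defined here. `ThreadOptionSketch` and `Spec` are hypothetical names, not Maven classes.

```java
// Hypothetical sketch of parsing the -T argument; not the Maven implementation.
final class ThreadOptionSketch {

    /** Parsed form of the argument. */
    static final class Spec {
        final boolean auto;   // true when the argument was "auto"
        final boolean weave;  // true when a trailing W was given
        final int threads;    // resolved thread count; 0 when auto

        Spec(boolean auto, boolean weave, int threads) {
            this.auto = auto;
            this.weave = weave;
            this.threads = threads;
        }
    }

    /** Parses [nThreads][C][W] or "auto"; cores is the number of available cores. */
    static Spec parse(String arg, int cores) {
        String s = arg.trim();
        if (s.equals("auto")) {
            return new Spec(true, false, 0);
        }
        boolean weave = s.endsWith("W");
        if (weave) {
            s = s.substring(0, s.length() - 1);
        }
        boolean perCore = s.endsWith("C");
        if (perCore) {
            s = s.substring(0, s.length() - 1);
        }
        // Assumption: a missing number means 1.
        float n = s.isEmpty() ? 1f : Float.parseFloat(s);
        int threads = Math.max(1, (int) (perCore ? n * cores : n));
        return new Spec(false, weave, threads);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        Spec spec = parse("1.5C", cores);
        System.out.println("threads=" + spec.threads + " weave=" + spec.weave);
    }
}
```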

          Notes:

          • The output demultiplexer is disabled in this commit, and output will appear intermingled.
          • The modello plugin has known problems
          • Surefire must be run in forkmode at least once.

          Paul Benedict added a comment -

          With concurrency, is the "downloading dependency" progress still readable? If X downloads are occurring, I hope the output isn't difficult to track.

          Christian Gleissner added a comment -

          This feature works very well and resulted in a significant build time reduction, depending on the number of cores. Why was the output de-multiplexer commented out for Maven 3 Beta 1 and when will this very useful feature be re-enabled?

          Also, when performing a parallel build, TeamCity 5's automatic collection of Maven Surefire results no longer finds all results. We are using an external invocation of Maven 3. This may be related to how TeamCity checks for the completion of modules by scanning the log file and could go away once the output de-multiplexing is re-enabled.

          Kristian Rosenvold added a comment -

          Thank you, Christian, it's nice to get some feedback.

          The output demultiplexer that is present in the code base was only intended as a stop-gap, and is architecturally not the proper solution. MNG-2727 is targeted at beta-2 and we are awaiting its completion. In the event that MNG-2727 does not resolve demultiplexing of the output in parallel build, there will be another issue (with a likely target of beta-2) to fix this.

          Regarding your team-city issue I will not be surprised if they're parsing the output somehow, so I can imagine it being slightly confused.

          Please also note that there are several other issues being fixed in the maven ecosystem related to stable running of parallel tests, and I'll try to keep this updated in the "relates to section" of this issue. The workaround until updated plugins are released will normally involve adding dependencies to newer library versions to your plugins at the moment.


            People

            • Assignee:
              Kristian Rosenvold
              Reporter:
              Nigel Magnay
            • Votes:
              44
              Watchers:
              40