Archiva

remove the archiva database

Details

  • Type: Task
  • Status: Closed
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: None
  • Fix Version/s: 1.4-M1
  • Component/s: None
  • Labels: None
  • Number of attachments: 0

Activity

Brett Porter added a comment -

The first steps of this have been done:

  • started a trivial metadata content repository implementation and a consumer to generate metadata. This needs much more work to be useful, but provides the right API hooks (see r721955)
  • split scanning from the repository layer (see r722004) - this means some code that needs scanning no longer has to depend on the repository-layer (and therefore archiva-model)
  • removed the database cleanup consumers, relying instead on "events" raised when Archiva notices a deletion (see note below)
  • removed some unused database consumers (see MRM-1281)
  • split the scheduler into modules to isolate database-dependent and indexer-dependent code, so that the nexus index consumer need not depend on the database (see r882207)

Note that currently:

  • detecting deleted files on the filesystem and updating the index / database is non-functional (see r723334)
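To illustrate the event-driven cleanup mentioned above, here is a minimal Java sketch. All names (`ArtifactDeletionListener`, `DeletionEvents`, `IndexedPaths`) are hypothetical and are not taken from the Archiva code base; the set of paths merely stands in for an index or database. The idea is that whatever component notices a deletion notifies listeners, instead of a cleanup consumer rescanning for stale entries.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical listener: called when a deletion has been noticed.
interface ArtifactDeletionListener {
    void artifactDeleted(String repositoryId, String path);
}

// Hypothetical dispatcher standing in for the component that notices deletions.
final class DeletionEvents {
    private final List<ArtifactDeletionListener> listeners =
            new ArrayList<ArtifactDeletionListener>();

    void addListener(ArtifactDeletionListener listener) {
        listeners.add(listener);
    }

    void fireArtifactDeleted(String repositoryId, String path) {
        for (ArtifactDeletionListener listener : listeners) {
            listener.artifactDeleted(repositoryId, path);
        }
    }
}

// Stand-in for an index or metadata store, keyed by "repositoryId/path".
final class IndexedPaths implements ArtifactDeletionListener {
    private final Set<String> entries = new HashSet<String>();

    void add(String repositoryId, String path) {
        entries.add(repositoryId + "/" + path);
    }

    boolean contains(String repositoryId, String path) {
        return entries.contains(repositoryId + "/" + path);
    }

    // Remove the entry as soon as the deletion event arrives.
    @Override
    public void artifactDeleted(String repositoryId, String path) {
        entries.remove(repositoryId + "/" + path);
    }
}
```

A sketch like this only covers deletions that pass through the application and raise an event. Files removed directly on the filesystem raise no event, which matches the limitation noted above.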
Brett Porter added a comment -

The next step is to remove the ProjectModelToDatabaseConsumer, as this is the only remaining requirement for the second database scan.

The following rely on the project model directly:

  • the dependency-graph module (in turn, used only by the DependencyTree tag in the webapp, as it was not implemented for XMLRPC)
  • DefaultRepositoryBrowsing (see below)
  • DatabaseProjectModelResolver. This is primarily used in populating the model itself, and the dependency graph. If the above usages are removed, this can be removed.

The following utilise DefaultRepositoryBrowsing:

  • "used by" in the webapp
  • all tabs of the artifact page (ShowArtifactAction)
  • the "shared model" on the artifact parent page
  • to determine the artifact packaging in XMLRPC search (falls back to JAR - this code should be improved)
  • getDependencies in XMLRPC

The key step is to introduce a metadata content repository API that can be a single touchpoint for the application. In the current situation, access is spread between the repository-layer and direct access to the database, which makes the availability of the model unreliable, is not easily extensible by plugins, and is the main reason slow pre-scanning is needed.

I believe the next step is to flesh out the new metadata content repository API to support querying the above information. While it could continue to interface with the existing database using the same resolvers, I believe it is more valuable to simplify it using existing APIs and then revisit any functionality that needs to catch up. It can therefore start with a dumb implementation (perhaps going back to reading the POM and dependency tree on demand for most uses, and not implementing "used by" initially) and be improved later.

The API can be tested by migrating browse and XMLRPC to it, then removing the dependency graph module and unprocessed database consumers, model code, database scheduler and corresponding UI. If this is successful, other uses of the database can be moved over (RSS, other browse and XMLRPC functionality, and reporting) and the tree / "used by" made to work again. Following that, uses of repository-layer and model can be moved over as well, and deletion scanning revisited.
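As a rough illustration of the "single touchpoint" idea, the following Java sketch shows one possible shape for such an API with a deliberately dumb in-memory implementation. Every name here (`MetadataContentRepository`, `ProjectVersionFacts`, `InMemoryMetadataContentRepository`, and the method names) is an assumption made for the example, not the interface that was actually committed. Coordinates are plain "groupId:artifactId:version" strings to keep it short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical value type: the facts browse/XMLRPC ask for about one project version.
final class ProjectVersionFacts {
    final String packaging;
    final List<String> dependencies; // "groupId:artifactId:version" coordinates

    ProjectVersionFacts(String packaging, List<String> dependencies) {
        this.packaging = packaging;
        this.dependencies =
                Collections.unmodifiableList(new ArrayList<String>(dependencies));
    }
}

// Hypothetical single touchpoint for callers (not Archiva's actual interface).
interface MetadataContentRepository {
    void update(String coordinate, ProjectVersionFacts facts);
    String getPackaging(String coordinate);
    List<String> getDependencies(String coordinate);
    List<String> getUsedBy(String coordinate);
}

// "Dumb" implementation: a map, with "used by" computed by a linear scan.
final class InMemoryMetadataContentRepository implements MetadataContentRepository {
    private final Map<String, ProjectVersionFacts> store =
            new HashMap<String, ProjectVersionFacts>();

    public void update(String coordinate, ProjectVersionFacts facts) {
        store.put(coordinate, facts);
    }

    // Falls back to "jar" when nothing is known, mirroring the fallback noted above.
    public String getPackaging(String coordinate) {
        ProjectVersionFacts facts = store.get(coordinate);
        return facts == null ? "jar" : facts.packaging;
    }

    public List<String> getDependencies(String coordinate) {
        ProjectVersionFacts facts = store.get(coordinate);
        return facts == null ? Collections.<String>emptyList() : facts.dependencies;
    }

    // Slow but simple: check every stored version for a matching dependency.
    public List<String> getUsedBy(String coordinate) {
        List<String> users = new ArrayList<String>();
        for (Map.Entry<String, ProjectVersionFacts> entry : store.entrySet()) {
            if (entry.getValue().dependencies.contains(coordinate)) {
                users.add(entry.getKey());
            }
        }
        Collections.sort(users);
        return users;
    }
}
```

Because callers depend only on the interface, a slow implementation like this could later be swapped for one backed by a real store without touching the browse or XMLRPC code.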

Brett Porter added a comment -

The above has now been done. Along the way, several bugs have been fixed (usedBy and dependency tree now work more reliably and quickly) and some tests added.

While it works, the code is not production ready - it litters the home directory with property files and needs to be centralised. There is more testing to be added and there are probably some edge cases that remain problematic. In particular, Maven-1 formatted storage repositories are unlikely to work.

Before moving to polishing, I'll continue proving the API by moving other archiva-database uses over to it, such as RSS, reporting, and any remaining bits of the webapp and XMLRPC.

Brett Porter added a comment -

This is now functionally complete on the branch. I'll spend some time fixing up some other known bugs (that may or may not have been pre-existing) and then look at what it takes to be production ready - by reviewing things like error handling and logging, and the performance of some of the more frequent operations.

Wendy Smoak added a comment -

I built the branch at r891346 and tried the Jetty bundle, but http://localhost:8080/archiva returns a 503 with a complaint about configuration in the wrapper log. It looks like it's trying to bind a SQL datasource that (I assume) no longer exists.

Brett Porter added a comment -

Thanks for testing it, Wendy - I missed one change; it should be good now.

Brett Porter added a comment -

merged to trunk

