Archiva

discover the deletion of artifacts

Details

  • Type: Improvement
  • Status: Closed
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: None
  • Fix Version/s: 1.0-beta-3
  • Component/s: repository scanning
  • Labels:
    None
  • Number of attachments :
    0

Description

Currently, the discovery mechanism is geared towards walking a source repository and inserting into the target repository. However, there is no way to determine whether an artifact has been removed. We may need to track this, potentially using metadata or the repository index.

Activity

Brett Porter added a comment -

It is better not to check the target at all, but to record all the information inside the repository being discovered. The best place for this is the index: when the indexer callback checks for deletions (which can be done using the listener technique described below), any deletions found should be recorded in the root metadata for other processes to pick up, handle, and clear as they come by.

This does mean that it may operate like this:
1) file is deleted
2) converter discovers no change
3) indexer discovers change
4) converter finds change recorded for it, deletes from target
This delay should not be significant as long as the non-indexing discoverers regularly check for deletions.

This does require that we can enumerate all operations on a repository, so that each deletion can be recorded for every one of them.
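
As a rough illustration of the "record, pick up, handle and clear" flow, here is a minimal in-memory sketch. It is not Archiva code: the class name PendingDeletions, its methods, and the operation names are all hypothetical, and a real implementation would persist this in the root metadata rather than in a map.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for deletion records kept in the root metadata. */
final class PendingDeletions {
    // One pending set per enumerated repository operation (e.g. "converter").
    private final Map<String, Set<String>> pendingByOperation = new HashMap<>();

    PendingDeletions(Set<String> operations) {
        for (String operation : operations) {
            pendingByOperation.put(operation, new LinkedHashSet<>());
        }
    }

    /** Called by the indexer when it finds that an artifact has gone missing. */
    void recordDeletion(String artifactPath) {
        for (Set<String> pending : pendingByOperation.values()) {
            pending.add(artifactPath);
        }
    }

    /** Called by an operation to pick up its recorded deletions and clear them. */
    Set<String> drain(String operation) {
        Set<String> pending = pendingByOperation.get(operation);
        if (pending == null) {
            throw new IllegalArgumentException("Unknown operation: " + operation);
        }
        Set<String> result = new LinkedHashSet<>(pending);
        pending.clear();
        return result;
    }
}
```

Each operation drains only its own set, so a slow converter does not lose deletions that another process has already handled.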

We will also want to be able to test for deletions on a less frequent interval than regular discovery.


Other thoughts in case this doesn't pan out:

I can see two possible solutions:
1) on an alternate run, the caller passes in a list of things it knows about, and the ones that have gone missing are passed back for deletion from the caller (indexer, converter, etc)
2) metadata is written to each artifact directory

The first seems the more efficient to me, but it has a high memory requirement if there are a lot of artifacts (and would require caching to avoid having to read the entire index, rediscover the entire target repository, etc.).
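
A minimal sketch of option 1, assuming artifacts are identified by repository-relative path strings. KnownArtifactDiff and findDeleted are hypothetical names, not existing Archiva APIs. Both collections are held in memory, which is exactly the memory cost mentioned above.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper: reports known artifacts that discovery did not see. */
final class KnownArtifactDiff {
    private KnownArtifactDiff() {
    }

    /**
     * @param knownPaths      paths the caller (indexer, converter, etc.) already knows about
     * @param discoveredPaths paths found during this discovery run
     * @return the known paths that were not discovered, i.e. deletion candidates
     */
    static Set<String> findDeleted(Collection<String> knownPaths,
                                   Collection<String> discoveredPaths) {
        Set<String> seen = new HashSet<>(discoveredPaths);
        Set<String> missing = new LinkedHashSet<>(knownPaths);
        missing.removeAll(seen);
        return missing;
    }
}
```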

Once we convert to a listener, a better way would be for the listener to have two methods:

  • processModifiedArtifact() - for new or changed artifacts
  • processUnmodifiedArtifact() - generally does nothing, but notes the artifact's existence

At the completion of discovery, the caller can compare the list of discovered artifacts to its internal list and detect deletions. If it is easy to get a count of the target then this will be very quick when unchanged. The most problematic case is the target repository on the converter, which will require a "reverse discovery" to find the old artifacts.
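
The listener idea could look roughly like the sketch below. Only the two method names come from the comment above; the DiscoveryListener interface, the String path parameter, the DeletionTrackingListener class, and completeDiscovery() are assumptions made for illustration. The count shortcut also assumes that every artifact reported as unmodified is already in the caller's internal list.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical listener shape; only the two method names come from the discussion. */
interface DiscoveryListener {
    void processModifiedArtifact(String path);

    void processUnmodifiedArtifact(String path);
}

/** Hypothetical caller-side listener that notes every artifact it is told about. */
final class DeletionTrackingListener implements DiscoveryListener {
    private final Set<String> known;
    private final Set<String> seen = new HashSet<>();
    private int modifiedCount;

    DeletionTrackingListener(Set<String> knownPaths) {
        this.known = new HashSet<>(knownPaths);
    }

    @Override
    public void processModifiedArtifact(String path) {
        // A real consumer would index or convert the artifact here.
        modifiedCount++;
        seen.add(path);
    }

    @Override
    public void processUnmodifiedArtifact(String path) {
        // Nothing to do except note that the artifact still exists.
        seen.add(path);
    }

    /** Compares what was discovered with the internal list and returns the deletions. */
    Set<String> completeDiscovery() {
        // Quick path: nothing new or changed and the counts match, so nothing was deleted.
        if (modifiedCount == 0 && seen.size() == known.size()) {
            return new HashSet<>();
        }
        Set<String> deleted = new HashSet<>(known);
        deleted.removeAll(seen);
        return deleted;
    }
}
```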
Maria Odea Ching added a comment -

Fixed in -r584735

These were the changes made:

  • added code for cleaning up the database of artifacts that no longer exist in the repository
    (DatabaseCleanupRemoveArtifactConsumer and DatabaseCleanupRemoveProjectConsumer)
  • created tests for database cleanup of removed artifacts
  • updated some of the test cases (in the archiva-database and archiva-scheduled modules) to reflect the changes in the db cleanup consumers
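
For readers unfamiliar with the idea, the general shape of such a cleanup can be sketched as follows. This is not the code from the commit: DatabaseCleanupSketch and purgeMissing are hypothetical names, the "database" is reduced to a set of repository-relative paths, and the existence check is passed in as a predicate.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical illustration of removing records for artifacts that are gone. */
final class DatabaseCleanupSketch {
    private DatabaseCleanupSketch() {
    }

    /**
     * Removes every recorded path that no longer exists in the repository.
     *
     * @param recordedPaths      mutable set standing in for the artifact database
     * @param existsInRepository returns true if the path is still present
     * @return the paths that were removed, in iteration order
     */
    static List<String> purgeMissing(Set<String> recordedPaths,
                                     Predicate<String> existsInRepository) {
        List<String> removed = new ArrayList<>();
        Iterator<String> it = recordedPaths.iterator();
        while (it.hasNext()) {
            String path = it.next();
            if (!existsInRepository.test(path)) {
                it.remove();
                removed.add(path);
            }
        }
        return removed;
    }
}
```

On a file-based repository the predicate could be something like path -> Files.exists(repositoryRoot.resolve(path)), where repositoryRoot is a java.nio.file.Path supplied by the caller.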

Cleanup of the index is not included here yet, as I suspect the same locking problem seen with the repository purge will occur. I'll open a separate JIRA issue for this.

Thanks!

