
Key: MRM-37
Type: Improvement
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Maria Odea Ching
Reporter: Brett Porter
Votes: 0
Watchers: 0
Archiva

discover the deletion of artifacts

Created: 29/Nov/05 09:10 PM   Updated: 15/Oct/07 06:20 AM
Component/s: repository scanning
Affects Version/s: None
Fix Version/s: 1.0-beta-3

Description
Currently, the discovery mechanism is geared to walking a source repository and inserting what it finds into the target repository. However, there is no way to determine whether an artifact has been removed. We may need to track this, potentially using metadata or the repository index.

Comments
Brett Porter added a comment - 24/Jul/06 01:44 AM
It is better not to check the target at all but to record all the information inside the repository being discovered. The best alternative for this is the index - so when the indexer callback checks for deletions (which can be done using the technique described below with the listener), deletions found should be recorded in the root metadata for other processes to pick up, handle and clear as they come by.

This does mean that it may operate like this:
1) file is deleted
2) converter discovers no change
3) indexer discovers change
4) converter finds change recorded for it, deletes from target
This delay should not be significant as long as the non-indexing discoverers regularly check for deletions.

This does require that we can enumerate all operations on a repository, so that each deletion can be recorded for every one of them.
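
The flow above could be sketched roughly as follows. This is only an illustration under stated assumptions: DeletionLog, registerOperation, recordDeletion and drain are hypothetical names, not Archiva API, and the log is held in memory here rather than persisted in the root metadata as proposed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: deletions found by the indexer are queued separately
 * for every registered operation, so each one can handle and clear its own.
 */
class DeletionLog {
    // One pending queue per operation (e.g. "converter").
    private final Map<String, List<String>> pending = new LinkedHashMap<>();

    /** Enumerate an operation up front so deletions are queued for it. */
    void registerOperation(String operation) {
        pending.putIfAbsent(operation, new ArrayList<>());
    }

    /** Called by the indexer when it finds that an artifact has gone. */
    void recordDeletion(String artifactPath) {
        for (List<String> queue : pending.values()) {
            queue.add(artifactPath);
        }
    }

    /** Called by an operation on its next run: returns and clears its entries. */
    List<String> drain(String operation) {
        List<String> queue = pending.get(operation);
        if (queue == null) {
            throw new IllegalArgumentException("Unknown operation: " + operation);
        }
        List<String> result = new ArrayList<>(queue);
        queue.clear();
        return result;
    }
}

class DeletionLogDemo {
    public static void main(String[] args) {
        DeletionLog log = new DeletionLog();
        log.registerOperation("converter");
        // Step 3: the indexer notices the deletion and records it.
        log.recordDeletion("org/example/foo/1.0/foo-1.0.jar");
        // Step 4: the converter picks up the deletions recorded for it.
        for (String path : log.drain("converter")) {
            System.out.println("converter deletes from target: " + path);
        }
    }
}
```

An operation that is not registered before a deletion is recorded never sees that deletion, which is why the operations need to be enumerable in advance.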

We will also want to be able to test for deletions on a less frequent interval than normal discovery.


Other thoughts in case this doesn't pan out:

I can see two possible solutions:
1) on an alternate run, the caller passes in a list of things it knows about, and the ones that have gone missing are passed back for deletion from the caller (indexer, converter, etc)
2) metadata is written to each artifact directory

The first seems the most efficient to me, but has a high memory requirement if there are a lot of artifacts (and would require caching to avoid having to read the entire index/rediscovering the entire target repository/etc).
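
A minimal sketch of the first option, assuming the caller can supply its known artifact paths as strings. MissingArtifactCheck and findMissing are hypothetical names, and the existence check is passed in as a predicate so the sketch does not depend on any particular repository layout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

class MissingArtifactCheck {
    /**
     * Returns the entries of knownPaths for which stillExists is false,
     * in the order they were supplied. The caller then deletes these
     * from its own store (index, target repository, etc).
     */
    static List<String> findMissing(Collection<String> knownPaths,
                                    Predicate<String> stillExists) {
        List<String> missing = new ArrayList<>();
        for (String path : knownPaths) {
            if (!stillExists.test(path)) {
                missing.add(path);
            }
        }
        return missing;
    }
}

class MissingArtifactCheckDemo {
    public static void main(String[] args) {
        // What the caller knew about after its previous run.
        List<String> known = Arrays.asList(
                "org/example/foo/1.0/foo-1.0.jar",
                "org/example/bar/2.0/bar-2.0.jar");
        // Stand-in for the source repository's current contents.
        Set<String> onDisk = new HashSet<>(
                Arrays.asList("org/example/foo/1.0/foo-1.0.jar"));
        System.out.println(MissingArtifactCheck.findMissing(known, onDisk::contains));
    }
}
```

Against a real repository the predicate could be something like `p -> Files.exists(root.resolve(p))`. The memory cost noted above comes from having to hold the whole known list while the check runs.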

Once we convert to a listener, a better way would be for the listener to have two methods:

  • processModifiedArtifact() - for new or changed artifacts
  • processUnmodifiedArtifact() - generally does nothing, but notes the artifact's existence

At the completion of discovery, the caller can compare the list of discovered artifacts to its internal list and detect deletions. If it is easy to get a count of the target then this will be very quick when unchanged. The most problematic case is the converter's target repository, which will require a "reverse discovery" to find the old artifacts.
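
The listener idea could look roughly like this. The two method names come from the comment above; the String parameter, the DiscoveryListener interface and the DeletionTrackingListener class are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical listener interface using the two proposed method names. */
interface DiscoveryListener {
    void processModifiedArtifact(String artifactPath);

    void processUnmodifiedArtifact(String artifactPath);
}

/** Notes every artifact seen during a run so deletions can be derived afterwards. */
class DeletionTrackingListener implements DiscoveryListener {
    private final Set<String> known;                  // the caller's internal list
    private final Set<String> seen = new HashSet<>(); // artifacts reported this run

    DeletionTrackingListener(Set<String> previouslyKnown) {
        this.known = new HashSet<>(previouslyKnown);
    }

    @Override
    public void processModifiedArtifact(String artifactPath) {
        seen.add(artifactPath);
        // A real caller would also index or convert the artifact here.
    }

    @Override
    public void processUnmodifiedArtifact(String artifactPath) {
        seen.add(artifactPath); // nothing to do except note its existence
    }

    /** Call once discovery has completed: known artifacts that were not seen. */
    Set<String> deletedArtifacts() {
        Set<String> deleted = new TreeSet<>(known);
        deleted.removeAll(seen);
        return deleted;
    }
}

class DeletionTrackingDemo {
    public static void main(String[] args) {
        Set<String> known = new HashSet<>(Arrays.asList("a.jar", "b.jar", "c.jar"));
        DeletionTrackingListener listener = new DeletionTrackingListener(known);
        listener.processModifiedArtifact("a.jar");
        listener.processUnmodifiedArtifact("c.jar");
        System.out.println("deleted: " + listener.deletedArtifacts());
    }
}
```

Newly discovered artifacts are simply absent from the known set, so they never show up as deletions.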

Maria Odea Ching added a comment - 15/Oct/07 06:20 AM
Fixed in -r584735

These were the changes made:

  • added code for cleaning up database records of artifacts that no longer exist in the repository
    (DatabaseCleanupRemoveArtifactConsumer and DatabaseCleanupRemoveProjectConsumer)
  • created tests for database cleanup of removed artifacts
  • updated some of the test cases (in the archiva-database and archiva-scheduled modules) to reflect the changes in the db cleanup consumers

The cleanup of the index is not included here yet, as I suspect the same locking problem seen with the repository purge will occur. I'll open a separate JIRA issue for this.

Thanks!