jira.codehaus.org

  • Archiva
  • MRM-294

Repository purge feature for snapshots

Details

  • Type: New Feature
  • Status: Closed
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: 1.0-alpha-1
  • Fix Version/s: 1.0-beta-1
  • Component/s: None
  • Labels:
    None

Description

We need a way to purge a repository of snapshots older than a certain date, (optionally retaining the most recent one) and fixing the metadata.

Issue Links

relates to

MRM-275 add "remove old snapshots" Sheduler (Sub-task, Major, Closed)

Activity

Maria Odea Ching added a comment - 25/Jul/07 1:55 AM

I've identified the following fields that could be made configurable:
1. the timestamp basis on which snapshots will be deleted
2. the schedule of the repository purge (MRM-275)
3. the artifact count to be retained (MRM-275)

Configuration for this should be per repository? Or should it be applicable for all managed repositories?

Barrie Treloar added a comment - 25/Jul/07 5:53 PM

Configuration for this should be per repository? Or should it be applicable for all managed repositories?

Good question.

I haven't been following Archiva closely enough to know how remote repositories are handled differently from internal ones.
But remote repositories can be cleared (as they would be re-obtained if requested) and internal repositories don't need to keep older snapshots around for very long.

So while per repository makes it easy to customise this feature, I can't really see what benefit that gives.
It feels like all snapshot repositories could have the same settings.

I'm going to err on the side of over-customization:

The snapshot retention policies should be defined separately from the repository; a repository that does not specify one gets the default retention policy.
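
The fallback described above could look roughly like the following. This is a minimal sketch, not Archiva code: the class names are hypothetical, the field names daysOlder and retentionCount are borrowed from the later commit notes on this issue, and the values used are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical value object holding one retention policy.
final class RetentionPolicySketch {
    final int daysOlder;       // purge snapshots older than this many days
    final int retentionCount;  // number of latest snapshot builds to keep

    RetentionPolicySketch(int daysOlder, int retentionCount) {
        this.daysOlder = daysOlder;
        this.retentionCount = retentionCount;
    }
}

// Hypothetical registry: policies live apart from the repositories,
// and a repository without its own policy gets the default one.
final class RetentionPolicyRegistrySketch {
    private final RetentionPolicySketch defaultPolicy;
    private final Map<String, RetentionPolicySketch> perRepository = new HashMap<>();

    RetentionPolicyRegistrySketch(RetentionPolicySketch defaultPolicy) {
        this.defaultPolicy = defaultPolicy;
    }

    // Registers a policy for one repository id, overriding the default.
    void override(String repositoryId, RetentionPolicySketch policy) {
        perRepository.put(repositoryId, policy);
    }

    // Returns the repository's own policy, or the default when none is set.
    RetentionPolicySketch policyFor(String repositoryId) {
        return perRepository.getOrDefault(repositoryId, defaultPolicy);
    }
}
```

With this shape, most snapshot repositories would share the default and only the exceptions would need an entry of their own.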

Maria Odea Ching added a comment - 26/Jul/07 2:06 AM

Thanks for your comments, Barrie.

There was also a discussion of this issue and MRM-275 on the Archiva dev list. I've summarized below what was agreed upon in the thread. For reference, the subject of the thread was "Repository purge (MRM-294 and MRM-275)". Please feel free to add more.

Summary:
1. Configuration will be for each repository - the repository purge will be implemented as a consumer, which will be executed during repository scanning. Configuration will be incorporated on the repository configuration page.
2. Repository purge is configurable in archiva.xml
3. Snapshot retention policies will also be for each repository.

For #3, these are the policies to be implemented:
a. any artifacts that are not in active development will be deleted entirely (e.g. 1.0-SNAPSHOT once 1.0 is released)
b. time based** (e.g. any builds older than 1 month)
c. artifact count retention** (e.g. if the user specifies "5", Archiva keeps the 5 latest builds in total, such as 1.1-20070506.121113-1, 1.1-20070506.121113-2, 1.1-20070506.121113-3, 1.1-20070506.121113-4, 1.1-20070506.121113-5, and NOT 1.1-SNAPSHOT, 1.2-SNAPSHOT, 1.3-SNAPSHOT, 1.4-SNAPSHOT, 1.5-SNAPSHOT)

**user will have the option to choose from either of these two criteria
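
To make policy (c) concrete, here is a minimal sketch of how the builds to purge might be selected. It is not the actual RetentionCountRepositoryPurge: the class and method names are hypothetical, and it assumes every version string has the timestamped form shown above (base version, then yyyyMMdd.HHmmss, then build number).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper; assumes versions like "1.1-20070506.121113-3".
final class RetentionCountSketch {

    // The second-to-last dash-separated part is the yyyyMMdd.HHmmss timestamp,
    // which sorts chronologically when compared as plain text.
    static String timestampOf(String version) {
        String[] parts = version.split("-");
        return parts[parts.length - 2];
    }

    // The last dash-separated part is the build number.
    static int buildNumberOf(String version) {
        String[] parts = version.split("-");
        return Integer.parseInt(parts[parts.length - 1]);
    }

    // Returns the oldest builds, i.e. everything except the newest retentionCount.
    static List<String> versionsToPurge(List<String> versions, int retentionCount) {
        List<String> sorted = new ArrayList<>(versions);
        sorted.sort(Comparator.comparing((String v) -> timestampOf(v))
                .thenComparingInt(RetentionCountSketch::buildNumberOf));
        int purgeCount = Math.max(0, sorted.size() - retentionCount);
        return new ArrayList<>(sorted.subList(0, purgeCount));
    }
}
```

Note that the count applies to timestamped builds of one snapshot version, matching the example above, and not to distinct x.y-SNAPSHOT versions.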

Maria Odea Ching added a comment - 04/Aug/07 4:49 PM

Initial commit (-r562766):

  • Added RepositoryPurgeConsumer
  • Added these other classes used in repo purge: DaysOldRepositoryPurge, RetentionCountRepositoryPurge, ArtifactFilenameFilter and RepositoryPurgeException
  • Added tests and test data
  • Added new fields in configuration.mdo (daysOlder and retentionCount)
  • Added repository-purge as a consumer in default-archiva.xml (but currently commented out)
Maria Odea Ching added a comment - 05/Aug/07 10:14 AM

Fixes in -r562882:

  • Added the test cases which were left out during the initial commit
  • Added DefaultCleanupReleasedSnapshots (m2 implementation for cleaning up snapshots that were already released)
  • Updated the following files: archiva-core-consumers/pom.xml, RepositoryPurgeConsumer, DaysOldRepositoryPurge, RetentionCountRepositoryPurge and AbstractRepositoryPurge

Issue: The artifacts shown in Browse Repository are not being updated, even though the artifacts deleted from the file system during repository purge are also being deleted from the database. Does anybody have any idea about this?

Brett Porter added a comment - 05/Aug/07 11:56 PM

since you asked, here's my feedback

RepositoryPurgeConsumer:

  • it looks like index and repositoryLayout can be local variables instead (and repository field can be removed)
  • I saw: // @todo handle better injection of RepositoryPurge – the injection seems fine to me, but if you want to switch it to a component that's good too.

RepositoryPurge:

  • what about removing the setters on this interface, and passing them to the process method instead. As a component, this will be a singleton by default, so you can't use the setters (you can make the component not be a singleton, but there isn't any reason they need to retain state so I wouldn't worry).
  • since the process method always uses a specific repository configuration maybe it'd be better to pass that in directly instead of the general configuration object?

AbstractRepositoryPurge:

  • looks like 'index' can be removed
  • getFiles() calls System.out - looks like that should be an IOException?
  • abstract process() can be removed - it comes from the interface
  • need to create an issue for the commented out index purge (is it better to just omit this and let the normal indexing consumer pick up the deletion?)
  • need to create an issue for it still being on browse page (per comment above) - maybe it's a DAO caching issue?
  • the exceptions in updateDatabase() are getting swallowed!
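
One way to avoid swallowing the exception is to wrap it and rethrow. The sketch below is only an illustration under assumptions: PurgeFailureSketch stands in for RepositoryPurgeException (whose real constructors are not shown in this issue), and the DatabaseStep callback is a hypothetical placeholder for whatever updateDatabase() actually calls.

```java
import java.io.IOException;

// Stand-in for RepositoryPurgeException; the real class may look different.
class PurgeFailureSketch extends Exception {
    PurgeFailureSketch(String message, Throwable cause) {
        super(message, cause);
    }
}

final class UpdateDatabaseSketch {

    // Hypothetical database operation that can fail.
    interface DatabaseStep {
        void run() throws IOException;
    }

    static void updateDatabase(String artifactPath, DatabaseStep step) throws PurgeFailureSketch {
        try {
            step.run();
        } catch (IOException e) {
            // Keep the original exception as the cause instead of printing it and carrying on.
            throw new PurgeFailureSketch("Could not update database for " + artifactPath, e);
        }
    }
}
```

The caller can then decide whether a failed database update should abort the purge or only be logged.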

DaysOldRepositoryPurge:

  • don't do new GregorianCalendar - use Calendar.getInstance() instead
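
A minimal sketch of the days-old cutoff using Calendar.getInstance(), which returns a calendar for the default time zone and locale instead of hard-coding the Gregorian one. The class and method names are hypothetical, and the real DaysOldRepositoryPurge may compute its cutoff differently.

```java
import java.util.Calendar;
import java.util.Date;

// Hypothetical helper illustrating the cutoff calculation only.
final class DaysOldCutoffSketch {

    // Returns the point in time daysOlder days before 'now'.
    static Date cutoff(Date now, int daysOlder) {
        Calendar calendar = Calendar.getInstance();
        calendar.setTime(now);
        calendar.add(Calendar.DATE, -daysOlder);
        return calendar.getTime();
    }

    // True when a file last modified at lastModifiedMillis falls before the cutoff.
    static boolean isOlderThan(long lastModifiedMillis, Date now, int daysOlder) {
        return lastModifiedMillis < cutoff(now, daysOlder).getTime();
    }
}
```

Passing 'now' in as a parameter keeps the check easy to test with fixed dates.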

DefaultCleanupReleasedSnapshots:

  • name is inconsistent with the others - should be CleanupReleasedSnapshotsPurge?
  • System.out in here, and exception is swallowed - is it intentional to log and swallow the exception?

A general thought here too, for later: it might be worth reviewing the exceptions that can occur in *Purge and seeing if we can recover better from each rather than letting it bubble up

Tests:

  • I think you can remove many of the components from the test XML files where the default suffice (just keep the registry and jdo factory)
  • the tests have a lot of boilerplate that can probably be turned into methods that generate test data

Missing tests:

  • no tests for the consumer or the Released Snapshots purge
  • days old test is only testing by file age - it should also test the metadata-driven snapshots

A general thought for Archiva in the long term, too... setting up the database to test this was probably a pain. We should have stub implementations of the indexer and DAOs to avoid it.

Brett Porter added a comment - 06/Aug/07 12:21 AM

Sorry, I've thought some more, and I've decided you made the right choice in not making *Purge plexus components.

So, with that in mind, my first comment on RepositoryPurge would be to pass the fields to the constructor instead of the process method. Also, the test cases should be updated to just use new SomethingPurge() instead of lookup().
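
A minimal sketch of the constructor-based shape suggested here. The interface, class, and field names are hypothetical; the real RepositoryPurge interface and its parameters are not shown in this issue.

```java
// Hypothetical interface: no setters, only the operation itself.
interface RepositoryPurgeSketch {
    void process(String path);
}

// Hypothetical implementation: everything it needs arrives through the constructor,
// so the fields can be final and no container lookup is required.
final class RetentionCountPurgeSketch implements RepositoryPurgeSketch {
    private final String repositoryId;
    private final int retentionCount;

    RetentionCountPurgeSketch(String repositoryId, int retentionCount) {
        if (retentionCount < 1) {
            throw new IllegalArgumentException("retentionCount must be at least 1");
        }
        this.repositoryId = repositoryId;
        this.retentionCount = retentionCount;
    }

    String repositoryId() {
        return repositoryId;
    }

    int retentionCount() {
        return retentionCount;
    }

    @Override
    public void process(String path) {
        // A real implementation would delete old snapshot files under 'path';
        // this sketch intentionally does nothing.
    }
}
```

A test can then build the object directly, for example new RetentionCountPurgeSketch("snapshots", 2), without going through lookup().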

Maria Odea Ching added a comment - 06/Aug/07 3:26 AM

Applied above comments in -r563064.

Btw, some of the exceptions mentioned above are intended to be swallowed. I added a @todo for these to log the errors in the console.

I'll just open a separate jira issue for the missing tests.

Thanks!

Brett Porter added a comment - 06/Aug/07 3:41 AM

don't forget to create an issue for the index purge and the browse problem too


People

  • Assignee:
    Maria Odea Ching
    Reporter:
    Wendy Smoak

Dates

  • Created:
    26/Feb/07 12:55 PM
    Updated:
    06/Aug/07 3:41 AM
    Resolved:
    06/Aug/07 3:26 AM