
Add journal client implementation which updates content archiver db with new contents
Abandoned, Public

Authored by ardumont on Feb 25 2017, 1:30 AM.

Details

Reviewers
olasd
zack
Group Reviewers
Reviewers
Summary

This leverages the base class client (D180) to do thy bidding.

Note that this is a draft and parts of the implementation need improvement.

I mainly mean the archiver storage functions (the existing one plus the
new one in this revision), which should be improved/refactored/merged,
plus one in the archiver director, which is already able to cope with
missing entries... (T569)

Related T494

Diff Detail

Repository
rDSTO Storage manager
Branch
add-archiver-content-updater
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 735
Build 985: Software Heritage Python tests
Build 984: arc lint + arc unit

Event Timeline

Sum up

  • Refactor: Merge common behavior in director and content updater client
  • test: Remove impossible and commented test
  • content_archive_add: Use the right 'missing' status
  • Refactor: Reuse swh.scheduler.get_task function
  • archiver-storage: Improve content_archive_add function
  • Refactor: Unify the content_archive_add with swh.storage.content_add
  • swh.storage.archiver.updater: Add logging level to INFO
  • swh.storage.archiver.updater: Override configuration filename
swh/storage/archiver/director.py
206 ↗(On Diff #558)

# yesqa :)

That line should not be needed anyway.

swh/storage/archiver/storage.py
180 ↗(On Diff #558)

If possible this mtime should be pulled from the database (so you'll pass a list of content_id/mtime data), even though that's okay for a first pass (this can land as is).

In any case it doesn't really need to be coerced to an int, a float will do.
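
A minimal sketch of the suggestion above, not the code under review. The helper name `with_mtimes` and its signature are hypothetical: the caller passes (content_id, mtime) pairs, ideally taken from the database, and any pair without an mtime falls back to the current time, kept as a float.

```python
import time
from typing import Iterable, List, Optional, Tuple


def with_mtimes(
    contents: Iterable[Tuple[bytes, Optional[float]]],
    now: Optional[float] = None,
) -> List[Tuple[bytes, float]]:
    """Pair each content id with an mtime (hypothetical helper).

    An mtime supplied by the caller is kept; a missing one (None) is
    replaced by `now`, which defaults to time.time().
    """
    if now is None:
        # time.time() already returns a float; no int() coercion needed.
        now = time.time()
    return [
        (content_id, now if mtime is None else float(mtime))
        for content_id, mtime in contents
    ]
```

Passing `now` explicitly keeps the fallback deterministic, which makes the helper easy to test.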

182–186 ↗(On Diff #558)

This is not useful as the status for random archives will always default to "missing". Cleans up some configuration as well which is always nice.
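
To illustrate why the explicit entries are redundant, here is a small sketch under assumed names. The function `copy_status`, the shape of the `copies` mapping and the archive ids are hypothetical, not the actual archiver schema: any archive without an entry is simply read as "missing".

```python
from typing import Mapping


def copy_status(
    copies: Mapping[str, Mapping[str, object]], archive_id: str
) -> str:
    """Return the recorded status for archive_id (hypothetical helper).

    Archives with no entry, or an entry without a status, are
    reported as "missing".
    """
    entry = copies.get(archive_id)
    if entry is None:
        return "missing"
    return str(entry.get("status", "missing"))
```

With this default on the reading side, the writer only needs to record the archives it actually knows something about.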

swh/storage/archiver/storage.py
180 ↗(On Diff #558)

Right.

182–186 ↗(On Diff #558)

Indeed.

Clean up revision after code separation between director/archiver.storage

  • Add journal client to update content archiver with new content
  • d/control: Add swh-journal dependency
  • Refactor: Merge common behavior in director and content updater client
  • Refactor: Unify the content_archive_add with swh.storage.content_add
  • swh.storage.archiver.updater: Add logging level to INFO
  • swh.storage.archiver.updater: Override configuration filename

Rebased on latest master + remove sources_missing option

  • Add journal client to update content archiver with new content
  • d/control: Add swh-journal dependency
  • Refactor: Merge common behavior in director and content updater client
  • Refactor: Unify the content_archive_add with swh.storage.content_add
  • swh.storage.archiver.updater: Add logging level to INFO
  • swh.storage.archiver.updater: Override configuration filename
  • swh.storage.archiver.updater: Remove sources_missing

Align the git commit messages with what the commits actually do

Some commits were reworked yesterday to untangle the two perimeters
(archiver + content updater). To be fair, they were related.

The code did not change though.

  • Add journal client to update content archiver with new content
  • d/control: Add swh-journal dependency to swh.storage.archiver
  • swh.storage.archiver.updater: Call directly content_archive_add
  • swh.storage.archiver.updater: Add logging level to INFO
  • swh.storage.archiver.updater: Add specific configuration filename

Merged through 4f1d48ce6a0a281f2553284b7078735ace740755..d2d45b28ed8214f4a4f5b0a3b566ce05a788b335