Software Heritage

Add journal client implementation which updates content archiver db with new contents
Abandoned, Public

Authored by ardumont on Feb 25 2017, 1:30 AM.

Details

Reviewers
olasd
zack
Group Reviewers
Reviewers
Summary

This leverages the base client class (D180) to do its bidding.
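The split between a generic base client and a specialised subclass can be sketched as below. This is a minimal, self-contained illustration under stated assumptions, not the actual D180 or swh.journal API: `BaseJournalClient`, `process_objects`, `ContentArchiverUpdater` and `FakeArchiverStorage` are hypothetical names, and the message shape (`{"type": ..., "object": ...}`) is assumed.

```python
from typing import Dict, Iterable, List


class BaseJournalClient:
    """Hypothetical base client: routes journal messages by object type."""

    def __init__(self, object_types: Iterable[str]) -> None:
        self.object_types = set(object_types)

    def process(self, messages: Iterable[dict]) -> None:
        # Batch incoming objects per type, ignoring types we don't handle
        batches: Dict[str, List[dict]] = {}
        for msg in messages:
            if msg["type"] in self.object_types:
                batches.setdefault(msg["type"], []).append(msg["object"])
        for object_type, objects in batches.items():
            self.process_objects(object_type, objects)

    def process_objects(self, object_type: str, objects: List[dict]) -> None:
        raise NotImplementedError


class FakeArchiverStorage:
    """In-memory stand-in for the archiver db (hypothetical)."""

    def __init__(self) -> None:
        self.contents: List[bytes] = []

    def content_archive_add(self, content_ids: List[bytes]) -> None:
        self.contents.extend(content_ids)


class ContentArchiverUpdater(BaseJournalClient):
    """Registers every new content seen in the journal in the archiver db."""

    def __init__(self, storage: FakeArchiverStorage) -> None:
        super().__init__(object_types=["content"])
        self.storage = storage

    def process_objects(self, object_type: str, objects: List[dict]) -> None:
        self.storage.content_archive_add([obj["sha1"] for obj in objects])


# Usage: only the "content" message reaches the archiver storage
storage = FakeArchiverStorage()
client = ContentArchiverUpdater(storage)
client.process([
    {"type": "content", "object": {"sha1": b"\x01"}},
    {"type": "revision", "object": {"id": b"\x02"}},
])
```

Keeping message routing in the base class means the archiver updater only has to declare which object type it wants and what to do with a batch.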

Note that this is a draft and some of the implementation needs improvement.

I mainly have in mind the archiver storage functions (the existing one
plus the new one added in this revision), which should be improved,
refactored or merged, and one of the archiver directors, which is
already able to cope with missing entries (T569).
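As a rough sketch of coping with missing entries, the new storage function could register unknown contents as "missing" in every archive and silently skip contents it already knows, so replaying journal messages is harmless. `ArchiverDB` and the archive names are hypothetical; a real implementation would write to the archiver database rather than a dict.

```python
from typing import Dict, Iterable, List


class ArchiverDB:
    """Hypothetical in-memory model of the content archiver table:
    content id -> {archive name: status}."""

    def __init__(self, archives: Iterable[str]) -> None:
        self.archives = list(archives)
        self.rows: Dict[bytes, Dict[str, str]] = {}

    def content_archive_add(self, content_ids: Iterable[bytes]) -> List[bytes]:
        """Register new contents as 'missing' in every archive.

        Ids already known are skipped, making the call idempotent.
        Returns the ids actually inserted.
        """
        inserted = []
        for content_id in content_ids:
            if content_id in self.rows:
                continue
            self.rows[content_id] = {name: "missing" for name in self.archives}
            inserted.append(content_id)
        return inserted


# Usage with two hypothetical archives
db = ArchiverDB(["primary", "backup"])
first = db.content_archive_add([b"a", b"b"])
second = db.content_archive_add([b"b", b"c"])  # b"b" is already known
```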

Related T494

Diff Detail

Repository
rDSTO Storage manager
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 708
Build 951: Software Heritage Python tests
Build 950: arc lint + arc unit

Event Timeline

Sum up

  • Refactor: Merge common behavior in director and content updater client
  • test: Remove impossible and commented test
  • content_archive_add: Use the right 'missing' status
  • Refactor: Reuse swh.scheduler.get_task function
  • archiver-storage: Improve content_archive_add function
  • Refactor: Unify the content_archive_add with swh.storage.content_add
  • swh.storage.archiver.updater: Add logging level to INFO
  • swh.storage.archiver.updater: Override configuration filename
swh/storage/archiver/director.py
206

# yesqa :)

That line should not be needed anyway.

swh/storage/archiver/storage.py
180

Ideally this mtime should be pulled from the database (so you would pass a list of (content_id, mtime) pairs), but this is fine for a first pass and can land as is.

In any case, it doesn't need to be coerced to an int; a float will do.
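Following the review suggestion, a hypothetical helper could take (content_id, mtime) pairs, with the mtime pulled from the database when available, and keep it as a float. `archive_rows` and its fallback to the current time when no mtime is known are assumptions for illustration.

```python
import time
from typing import Dict, Iterable, Optional, Tuple


def archive_rows(
    pairs: Iterable[Tuple[bytes, Optional[float]]],
    archives: Iterable[str],
) -> Dict[bytes, Dict[str, Tuple[str, float]]]:
    """Build archiver rows from (content_id, mtime) pairs.

    mtime stays a float (seconds since the epoch); when the caller
    could not fetch it from the database (None), fall back to now.
    """
    names = list(archives)
    rows: Dict[bytes, Dict[str, Tuple[str, float]]] = {}
    for content_id, mtime in pairs:
        ts = time.time() if mtime is None else float(mtime)
        rows[content_id] = {name: ("missing", ts) for name in names}
    return rows


rows = archive_rows([(b"a", 1488000000.5), (b"b", None)], ["primary"])
```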

182–186

This is not useful, as the status for any other archive will always default to "missing". Removing it also cleans up some configuration, which is always nice.
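The point about defaults can be illustrated with a lookup that treats any archive without a recorded entry as "missing", which is what makes an option like `sources_missing` unnecessary. `archive_status` is a hypothetical helper, not part of swh.storage.

```python
from typing import Dict


def archive_status(row: Dict[str, str], archive: str) -> str:
    """Return the archival status of a content in `archive`.

    An archive with no recorded entry is treated as "missing", so no
    per-archive configuration listing missing sources is needed.
    """
    return row.get(archive, "missing")


row = {"primary": "present"}
```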

swh/storage/archiver/storage.py
180

Right.

182–186

Indeed.

Clean up revision after the code separation between the director and archiver.storage

  • Add journal client to update content archiver with new content
  • d/control: Add swh-journal dependency
  • Refactor: Merge common behavior in director and content updater client
  • Refactor: Unify the content_archive_add with swh.storage.content_add
  • swh.storage.archiver.updater: Add logging level to INFO
  • swh.storage.archiver.updater: Override configuration filename

Rebased on latest master and removed the sources_missing option

  • Add journal client to update content archiver with new content
  • d/control: Add swh-journal dependency
  • Refactor: Merge common behavior in director and content updater client
  • Refactor: Unify the content_archive_add with swh.storage.content_add
  • swh.storage.archiver.updater: Add logging level to INFO
  • swh.storage.archiver.updater: Override configuration filename
  • swh.storage.archiver.updater: Remove sources_missing

Stabilized the git commit messages so they describe what the commits actually do

Some commits were reworked yesterday to untangle the two scopes
(archiver and content updater). To be fair, they were related.

The code did not change, though.

  • Add journal client to update content archiver with new content
  • d/control: Add swh-journal dependency to swh.storage.archiver
  • swh.storage.archiver.updater: Call directly content_archive_add
  • swh.storage.archiver.updater: Add logging level to INFO
  • swh.storage.archiver.updater: Add specific configuration filename

Merged through 4f1d48ce6a0a281f2553284b7078735ace740755..d2d45b28ed8214f4a4f5b0a3b566ce05a788b335