Page MenuHomeSoftware Heritage

Refactor and optimize the archiver
AbandonedPublic

Authored by qcampos on Jul 26 2016, 4:42 PM.

Details

Reviewers
olasd
zack
Group Reviewers
Reviewers
Maniphest Tasks
T240: content archiver
Summary
  • Minor optimizations of the code
  • Archiver's batches structure is now simplified in order to reduce the redundancy of the data and simplify the task creation.
  • Director's iterations over the contents are now correctly separated into simplified functions:
    • Batches from db are flattened into a continuous stream
    • Archiver's director iterate over it and yield filtered/transformed data
    • This continuous stream is batched separately from the previous step to avoid multiple functionality in a single function
  • Remove int() conversions around python's time API as the db contains float as well.
  • Perform a full refactoring of the archiver worker in order to simplify its code and behavior. Also, remove the redundant check of the potential modifications of the content status once the task have been scheduled.

Diff Detail

Repository
rDSTO Storage manager
Branch
T240
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 307
Build 452: Software Heritage Python tests
Build 451: arc lint + arc unit

Event Timeline

qcampos retitled this revision from to Refactor and optimize the archiver.
qcampos updated this object.
qcampos edited the test plan for this revision. (Show Details)
qcampos added a reviewer: olasd.
qcampos edited edge metadata.

Update Copyright date in licence header

zack added inline comments.
swh/storage/archiver/copier.py
32–33

Is this right? before this change self.url did actually contain an URL, while now it will contain a tuple (name, url) — assuming the docstring is correct, that is.
The tests pass, so I guess functionally this is not an issue. But either the unpacking is still needed, or self.url should be renamed to self.destination.

swh/storage/archiver/copier.py
32–33

The unpacking is no longer needed as destination now only contains the url (see worker.py, l.105+). I forgot to update the docstring though..

zack requested changes to this revision.Jul 27 2016, 12:04 PM
zack added a reviewer: zack.
zack added inline comments.
swh/storage/archiver/copier.py
32–33

Ah, I see. Please update the docstring then :-)
I'll happily leave the rest of the review to @olasd, has he has looked at the archiver code more than me.

This revision now requires changes to proceed.Jul 27 2016, 12:04 PM
qcampos edited edge metadata.

Update doctring to match the current implementation.

The archiver will evolve to a more symmetrical between storages & generic structure.

Previous modifications are no longer relevant due to Archiver specification's last evolutions.