Arbitrary slicing on PathSlicingObjStorage
Closed, Public

Authored by qcampos on Jun 14 2016, 5:54 PM.

Details

Summary

Allow the sha1 slicing of a content to be fully customizable.

For example, content whose sha1 is "abcdef1234567890", in a storage with slicing "0:2/0:5", will be stored under "root/ab/abcde/".
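As a rough illustration of that example (a sketch only, not the code in this diff; the helper name `sliced_path` and its signature are hypothetical), each `start:end` component of the slicing string can be read as a Python slice of the hexadecimal id:

```python
def sliced_path(root, hex_obj_id, slicing):
    """Return the directory for hex_obj_id under root (hypothetical helper)."""
    parts = []
    for bounds in slicing.split("/"):
        # Each component is "start:end", read with Python slice semantics.
        start, end = (int(x) for x in bounds.split(":"))
        parts.append(hex_obj_id[start:end])
    # Join with "/" so the result does not depend on the host OS separator.
    return "/".join([root, *parts])


print(sliced_path("root", "abcdef1234567890", "0:2/0:5"))  # root/ab/abcde
```

Note that the two slices may overlap, as in this example, since each one is applied to the full id independently.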

Also, make the required changes in the swh.storage package to follow those modifications.

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

qcampos retitled this revision to Arbitrary slicing on PathSlicingObjStorage.
qcampos updated this object.
qcampos edited the test plan for this revision.
qcampos added a reviewer: olasd.
qcampos edited edge metadata.

Correct a docstring that was not up-to-date.

qcampos edited edge metadata.

Move a constant into the superclass.

qcampos edited edge metadata.

Add a default argument that was missing.

olasd requested changes to this revision. Jun 15 2016, 5:37 PM
olasd edited edge metadata.

A few comments inline before merging :)

swh/storage/objstorage/objstorage_pathslicing.py
92

Typo: 0:4 is only four characters long.

122

You could instantiate the slice objects here (use slice(*map(int, ...)) instead of tuple(map(int, ...))), and reuse them directly when constructing the path.

174–186

We should probably move that check to the instantiation of the storage rather than do it on each access: the length of an object id is constant.

187

hex_obj_id[bounds] for bounds in self.bounds instead of unpacking start, end and repacking them.
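The suggestion about slice objects can be sketched roughly as follows. This is an illustration under assumptions, not the code under review: `parse_slicing` is a hypothetical name, and the id is an arbitrary 40-character hex string.

```python
def parse_slicing(slicing):
    """Turn a spec such as "0:2/2:4" into a list of slice objects."""
    return [
        # The star unpacks the two ints, giving slice(start, stop).
        # Without it, slice(map(...)) would build slice(None, <map>, None).
        slice(*map(int, bounds.split(":")))
        for bounds in slicing.split("/")
        if bounds  # tolerate a leading or trailing "/"
    ]


bounds = parse_slicing("0:2/2:4/4:6")
hex_obj_id = "34973274ccef6ab4dfaaf86599792fa9c3fe4689"

# The slice objects index the id directly; no start/end unpacking is needed.
components = [hex_obj_id[b] for b in bounds]
print(components)  # ['34', '97', '32']
```

Building the slices once also means the parsing cost is paid at construction time rather than on every path lookup.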

This revision now requires changes to proceed. Jun 15 2016, 5:37 PM

I didn't know I could create a slice object. Thanks!

swh/storage/objstorage/objstorage_pathslicing.py
174–186

Do we have a way, at instantiation, to know the size of a hash given the ID_HASH_ALGO algorithm without hard-coding it?

qcampos edited edge metadata.

Correct a typo, and use a slice object instead of unpacking [start, stop] manually.

swh/storage/objstorage/objstorage_pathslicing.py
174–186

Not really, no; we can add an ID_HASH_LENGTH variable next to ID_HASH_ALGO.

qcampos marked 3 inline comments as done.
qcampos edited edge metadata.

Put the hash length test at initialization instead of doing it on each access.
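A minimal sketch of what checking at initialization could look like, assuming a hex sha1 id of 40 characters. The class name `SlicedStore` and the error message are hypothetical; only `ID_HASH_ALGO` and `ID_HASH_LENGTH` are names taken from the discussion, and the `hashlib` cross-check is an addition for illustration rather than something proposed in the review.

```python
import hashlib

ID_HASH_ALGO = "sha1"
# Length of the hexadecimal digest: sha1 is 20 bytes, hence 40 hex characters.
ID_HASH_LENGTH = 40

# Optional sanity check of the constant (assumption, not part of the diff).
assert hashlib.new(ID_HASH_ALGO).digest_size * 2 == ID_HASH_LENGTH


class SlicedStore:
    """Hypothetical stand-in for the path-slicing object storage."""

    def __init__(self, root, slicing):
        self.root = root
        self.bounds = [
            slice(*map(int, b.split(":"))) for b in slicing.split("/") if b
        ]
        # Validate once here: the id length is constant, so there is no
        # need to repeat this test on every object access.
        max_stop = max((b.stop for b in self.bounds), default=0)
        if max_stop > ID_HASH_LENGTH:
            raise ValueError(
                "slicing %r reaches index %d, but ids are only %d characters"
                % (slicing, max_stop, ID_HASH_LENGTH)
            )


SlicedStore("root", "0:2/2:4/4:6")  # accepted
try:
    SlicedStore("root", "0:2/38:42")  # rejected: 42 > 40
except ValueError as exc:
    print(exc)
```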

swh/storage/objstorage/objstorage.py
7

That should be 40! :)

Correct the sha1 hexadecimal hash's length.

swh/storage/objstorage/objstorage.py
7

Whoops! That's better indeed.

olasd edited edge metadata.
This revision is now accepted and ready to land. Jun 16 2016, 3:09 PM
This revision was automatically updated to reflect the committed changes.