Page MenuHomeSoftware Heritage

First version of the directory cooker & cache
ClosedPublic

Authored by qcampos on Aug 19 2016, 3:11 PM.

Details

Summary

This first version does create a compressed folder of an archive
directory but is not linked to any API or notification system.

This diff is submitted for architecture and code review and will evolve.

Diff Detail

Repository
rDSTO Storage manager
Branch
D102
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 470
Build 693: Software Heritage Python tests
Build 692: arc lint + arc unit

Event Timeline

qcampos retitled this revision from to First version of the directory cooker & cache.
qcampos updated this object.
qcampos edited the test plan for this revision. (Show Details)
qcampos edited edge metadata.

Minor fixes

ardumont added inline comments.
swh/storage/vault/cache.py
18

What does the TempStorage is for?

It looks like an iterative step which could be removed now.
If not to be removed, please add a docstring ^^

swh/storage/vault/cooker.py
22

describes

38

must implement

69

You could use generator:
https://docs.python.org/3.4/library/itertools.html#itertools.tee

Something like this could work:

import itertools
datas = self.storage.directory_ls(dir_id, recursive=True)
file_data, dir_data = itertools.tee(datas, 2)

file_data = (entry for entry in datas if entry['type'] == 'file')
dir_data =  (entry for entry in datas if entry['type'] == 'dir')

If this work, you could even go further as to deal with absent, hidden and normal files here by creating 5 generators and do the filtering here.

file_hidden_data = (entry for entry in datas if entry['type'] == 'file' and entry['status'] == 'hidden')
file_absent_data = (entry for entry in datas if entry['type'] == 'file' and entry['status'] == 'absent')
...

Note:
data is invariable, so it should really be data (and not datas) ^^
70

s/If this work,/If this works/

84

You could use generator here using ( and )

and even possibly remove altogether the function (cf. previous comment).

106

method

124

beware the print ^^

ardumont added a reviewer: ardumont.

Some small nitpicks on typos.
I don't see any blockers.

This revision is now accepted and ready to land.Aug 26 2016, 6:25 PM
qcampos edited edge metadata.
qcampos marked 8 inline comments as done.

Do some corrections

This revision now requires review to proceed.Sep 12 2016, 11:36 AM
ardumont edited edge metadata.
ardumont added inline comments.
swh/storage/vault/cooker.py
74

You could probably remove the map here and only retrieves the list of names line 90.
You lose in symmetry in the resulting data (one is list of names, and the other is a list of dict) but that code is local enough to be noticed (by local enough, i mean position wise, it's side by side).

Also, i stand by my previous remark, you could inline the _split_data method directly here.
The _split_data function is not that much useful, i think.
And after that, the so-called symmetry problem is no longer relevant.

90

If entry is not useful, you could directly return entry['name'].

128

This could probably be improved later.
Since the storage api permits to retrieve multiple contents' data in one shot.
Anyway, this sounds reasonable for now.

This revision now requires changes to proceed.Sep 12 2016, 11:52 AM
qcampos edited edge metadata.
qcampos marked 2 inline comments as done.

Inline _split_data method of the cooker

qcampos added inline comments.
swh/storage/vault/cooker.py
128

Remember the performance issue we got on a remote storage? The current implementation is the reason. Working on a local storage made the execution time acceptable, but changing it is scheduled!

ardumont edited edge metadata.

Awesome ^^

swh/storage/vault/cooker.py
128

Thanks for the reminder ^^

This revision is now accepted and ready to land.Sep 12 2016, 12:58 PM
This revision was automatically updated to reflect the committed changes.
qcampos marked an inline comment as done.