Details

Reviewers

Group Reviewers

Maniphest Tasks

Commits

rDSTOCb11cfe34a03e: First version of the directory cooker & cache
R65:b11cfe34a03e: First version of the directory cooker & cache
rDVAUb84c5152a4ea: First version of the directory cooker & cache
rDSTOb11cfe34a03e: First version of the directory cooker & cache

Summary

This first version does create a compressed folder of an archive
directory but is not linked to any API or notification system.

This diff is submitted for architecture and code review and will evolve.

Diff Detail

Repository

rDSTO Storage manager

Branch

D102

Lint

No Linters Available

Unit

No Unit Test Coverage

Build Status

Buildable 468
Build 689: Software Heritage Python tests
Build 688: arc lint + arc unit

Event Timeline

qcampos updated this revision to Diff 349.Aug 19 2016, 3:11 PM

qcampos retitled this revision from to First version of the directory cooker & cache.

qcampos updated this object.

qcampos edited the test plan for this revision. (Show Details)

Herald added a reviewer: Reviewers. · View Herald TranscriptAug 19 2016, 3:11 PM

qcampos added a task: T531: Vault cookers.Aug 19 2016, 3:20 PM

Minor fixes

Herald edited edge metadata. · View Herald TranscriptAug 24 2016, 12:56 PM

qcampos added a child revision: D108: Http API to access the SWH vault.Aug 24 2016, 1:01 PM

ardumont added a subscriber: ardumont.Aug 26 2016, 6:22 PM

ardumont added inline comments.

swh/storage/vault/cache.py
18	What does the TempStorage is for? It looks like an iterative step which could be removed now. If not to be removed, please add a docstring ^^
swh/storage/vault/cooker.py
22	`describes`
38	`must implement`
69	You could use generator: https://docs.python.org/3.4/library/itertools.html#itertools.tee Something like this could work: import itertools datas = self.storage.directory_ls(dir_id, recursive=True) file_data, dir_data = itertools.tee(datas, 2) file_data = (entry for entry in datas if entry['type'] == 'file') dir_data = (entry for entry in datas if entry['type'] == 'dir') If this work, you could even go further as to deal with absent, hidden and normal files here by creating 5 generators and do the filtering here. file_hidden_data = (entry for entry in datas if entry['type'] == 'file' and entry['status'] == 'hidden') file_absent_data = (entry for entry in datas if entry['type'] == 'file' and entry['status'] == 'absent') ... Note: data is invariable, so it should really be data (and not datas) ^^
70	s/If this work,/If this works/
84	You could use generator here using `(` and `)` and even possibly remove altogether the function (cf. previous comment).
106	method
124	beware the print ^^

Some small nitpicks on typos.
I don't see any blockers.

This revision is now accepted and ready to land.Aug 26 2016, 6:25 PM

Do some corrections

This revision now requires review to proceed.Sep 12 2016, 11:36 AM

ardumont requested changes to this revision.Sep 12 2016, 11:52 AM

ardumont edited edge metadata.

ardumont added inline comments.

swh/storage/vault/cooker.py
74	You could probably remove the map here and only retrieves the list of names line 90. You lose in symmetry in the resulting data (one is list of names, and the other is a list of dict) but that code is local enough to be noticed (by local enough, i mean position wise, it's side by side). Also, i stand by my previous remark, you could inline the _split_data method directly here. The _split_data function is not that much useful, i think. And after that, the so-called symmetry problem is no longer relevant.
90	If entry is not useful, you could directly return entry['name'].
128	This could probably be improved later. Since the storage api permits to retrieve multiple contents' data in one shot. Anyway, this sounds reasonable for now.

This revision now requires changes to proceed.Sep 12 2016, 11:52 AM

Inline _split_data method of the cooker

qcampos marked an inline comment as done.Sep 12 2016, 12:54 PM

qcampos added inline comments.

swh/storage/vault/cooker.py
128	Remember the performance issue we got on a remote storage? The current implementation is the reason. Working on a local storage made the execution time acceptable, but changing it is scheduled!

Awesome ^^

swh/storage/vault/cooker.py
128	Thanks for the reminder ^^

This revision is now accepted and ready to land.Sep 12 2016, 12:58 PM

Closed by commit rDSTOb11cfe34a03e: First version of the directory cooker & cache (authored by qcampos). · Explain WhySep 12 2016, 2:22 PM

This revision was automatically updated to reflect the committed changes.

qcampos marked an inline comment as done.

First version of the directory cooker & cache
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 398

swh/storage/vault/cache.py

swh/storage/vault/conf.yaml

swh/storage/vault/cooker.py

First version of the directory cooker & cacheClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 398

swh/storage/vault/cache.py

swh/storage/vault/conf.yaml

swh/storage/vault/cooker.py

First version of the directory cooker & cache
ClosedPublic
Actions

Revision Contents
Changeset List