Details
Diff Detail
- Repository
- rDFUSE FUSE virtual file system
- Branch
- feature/add-history-by-hash-sharding
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 17082 Build 26365: Phabricator diff pipeline on jenkins Jenkins console · Jenkins Build 26364: arc lint + arc unit
Event Timeline
Build is green
Patch application report for D4416 (id=15649)
Rebasing onto 46a48a1907...
Current branch diff-target is up to date.
Changes applied before test
commit 994e11e59521084ea90636efa1fe16350d06cf18 Author: Thibault Allançon <haltode@gmail.com> Date: Thu Nov 5 12:19:06 2020 +0100 fs: history: add by-hash/ sharded directory
See https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/145/ for more details.
LGTM, I've only noted down a couple of minor issues
swh/fuse/fs/entry.py | ||
---|---|---|
124–126 ↗ | (On Diff #15649) | minor: sharding["length"] is a constant from the point of view of this code, you can cache it outside the for loop to avoid looking it up twice each loop iteration |
146 ↗ | (On Diff #15649) | minor: you can move this prefix_len cached length to outside the for loop, and also the conf lookup of sharding > length just below, I think |
Build is green
Patch application report for D4416 (id=15656)
Rebasing onto 46a48a1907...
Current branch diff-target is up to date.
Changes applied before test
commit db41b75df1e39cf6df021dee77617003712a81f3 Author: Thibault Allançon <haltode@gmail.com> Date: Thu Nov 5 12:19:06 2020 +0100 fs: history: add by-hash/ sharded directory
See https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/146/ for more details.
swh/fuse/fs/entry.py | ||
---|---|---|
124–126 ↗ | (On Diff #15649) | Also you should do: for i in range(0, sharding['depth']): name += basename[i * length: (i + 1) * length] |
Build is green
Patch application report for D4416 (id=15681)
Rebasing onto 46a48a1907...
Current branch diff-target is up to date.
Changes applied before test
commit 6eca12205996992a6031a337bd168ae32afe8387 Author: Thibault Allançon <haltode@gmail.com> Date: Fri Nov 6 10:41:10 2020 +0100 WIP: do not store history in memory commit db41b75df1e39cf6df021dee77617003712a81f3 Author: Thibault Allançon <haltode@gmail.com> Date: Thu Nov 5 12:19:06 2020 +0100 fs: history: add by-hash/ sharded directory
See https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/147/ for more details.
Build is green
Patch application report for D4416 (id=15712)
Rebasing onto 46a48a1907...
Current branch diff-target is up to date.
Changes applied before test
commit 357e62ae5cae1ab60df15c3953a493f9c0eb8dde Author: Thibault Allançon <haltode@gmail.com> Date: Fri Nov 6 15:53:51 2020 +0100 WIP: remove cache asizeof commit 375f7b32bba64f19abf9efeef3cc10cfe5825428 Author: Thibault Allançon <haltode@gmail.com> Date: Fri Nov 6 10:41:10 2020 +0100 WIP: do not store history in memory commit db41b75df1e39cf6df021dee77617003712a81f3 Author: Thibault Allançon <haltode@gmail.com> Date: Thu Nov 5 12:19:06 2020 +0100 fs: history: add by-hash/ sharded directory
See https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/148/ for more details.
I think I understand what your fill_direntry_cache function is trying to do: you want to avoid fetching the history multiple times by doing the request only once and writing the direntry cache of all the children recursively?
Would it be maybe better to instead have a small LRU cache for the API queries, and keep the direntry code simple and fully lazy?
swh/fuse/fs/artifact.py | ||
---|---|---|
234–237 | More pythonic (and probably more efficient) without string concatenation: parts = [ basename[i * sharding_length : (i + 1) * sharding_length] for i in range(sharding_depth) ] parts.append(str(swhid)) name = os.path.join(parts) |
Build is green
Patch application report for D4416 (id=15871)
Rebasing onto 46a48a1907...
Current branch diff-target is up to date.
Changes applied before test
commit 604665ab40ec3c5350bda1da57b6f11f5eabf077 Author: Thibault Allançon <haltode@gmail.com> Date: Fri Nov 6 15:53:51 2020 +0100 cache: replace asizeof() call with simpler heuristic The asizeof() introduced a very heavy overhead which made swhfs unusable when relying on many asizeof calls (eg: storing history exploration in the direntry cache with a history containing >1000 commits). The heuristic to calculate the size of a FuseEntry is based on experimental results from repositories with various history sizes (10K, 100k, ...) commit 2b1a725ad68dfa0944fb8b77667671f7fdc4c678 Author: Thibault Allançon <haltode@gmail.com> Date: Fri Nov 6 10:41:10 2020 +0100 fs: history: do not store history list in memory Pre-fill the direntry cache with all the sub-entries once the history has been retrieved. commit db41b75df1e39cf6df021dee77617003712a81f3 Author: Thibault Allançon <haltode@gmail.com> Date: Thu Nov 5 12:19:06 2020 +0100 fs: history: add by-hash/ sharded directory
See https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/149/ for more details.