Details

Reviewers

seirl
zack

Group Reviewers

Reviewers

Commits

rDFUSE2b1a725ad68d: fs: history: do not store history list in memory
rDFUSE604665ab40ec: cache: replace asizeof() call with simpler heuristic
rDFUSEdb41b75df1e3: fs: history: add by-hash/ sharded directory

Diff Detail

Repository

rDFUSE FUSE virtual file system

Lint

Automatic diff as part of commit; lint not applicable.

Unit

Automatic diff as part of commit; unit tests not applicable.

Event Timeline

haltode created this revision. Nov 5 2020, 12:21 PM

Herald added a reviewer: Reviewers. Nov 5 2020, 12:21 PM

Build is green

Patch application report for D4416 (id=15649)

Rebasing onto 46a48a1907...

Current branch diff-target is up to date.

Changes applied before test

commit 994e11e59521084ea90636efa1fe16350d06cf18
Author: Thibault Allançon <haltode@gmail.com>
Date:   Thu Nov 5 12:19:06 2020 +0100

    fs: history: add by-hash/ sharded directory

See https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/145/ for more details.

Harbormaster completed remote builds in B16871: Diff 15649. Nov 5 2020, 12:23 PM

LGTM, I've only noted down a couple of minor issues

swh/fuse/fs/entry.py
123–125	minor: `sharding["length"]` is a constant from the point of view of this code; you can cache it outside the for loop to avoid looking it up twice on each loop iteration
145	minor: you can move this cached `prefix_len` outside the for loop, and also the conf lookup of `sharding > length` just below, I think

This revision now requires changes to proceed. Nov 5 2020, 1:41 PM

Fix zack comments (cache outside of loop)

Build is green

Patch application report for D4416 (id=15656)

Rebasing onto 46a48a1907...

Current branch diff-target is up to date.

Changes applied before test

commit db41b75df1e39cf6df021dee77617003712a81f3
Author: Thibault Allançon <haltode@gmail.com>
Date:   Thu Nov 5 12:19:06 2020 +0100

    fs: history: add by-hash/ sharded directory

See https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/146/ for more details.

Harbormaster completed remote builds in B16878: Diff 15656. Nov 5 2020, 1:57 PM

haltode marked 2 inline comments as done. Nov 5 2020, 1:58 PM

seirl added a subscriber: seirl. Nov 5 2020, 3:01 PM

seirl added inline comments.

swh/fuse/fs/entry.py
123–125	Also you should do:

    for i in range(0, sharding['depth']):
        name += basename[i * length: (i + 1) * length]

Do not store history in memory

Build is green

Patch application report for D4416 (id=15681)

Rebasing onto 46a48a1907...

Current branch diff-target is up to date.

Changes applied before test

commit 6eca12205996992a6031a337bd168ae32afe8387
Author: Thibault Allançon <haltode@gmail.com>
Date:   Fri Nov 6 10:41:10 2020 +0100

    WIP: do not store history in memory

commit db41b75df1e39cf6df021dee77617003712a81f3
Author: Thibault Allançon <haltode@gmail.com>
Date:   Thu Nov 5 12:19:06 2020 +0100

    fs: history: add by-hash/ sharded directory

See https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/147/ for more details.

Harbormaster completed remote builds in B16895: Diff 15681. Nov 6 2020, 10:43 AM

clean duplicate code
remove cache asizeof

Build is green

Patch application report for D4416 (id=15712)

Rebasing onto 46a48a1907...

Current branch diff-target is up to date.

Changes applied before test

commit 357e62ae5cae1ab60df15c3953a493f9c0eb8dde
Author: Thibault Allançon <haltode@gmail.com>
Date:   Fri Nov 6 15:53:51 2020 +0100

    WIP: remove cache asizeof

commit 375f7b32bba64f19abf9efeef3cc10cfe5825428
Author: Thibault Allançon <haltode@gmail.com>
Date:   Fri Nov 6 10:41:10 2020 +0100

    WIP: do not store history in memory

commit db41b75df1e39cf6df021dee77617003712a81f3
Author: Thibault Allançon <haltode@gmail.com>
Date:   Thu Nov 5 12:19:06 2020 +0100

    fs: history: add by-hash/ sharded directory

See https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/148/ for more details.

Harbormaster completed remote builds in B16924: Diff 15712. Nov 6 2020, 3:57 PM

zack resigned from this revision. Nov 6 2020, 5:33 PM

haltode mentioned this in D4371: WIP: archive + meta directory sharding. Nov 10 2020, 12:44 PM

I think I understand what your fill_direntry_cache function is trying to do: you want to avoid fetching the history multiple times by doing the request only once and writing the direntry cache of all the children recursively?
Would it maybe be better to have a small LRU cache for the API queries instead, and keep the direntry code simple and fully lazy?
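
To make the LRU suggestion above concrete, here is a minimal sketch. It assumes a synchronous fetcher; `make_cached_history`, `fake_fetch`, and the SWHID used are hypothetical names for illustration, not part of swh-fuse or the Web API client.

```python
from functools import lru_cache
from typing import Callable, List, Tuple


def make_cached_history(
    fetch: Callable[[str], List[str]], maxsize: int = 128
) -> Callable[[str], Tuple[str, ...]]:
    """Wrap a (hypothetical) history fetcher in a small LRU cache."""

    @lru_cache(maxsize=maxsize)
    def cached(swhid: str) -> Tuple[str, ...]:
        # Return an immutable tuple so callers cannot mutate the cached value.
        return tuple(fetch(swhid))

    return cached


# Stand-in for the real API query; it records every call it receives.
calls: List[str] = []


def fake_fetch(swhid: str) -> List[str]:
    calls.append(swhid)
    return [f"{swhid}-ancestor-{i}" for i in range(3)]


get_history = make_cached_history(fake_fetch, maxsize=2)
get_history("swh:1:rev:aaa")  # first lookup calls fake_fetch
get_history("swh:1:rev:aaa")  # second lookup is served from the cache
```

With this shape the direntry code can stay lazy and simply ask for the history whenever it needs it. If the fetcher is a coroutine, `functools.lru_cache` would cache the coroutine object rather than its result, so an async-aware cache would be needed instead.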

swh/fuse/fs/artifact.py
235–238	More pythonic (and probably more efficient) without string concatenation:

    parts = [
        basename[i * sharding_length : (i + 1) * sharding_length]
        for i in range(sharding_depth)
    ]
    parts.append(str(swhid))
    name = os.path.join(*parts)
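
As a self-contained illustration of the list-based construction suggested above, the sketch below wraps it in a helper. The function name `sharded_path`, its signature, and the sample values are assumptions for illustration; the actual code in swh/fuse/fs/artifact.py may differ. Note that `os.path.join` takes its components as separate arguments, hence the `*parts` unpacking.

```python
import os


def sharded_path(basename: str, swhid: str, depth: int, length: int) -> str:
    """Build a sharded relative path: `depth` prefix chunks of `length`
    characters taken from `basename`, followed by the full identifier."""
    parts = [basename[i * length:(i + 1) * length] for i in range(depth)]
    parts.append(swhid)
    # os.path.join expects separate arguments, not a list.
    return os.path.join(*parts)


# Hypothetical values: two shard levels of two characters each.
example = sharded_path("db41b75d", "swh:1:rev:db41b75d", depth=2, length=2)
```

Here `depth` and `length` are passed in as plain integers, which also keeps any configuration lookup out of the loop, as requested in the earlier inline comments.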

Fix string construction
Rework commit messages

Build is green

Patch application report for D4416 (id=15871)

Rebasing onto 46a48a1907...

Current branch diff-target is up to date.

Changes applied before test

commit 604665ab40ec3c5350bda1da57b6f11f5eabf077
Author: Thibault Allançon <haltode@gmail.com>
Date:   Fri Nov 6 15:53:51 2020 +0100

    cache: replace asizeof() call with simpler heuristic
    
    The asizeof() introduced a very heavy overhead which made swhfs unusable
    when relying on many asizeof calls (eg: storing history exploration in
    the direntry cache with a history containing >1000 commits).
    
    The heuristic to calculate the size of a FuseEntry is based on
    experimental results from repositories with various history sizes (10K,
    100k, ...)

commit 2b1a725ad68dfa0944fb8b77667671f7fdc4c678
Author: Thibault Allançon <haltode@gmail.com>
Date:   Fri Nov 6 10:41:10 2020 +0100

    fs: history: do not store history list in memory
    
    Pre-fill the direntry cache with all the sub-entries once the history
    has been retrieved.

commit db41b75df1e39cf6df021dee77617003712a81f3
Author: Thibault Allançon <haltode@gmail.com>
Date:   Thu Nov 5 12:19:06 2020 +0100

    fs: history: add by-hash/ sharded directory

See https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/149/ for more details.

Harbormaster completed remote builds in B17082: Diff 15871. Nov 13 2020, 4:01 PM

haltode marked an inline comment as done. Nov 13 2020, 4:02 PM

seirl accepted this revision. Nov 13 2020, 4:03 PM

This revision is now accepted and ready to land. Nov 13 2020, 4:03 PM

Closed by commit rDFUSEdb41b75df1e3: fs: history: add by-hash/ sharded directory (authored by haltode). Nov 13 2020, 4:06 PM

This revision was automatically updated to reflect the committed changes.

haltode added a commit: rDFUSEdb41b75df1e3: fs: history: add by-hash/ sharded directory.

haltode added a commit: rDFUSE2b1a725ad68d: fs: history: do not store history list in memory.

haltode added a commit: rDFUSE604665ab40ec: cache: replace asizeof() call with simpler heuristic.

fs: history: add by-hash/ sharded directory
ClosedPublic

Revision Contents
Changeset List

Diff 15872

swh/fuse/cli.py

swh/fuse/fs/artifact.py

swh/fuse/fs/entry.py

swh/fuse/fuse.py

swh/fuse/tests/test_revision.py
