Add support for revision artifacts
ClosedPublic

Authored by haltode on Oct 8 2020, 2:08 PM.

Details

Summary

Closes T2663.

What is working:

  • Symlinks in directory entries
  • Submodules in directory entries (interpreted as symlink to rev)
  • Mounting revisions

What needs to be done:

  • Add unit tests (done, but only very basic ones; I want to rework the test framework's data generation in another diff and add more unit tests there)

Related to T1926.

Diff Detail

Repository
rDFUSE FUSE virtual file system
Branch
feature/add-revision-support
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 16061
Build 24703: Phabricator diff pipeline on jenkins (Jenkins console · Jenkins)
Build 24702: arc lint + arc unit

Event Timeline

Build has FAILED

Patch application report for D4200 (id=14788)

Rebasing onto ee058855d1...

Current branch diff-target is up to date.
Changes applied before test
commit 95fb3258cf9b3d8640d7efc0a53c20d87aa57d34
Author: Thibault Allançon <haltode@gmail.com>
Date:   Thu Oct 8 14:06:37 2020 +0200

    WIP: add support for revision artifacts

Link to build: https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/37/
See console output for more information: https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/37/console

  • tests: add missing join() after subprocess.run()

Build has FAILED

Patch application report for D4200 (id=14789)

Rebasing onto ee058855d1...

Current branch diff-target is up to date.
Changes applied before test
commit 1b6cae6b9161e676154871e1c211716568805bcd
Author: Thibault Allançon <haltode@gmail.com>
Date:   Thu Oct 8 14:06:37 2020 +0200

    WIP: add support for revision artifacts

commit 84227af536a90dc82a2033caed718081b7822d0b
Author: Thibault Allançon <haltode@gmail.com>
Date:   Thu Oct 8 14:20:16 2020 +0200

    tests: add missing join() after subprocess.run()

Link to build: https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/38/
See console output for more information: https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/38/console

swh/fuse/fuse.py
197

Not sure whether we want to access the target field directly here or go through a getter/setter: FuseEntry has no such field, so mypy complains.
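
A minimal sketch of the getter option, assuming a dataclass-based hierarchy. The class bodies and the readlink() helper are hypothetical and only illustrate how narrowing the type with isinstance keeps mypy quiet; they are not the code under review.

```python
from dataclasses import dataclass


@dataclass
class FuseEntry:
    """Hypothetical base entry: it has a name but no target field."""

    name: str


@dataclass
class SymlinkEntry(FuseEntry):
    """Hypothetical symlink entry: the only entry kind with a target."""

    target: str

    def get_target(self) -> str:
        return self.target


def readlink(entry: FuseEntry) -> str:
    # Narrow the static type first, so that the checker knows
    # get_target() exists on the object we are about to use.
    if not isinstance(entry, SymlinkEntry):
        raise ValueError(f"{entry.name!r} is not a symlink")
    return entry.get_target()
```

With this shape the caller never touches a field that FuseEntry does not declare.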

Build is green

Patch application report for D4200 (id=14791)

Rebasing onto ee058855d1...

Current branch diff-target is up to date.
Changes applied before test
commit 74607ba9d0b3f123bbb7ddc9a859884e780f46fd
Author: Thibault Allançon <haltode@gmail.com>
Date:   Thu Oct 8 14:26:57 2020 +0200

    tests: add missing join() after subprocess.run()

See https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/39/ for more details.

Move pytest warning commit to a separate diff

Build has FAILED

Patch application report for D4200 (id=14792)

Rebasing onto ee058855d1...

Current branch diff-target is up to date.
Changes applied before test
commit a3f5e88621fb79b49d6ef51802f4c4663bd248e9
Author: Thibault Allançon <haltode@gmail.com>
Date:   Thu Oct 8 14:06:37 2020 +0200

    WIP: add support for revision artifacts

Link to build: https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/40/
See console output for more information: https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/40/console

haltode marked an inline comment as not done. Oct 8 2020, 2:44 PM

Build has FAILED

Patch application report for D4200 (id=14795)

Rebasing onto 74607ba9d0...

Current branch diff-target is up to date.
Changes applied before test
commit 8f668ca2f793322a1926541afb83f27b1b9ee997
Author: Thibault Allançon <haltode@gmail.com>
Date:   Thu Oct 8 14:06:37 2020 +0200

    WIP: add support for revision artifacts

Link to build: https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/42/
See console output for more information: https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/42/console

Build has FAILED

Patch application report for D4200 (id=14798)

Rebasing onto 74607ba9d0...

Current branch diff-target is up to date.
Changes applied before test
commit 7229a8cf6bb8379d4d05d446d08da009cf04bbf0
Author: Thibault Allançon <haltode@gmail.com>
Date:   Thu Oct 8 14:06:37 2020 +0200

    WIP: add support for revision artifacts

Link to build: https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/43/
See console output for more information: https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/43/console

Build has FAILED

Patch application report for D4200 (id=14800)

Rebasing onto 74607ba9d0...

Current branch diff-target is up to date.
Changes applied before test
commit 3da472a18e405270632d0d239dc6874c01498971
Author: Thibault Allançon <haltode@gmail.com>
Date:   Thu Oct 8 14:06:37 2020 +0200

    WIP: add support for revision artifacts

Link to build: https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/44/
See console output for more information: https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/44/console

zack added inline comments.
swh/fuse/fs/artifact.py
85

It looks like both Directory and Revision define an __aiter__ method, which is not defined in ArtifactEntry. That's understandable, as some artifacts are not iterable (e.g., content). But it poses the problem of where to document what one iterates on. Either you briefly describe it in both those classes (and do so also in the upcoming classes), or you introduce an intermediate class to distinguish between iterable artifact entries and singleton ones, and document the iterator in the former.

Maybe there is (or will be) some common logic to be factored out in that new intermediate class, dunno.
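
A rough sketch of the intermediate-class option, for illustration only. IterableArtifactEntry, ContentEntry, DirectoryEntry and list_names() are hypothetical names and bodies, not the classes in this diff; the point is that the iterator contract is documented once, in a single place.

```python
import asyncio
from typing import AsyncIterator, List


class ArtifactEntry:
    """Hypothetical base class shared by all artifact entries."""

    def __init__(self, name: str) -> None:
        self.name = name


class IterableArtifactEntry(ArtifactEntry):
    """Hypothetical intermediate class for entries that have children.

    Iterating over such an entry yields its direct children, one
    ArtifactEntry per child.
    """

    def __aiter__(self) -> AsyncIterator[ArtifactEntry]:
        raise NotImplementedError


class ContentEntry(ArtifactEntry):
    """Singleton entry: deliberately not iterable."""


class DirectoryEntry(IterableArtifactEntry):
    def __init__(self, name: str, children: List[ArtifactEntry]) -> None:
        super().__init__(name)
        self.children = children

    async def __aiter__(self) -> AsyncIterator[ArtifactEntry]:
        for child in self.children:
            yield child


async def list_names(entry: IterableArtifactEntry) -> List[str]:
    return [child.name async for child in entry]
```

For example, asyncio.run(list_names(some_directory)) collects the names of the children, while a ContentEntry simply has no __aiter__ to call.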

swh/fuse/fs/entry.py
54

This method is idempotent, right? I.e., if we call it multiple times, it won't attempt to create an entry (say, under archive/) multiple times; the second time it's invoked, it will just return what was created the first time, right? (The code calling it seems to expect that property.)

If that's the case, the name create_* is misleading, as it doesn't imply idempotency. I'm not sure I have a great alternative suggestion, but ensure_* sounds marginally better (other options could be get_* or init_*, neither of which sounds perfect).

No matter what it's called, idempotency is an important part of the contract of a method, so we need a docstring here stating that the method is idempotent.

swh/fuse/fs/mountpoint.py
21–23

Is the name property on the root entry ever needed/accessed? I guess/hope not.
Either way, it'd be better to make its dummy value much more clearly dummy, e.g. "DUMMY_ROOT_PATH". Unless it could be made explicitly None, which I doubt it can.

swh/fuse/fs/artifact.py
85

__aiter__ is defined in the upper-level FuseEntry class; we could document there all the common methods, plus examples of which classes use them (e.g., Content has no __aiter__ but has a content()).

swh/fuse/fs/entry.py
54

It is not idempotent: every time you call create_child, a new object is created. However, with the inode <-> entry mapping we reuse parts of the objects, and there is a TODO in the code about caching the entries of an iterable FuseEntry so that we don't need to recreate the same child objects every time. Maybe we could discuss/measure this memory/performance topic in a separate diff?
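
To make the distinction concrete, here is a small sketch contrasting the two behaviours. The Entry class and the ensure_child() method are hypothetical; only the name create_child comes from the discussion, and the caching shown is one possible shape for the TODO, not what the diff implements.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Entry:
    """Hypothetical entry holding an optional cache of its children."""

    name: str
    _children: Dict[str, "Entry"] = field(default_factory=dict, repr=False)

    def create_child(self, name: str) -> "Entry":
        """Build a fresh child object on every call (nothing is stored)."""
        return Entry(name)

    def ensure_child(self, name: str) -> "Entry":
        """Idempotent variant: build the child once, then reuse it."""
        if name not in self._children:
            self._children[name] = self.create_child(name)
        return self._children[name]
```

Two create_child("a") calls return two distinct objects, whereas two ensure_child("a") calls return the same one, at the cost of keeping every child alive in memory.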

swh/fuse/fs/mountpoint.py
21–23

Indeed, this property is never accessed on the root node; we can make it None.

  • Add get_target() method for symlinks
  • Set Root node name to None

Build is green

Patch application report for D4200 (id=14849)

Rebasing onto 74607ba9d0...

Current branch diff-target is up to date.
Changes applied before test
commit 8cee03c105a1a78b25d9fd51b652f2d042b2c771
Author: Thibault Allançon <haltode@gmail.com>
Date:   Thu Oct 8 14:06:37 2020 +0200

    WIP: add support for revision artifacts

See https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/45/ for more details.

Fix style + commit description

haltode retitled this revision from WIP: add support for revision artifacts to Add support for revision artifacts. Oct 9 2020, 11:06 AM

Build is green

Patch application report for D4200 (id=14850)

Rebasing onto 74607ba9d0...

Current branch diff-target is up to date.
Changes applied before test
commit d264b560608117175e232c2a0012d247e7326893
Author: Thibault Allançon <haltode@gmail.com>
Date:   Thu Oct 8 14:06:37 2020 +0200

    fuse: add support for revision artifacts
    
    Closes T2663.
    
    - Add a `SymlinkEntry` class (+ rework fs/ class init using dataclasses)
    - Support symlinks, submodules, and mounting revisions
    - Basic unit tests for revision artifacts

See https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/46/ for more details.

Build is green

Patch application report for D4200 (id=14851)

Rebasing onto 74607ba9d0...

Current branch diff-target is up to date.
Changes applied before test
commit 4ce90f555f1a639fe9a8b08bc5fdc82dabda42b4
Author: Thibault Allançon <haltode@gmail.com>
Date:   Thu Oct 8 14:06:37 2020 +0200

    fuse: add support for revision artifacts
    
    Closes T2663.
    
    - Add a `SymlinkEntry` class (+ rework fs/ class init using dataclasses)
    - Support symlinks, submodules, and mounting revisions
    - Basic unit tests for revision artifacts

See https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/47/ for more details.

Rework the parents/ directory and the parent symlink: parents/ is now always present, and
parent/ is a symlink to parents/1/ if the commit has at least one parent.
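
The resulting layout can be sketched as a small function. parent_layout() is a hypothetical helper, and the assumption that parents/ holds one numbered entry per parent (counting from 1) is inferred from the parents/1/ path above rather than stated in this diff.

```python
from typing import Dict, Optional


def parent_layout(parent_count: int) -> Dict[str, Optional[str]]:
    """Map each entry name to its symlink target (None if not a symlink)."""
    layout: Dict[str, Optional[str]] = {"parents/": None}  # always present
    for index in range(1, parent_count + 1):
        # Assumption: one numbered entry per parent, starting at 1.
        layout[f"parents/{index}/"] = None
    if parent_count >= 1:
        # The shortcut exists only when there is a first parent to point to.
        layout["parent"] = "parents/1/"
    return layout
```

A root commit therefore gets an empty parents/ and no parent shortcut, while a merge commit gets parents/1/, parents/2/ and a parent symlink to the first one.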

Build is green

Patch application report for D4200 (id=14883)

Rebasing onto 74607ba9d0...

Current branch diff-target is up to date.
Changes applied before test
commit c0c992ed725d26404ffb63f478841e41d5ab39cb
Author: Thibault Allançon <haltode@gmail.com>
Date:   Thu Oct 8 14:06:37 2020 +0200

    fuse: add support for revision artifacts
    
    Closes T2663.
    
    - Add a `SymlinkEntry` class (+ rework fs/ class init using dataclasses)
    - Support symlinks, submodules, and mounting revisions
    - Basic unit tests for revision artifacts

See https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/50/ for more details.

Build is green

Patch application report for D4200 (id=14887)

Rebasing onto d759b9b3b7...

Current branch diff-target is up to date.
Changes applied before test
commit a6731c0e4cad887c167a62ea8ac937aff50f75f4
Author: Thibault Allançon <haltode@gmail.com>
Date:   Thu Oct 8 14:06:37 2020 +0200

    fuse: add support for revision artifacts
    
    Closes T2663.
    
    - Add a `SymlinkEntry` class (+ rework fs/ class init using dataclasses)
    - Support symlinks, submodules, and mounting revisions
    - Basic unit tests for revision artifacts

See https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/53/ for more details.

seirl requested changes to this revision. Oct 9 2020, 2:36 PM
seirl added a subscriber: seirl.
seirl added inline comments.
swh/fuse/fs/artifact.py
67

This should probably be cached so that if you call get_content() and size() you only fetch the content once?

268

Why do you create a list first and then yield it? Couldn't you just yield the elements one by one?
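
For illustration, the two shapes being compared look roughly like this. Both functions and the make_entry() helper are hypothetical stand-ins, not the code at this line.

```python
import asyncio
from typing import AsyncIterator, Iterable, List


def make_entry(name: str) -> str:
    """Hypothetical stand-in for building a child entry."""
    return f"entry:{name}"


async def children_via_list(names: Iterable[str]) -> AsyncIterator[str]:
    # Builds every entry first, then yields them: extra memory, no benefit.
    entries: List[str] = []
    for name in names:
        entries.append(make_entry(name))
    for entry in entries:
        yield entry


async def children_direct(names: Iterable[str]) -> AsyncIterator[str]:
    # Yields each entry as soon as it is built.
    for name in names:
        yield make_entry(name)


async def collect(iterator: AsyncIterator[str]) -> List[str]:
    return [item async for item in iterator]
```

Both produce the same sequence; the direct version just avoids holding all children in memory before the first one is returned.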

swh/fuse/fs/entry.py
51

Probably renaming that to get_relative_root_path() would be more explicit.

54

@zack Idempotency is not a concern here as this method doesn't have any side effects. It returns the newly created child.

swh/fuse/fs/mountpoint.py
31

Again, don't create an intermediate list.

This revision now requires changes to proceed. Oct 9 2020, 2:36 PM
swh/fuse/fs/artifact.py
67

Hm, sure. I didn't want to add a cache on top of the already existing blob cache, but I can put the information in prefetch["length"].
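
A minimal sketch of that idea, assuming the entry carries a prefetch dict. The ContentEntry class, its constructor and the fetch callable are hypothetical; only get_content(), size() and prefetch["length"] come from the discussion.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional


class ContentEntry:
    """Hypothetical content entry that remembers its own length."""

    def __init__(
        self,
        fetch: Callable[[], Awaitable[bytes]],
        prefetch: Optional[Dict[str, int]] = None,
    ) -> None:
        self._fetch = fetch
        self.prefetch: Dict[str, int] = dict(prefetch or {})

    async def get_content(self) -> bytes:
        data = await self._fetch()
        # Record the length so that a later size() needs no extra fetch.
        self.prefetch["length"] = len(data)
        return data

    async def size(self) -> int:
        if "length" not in self.prefetch:
            await self.get_content()
        return self.prefetch["length"]
```

When the length is already known, size() never fetches; otherwise the first call fetches once and later calls reuse the stored value.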

  • Rename get_root_path to get_relative_root_path
  • Cache content size call when possible
  • Remove unnecessary intermediate lists and yield directly

Build is green

Patch application report for D4200 (id=14890)

Rebasing onto d759b9b3b7...

Current branch diff-target is up to date.
Changes applied before test
commit 6fb734ddef3b9375a70795e94d2ccfc098a18ad0
Author: Thibault Allançon <haltode@gmail.com>
Date:   Thu Oct 8 14:06:37 2020 +0200

    fuse: add support for revision artifacts
    
    Closes T2663.
    
    - Add a `SymlinkEntry` class (+ rework fs/ class init using dataclasses)
    - Support symlinks, submodules, and mounting revisions
    - Basic unit tests for revision artifacts

See https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/54/ for more details.

This revision is now accepted and ready to land. Oct 9 2020, 2:54 PM
This revision was automatically updated to reflect the committed changes.