Page MenuHomeSoftware Heritage

Add an overlay storage
Needs RevisionPublic

Authored by vlorentz on May 16 2022, 7:54 PM.

Details

Reviewers
douardda
Group Reviewers
Reviewers
Summary

I will use it to locally test incremental loads of large repositories
which are already loaded in production, with read-only access of the
production storage.

This may also be used in the future to access storages while
migrating/replicating.

Test Plan

This is fully tested when the backend storages are empty (by inheriting the usual tests);
and never tested when they are not (for now)

I also tried loading linux.git with this config:

storage:
  cls: pipeline
  steps:
    - cls: buffer
    - cls: filter
    - cls: overlay
      storages:
        - cls: memory
        - cls: remote
          url: http://webapp.internal.staging.swh.network:5002/

and it didn't use more than 4GB of RAM! (and my machined almost OOMed without the filter proxy), so it seems to work

Diff Detail

Repository
rDSTO Storage manager
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 29417
Build 45974: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 45973: arc lint + arc unit

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed.May 16 2022, 7:54 PM
Harbormaster failed remote builds in B29416: Diff 28314!

Build is green

Patch application report for D7838 (id=28314)

Rebasing onto cb12394c34...

Current branch diff-target is up to date.
Changes applied before test
commit cc02746dcbd0b96769befc231e16e7b2497d5def
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 16 19:52:29 2022 +0200

    Add an overlay storage
    
    I will use it to locally test incremental loads of large repositories
    which are already loaded in production, with read-only access of the
    production storage.
    
    This may also be used in the future to access storages while
    migrating/replicating.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1611/ for more details.

Build is green

Patch application report for D7838 (id=28315)

Rebasing onto cb12394c34...

Current branch diff-target is up to date.
Changes applied before test
commit 1760e539ea47de6f6a9f6a311aded4d564ca22d7
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 16 19:52:29 2022 +0200

    Add an overlay storage
    
    I will use it to locally test incremental loads of large repositories
    which are already loaded in production, with read-only access of the
    production storage.
    
    This may also be used in the future to access storages while
    migrating/replicating.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1612/ for more details.

Build is green

Patch application report for D7838 (id=28328)

Rebasing onto cb12394c34...

Current branch diff-target is up to date.
Changes applied before test
commit b606d636da8d4bfa446da408bacff6f4ea8bdd25
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon May 16 19:52:29 2022 +0200

    Add an overlay storage
    
    I will use it to locally test incremental loads of large repositories
    which are already loaded in production, with read-only access of the
    production storage.
    
    This may also be used in the future to access storages while
    migrating/replicating.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1613/ for more details.

douardda added inline comments.
swh/storage/proxies/overlay.py
51

wouldn't it make sense to give this a clear and specific argument name rather then being "the first of the 'storages' list"?

maybe something like:

storage:
  cls: overlay
  rw_storage: 
    cls: remote
    url: xxx
  ro_storages:
  - cls: remote
    url: xxx
  - cls: remote
    url: xxx
60

counter?

swh/storage/proxies/overlay.py
137

weird sentence, looks like a piece is missing.

swh/storage/proxies/overlay.py
83

why is this (specifically) needed?

135

not sure I would not prefer something without this Big Bad getattr. eg.

class OverlayProxyStorage:
  content_get_data = self._getter_optional("content_get_data")
  directory_get_entries = self._getter_optional("directory_get_entries")

  ...

probably not worth the effort however...

(not a surprise but) it really needs at least one test of the "overlay feature" for each method...

swh/storage/tests/storage_tests.py
563

what do we lose with these unsorted comparisions? can this be a problem?

douardda requested changes to this revision.Jun 1 2022, 1:40 PM
This revision now requires changes to proceed.Jun 1 2022, 1:40 PM
swh/storage/proxies/overlay.py
83

because other proxies have a storage attribute that is sometimes expected; and I don't want this to raise NotImplementedError

135

I'm not sure methods would be bound if we do it this way, so it would be more complicated

swh/storage/tests/storage_tests.py
563

nothing, this was already not guaranteed, and only worked in tests accidentally