Page MenuHomeSoftware Heritage

Non-incremental, non-delta Bazaar/Breezy loader
ClosedPublic

Authored by Alphare on Sep 24 2021, 6:33 PM.

Diff Detail

Repository
rDLDBZR BZR loader
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 24008
Build 37452: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Build has FAILED

Patch application report for D6344 (id=25353)

Rebasing onto 347c7b8375...

Current branch diff-target is up to date.
Changes applied before test
commit 309742cc0cce40719236dc45daee0fe41a46220a
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Fri Sep 24 17:08:27 2021 +0200

    Add a non-optimized non-incremental Bazaar loader
    
    The work for optimizing the loader has been paved with issues, in
    the mean time, this works and seems robust enough on a test run
    on all launchpad projects.
    
    The incremental logic will be added in a future patch.

Link to build: https://jenkins.softwareheritage.org/job/DLDBZR/job/tests-on-diff/9/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBZR/job/tests-on-diff/9/console

Build has FAILED

Patch application report for D6344 (id=25407)

Rebasing onto 347c7b8375...

Current branch diff-target is up to date.
Changes applied before test
commit 309742cc0cce40719236dc45daee0fe41a46220a
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Fri Sep 24 17:08:27 2021 +0200

    Add a non-optimized non-incremental Bazaar loader
    
    The work for optimizing the loader has been paved with issues, in
    the mean time, this works and seems robust enough on a test run
    on all launchpad projects.
    
    The incremental logic will be added in a future patch.

Link to build: https://jenkins.softwareheritage.org/job/DLDBZR/job/tests-on-diff/10/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBZR/job/tests-on-diff/10/console

Re-run CI now that core is up-to-date

Build has FAILED

Patch application report for D6344 (id=25435)

Rebasing onto 347c7b8375...

Current branch diff-target is up to date.
Changes applied before test
commit 309742cc0cce40719236dc45daee0fe41a46220a
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Fri Sep 24 17:08:27 2021 +0200

    Add a non-optimized non-incremental Bazaar loader
    
    The work for optimizing the loader has been paved with issues, in
    the mean time, this works and seems robust enough on a test run
    on all launchpad projects.
    
    The incremental logic will be added in a future patch.

Link to build: https://jenkins.softwareheritage.org/job/DLDBZR/job/tests-on-diff/11/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBZR/job/tests-on-diff/11/console

Update dependencies and re-run CI

swh-loader-core has been updated, this should work.

Build is green

Patch application report for D6344 (id=25450)

Rebasing onto 347c7b8375...

Current branch diff-target is up to date.
Changes applied before test
commit ecf53c48c9a2104efa76c98941c4e30661833788
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Fri Sep 24 17:08:27 2021 +0200

    Add a non-optimized non-incremental Bazaar loader
    
    The work for optimizing the loader has been paved with issues, in
    the mean time, this works and seems robust enough on a test run
    on all launchpad projects.
    
    The incremental logic will be added in a future patch.
    
    This also updates the requirements file to more explicitly match newer
    versions of the swh dependencies.

See https://jenkins.softwareheritage.org/job/DLDBZR/job/tests-on-diff/12/ for more details.

Alphare retitled this revision from WIP basic bzr loader to Non-incremental, non-delta Bazaar/Breezy loader.

Looks good. I trust you for the bzr part, so I didn't check it

Two generic comments:

  • Please let the logger handle string interpolation instead of using f-strings. eg. replace self.log.debug(f"Using local directory '{self.directory}'") with self.log.debug("Using local directory '%s'", self.directory). This avoids interpolating strings when the log level is higher than the message being logged.
  • Could you provide the exact instructions to recreate tarballs used in tests?

and some specific comments below:

swh/loader/bzr/loader.py
28–42

This is going to be confusing, because swh.model.model also has types Content and Directory.

Could you replace from swh.model.from_disk import Content, DentryPerms, Directory with from swh.model import from_disk?

80–88

Could you add tests for this?

199–200

this should probably match some sort of delimiter after the format version (eg. if a future version uses "Bazaar repository format 2abc")

Also, it looks like this would fail with "needs upgrade" in the presence of repositories newer than this.

226–228

We would probably want to store a dangling branch for this.

238–240

ditto

251

what is a branch nick?

267

brand-new method, added last week :)

269

We need to discuss this format. Could you open a task and ping olasd and I?

We are working on changes to TimestampWithTimezone, which makes it more extensible.

272

another task for this. We could settle on GitHub's ad-hoc Co-authored-by format: https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors but this needs a discussion

290–302
350

what does it mean when entry.has_text() is falsy?

396

should be faster with a regular list, since you're only using it as a stack

456

dead code?

459

could you add a docstring?

vlorentz added inline comments.
swh/loader/bzr/loader.py
269

actually, I just did: T3886

I'm answering your questions now so you can get a change to come back to me before I start working on this again tomorrow.

Looks good. I trust you for the bzr part, so I didn't check it

Two generic comments:

  • Please let the logger handle string interpolation instead of using f-strings. eg. replace self.log.debug(f"Using local directory '{self.directory}'") with self.log.debug("Using local directory '%s'", self.directory). This avoids interpolating strings when the log level is higher than the message being logged.

Sure, I should have thought of that, it's not the first time either, sorry about that. :)

  • Could you provide the exact instructions to recreate tarballs used in tests?

I will have to retrace my steps, but yes, it's probably a good idea. Would a shell script work?

swh/loader/bzr/loader.py
199–200

this should probably match some sort of delimiter after the format version (eg. if a future version uses "Bazaar repository format 2abc")

Sure, good point

Also, it looks like this would fail with "needs upgrade" in the presence of repositories newer than this.

This is on purpose since it's very unlikely that Breezy will get a new repository format. In the case that it does, we will need to update this code anyway. Should the exception carry more detail in its docstring to encourage checking for a newer format if brz upgrade does work?

226–228

I'm not 100% sure how to do that. Is the target just the empty string as git_objects.py explains?

251

The name associated with the branch in the clone source. It's usually the name of the project, but not always. It's not really meaningful in our case since we're keeping metadata about the project including its name anyway and we only have one branch per repository.

272

Done in T3887 and T3888

350

It means it's from an old tree that did not have the "weave index" for all inventory entries (something to help with merges IIUC), basically this is for backwards compat and the text is functionally empty.

456

Ah this will only be useful with the incremental loader which (most likely) will arrive in the next patch. I should probably remove the whole storage fallback for now.

I will have to retrace my steps, but yes, it's probably a good idea. Would a shell script work?

Yes, perfect.

swh/loader/bzr/loader.py
199–200

I would expect something like this:

if format is older:
    raise RepositoryNeedsUpgrade()
elif format is newer:
    assert False, "oh no, we need to update this loader for format XXX"
226–228

er, I think so

Arcanist is bugged with the new PHP 8.1, so I'll wait until I have addressed everything to try to re-send, in the mean time, some questions

swh/loader/bzr/loader.py
226–228

Looking at the code in swh-model/swh/model.git_objects.py, dangling branches are not exposed as a TargetType variant but hardcoded bytes, so there appears to be no easy way for me to do it from the outside. Is there?

267

This does not seems to be valid, and I'm not sure exactly how to use the new method to be honest.

swh/loader/bzr/loader.py
226–228

SnapshotBranch(target=b"", target_type=TargetType.REVISION)

267

why not?

swh/loader/bzr/loader.py
226–228

I had tried this, and it gives the following error: `swh.storage.exc.StorageArgumentException: value for domain sha1_git violates check constraint "sha1_git_check"
CONTEXT: COPY tmp_release, line 1, column target: "\x"`

swh/loader/bzr/loader.py
226–228

heh, alright, it's not worth the trouble then

olasd added inline comments.
swh/loader/bzr/loader.py
226–228

A dangling branch is stored in the Snapshot as a None instead of the SnapshotBranch object.

swh/loader/bzr/loader.py
226–228

Aaah, that's what I was missing, thanks

Alphare marked 7 inline comments as done.

Incorporate feedback

Build is green

Patch application report for D6344 (id=25572)

Rebasing onto 347c7b8375...

Current branch diff-target is up to date.
Changes applied before test
commit d98c7c1e1f7b795886a5da49f6c118da91c80007
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Fri Sep 24 17:08:27 2021 +0200

    Add a non-optimized non-incremental Bazaar loader
    
    The work for optimizing the loader has been paved with issues, in
    the mean time, this works and seems robust enough on a test run
    on all launchpad projects.
    
    The incremental logic will be added in a future patch.
    
    This also updates the requirements file to more explicitly match newer
    versions of the swh dependencies.

See https://jenkins.softwareheritage.org/job/DLDBZR/job/tests-on-diff/13/ for more details.

Build is green

Patch application report for D6344 (id=25586)

Rebasing onto 347c7b8375...

Current branch diff-target is up to date.
Changes applied before test
commit 12fdab467acd1888a34664d201fd51acdb730f58
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Fri Sep 24 17:08:27 2021 +0200

    Add a non-optimized non-incremental Bazaar loader
    
    The work for optimizing the loader has been paved with issues, in
    the mean time, this works and seems robust enough on a test run
    on all launchpad projects.
    
    The incremental logic will be added in a future patch.
    
    This also updates the requirements file to more explicitly match newer
    versions of the swh dependencies.

See https://jenkins.softwareheritage.org/job/DLDBZR/job/tests-on-diff/14/ for more details.

Fix mixed up tarballs and associated tests

Build is green

Patch application report for D6344 (id=25608)

Rebasing onto 347c7b8375...

Current branch diff-target is up to date.
Changes applied before test
commit 0114e255c83190063bc2b6e46e8ccc77fa0048a8
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Fri Sep 24 17:08:27 2021 +0200

    Add a non-optimized non-incremental Bazaar loader
    
    The work for optimizing the loader has been paved with issues, in
    the mean time, this works and seems robust enough on a test run
    on all launchpad projects.
    
    The incremental logic will be added in a future patch.
    
    This also updates the requirements file to more explicitly match newer
    versions of the swh dependencies.

See https://jenkins.softwareheritage.org/job/DLDBZR/job/tests-on-diff/15/ for more details.

mypy.ini
15–17

This shouldn't be needed as long as they're installed in the venv in which mypy runs

22

Please add a trailing newline

Build has FAILED

Patch application report for D6344 (id=25653)

Could not rebase; Attempt merge onto 347c7b8375...

Updating 347c7b8..ed5c1cc
Fast-forward
 .gitignore                                         |  14 +
 .pre-commit-config.yaml                            |   1 -
 swh/foo/loader.py => conftest.py                   |  10 +-
 mypy.ini                                           |   3 +
 requirements-swh.txt                               |   6 +-
 requirements-test.txt                              |   4 +
 requirements.txt                                   |   1 +
 setup.py                                           |  13 +-
 swh/foo/tests/__init__.py                          |   0
 swh/foo/tests/test_loader.py                       |   3 -
 swh/loader/__init__.py                             |   3 +
 swh/loader/bzr/__init__.py                         |  16 +
 swh/loader/bzr/loader.py                           | 609 +++++++++++++++++++++
 swh/{foo => loader/bzr}/py.typed                   |   0
 swh/{foo => loader/bzr/tests}/__init__.py          |   0
 swh/loader/bzr/tests/conftest.py                   |  39 ++
 swh/loader/bzr/tests/data/broken-tags.sh           |   9 +
 swh/loader/bzr/tests/data/broken-tags.tgz          | Bin 0 -> 980 bytes
 swh/loader/bzr/tests/data/empty.sh                 |   6 +
 swh/loader/bzr/tests/data/empty.tgz                | Bin 0 -> 962 bytes
 swh/loader/bzr/tests/data/ghosts.py                |  13 +
 swh/loader/bzr/tests/data/ghosts.tgz               | Bin 0 -> 2451 bytes
 .../bzr/tests/data/metadata-and-type-changes.sh    |  38 ++
 .../bzr/tests/data/metadata-and-type-changes.tgz   | Bin 0 -> 13831 bytes
 swh/loader/bzr/tests/data/needs-upgrade.sh         |   7 +
 swh/loader/bzr/tests/data/needs-upgrade.tgz        | Bin 0 -> 880 bytes
 swh/loader/bzr/tests/data/no-branch.sh             |  10 +
 swh/loader/bzr/tests/data/no-branch.tgz            | Bin 0 -> 882 bytes
 swh/loader/bzr/tests/data/nominal.sh               |  36 ++
 swh/loader/bzr/tests/data/nominal.tgz              | Bin 0 -> 11366 bytes
 swh/loader/bzr/tests/data/renames.sh               |  17 +
 swh/loader/bzr/tests/data/renames.tgz              | Bin 0 -> 5068 bytes
 swh/loader/bzr/tests/test_loader.py                | 410 ++++++++++++++
 tox.ini                                            |   4 +-
 34 files changed, 1256 insertions(+), 16 deletions(-)
 create mode 100644 .gitignore
 rename swh/foo/loader.py => conftest.py (52%)
 delete mode 100644 swh/foo/tests/__init__.py
 delete mode 100644 swh/foo/tests/test_loader.py
 create mode 100644 swh/loader/__init__.py
 create mode 100644 swh/loader/bzr/__init__.py
 create mode 100644 swh/loader/bzr/loader.py
 rename swh/{foo => loader/bzr}/py.typed (100%)
 rename swh/{foo => loader/bzr/tests}/__init__.py (100%)
 create mode 100644 swh/loader/bzr/tests/conftest.py
 create mode 100644 swh/loader/bzr/tests/data/broken-tags.sh
 create mode 100644 swh/loader/bzr/tests/data/broken-tags.tgz
 create mode 100644 swh/loader/bzr/tests/data/empty.sh
 create mode 100644 swh/loader/bzr/tests/data/empty.tgz
 create mode 100644 swh/loader/bzr/tests/data/ghosts.py
 create mode 100644 swh/loader/bzr/tests/data/ghosts.tgz
 create mode 100644 swh/loader/bzr/tests/data/metadata-and-type-changes.sh
 create mode 100644 swh/loader/bzr/tests/data/metadata-and-type-changes.tgz
 create mode 100644 swh/loader/bzr/tests/data/needs-upgrade.sh
 create mode 100644 swh/loader/bzr/tests/data/needs-upgrade.tgz
 create mode 100644 swh/loader/bzr/tests/data/no-branch.sh
 create mode 100644 swh/loader/bzr/tests/data/no-branch.tgz
 create mode 100644 swh/loader/bzr/tests/data/nominal.sh
 create mode 100644 swh/loader/bzr/tests/data/nominal.tgz
 create mode 100644 swh/loader/bzr/tests/data/renames.sh
 create mode 100644 swh/loader/bzr/tests/data/renames.tgz
 create mode 100644 swh/loader/bzr/tests/test_loader.py
Changes applied before test
commit ed5c1ccfbe291166a0c9b394210c6d6560d5018c
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Wed Feb 2 10:31:57 2022 +0100

    Add incremental support for the bzr loader

commit 0114e255c83190063bc2b6e46e8ccc77fa0048a8
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Fri Sep 24 17:08:27 2021 +0200

    Add a non-optimized non-incremental Bazaar loader
    
    The work for optimizing the loader has been paved with issues, in
    the mean time, this works and seems robust enough on a test run
    on all launchpad projects.
    
    The incremental logic will be added in a future patch.
    
    This also updates the requirements file to more explicitly match newer
    versions of the swh dependencies.

Link to build: https://jenkins.softwareheritage.org/job/DLDBZR/job/tests-on-diff/19/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBZR/job/tests-on-diff/19/console

Build has FAILED

Patch application report for D6344 (id=25657)

Rebasing onto 347c7b8375...

Current branch diff-target is up to date.
Changes applied before test
commit 5444526a2c35d7169de6031fd0394808ed307c19
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Fri Sep 24 17:08:27 2021 +0200

    Add a non-optimized non-incremental Bazaar loader
    
    The work for optimizing the loader has been paved with issues, in
    the mean time, this works and seems robust enough on a test run
    on all launchpad projects.
    
    The incremental logic will be added in a future patch.
    
    This also updates the requirements file to more explicitly match newer
    versions of the swh dependencies.

Link to build: https://jenkins.softwareheritage.org/job/DLDBZR/job/tests-on-diff/20/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBZR/job/tests-on-diff/20/console

Build has FAILED

Patch application report for D6344 (id=25658)

Rebasing onto 347c7b8375...

Current branch diff-target is up to date.
Changes applied before test
commit 04b9a34cf7419e1d51bb05952f44713d420c18c9
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Fri Sep 24 17:08:27 2021 +0200

    Add a non-optimized non-incremental Bazaar loader
    
    The work for optimizing the loader has been paved with issues, in
    the mean time, this works and seems robust enough on a test run
    on all launchpad projects.
    
    The incremental logic will be added in a future patch.
    
    This also updates the requirements file to more explicitly match newer
    versions of the swh dependencies.

Link to build: https://jenkins.softwareheritage.org/job/DLDBZR/job/tests-on-diff/21/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBZR/job/tests-on-diff/21/console

Build is green

Patch application report for D6344 (id=25663)

Rebasing onto 347c7b8375...

Current branch diff-target is up to date.
Changes applied before test
commit 04b9a34cf7419e1d51bb05952f44713d420c18c9
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Fri Sep 24 17:08:27 2021 +0200

    Add a non-optimized non-incremental Bazaar loader
    
    The work for optimizing the loader has been paved with issues, in
    the mean time, this works and seems robust enough on a test run
    on all launchpad projects.
    
    The incremental logic will be added in a future patch.
    
    This also updates the requirements file to more explicitly match newer
    versions of the swh dependencies.

See https://jenkins.softwareheritage.org/job/DLDBZR/job/tests-on-diff/24/ for more details.

This revision is now accepted and ready to land.Feb 7 2022, 2:29 PM