
model: Add Directory.from_possibly_duplicated_entries factory
Closed · Public

Authored by vlorentz on Jul 6 2022, 11:37 AM.

Details

Reviewers
ardumont
Group Reviewers
Reviewers
Maniphest Tasks
Restricted Maniphest Task
Commits
rDMOD0f7a1cbecaec: model: Add Directory.from_possibly_duplicated_entries factory
Summary

It will be used by swh.storage.backfiller (and so, indirectly, by swh.scrubber)
to load directories from the PostgreSQL database without corrupting the
shape of the directory too much. The database schema accidentally
allowed directories with duplicate entries.

This will be required to resolve T4381.
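
For illustration only, here is a minimal sketch of the general idea, not the code from this diff: when several entries share a name, keep one under the original name and rename the others so that no entry is dropped. The `Entry` tuple, the `dedup_entries` helper, and the renaming scheme (appending part of the target's hex digest) are all assumptions made up for this example; the actual factory in swh/model/model.py may well differ.

```python
from collections import defaultdict
from typing import List, NamedTuple, Tuple


class Entry(NamedTuple):
    """Hypothetical stand-in for a directory entry (not swh.model's class)."""

    name: bytes
    type: str  # e.g. "file", "dir" or "rev"
    target: bytes  # identifier of the object the entry points to


def dedup_entries(entries: List[Entry]) -> Tuple[bool, List[Entry]]:
    """Return (had_duplicates, entries with unique names).

    The first entry seen for each name keeps that name; later ones are
    renamed. This renaming scheme is an assumption of this sketch.
    """
    by_name = defaultdict(list)
    for entry in entries:
        by_name[entry.name].append(entry)

    used_names = set(by_name)
    had_duplicates = False
    result = []
    for name, group in by_name.items():
        result.append(group[0])
        for entry in group[1:]:
            had_duplicates = True
            new_name = name + b"_" + entry.target.hex()[:10].encode()
            # Keep appending underscores until the name is unused.
            while new_name in used_names:
                new_name += b"_"
            used_names.add(new_name)
            result.append(entry._replace(name=new_name))
    return had_duplicates, result


entries = [
    Entry(b"README", "file", bytes.fromhex("aa" * 20)),
    Entry(b"README", "dir", bytes.fromhex("bb" * 20)),
    Entry(b"src", "dir", bytes.fromhex("cc" * 20)),
]
had_duplicates, fixed = dedup_entries(entries)
```

Every entry survives, so the tree keeps roughly its original shape; only the names of the duplicates change.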

Diff Detail

Repository
rDMOD Data model
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 30268
Build 47318: Phabricator diff pipeline on jenkins (Jenkins console · Jenkins)
Build 47317: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D8080 (id=29177)

Rebasing onto 4073b824dd...

Current branch diff-target is up to date.
Changes applied before test
commit 37541d8dc7ec82f53cba35444a5fd5dfd80db5d9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 6 11:37:06 2022 +0200

    model: Add Directory.from_possibly_duplicated_entries factory
    
    It will be used by swh.storage.backfiller (so indirectly, swh.scrubber)
    to load directories from the postgresql database, whose schema accidentally
    allowed directories with duplicate entries -- without corrupting the
    shape of the directory too much.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/485/ for more details.

add test_directory_from_possibly_duplicated_entries__preserve_manifest

Build is green

Patch application report for D8080 (id=29179)

Rebasing onto 4073b824dd...

Current branch diff-target is up to date.
Changes applied before test
commit 0f7a1cbecaecdfae124c749e76a78025b4088260
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 6 11:37:06 2022 +0200

    model: Add Directory.from_possibly_duplicated_entries factory
    

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/486/ for more details.

Thanks.

lgtm, one typo inline.

Implementation-wise, I just don't get the raw manifest computation you did (which
works!), i.e. the one done when no raw manifest is provided.

swh/model/model.py
1037

I don't get what this does, and trying it out in IPython does not help either:

In [1]: type("", (), {})()
Out[1]: <__main__. object at 0x7f3a7fa8dca0>
In [5]: type?
Init signature: type(self, /, *args, **kwargs)
Docstring:
type(object_or_name, bases, dict)
type(object) -> the object's type
type(name, bases, dict) -> a new type                      # <--- ok but still...
Type:           type
Subclasses:     ABCMeta, EnumMeta, NamedTupleMeta, _TypedDictMeta, _ABC, MetaHasDescriptors, LexerMeta, StyleMeta, _NormalizerMeta, CachedMetaClass, ...
1056

makes sense, yes.

1078
This revision is now accepted and ready to land. Jul 8 2022, 10:03 AM
swh/model/model.py
1037

It creates a new empty object of an anonymous type that you can assign attributes to.
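
To make that concrete, here is a small sketch of the idiom. The `count_entries` function and the attribute values are made up for the example. The three-argument form of `type` builds a class with an empty name, no explicit bases, and an empty namespace, and the trailing `()` instantiates it. `types.SimpleNamespace` from the standard library is shown as a comparable alternative, not as what the diff uses.

```python
from types import SimpleNamespace

# type(name, bases, namespace) creates a new class; here it has an empty
# name, no explicit base classes, and no attributes of its own.
Anonymous = type("", (), {})
obj = Anonymous()

# Instances of such a class have a __dict__, so attributes can be assigned.
obj.entries = ("a", "b")


def count_entries(directory_like):
    """Hypothetical consumer that only needs an `entries` attribute."""
    return len(directory_like.entries)


anonymous_count = count_entries(obj)

# A bare object() has no __dict__, which is why it cannot be used instead.
try:
    object().entries = ()
    bare_object_accepts_attributes = True
except AttributeError:
    bare_object_accepts_attributes = False

# SimpleNamespace gives the same duck-typed result with a readable repr.
namespace_count = count_entries(SimpleNamespace(entries=("a", "b")))
```

Because the consumer only reads `.entries`, either stand-in can be passed where a full object would fail its own validation.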