
model: Add Directory.from_possibly_duplicated_entries factory
Closed, Public

Authored by vlorentz on Jul 6 2022, 11:37 AM.

Details

Reviewers
ardumont
Group Reviewers
Reviewers
Maniphest Tasks
Restricted Maniphest Task
Commits
rDMOD0f7a1cbecaec: model: Add Directory.from_possibly_duplicated_entries factory
Summary

It will be used by swh.storage.backfiller (and so, indirectly, by swh.scrubber)
to load directories from the PostgreSQL database, whose schema accidentally
allowed directories with duplicate entries, without corrupting the
shape of the directory too much.

This will be required to resolve T4381.
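
To illustrate the idea of loading a directory whose entries contain duplicate names while keeping every entry, here is a minimal, self-contained sketch. It is not the code from this diff: the function name rename_duplicates, the dict-based entry shape, and the renaming scheme (appending part of the target's hex digest) are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical entry shape for this sketch: {"name": bytes, "target": bytes}.
Entry = Dict[str, bytes]


def rename_duplicates(entries: Iterable[Entry]) -> Tuple[bool, List[Entry]]:
    """Return (had_duplicates, entries) where every entry name is unique.

    Entries are never dropped; a later entry that reuses a name is renamed.
    The renaming scheme is an assumption chosen for illustration only.
    """
    seen = set()
    result: List[Entry] = []
    had_duplicates = False
    for entry in entries:
        name = entry["name"]
        if name in seen:
            had_duplicates = True
            # Derive a new name from the target, then add a counter if needed.
            base = name + b"_" + entry["target"].hex()[:10].encode("ascii")
            candidate = base
            counter = 0
            while candidate in seen:
                counter += 1
                candidate = base + b"_" + str(counter).encode("ascii")
            name = candidate
        seen.add(name)
        result.append({**entry, "name": name})
    return had_duplicates, result


# Example input: two entries share the name b"README".
example_entries = [
    {"name": b"README", "target": bytes.fromhex("aa" * 20)},
    {"name": b"README", "target": bytes.fromhex("bb" * 20)},
    {"name": b"src", "target": bytes.fromhex("cc" * 20)},
]
had_duplicates, deduplicated = rename_duplicates(example_entries)
```

Returning a flag alongside the cleaned entries lets a caller such as a scrubber tell whether the stored directory was affected, while still getting an object with the same number of entries.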

Diff Detail

Repository
rDMOD Data model
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D8080 (id=29177)

Rebasing onto 4073b824dd...

Current branch diff-target is up to date.
Changes applied before test
commit 37541d8dc7ec82f53cba35444a5fd5dfd80db5d9
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 6 11:37:06 2022 +0200

    model: Add Directory.from_possibly_duplicated_entries factory
    
    It will be used by swh.storage.backfiller (so indirectly, swh.scrubber)
    to load directories from the postgresql database, whose schema accidentally
    allowed directories with duplicate entries -- without corrupting the
    shape of the directory too much.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/485/ for more details.

add test_directory_from_possibly_duplicated_entries__preserve_manifest

Build is green

Patch application report for D8080 (id=29179)

Rebasing onto 4073b824dd...

Current branch diff-target is up to date.
Changes applied before test
commit 0f7a1cbecaecdfae124c749e76a78025b4088260
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Wed Jul 6 11:37:06 2022 +0200

    model: Add Directory.from_possibly_duplicated_entries factory
    
    It will be used by swh.storage.backfiller (so indirectly, swh.scrubber)
    to load directories from the postgresql database, whose schema accidentally
    allowed directories with duplicate entries -- without corrupting the
    shape of the directory too much.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/486/ for more details.

Thanks.

lgtm, one typo inline.

Implementation-wise, I just don't get the raw manifest computation you did (which
works!), i.e. the one done when no raw manifest is provided.

swh/model/model.py
1036

I don't get what this does and trying it out in ipython does not help either:

In [1]:  type("", (), {})()
Out[1]: <__main__. at 0x7f3a7fa8dca0>
In [5]: type?
Init signature: type(self, /, *args, **kwargs)
Docstring:
type(object_or_name, bases, dict)
type(object) -> the object's type
type(name, bases, dict) -> a new type                      # <--- ok but still...
Type:           type
Subclasses:     ABCMeta, EnumMeta, NamedTupleMeta, _TypedDictMeta, _ABC, MetaHasDescriptors, LexerMeta, StyleMeta, _NormalizerMeta, CachedMetaClass, ...
1055

makes sense, yes.

1077
This revision is now accepted and ready to land. Jul 8 2022, 10:03 AM
swh/model/model.py
1036

It creates a new, empty object of an anonymous type that you can assign attributes to.
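
A minimal sketch of what that expression does, using only the standard library. The attribute names entries and id are illustrative and not taken from the diff:

```python
# Three-argument form of type(): type(name, bases, namespace).
# The name is empty, bases is empty (so the class derives from object),
# and the namespace is empty, giving an anonymous class with no members.
Anonymous = type("", (), {})
holder = Anonymous()

# Instances of an ordinary class have a __dict__, so attributes can be added.
holder.entries = ("a", "b")
holder.id = b"\x00" * 20

# A bare object() has no __dict__ and rejects new attributes.
try:
    object().entries = ()
except AttributeError:
    plain_object_accepts_attributes = False
else:
    plain_object_accepts_attributes = True
```

The standard library's types.SimpleNamespace is a more conventional way to get the same kind of attribute holder.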