
Add a model based on 'attrs' and Hypothesis strategies to generate it.
Closed, Public

Authored by vlorentz on Apr 5 2019, 7:15 PM.

Diff Detail

Repository
rDMOD Data model
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

olasd requested changes to this revision. Apr 8 2019, 11:26 AM
olasd added a subscriber: olasd.

Thanks for putting this together!

The model.py file is missing a model for snapshots. Apart from that, the changes requested are really marginal.

I think we could also remove the validators/fields module as they're not used at all.

The test for the hypothesis strategy is also quite minimalistic ;)

swh/model/model.py
12

20 bytes ;)

42

DateWithTimezone? TimestampWithTimezone? It's not really just a date. (I know the revision/release fields are poorly named)

81–82

Both of these fields are nullable

84

This should probably be (validated as) an Enum of some sort.
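
One way to get that validation is to convert the raw value into an `enum.Enum` member, which fails on anything outside the allowed set. A minimal sketch, assuming the field holds a short string tag; `RevisionKind`, its members and `to_kind` are hypothetical names, not taken from the diff:

```python
import enum


class RevisionKind(enum.Enum):
    """Hypothetical closed set of allowed values (assumed, not from the diff)."""

    GIT = "git"
    TAR = "tar"
    SVN = "svn"


def to_kind(value):
    """Convert a raw string (or an existing member) into a RevisionKind.

    Raises ValueError for any value outside the enum.
    """
    return RevisionKind(value)
```

A function like this could be passed as an attrs `converter`, so that building the object fails early on an unknown value.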

104

Metadata is a nullable dict.
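
For reference, a nullable field can be expressed in attrs by wrapping the type validator in `attr.validators.optional`. This is only a sketch: the `Revision` class below is a hypothetical, cut-down stand-in showing a single field, not the class from the diff.

```python
from typing import Optional

import attr


@attr.s
class Revision:
    """Hypothetical, cut-down model; only the nullable field is shown."""

    # optional() accepts None and otherwise defers to the wrapped validator.
    metadata = attr.ib(
        type=Optional[dict],
        default=None,
        validator=attr.validators.optional(attr.validators.instance_of(dict)),
    )
```

The two nullable fields flagged at lines 81–82 could follow the same pattern with their own type.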

116

In the future, we should add a validator to prevent b'/' and b'\x00'
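
Such a validator could look roughly like this. It is a sketch using the `(instance, attribute, value)` signature that attrs validators take; `validate_entry_name` and `FORBIDDEN_BYTES` are hypothetical names.

```python
# Bytes that must not appear in a directory entry name.
FORBIDDEN_BYTES = (b"/", b"\x00")


def validate_entry_name(instance, attribute, value):
    """attrs-style validator: reject names containing a slash or a NUL byte."""
    if not isinstance(value, bytes):
        raise TypeError("entry name must be bytes")
    for forbidden in FORBIDDEN_BYTES:
        if forbidden in value:
            raise ValueError(f"entry name {value!r} contains {forbidden!r}")
```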

121

Technically, valid permissions should all be > 512

In arbitrary loaders, we normalize this way (swh.model.from_disk.DentryPerms):

  • plain files have a perms of 0o100644
  • executable files have a perms of 0o100755
  • directories have a perms of 0o40000
  • symlinks have perms of 0o120000
  • submodules have perms of 0o160000

And during its early history git has also accepted files with arbitrary integer perms (so anything between 0o000000 and 0o107777), which we've ended up loading to keep the proper directory ids. [tangent ahead] There's also the case of the rugged (ruby git library used by... GitHub) bug that would zero-pad permissions on trees (so serializing them as 040000 instead of 40000), which means some directories actually have two valid ids. It's great. [/tangent]

Anyway, doesn't sound like it's /that/ easy to validate directory entry perms :P.
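
To make the zero-padding issue concrete, here is a small sketch. The `Perms` enum mirrors the normalized values listed above but is a hypothetical stand-in, not the actual `swh.model.from_disk.DentryPerms` definition. `tree_id` assumes the standard git tree layout: each entry is the octal mode, a space, the name, a NUL byte and the raw 20-byte target id, and the whole body is hashed behind a `tree <length>` header.

```python
import enum
import hashlib


class Perms(enum.IntEnum):
    """Hypothetical stand-in for the normalized permissions listed above."""

    FILE = 0o100644
    EXECUTABLE = 0o100755
    DIRECTORY = 0o40000
    SYMLINK = 0o120000
    SUBMODULE = 0o160000


def _sort_key(entry):
    # git orders directories as if their name ended with a slash.
    perms, name, _target = entry
    return name + (b"/" if perms == Perms.DIRECTORY else b"")


def tree_id(entries, zero_pad=False):
    """Compute a git tree id from (perms, name, target_id) tuples.

    With zero_pad=True, modes are padded to six octal digits, which is
    the alternative serialization described in the tangent above.
    """
    body = b""
    for perms, name, target in sorted(entries, key=_sort_key):
        mode = format(int(perms), "06o" if zero_pad else "o").encode()
        body += mode + b" " + name + b"\x00" + target
    header = b"tree %d\x00" % len(body)
    return hashlib.sha1(header + body).hexdigest()
```

Calling `tree_id` on the same entries with and without `zero_pad` gives two different hashes whenever a directory entry is present, which is how one logical directory ends up with two ids.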

This revision now requires changes to proceed. Apr 8 2019, 11:26 AM
swh/model/model.py
42

ardumont suggested I rename it to GitDate

swh/model/model.py
42

It's really not a git date, git only supports second granularity.

swh/model/model.py
42

Ok. What about TzTimestamp?

swh/model/model.py
42

I'd prefer the acronym to be expanded, but apart from that, really, *shrug*

vlorentz marked 5 inline comments as done.

Apply @olasd's comments.

Allow arbitrary ints as perms.

Use attrs' validation in tests.

Make objects() generate a single object instead of a list.

This revision is now accepted and ready to land. Apr 8 2019, 2:17 PM
  • rebase
  • fix size of sha1_git on my version of hypothesis
  • fix target_type in branch_targets_object
This revision was automatically updated to reflect the committed changes.