Page MenuHomeSoftware Heritage

Tag model entities with their "type"
ClosedPublic

Authored by douardda on May 13 2020, 5:10 PM.

Details

Summary

this aims at preventing constant usage of isinstance() based dispatch
code when writing generic code handling model entities.

For example, the "object_type" argument of JournalWriter.write_addition() has
become superflous now we only pass model entities, etc.

This idea comes olasd's reading of mypy doc:

https://mypy.readthedocs.io/en/latest/literal_types.html#tagged-unions

Diff Detail

Repository
rDMOD Data model
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D3152 (id=11193)

Rebasing onto 091498e3c7...

Current branch diff-target is up to date.
Changes applied before test
commit bbb1ada2922f3d8fccc03bed3aad516c01f9b6b7
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed May 13 17:03:01 2020 +0200

    Tag model entities with their "type"
    
    this aims at preventing constant usage of isinstance() based dispatch
    code when writing generic code handling model entities.
    
    For example, the "object_type" argument of JournalWriter.write_addition() has
    become superflous now we only pass model entities, etc.
    
    This idea comes olasd's reading of mypy doc:
    
      https://mypy.readthedocs.io/en/latest/literal_types.html#tagged-unions

Link to build: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/54/
See console output for more information: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/54/console

Great! You know how little I like isinstance ;)

But I think you would need to add tag: Union[Literal["content"], Literal["directory"], ...] to BaseModel for it to work without having to repeat the Union type everywhere.

And I'm not a big fan of the name. What about object_type, as we already use this name in other places?

Build is green

Patch application report for D3152 (id=11193)

Rebasing onto 091498e3c7...

Current branch diff-target is up to date.
Changes applied before test
commit bbb1ada2922f3d8fccc03bed3aad516c01f9b6b7
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed May 13 17:03:01 2020 +0200

    Tag model entities with their "type"
    
    this aims at preventing constant usage of isinstance() based dispatch
    code when writing generic code handling model entities.
    
    For example, the "object_type" argument of JournalWriter.write_addition() has
    become superflous now we only pass model entities, etc.
    
    This idea comes olasd's reading of mypy doc:
    
      https://mypy.readthedocs.io/en/latest/literal_types.html#tagged-unions

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/55/ for more details.

Great! You know how little I like isinstance ;)

+1

But I think you would need to add tag: Union[Literal["content"], Literal["directory"], ...] to BaseModel for it to work without having to repeat the Union type everywhere.

When using this form of object type dispatching, we really should use an explicit Union of the supported BaseModel subclasses. Because there's very few cases where we actually support BaseModels themselves.

And I'm not a big fan of the name. What about object_type, as we already use this name in other places?

Yeah, I agree with that.

I like the idea too (except the naming, should be object_type instead of tag).

One thing to know for production deployment is that Literal and Final have been added in typing-extensions 3.7.4
and version packaged in buster is 3.7.2.

This means typing-extensions package from buster-backports must be used.

swh/model/model.py
104

I would rather use Final from typing-extensions to avoid repeating the literal string.

tag: Final = "person"
ardumont added inline comments.
swh/model/model.py
104

That's even more than that, that will even make it unmodifiable [1]. We can also adds its type Final[str].

So that suits us even better ;)

[1] Quoting the linked documentation:

You can use the typing_extensions. Final qualifier to indicate that a name or attribute should not be reassigned, redefined, or overridden. This is often useful for module and class level constants as a way to prevent unintended modification. Mypy will prevent further assignments to final names...

I like the general idea and i think @anlambert's proposal improvment is even better ;)

swh/model/model.py
104

The advantage of Literal is it's a PEP (and has been integrated in py3.8).

I have no strong opinion on this myself, I'll do whatever we decide.

And I'm not a big fan of the name. What about object_type, as we already use this name in other places?

yes I also agree, but not a big fan of object_type either. However I have no better candidate to propose, I'm afraid. entity_type maybe ?

swh/model/model.py
104

Final is also a PEP and has also been integrated in Python 3.8 ;-)

And I'm not a big fan of the name. What about object_type, as we already use this name in other places?

yes I also agree, but not a big fan of object_type either. However I have no better candidate to propose, I'm afraid. entity_type maybe ?

I'm not sure "entity" a better name than "object". Plus, we already use "object" everywhere, so we would have to change that.

If we really want to move away from "object", I think it should be "artifact", to be consistent with https://docs.softwareheritage.org/devel/swh-model/data-model.html#software-artifacts

Hmm actually, "artifact" isn't great either, because we'll want to use it to mean extrinsic metadata , which aren't artifacts. Ugh...

object_type is used in the swh.model.identifiers.PersistentId in charge of
resolving SWHIDs now. It's also what we used in the journal "currently".

As we do not seem to find a better name anyway, the least friction path would
be consistency. My 2 cents.

If someone has better suggestions though, i'll comply with the team decision.

Naming is so hard...

I think we're pretty much set with object_type in the current lingo. It's very unspecific, but then again there's a bunch of different things that end up as model objects. So I suggest we keep using that.

I agree with the suggestion of using Final instead of Literal[foo].

From PEP 586:

The Final qualifier serves as a shorthand for declaring that a variable is effectively Literal.

We should also have a test checking that no two objects types have the same object_type value. This test should check all BaseModel subclasses (instead of just stuff generated by strategies.objects).

rename tag as object_type and use a Final instead of a Literal.

Build is green

Patch application report for D3152 (id=11204)

Rebasing onto cce3036634...

First, rewinding head to replay your work on top of it...
Applying: Tag model entities with their "object_type"
Changes applied before test
commit 9b82c6036596d7a03b5035275fd7d2fff1e9e093
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed May 13 17:03:01 2020 +0200

    Tag model entities with their "object_type"
    
    this aims at preventing constant usage of isinstance() based dispatch
    code when writing generic code handling model entities.
    
    For example, the "object_type" argument of JournalWriter.write_addition() has
    become superflous now we only pass model entities, etc.
    
    This idea comes olasd's reading of mypy doc:
    
      https://mypy.readthedocs.io/en/latest/literal_types.html#tagged-unions

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/56/ for more details.

Looks good.

Do you plan to add the test olasd mentioned? [1]

We should also have a test checking that no two objects types have the same object_type value. This test should check all BaseModel subclasses (instead of just stuff generated by strategies.objects).

[1] D3152#76613

Looks good.

Do you plan to add the test olasd mentioned? [1]

We should also have a test checking that no two objects types have the same object_type value. This test should check all BaseModel subclasses (instead of just stuff generated by strategies.objects).

[1] D3152#76613

I'm on it

add the test suggested by olasd

Build is green

Patch application report for D3152 (id=11205)

Rebasing onto cce3036634...

First, rewinding head to replay your work on top of it...
Applying: Tag model entities with their "object_type"
Changes applied before test
commit 573cf20638c9bd8528da1476a6a149ff6ed562b5
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed May 13 17:03:01 2020 +0200

    Tag model entities with their "object_type"
    
    this aims at preventing constant usage of isinstance() based dispatch
    code when writing generic code handling model entities.
    
    For example, the "object_type" argument of JournalWriter.write_addition() has
    become superflous now we only pass model entities, etc.
    
    This idea comes olasd's reading of mypy doc:
    
      https://mypy.readthedocs.io/en/latest/literal_types.html#tagged-unions

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/57/ for more details.

assert object_type is unset is the class has subclasses

same, but with the actual chuck

Build is green

Patch application report for D3152 (id=11206)

Rebasing onto cce3036634...

First, rewinding head to replay your work on top of it...
Applying: Tag model entities with their "object_type"
Changes applied before test
commit e84d636591b3d7eb9f35921661ddc2675ffc37f9
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed May 13 17:03:01 2020 +0200

    Tag model entities with their "object_type"
    
    this aims at preventing constant usage of isinstance() based dispatch
    code when writing generic code handling model entities.
    
    For example, the "object_type" argument of JournalWriter.write_addition() has
    become superflous now we only pass model entities, etc.
    
    This idea comes olasd's reading of mypy doc:
    
      https://mypy.readthedocs.io/en/latest/literal_types.html#tagged-unions

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/58/ for more details.

Build has FAILED

Patch application report for D3152 (id=11207)

Rebasing onto cce3036634...

First, rewinding head to replay your work on top of it...
Applying: Tag model entities with their "object_type"
Changes applied before test
commit c919677105f9048c28749ddede877feaaafe34c3
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed May 13 17:03:01 2020 +0200

    Tag model entities with their "object_type"
    
    this aims at preventing constant usage of isinstance() based dispatch
    code when writing generic code handling model entities.
    
    For example, the "object_type" argument of JournalWriter.write_addition() has
    become superflous now we only pass model entities, etc.
    
    This idea comes olasd's reading of mypy doc:
    
      https://mypy.readthedocs.io/en/latest/literal_types.html#tagged-unions

Link to build: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/59/
See console output for more information: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/59/console

Looks good (remains to unstuck the build ;)

This revision is now accepted and ready to land.May 15 2020, 10:44 AM

Update the diff to make it work (see T2422)

Build is green

Patch application report for D3152 (id=11866)

Rebasing onto 661b7c2c76...

Current branch diff-target is up to date.
Changes applied before test
commit 0678fc766e535c0beb1b292e63de04d1b05c37e4
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed May 13 17:03:01 2020 +0200

    Tag model entities with their "object_type"
    
    this aims at preventing constant usage of isinstance() based dispatch
    code when writing generic code handling model entities.
    
    For example, the "object_type" argument of JournalWriter.write_addition() has
    become superflous now we only pass model entities, etc.
    
    This idea comes olasd's reading of mypy doc:
    
      https://mypy.readthedocs.io/en/latest/literal_types.html#tagged-unions

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/80/ for more details.

improve commit message + use '_' in DiskBakedContent.object_type

instead of '-', for consistency.

Build is green

Patch application report for D3152 (id=11870)

Rebasing onto 661b7c2c76...

Current branch diff-target is up to date.
Changes applied before test
commit 82818999d215e2ffc027174eafb4c0e906b714db
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed May 13 17:03:01 2020 +0200

    Tag model entities with their "object_type"
    
    this aims at preventing constant usage of isinstance() based dispatch
    code when writing generic code handling model entities.
    
    For example, the "object_type" argument of JournalWriter.write_addition() has
    become superflous now we only pass model entities, etc.
    
    This idea comes olasd's reading of mypy doc:
    
      https://mypy.readthedocs.io/en/latest/literal_types.html#tagged-unions
    
    This comes with a refactoring of from_dict.DiskBackedContent to make
    it *not* inherit from model.Content: object_type being Final, it cannot
    be overloaded.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/81/ for more details.

due to changes in from_disk, I'd rather have a new review on this.

I don't see any issues with this ;)
I'll let others review as well.

swh/model/from_disk.py
49

Is that to exclude it from the comparison computation?

swh/model/model.py
1

copyright header ¯\_(ツ)_/¯

This revision is now accepted and ready to land.Jun 24 2020, 5:00 PM
swh/model/from_disk.py
49

the eq=False? yes it is. I've introduced this a few days ago because this ctime attribute really is just a helper metadata, not an "intrisinc property" of a Content.

See a95646fc

swh/model/model.py
1

thx

Build is green

Patch application report for D3152 (id=11872)

Rebasing onto 661b7c2c76...

Current branch diff-target is up to date.
Changes applied before test
commit e632abed41c5cd71507c4222e1e85987556b232c
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed May 13 17:03:01 2020 +0200

    Tag model entities with their "object_type"
    
    this aims at preventing constant usage of isinstance() based dispatch
    code when writing generic code handling model entities.
    
    For example, the "object_type" argument of JournalWriter.write_addition() has
    become superflous now we only pass model entities, etc.
    
    This idea comes olasd's reading of mypy doc:
    
      https://mypy.readthedocs.io/en/latest/literal_types.html#tagged-unions
    
    This comes with a refactoring of from_dict.DiskBackedContent to make
    it *not* inherit from model.Content: object_type being Final, it cannot
    be overloaded.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/82/ for more details.