Page MenuHomeSoftware Heritage

Tag model entities with their "type"
AcceptedPublic

Authored by douardda on Wed, May 13, 5:10 PM.

Details

Summary

this aims at preventing constant usage of isinstance() based dispatch
code when writing generic code handling model entities.

For example, the "object_type" argument of JournalWriter.write_addition() has
become superflous now we only pass model entities, etc.

This idea comes olasd's reading of mypy doc:

https://mypy.readthedocs.io/en/latest/literal_types.html#tagged-unions

Diff Detail

Event Timeline

douardda created this revision.Wed, May 13, 5:10 PM

Build has FAILED

Patch application report for D3152 (id=11193)

Rebasing onto 091498e3c7...

Current branch diff-target is up to date.
Changes applied before test
commit bbb1ada2922f3d8fccc03bed3aad516c01f9b6b7
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed May 13 17:03:01 2020 +0200

    Tag model entities with their "type"
    
    this aims at preventing constant usage of isinstance() based dispatch
    code when writing generic code handling model entities.
    
    For example, the "object_type" argument of JournalWriter.write_addition() has
    become superflous now we only pass model entities, etc.
    
    This idea comes olasd's reading of mypy doc:
    
      https://mypy.readthedocs.io/en/latest/literal_types.html#tagged-unions

Link to build: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/54/
See console output for more information: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/54/console

Great! You know how little I like isinstance ;)

But I think you would need to add tag: Union[Literal["content"], Literal["directory"], ...] to BaseModel for it to work without having to repeat the Union type everywhere.

And I'm not a big fan of the name. What about object_type, as we already use this name in other places?

Build is green

Patch application report for D3152 (id=11193)

Rebasing onto 091498e3c7...

Current branch diff-target is up to date.
Changes applied before test
commit bbb1ada2922f3d8fccc03bed3aad516c01f9b6b7
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed May 13 17:03:01 2020 +0200

    Tag model entities with their "type"
    
    this aims at preventing constant usage of isinstance() based dispatch
    code when writing generic code handling model entities.
    
    For example, the "object_type" argument of JournalWriter.write_addition() has
    become superflous now we only pass model entities, etc.
    
    This idea comes olasd's reading of mypy doc:
    
      https://mypy.readthedocs.io/en/latest/literal_types.html#tagged-unions

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/55/ for more details.

olasd added a subscriber: olasd.Wed, May 13, 5:37 PM

Great! You know how little I like isinstance ;)

+1

But I think you would need to add tag: Union[Literal["content"], Literal["directory"], ...] to BaseModel for it to work without having to repeat the Union type everywhere.

When using this form of object type dispatching, we really should use an explicit Union of the supported BaseModel subclasses. Because there's very few cases where we actually support BaseModels themselves.

And I'm not a big fan of the name. What about object_type, as we already use this name in other places?

Yeah, I agree with that.

I like the idea too (except the naming, should be object_type instead of tag).

One thing to know for production deployment is that Literal and Final have been added in typing-extensions 3.7.4
and version packaged in buster is 3.7.2.

This means typing-extensions package from buster-backports must be used.

swh/model/model.py
94

I would rather use Final from typing-extensions to avoid repeating the literal string.

tag: Final = "person"
ardumont added inline comments.
swh/model/model.py
94

That's even more than that, that will even make it unmodifiable [1]. We can also adds its type Final[str].

So that suits us even better ;)

[1] Quoting the linked documentation:

You can use the typing_extensions. Final qualifier to indicate that a name or attribute should not be reassigned, redefined, or overridden. This is often useful for module and class level constants as a way to prevent unintended modification. Mypy will prevent further assignments to final names...

I like the general idea and i think @anlambert's proposal improvment is even better ;)

douardda added inline comments.Thu, May 14, 10:46 AM
swh/model/model.py
94

The advantage of Literal is it's a PEP (and has been integrated in py3.8).

I have no strong opinion on this myself, I'll do whatever we decide.

And I'm not a big fan of the name. What about object_type, as we already use this name in other places?

yes I also agree, but not a big fan of object_type either. However I have no better candidate to propose, I'm afraid. entity_type maybe ?

anlambert added inline comments.Thu, May 14, 11:31 AM
swh/model/model.py
94

Final is also a PEP and has also been integrated in Python 3.8 ;-)

vlorentz added a comment.EditedThu, May 14, 11:38 AM

And I'm not a big fan of the name. What about object_type, as we already use this name in other places?

yes I also agree, but not a big fan of object_type either. However I have no better candidate to propose, I'm afraid. entity_type maybe ?

I'm not sure "entity" a better name than "object". Plus, we already use "object" everywhere, so we would have to change that.

If we really want to move away from "object", I think it should be "artifact", to be consistent with https://docs.softwareheritage.org/devel/swh-model/data-model.html#software-artifacts

Hmm actually, "artifact" isn't great either, because we'll want to use it to mean extrinsic metadata , which aren't artifacts. Ugh...

object_type is used in the swh.model.identifiers.PersistentId in charge of
resolving SWHIDs now. It's also what we used in the journal "currently".

As we do not seem to find a better name anyway, the least friction path would
be consistency. My 2 cents.

If someone has better suggestions though, i'll comply with the team decision.

Naming is so hard...

I think we're pretty much set with object_type in the current lingo. It's very unspecific, but then again there's a bunch of different things that end up as model objects. So I suggest we keep using that.

I agree with the suggestion of using Final instead of Literal[foo].

From PEP 586:

The Final qualifier serves as a shorthand for declaring that a variable is effectively Literal.

We should also have a test checking that no two objects types have the same object_type value. This test should check all BaseModel subclasses (instead of just stuff generated by strategies.objects).

douardda updated this revision to Diff 11204.EditedFri, May 15, 9:57 AM

rename tag as object_type and use a Final instead of a Literal.

so be it :-)

Build is green

Patch application report for D3152 (id=11204)

Rebasing onto cce3036634...

First, rewinding head to replay your work on top of it...
Applying: Tag model entities with their "object_type"
Changes applied before test
commit 9b82c6036596d7a03b5035275fd7d2fff1e9e093
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed May 13 17:03:01 2020 +0200

    Tag model entities with their "object_type"
    
    this aims at preventing constant usage of isinstance() based dispatch
    code when writing generic code handling model entities.
    
    For example, the "object_type" argument of JournalWriter.write_addition() has
    become superflous now we only pass model entities, etc.
    
    This idea comes olasd's reading of mypy doc:
    
      https://mypy.readthedocs.io/en/latest/literal_types.html#tagged-unions

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/56/ for more details.

Looks good.

Do you plan to add the test olasd mentioned? [1]

We should also have a test checking that no two objects types have the same object_type value. This test should check all BaseModel subclasses (instead of just stuff generated by strategies.objects).

[1] D3152#76613

Looks good.

Do you plan to add the test olasd mentioned? [1]

We should also have a test checking that no two objects types have the same object_type value. This test should check all BaseModel subclasses (instead of just stuff generated by strategies.objects).

[1] D3152#76613

I'm on it

douardda updated this revision to Diff 11205.Fri, May 15, 10:17 AM

add the test suggested by olasd

Build is green

Patch application report for D3152 (id=11205)

Rebasing onto cce3036634...

First, rewinding head to replay your work on top of it...
Applying: Tag model entities with their "object_type"
Changes applied before test
commit 573cf20638c9bd8528da1476a6a149ff6ed562b5
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed May 13 17:03:01 2020 +0200

    Tag model entities with their "object_type"
    
    this aims at preventing constant usage of isinstance() based dispatch
    code when writing generic code handling model entities.
    
    For example, the "object_type" argument of JournalWriter.write_addition() has
    become superflous now we only pass model entities, etc.
    
    This idea comes olasd's reading of mypy doc:
    
      https://mypy.readthedocs.io/en/latest/literal_types.html#tagged-unions

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/57/ for more details.

douardda updated this revision to Diff 11206.Fri, May 15, 10:20 AM

assert object_type is unset is the class has subclasses

douardda updated this revision to Diff 11207.Fri, May 15, 10:21 AM

same, but with the actual chuck

Build is green

Patch application report for D3152 (id=11206)

Rebasing onto cce3036634...

First, rewinding head to replay your work on top of it...
Applying: Tag model entities with their "object_type"
Changes applied before test
commit e84d636591b3d7eb9f35921661ddc2675ffc37f9
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed May 13 17:03:01 2020 +0200

    Tag model entities with their "object_type"
    
    this aims at preventing constant usage of isinstance() based dispatch
    code when writing generic code handling model entities.
    
    For example, the "object_type" argument of JournalWriter.write_addition() has
    become superflous now we only pass model entities, etc.
    
    This idea comes olasd's reading of mypy doc:
    
      https://mypy.readthedocs.io/en/latest/literal_types.html#tagged-unions

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/58/ for more details.

Build has FAILED

Patch application report for D3152 (id=11207)

Rebasing onto cce3036634...

First, rewinding head to replay your work on top of it...
Applying: Tag model entities with their "object_type"
Changes applied before test
commit c919677105f9048c28749ddede877feaaafe34c3
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed May 13 17:03:01 2020 +0200

    Tag model entities with their "object_type"
    
    this aims at preventing constant usage of isinstance() based dispatch
    code when writing generic code handling model entities.
    
    For example, the "object_type" argument of JournalWriter.write_addition() has
    become superflous now we only pass model entities, etc.
    
    This idea comes olasd's reading of mypy doc:
    
      https://mypy.readthedocs.io/en/latest/literal_types.html#tagged-unions

Link to build: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/59/
See console output for more information: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/59/console

ardumont accepted this revision.Fri, May 15, 10:44 AM

Looks good (remains to unstuck the build ;)

This revision is now accepted and ready to land.Fri, May 15, 10:44 AM