Details

Reviewers

olasd

Group Reviewers

Reviewers

Maniphest Tasks

T2423: Extract the `extra_headers` away from `Revision.metadata` into a top-level immutable object

Commits

rDMODa7d9aca2b0df: Extract the extra_headers from metadata on the Revision model class

Summary

Add a new extra_headers attribute on Revision and use it for computing
the revision's id instead of extract it from the metadata field.

Add a post init hook to Revision to initialize this new attribute from
given metadata, if any, for bw compat.

Related to T2423.

Diff Detail

Repository

rDMOD Data model

Branch

master

Lint

Lint Skipped

Unit

Unit Tests Skipped

Build Status

Buildable 13355
Build 20412: Phabricator diff pipeline on jenkins	Jenkins console · Jenkins
Build 20411: arc lint + arc unit

Event Timeline

douardda created this revision.Jul 1 2020, 3:18 PM

Herald added a reviewer: Reviewers. · View Herald TranscriptJul 1 2020, 3:18 PM

Build is green

Patch application report for D3389 (id=12022)

Rebasing onto e632abed41...

Current branch diff-target is up to date.

Changes applied before test

commit f480e1f8635df16aa52a50f2c0403f2000261215
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jul 1 15:13:59 2020 +0200

    Extract the extra_headers from metadata on the Revision model class
    
    Add a new extra_headers attribute on Revision and use it for computing
    the revision's id instead of extract it from the metadata field.
    
    Add a post init hook to Revision to initialize this new attribute from
    given metadata, if any, for bw compat.
    
    Related to T2423.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/84/ for more details.

Harbormaster completed remote builds in B13239: Diff 12022.Jul 1 2020, 3:19 PM

I'm not sure about popping the extra_headers off of the incoming metadata field right now. This feels like something we want to do long term, but if you do that right now, it means you're putting upgrades of swh.model and swh.storage (with support with the new field) and all loaders in lockstep of one another. If you upgrade swh.model now, you'll lose the extra_headers until swh.storage will be able to store them.

For now I'd be more comfortable just populating the new field, without touching the old one.

I guess the next steps would be:

updating the storage backends to be able to store and retrieve data in the new field.
deploying that.
updating loaders to use the new field for writes and reads (svn loader).
then you can probably pop the extra_headers off of the metadata dict.

Finally, we can

migrate the remaining data in the postgres storage
stop mangling the metadata dict in swh.model

Apart from that, I'll echo the comment I made on IRC about making the "extra header value" type stricter: we should stick to bytes at some point; but that can only happen once all the legacy data is migrated.

improve bw-compat support, tests and hypothesis strategies

Build has FAILED

Patch application report for D3389 (id=12025)

Rebasing onto 40a40f508b...

First, rewinding head to replay your work on top of it...
Applying: Extract the extra_headers from metadata on the Revision model class

Changes applied before test

commit 5673b8383b7f432fbd9121f5814e2567d87e970b
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jul 1 15:13:59 2020 +0200

    Extract the extra_headers from metadata on the Revision model class
    
    Add a new extra_headers attribute on Revision and use it for computing
    the revision's id instead of extract it from the metadata field.
    
    Add a post init hook to Revision to initialize this new attribute from
    given metadata, if any, for bw compat.
    
    Also amend the revision_d hyptothesis strategy to generate extra_headers.

Link to build: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/85/
See console output for more information: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/85/console

Harbormaster failed remote builds in B13244: Diff 12025!Jul 1 2020, 4:05 PM

restrict extra_headers to (bytes, bytes) only

we actually do not support anything else, really (for values at least).

Build has FAILED

Patch application report for D3389 (id=12026)

Rebasing onto 40a40f508b...

First, rewinding head to replay your work on top of it...
Applying: Extract the extra_headers from metadata on the Revision model class

Changes applied before test

commit 6090209793a330063ece83b8423cb983bd48cbb5
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jul 1 15:13:59 2020 +0200

    Extract the extra_headers from metadata on the Revision model class
    
    Add a new extra_headers attribute on Revision and use it for computing
    the revision's id instead of extract it from the metadata field.
    
    Only accept (bytes, bytes) as extra_header.
    
    Add a post init hook to Revision to initialize this new attribute from
    given metadata, if any, for bw compat.
    
    Also amend the revision_d hyptothesis strategy to generate extra_headers.

Link to build: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/86/
See console output for more information: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/86/console

Harbormaster failed remote builds in B13245: Diff 12026!Jul 1 2020, 5:23 PM

make mypy happy (hopefully)

Build has FAILED

Patch application report for D3389 (id=12027)

Rebasing onto 40a40f508b...

First, rewinding head to replay your work on top of it...
Applying: Extract the extra_headers from metadata on the Revision model class

Changes applied before test

commit 210f4974b8bef8a2d53e7c1b3f434c28dc46aaf2
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jul 1 15:13:59 2020 +0200

    Extract the extra_headers from metadata on the Revision model class
    
    Add a new extra_headers attribute on Revision and use it for computing
    the revision's id instead of extract it from the metadata field.
    
    Only accept (bytes, bytes) as extra_header.
    
    Add a post init hook to Revision to initialize this new attribute from
    given metadata, if any, for bw compat.
    
    Also amend the revision_d hyptothesis strategy to generate extra_headers.

Link to build: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/87/
See console output for more information: https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/87/console

Harbormaster failed remote builds in B13246: Diff 12027!Jul 1 2020, 5:41 PM

more mypy vs. attrs-strict fighting

I give up

Build is green

Patch application report for D3389 (id=12028)

Rebasing onto 40a40f508b...

First, rewinding head to replay your work on top of it...
Applying: Extract the extra_headers from metadata on the Revision model class

Changes applied before test

commit e0457febc0dbe8695a3ac87dade9e60e03ac69bf
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jul 1 15:13:59 2020 +0200

    Extract the extra_headers from metadata on the Revision model class
    
    Add a new extra_headers attribute on Revision and use it for computing
    the revision's id instead of extract it from the metadata field.
    
    Only accept (bytes, bytes) as extra_header.
    
    Add a post init hook to Revision to initialize this new attribute from
    given metadata, if any, for bw compat.
    
    Also amend the revision_d hyptothesis strategy to generate extra_headers.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/88/ for more details.

Harbormaster completed remote builds in B13247: Diff 12028.Jul 1 2020, 6:03 PM

rebase

Build is green

Patch application report for D3389 (id=12083)

Rebasing onto 8863b5c186...

Current branch diff-target is up to date.

Changes applied before test

commit e7fe9c6b92c08933a69f1587e4f7564a0a491660
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jul 1 15:13:59 2020 +0200

    Extract the extra_headers from metadata on the Revision model class
    
    Add a new extra_headers attribute on Revision and use it for computing
    the revision's id instead of extract it from the metadata field.
    
    Only accept (bytes, bytes) as extra_header.
    
    Add a post init hook to Revision to initialize this new attribute from
    given metadata, if any, for bw compat.
    
    Also amend the revision_d hyptothesis strategy to generate extra_headers.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/97/ for more details.

Harbormaster completed remote builds in B13304: Diff 12083.Jul 3 2020, 12:00 PM

Accepted, with the following caveat: I'm still not sure what the plan is to deploy this now? In the current state of this diff, as soon as we deploy this on workers, data loss will occur (as the new extra_headers field isn't supported by swh.storage).

Now that I think of it, introducing a new field in swh.model needs a coordinated deployment anyway so maybe it doesn't matter (as long as the swh.storage update to support the new field occurs before the next deployment).

swh/model/hypothesis_strategies.py
234	I'm not sure we should allow empty keys. Of course, that will make validation more annoying in the model.
swh/model/tests/test_model.py
444–455	Shouldn't these two tests be split out to separate unit tests?

This revision is now accepted and ready to land.Jul 3 2020, 2:56 PM

In D3389#83915, @olasd wrote:

Accepted, with the following caveat: I'm still not sure what the plan is to deploy this now? In the current state of this diff, as soon as we deploy this on workers, data loss will occur (as the new extra_headers field isn't supported by swh.storage).

Now that I think of it, introducing a new field in swh.model needs a coordinated deployment anyway so maybe it doesn't matter (as long as the swh.storage update to support the new field occurs before the next deployment).

Yes this needs a counterpart in storage (wip), and needs a coordinated deployment.

swh/model/hypothesis_strategies.py
234	I wasn't sure about this. But each time (almost) we forbid a "degenerated" case, we can find a degenerated repo that somewhat managed to have it...
swh/model/tests/test_model.py
444–455	could do that, sure

douardda mentioned this in D3426: Extract revision's extra_header as a top level attribute.Jul 6 2020, 10:44 AM

In D3389#84078, @douardda wrote:

In D3389#83915, @olasd wrote:

Accepted, with the following caveat: I'm still not sure what the plan is to deploy this now? In the current state of this diff, as soon as we deploy this on workers, data loss will occur (as the new extra_headers field isn't supported by swh.storage).

Now that I think of it, introducing a new field in swh.model needs a coordinated deployment anyway so maybe it doesn't matter (as long as the swh.storage update to support the new field occurs before the next deployment).

Yes this needs a counterpart in storage (wip), and needs a coordinated deployment.

D3426

split the 2 test functions in parts, and fix the bwcompat hook

passing extra_headers via metadata bypassed type validation.

Build is green

Patch application report for D3389 (id=12133)

Rebasing onto 1ff05161e7...

Current branch diff-target is up to date.

Changes applied before test

commit a7d9aca2b0df15c64d4e42e263422df9908ae42b
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Jul 1 15:13:59 2020 +0200

    Extract the extra_headers from metadata on the Revision model class
    
    Add a new extra_headers attribute on Revision and use it for computing
    the revision's id instead of extract it from the metadata field.
    
    Only accept (bytes, bytes) as extra_header.
    
    Add a post init hook to Revision to initialize this new attribute from
    given metadata, if any, for bw compat.
    
    Also amend the revision_d hyptothesis strategy to generate extra_headers.

See https://jenkins.softwareheritage.org/job/DMOD/job/tests-on-diff/105/ for more details.

Harbormaster completed remote builds in B13355: Diff 12133.Jul 6 2020, 12:02 PM

douardda mentioned this in T2423: Extract the `extra_headers` away from `Revision.metadata` into a top-level immutable object.Jul 6 2020, 1:09 PM

Closed by commit rDMODa7d9aca2b0df: Extract the extra_headers from metadata on the Revision model class (authored by douardda). · Explain WhyJul 6 2020, 2:12 PM

This revision was automatically updated to reflect the committed changes.

douardda added a commit: rDMODa7d9aca2b0df: Extract the extra_headers from metadata on the Revision model class.

Extract the extra_headers from metadata on the Revision model class
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Patch application report for D3389 (id=12022)

Changes applied before test

Patch application report for D3389 (id=12025)

Changes applied before test

Patch application report for D3389 (id=12026)

Changes applied before test

Patch application report for D3389 (id=12027)

Changes applied before test

Patch application report for D3389 (id=12028)

Changes applied before test

Patch application report for D3389 (id=12083)

Changes applied before test

Patch application report for D3389 (id=12133)

Changes applied before test

Revision Contents
Changeset List

Diff 12133

swh/model/hypothesis_strategies.py

swh/model/identifiers.py

swh/model/model.py

swh/model/tests/test_identifiers.py

swh/model/tests/test_model.py

Extract the extra_headers from metadata on the Revision model classClosedPublicActions

Details

Diff Detail

Event Timeline

Patch application report for D3389 (id=12022)

Changes applied before test

Patch application report for D3389 (id=12025)

Changes applied before test

Patch application report for D3389 (id=12026)

Changes applied before test

Patch application report for D3389 (id=12027)

Changes applied before test

Patch application report for D3389 (id=12028)

Changes applied before test

Patch application report for D3389 (id=12083)

Changes applied before test

Patch application report for D3389 (id=12133)

Changes applied before test

Revision ContentsChangeset List

Diff 12133

swh/model/hypothesis_strategies.py

swh/model/identifiers.py

swh/model/model.py

swh/model/tests/test_identifiers.py

swh/model/tests/test_model.py

Extract the extra_headers from metadata on the Revision model class
ClosedPublic
Actions

Revision Contents
Changeset List