Page MenuHomeSoftware Heritage

Extract revision's extra_header as a top level attribute
ClosedPublic

Authored by douardda on Jul 6 2020, 10:44 AM.

Details

Summary

Follows swh.model's evolution for the Revision model class.

In Postgresql, store the extra_headers as a bytea[][].

In Cassandra, store them as msgpack dumped blob.

This needs (unpublished) D3389 to pass.

Related to T2423.

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D3426 (id=12127)

Rebasing onto 80108480a6...

Current branch diff-target is up to date.
Changes applied before test
commit 36b6fcc5d6267e084feb415d9de17013055604b6
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jul 3 11:45:06 2020 +0200

    Extract revision's extra_header as a top level attribute
    
    Follows swh.model's evolution for the Revision model class.
    
    In Postgresql, store the extra_headers as a bytea[][].
    
    In Cassandra, store them as msgpack dumped blob.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/410/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/410/console

bump dep swh-model>= 0.4.0

Build has FAILED

Patch application report for D3426 (id=12144)

Rebasing onto 80108480a6...

Current branch diff-target is up to date.
Changes applied before test
commit 580ebe29ed224cb5c17ffb90e4604797cfc2b3c4
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jul 3 11:45:06 2020 +0200

    Extract revision's extra_header as a top level attribute
    
    Follows swh.model's evolution for the Revision model class.
    
    In Postgresql, store the extra_headers as a bytea[][].
    
    In Cassandra, store them as msgpack dumped blob.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/411/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/411/console

bump deps for swh-journal and swh-core

Build has FAILED

Patch application report for D3426 (id=12147)

Rebasing onto 80108480a6...

Current branch diff-target is up to date.
Changes applied before test
commit a2eb6f9c91a5b9a9b3e6acd4cb523647c4e675f6
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jul 3 11:45:06 2020 +0200

    Extract revision's extra_header as a top level attribute
    
    Follows swh.model's evolution for the Revision model class.
    
    In Postgresql, store the extra_headers as a bytea[][].
    
    In Cassandra, store them as msgpack dumped blob.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/412/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/412/console

Build is green

Patch application report for D3426 (id=12147)

Rebasing onto 80108480a6...

Current branch diff-target is up to date.
Changes applied before test
commit a2eb6f9c91a5b9a9b3e6acd4cb523647c4e675f6
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jul 3 11:45:06 2020 +0200

    Extract revision's extra_header as a top level attribute
    
    Follows swh.model's evolution for the Revision model class.
    
    In Postgresql, store the extra_headers as a bytea[][].
    
    In Cassandra, store them as msgpack dumped blob.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/413/ for more details.

Build is green

Patch application report for D3426 (id=12147)

Rebasing onto 80108480a6...

Current branch diff-target is up to date.
Changes applied before test
commit a2eb6f9c91a5b9a9b3e6acd4cb523647c4e675f6
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jul 3 11:45:06 2020 +0200

    Extract revision's extra_header as a top level attribute
    
    Follows swh.model's evolution for the Revision model class.
    
    In Postgresql, store the extra_headers as a bytea[][].
    
    In Cassandra, store them as msgpack dumped blob.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/414/ for more details.

Looks nice.

Few comments:

  • I'm not convinced about the Cassandra storage of these. Have you tried a list <frozen list<blob>> or somesuch?
  • Would it be possible to test the backwards-compatibility feature? That probably means monkey-patching Revision.to_dict() to put extra_headers back in metadata on the way _in_ to the database, but that would make me much more confident in the diff.
In D3426#84268, @olasd wrote:

Looks nice.

Few comments:

  • I'm not convinced about the Cassandra storage of these. Have you tried a list <frozen list<blob>> or somesuch?

no idea how to do such a thing, let me have a look

  • Would it be possible to test the backwards-compatibility feature? That probably means monkey-patching Revision.to_dict() to put extra_headers back in metadata on the way _in_ to the database, but that would make me much more confident in the diff.

I'll try to.

use frozen<list<list<blob>>> in cassandra's schema for extra_headers

Build is green

Patch application report for D3426 (id=12175)

Rebasing onto 80108480a6...

Current branch diff-target is up to date.
Changes applied before test
commit 6674c506ca304ced3da7e5ed137c6aafac25e7f6
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jul 3 11:45:06 2020 +0200

    Extract revision's extra_header as a top level attribute
    
    Follows swh.model's evolution for the Revision model class.
    
    In Postgresql, store the extra_headers as a bytea[][].
    
    In Cassandra, store them as frozen<list<list<blob>>>.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/415/ for more details.

ardumont added a subscriber: ardumont.

For the sake of the missing redefinitions in sql upgrade (and some missing test cases).

Otherwise looks good.

sql/upgrades/158.sql
13

You are missing the redefinition of swh_revision_log, swh_revision_add, swh_content_list_by_object_id.

swh/storage/cassandra/converters.py
48

Use the same conditional as in the prior method above?

if not extra_headers and metadata and "extra_headers" in metadata:
58

That's probably what olasd mentioned already but here we go.
Could you add the necessary use cases test around revision_from_db and revision_to_db?

We can probably just unit-test those...

swh/storage/sql/40-swh-func.sql
456

indentation is off.

This revision now requires changes to proceed.Jul 7 2020, 10:52 AM

add a test for "true" bw compat, aka old data in pg still works

(spoil: was not)

Build is green

Patch application report for D3426 (id=12213)

Rebasing onto 80108480a6...

Current branch diff-target is up to date.
Changes applied before test
commit a0d93247d39befb4f56370079a61bfccb5844022
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jul 3 11:45:06 2020 +0200

    Extract revision's extra_header as a top level attribute
    
    Follows swh.model's evolution for the Revision model class.
    
    In Postgresql, store the extra_headers as a bytea[][].
    Ensure data present in postgres with extra_headers in the metadata field
    are properly supported by the pg-backed storage.
    Get rid of the (now useless) git_headers_to_db() converter function.
    
    In Cassandra, store them as frozen<list<list<blob>>>.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/416/ for more details.

ardumont removed a reviewer: ardumont.
ardumont removed a reviewer: Reviewers.
This revision is now accepted and ready to land.Jul 7 2020, 3:40 PM

Build is green

Patch application report for D3426 (id=12214)

Rebasing onto 80108480a6...

Current branch diff-target is up to date.
Changes applied before test
commit aecaec198dfee661086c6c621519f06b717a37a9
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jul 3 11:45:06 2020 +0200

    Extract revision's extra_header as a top level attribute
    
    Follows swh.model's evolution for the Revision model class.
    
    In Postgresql, store the extra_headers as a bytea[][].
    Ensure data present in postgres with extra_headers in the metadata field
    are properly supported by the pg-backed storage.
    Get rid of the (now useless) git_headers_to_db() converter function.
    
    In Cassandra, store them as frozen<list<list<blob>>>.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/417/ for more details.

anlambert added a subscriber: anlambert.
anlambert added inline comments.
swh/storage/sql/30-swh-schema.sql
20

You need to upgrade dbversion here.

This revision now requires changes to proceed.Jul 7 2020, 4:16 PM
sql/upgrades/158.sql
14–39

You can use this instead.

alter type revision_entry add attribute (extra_headers bytea[][]);
swh/storage/sql/30-swh-schema.sql
20

jeez, thx

swh/storage/sql/30-swh-schema.sql
20

"given enough eyeballs..."

fix sql schema and simplify the migration script

as reported by anlambert

This revision is now accepted and ready to land.Jul 7 2020, 4:51 PM

Build is green

Patch application report for D3426 (id=12222)

Rebasing onto 80108480a6...

Current branch diff-target is up to date.
Changes applied before test
commit 5ab70237d6c3b8ed357f2ae03589353b8d6e5052
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jul 3 11:45:06 2020 +0200

    Extract revision's extra_header as a top level attribute
    
    Follows swh.model's evolution for the Revision model class.
    
    In Postgresql, store the extra_headers as a bytea[][].
    Ensure data present in postgres with extra_headers in the metadata field
    are properly supported by the pg-backed storage.
    Get rid of the (now useless) git_headers_to_db() converter function.
    
    In Cassandra, store them as frozen<list<list<blob>>>.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/418/ for more details.