Details
- Reviewers
ardumont - Group Reviewers
Reviewers - Commits
- rDSTOe645e6370951: Implement extrinsic origin metadata specification.
Diff Detail
- Repository
- rDSTO Storage manager
- Branch
- impl-origin-metadata-spec
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 11998 Build 18194: Phabricator diff pipeline on jenkins Jenkins console · Jenkins Build 18193: arc lint + arc unit
Event Timeline
sql/upgrades/148.sql | ||
---|---|---|
93 | I'm commenting here that the changes above should be all identified in swh-deposit before landing. | |
swh/storage/cassandra/cql.py | ||
801 | I'm not sure I remember correctly, so I'll ask @ardumont to join. Would that be a change that will affect indexers? I see that this is defined in a Cassandra file, for which I do not know the scope. |
Could you do a test for a registry and a forge, for the same origin?
I can give you real life examples, if needed..
I can't test the diff now, but I read it and added some comments..
I could try testing it later on this week...
swh/storage/cassandra/schema.py | ||
---|---|---|
193 | Why do we need authority_type in the PRIMARY KEY? isn't the url enough ? | |
197 | every typo? | |
swh/storage/interface.py | ||
1151 | and? | |
1170 | is the and at the end intentional? | |
1241–1242 | is it a fixed set? | |
swh/storage/tests/storage_data.py | ||
328–329 | above the type is only deposit | |
329–331 | this looks like a wrong url (I don't know if this is a good or bad thing for a test) |
swh/storage/cassandra/cql.py | ||
---|---|---|
801 | The tool table used by the indexer are in their dedicated storage db. |
apply @moranegg's comments
sql/upgrades/148.sql | ||
---|---|---|
37 | the name is not needed as an identifier, and everything else can go in the metadata if we need it, but I don't think we do. | |
swh/storage/cassandra/schema.py | ||
193 | I don't know if it's needed, but that's how we wrote the specification. | |
swh/storage/interface.py | ||
1170 | no, that's a typo | |
1241–1242 | According to the spec, yes. We can extend it later anyway. | |
swh/storage/tests/storage_data.py | ||
328–329 | indeed | |
329–331 | It should be a valid URI in practice; but in tests it doesn't really matter. Changed to a more obvious url. |
Build is green
Patch application report for D2988 (id=10832)
Rebasing onto bca643acab...
First, rewinding head to replay your work on top of it... Applying: Add a wrapper to manage a sorted list. Applying: Implement extrinsic origin metadata specification.
Changes applied before test
commit f12240bf1531751ab4af443906c8ba83d4e3b978 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Apr 9 11:46:21 2020 +0200 Implement extrinsic origin metadata specification. https://docs.softwareheritage.org/devel/swh-storage/extrinsic-metadata-specification.html commit e9369b031669ae77ee627ee38ac12dda32b7a3b4 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Tue Mar 24 17:22:09 2020 +0100 Add a wrapper to manage a sorted list. For now this is only used by sorted_sha1, but we'll need it for origin_metadata soon.
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/111/ for more details.
Build is green
Patch application report for D2988 (id=10843)
Rebasing onto bca643acab...
First, rewinding head to replay your work on top of it... Applying: Add a wrapper to manage a sorted list. Applying: Implement extrinsic origin metadata specification.
Changes applied before test
commit 73cef28bb09655e7f4e3bc5c7974f79d66162181 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Apr 9 11:46:21 2020 +0200 Implement extrinsic origin metadata specification. https://docs.softwareheritage.org/devel/swh-storage/extrinsic-metadata-specification.html commit 0c38297e5b67c1f020d742797d3df82a7c7ce7d7 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Tue Mar 24 17:22:09 2020 +0100 Add a wrapper to manage a sorted list. For now this is only used by sorted_sha1, but we'll need it for origin_metadata soon.
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/113/ for more details.
swh/storage/cassandra/storage.py | ||
---|---|---|
1020 | don't we have some conversion to do on that field? (in metadata_fetcher_get below, you do a json.loads so ¯\_(ツ)_/¯) |
swh/storage/cassandra/storage.py | ||
---|---|---|
1020 | origin metadata are blobs (because we get them from random sources), metadata fetcher metadatas are jsonb (because we write them at SWH) |
Build is green
Patch application report for D2988 (id=10847)
Rebasing onto fe56005555...
Current branch diff-target is up to date.
Changes applied before test
commit 0acfafc733995d858c5dc6e9c9fafffca49a5f52 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Apr 9 11:46:21 2020 +0200 Implement extrinsic origin metadata specification. https://docs.softwareheritage.org/devel/swh-storage/extrinsic-metadata-specification.html
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/116/ for more details.
swh/storage/tests/test_retry.py | ||
---|---|---|
551 | can you please merge the string, that's black doing some joke on us? |
@ardumont After this diff lands as is, the pattern in loader with the extrinsic field, needs to change terminology to authority and fetcher.
(instead of provider and tool, to stay coherent)
This is why I wanted to have a last chat with @zack regarding the alignment of the terminology.
Also @vlorentz, can we identify all the modules that are affected by this modification and terminology?
@ardumont After this diff lands as is, the pattern in leader with extrinsic, needs to change terminology to authority and fetcher.
I did not grasp what you meant ;)
Can you please reformulate?
This is why I wanted to have a last chat with @zack regarding the alignment of the terminology.
sure thing.
Also @vlotentz, can we identify all the modules that are affected by this modification and terminology?
typo @vlorentz ^
Build is green
Patch application report for D2988 (id=10855)
Rebasing onto fe56005555...
Current branch diff-target is up to date.
Changes applied before test
commit 3dc9a2f52938ea20c4fd3b4078f7cb6fd1ac5c59 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Apr 9 11:46:21 2020 +0200 Implement extrinsic origin metadata specification. https://docs.softwareheritage.org/devel/swh-storage/extrinsic-metadata-specification.html
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/117/ for more details.
I have updated my first comment with maybe a more clear and without typos sentence.
Sorry about that.
I'm referring to the loader-tar and the new pattern of including extrinsic metadata to the revision.
Which needs to be coherent in terms of terminology.
So we need to discuss that.
@ardumont After this diff lands as is, the pattern in loader with the extrinsic field, needs to change terminology to authority and fetcher.
(instead of provider and tool, to stay coherent)
Yes, indeed. When this lands and the storage get tagged.
Thus why i mention earlier the loader-core's deposit test will notice it and fail at that point.
That will be our cue to migrate that calls to storage accordingly.
Which makes me realize, @vlorentz don't we need an extra sql statement to migrate the data?
Which makes me realize, @vlorentz don't we need an extra sql statement to migrate the data?
What data?
The ones present in the table you are renaming, dropping columns, etc... ?
(i was talking about the pg storage heh ;)
There's the migration.
And this:
-- migrate metadata_jsonb (a jsonb) to metadata (a bytea) with an external process
is because I don't think there is a postgresql function to do it.
Build is green
Patch application report for D2988 (id=11114)
Rebasing onto b0b767b91c...
First, rewinding head to replay your work on top of it... Applying: Implement extrinsic origin metadata specification. Using index info to reconstruct a base tree... M docs/extrinsic-metadata-specification.rst M swh/storage/cassandra/cql.py M swh/storage/cassandra/schema.py M swh/storage/cassandra/storage.py M swh/storage/db.py M swh/storage/in_memory.py M swh/storage/sql/30-swh-schema.sql M swh/storage/sql/60-swh-indexes.sql M swh/storage/storage.py M swh/storage/tests/conftest.py M swh/storage/tests/storage_data.py M swh/storage/tests/test_retry.py M swh/storage/tests/test_storage.py Falling back to patching base and 3-way merge... Auto-merging swh/storage/tests/test_storage.py Auto-merging swh/storage/tests/test_retry.py CONFLICT (content): Merge conflict in swh/storage/tests/test_retry.py Auto-merging swh/storage/tests/storage_data.py Auto-merging swh/storage/tests/conftest.py Auto-merging swh/storage/storage.py Auto-merging swh/storage/sql/60-swh-indexes.sql Auto-merging swh/storage/sql/30-swh-schema.sql CONFLICT (content): Merge conflict in swh/storage/sql/30-swh-schema.sql Auto-merging swh/storage/in_memory.py Auto-merging swh/storage/db.py Auto-merging swh/storage/cassandra/storage.py Auto-merging swh/storage/cassandra/schema.py CONFLICT (content): Merge conflict in swh/storage/cassandra/schema.py Auto-merging swh/storage/cassandra/cql.py Auto-merging docs/extrinsic-metadata-specification.rst Patch failed at 0001 Implement extrinsic origin metadata specification. Resolve all conflicts manually, mark them as resolved with "git add/rm <conflicted_files>", then run "git rebase --continue". You can instead skip this commit: run "git rebase --skip". To abort and get back to the state before "git rebase", run "git rebase --abort".
Rebase failed (ret=1)!
Could not rebase; Attempt merge onto b0b767b91c...
Already up to date.
Changes applied before test
commit 5b7825b2f6506ece6f6b7fc5b582ea9cbd9f7ad3 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Apr 9 11:46:21 2020 +0200 Implement extrinsic origin metadata specification. https://docs.softwareheritage.org/devel/swh-storage/extrinsic-metadata-specification.html
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/161/ for more details.
convert_to(metadata_jsonb::text, 'utf-8'); should turn the jsonb field metadata_jsonb into a utf-8 encoded bytea version of this string.
Build is green
Patch application report for D2988 (id=11227)
Rebasing onto 6d24ed721a...
Current branch diff-target is up to date.
Changes applied before test
commit ab45de192802e92ec24845ac95c87464475a0b91 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Apr 9 11:46:21 2020 +0200 Implement extrinsic origin metadata specification. https://docs.softwareheritage.org/devel/swh-storage/extrinsic-metadata-specification.html
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/177/ for more details.
For the sake of the migration to amend 149.sql [1]
[1] P675
sql/upgrades/149.sql | ||
---|---|---|
74 ↗ | (On Diff #11227) | This currently fails on staging with latest 148. Trying out olasd's proposition works: update origin_metadata set metadata=convert_to(metadata_jsonb::text, 'utf-8'); |
96 ↗ | (On Diff #11227) | Currently failing with: psql:149.sql:100: ERROR: function name "swh_origin_metadata_get_by_origin" is not unique As indeed, there are 2 functions matching this: swh=> \df swh_origin_metadata_get_by_origin List of functions Schema | Name | Result data type | Argument data types | Type --------+-----------------------------------+---------------------------------+---------------------+------ public | swh_origin_metadata_get_by_origin | SETOF origin_metadata_signature | origin integer | func public | swh_origin_metadata_get_by_origin | SETOF origin_metadata_signature | origin text | func (2 rows) Same for the next one swh_origin_metadata_get_by_provider_type: swh=> \df swh_origin_metadata_get_by_provider_type List of functions Schema | Name | Result data type | Argument data types | Type --------+------------------------------------------+---------------------------------+-------------------------------------+------ public | swh_origin_metadata_get_by_provider_type | SETOF origin_metadata_signature | origin integer, type text | func public | swh_origin_metadata_get_by_provider_type | SETOF origin_metadata_signature | origin_url text, provider_type text | func (2 rows) For the drop instruction to work, we need to provide the function signatures. Assuming we want to drop all functions, that means: drop function swh_origin_metadata_get_by_origin(text); drop function swh_origin_metadata_get_by_provider_type(text, text); drop function swh_origin_metadata_get_by_origin(int); drop function swh_origin_metadata_get_by_provider_type(int, text); Note: If we need to drop all functions as well, the func.sql file needs to be |
Build is green
Patch application report for D2988 (id=11233)
Rebasing onto 6d24ed721a...
Current branch diff-target is up to date.
Changes applied before test
commit eb41db1ee9fc615d674d89c46c35518993293856 Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Apr 9 11:46:21 2020 +0200 Implement extrinsic origin metadata specification. https://docs.softwareheritage.org/devel/swh-storage/extrinsic-metadata-specification.html
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/181/ for more details.
fixup of the rebase: alter index metadata_provider_type_url to change its name, instead of creating metadata_authority_type_url which is a duplicate.
Build is green
Patch application report for D2988 (id=11236)
Rebasing onto 6d24ed721a...
Current branch diff-target is up to date.
Changes applied before test
commit e645e637095152070399f927f02b927d70e1d11c Author: Valentin Lorentz <vlorentz@softwareheritage.org> Date: Thu Apr 9 11:46:21 2020 +0200 Implement extrinsic origin metadata specification. https://docs.softwareheritage.org/devel/swh-storage/extrinsic-metadata-specification.html
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/183/ for more details.