Details
- Reviewers
- None
- Group Reviewers
Reviewers - Maniphest Tasks
- T2681: Add validation step after metadata ingestion of a metadata-only deposit
tox
Diff Detail
- Repository
- rDDEP Push deposit
- Branch
- add-validation-step
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 16312 Build 25112: Phabricator diff pipeline on jenkins Jenkins console · Jenkins Build 25111: arc lint + arc unit
Event Timeline
Build is green
Patch application report for D4255 (id=15029)
Could not rebase; Attempt merge onto a8e6b830bb...
Updating a8e6b830..acf202c5 Fast-forward swh/deposit/api/deposit_update.py | 24 +++++ swh/deposit/cli/admin.py | 65 ++++++------ swh/deposit/tests/cli/conftest.py | 12 +++ swh/deposit/tests/cli/test_admin.py | 189 +++++++++++++++++++++++++++++++++++ swh/deposit/tests/cli/test_client.py | 8 +- 5 files changed, 263 insertions(+), 35 deletions(-) create mode 100644 swh/deposit/tests/cli/conftest.py create mode 100644 swh/deposit/tests/cli/test_admin.py
Changes applied before test
commit acf202c5bb810f5700f1a317b72b8d2734f47c49 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Oct 14 11:44:08 2020 +0200 Add validation step after metadata ingestion of a metadata-only deposit Related to T2681 commit f17263e2ad2ac5f52fd4f2eb0deac58e9b9dc374 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Tue Oct 13 14:59:50 2020 +0200 deposit.cli.admin: Add types and coverage on module This will allow to be more confident when we will refactor the configuration part.
See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/234/ for more details.
Build is green
Patch application report for D4255 (id=15040)
Could not rebase; Attempt merge onto 2f5300e60a...
Updating 2f5300e6..06f43725 Fast-forward swh/deposit/api/deposit_update.py | 24 +++++ swh/deposit/cli/admin.py | 65 ++++++------ swh/deposit/tests/cli/conftest.py | 12 +++ swh/deposit/tests/cli/test_admin.py | 189 +++++++++++++++++++++++++++++++++++ swh/deposit/tests/cli/test_client.py | 8 +- 5 files changed, 263 insertions(+), 35 deletions(-) create mode 100644 swh/deposit/tests/cli/conftest.py create mode 100644 swh/deposit/tests/cli/test_admin.py
Changes applied before test
commit 06f437255a222dbd42d788f6f143366ce1b41979 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Oct 14 11:44:08 2020 +0200 Add validation step after metadata ingestion of a metadata-only deposit Related to T2681 commit 51446a11df5cbd6608b282135ea5f2ff30791397 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Tue Oct 13 14:59:50 2020 +0200 deposit.cli.admin: Add types and coverage on module This will allow to be more confident when we will refactor the configuration part.
See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/238/ for more details.
Build is green
Patch application report for D4255 (id=15051)
Could not rebase; Attempt merge onto 51446a11df...
Updating 51446a11..92634764 Fast-forward swh/deposit/api/deposit_update.py | 24 +++++++ swh/deposit/cli/admin.py | 15 +++-- swh/deposit/tests/cli/test_admin.py | 128 ++++++++++++++++++++++++++++++++++++ 3 files changed, 160 insertions(+), 7 deletions(-)
Changes applied before test
commit 92634764f00a0c561523f4fed8073cd9132a5c81 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Oct 14 11:44:08 2020 +0200 Add validation step after metadata ingestion of a metadata-only deposit Related to T2681 commit 4bb8a36d7a65ce65cf7302fd7a6b7b137fe7531e Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Oct 14 12:40:03 2020 +0200 deposit.cli.admin: Add remaining coverage on `swh deposit reschedule`
See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/247/ for more details.
I don't see why we need this validation. The same function already called raw_extrinsic_metadata_add. If _add is unreliable (which it is not), then we have a lot of other "validations" to do everywhere else.
Additionally, the check may fail for the wrong reasons as there is no guarantee that an added object is immediately gettable, for example:
- eventually consistent DBs, such as Cassandra
- buffer proxy in the middle
- a "multiplexing" proxy I'm considering to solve T2602
swh/deposit/api/deposit_update.py | ||
---|---|---|
291–292 | why this assumption? It has the potential to cause bugs, so just use swh.core.api.classes.stream_results. It fetches new pages lazily, so if extra pages aren't needed, it won't fetch them. |
Yes, i was dubious to the utility of it...
But yesterday, I was in no mood to argue (I could not find relevant counter
arguments to it either).
So thanks for those \m/
swh/deposit/api/deposit_update.py | ||
---|---|---|
291–292 | I think that since we start at "around" now (after=date_now), the But it's mostly because I thought of calling stream_results but could not I would not mind a sample if you have that in store ;) |
swh/deposit/api/deposit_update.py | ||
---|---|---|
291–292 | it's not 0 if for some reason HAL sent us a burst of updates at the same time (eg. a user hammering an update button in their UI) results: Iterable[RawExtrinsicMetadata] = stream_results( self.storage_metadata.raw_extrinsic_metadata_get,MetadataTargetType.DIRECTORY, deposit_swhid, metadata_authority, after=date_now, ) |
swh/deposit/api/deposit_update.py | ||
---|---|---|
291–292 | i did not say 0, i said tends towards 0. Even with a user bursting updates, i'm not convinced we'll hit the 10k limits (default limit to the get call). Thanks for stream_results snippet .oO(d'oh) |
Build is green
Patch application report for D4255 (id=15086)
Rebasing onto 4bb8a36d7a...
Current branch diff-target is up to date.
Changes applied before test
commit 237e75c6e5b05c8bb38cec3fe94f28d3f40ec112 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Oct 14 11:44:08 2020 +0200 Add validation step after metadata ingestion of a metadata-only deposit Related to T2681
See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/263/ for more details.
Build is green
Patch application report for D4255 (id=15100)
Could not rebase; Attempt merge onto 6f9b0d99ee...
Updating 6f9b0d9..f20b1bc Fast-forward requirements-swh.txt | 2 +- swh/icinga_plugins/deposit.py | 57 ++++++- swh/icinga_plugins/tests/test_deposit.py | 274 ++++++++++++++++++++++++------- 3 files changed, 270 insertions(+), 63 deletions(-)
Changes applied before test
commit f20b1bc8ce07668d081c2a735064af6bf59074b6 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Tue Oct 13 18:23:22 2020 +0200 Add deposit metadata update step in deposit icinga check This extends the current icinga checks on deposit to trigger a metadata update when the main deposit is done. It retrieves the swhid from the deposit status calls. Then triggers a metadata update on the deposit using the latest deposit client abilities (0.3). Related to T2685 commit 5d43396856cf15f7989d5ce4e2ea8d289e9c05aa Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Thu Oct 15 14:50:52 2020 +0200 test_deposit: Clarify unexpected output in case of assertion failures commit 2c326c4a0a8b2f77bc7f73332301f8352b81b2c9 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Thu Oct 15 14:44:42 2020 +0200 test_deposit: Introduce status_template helper function To ease incoming scenario updates (notably the deposit update metadata). commit be18e1f9dd4ac7220d573443492d66a57b4981b9 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Thu Oct 15 14:42:18 2020 +0200 test_deposit: Use f-string instead of concatenate strings
See https://jenkins.softwareheritage.org/job/DICP/job/tests-on-diff/14/ for more details.
Build is green
Patch application report for D4255 (id=15102)
Rebasing onto 7e05dbe497...
Current branch diff-target is up to date.
Changes applied before test
commit d8f1d946656e3bc896a26406ca85f06de4c756db Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Oct 14 11:44:08 2020 +0200 Add validation step after metadata ingestion of a metadata-only deposit Related to T2681
See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/264/ for more details.
Thanks @ardumont for this diff.
Thanks @vlorentz for the comment.
I asked for that task because with the content deposit we know that an ingested deposit returns a SWHID, the SWHID is the certificate that the ingestion has succeeded.
With metadata, for now, we don't have this kind of certificate and we assume that it worked if it didn't return an error (which is in my opinion not a sufficient guarantee).
For the meantime, maybe returning something with the add function that can be similar to a certificate, for example the primary key, can play this role.
If metadata will be hashed and an identifier could be returned as a certificate this can solve my issue, maybe this diff and task needs to wait for that implementation.
swh/deposit/api/deposit_update.py | ||
---|---|---|
295 | @vlorentz I saw somewhere : If yes, this means that the format inserted to the ERMDS on the deposit side should match the chosen format. |
The SWHID is computer by the loader. If we had a storage that silently dropped all objects, the SWHID would still be computed.
What you are saying is that the SWHID is no guarantee, so we shouldn't create guarantee for a metadata insert.
So what I would prefer is a better guarantee.
We can keep this diff or drop this diff.
In any case, seems we need to open a task about better insertion guarantees, before changing status to done.