Page MenuHomeSoftware Heritage

Remove metadata merging; use only the latest document
ClosedPublic

Authored by vlorentz on Feb 21 2022, 4:38 PM.

Details

Summary

We don't use that feature at all as far as I am aware.

I also find that it complicates any metadata handling (especially the validation
I would like to add in the near future, T3965), and probably does not match semantics
intended by SWORD (merging occurs on PUT requests, as we don't implement PATCH)

Diff Detail

Repository
rDDEP Push deposit
Branch
validate-metadata
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 27011
Build 42235: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 42234: arc lint + arc unit

Event Timeline

vlorentz retitled this revision from Remove metadata merging to Remove metadata merging; use only the latest document.Feb 21 2022, 4:38 PM

Build is green

Patch application report for D7211 (id=26138)

Rebasing onto 40adc8c23b...

Current branch diff-target is up to date.
Changes applied before test
commit 975a2aaba28591f050832237945a952c50ecf569
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Feb 21 16:37:49 2022 +0100

    Remove metadata merging
    
    We don't use that feature at all as far as I am aware.
    
    I also find that it complicates any metadata handling (especially the validation
    I would like to add in the near future), and probably does not match semantics
    intended by SWORD (merging occurs on PUT requests, as we don't implement PATCH)

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/711/ for more details.

Build is green

Patch application report for D7211 (id=26140)

Rebasing onto 40adc8c23b...

Current branch diff-target is up to date.
Changes applied before test
commit a25afe62bd1ea0452ef2e395575d1ec64382f6b4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Feb 21 16:37:49 2022 +0100

    Remove metadata merging; use only the latest document
    
    We don't use that feature at all as far as I am aware.
    
    I also find that it complicates any metadata handling (especially the validation
    I would like to add in the near future), and probably does not match semantics
    intended by SWORD (merging occurs on PUT requests, as we don't implement PATCH)

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/713/ for more details.

We don't use that feature at all as far as I am aware.

We actually don't know whether that's used or not (don't know how to check).

Given how I interpreted the sword v2 protocol back in the day, I kinda assumed that
there was no order in the deposit (regarding the atom received). The merge is coming
from there to simplify the checks (and insidiously, that implemented an unintended "most
recent" order)...

AFAIK, we never explicited that the last metadata (atom) deposited was the one that will
be used. So as long as we now document this behavior (last metadata atom entry received
is the one selected for the deposit ingestion), I'm fine with this change.

I also find that it complicates any metadata handling...

It sure does.

ardumont added a subscriber: moranegg.

lgtm, request changes so we update the deposit documentation along to notify about that behavior.

@moranegg heads up about this.

This revision now requires changes to proceed.Feb 21 2022, 5:03 PM

We don't use that feature at all as far as I am aware.

We actually don't know whether that's used or not (don't know how to check).

Analysis in P1296. Except in one old deposit, preserving old requests just keeps the edit history of HAL's users, which I don't think is very useful. (and usually it's incorrect data)

Given how I interpreted the sword v2 protocol back in the day, I kinda assumed that
there was no order in the deposit (regarding the atom received). The merge is coming
from there to simplify the checks (and insidiously, that implemented an unintended "most
recent" order)...

Relevant spec: http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html#protocoloperations_editingcontent_metadata

it does say "replace", there's nothing about merging.

AFAIK, we never explicited that the last metadata (atom) deposited was the one that will
be used. So as long as we now document this behavior (last metadata atom entry received
is the one selected for the deposit ingestion), I'm fine with this change.

lgtm, request changes so we update the deposit documentation along to notify about that behavior.

@moranegg heads up about this.

As far as I can tell, the current behavior was not documented either.

Given how I interpreted the sword v2 protocol back in the day, I kinda assumed that
there was no order in the deposit (regarding the atom received). The merge is coming
from there to simplify the checks (and insidiously, that implemented an unintended "most
recent" order)...

Relevant spec: http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html#protocoloperations_editingcontent_metadata
it does say "replace", there's nothing about merging.

Deposit got in production prior to all sword endpoints being implemented ^. Hence the
merge choice which is not done through those endpoints by the way, it's done by the
checkers and the loader iirc.

AFAIK, we never explicited that the last metadata (atom) deposited was the one that will
be used. So as long as we now document this behavior (last metadata atom entry received
is the one selected for the deposit ingestion), I'm fine with this change.

lgtm, request changes so we update the deposit documentation along to notify about that behavior.

@moranegg heads up about this.

As far as I can tell, the current behavior was not documented either.

yes, that's what i said in my previous comment (quoted again here). Let's not make that
mistake again and document it now.

Build is green

Patch application report for D7211 (id=26155)

Rebasing onto 770cc0f515...

Current branch diff-target is up to date.
Changes applied before test
commit 7727a9c0c8b8d40a8914ae74760a3bf7d35f255b
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Mon Feb 21 16:37:49 2022 +0100

    Remove metadata merging; use only the latest document
    
    We don't use that feature at all as far as I am aware.
    
    I also find that it complicates any metadata handling (especially the validation
    I would like to add in the near future), and probably does not match semantics
    intended by SWORD (merging occurs on PUT requests, as we don't implement PATCH)

See https://jenkins.softwareheritage.org/job/DDEP/job/tests-on-diff/721/ for more details.

lgtm, request changes so we update the deposit documentation along to notify about that behavior.

@moranegg heads up about this.

As far as I can tell, the current behavior was not documented either.

yes, that's what i said in my previous comment (quoted again here). Let's not make that
mistake again and document it now.

I disagree, this purely follows semantics described by SWORD, and there is no reason to re-document this particular part of SWORD.

I disagree, this purely follows semantics described by SWORD, and there is no reason
to re-document this particular part of SWORD.

Which we are absolutely all fluent in [1]...

anyway, ok then ¯\_(ツ)_/¯

[1] not me

This revision is now accepted and ready to land.Feb 22 2022, 2:01 PM

well, neither am I. But I don't think repeating this very specific point improves the overall understanding of the spec