Extend new deposit endpoint to support metadata-only deposits
Closed, Resolved (Public)

Description

Extend same deposit endpoint

Depositing metadata is SWORD compliant (see http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html#protocoloperations_creatingresource_entry).

It is required to use the same endpoint as the regular deposit.

Although, it might be challenging to do so, since the metadata deposit requires a SWHID and the process that follows is quite different from the deposit.

Event Timeline

moranegg triaged this task as Normal priority. Aug 26 2020, 5:22 PM
moranegg created this task.

Although, it might be challenging to do so, since the metadata deposit requires a SWHID and the process that follows is quite different from the deposit.

Maybe not so much in the end, we could use the slug ("suggested identifier").

Quoting the linked page:

The client MAY supply a Slug header providing a suggested identifier for the deposited content

If that slug is a SWHID (and it exists), then we infer that the deposit client wants a metadata-only deposit and proceed accordingly.
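
A rough sketch of that inference, for illustration only. The helper name `is_metadata_only` is hypothetical, and the "it exists" check against the archive is deliberately left out; this only tests whether the Slug value has the shape of a core SWHID.

```python
import re

# Core SWHID shape: "swh:1:<type>:<40 lowercase hex digits>", no qualifiers.
CORE_SWHID = re.compile(r"swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}")


def is_metadata_only(slug):
    """Return True when the Slug header value looks like a core SWHID.

    Hypothetical helper: it does not check that the object exists
    in the archive.
    """
    return slug is not None and CORE_SWHID.fullmatch(slug) is not None
```

Any other slug (or no slug at all) would fall through to the regular deposit path.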

After a talk with Bruno and Yannick on HAL, they say that depositing metadata corresponds to "6.5.2. Replacing the Metadata of a Resource",
because the resource already exists on SWH, so this should be a PUT of new (maybe new from scratch) metadata on an existing identifier (SWHID).

That's true if you consider the graph sub-DAG to be the Resource. I assumed the Resource to be the deposit itself; but re-reading the SWORDv2 spec, it's not obvious which one is the right interpretation.

zack renamed this task from "Extend software deposit endpoint to enable only metadata deposits" to "Extend software deposit endpoint to support metadata-only deposits". Sep 1 2020, 6:49 PM
moranegg renamed this task from "Extend software deposit endpoint to support metadata-only deposits" to "Extend new deposit endpoint to support metadata-only deposits". Sep 3 2020, 3:56 PM

After this morning's meeting with @vlorentz and @ardumont:
We will keep the metadata-only deposit specs with the idea of a separate namespace swh for which we need to write the schema (not sure we have that).

This way, the XML with the metadata has a section where the identified artifact is mentioned:

Reference a snapshot, revision or release:

With ${type} in {snp (snapshot), rev (revision), rel (release) }:
<swh:deposit>
  <swh:reference>
    <swh:object id="swh:1:${type}:aaaaaaaaaaaaaa..."/>
  </swh:reference>
</swh:deposit>

We need to add directory and content to the list of types.

The possibility to deposit metadata on an origin should be implemented as well, but it is not suited for institutional repositories (e.g. HAL).
Reference an origin:

<swh:deposit>
  <swh:reference>
    <swh:origin url="https://github.com/user/repo"/>
  </swh:reference>
</swh:deposit>
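
For illustration, here is a minimal sketch of how a server could pull the reference out of such a fragment with the Python standard library. The function name `extract_reference` is hypothetical, and the namespace URI is an assumption (it is the schema link discussed later in this task); the `id` and `url` attribute names follow the two examples above.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI taken from the schema link discussed in this task.
SWH_NS = "https://www.softwareheritage.org/schema/2018/deposit"
NS = {"swh": SWH_NS}


def extract_reference(xml_text):
    """Return ("object", swhid) or ("origin", url), or None if no reference."""
    root = ET.fromstring(xml_text)
    # The swh:deposit element may be the root or nested in an Atom entry.
    if root.tag == "{%s}deposit" % SWH_NS:
        deposit = root
    else:
        deposit = root.find(".//swh:deposit", NS)
    if deposit is None:
        return None
    obj = deposit.find("swh:reference/swh:object", NS)
    if obj is not None:
        return ("object", obj.get("id"))
    origin = deposit.find("swh:reference/swh:origin", NS)
    if origin is not None:
        return ("origin", origin.get("url"))
    return None


example = """<swh:deposit xmlns:swh="https://www.softwareheritage.org/schema/2018/deposit">
  <swh:reference>
    <swh:origin url="https://github.com/user/repo"/>
  </swh:reference>
</swh:deposit>"""
print(extract_reference(example))
```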

This spec fits the POST of a new deposit in SWORD, as described in the SWORD v2 documentation (6.3.3. Creating a Resource with an Atom Entry).

The sequence here is:

  1. The client sends (POST) new metadata (without content)
  2. In the XML, a reference to the object or origin MUST be provided in a deposit tag
  3. The status of the deposit is verified
  4. The metadata is injected into the MetaData Storage in its raw form
  5. The status of the deposit is done
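
The sequence above could be sketched as follows. This is only an illustration under stated assumptions: `handle_metadata_deposit` is a hypothetical name, a plain list stands in for the metadata storage, and the `"rejected"` status for a missing reference is an assumption, not something decided in this task.

```python
def handle_metadata_deposit(raw_xml, reference, metadata_storage):
    """Walk a metadata-only deposit through the steps listed above.

    raw_xml: bytes posted by the client (step 1).
    reference: the object or origin reference found in the deposit tag,
        or None when it is absent (step 2).
    metadata_storage: a plain list standing in for the metadata storage.
    """
    if reference is None:
        # Step 2 is a MUST: without a reference nothing is stored.
        return "rejected"  # hypothetical status name
    status = "verified"  # step 3
    # Step 4: the metadata is kept in its raw form, not parsed or rewritten.
    metadata_storage.append((reference, raw_xml))
    status = "done"  # step 5
    return status
```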

@vlorentz some comments on: https://docs.softwareheritage.org/devel/swh-deposit/specs/spec-meta-deposit.html

  1. The example uses HAL referencing an origin, which is a scenario that does not exist.

Can you change the example to:

<swh:deposit>
  <swh:reference>
    <swh:object swhid="swh:1:dir:31b5c8cc985d190b5a7ef4878128ebfdc2358f49;
                       origin=https://hal.archives-ouvertes.fr/hal-01243573;
                       visit=swh:1:snp:4fc1e36fca86b2070204bedd51106014a614f321;
                       anchor=swh:1:rev:9c5de20cfb54682370a398fcc733e829903c8cba;
                       path=/moranegg-AffectationRO-df7f68b/"/>
  </swh:reference>
</swh:deposit>
  2. Add this in requirements:
     1. The metadata-only deposit is sent via the SWORD protocol with a POST request to a client's collection, the same as a classic deposit (see https://docs.softwareheritage.org/devel/swh-deposit/spec-api.html#create-deposit)
     2. The metadata-only deposit is composed of ONLY one XML file containing all metadata
     3. It MUST comply with the metadata requirements in https://docs.softwareheritage.org/devel/swh-deposit/metadata.html
     4. It MUST reference an **object** or an **origin** in a deposit tag
     5. The reference SHOULD exist in the SWH archive (to verify with upper management)
     6. The **object** reference MUST be a SWHID of one of the following artifact types:
        - snapshot
        - release
        - revision
        - directory
        - content
     7. The SWHID MAY be simple (core SWHID)
     8. The SWHID MAY be complex with context (adding classifiers as documented here https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html)
     9. The SWHID MUST NOT reference a fragment of code with the classifier `lines`
  3. Should we change the link https://www.softwareheritage.org/schema/2018/deposit? It is just a redirection at the moment, but 2018 might not be a good thing to have in a namespace URL.
  4. I have some diagrams in mermaid that I can transform to PlantUML and that we can add to the use cases section.
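
The requirements on the **object** reference proposed above (allowed artifact types, optional context, no `lines` classifier) could be checked roughly like this. This is a sketch only: `check_object_reference` is a hypothetical helper, it does not validate the hash part, and it does not query the archive. The code uses the word "qualifier", which is the term the SWHID documentation uses for what is called a classifier above.

```python
# Type tags for snapshot, release, revision, directory and content.
ALLOWED_TYPES = {"snp", "rel", "rev", "dir", "cnt"}


def check_object_reference(swhid):
    """Return a list of problems; an empty list means the SWHID passes."""
    problems = []
    # Qualifiers, if any, follow the core SWHID and are separated by ";".
    core, *qualifiers = swhid.split(";")
    parts = core.split(":")
    if len(parts) != 4 or parts[0] != "swh" or parts[1] != "1":
        return ["not a SWHID"]
    if parts[2] not in ALLOWED_TYPES:
        problems.append("unsupported object type: " + parts[2])
    for qualifier in qualifiers:
        key = qualifier.split("=", 1)[0]
        if key == "lines":
            problems.append("the lines qualifier is not allowed")
    return problems
```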

Also add an example for a core SWHID:

<swh:deposit>
  <swh:reference>
    <swh:object swhid="swh:1:dir:31b5c8cc985d190b5a7ef4878128ebfdc2358f49"/>
  </swh:reference>
</swh:deposit>
  • In the case of a contextual SWHID, it is important to check for whitespace inside the SWHID before resolving it against the archive
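
One possible way to handle that, as a sketch: strip all whitespace from the attribute value before any further processing. `normalize_swhid` is a hypothetical helper, and it assumes that no qualifier value legitimately contains literal whitespace (such characters would have to be percent-encoded).

```python
def normalize_swhid(raw):
    """Drop every whitespace character (spaces, tabs, newlines) from raw."""
    # str.split() with no argument splits on any run of whitespace,
    # so joining the pieces removes indentation and line breaks alike.
    return "".join(raw.split())
```

Applied to a contextual SWHID spread over several indented lines, as in the example earlier in this comment, this yields a single-line identifier.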

The last suggested adaptations are in D4403 and D4405.

ardumont changed the task status from Open to Work in Progress. Nov 17 2020, 12:08 PM

Is this related to T1021?

I'd say yes, I added that task as a parent task.

This really begins to look like a subway map :-)