Create artifact release when 'releaseNotes' is in metadata
Open, Normal, Public

Description

When depositing in SWH, several software artifacts are created to ingest the content and metadata of the deposit:
origin -> snapshot -> HEAD branch -> revision with metadata -> root directory -> content
At present, there is no notion of release, but the deposits from HAL are in fact releases.

On the other hand, not all deposits need a release.
This is why only deposits with releaseNotes and softwareVersion are concerned, and the release notes entry should become the message of the release.

Here is the mapping table between terms and SWH release fields:

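|-----------------+----------------+-------------------+----------------------------------|
| Metadata term   | Fallback value | SWH release field | Description                      |
|-----------------+----------------+-------------------+----------------------------------|
| X               | X              | target            | revision containing all metadata |
| X               | X              | target_type       | hard-coded: 'revision'           |
| softwareVersion | X              | name              | release name if any              |
| releaseNotes    | X              | message           | release message if any           |
| datePublished   | deposit_date   | date              | publication date                 |
|-----------------+----------------+-------------------+----------------------------------|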
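
The mapping above could be sketched roughly as follows. This is only an illustration under stated assumptions: `build_release` is a hypothetical helper (not part of any SWH package), the metadata is assumed to arrive as a flat dict keyed by CodeMeta terms, the `releaseNotes` spelling follows the task title, and ids and dates are kept as plain strings.

```python
from typing import Optional


def build_release(metadata: dict, revision_id: str, deposit_date: str) -> Optional[dict]:
    """Return the release fields for a deposit, or None if no release is needed.

    Hypothetical helper: plain dicts and strings stand in for real SWH objects.
    """
    # Only deposits carrying both terms are treated as releases.
    if "softwareVersion" not in metadata or "releaseNotes" not in metadata:
        return None
    return {
        "target": revision_id,                 # revision containing all metadata
        "target_type": "revision",             # hard-coded
        "name": metadata["softwareVersion"],   # release name
        "message": metadata["releaseNotes"],   # release message
        # Fall back to the deposit date when datePublished is absent.
        "date": metadata.get("datePublished", deposit_date),
    }
```

A deposit without `softwareVersion` or `releaseNotes` yields `None` here, which matches the point that not all deposits need a release.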

One use case for releases is the use of Wikidata identifiers on HAL deposits.
When pointing to SWH from Wikidata, we can only use the swh:1:rel:<id> property.
For T1752, we wish to have links from software entities in Wikidata to software releases archived in SWH.

Event Timeline

moranegg triaged this task as Normal priority. May 24 2019, 3:33 PM
moranegg created this task.
moranegg added a subscriber: rdicosmo.
moranegg renamed this task from Create artifact release when 'releaseNote' is in metadata to Create artifact release when 'releaseNotes' is in metadata. May 24 2019, 4:41 PM
olasd added a subscriber: olasd. Jun 18 2019, 5:09 PM

(I'll pass on the underlying limitation of being forced to link to a release object from wikidata, which feels a bit artificial but is out of scope for this task)

(in the current data model) a release has the following fields:

  1. target / target_type (mandatory) pointing at the contents of the release
  2. release name (mandatory) usually a version number, or in the case of git the name of the tag
  3. release message (not mandatory but surely nice to have) probably the releaseNote
  4. author and date (not mandatory but again probably nice to have) contain the author of the release and the date it was made
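
As a rough illustration of the field list above, the mandatory/optional split could be sketched like this. The `Release` class below is hypothetical and is not the actual SWH model class; all fields are simplified to strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Release:
    """Hypothetical stand-in for a release object, with string-typed fields."""

    target: str                    # mandatory: id of the targeted object
    target_type: str               # mandatory: kind of the targeted object
    name: str                      # mandatory: version number or tag name
    message: Optional[str] = None  # optional: release message
    author: Optional[str] = None   # optional: author of the release
    date: Optional[str] = None     # optional: date the release was made

    def __post_init__(self) -> None:
        # Reject empty values for the three mandatory fields.
        for field_name in ("target", "target_type", "name"):
            if not getattr(self, field_name):
                raise ValueError(f"{field_name} is mandatory")
```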

The contents of target/target type are clear to me (it's the revision object generated for the deposit).
The name is not clear (is there a field in the deposit metadata we should use? should we hardcode something?). I don't know whether we should (or can) generate an author and a date.

Then there's the question of what the snapshot should look like. Should the HEAD still point at the revision? At the release? If the HEAD points at the revision, how do we point to the release object (which branch name)?

Once these design questions are cleared the implementation should be much easier to delegate.

moranegg added a comment. Edited Jun 19 2019, 11:54 AM

From your comments, we can specify the following:
A deposit containing release metadata (softwareVersion, releaseNote) is a release software artifact.
The metadata that arrives with a deposit MUST use CodeMeta vocabulary (this should be reviewed following T1345).

In T1755#33805, @olasd wrote:
  1. target / target_type (mandatory) pointing at the contents of the release

target will be the revision created containing all metadata

  2. release name (mandatory) usually a version number, or in the case of git the name of the tag

use metadata term softwareVersion

  3. release message (not mandatory but surely nice to have) probably the releaseNote

use metadata term releaseNote

  4. author and date (not mandatory but again probably nice to have) contain the author of the release and the date it was made

use datePublished for date if it exists in the metadata; if not, use deposit_date
there is also dateCreated, which should be (and is) used for author_date in the revision

The name is not clear (is there a field in the deposit metadata we should use? should we hardcode something?). I don't know whether we should (or can) generate an author and a date.

I believe the comments above answer that.

Then there's the question of what the snapshot should look like. Should the HEAD still point at the revision? At the release? If the HEAD points at the revision, how do we point to the release object (which branch name)?

This is a tricky question, not sure if my solution works, but we could have:
HEAD -> revision x -> directory y
[branch] release v1 -> revision x -> directory y
result: the snapshot contains 2 occurrences, HEAD and v1
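
The layout proposed above could be sketched as a branch mapping like the one below. This is only an illustration: `build_snapshot_branches` is a hypothetical helper, the ids are plain strings, and the dict shape is an assumption rather than the actual SWH snapshot format.

```python
def build_snapshot_branches(revision_id: str, release_id: str, release_name: str) -> dict:
    """Return a branch-name -> target mapping with HEAD plus one release branch.

    Hypothetical sketch: HEAD keeps pointing at the revision, and a second
    branch named after the release points at the release object, which in
    turn targets the same revision.
    """
    return {
        "HEAD": {"target_type": "revision", "target": revision_id},
        release_name: {"target_type": "release", "target": release_id},
    }
```

With a release named `v1`, the resulting mapping has the two entries `HEAD` and `v1`, as in the proposal.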

Once these design questions are cleared the implementation should be much easier to delegate.

I will try a first draft and we will see where it takes us :-p

Summarizing the last exchange:

|-----------------------------------+----------------+-------------------+----------------------------------|
| metadata provided (codeMeta term) | Fallback value | SWH release field | Description                      |
|-----------------------------------+----------------+-------------------+----------------------------------|
| X                                 | X              | target            | revision containing all metadata |
| X                                 | X              | target_type       | hard-coded: 'revision'           |
| softwareVersion                   | X              | name              | Release name if any              |
| releaseNote                       | X              | message           | Release message if any           |
| datePublished (if any)            | `deposit_date` | date              | Publication date                 |
|-----------------------------------+----------------+-------------------+----------------------------------|
moranegg updated the task description. Jun 19 2019, 1:44 PM
moranegg moved this task from Backlog to In progress on the SWORD deposit board.

At the moment, the loader-tar is not adapted to receive release objects.
@ardumont, should we wait for the refactoring of the loader-tar to successfully implement this feature for deposits?

@ardumont, should we wait for the refactoring of the loader-tar to successfully implement this feature for deposits?

It depends on that task's priority.
If it's urgent, you could adapt the current tar loader implementation to deal with releases.
(That won't change the behavior of anything else, since only the deposit uses it.)