Create artifact release when 'releaseNotes' is in metadata
Closed, Migrated, Edits Locked

Description

When depositing in SWH, different software artifacts are created to ingest the content and metadata of the deposit:
origin -> snapshot -> HEAD branch -> revision with metadata -> root directory -> content
At present, there is no notion of release, but the deposits from HAL are in fact releases.

On the other hand, not all deposits need a release.
This is why only deposits that provide releaseNotes and softwareVersion are concerned; the releaseNotes value should become the message of the release.

Here is the mapping table between terms and SWH release fields:

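|-----------------+----------------+-------------------+----------------------------------|
| Metadata term   | Fallback value | SWH release field | Description                      |
|-----------------+----------------+-------------------+----------------------------------|
| X               | X              | target            | revision containing all metadata |
| X               | X              | target_type       | hard-coded: 'revision'           |
| softwareVersion | X              | name              | release name if any              |
| releaseNotes    | X              | message           | release message if any           |
| datePublished   | deposit_date   | date              | publication date                 |
|-----------------+----------------+-------------------+----------------------------------|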

One use case for releases involves Wikidata identifiers on HAL deposits.
When pointing to SWH from Wikidata, we can only use the swh:1:rel:<id> property.
For T1752, we want links from software entities in Wikidata to software releases archived in SWH.

Event Timeline

moranegg triaged this task as Normal priority. May 24 2019, 3:33 PM
moranegg created this task.
moranegg added a subscriber: rdicosmo.
moranegg renamed this task from Create artifact release when 'releaseNote' is in metadata to Create artifact release when 'releaseNotes' is in metadata. May 24 2019, 4:41 PM

(I'll pass on the underlying limitation of being forced to link to a release object from Wikidata, which feels a bit artificial but is out of scope for this task.)

In the current data model, a release has the following fields:

  1. target / target_type (mandatory) pointing at the contents of the release
  2. release name (mandatory) usually a version number, or in the case of git the name of the tag
  3. release message (not mandatory but surely nice to have) probably the releaseNote
  4. author and date (not mandatory but again probably nice to have) contain the author of the release and the date it was made

The contents of target/target type are clear to me (it's the revision object generated for the deposit).
The name is not clear (is there a field in the deposit metadata we should use? should we hardcode something?). I don't know whether we should (or can) generate an author and a date.

Then there's the question of what the snapshot should look like. Should the HEAD still point at the revision? At the release? If the HEAD points at the revision, how do we point to the release object (which branch name)?

Once these design questions are cleared the implementation should be much easier to delegate.

From your comments, we can specify the following:
A deposit containing release metadata (softwareVersion, releaseNote) is a release software artifact.
The metadata that arrives with a deposit MUST use CodeMeta vocabulary (this should be reviewed following T1345).

In T1755#33805, @olasd wrote:
  1. target / target_type (mandatory) pointing at the contents of the release

target will be the revision created containing all metadata

  2. release name (mandatory) usually a version number, or in the case of git the name of the tag

use metadata term softwareVersion

  3. release message (not mandatory but surely nice to have) probably the releaseNote

use metadata term releaseNote

  4. author and date (not mandatory but again probably nice to have) contain the author of the release and the date it was made

use datePublished for date if it exists in the metadata; if not, use deposit_date
there is also dateCreated, which should be (and is) used for author_date in the revision

The name is not clear (is there a field in the deposit metadata we should use? should we hardcode something?). I don't know whether we should (or can) generate an author and a date.

I believe the comments above answer that.

Then there's the question of what the snapshot should look like. Should the HEAD still point at the revision? At the release? If the HEAD points at the revision, how do we point to the release object (which branch name)?

This is a tricky question, and I am not sure my solution works, but we could have:
HEAD -> revision x -> directory y
[branch] release v1 -> revision x -> directory y
result: the snapshot contains 2 occurrences, HEAD and v1
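
To make the proposed layout concrete, here is a minimal sketch using plain dictionaries. The function name, the dictionary shape and the identifiers are hypothetical and only illustrate the proposal; this is not the SWH data model API.

```python
def build_snapshot_branches(revision_id: str, release_id: str, branch_name: str) -> dict:
    """Sketch of the proposed snapshot: HEAD plus one branch for the release.

    All names and the dict layout are illustrative assumptions.
    """
    return {
        # HEAD keeps pointing at the revision, as it does today.
        "HEAD": {"target_type": "revision", "target": revision_id},
        # The extra branch points at the release, which itself targets the revision.
        branch_name: {"target_type": "release", "target": release_id},
    }


branches = build_snapshot_branches("revision-x", "release-v1", "v1")
print(sorted(branches))
```

Under this layout the revision stays reachable from HEAD, and the release is reachable from a branch named after the version.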

Once these design questions are cleared the implementation should be much easier to delegate.

I will try a first draft and we will see where it takes us :-p

Summarizing the last exchange:

|-----------------------------------+----------------+-------------------+----------------------------------|
| metadata provided (codeMeta term) | Fallback value | SWH release field | Description                      |
|-----------------------------------+----------------+-------------------+----------------------------------|
| X                                 | X              | target            | revision containing all metadata |
| X                                 | X              | target_type       | hard-coded: 'revision'           |
| softwareVersion                   | X              | name              | Release name if any              |
| releaseNote                       | X              | message           | Release message if any           |
| datePublished (if any)            | `deposit_date` | date              | Publication date                 |
|-----------------------------------+----------------+-------------------+----------------------------------|
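
The mapping in the table can be sketched as a small function. This is only an illustration under stated assumptions: the function name and the plain-dict output are hypothetical, the release-notes key is spelled releaseNotes as in the task title (the thread also writes releaseNote), and a deposit is treated as a release only when both softwareVersion and the release notes are present.

```python
from datetime import datetime, timezone
from typing import Optional


def build_release(metadata: dict, revision_id: str, deposit_date: datetime) -> Optional[dict]:
    """Map deposit metadata to release fields, or return None if not a release.

    Hypothetical helper; the output is a plain dict, not an SWH model object.
    """
    version = metadata.get("softwareVersion")
    notes = metadata.get("releaseNotes")
    # Assumption: without both terms, the deposit does not get a release.
    if not version or not notes:
        return None
    published = metadata.get("datePublished")
    # Fall back to the deposit date when datePublished is absent.
    date = datetime.fromisoformat(published) if published else deposit_date
    return {
        "target": revision_id,        # revision containing all metadata
        "target_type": "revision",    # hard-coded
        "name": version,
        "message": notes,
        "date": date,
    }


deposit_date = datetime(2019, 5, 24, tzinfo=timezone.utc)
release = build_release(
    {"softwareVersion": "1.0", "releaseNotes": "First release"},
    "revision-x",
    deposit_date,
)
print(release)
```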

At the moment, the loader-tar is not able to receive release objects.
@ardumont, should we wait for the refactoring of the loader-tar before implementing this feature for deposits?

@ardumont, should we wait for the refactoring of the loader-tar before implementing this feature for deposits?

It depends on that task's priority.
If it's urgent, you could adapt the current tar loader implementation to deal with releases.
(That won't change the behavior of anything else, since only the deposit uses it.)

Now that the metadata is going to be in a separate metadata storage, there is the question of whether to keep the revision artifact.

From today's discussion (with @ardumont, @vlorentz and @zack), we will create a release tag for deposits, following the specs above and we will keep the revision in place for the development history.

@ardumont this task should be the next priority after metadata-only deposits.

Note to self and you :-)

After today's call with @ardumont, @douardda and @vlorentz, we have reviewed the question of depositing the next version vs updating the same version.
With this in mind:
when updating the same version using the EM-IRI, the object created in SWH should be a revision;
when depositing a new version, we are not sure whether to edit the previous deposit or create a new deposit.

TODO: decide :-)

I just realized that new content = new SWHID,
so if we accept an EM-IRI request with new content, it needs to capture the new SWHID and "delete" the old SWHID.
This is not a good idea unless we keep the whole history of content modifications.
Hence each new content needs a new deposit, and only one core SWHID will be attached to each deposit.
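
As an illustration of why new content means a new SWHID: identifiers are computed from the content itself. The sketch below covers only the simplest case, a single file (swh:1:cnt), which uses the same hash as a git blob. A deposit's identifier would be computed over a larger object, but the same principle applies. The function name is hypothetical.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Compute the swh:1:cnt identifier of a file's bytes (git blob hash)."""
    header = b"blob %d\x00" % len(data)
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


old = content_swhid(b"version 1\n")
new = content_swhid(b"version 2\n")
# Any change in the bytes yields a different identifier.
print(old != new)
```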

No, we will duplicate the metadata: one copy for each content.

Of course we will, but the client will see only one SWHID per deposit_id.

Superseded by T3809.

Deposits have releases.
The releaseNotes property is not yet used for the release message.