Build a connector for software deposit via Zenodo/InvenioRDM
Open, NormalPublic

Description

The popular Zenodo platform is becoming a white-label open source application, InvenioRDM, built on top of the Invenio library: https://inveniosoftware.org/products/rdm/
Many partners are collaborating on this new version, and this is the right time to contribute a software deposit functionality similar to what we have for HAL:

  • SWORD 3 support is planned for InvenioRDM
  • CodeMeta support/export is planned for InvenioRDM
  • our new BibTeX types would be welcome in InvenioRDM

First public release expected this summer.

Event Timeline

rdicosmo triaged this task as Normal priority. Apr 1 2020, 5:41 PM
rdicosmo created this task.

Great news !!

Does this mean we need to be SWORD 3 compatible?

After some thought, it appears to me that this is not strictly necessary, but we need to discuss this point with the people working on the Invenio codebase.

Here is the pad shared with the InvenioRDM team:
https://hackmd.io/YIJXcf3YTDiwwYGD-yePrA

The InvenioRDM team propose to divide the work on their side into 3-4 phases:

  1. Core Development
  2. InvenioRDM to Software Heritage base integration: implement the interactions outlined in Figure 1 and Figure 2 (available in the HackMD pad)
  3. GitHub Integration - extend to integration with GitHub, enabling automatic deposit into Invenio, and “save code now” in Software Heritage
  4. Advanced GitHub Integration

On the SWH side, the InvenioRDM integration and the HAL extension to deposit metadata require the following steps:

  1. Extend the same deposit endpoint
  2. Add a new option to the swh-deposit client
  3. Specs for metadata verification
    • choose the metadata format (T2311)
    • SWHID (core and/or with context)
    • metadata must not be empty
    • url / authors ?
    • (syntax) incorrect ?
  4. Implementation of metadata verification
  5. Specs for deposit metadata storage
    • keep only in the metadata storage
    • don't create an origin (or other graph artifacts) for a metadata deposit
  6. Implementation of deposit metadata storage
  7. Keep release in mind for content deposit (T1755)
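
The verification bullets in the list above (SWHID, core or with context, and non-empty metadata) could be sketched roughly as follows. This is an illustration only, not the swh-deposit implementation: the function names `parse_swhid` and `check_metadata_deposit`, the dict-shaped metadata and the error messages are all hypothetical. The pattern follows the published SWHID syntax (`swh:1:<type>:<40 hex digits>`, optionally followed by `;key=value` qualifiers).

```python
import re

# Core SWHID: scheme, version 1, object type, 40 lowercase hex digits.
CORE_SWHID = re.compile(r"swh:1:(snp|rel|rev|dir|cnt):[0-9a-f]{40}")
# Context qualifiers defined by the SWHID syntax.
KNOWN_QUALIFIERS = {"origin", "visit", "anchor", "path", "lines"}


def parse_swhid(swhid):
    """Split a SWHID into (core, qualifiers); raise ValueError if malformed."""
    core, *raw_qualifiers = swhid.split(";")
    if not CORE_SWHID.fullmatch(core):
        raise ValueError(f"invalid core SWHID: {core!r}")
    qualifiers = {}
    for item in raw_qualifiers:
        key, sep, value = item.partition("=")
        if not sep or key not in KNOWN_QUALIFIERS or not value:
            raise ValueError(f"invalid qualifier: {item!r}")
        qualifiers[key] = value
    return core, qualifiers


def check_metadata_deposit(swhid, metadata):
    """Return a list of problems; an empty list means the checks passed.

    Hypothetical helper: `metadata` is assumed to be a dict here.
    """
    problems = []
    try:
        parse_swhid(swhid)
    except ValueError as exc:
        problems.append(str(exc))
    if not metadata:
        problems.append("metadata must not be empty")
    return problems
```

Returning a list of problems rather than raising on the first one lets a client report every verification failure in a single response; whether the real endpoint should behave that way is an open design choice.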

I have surveyed the existing code to check that what I think happens in the
deposit is correct. TL;DR: it is!

The existing metadata update endpoints are concentrated in the
swh.deposit.api.deposit_update module [1].

Up to now, a restriction in the base class prevents updating a deposit whose
status is anything other than 'partial' [2]. That restriction should be relaxed
for the deposit metadata update case (when a SWHID is provided in some way). It
should stay in place for the other existing cases.

[1] https://forge.softwareheritage.org/source/swh-deposit/browse/master/swh/deposit/api/deposit_update.py$86-161

[2] https://forge.softwareheritage.org/source/swh-deposit/browse/master/swh/deposit/api/common.py$755-778
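
The relaxed restriction described in this comment could look roughly like the sketch below. It is not the code from swh.deposit.api.common: the exception class, the function name and its signature are hypothetical, and the comment does not say how far the restriction should be relaxed, so this sketch simply skips the status check whenever a SWHID is given.

```python
class DepositUpdateForbidden(Exception):
    """Raised when an update is attempted on a deposit that does not allow it."""


def check_update_allowed(deposit_status, swhid=None):
    """Raise DepositUpdateForbidden unless the update may proceed.

    Hypothetical helper: `swhid` is set only for metadata updates that
    target an already archived object.
    """
    if swhid is not None:
        # Metadata update on an existing SWHID: the restriction is relaxed.
        return
    if deposit_status != "partial":
        # All other existing cases keep the current behaviour.
        raise DepositUpdateForbidden(
            f"deposit status is {deposit_status!r}, expected 'partial'"
        )
```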

May 2021 update

Where are we now?

Stage 2 in the CottageLabs specs:
https://docs.google.com/document/d/1z0ItQa8e2bFAha_9NtEG3hd1ZJCV6T4Zi4wrXWRdTfo/edit?usp=sharing

Next steps to complete Stage 2:

  • metadata record with SWHID (exports, citation with SWHID)
  • update metadata deposit - when the metadata of a deposit is changed

See the repositories for the InvenioRDM-SWH integration:
https://github.com/inveniosoftware/invenio-swh
https://github.com/CottageLabs/invenio-swh-demo

See the Invenio-cli repository (to which the invenio-swh module needs to connect to be deployed on a live instance)

Notes from the Zenodo-CottageLabs-SWH meeting

1. the additional software properties we want to add for software records:

  • operatingSystem (proposed list)
  • runtimePlatform (proposed list)
  • programmingLanguage (proposed list)
  • codeRepository (url)
  • releaseNotes (text)
  • softwareVersion (text)
  • developmentStatus (proposed list)
  • issueTracker (url)

1.1. It is preferable to wait for the customized properties feature before adding the software properties into the InvenioRDM form.
1.2. The releaseNotes property can be added as a type of additionalDescription
1.3. The codeRepository property can be added as a type of relatedIdentifiers (I need to check this with Martin from DataCite)
1.4. I will verify with HAL what proposed lists are used to help with the following properties operatingSystem, runtimePlatform, programmingLanguage and developmentStatus.
1.5. we see a specific software properties category in blue with the customized properties feature - when it is ready, we will discuss exactly which properties should be added from CodeMeta
1.6. [update] the custom fields feature has been dropped from the roadmap for June
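
To make the property list above concrete, here is a minimal sketch of how the expected kind of each value (proposed list, url, text) could be checked. It is an assumption-laden illustration, not part of InvenioRDM or invenio-swh: the function name, the error strings and the vocabulary contents are hypothetical, and the real proposed lists still have to be agreed (see 1.4).

```python
from urllib.parse import urlparse

# Kind of value expected for each proposed property, as listed in the notes.
SOFTWARE_PROPERTIES = {
    "operatingSystem": "list",
    "runtimePlatform": "list",
    "programmingLanguage": "list",
    "codeRepository": "url",
    "releaseNotes": "text",
    "softwareVersion": "text",
    "developmentStatus": "list",
    "issueTracker": "url",
}


def validate_software_properties(values, vocabularies):
    """Return {property: problem} for every value that fails its check.

    `vocabularies` maps a property name to the set of allowed values
    (hypothetical placeholder for the proposed lists).
    """
    problems = {}
    for name, value in values.items():
        kind = SOFTWARE_PROPERTIES.get(name)
        if kind is None:
            problems[name] = "unknown property"
        elif kind == "url":
            parsed = urlparse(value)
            if parsed.scheme not in ("http", "https") or not parsed.netloc:
                problems[name] = "not an http(s) URL"
        elif kind == "text":
            if not value.strip():
                problems[name] = "empty text"
        elif value not in vocabularies.get(name, set()):
            problems[name] = "value not in proposed list"
    return problems
```

For example, calling `validate_software_properties({"issueTracker": "not a url"}, {})` reports a problem for `issueTracker`, while a well-formed https URL passes.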

2. the GitHub integration

2.1. the url of a code repository is used in the "supplemented by" property in Zenodo (we need to keep this in mind when working on phase 3 of the integration)
2.2. [not discussed, but important to review] the origin in SWH should be the (concept) DOI of a deposit (for which the SWH property createOrigin should be used in the SWORD metadata)

  • the version DOI needs to be added in metadata as an identifier, but will not act as the origin
  • the code repository (in supplemented by) should be also added to the metadata with the CodeMeta term codeRepository

2.3. the content from the release notes on GitHub is used in the description property on a Zenodo record

  • it might be useful to review exactly what information from GitHub is used in Zenodo
  • if it is not changed in Zenodo, send it to SWH as is, as the description

2.4. the GitHub guide dates from 2016: https://guides.github.com/activities/citable-code/ and does not explain how authors are collected from the repository
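
The mapping rules in this section (concept DOI as origin, version DOI as a plain identifier, the "supplemented by" url as codeRepository, release notes as description) can be summarised in a short sketch. Everything about its shape is an assumption: the function name, the layout of the `record` dict and the output keys are hypothetical and do not reflect the actual invenio-swh or SWORD payload.

```python
def build_swh_metadata(record):
    """Map a Zenodo-like record (hypothetical dict layout) to deposit metadata."""
    metadata = {
        # The concept DOI acts as the origin in SWH.
        "create_origin": f"https://doi.org/{record['concept_doi']}",
        # The version DOI is only an identifier, not the origin.
        "identifier": f"https://doi.org/{record['version_doi']}",
        # Release notes taken from GitHub are sent unchanged as the description.
        "description": record["description"],
    }
    # The repository url found in "supplemented by" maps to the CodeMeta term.
    repository = record.get("supplemented_by")
    if repository:
        metadata["codeRepository"] = repository
    return metadata
```

Keeping the concept DOI as the origin means every new version of a record lands under the same origin, while the version DOI still identifies each individual release.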

3. CodeMeta

3.1. with the CodeMeta task force we are working on the adoption of the CodeMeta properties in schema.org (2 of the properties in the list above are CodeMeta specific)
3.2. I will verify with Matt and Carl (CodeMeta maintainers) when the v3 release of CodeMeta is planned

4. Pushing the InvenioRDM-SWH integration into Zenodo

4.1. Cottage Labs and Software Heritage will finish the implementation of the integration workflow with the following items:

  • updated metadata workflow
  • record and display suitable minimal information on deposit state

4.2. Cottage Labs and Software Heritage scheduled a sprint this Friday and we will see where we are at
4.3. Cottage Labs will do a PyPI package release with the complete workflow that can be used by other systems
4.4. we will demonstrate the workflow at a bi-weekly InvenioRDM meeting to get feedback and find a volunteer live system to put it in production
4.5. after a first push to production on a live system, we can consider pushing to Zenodo :-)

5. source code display in a software record

5.1. first we will implement the display of the SWHID with a browse button, as implemented on HAL
5.2. we did discuss the future (far future?) possibility of displaying the source code content in a widget on the record page, which will require the SWH team to implement the widget at some point; to be checked with Roberto whether this is part of the plan (discussion on T3351)

September 2021 update

A. Cottage Labs update:

  • Development of the InvenioRDM SWH integration was paused until the 6.0 LTS release of InvenioRDM on 5 August 2021.
  • WIP in stage 2:
    1. Link new versions of previous deposits to those previous deposits in SWH by using add_to_origin
    2. Send the correct HTTP header to enable us to send replacement metadata when the metadata of a record is updated in InvenioRDM.
  • InvenioRDM’s delivery timeline has slipped repeatedly, and it has not been a stable integration target for much of the timeframe of this project:
    1. The descoping, in June 2021, of custom properties from the metadata schema of the LTS release. It had been recommended that we use custom properties for CodeMeta metadata such as operatingSystem and developmentStatus, but this is now difficult to achieve until custom property support is implemented.
    2. The later removal of support for extension metadata from the LTS release’s record metadata schema, even though we had already built our deposit-process-related metadata on top of it.
  • Going forward, Cottage Labs require clarity from the InvenioRDM project on a suitable and sustainable approach to take for these outstanding issues:
    1. Storing SWH deposit-related metadata on or alongside the record
    2. Exposing that metadata through to the record detail template
    3. Extending the record detail template to display SWH deposit-related metadata

B. CERN's update:

  • Lars is keen to help get the SWH work finished, and accepts that, with features changing underneath Cottage Labs, the InvenioRDM team will shoulder some of that burden.
  • Work should be directed at the LTS release, as stability can't be guaranteed otherwise; the LTS should also be well supported, with bugs fixed quickly.