
Add possibility to fetch last SWHID for a deposit using an origin on deposit cli
Closed, Resolved (Public)

Description

In IPOL for article #300, this information is added to the entry:

sw_id="swh:1:dir:03d81d9e8d583aa52bfe5a696c875a406571684c"
sw_origin="origin=https://doi.org/10.5201/ipol.2020.300"
sw_visit="visit=swh:1:snp:e3c3904624230050561a3c5a615b3852fda6a22c"
sw_anchor="anchor=swh:1:rev:2bda715d58c19bfc04bac7028a5c67780f177cd3"
sw_title="Implementation of the LLCC method for image enhancement"
sw_authors="Jose-Luis Lisani"
sw_date="2018-01-01"
sw_license="AGPL-3.0-or-later"
sw_version="2.0"

The fields origin, title, authors, date, license and version can be easily retrieved from the IPOL description file.

But the id, anchor and visit fields are more difficult to retrieve.
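For reference, the id, origin, visit and anchor values in the entry above are the pieces of a single qualified SWHID: a core identifier followed by `;`-separated qualifiers. A minimal sketch of joining them is shown below. The helper name `qualified_swhid` is hypothetical and not part of any Software Heritage tool.

```python
def qualified_swhid(core, origin=None, visit=None, anchor=None):
    """Join a core SWHID with its optional context qualifiers.

    Hypothetical helper for illustration; qualifiers are appended in the
    order origin, visit, anchor, separated by ';'.
    """
    parts = [core]
    for name, value in (("origin", origin), ("visit", visit), ("anchor", anchor)):
        if value:
            parts.append(f"{name}={value}")
    return ";".join(parts)


# Values taken from the IPOL entry for article #300 quoted above.
swhid = qualified_swhid(
    "swh:1:dir:03d81d9e8d583aa52bfe5a696c875a406571684c",
    origin="https://doi.org/10.5201/ipol.2020.300",
    visit="swh:1:snp:e3c3904624230050561a3c5a615b3852fda6a22c",
    anchor="swh:1:rev:2bda715d58c19bfc04bac7028a5c67780f177cd3",
)
print(swhid)
```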

Use case
An editor is looking for the simplest way to write a script that returns the SWHID information for each deposited article. In their words: "I kept all the deposit IDs, so there is no problem in using this information and the 'swh deposit status' command. I was just wondering if there was a more direct option using the 'slug' as the query."

The slug is also known as the origin.

SWH answer in January:

Using the deposit-id is best because it identifies exactly one artifact. A slug (which creates an origin), on the other hand, can accept multiple versions of an artifact for the same origin.
A request using the origin can therefore return either a list of SWHIDs or only the last deposit, which might be ambiguous.
We will reflect on the possibilities to implement this functionality and get back to you.
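The ambiguity described in that answer can be illustrated with a small sketch. The deposit records below are entirely hypothetical (made-up ids, origins and hashes); they only show why a deposit-id lookup yields one SWHID while an origin lookup may yield several.

```python
from collections import defaultdict

# Hypothetical deposits as (deposit_id, origin, swhid); all values are made up.
deposits = [
    (101, "https://example.org/article/1", "swh:1:dir:" + "1" * 40),
    (102, "https://example.org/article/1", "swh:1:dir:" + "2" * 40),  # new version, same origin
    (103, "https://example.org/article/2", "swh:1:dir:" + "3" * 40),
]

# A deposit id maps to exactly one SWHID.
by_id = {deposit_id: swhid for deposit_id, _, swhid in deposits}

# An origin maps to one SWHID per deposited version.
by_origin = defaultdict(list)
for _, origin, swhid in deposits:
    by_origin[origin].append(swhid)

print(by_id[102])
print(by_origin["https://example.org/article/1"])
```

Returning only the last element of the per-origin list is one way to resolve the ambiguity, at the cost of hiding earlier versions.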

Event Timeline

moranegg triaged this task as Normal priority. Apr 1 2021, 1:08 PM
moranegg created this task.

Discussed during today's call; scheduled for 2021W18 or 2021W19.

IPOL's editor has agreed that this solution will be good for their use case.

This is possible with the main API:

@moranegg I went the script way in the end (what val initially proposed).

I had to slightly adapt the swh-web-client module [1] first (done, and unblocking the
packaging along the way ;).

The other way around is not that simple to implement. Too many cogs would need adapting,
on both the client and the server side (D6210 would have been an incomplete start, by the
way). As Val suggested multiple times, the information is already in the archive, so I'm
not willing to duplicate the reading in the deposit.

Out of this change, a script [2] is ready for Jose Luis to copy or to draw inspiration
from. See the example use [3].

@vlorentz I pushed it to the snippets repository since it hardcodes how to fetch the
revision "path" (from the snapshot) we are interested in for the deposit case. And also
because it's mostly intended for duplication.
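For readers who cannot open the snippet, the traversal described above (origin, latest visit, snapshot, revision, directory) can be sketched against the public web API roughly as follows. This is not the script from [2]: the function names, the injectable `fetch` parameter and the assumption that the deposit's revision sits on a `HEAD` branch are all illustrative.

```python
import json
from urllib.request import urlopen

API = "https://archive.softwareheritage.org/api/1"


def fetch_json(url):
    """GET a URL and decode its JSON body (no authentication or retries)."""
    with urlopen(url) as resp:
        return json.load(resp)


def last_dir_swhid(origin, fetch=fetch_json, branch="HEAD"):
    """Return the directory SWHID reached from the origin's latest visit.

    Assumption: the snapshot has a `branch` entry (default "HEAD") that
    targets a revision, possibly through one alias.
    """
    visit = fetch(f"{API}/origin/{origin}/visit/latest/?require_snapshot=true")
    snapshot = fetch(f"{API}/snapshot/{visit['snapshot']}/")
    target = snapshot["branches"][branch]
    if target["target_type"] == "alias":
        # Follow a single alias hop, e.g. HEAD -> refs/heads/master.
        target = snapshot["branches"][target["target"]]
    if target["target_type"] != "revision":
        raise ValueError(f"{branch} does not point to a revision for {origin}")
    revision = fetch(f"{API}/revision/{target['target']}/")
    return f"swh:1:dir:{revision['directory']}"
```

Calling `last_dir_swhid("https://doi.org/10.5201/ipol.2019.248")` would perform three HTTP requests; passing a different `fetch` callable makes the traversal usable offline or with an authenticated client.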

As usual, feel free to suggest, adapt or push something better ;). For example, I don't
know whether we want an entry point directly in the swh web command line instead of
the snippet (if that's the case, that'd be extra work though, and I feel like I've spent
too much time on this already).

[1] D6211

[2] https://archive.softwareheritage.org/browse/origin/content/?origin_url=https://forge.softwareheritage.org/source/snippets/&path=ardumont/last_swhid/cli.py&timestamp=2021-09-08T14:04:21Z#L18-L47

[3]

$ python3 -m cli https://doi.org/10.5201/ipol.2018.236
swh:1:dir:d85591aeefea2c1c58142e34683fd1923b19c895
$ shuf ipol-deposit-origins.txt| head -1
https://doi.org/10.5201/ipol.2019.248
$ python3 -m cli https://doi.org/10.5201/ipol.2019.248
swh:1:dir:1c5ff6be79f897470f3d7365c09b64aa173de575
$ python3 -m cli https://doi.org/10.5201/ipol.2019 2>&1 | tail -1
ValueError: No origin found matching https://doi.org/10.5201/ipol.2019

Cheers,

ardumont changed the task status from Open to Work in Progress. Sep 8 2021, 4:31 PM
ardumont moved this task from Backlog to In code review on the SWORD deposit board.