
Write tutorial for calculating SWHIDs locally
Open, Normal, Public

Description

Write a step-by-step tutorial on calculating SWHIDs the way we do in presentations: how a SWHID can be computed locally, and how the result matches what we find in the archive.
It would also be nice to reference the code fragment in charge of computing the SWHID with a SWHID :-)


Main specs are here: https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html
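
As a possible starting point for the tutorial, here is a minimal sketch of how a content SWHID can be reproduced with nothing but the Python standard library. It assumes, as the spec linked above describes, that the identifier of a content object is the SHA1 of the file bytes prefixed with a git-style "blob" header. The function name content_swhid is hypothetical and not part of swh.model; the authoritative implementation remains the one in swh.model.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return a content SWHID for raw file bytes (illustrative sketch only).

    Assumption: the intrinsic identifier is the git "blob" hash, i.e. the
    SHA1 of b"blob <length>\\0" followed by the file content.
    """
    header = b"blob %d\0" % len(data)
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # The empty file hashes to git's well-known empty blob identifier.
    print(content_swhid(b""))
```

The output for a given file should be comparable with what `swh identify` prints for the same file, and with `git hash-object` run on it.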

The CLI steps are here: https://docs.softwareheritage.org/devel/swh-model/cli.html

There are some details in https://docs.softwareheritage.org/devel/apidoc/swh.model.identifiers.html
e.g. https://docs.softwareheritage.org/devel/apidoc/swh.model.identifiers.html#swh.model.identifiers.directory_identifier
or https://docs.softwareheritage.org/devel/apidoc/swh.model.identifiers.html#swh.model.identifiers.revision_identifier
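
For the directory step of the tutorial, a rough sketch along the same lines could look like the following. It assumes the git "tree" serialization: entries are sorted by name (with directory names compared as if they ended in "/"), each entry is written as mode, space, name, NUL byte and the 20-byte binary target hash, and the whole manifest is hashed behind a "tree" header. The helper name directory_swhid and the tuple layout of the entries are hypothetical choices made for illustration; the directory_identifier documentation linked above is the reference for the exact rules.

```python
import hashlib

DIR_MODE = b"40000"  # git-style octal mode for a sub-directory entry


def directory_swhid(entries):
    """Return a directory SWHID (illustrative sketch only).

    entries: iterable of (mode, name, target_hex) tuples, where mode and
    name are bytes (e.g. b"100644", b"README") and target_hex is the
    40-character hex hash of the entry's content or sub-directory.
    """

    def sort_key(entry):
        mode, name, _ = entry
        # Directories sort as if their name had a trailing slash.
        return name + b"/" if mode == DIR_MODE else name

    manifest = b"".join(
        mode + b" " + name + b"\0" + bytes.fromhex(target_hex)
        for mode, name, target_hex in sorted(entries, key=sort_key)
    )
    header = b"tree %d\0" % len(manifest)
    return "swh:1:dir:" + hashlib.sha1(header + manifest).hexdigest()


if __name__ == "__main__":
    # The empty directory hashes to git's well-known empty tree identifier.
    print(directory_swhid([]))
```

The target hashes of file entries would come from the content step, so a tutorial could build a small directory bottom-up and compare the final result with `swh identify` on the same directory.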

Event Timeline

moranegg triaged this task as Normal priority. Tue, Feb 2, 10:18 AM
moranegg created this task.

From swh-devel:
On Fri, Feb 12, 2021 at 4:43 PM Stefano Zacchiroli <zack@upsilon.cc> wrote:


$ mkvirtualenv swh
...
$ pip install swh.model[cli]
...
$ swh identify --help
Usage: swh identify [OPTIONS] OBJECTS...

Compute the Software Heritage persistent identifier (SWHID) for the given
source code object(s).

For more details about SWHIDs see:

https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html

Tip: you can pass "-" to identify the content of standard input.

Examples:

  $ swh identify fork.c kmod.c sched/deadline.c
  swh:1:cnt:2e391c754ae730bd2d8520c2ab497c403220c6e3    fork.c
  swh:1:cnt:0277d1216f80ae1adeed84a686ed34c9b2931fc2    kmod.c
  swh:1:cnt:57b939c81bce5d06fa587df8915f05affbe22b82    sched/deadline.c

  $ swh identify --no-filename /usr/src/linux/kernel/
  swh:1:dir:f9f858a48d663b3809c9e2f336412717496202ab

  $ git clone --mirror https://forge.softwareheritage.org/source/helloworld.git
  $ swh identify --type snapshot helloworld.git/
  swh:1:snp:510aa88bdc517345d258c1fc2babcd0e1f905e93        helloworld.git

Options:
  --dereference / --no-dereference
                                  follow (or not) symlinks for OBJECTS passed
                                  as arguments (default: follow)

  --filename / --no-filename      show/hide file name (default: show)
  -t, --type [auto|content|directory|origin|snapshot]
                                  type of object to identify (default: auto)
  -x, --exclude PATTERN           Exclude directories using glob patterns
                                  (e.g., '*.git' to exclude all .git
                                  directories)

  -v, --verify SWHID              reference identifier to be compared with
                                  computed one

  -h, --help                      Show this message and exit.

We're also adding support for computing SWHIDs from commit IDs of other
VCS (e.g., Mercurial, as long as the repo is locally available), but
it's not integrated in the above CLI yet.