
Make package loaders create releases objects instead of revisions
Status: Closed, Migrated


Here is an overview of the fields used by each package loader (plus the internal version name and the branch name) after D6616. "= version" means the field is set to the same value as the version column:

| Loader | version | branch name | name | message | target | target_type | synthetic | author | date | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| archive | passed as arg | release_name(version) | = version | "swh-loader-package: synthetic revision message" | ... | dir | true | SWH robot | passed as arg | |
| cran | metadata.get("Version", passed as arg) | release_name(version) | = version | = version | ... | dir | true | metadata.get("Maintainer", "") | metadata.get("Date") | metadata is intrinsic |
| debian | passed as arg (e.g. stretch/contrib/0.7.2-3) | release_name(version) | = version | "Synthetic revision for Debian source package %s version %s" | ... | dir | true | metadata.changelog.person | metadata.changelog.date | metadata is intrinsic; no more RevisionType.DSC |
| deposit | HEAD | only HEAD | HEAD | "{client}: Deposit {id} in collection {collection}" | ... | dir | true | SWH robot | <codemeta: dateCreated> from SWORD XML | revisions had parents |
| nixguix | URL | URL | URL | "" | ... | dir | true | "" | None | it's the URL of the artifact referenced by the derivation |
| npm | metadata["version"] | release_name(version) | = version | = version | ... | dir | true | from intrinsic metadata or "" | from extrinsic metadata or None | |
| opam | as given by opam | "{opam_package}.{version}" | = version | = version | ... | dir | true | from metadata | None | "{self.opam_package}.{version}" matches the version names used by opam's backend; metadata is extrinsic |
| pypi | metadata["version"] | release_name(version) or release_name(version, filename) | = version | "{version}: {metadata['comment_text']}" or just version | ... | dir | true | from intrinsic metadata or "" | from extrinsic metadata or None | metadata is intrinsic |

where release_name is defined as:

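```python
from typing import Optional

def release_name(version: str, filename: Optional[str] = None) -> str:
    # Snapshot branch name for a package version, optionally qualified by a filename.
    if filename:
        return "releases/%s/%s" % (version, filename)
    return "releases/%s" % version
```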
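As an illustration, the sketch below reuses release_name (repeated so the block runs on its own) to reproduce the branch name and message that the table lists for the PyPI loader. The pypi_branch_and_message helper and the sample metadata are hypothetical, and it assumes the plain version is used as the message whenever comment_text is missing or empty; only the naming rules themselves come from the table.

```python
from typing import Optional, Tuple


def release_name(version: str, filename: Optional[str] = None) -> str:
    # Same helper as above: branch name for a version (and optional file).
    if filename:
        return "releases/%s/%s" % (version, filename)
    return "releases/%s" % version


def pypi_branch_and_message(
    version: str, metadata: dict, filename: Optional[str] = None
) -> Tuple[str, str]:
    # Hypothetical helper following the "pypi" row of the table:
    # branch is release_name(version) or release_name(version, filename);
    # message is "{version}: {comment_text}", or just the version when there
    # is no comment (assumption about when the fallback applies).
    branch = release_name(version, filename)
    comment = metadata.get("comment_text")
    message = f"{version}: {comment}" if comment else version
    return branch, message


print(release_name("1.0.2"))  # releases/1.0.2
print(release_name("1.0.2", "pkg-1.0.2.tar.gz"))  # releases/1.0.2/pkg-1.0.2.tar.gz
print(pypi_branch_and_message("1.0.2", {"comment_text": "first release"}))
# ('releases/1.0.2', '1.0.2: first release')
```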

Copy of an email I sent on 2021-11-17:

Context

Since their creation, SWH's package loaders have created "revision" objects to represent packages rather than "release" objects, even though releases match their meaning more closely (see https://docs.softwareheritage.org/devel/swh-model/data-model.html#software-artifacts).

This was due to technical limitations that prevented them from storing some of the metadata they needed on objects other than revisions. Thanks to recent work (the "extrinsic metadata storage" and "extids"), this is no longer a problem, so we are ready to make them write releases instead.

The change

Last Wednesday, we pushed an update to SWH's staging environment that finally switches package loaders to releases. You can see the results at https://webapp.staging.swh.network/ by searching for packages from your favorite package loader (NPM, OPAM, PyPI, ...) and picking one visited within the last 7 days. For example: https://webapp.staging.swh.network/browse/origin/directory/?origin_url=https://www.npmjs.com/package/steam-market-manager

The update has four parts:

  1. new packages will be written as releases;
  2. the deposit loader will no longer write "parent" relationships between revisions; clients should list visits instead;
  3. existing revisions are automatically updated to releases (without re-fetching the package from the origin);
  4. (not deployed yet) we will take this opportunity to make the values of the fields populated in release objects more consistent across package loaders: https://forge.softwareheritage.org/D6629
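Item 3 can be pictured with the sketch below. It assumes the loader finds an already-archived revision for a package version (for instance through the extids mentioned above) and derives the release from that revision's stored fields, so nothing has to be downloaded again. The Revision and Release dataclasses, the revision_to_release function, the extid type string and the sample values are all hypothetical simplifications, not the swh-model or swh-loader-core API.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Revision:
    # Hypothetical, simplified stand-in for an archived revision object.
    directory: str
    message: Optional[str]
    author: Optional[str]
    date: Optional[str]
    synthetic: bool


@dataclass(frozen=True)
class Release:
    # Hypothetical, simplified stand-in for a release object.
    name: str
    message: Optional[str]
    target: str
    target_type: str
    author: Optional[str]
    date: Optional[str]
    synthetic: bool


def revision_to_release(rev: Revision, version: str) -> Release:
    # Reuse what the revision already recorded; the release points straight
    # at the package's root directory ("dir" target_type, as in the table).
    return Release(
        name=version,
        message=rev.message,
        target=rev.directory,
        target_type="dir",
        author=rev.author,
        date=rev.date,
        synthetic=rev.synthetic,
    )


# Hypothetical index of already-archived artifacts: (extid_type, extid) -> revision.
known: Dict[Tuple[str, str], Revision] = {
    ("hypothetical-artifact-hash", "abc123"): Revision(
        directory="d" * 40,
        message="1.0.0",
        author="Jane <jane@example.org>",
        date="2021-01-01T00:00:00Z",
        synthetic=True,
    )
}

rev = known.get(("hypothetical-artifact-hash", "abc123"))
if rev is not None:
    # No re-fetch from the origin: the release is built from stored data only.
    rel = revision_to_release(rev, "1.0.0")
    print(rel.name, rel.target_type, rel.target == rev.directory)  # 1.0.0 dir True
```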

Existing visits will remain unchanged, and their snapshots will keep pointing to revision objects.

VCS loaders (Git, Mercurial, SVN, ...) also remain unchanged.

What's next

Our tests show this is all working as intended, so we plan to make the same changes to the loaders of the main archive early next week.

This does not fundamentally change Software Heritage's data model. From the API's point of view, it only means that the /api/1/snapshot/ endpoint will return more releases, and may now return snapshots made *only* of releases (as far as I know, we did not have any so far).

If you wrote an API client that uses this endpoint, please make sure it handles this case.
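For API client authors, here is a minimal sketch of code that does not assume a snapshot contains revisions. It assumes the /api/1/snapshot/ response is JSON with a "branches" mapping whose values are either null or objects with "target" and "target_type" keys, where an "alias" branch names another branch; the branch_targets helper and the sample snapshot are hypothetical, and pagination through "next_branch" is not handled.

```python
from typing import Dict, List, Optional


def branch_targets(snapshot: dict) -> Dict[str, List[str]]:
    """Group branch names by the type of object they finally point to.

    Assumes {"branches": {name: {"target": ..., "target_type": ...} or None}}.
    Aliases are followed to the branch they name; dangling (None) branches,
    alias cycles and aliases to branches absent from the response are skipped.
    """
    branches: Dict[str, Optional[dict]] = snapshot.get("branches", {})
    grouped: Dict[str, List[str]] = {}
    for name, entry in branches.items():
        seen = set()
        while entry is not None and entry["target_type"] == "alias":
            target = entry["target"]
            # Stop on cycles or on aliases pointing outside this response.
            entry = None if target in seen else branches.get(target)
            seen.add(target)
        if entry is None:
            continue
        grouped.setdefault(entry["target_type"], []).append(name)
    return grouped


# Hypothetical snapshot made only of releases, as package loaders may now produce.
snapshot = {
    "branches": {
        "HEAD": {"target": "releases/1.0.0", "target_type": "alias"},
        "releases/0.9.0": {"target": "a" * 40, "target_type": "release"},
        "releases/1.0.0": {"target": "b" * 40, "target_type": "release"},
    },
    "next_branch": None,
}

grouped = branch_targets(snapshot)
print(grouped)  # {'release': ['HEAD', 'releases/0.9.0', 'releases/1.0.0']}
revisions = grouped.get("revision", [])  # may legitimately be empty now
```

Looking up grouped.get("revision", []) rather than indexing grouped["revision"] directly keeps such a client working on snapshots made only of releases.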