Here is an overview of the fields (+ internal version name + branch name) used by each package loader, after D6616:
Loader | version | branch | name | message | target | target_type | synthetic | author | date | Notes |
---|---|---|---|---|---|---|---|---|---|---|
archive | passed as arg | release_name(version) | =version | "swh-loader-package: synthetic revision message" | ... | dir | true | SWH robot | passed as arg | |
cran | metadata.get("Version", passed as arg) | release_name(version) | =version | =version | ... | dir | true | metadata.get("Maintainer", "") | metadata.get("Date") | metadata is intrinsic |
debian | passed as arg (e.g. stretch/contrib/0.7.2-3) | release_name(version) | =version | "Synthetic revision for Debian source package %s version %s" | ... | dir | true | metadata.changelog.person | metadata.changelog.date | metadata is intrinsic; no more RevisionType.DSC |
deposit | HEAD | only HEAD | HEAD | {client}: Deposit {id} in collection {collection} | ... | dir | true | SWH robot | <codemeta:dateCreated> from SWORD XML | revisions had parents |
nixguix | URL | URL | URL | "" | ... | dir | true | "" | None | it's the URL of the artifact referenced by the derivation |
npm | metadata["version"] | release_name(version) | =version | =version | ... | dir | true | from intrinsic metadata or "" | from extrinsic metadata or None | |
opam | as given by opam | "{opam_package}.{version}" | =version | =version | ... | dir | true | from metadata | None | "{self.opam_package}.{version}" matches the version names used by opam's backend; metadata is extrinsic |
pypi | metadata["version"] | release_name(version) or release_name(version, filename) | =version | "{version}: {metadata['comment_text']}" or just version | ... | dir | true | from intrinsic metadata or "" | from extrinsic metadata or None | metadata is intrinsic |
using this function:
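```python
from typing import Optional


def release_name(version: str, filename: Optional[str] = None) -> str:
    # Build the branch name for a package version (optionally per file).
    if filename:
        return "releases/%s/%s" % (version, filename)
    return "releases/%s" % version
```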
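To make the branch column concrete, here is a small sketch that applies `release_name` to version strings of the shape shown in the table. The function body is copied from above so the block runs on its own. The file name `pkg-1.0.tar.gz` and the package name `ocamlfind` are made up for illustration, and `opam_branch` is a hypothetical helper that only mirrors the format string in the opam row; it is not a function from the loaders.

```python
from typing import Optional


def release_name(version: str, filename: Optional[str] = None) -> str:
    # Same logic as the function quoted above.
    if filename:
        return "releases/%s/%s" % (version, filename)
    return "releases/%s" % version


def opam_branch(opam_package: str, version: str) -> str:
    # Hypothetical helper: mirrors the "{opam_package}.{version}" format
    # from the opam row of the table.
    return f"{opam_package}.{version}"


# Debian-style version (the example version string from the table).
debian = release_name("stretch/contrib/0.7.2-3")
# PyPI-style branch when a filename is also passed (made-up file name).
pypi = release_name("1.0", "pkg-1.0.tar.gz")
# opam-style branch (made-up package name and version).
opam = opam_branch("ocamlfind", "1.9.1")

print(debian)  # releases/stretch/contrib/0.7.2-3
print(pypi)    # releases/1.0/pkg-1.0.tar.gz
print(opam)    # ocamlfind.1.9.1
```

Note that a Debian version containing slashes yields a branch name with several path components, since the version is interpolated as-is.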
Copy of an email I sent on 2021-11-17:
Context
Since their creation, SWH's package loaders have created "revision" objects to represent packages rather than "release" objects, even though releases match their meaning more closely (see https://docs.softwareheritage.org/devel/swh-model/data-model.html#software-artifacts).
This was due to technicalities that prevented them from storing some of the metadata they needed on objects other than revisions. Thanks to recent work (the "extrinsic metadata storage" and "extids"), this is no longer a problem, so we are ready to make them write releases instead.
The change
So last Wednesday, we pushed an update to SWH's staging environment to finally make the switch to releases. You can see the results at https://webapp.staging.swh.network/ by searching for packages from your favorite package loader (NPM, OPAM, PyPI, ...) and looking for one visited within the last 7 days. For example: https://webapp.staging.swh.network/browse/origin/directory/?origin_url=https://www.npmjs.com/package/steam-market-manager
The update has four parts:
- new packages will be written as releases;
- the deposit loader will no longer write "parent" relationships between revisions; clients should list visits instead;
- existing revisions are automatically updated to releases (without re-fetching the package from the origin);
- (not deployed yet) we will use the opportunity to tweak the values of the fields populated in release objects so that they are more consistent across package loaders (see https://forge.softwareheritage.org/D6629).
Existing visits will remain unchanged, and their snapshot will keep pointing to revision objects.
VCS loaders (Git, Mercurial, SVN, ...) also remain unchanged.
What's next
Our tests show this is all working as intended, so we are going to make the same changes to (the loaders of) the main archive early next week.
This does not fundamentally change Software Heritage's data model. From the API point of view, this just means that the /api/1/snapshot/ endpoint will return more releases, and may now return snapshots that are made of *only* releases (we did not have any so far, as far as I know).
If you wrote an API client using this endpoint, please make sure this is not an issue.
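As a rough illustration of what such a check could look like, the sketch below groups the branches of a snapshot by target type and reports whether the snapshot consists only of releases. It assumes the snapshot JSON has a `branches` mapping from branch name to an object with `target` and `target_type` keys. The sample payload, the hashes, and the helper names are invented for the example, and no network request is made.

```python
from collections import defaultdict
from typing import Any, Dict, List

Snapshot = Dict[str, Any]


def branches_by_type(snapshot: Snapshot) -> Dict[str, List[str]]:
    """Group branch names by target_type ("release", "revision", ...)."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for name, branch in snapshot.get("branches", {}).items():
        # Assumption: a branch without a target may appear as null/None.
        target_type = branch["target_type"] if branch else "dangling"
        grouped[target_type].append(name)
    return dict(grouped)


def only_releases(snapshot: Snapshot) -> bool:
    """True if every non-alias branch points to a release."""
    concrete = set(branches_by_type(snapshot)) - {"alias"}
    return concrete == {"release"}


# Made-up payload shaped like a package-loader snapshot after the switch.
sample = {
    "branches": {
        "HEAD": {"target": "releases/1.0.1", "target_type": "alias"},
        "releases/1.0.0": {"target": "a" * 40, "target_type": "release"},
        "releases/1.0.1": {"target": "b" * 40, "target_type": "release"},
    }
}

print(branches_by_type(sample))
print(only_releases(sample))
```

A client that previously assumed at least one revision branch per snapshot could run a check like this over its inputs to find the cases it needs to handle.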