Origin URLs generated for Fedora origins
Closed, Migrated

Description

The Fedora lister currently uses rpm://fedora/packages/$package_name as origin URLs, inspired by the Debian lister's deb://Debian/packages/$package_name and deb://Ubuntu/packages/$package_name URLs.

However, neither deb: nor rpm: is a standard URL scheme, so these URLs risk conflicting with future specs (or other non-standard uses), and relying on a non-standard scheme is not great in itself.

While it is (probably?) too late to change Debian origins, we need to make a decision before deploying the Fedora lister in production.

Event Timeline

After reviewing and hacking on the Fedora lister, I think we should use origin URLs of the form https://packages.fedoraproject.org/pkgs/{src_pkg_name} for a Fedora source package.

For instance, with the python-babelfish source package, the HTML page we land on contains all relevant links about the package and the related subpackages generated from the upstream sources (see spec file).

For Debian-based distributions, thinking back on it, we should have used origin URLs of the form:

  • https://packages.debian.org/source/{pkg_name} for Debian (see python-quamash for instance)
  • https://packages.ubuntu.com/source/{pkg_name} for Ubuntu (see python-quamash for instance)
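If the deb origins were migrated, the translation from the legacy URLs to the forms above could look roughly like the following. This is a minimal sketch, not code from the lister: the function name `deb_origin_to_https` and the mapping `SOURCE_PAGE_PREFIXES` are hypothetical, and it assumes legacy origins always follow the deb://<distro>/packages/<pkg_name> layout quoted in the description.

```python
from urllib.parse import quote, urlsplit

# Hypothetical mapping from the distribution part of a legacy deb:// origin
# URL to the source-package page prefix proposed in this comment.
SOURCE_PAGE_PREFIXES = {
    "Debian": "https://packages.debian.org/source/",
    "Ubuntu": "https://packages.ubuntu.com/source/",
}


def deb_origin_to_https(origin_url: str) -> str:
    """Translate deb://<distro>/packages/<pkg_name> to the https form."""
    parts = urlsplit(origin_url)
    # urlsplit lowercases the scheme but keeps the case of the netloc,
    # so "Debian" and "Ubuntu" are matched as written in the old URLs.
    prefix = SOURCE_PAGE_PREFIXES.get(parts.netloc)
    if parts.scheme != "deb" or prefix is None:
        raise ValueError(f"not a known deb origin: {origin_url}")
    head, _, pkg_name = parts.path.rpartition("/")
    if head != "/packages" or not pkg_name:
        raise ValueError(f"unexpected path in origin: {origin_url}")
    # "+" is legal in Debian package names (e.g. libstdc++), keep it as is.
    return prefix + quote(pkg_name, safe="+")


print(deb_origin_to_https("deb://Debian/packages/python-quamash"))
```

A mapping like this would only be needed once, to pair old origins with their replacements before removing the old ones.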

I do not think it is too late to modify origin URLs for the deb visit type: we can still trigger a new full listing of source packages for Debian-based distributions
and remove the old origins that use a non-standard URL scheme once all packages have been processed.

Actually for fedora, I found a better origin URL pattern: https://src.fedoraproject.org/rpms/{pkg_name}

This makes more sense as we are listing source packages, and it has the added advantage of not leading to a 404
for legacy packages that are no longer supported, for instance:

@anlambert What about non-Fedora RPM repositories? (RHEL, SUSE, Rocky Linux, ...)

These do not have a website detailing their source packages, so we will have to generate origin URLs ourselves.
As generating HTTP URLs that do not exist is not great, I think using a non-standard URL scheme is still the right
way to go here. Maybe we could prefix it to avoid possible conflicts in the future, something like swh+rpm://<distribution>/packages/<package>?

Nevertheless, I think that if a real URL exists that is relevant to what we are archiving (as for Fedora), we should use it.
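To make the rule concrete, here is a minimal sketch of how a lister could pick an origin URL. Nothing here is existing lister code: `rpm_origin_url` and `REAL_URL_TEMPLATES` are hypothetical names, the swh+rpm:// prefix is only the proposal from this comment, and lowercasing the distribution name is an assumption. The distribution is expected to be a short identifier without spaces.

```python
from urllib.parse import quote

# Distributions for which a real per-source-package page is known.
# Only the Fedora entry comes from this discussion.
REAL_URL_TEMPLATES = {
    "fedora": "https://src.fedoraproject.org/rpms/{pkg_name}",
}


def rpm_origin_url(distribution: str, pkg_name: str) -> str:
    """Prefer a real URL when one exists, else fall back to swh+rpm://."""
    distro = distribution.lower()  # assumption: identifiers are lowercased
    name = quote(pkg_name, safe="+")
    template = REAL_URL_TEMPLATES.get(distro)
    if template is not None:
        return template.format(pkg_name=name)
    # No website details the source packages: generate a prefixed,
    # non-standard URL rather than an HTTP URL that does not exist.
    return f"swh+rpm://{distro}/packages/{name}"


print(rpm_origin_url("fedora", "python-babelfish"))
print(rpm_origin_url("rocky", "bash"))
```

Keeping the real URL templates in one table means a distribution can move from the fallback scheme to a real URL by adding a single entry, at the cost of its origins changing.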