Page MenuHomeSoftware Heritage

feat: Introduce RPM loader
ClosedPublic

Authored by KShivendu on Oct 23 2022, 8:23 AM.

Details

Summary

Introduce RPM loader for ingesting .rpm files from Fedora archives

Related to T4648

Diff Detail

Repository
rDLDBASE Generic VCS/Package Loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Build has FAILED

Patch application report for D8753 (id=31662)

Rebasing onto e6847f3616...

First, rewinding head to replay your work on top of it...
Applying: feat: Bare minimum implementation of RPM loader
Applying: fix expected format of extra loader args packages
Changes applied before test
commit 6395a16c63ff664859937d1b36c42cc550b7e93f
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Thu Oct 27 15:49:23 2022 +0530

    fix expected format of extra loader args packages

commit 7faa3fa4b076c8418263595de38be99e50851477
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Sun Oct 23 11:49:47 2022 +0530

    feat: Bare minimum implementation of RPM loader

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/1007/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/1007/console

Harbormaster returned this revision to the author for changes because remote builds failed.Oct 27 2022, 12:25 PM
Harbormaster failed remote builds in B32615: Diff 31662!

Build has FAILED

Patch application report for D8753 (id=31662)

Rebasing onto e6847f3616...

First, rewinding head to replay your work on top of it...
Applying: feat: Bare minimum implementation of RPM loader
Applying: fix expected format of extra loader args packages
Changes applied before test
commit 4b4a9edcbb4a6ceae1f77ea5284eecad8eea9aa4
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Thu Oct 27 15:49:23 2022 +0530

    fix expected format of extra loader args packages

commit 0cacbf42c96ebc6216785219793a2becea15c6e3
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Sun Oct 23 11:49:47 2022 +0530

    feat: Bare minimum implementation of RPM loader

Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/1008/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/1008/console

Build is green

Patch application report for D8753 (id=31669)

Rebasing onto e6847f3616...

First, rewinding head to replay your work on top of it...
Applying: feat: Bare minimum implementation of RPM loader
Applying: fix expected format of extra loader args packages
Changes applied before test
commit c0e079ebf7daad9eb55c52b2867915ec3857f534
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Thu Oct 27 15:49:23 2022 +0530

    fix expected format of extra loader args packages

commit 7ac2627d3bdd9d5906a33a63e389a0d51ed5433d
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Sun Oct 23 11:49:47 2022 +0530

    feat: Bare minimum implementation of RPM loader

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/1009/ for more details.

  • feat: Use subprocess.check_call and extract .tar obtained from .rpm

Build is green

Patch application report for D8753 (id=31722)

Rebasing onto e6847f3616...

First, rewinding head to replay your work on top of it...
Applying: feat: Bare minimum implementation of RPM loader
Applying: fix expected format of extra loader args packages
Applying: feat: Use subprocess.check_call and extract .tar obtained from .rpm
Changes applied before test
commit 5bacc3fe4e69a0268d8de1a965847a4431472375
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Thu Nov 3 10:57:15 2022 +0530

    feat: Use subprocess.check_call and extract .tar obtained from .rpm

commit 5c745970be8da82e005c20ba841e70fdfc472e60
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Thu Oct 27 15:49:23 2022 +0530

    fix expected format of extra loader args packages

commit 5f2ebc04ebabd0fce1541e08168e2f8bdbbe5c48
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Sun Oct 23 11:49:47 2022 +0530

    feat: Bare minimum implementation of RPM loader

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/1014/ for more details.

  • feat: Use subprocess.check_call and extract .tar obtained from .rpm
  • Remove extra TODOs and FIXMEs
  • Add missing copyrights

Build is green

Patch application report for D8753 (id=31723)

Rebasing onto e6847f3616...

First, rewinding head to replay your work on top of it...
Applying: feat: Bare minimum implementation of RPM loader
Applying: fix expected format of extra loader args packages
Applying: feat: Use subprocess.check_call and extract .tar obtained from .rpm
Changes applied before test
commit 29efc4de50ef4f7a78931ab6cd7e69f6f0f5a4ff
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Thu Nov 3 10:57:15 2022 +0530

    feat: Use subprocess.check_call and extract .tar obtained from .rpm

commit 2990666402966c7dc5a8ef5d9099cc226f094b43
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Thu Oct 27 15:49:23 2022 +0530

    fix expected format of extra loader args packages

commit cefb9e46d99c60ad5c4a2e9740c6adcb734dd92a
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Sun Oct 23 11:49:47 2022 +0530

    feat: Bare minimum implementation of RPM loader

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/1015/ for more details.

swh/loader/package/rpm/loader.py
40

Here edition can be ["Everything", "Server", "Workstation", "Modular"] (although it's already included in the version)

105–107

Any thoughts on this? I think it makes sense to use builld_time as the release date. Should we just hardcore the author field as "Fedora archive" or something like that?

132–154

Please suggest improvements in this :)

swh/loader/package/rpm/tests/test_rpm.py
135

The content hasn't changed so it should be uneventful, right?

vlorentz added inline comments.
swh/loader/package/rpm/loader.py
105–107

use the empty author, like some other package loaders do

134

send stderr to the logfile too

  • feat: Make the lister incremental and use build_time as release date
  • tests: Refactor to use fewer lines of code and improve readability

Build is green

Patch application report for D8753 (id=31774)

Rebasing onto bf2cb039d5...

First, rewinding head to replay your work on top of it...
Applying: feat: Bare minimum implementation of RPM loader
Using index info to reconstruct a base tree...
M	setup.py
Falling back to patching base and 3-way merge...
Auto-merging setup.py
CONFLICT (content): Merge conflict in setup.py
Patch failed at 0001 feat: Bare minimum implementation of RPM loader

Resolve all conflicts manually, mark them as resolved with
"git add/rm <conflicted_files>", then run "git rebase --continue".
You can instead skip this commit: run "git rebase --skip".
To abort and get back to the state before "git rebase", run "git rebase --abort".

Rebase failed (ret=1)!

Could not rebase; Attempt merge onto bf2cb039d5...

Already up to date.
Changes applied before test
commit ca5e6d2094d167faacc5b32a85d38c8b5ce7ec5d
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Sun Nov 6 00:34:34 2022 +0530

    feat: Make the lister incremental and use build_time as release date

commit 0bd2aabe0aa8723a29845d0fc2e82c81192dcbe0
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Thu Nov 3 10:57:15 2022 +0530

    feat: Use subprocess.check_call and extract .tar obtained from .rpm

commit 779ba82f3ad56e25ec824fcf2212551674b8d57e
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Thu Oct 27 15:49:23 2022 +0530

    fix expected format of extra loader args packages

commit b26268d12c6820051f0507c035e29db6dac824b4
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Sun Oct 23 11:49:47 2022 +0530

    feat: Bare minimum implementation of RPM loader

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/1017/ for more details.

  • Squash commits
  • Improve comments, fix vars, and add docs

Build is green

Patch application report for D8753 (id=31775)

Rebasing onto bf2cb039d5...

First, rewinding head to replay your work on top of it...
Applying: feat: Bare minimum implementation of RPM loader
Using index info to reconstruct a base tree...
M	setup.py
Falling back to patching base and 3-way merge...
Auto-merging setup.py
CONFLICT (content): Merge conflict in setup.py
Patch failed at 0001 feat: Bare minimum implementation of RPM loader

Resolve all conflicts manually, mark them as resolved with
"git add/rm <conflicted_files>", then run "git rebase --continue".
You can instead skip this commit: run "git rebase --skip".
To abort and get back to the state before "git rebase", run "git rebase --abort".

Rebase failed (ret=1)!

Could not rebase; Attempt merge onto bf2cb039d5...

Already up to date.
Changes applied before test
commit eedd1308cb2c0797e184ef97e24f783a5e43ff5f
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Sun Oct 23 11:49:47 2022 +0530

    feat: Bare minimum implementation of RPM loader
    
    feat: Make the lister incremental and use build_time as release date

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/1018/ for more details.

feat: Remove microseconds from buildTime metadata to match real values from fedora lister

Build is green

Patch application report for D8753 (id=31802)

Rebasing onto bf2cb039d5...

First, rewinding head to replay your work on top of it...
Applying: feat: Incremental RPM loader implementation
Using index info to reconstruct a base tree...
M	setup.py
Falling back to patching base and 3-way merge...
Auto-merging setup.py
CONFLICT (content): Merge conflict in setup.py
Patch failed at 0001 feat: Incremental RPM loader implementation

Resolve all conflicts manually, mark them as resolved with
"git add/rm <conflicted_files>", then run "git rebase --continue".
You can instead skip this commit: run "git rebase --skip".
To abort and get back to the state before "git rebase", run "git rebase --abort".

Rebase failed (ret=1)!

Could not rebase; Attempt merge onto bf2cb039d5...

Already up to date.
Changes applied before test
commit 6562b90c15ca7a4706370db14bf7af24fe592804
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Sun Oct 23 11:49:47 2022 +0530

    feat: Incremental RPM loader implementation

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/1019/ for more details.

anlambert added a subscriber: anlambert.

@KShivendu , I added some inline comments to improve the loader output.

I also noticed that many rpm archives contain a spec file and some source tarballs in their content.
I think it will be of interest to create extra snapshot branches targeting directories corresponding
to the contents of those tarballs. You can extract those tarballs while building a release and implement
def extra_branches(self) -> Dict[bytes, Mapping[str, Any]]: to reference those extra branches to add.
You can inspire from the diff below:

diff --git a/swh/loader/package/rpm/loader.py b/swh/loader/package/rpm/loader.py
index 2b93dc2..179d5aa 100644
--- a/swh/loader/package/rpm/loader.py
+++ b/swh/loader/package/rpm/loader.py
@@ -1,19 +1,24 @@
-# Copyright (C) 2019-2021  The Software Heritage developers
+# Copyright (C) 2022  The Software Heritage developers
 # See the AUTHORS file at the top-level directory of this distribution
 # License: GNU General Public License version 3, or any later version
 # See top-level LICENSE file for more information
 
+from __future__ import annotations
+
 import logging
-from os import path, remove
+from os import path, remove, walk
 import string
 import subprocess
+import tempfile
 from typing import Any, Dict, Iterator, List, Mapping, Optional, Sequence, Tuple
 
 import attr
+from packaging.version import parse as parse_version
 
 from swh.core.tarball import uncompress
 from swh.loader.package.loader import BasePackageInfo, PackageLoader
 from swh.loader.package.utils import EMPTY_AUTHOR
+from swh.model import from_disk
 from swh.model.model import ObjectType, Release, Sha1Git, TimestampWithTimezone
 from swh.storage.interface import StorageInterface
 
@@ -22,28 +27,25 @@ logger = logging.getLogger(__name__)
 
 @attr.s
 class RpmPackageInfo(BasePackageInfo):
-    raw_info = attr.ib(type=Dict[str, Any])
     name = attr.ib(type=str)
     build_time = attr.ib(type=str, default=None)
     """Build time of the package in iso format. (e.g. 2017-02-10T04:59:31+00:00)"""
 
     EXTID_TYPE = "rpm-sha256"
-    MANIFEST_FORMAT = string.Template("$version $url")
+    MANIFEST_FORMAT = string.Template("$name $version $build_time")
 
     @classmethod
-    def from_metadata(
-        cls, a_metadata: Dict[str, Any], origin: str, version: str
-    ) -> "RpmPackageInfo":
+    def from_metadata(cls, a_metadata: Dict[str, Any], version: str) -> RpmPackageInfo:
         filename = a_metadata["url"].split("/")[-1]
         assert filename.endswith(".rpm")
 
         return cls(
+            name=a_metadata["name"],  # nginx
             url=a_metadata["url"],  # url of the .rpm file
             filename=filename,  # nginx-1.18.0-5.fc34.src.rpm
-            version=version,  # 34/Everything/1.18.0
+            version=version,  # 1.18.0-5.fc34
             build_time=a_metadata["buildTime"],
-            raw_info=a_metadata,
-            name=a_metadata["name"],
+            checksums=a_metadata["checksums"],
         )
 
 
@@ -54,7 +56,7 @@ class RpmLoader(PackageLoader[RpmPackageInfo]):
         self,
         storage: StorageInterface,
         url: str,
-        packages: Mapping[str, Any],
+        packages: Dict[str, Dict[str, Any]],
         **kwargs: Any,
     ):
         """RPM Loader implementation.
@@ -64,7 +66,7 @@ class RpmLoader(PackageLoader[RpmPackageInfo]):
             packages: versioned packages and associated artifacts, example::
 
               {
-                '34/Everything/1.18.0': {
+                '1.18.0-5.fc34': {
                   'name': 'nginx',
                   'version': '1.18.0'
                   'release': 34,
@@ -78,17 +80,20 @@ class RpmLoader(PackageLoader[RpmPackageInfo]):
         super().__init__(storage=storage, url=url, **kwargs)
         self.url = url
         self.packages = packages
+        self.tarball_branches: Dict[bytes, Dict[str, Any]] = {}
 
     def get_versions(self) -> Sequence[str]:
-        """Returns the keys of the packages input (e.g. 34/Everything/1.18.0, etc...)"""
-        return list(self.packages)
+        """Returns the keys of the packages input (e.g. 1.18.0-5.fc34, etc...)"""
+        return list(sorted(self.packages, key=parse_version))
+
+    def get_default_version(self) -> str:
+        """Get the newest release version of a rpm package"""
+        return self.get_versions()[-1]
 
     def get_package_info(self, version: str) -> Iterator[Tuple[str, RpmPackageInfo]]:
         yield (
             version,
-            RpmPackageInfo.from_metadata(
-                self.packages[version], self.origin.url, version
-            ),
+            RpmPackageInfo.from_metadata(self.packages[version], version),
         )
 
     def uncompress(
@@ -100,13 +105,43 @@ class RpmLoader(PackageLoader[RpmPackageInfo]):
     def build_release(
         self, p_info: RpmPackageInfo, uncompressed_path: str, directory: Sha1Git
     ) -> Optional[Release]:
+
+        # extract tarballs that might be located in the root directory of the rpm
+        # package and adds a dedicated branch for it in the snapshot
+        root, _, files = next(walk(uncompressed_path))
+        for file in files:
+            file_path = path.join(root, file)
+            with tempfile.TemporaryDirectory() as tmpdir:
+                try:
+                    uncompress(file_path, tmpdir)
+                except Exception:
+                    # not a tarball
+                    continue
+
+                tarball_dir = from_disk.Directory.from_disk(
+                    path=tmpdir.encode("utf-8"),
+                    max_content_length=self.max_content_size,
+                )
+
+                contents, skipped_contents, directories = from_disk.iter_directory(
+                    tarball_dir
+                )
+                self.storage.skipped_content_add(skipped_contents)
+                self.storage.content_add(contents)
+                self.storage.directory_add(directories)
+
+                self.tarball_branches[file.encode()] = {
+                    "target_type": "directory",
+                    "target": tarball_dir.hash,
+                }
+
         msg = (
             f"Synthetic release for Rpm source package {p_info.name} "
             f"version {p_info.version}\n"
         )
 
         return Release(
-            name=p_info.name.encode(),
+            name=p_info.version.encode(),
             message=msg.encode(),
             author=EMPTY_AUTHOR,
             date=TimestampWithTimezone.from_iso8601(p_info.build_time),
@@ -115,6 +150,9 @@ class RpmLoader(PackageLoader[RpmPackageInfo]):
             synthetic=True,
         )
 
+    def extra_branches(self) -> Dict[bytes, Mapping[str, Any]]:
+        return self.tarball_branches
+
 
 def extract_rpm_package(rpm_path: str, dest: str) -> str:
     """Extracts an RPM package."""
swh/loader/package/rpm/loader.py
1

Copyright (C) 2022

25

this field is not used, you can remove it

31

I would rather use string.Template("$name $version $build_time") as manifest format.

35

origin parameter is not used, you can remove it

36

Add from __future__ import annotations at the top of the file and you can remove the quotes around RpmPackageInfo

43

with recent updates of the lister, version is now 1.18.0-5.fc34

84

we should sort the versions just in case: return list(sorted(self.packages, key=parse_version))

from packaging.version import parse as parse_version
85

You should also implement get_default_version to have the latest version displayed by default in the webapp.

def get_default_version(self) -> str:
    """Get the newest release version of a rpm package"""
    return self.get_versions()[-1]
109

The release name should be the version here otherwise all releases will have the same name.

This revision now requires changes to proceed.Nov 10 2022, 3:05 PM
  • Extract .tar.gz as a seperate branch (and other suggestions made by @anlambert)
  • Remove .tar.gz extraction logic from extract_rpm_package function. Previously, I was just replacing .tar.gz with its extracted folder but now we are creating a separate branch as well.
  • Updating relevant tests for the same

Build is green

Patch application report for D8753 (id=31860)

Rebasing onto bf2cb039d5...

First, rewinding head to replay your work on top of it...
Applying: feat: Incremental RPM loader implementation
Using index info to reconstruct a base tree...
M	setup.py
Falling back to patching base and 3-way merge...
Auto-merging setup.py
CONFLICT (content): Merge conflict in setup.py
Patch failed at 0001 feat: Incremental RPM loader implementation

Resolve all conflicts manually, mark them as resolved with
"git add/rm <conflicted_files>", then run "git rebase --continue".
You can instead skip this commit: run "git rebase --skip".
To abort and get back to the state before "git rebase", run "git rebase --abort".

Rebase failed (ret=1)!

Could not rebase; Attempt merge onto bf2cb039d5...

Already up to date.
Changes applied before test
commit 7ce76fe96751f35dd56fbb2aec9ce56bc7579d53
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Sun Oct 23 11:49:47 2022 +0530

    feat: Incremental RPM loader implementation

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/1025/ for more details.

Minor fixes in the loader docstrings

Build is green

Patch application report for D8753 (id=31862)

Rebasing onto bf2cb039d5...

First, rewinding head to replay your work on top of it...
Applying: feat: Incremental RPM loader implementation
Using index info to reconstruct a base tree...
M	setup.py
Falling back to patching base and 3-way merge...
Auto-merging setup.py
CONFLICT (content): Merge conflict in setup.py
Patch failed at 0001 feat: Incremental RPM loader implementation

Resolve all conflicts manually, mark them as resolved with
"git add/rm <conflicted_files>", then run "git rebase --continue".
You can instead skip this commit: run "git rebase --skip".
To abort and get back to the state before "git rebase", run "git rebase --abort".

Rebase failed (ret=1)!

Could not rebase; Attempt merge onto bf2cb039d5...

Already up to date.
Changes applied before test
commit 70e2577b2b36d048738b6de47de73b808d042374
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Sun Oct 23 11:49:47 2022 +0530

    feat: Incremental RPM loader implementation

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/1026/ for more details.

Build is green

Patch application report for D8753 (id=31889)

Rebasing onto 31ab1aa69e...

Current branch diff-target is up to date.
Changes applied before test
commit 44089c1677721ff720c1f43db696df8945df1140
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Sun Oct 23 11:49:47 2022 +0530

    feat: Incremental RPM loader implementation

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/1030/ for more details.

swh/loader/package/rpm/loader.py
165

Any suggestions on how can I test this? Coverage is 99% because of this.

@KShivendu , we are almost in a landable state for the RPM loader.

I gave a last round of tests to the fedora lister and RPM loader in docker and now that D8847 got landed for the webapp, I have a better understanding of what must be done to have results similar to the debian lister/loader.

I pushed D8848 to slightly update the fedora lister regarding the package versions we should provide to the generic RPM loader.

After these changes, I had to update the RPM loader the following way to have an output similar to the debian loader (diff generated from the current state of that Phab diff):

diff --git a/swh/loader/package/rpm/loader.py b/swh/loader/package/rpm/loader.py
index 8b22685..cf30028 100644
--- a/swh/loader/package/rpm/loader.py
+++ b/swh/loader/package/rpm/loader.py
@@ -17,7 +17,7 @@ from packaging.version import parse as parse_version
 
 from swh.core.tarball import uncompress
 from swh.loader.package.loader import BasePackageInfo, PackageLoader
-from swh.loader.package.utils import EMPTY_AUTHOR
+from swh.loader.package.utils import EMPTY_AUTHOR, release_name
 from swh.model import from_disk
 from swh.model.model import ObjectType, Release, Sha1Git, TimestampWithTimezone
 from swh.storage.interface import StorageInterface
@@ -28,11 +28,13 @@ logger = logging.getLogger(__name__)
 @attr.s
 class RpmPackageInfo(BasePackageInfo):
     name = attr.ib(type=str)
+    intrinsic_version = attr.ib(type=str)
+    """Intrinsic version of the package, independent from the distribution it was found"""
     build_time = attr.ib(type=str, default=None)
     """Build time of the package in iso format. (e.g. 2017-02-10T04:59:31+00:00)"""
 
     EXTID_TYPE = "rpm-sha256"
-    MANIFEST_FORMAT = string.Template("$name $version $build_time")
+    MANIFEST_FORMAT = string.Template("$name $intrinsic_version $build_time")
 
     @classmethod
     def from_metadata(cls, a_metadata: Dict[str, Any], version: str) -> RpmPackageInfo:
@@ -43,7 +45,8 @@ class RpmPackageInfo(BasePackageInfo):
             name=a_metadata["name"],  # nginx
             url=a_metadata["url"],  # url of the .rpm file
             filename=filename,  # nginx-1.18.0-5.fc34.src.rpm
-            version=version,  # 1.18.0-5.fc34
+            version=version,  # fedora34/everything/1.18.0-5
+            intrinsic_version=a_metadata["version"],  # 1.18.0-5
             build_time=a_metadata["buildTime"],
             checksums=a_metadata["checksums"],
         )
@@ -66,9 +69,9 @@ class RpmLoader(PackageLoader[RpmPackageInfo]):
             packages: versioned packages and associated artifacts, example::
 
               {
-                '1.18.0-5.fc34': {
+                'fedora34/everything/1.18.0-5': {
                   'name': 'nginx',
-                  'version': '1.18.0'
+                  'version': '1.18.0-5'
                   'release': 34,
                   'edition': 'Everything',
                   'buildTime': '2022-11-01T12:00:55+00:00',
@@ -87,7 +90,7 @@ class RpmLoader(PackageLoader[RpmPackageInfo]):
         self.tarball_branches: Dict[bytes, Mapping[str, Any]] = {}
 
     def get_versions(self) -> Sequence[str]:
-        """Returns the keys of the packages input (e.g. 1.18.0-5.fc34, etc...)"""
+        """Returns the keys of the packages input (e.g. fedora34/everything/1.18.0-5, etc...)"""
         return list(sorted(self.packages, key=parse_version))
 
     def get_default_version(self) -> str:
@@ -95,7 +98,10 @@ class RpmLoader(PackageLoader[RpmPackageInfo]):
         return self.get_versions()[-1]
 
     def get_package_info(self, version: str) -> Iterator[Tuple[str, RpmPackageInfo]]:
-        yield (version, RpmPackageInfo.from_metadata(self.packages[version], version))
+        yield (
+            release_name(version),
+            RpmPackageInfo.from_metadata(self.packages[version], version),
+        )
 
     def uncompress(
         self, dl_artifacts: List[Tuple[str, Mapping[str, Any]]], dest: str
@@ -141,7 +147,7 @@ class RpmLoader(PackageLoader[RpmPackageInfo]):
         )
 
         return Release(
-            name=p_info.version.encode(),
+            name=p_info.intrinsic_version.encode(),
             message=msg.encode(),
             author=EMPTY_AUTHOR,
             date=TimestampWithTimezone.from_iso8601(p_info.build_time),

Tests still need to be updated though.

swh/loader/package/rpm/loader.py
165

debug logs are not of interest to test in that case and 99% coverage is already quite good, do not bother with it.

This revision now requires changes to proceed.Nov 15 2022, 5:34 PM
swh/loader/package/rpm/tests/test_tasks.py
1–66

Use this instead to test the celery task for the rpm loader:

# Copyright (C) 2022  The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information

import uuid

import pytest

from swh.scheduler.model import ListedOrigin, Lister

NAMESPACE = "swh.loader.package.rpm"

RPM_ORIGIN_URL = "https://src.fedoraproject.org/rpms/0xFFFF"

RPM_PACKAGES = {
    "fedora36/everything/0.10-4": {
        "name": "0xFFFF",
        "version": "0.10-4",
        "release": 36,
        "edition": "Everything",
        "buildTime": "2022-01-19T19:13:53+00:00",
        "url": (
            "https://archives.fedoraproject.org/pub/archive/fedora/linux/releases/"
            "36/Everything/source/tree/Packages/0/0xFFFF-0.10-4.fc36.src.rpm"
        ),
        "checksums": {
            "sha256": "45eee8d990d502324ae665233c320b8a5469c25d735f1862e094c1878d6ff2cd"
        },
    }
}


@pytest.fixture
def fedora_lister():
    return Lister(name="fedora", instance_name="fedora", id=uuid.uuid4())


@pytest.fixture
def fedora_listed_origin(fedora_lister):
    return ListedOrigin(
        lister_id=fedora_lister.id,
        url=RPM_ORIGIN_URL,
        visit_type="rpm",
        extra_loader_arguments={
            "packages": RPM_PACKAGES,
        },
    )


def test_rpm_loader_task_for_listed_origin(
    loading_task_creation_for_listed_origin_test,
    fedora_lister,
    fedora_listed_origin,
):

    loading_task_creation_for_listed_origin_test(
        loader_class_name=f"{NAMESPACE}.loader.RpmLoader",
        task_function_name=f"{NAMESPACE}.tasks.LoadRpm",
        lister=fedora_lister,
        listed_origin=fedora_listed_origin,
    )
  • Use intrinsic version (and other suggestions by @anlambert)
  • Update tests for the same

Build is green

Patch application report for D8753 (id=31894)

Rebasing onto 31ab1aa69e...

Current branch diff-target is up to date.
Changes applied before test
commit d071b137000a47dc53a057563362242c9bc77c39
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Sun Oct 23 11:49:47 2022 +0530

    feat: Incremental RPM loader implementation

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/1031/ for more details.

@KShivendu , before landing this please fix the year in some license headers and update tests to match fedora lister output.

swh/loader/package/rpm/tasks.py
1

(C) 2022

swh/loader/package/rpm/tests/test_rpm.py
1

(C) 2022

29

fedora34/everything/1.18.0-5 instead to match the lister output

31

1.18.0-5 to match the lister output

44–56

same here

This revision now requires changes to proceed.Nov 16 2022, 10:56 AM
  • Fix license headers
  • Fix test cases to match Fedora lister output

Build is green

Patch application report for D8753 (id=31898)

Rebasing onto 31ab1aa69e...

Current branch diff-target is up to date.
Changes applied before test
commit a196c85d5ab1ed3ce09db6800559641c309f51e9
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Sun Oct 23 11:49:47 2022 +0530

    feat: Incremental RPM loader implementation

See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/1032/ for more details.

Looks good to me, thanks !

This revision is now accepted and ready to land.Nov 16 2022, 2:18 PM