Details
- Reviewers
anlambert - Group Reviewers
Reviewers - Maniphest Tasks
- T4581: RubyGems loader
- Commits
- rDLDBASE8e34a6d77996: Rubygems: Improve lister to make use of artifacts and rubygems_metadata
rDLDBASE0022bb50abd1: Add rubygems loader
Diff Detail
- Repository
- rDLDBASE Generic VCS/Package Loader
- Lint
Automatic diff as part of commit; lint not applicable. - Unit
Automatic diff as part of commit; unit tests not applicable.
Event Timeline
Build has FAILED
Patch application report for D8569 (id=30919)
Rebasing onto 1facea3cd2...
Current branch diff-target is up to date.
Changes applied before test
commit fc60a338c728cae7e62b52bccd2adba972b67185 Author: Raphaël Gomès <rgomes@octobus.net> Date: Wed Sep 28 18:21:57 2022 +0200 Add rubygems loader
Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/906/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/906/console
Build is green
Patch application report for D8569 (id=30920)
Rebasing onto 1facea3cd2...
Current branch diff-target is up to date.
Changes applied before test
commit 0e5bb384551bac47c099a120457144dab2cdee83 Author: Raphaël Gomès <rgomes@octobus.net> Date: Wed Sep 28 18:21:57 2022 +0200 Add rubygems loader
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/907/ for more details.
This is the initial plan that I was going for with regards to what to load inside the .gem.
When discussing this over IRC, it became apparent that a more high-level policy decision is needed to know exactly what to do with RubyGems (and .gem files in general). Questions arose like:
Should we load from data.tar.gz contained inside even though it's technically not source code? What to do about the binary artifacts What is the RubyGems policy about code-redistribution license issues in binary-only gems?
I still think that the vast majority of the code will stay the same regardless of the policy (unless we decide not to load anything, of course), so this patch and all the previous exploratory work are not useless.
swh/loader/package/rubygems/loader.py | ||
---|---|---|
112 | I think we should use the value from the built_at field instead for the release date. For instance if you look at all the rails versions in its HTML page, |
Build is green
Patch application report for D8569 (id=30985)
Rebasing onto 6299c091ec...
First, rewinding head to replay your work on top of it... Applying: Add rubygems loader Using index info to reconstruct a base tree... M docs/package-loader-specifications.rst M setup.py Falling back to patching base and 3-way merge... Auto-merging setup.py Auto-merging docs/package-loader-specifications.rst
Changes applied before test
commit f108a1cdc185150ba3bb033985e1ae30378240c8 Author: Raphaël Gomès <rgomes@octobus.net> Date: Wed Sep 28 18:21:57 2022 +0200 Add rubygems loader
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/921/ for more details.
@Alphare, fyi I improved the rubygems lister in that commit in order to gather all artifacts related to a gem and send these info to the loader as extra arguments.
Below is an example of the lister output for a gem:
ListedOrigin( url="https://rubygems.org/gems/haar_joke", visit_type="rubygems", last_update=iso8601.parse_date("2016-11-05T00:00:00+00:00"), extra_loader_arguments={ "artifacts": [ { "url": "https://rubygems.org/downloads/haar_joke-0.0.2.gem", "length": 8704, "version": "0.0.2", "filename": "haar_joke-0.0.2.gem", "checksums": { "sha256": "85a8cf5f41890e9605265eeebfe9e99aa0350a01a3c799f9f55a0615a31a2f5f" }, }, { "url": "https://rubygems.org/downloads/haar_joke-0.0.1.gem", "length": 8704, "version": "0.0.1", "filename": "haar_joke-0.0.1.gem", "checksums": { "sha256": "a2ee7052fb8ffcfc4ec0fdb77fae9a36e473f859af196a36870a0f386b5ab55e" }, }, ], "rubygem_metadata": [ { "date": "2016-11-05T00:00:00+00:00", "authors": "Gemma Gotch", "version": "0.0.2", "extrinsic_metadata_url": "https://rubygems.org/api/v2/rubygems/haar_joke/versions/0.0.2.json", }, { "date": "2016-07-23T00:00:00+00:00", "authors": "Gemma Gotch", "version": "0.0.1", "extrinsic_metadata_url": "https://rubygems.org/api/v2/rubygems/haar_joke/versions/0.0.1.json", }, ], }, }
It enables to improve the scheduling of loading tasks for Ruby gems (by providing last_update value to ListedOrigin)
and it will save you of couple of calls to RubyGems Web API in the loader to fetch the list of versions for a gem.
So loader implementation must be adapted to use these new arguments.
Rubygems: Improve loader to make use of artifacts and rubygems_metadata provided by the lister extra_loader_arguments
Use artifacts and rubygems_metadata to get list of versions, artifacts checksums and extrinsic metadata url
Add an EXTID manifest
Add metadata from extrinsic metadata
Build is green
Patch application report for D8569 (id=31678)
Could not rebase; Attempt merge onto e6847f3616...
Updating e6847f3..081df1d Fast-forward docs/package-loader-specifications.rst | 9 ++ setup.py | 1 + swh/loader/package/rubygems/__init__.py | 17 +++ swh/loader/package/rubygems/loader.py | 170 +++++++++++++++++++++ swh/loader/package/rubygems/tasks.py | 14 ++ .../package/rubygems/tests/data/expected.json | 76 +++++++++ .../api_v2_rubygems_haar_joke_versions_0.0.1.json | 1 + .../api_v2_rubygems_haar_joke_versions_0.0.2.json | 1 + .../downloads_haar_joke-0.0.1.gem | Bin 0 -> 8704 bytes .../downloads_haar_joke-0.0.2.gem | Bin 0 -> 8704 bytes swh/loader/package/rubygems/tests/test_rubygems.py | 62 ++++++++ swh/loader/package/rubygems/tests/test_tasks.py | 61 ++++++++ 12 files changed, 412 insertions(+) create mode 100644 swh/loader/package/rubygems/__init__.py create mode 100644 swh/loader/package/rubygems/loader.py create mode 100644 swh/loader/package/rubygems/tasks.py create mode 100644 swh/loader/package/rubygems/tests/data/expected.json create mode 100644 swh/loader/package/rubygems/tests/data/https_rubygems.org/api_v2_rubygems_haar_joke_versions_0.0.1.json create mode 100644 swh/loader/package/rubygems/tests/data/https_rubygems.org/api_v2_rubygems_haar_joke_versions_0.0.2.json create mode 100644 swh/loader/package/rubygems/tests/data/https_rubygems.org/downloads_haar_joke-0.0.1.gem create mode 100644 swh/loader/package/rubygems/tests/data/https_rubygems.org/downloads_haar_joke-0.0.2.gem create mode 100644 swh/loader/package/rubygems/tests/test_rubygems.py create mode 100644 swh/loader/package/rubygems/tests/test_tasks.py
Changes applied before test
commit 081df1dfec49e66007b230364d0ae31ea92a8094 Author: Franck Bret <franck.bret@octobus.net> Date: Thu Oct 27 18:50:19 2022 +0200 Rubygems: Improve lister to make use of artifacts and rubygems_metadata provided by the lister extra_loader_arguments Use artifacts and rubygems_metadata to get list of versions, artifacts checksums and extrinsic metadata url Add an EXTID manifest Set metadata from extrinsic metadata commit 0022bb50abd156cca30472249a440234a626169b Author: Raphaël Gomès <rgomes@octobus.net> Date: Thu Oct 27 10:37:54 2022 +0200 Add rubygems loader Reviewers: #reviewers, anlambert Subscribers: anlambert Maniphest Tasks: T4581 Differential Revision: https://forge.softwareheritage.org/D8569
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/1010/ for more details.
@anlambert Please note I used 'rubygems_metadata' instead of 'rubygem_metadata' as in the lister. Maybe I'm wrong but I think the lister should rename to rubygems_metadata?
@franckbret, I did not use plural because we are processing a single gem in the loader (with multiple versions but those are metadata for a single gem).
So I do not think we should modify the lister output.
Could you also test that raw extrinsic metadata have been correctly stored in the storage and can be retrieved ?
You can inspire from the cpan loader test to implement it.
I also added some inline comments to handle.
swh/loader/package/rubygems/loader.py | ||
---|---|---|
133 | there is no need to parse the received JSON here | |
146 | Use rubygem-release-json instead as format name (cpan loader use cpan-release-json). | |
147 | metadata=extrinsic_metadata_raw | |
swh/loader/package/rubygems/tests/test_rubygems.py | ||
15–18 | It would be easier to read if you inline the loader input data in that file imho, plus you only use the first element of the expected list and the second one is a copy of the first one. |
Build is green
Patch application report for D8569 (id=31716)
Could not rebase; Attempt merge onto e6847f3616...
Updating e6847f3..8e34a6d Fast-forward docs/package-loader-specifications.rst | 9 + setup.py | 1 + swh/loader/package/rubygems/__init__.py | 17 ++ swh/loader/package/rubygems/loader.py | 168 ++++++++++++++++++ swh/loader/package/rubygems/tasks.py | 14 ++ .../api_v2_rubygems_haar_joke_versions_0.0.1.json | 1 + .../api_v2_rubygems_haar_joke_versions_0.0.2.json | 1 + .../downloads_haar_joke-0.0.1.gem | Bin 0 -> 8704 bytes .../downloads_haar_joke-0.0.2.gem | Bin 0 -> 8704 bytes swh/loader/package/rubygems/tests/test_rubygems.py | 196 +++++++++++++++++++++ swh/loader/package/rubygems/tests/test_tasks.py | 61 +++++++ 11 files changed, 468 insertions(+) create mode 100644 swh/loader/package/rubygems/__init__.py create mode 100644 swh/loader/package/rubygems/loader.py create mode 100644 swh/loader/package/rubygems/tasks.py create mode 100644 swh/loader/package/rubygems/tests/data/https_rubygems.org/api_v2_rubygems_haar_joke_versions_0.0.1.json create mode 100644 swh/loader/package/rubygems/tests/data/https_rubygems.org/api_v2_rubygems_haar_joke_versions_0.0.2.json create mode 100644 swh/loader/package/rubygems/tests/data/https_rubygems.org/downloads_haar_joke-0.0.1.gem create mode 100644 swh/loader/package/rubygems/tests/data/https_rubygems.org/downloads_haar_joke-0.0.2.gem create mode 100644 swh/loader/package/rubygems/tests/test_rubygems.py create mode 100644 swh/loader/package/rubygems/tests/test_tasks.py
Changes applied before test
commit 8e34a6d7799660ef89beb683fc5577887e0366a3 Author: Franck Bret <franck.bret@octobus.net> Date: Thu Oct 27 18:50:19 2022 +0200 Rubygems: Improve lister to make use of artifacts and rubygems_metadata provided by the lister extra_loader_arguments Use artifacts and rubygems_metadata to get list of versions, artifacts checksums and extrinsic metadata url Add an EXTID manifest Set metadata from extrinsic metadata commit 0022bb50abd156cca30472249a440234a626169b Author: Raphaël Gomès <rgomes@octobus.net> Date: Thu Oct 27 10:37:54 2022 +0200 Add rubygems loader Reviewers: #reviewers, anlambert Subscribers: anlambert Maniphest Tasks: T4581 Differential Revision: https://forge.softwareheritage.org/D8569
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/1011/ for more details.