Stateless lister for https://pub.dev, based on the HTTP API, to list package names and versions
Details
- Reviewers: vlorentz
- Group Reviewers: Reviewers
- Maniphest Tasks: T4465: Ingest pub.dev (Dart, Flutter)
- Commits: rDLS5410b6e3f38a: Pub.dev lister for Dart and Flutter packages
Diff Detail
- Repository: rDLS Listers
- Branch: pubdev
- Lint: No Linters Available
- Unit: No Unit Test Coverage
- Build Status: Buildable 31076
  - Build 48618: Phabricator diff pipeline on jenkins
  - Build 48617: arc lint + arc unit
Event Timeline
Build is green
Patch application report for D8287 (id=29921)
Rebasing onto dde7865ac4...
First, rewinding head to replay your work on top of it...
Applying: [WIP] Pub.dev lister for Dart and Flutter packages
Changes applied before test
commit b026fbc39c3e296362d1bf5fa773aa814e45fc66
Author: Franck Bret <franck.bret@octobus.net>
Date: Tue Aug 23 10:41:23 2022 +0200

    [WIP] Pub.dev lister for Dart and Flutter packages

    Stateless lister for https://pub.dev based on http api to list package names and versions
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/601/ for more details.
Hey,
If I understand this correctly:
- get_pages() first gets a list of packages, then
- get_pages() fetches all these pages, then
- get_origins_from_page() creates an origin for each page, carrying the list of versions of that package
Given that the lister's job is only to discover a list of origins, I think the lister should only do step 1 and create a ListedOrigin for each package.
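A minimal, dependency-free sketch of that "step 1 only" shape, assuming the package-names endpoint returns JSON of the form `{"packages": [...]}`. `ListedOrigin` below is a hypothetical stand-in for `swh.scheduler.model.ListedOrigin`; the URL patterns, the `"pubdev"` visit type, and the injected `fetch_json` callable are assumptions made so the sketch runs without network access.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Dict, Iterator, List, Optional

# Assumed endpoint patterns for this sketch; verify against the pub.dev API.
BASE_URL = "https://pub.dev/"
PACKAGE_NAMES_URL = BASE_URL + "api/package-names"
PACKAGE_INFO_URL_PATTERN = BASE_URL + "api/packages/{pkgname}"

# A single page is simply the list of package names.
PubDevListerPage = List[str]


@dataclass(frozen=True)
class ListedOrigin:
    """Hypothetical stand-in for swh.scheduler.model.ListedOrigin."""

    url: str
    visit_type: str
    last_update: Optional[datetime] = None


def get_pages(fetch_json: Callable[[str], Dict[str, Any]]) -> Iterator[PubDevListerPage]:
    """Step 1 only: fetch the package names once and yield them as one page."""
    yield fetch_json(PACKAGE_NAMES_URL)["packages"]


def get_origins_from_page(page: PubDevListerPage) -> Iterator[ListedOrigin]:
    """Yield one origin per package; no per-package request, no version list."""
    for pkgname in page:
        yield ListedOrigin(
            url=PACKAGE_INFO_URL_PATTERN.format(pkgname=pkgname),
            visit_type="pubdev",  # assumed visit type name
            last_update=None,  # the lister does not know when the package changed
        )
```

For example, a fake `fetch_json` returning `{"packages": ["http", "path"]}` produces one page and two origins pointing at the per-package endpoints, without the lister ever requesting those endpoints itself.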
The loader should then fetch the list of versions of each package, which is part of its job.
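On the loader side, the version list would then be extracted from the per-package response. The helper below is hypothetical (it is not part of any Software Heritage loader) and assumes the payload has a `versions` array whose entries carry `version` and `archive_url` fields; it only illustrates which work moves out of the lister.

```python
import json
from typing import Any, Dict, List


def extract_versions(package_info_json: str) -> Dict[str, str]:
    """Map each version string to its archive URL (hypothetical helper).

    Assumes a payload shaped like
    {"name": ..., "versions": [{"version": ..., "archive_url": ...}, ...]}.
    A payload without a "versions" key yields an empty mapping.
    """
    info = json.loads(package_info_json)
    versions: List[Dict[str, Any]] = info.get("versions", [])
    return {entry["version"]: entry["archive_url"] for entry in versions}
```

Because this parsing happens at load time, the scheduler row for each origin only needs the package URL, not a (potentially long) list of releases.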
The only reason some listers fetch the list of versions is that it would not otherwise be available to loaders, but that is not the case here, so we should avoid it.
This keeps the architecture closer to what we want to do, and avoids bloating the scheduler database with (potentially long) lists of releases in each row.
OK, you answered the questions I wanted to ask. I will change the lister to create a ListedOrigin whose URL is the second API endpoint (the per-package one).
Build is green
Patch application report for D8287 (id=29951)
Rebasing onto dde7865ac4...
First, rewinding head to replay your work on top of it...
Applying: [WIP] Pub.dev lister for Dart and Flutter packages
Changes applied before test
commit 0503d9b9484e602c20654e58bbc898126db562f0
Author: Franck Bret <franck.bret@octobus.net>
Date: Tue Aug 23 10:41:23 2022 +0200

    [WIP] Pub.dev lister for Dart and Flutter packages

    Stateless lister for https://pub.dev based on http api to list package names
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/606/ for more details.
swh/lister/pubdev/lister.py
- Line 89: I feel it would make more sense to move this loop to get_origins_from_page, since we only have one page. (Not a big deal; in the end the result is the same.)
- Line 105: last_update should be None when the lister doesn't know when the package was last updated. Setting a non-None value would make the scheduler believe we know for sure there is an update, so it would prioritize loading this origin over others.
swh/lister/pubdev/tests/test_lister.py
- Lines 24–48: You can use set equality to simplify this. On error, pytest can diff set contents just as well as lists. Also, I removed a redundant assignment and length comparison.
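A sketch of the set-equality style suggested for the tests. It is self-contained: `ListedOrigin` is a hypothetical stand-in for `swh.scheduler.model.ListedOrigin`, `check_listed_origins` is a hypothetical helper, and the `https://pub.dev/api/packages/{name}` origin URL pattern is an assumption. It also checks the point above that `last_update` stays `None`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class ListedOrigin:
    """Hypothetical stand-in for swh.scheduler.model.ListedOrigin."""

    url: str
    visit_type: str
    last_update: Optional[datetime] = None


def check_listed_origins(origins: List[ListedOrigin], expected_names: List[str]) -> None:
    """Compare listed origin URLs with the expected ones as sets.

    Listing order does not matter, and when the sets differ pytest's
    assertion rewriting reports the extra items on each side.
    """
    expected_urls = {f"https://pub.dev/api/packages/{name}" for name in expected_names}
    assert {origin.url for origin in origins} == expected_urls
    # The lister has no reliable update date, so last_update must stay unset.
    assert all(origin.last_update is None for origin in origins)
```

Note that a set comparison ignores duplicates; if duplicate origins are a concern, that needs its own check.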
Some improvements after review:
- Lister is now one page
- Simpler dict comparison in tests
Build is green
Patch application report for D8287 (id=29988)
Rebasing onto 4b511b4181...
First, rewinding head to replay your work on top of it...
Applying: Pub.dev lister for Dart and Flutter packages
Changes applied before test
commit 9d010ef6018b4ca422f87b3ffd19cb42df6a8659
Author: Franck Bret <franck.bret@octobus.net>
Date: Tue Aug 23 10:41:23 2022 +0200

    Pub.dev lister for Dart and Flutter packages

    Stateless lister for https://pub.dev based on http api to list package names
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/610/ for more details.
Build is green
Patch application report for D8287 (id=30026)
Rebasing onto ce72969de5...
First, rewinding head to replay your work on top of it...
Applying: Pub.dev lister for Dart and Flutter packages
Changes applied before test
commit b32f54622ddfa67cb129cc0226d0c27fd2729597
Author: Franck Bret <franck.bret@octobus.net>
Date: Tue Aug 23 10:41:23 2022 +0200

    Pub.dev lister for Dart and Flutter packages

    Stateless lister for https://pub.dev based on http api to list package names
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/620/ for more details.
Build is green
Patch application report for D8287 (id=30050)
Rebasing onto ce72969de5...
Current branch diff-target is up to date.
Changes applied before test
commit 5410b6e3f38a31b9d20befd30e37ad8c85b8ae8e
Author: Franck Bret <franck.bret@octobus.net>
Date: Tue Aug 23 10:41:23 2022 +0200

    Pub.dev lister for Dart and Flutter packages

    Stateless lister for https://pub.dev based on http api to list package names
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/621/ for more details.