
Pub.dev lister for Dart and Flutter packages
Closed · Public

Authored by franckbret on Aug 23 2022, 10:46 AM.

Details

Summary

Stateless lister for https://pub.dev, based on its HTTP API, that lists package names and versions.

Diff Detail

Repository: rDLS Listers
Branch: pubdev
Lint: No Linters Available
Unit: No Unit Test Coverage
Build Status: Buildable 30973
Build 48444: Phabricator diff pipeline on jenkins (Jenkins console · Jenkins)
Build 48443: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D8287 (id=29921)

Rebasing onto dde7865ac4...

First, rewinding head to replay your work on top of it...
Applying: [WIP] Pub.dev lister for Dart and Flutter packages
Changes applied before test
commit b026fbc39c3e296362d1bf5fa773aa814e45fc66
Author: Franck Bret <franck.bret@octobus.net>
Date:   Tue Aug 23 10:41:23 2022 +0200

    [WIP] Pub.dev lister for Dart and Flutter packages
    
    Stateless lister for https://pub.dev based on http api to list package names and versions

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/601/ for more details.

Hey,

If I understand this correctly:

  1. get_pages() first gets a list of packages, then
  2. get_pages() fetches all these pages, then
  3. get_origins_from_page() creates an origin for each page, with the list of versions of that package

Given that the lister's job is only to discover a list of origins, I think the lister should only do step 1 and create a ListedOrigin.
The loader should then fetch the list of versions of each package, which is part of its job.

The only reason some listers fetch the list of versions is that it would not be otherwise available to loaders, but that is not true here, so we should avoid it.

This keeps the architecture closer to what we want to do, and avoids bloating the scheduler database with (potentially long) lists of releases in each row.
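
A minimal sketch of the split suggested above, not the actual swh.lister code: the lister turns package names into origins and nothing more, and version discovery is left to a loader-side function. `Origin`, `origins_from_names`, `fetch_versions`, the `pubdev` visit type, both URL patterns, and the response shape are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List

# Assumed URL patterns, for illustration only.
ORIGIN_URL = "https://pub.dev/packages/{name}"
PACKAGE_INFO_URL = "https://pub.dev/api/packages/{name}"


@dataclass(frozen=True)
class Origin:
    """Hypothetical stand-in for the scheduler's ListedOrigin."""

    url: str
    visit_type: str = "pubdev"


def origins_from_names(names: Iterable[str]) -> Iterator[Origin]:
    """Lister side: one origin per package name, no version data."""
    for name in names:
        yield Origin(url=ORIGIN_URL.format(name=name))


def fetch_versions(name: str, get_json: Callable[[str], Dict]) -> List[str]:
    """Loader side: look up versions only when the origin is visited.

    get_json is injected so the sketch runs without network access; the
    {"versions": [{"version": ...}]} response shape is an assumption.
    """
    info = get_json(PACKAGE_INFO_URL.format(name=name))
    return [entry["version"] for entry in info.get("versions", [])]
```

With this split, each scheduler row only carries the origin URL and visit type, which is the point made above about not storing release lists per row.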

OK, that answers the questions I wanted to ask. I will change the lister so that each ListedOrigin points to the second API endpoint.

Remove package versions data from lister.

Build is green

Patch application report for D8287 (id=29951)

Rebasing onto dde7865ac4...

First, rewinding head to replay your work on top of it...
Applying: [WIP] Pub.dev lister for Dart and Flutter packages
Changes applied before test
commit 0503d9b9484e602c20654e58bbc898126db562f0
Author: Franck Bret <franck.bret@octobus.net>
Date:   Tue Aug 23 10:41:23 2022 +0200

    [WIP] Pub.dev lister for Dart and Flutter packages
    
    Stateless lister for https://pub.dev based on http api to list package names

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/606/ for more details.

vlorentz added inline comments.
swh/lister/pubdev/lister.py
Line 89

I feel it would make more sense to move this loop to get_origins_from_page, since we only have one page. (Not a big deal, in the end the result is the same)
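
A sketch of the arrangement described in this comment, assuming standalone functions rather than the real lister methods: `get_pages` yields a single page holding every package name, and the loop lives in `get_origins_from_page`. The origin URL pattern is an assumption.

```python
from typing import Iterable, Iterator, List


def get_pages(package_names: Iterable[str]) -> Iterator[List[str]]:
    """Yield a single page: the full list of package names."""
    yield list(package_names)


def get_origins_from_page(page: List[str]) -> Iterator[str]:
    """Loop over the page here, yielding one origin URL per package."""
    for name in page:
        # Assumed origin URL pattern, for illustration only.
        yield f"https://pub.dev/packages/{name}"
```

Either placement of the loop produces the same origins; this version only keeps the per-package work in the function whose name says it builds origins.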

Line 105

last_update should be None when the lister doesn't know when the package was last updated.

Setting a non-None value would make the scheduler believe we know for sure there is an update, so it would prioritize loading this origin over others.
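
A toy illustration of the concern, not the scheduler's real policy: `Origin` and `known_updates_first` are hypothetical, and the ordering rule is deliberately simplistic. It only shows why a made-up timestamp would move an origin ahead of those whose update time is honestly unknown.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Origin:
    """Hypothetical stand-in for a listed origin."""

    url: str
    last_update: Optional[datetime] = None  # None means "unknown"


def known_updates_first(origins: List[Origin]) -> List[Origin]:
    """Toy ordering: origins with a known last_update come first.

    sorted() is stable, so origins within each group keep their order.
    """
    return sorted(origins, key=lambda o: o.last_update is None)
```

Leaving `last_update` at its `None` default keeps an origin in the "unknown" group instead of claiming an update that the lister never observed.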

swh/lister/pubdev/tests/test_lister.py
Lines 24–48

You can use set equality to simplify this. On failure, pytest diffs set contents just as well as it does lists.

Also I removed a redundant assignment and length comparison.
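
A sketch of what the set-based comparison could look like, assuming origins are plain dicts with `url` and `visit_type` keys; the helper name and sample data are hypothetical. Dicts are not hashable, so each record is reduced to a tuple before building the set.

```python
from typing import Dict, Iterable, Set, Tuple


def origin_keys(origins: Iterable[Dict[str, str]]) -> Set[Tuple[str, str]]:
    """Reduce origin records to hashable (url, visit_type) pairs."""
    return {(o["url"], o["visit_type"]) for o in origins}


def test_listed_origins_match_expected() -> None:
    # Hypothetical expected and listed records; the order differs on purpose.
    expected = [
        {"url": "https://pub.dev/packages/Autolinker", "visit_type": "pubdev"},
        {"url": "https://pub.dev/packages/pdf", "visit_type": "pubdev"},
    ]
    listed = list(reversed(expected))
    # One set comparison replaces the element-by-element loop.
    assert origin_keys(listed) == origin_keys(expected)
```

Note that a set drops duplicates, so this check alone would not catch an origin listed twice.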

This revision now requires changes to proceed. Aug 24 2022, 1:58 PM

Some improvements after review

Lister now has a single page
Simpler dict comparison in tests

Build is green

Patch application report for D8287 (id=29988)

Rebasing onto 4b511b4181...

First, rewinding head to replay your work on top of it...
Applying: Pub.dev lister for Dart and Flutter packages
Changes applied before test
commit 9d010ef6018b4ca422f87b3ffd19cb42df6a8659
Author: Franck Bret <franck.bret@octobus.net>
Date:   Tue Aug 23 10:41:23 2022 +0200

    Pub.dev lister for Dart and Flutter packages
    
    Stateless lister for https://pub.dev based on http api to list package names

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/610/ for more details.

franckbret retitled this revision from [WIP] Pub.dev lister for Dart and Flutter packages to Pub.dev lister for Dart and Flutter packages.

Build is green

Patch application report for D8287 (id=30026)

Rebasing onto ce72969de5...

First, rewinding head to replay your work on top of it...
Applying: Pub.dev lister for Dart and Flutter packages
Changes applied before test
commit b32f54622ddfa67cb129cc0226d0c27fd2729597
Author: Franck Bret <franck.bret@octobus.net>
Date:   Tue Aug 23 10:41:23 2022 +0200

    Pub.dev lister for Dart and Flutter packages
    
    Stateless lister for https://pub.dev based on http api to list package names

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/620/ for more details.

This revision is now accepted and ready to land. Aug 26 2022, 10:12 AM

Build is green

Patch application report for D8287 (id=30050)

Rebasing onto ce72969de5...

Current branch diff-target is up to date.
Changes applied before test
commit 5410b6e3f38a31b9d20befd30e37ad8c85b8ae8e
Author: Franck Bret <franck.bret@octobus.net>
Date:   Tue Aug 23 10:41:23 2022 +0200

    Pub.dev lister for Dart and Flutter packages
    
    Stateless lister for https://pub.dev based on http api to list package names

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/621/ for more details.