Software Heritage

Jan 8 2023

gitlab-migration closed T1246: pypi loader: Analyze existing errors as Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 10:21 PM · Archive coverage, Origin-Pypi
gitlab-migration changed the status of T2098: Deploy package loaders from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 10:00 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage
gitlab-migration changed the status of T2023: Re-implement gnu loader with package loader, a subtask of T1389: Implement a base "package" loader for package managers, from Resolved to Migrated.
Jan 8 2023, 9:59 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage
gitlab-migration changed the status of T2022: Re-implement npm loader with base loader, a subtask of T1389: Implement a base "package" loader for package managers, from Resolved to Migrated.
Jan 8 2023, 9:59 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage
gitlab-migration changed the status of T2021: Re-implement pypi loader with package loader from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 9:59 PM · Origin-Pypi, Archive coverage
gitlab-migration changed the status of T2021: Re-implement pypi loader with package loader, a subtask of T1389: Implement a base "package" loader for package managers, from Resolved to Migrated.
Jan 8 2023, 9:59 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage
gitlab-migration changed the status of T1389: Implement a base "package" loader for package managers from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 9:58 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage
gitlab-migration changed the status of T1181: pypi: Schedule ingestion from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 9:58 PM · Archive coverage, Origin-Pypi
gitlab-migration changed the status of T1181: pypi: Schedule ingestion, a subtask of T419: ingest PyPI into the Software Heritage archive (meta task), from Resolved to Migrated.
Jan 8 2023, 9:58 PM · Archive coverage, Origin-Pypi
gitlab-migration changed the status of T420: mirror PyPI, a subtask of T419: ingest PyPI into the Software Heritage archive (meta task), from Wontfix to Migrated.
Jan 8 2023, 9:56 PM · Archive coverage, Origin-Pypi
gitlab-migration changed the status of T420: mirror PyPI from Wontfix to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 9:56 PM · Origin-Pypi
gitlab-migration changed the status of T422: PyPI lister, a subtask of T419: ingest PyPI into the Software Heritage archive (meta task), from Resolved to Migrated.
Jan 8 2023, 9:56 PM · Archive coverage, Origin-Pypi
gitlab-migration changed the status of T422: PyPI lister from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 9:56 PM · Developers, Origin-Pypi
gitlab-migration changed the status of T419: ingest PyPI into the Software Heritage archive (meta task) from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 9:56 PM · Archive coverage, Origin-Pypi
gitlab-migration changed the status of T3399: Improve PyPI lister to pull last update information when running incrementally from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:34 PM · Origin-Pypi, Lister
gitlab-migration changed the status of T2025: Re-implement debian loader with package loader mechanism, a subtask of T1389: Implement a base "package" loader for package managers, from Resolved to Migrated.
Jan 8 2023, 4:28 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage
gitlab-migration changed the status of T2024: Re-implement deposit loader with package loader mechanism, a subtask of T1389: Implement a base "package" loader for package managers, from Resolved to Migrated.
Jan 8 2023, 4:28 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage
gitlab-migration changed the status of T2017: package loader: Discuss revision metadata normalization, a subtask of T1389: Implement a base "package" loader for package managers, from Resolved to Migrated.
Jan 8 2023, 4:28 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage
gitlab-migration changed the status of T1890: pypi lister: Add tests from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:27 PM · Origin-Pypi, Lister
gitlab-migration changed the status of T421: PyPI loader from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:19 PM · PyPI loader, Origin-Pypi
gitlab-migration changed the status of T421: PyPI loader, a subtask of T419: ingest PyPI into the Software Heritage archive (meta task), from Resolved to Migrated.
Jan 8 2023, 4:19 PM · Archive coverage, Origin-Pypi

Jul 9 2021

ardumont closed T3399: Improve PyPI lister to pull last update information when running incrementally as Resolved.

Deployed and running so closing.

Jul 9 2021, 2:52 PM · Origin-Pypi, Lister
ardumont added a comment to T3399: Improve PyPI lister to pull last update information when running incrementally.

Deployed in production as well and triggered a run:

Jul 9 2021, 2:51 PM · Origin-Pypi, Lister
ardumont added a comment to T3399: Improve PyPI lister to pull last update information when running incrementally.

From a quick test, it looks like the "Project and release activity details" feed can go back multiple years without any issue, allowing us to backfill the data for all known origins, before adding the incremental behavior to the lister.

Jul 9 2021, 2:09 PM · Origin-Pypi, Lister
ardumont added a comment to T3399: Improve PyPI lister to pull last update information when running incrementally.

Deployed in staging and triggered a run:

Jul 9 2021, 2:03 PM · Origin-Pypi, Lister
ardumont closed D5977: Make PyPI lister incremental and complete in regards to last_update.
Jul 9 2021, 12:59 PM · Scheduling utilities, Origin-Pypi
swh-public-ci added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.

Build is green

Jul 9 2021, 12:56 PM · Scheduling utilities, Origin-Pypi
ardumont updated the diff for D5977: Make PyPI lister incremental and complete in regards to last_update.
  • Adapt according to the last suggestion about the stub
  • and, following the IRC discussion, add a sleep call to avoid the initial throttling caused by the initialization of the XML-RPC client [1]
Jul 9 2021, 12:53 PM · Scheduling utilities, Origin-Pypi
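The sleep call mentioned in this diff update can be sketched as follows. This is a minimal illustration, not the code from D5977: the function name make_pypi_client and the one-second default delay are hypothetical. Only xmlrpc.client.ServerProxy and the https://pypi.org/pypi endpoint are real.

```python
import time
import xmlrpc.client

# Hypothetical default: the real delay used in D5977 is not recorded here.
DEFAULT_INITIAL_DELAY = 1.0


def make_pypi_client(
    url: str = "https://pypi.org/pypi",
    initial_delay: float = DEFAULT_INITIAL_DELAY,
) -> xmlrpc.client.ServerProxy:
    """Build an XML-RPC proxy for PyPI, then wait before the first call.

    Constructing a ServerProxy does not contact the server; the pause only
    spaces the first real request away from the client setup.
    """
    client = xmlrpc.client.ServerProxy(url)
    time.sleep(initial_delay)
    return client


# Usage (no network I/O happens until a remote method is actually called):
# client = make_pypi_client()
# last = client.changelog_last_serial()
```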
swh-public-ci added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.

Build is green

Jul 9 2021, 12:21 PM · Scheduling utilities, Origin-Pypi
ardumont added inline comments to D5977: Make PyPI lister incremental and complete in regards to last_update.
Jul 9 2021, 12:19 PM · Scheduling utilities, Origin-Pypi
ardumont added inline comments to D5977: Make PyPI lister incremental and complete in regards to last_update.
Jul 9 2021, 12:18 PM · Scheduling utilities, Origin-Pypi
ardumont updated the diff for D5977: Make PyPI lister incremental and complete in regards to last_update.

Address most of the review comments (thx)

Jul 9 2021, 12:18 PM · Scheduling utilities, Origin-Pypi
olasd added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.
Jul 9 2021, 12:11 PM · Scheduling utilities, Origin-Pypi
ardumont added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.

Looks good, thanks a lot!

Jul 9 2021, 12:06 PM · Scheduling utilities, Origin-Pypi
olasd accepted D5977: Make PyPI lister incremental and complete in regards to last_update.

Looks good, thanks a lot!

Jul 9 2021, 11:57 AM · Scheduling utilities, Origin-Pypi
ardumont updated the test plan for D5977: Make PyPI lister incremental and complete in regards to last_update.
Jul 9 2021, 11:47 AM · Scheduling utilities, Origin-Pypi
ardumont updated the test plan for D5977: Make PyPI lister incremental and complete in regards to last_update.
Jul 9 2021, 11:45 AM · Scheduling utilities, Origin-Pypi
ardumont updated the test plan for D5977: Make PyPI lister incremental and complete in regards to last_update.
Jul 9 2021, 11:41 AM · Scheduling utilities, Origin-Pypi
ardumont updated the test plan for D5977: Make PyPI lister incremental and complete in regards to last_update.
Jul 9 2021, 11:40 AM · Scheduling utilities, Origin-Pypi
ardumont updated the test plan for D5977: Make PyPI lister incremental and complete in regards to last_update.
Jul 9 2021, 11:15 AM · Scheduling utilities, Origin-Pypi
swh-public-ci added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.

Build is green

Jul 9 2021, 11:15 AM · Scheduling utilities, Origin-Pypi
ardumont updated the diff for D5977: Make PyPI lister incremental and complete in regards to last_update.
  • Handle throttling appropriately (adapt throttling decorators [1])
  • Rework the logic to still deal with pages of results so that writes to the database occur incrementally
Jul 9 2021, 11:12 AM · Scheduling utilities, Origin-Pypi
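The difference between sending all origins in one go and writing page by page can be illustrated with a small sketch. The flush callback is a hypothetical stand-in for the scheduler database write, and failing_pages is a made-up generator that fails after two pages; neither comes from the lister itself.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

# PyPI changelog entry: (name, version, timestamp, action, serial).
Entry = Tuple[str, Optional[str], int, str, int]
Page = List[Entry]


def flush_at_end(pages: Iterable[Page], flush: Callable[[Page], None]) -> None:
    """Collect everything, then write once: a failure mid-run loses it all."""
    collected: Page = []
    for page in pages:
        collected.extend(page)
    flush(collected)


def flush_per_page(pages: Iterable[Page], flush: Callable[[Page], None]) -> None:
    """Write each page as soon as it arrives, so earlier pages are kept."""
    for page in pages:
        flush(page)


def failing_pages() -> Iterator[Page]:
    """Two good pages, then an error, as a long first run might see."""
    yield [("foo", "1.0", 1_600_000_000, "new release", 1)]
    yield [("bar", "0.1", 1_600_000_100, "new release", 2)]
    raise RuntimeError("simulated failure after two pages")
```

With flush_per_page, the two pages fetched before the failure are already written; with flush_at_end, nothing is.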
ardumont updated the test plan for D5977: Make PyPI lister incremental and complete in regards to last_update.
Jul 9 2021, 11:10 AM · Scheduling utilities, Origin-Pypi
ardumont planned changes to D5977: Make PyPI lister incremental and complete in regards to last_update.

Running through Docker, I actually need to change a few things:

  • throttling needs to change (D5983)
  • the current logic of sending all origins in one go won't work for the first run (nothing is flushed before the run actually finishes...), so I'll need to rework this (in progress) [1]
Jul 9 2021, 10:59 AM · Scheduling utilities, Origin-Pypi
ardumont added a revision to T3399: Improve PyPI lister to pull last update information when running incrementally: D5983: pypi.lister: Handle xml-rpc throttling properly.
Jul 9 2021, 10:50 AM · Origin-Pypi, Lister
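A throttling-aware retry around XML-RPC calls might look like the sketch below. It is not the decorator from D5983. Two points are assumptions: that PyPI signals rate limiting as an xmlrpc.client.Fault whose message mentions "HTTPTooManyRequests", and the exponential backoff policy itself. The names retry_when_throttled and is_throttled are hypothetical.

```python
import functools
import time
import xmlrpc.client
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def is_throttled(fault: xmlrpc.client.Fault) -> bool:
    # Assumption: the throttling fault message mentions "HTTPTooManyRequests".
    return "HTTPTooManyRequests" in str(fault.faultString)


def retry_when_throttled(
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[[F], F]:
    """Retry an XML-RPC call with exponential backoff when throttled."""

    def decorator(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except xmlrpc.client.Fault as fault:
                    # Other faults, and the last failed attempt, propagate.
                    if not is_throttled(fault) or attempt == max_attempts - 1:
                        raise
                    sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...

        return wrapper  # type: ignore[return-value]

    return decorator
```

Injecting sleep keeps the decorator testable without real waits.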
swh-public-ci added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.

Build is green

Jul 9 2021, 10:18 AM · Scheduling utilities, Origin-Pypi
ardumont updated the diff for D5977: Make PyPI lister incremental and complete in regards to last_update.

Rebase

Jul 9 2021, 10:15 AM · Scheduling utilities, Origin-Pypi
swh-public-ci added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.

Build is green

Jul 9 2021, 10:11 AM · Scheduling utilities, Origin-Pypi
ardumont updated the diff for D5977: Make PyPI lister incremental and complete in regards to last_update.
  • Rebase
  • Refactor
Jul 9 2021, 10:08 AM · Scheduling utilities, Origin-Pypi
swh-public-ci added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.

Build is green

Jul 9 2021, 10:03 AM · Scheduling utilities, Origin-Pypi
ardumont retitled D5977: Make PyPI lister incremental and complete in regards to last_update from Add incremental PyPI lister which fetches last_update per origin to Make PyPI lister incremental and complete in regards to last_update.
Jul 9 2021, 10:01 AM · Scheduling utilities, Origin-Pypi
ardumont updated the test plan for D5977: Make PyPI lister incremental and complete in regards to last_update.
Jul 9 2021, 10:01 AM · Scheduling utilities, Origin-Pypi
ardumont updated the diff for D5977: Make PyPI lister incremental and complete in regards to last_update.

Adapt according to review:

  • This completely rewrites the lister, which stops using the simple API [1]
  • Uses the client.changelog_last_serial and client.changelog_since_serial endpoints
Jul 9 2021, 10:00 AM · Scheduling utilities, Origin-Pypi
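One way to get a last_update value per origin out of changelog entries is to keep the most recent timestamp seen per project. This is a hedged sketch. It assumes entries have the (name, version, timestamp, action, serial) shape returned by PyPI's changelog methods, with Unix timestamps. The function name last_update_per_project is hypothetical and does not describe how D5977 stores the result.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, Optional, Tuple

# PyPI changelog entry: (name, version, timestamp, action, serial).
Entry = Tuple[str, Optional[str], int, str, int]


def last_update_per_project(entries: Iterable[Entry]) -> Dict[str, datetime]:
    """Map each project name to its most recent changelog timestamp (UTC)."""
    latest: Dict[str, int] = {}
    for name, _version, timestamp, _action, _serial in entries:
        # Any action (release, file upload, removal...) counts as an update.
        if name not in latest or timestamp > latest[name]:
            latest[name] = timestamp
    return {
        name: datetime.fromtimestamp(ts, tz=timezone.utc)
        for name, ts in latest.items()
    }
```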
ardumont added inline comments to D5977: Make PyPI lister incremental and complete in regards to last_update.
Jul 9 2021, 8:52 AM · Scheduling utilities, Origin-Pypi

Jul 8 2021

ardumont added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.

From my quick testing, I have noticed that the changelog methods only return a limited number of results, so we would need to iterate calls.

I think the properly paginated way of doing with the current PyPI XMLRPC api is to call:

  • changelog_last_serial() to get the highest serial, to be used as a termination condition. Currently returns 2168587
  • changelog_since_serial(<current_serial>) in a loop until the last serial returned is higher than the one set as the termination condition. Looks like this returns 50k results per call.

(this will make us miss the last few updates that happened since the lister started, but this is probably marginal).

Jul 8 2021, 9:26 AM · Scheduling utilities, Origin-Pypi
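The pagination scheme described in this comment can be sketched as a generator. This is an illustration under stated assumptions, not the lister's code. The termination serial is fixed once, up front, and each page's highest serial becomes the next starting point. FakeClient is a hypothetical in-memory stand-in so the loop can run offline.

```python
from typing import Iterator, List, Optional, Protocol, Tuple

# PyPI changelog entry: (name, version, timestamp, action, serial).
Entry = Tuple[str, Optional[str], int, str, int]


class ChangelogClient(Protocol):
    """The two XML-RPC methods the loop relies on."""

    def changelog_last_serial(self) -> int: ...

    def changelog_since_serial(self, since_serial: int) -> List[Entry]: ...


def iter_changelog_pages(
    client: ChangelogClient, since: int = 0
) -> Iterator[List[Entry]]:
    """Yield changelog pages until the serial seen at startup is reached."""
    # Termination condition, read once: most updates made after this point
    # are left for the next run.
    last_serial = client.changelog_last_serial()
    current = since
    while current < last_serial:
        page = client.changelog_since_serial(current)
        if not page:
            break  # defensive: never loop on an empty response
        yield page
        current = max(entry[4] for entry in page)


class FakeClient:
    """In-memory stand-in serving fixed-size pages (hypothetical)."""

    def __init__(self, entries: List[Entry], page_size: int) -> None:
        self.entries = entries
        self.page_size = page_size

    def changelog_last_serial(self) -> int:
        return max(entry[4] for entry in self.entries)

    def changelog_since_serial(self, since_serial: int) -> List[Entry]:
        newer = [e for e in self.entries if e[4] > since_serial]
        return newer[: self.page_size]


# Against PyPI, xmlrpc.client.ServerProxy("https://pypi.org/pypi") would
# play the role of FakeClient.
```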

Jul 7 2021

olasd added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.
In D5977#153851, @olasd wrote:

Looking at the size of the changelog (2-ish million entries in pages of 50k-ish entries means 50-ish requests), I /think/ the lister could always be running in incremental mode, rather than having to maintain two modes in the long run.

Jul 7 2021, 6:04 PM · Scheduling utilities, Origin-Pypi
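The request count behind this estimate is a ceiling division. The sketch below assumes roughly one changelog entry per serial, so it uses the last serial quoted on Jul 8 (2168587) as an approximate entry count, and it uses the observed page size of about 50k results per call. The function name changelog_requests is hypothetical.

```python
def changelog_requests(total_entries: int, page_size: int) -> int:
    """Calls needed to walk `total_entries` changelog entries, `page_size` at a time."""
    # Integer ceiling division: a partial last page still costs one call.
    return (total_entries + page_size - 1) // page_size


# Approximate full walk of the changelog from serial 0:
print(changelog_requests(2_168_587, 50_000))  # 44
```

About 44 calls for a full walk is in the same range as the rough estimate, which supports keeping a single incremental mode.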
olasd added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.

From my quick testing, I have noticed that the changelog methods only return a limited number of results, so we would need to iterate calls.

Jul 7 2021, 6:02 PM · Scheduling utilities, Origin-Pypi
ardumont added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.

Nice, that lister should perform better once that feature is deployed to production.

Jul 7 2021, 5:45 PM · Scheduling utilities, Origin-Pypi
swh-public-ci added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.

Build is green

Jul 7 2021, 5:43 PM · Scheduling utilities, Origin-Pypi
ardumont updated the diff for D5977: Make PyPI lister incremental and complete in regards to last_update.

Adapt according to judicious remarks (thanks ;)

Jul 7 2021, 5:39 PM · Scheduling utilities, Origin-Pypi
ardumont added inline comments to D5977: Make PyPI lister incremental and complete in regards to last_update.
Jul 7 2021, 5:38 PM · Scheduling utilities, Origin-Pypi
ardumont added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.

Thanks!

Jul 7 2021, 5:24 PM · Scheduling utilities, Origin-Pypi
anlambert added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.

Nice, that lister should perform better once that feature is deployed to production. I added a first batch of inline comments.

Jul 7 2021, 5:18 PM · Scheduling utilities, Origin-Pypi
swh-public-ci added a comment to D5977: Make PyPI lister incremental and complete in regards to last_update.

Build is green

Jul 7 2021, 4:35 PM · Scheduling utilities, Origin-Pypi
ardumont added a reviewer for D5977: Make PyPI lister incremental and complete in regards to last_update: olasd.
Jul 7 2021, 4:34 PM · Scheduling utilities, Origin-Pypi
ardumont added a revision to T3399: Improve PyPI lister to pull last update information when running incrementally: D5977: Make PyPI lister incremental and complete in regards to last_update.
Jul 7 2021, 4:20 PM · Origin-Pypi, Lister

Jun 21 2021

olasd triaged T3399: Improve PyPI lister to pull last update information when running incrementally as Normal priority.
Jun 21 2021, 2:48 PM · Origin-Pypi, Lister

Nov 26 2019

ardumont closed T2098: Deploy package loaders as Resolved.
Nov 26 2019, 5:28 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage
ardumont updated the task description for T2098: Deploy package loaders.
Nov 26 2019, 5:28 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage
ardumont closed T1389: Implement a base "package" loader for package managers as Resolved.
Nov 26 2019, 12:25 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage
ardumont removed a parent task for T2098: Deploy package loaders: T1389: Implement a base "package" loader for package managers.
Nov 26 2019, 12:24 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage
ardumont removed a subtask for T1389: Implement a base "package" loader for package managers: T2098: Deploy package loaders.
Nov 26 2019, 12:24 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage
ardumont updated the task description for T2098: Deploy package loaders.
Nov 26 2019, 12:13 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage

Nov 20 2019

ardumont updated the task description for T2098: Deploy package loaders.
Nov 20 2019, 9:42 AM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage

Nov 19 2019

ardumont removed a subtask for T1389: Implement a base "package" loader for package managers: T2026: Implement cran loader with package manager mechanism.
Nov 19 2019, 12:27 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage
ardumont closed T2025: Re-implement debian loader with package loader mechanism, a subtask of T1389: Implement a base "package" loader for package managers, as Resolved.
Nov 19 2019, 12:26 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage
ardumont closed T2024: Re-implement deposit loader with package loader mechanism, a subtask of T1389: Implement a base "package" loader for package managers, as Resolved.
Nov 19 2019, 12:26 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage
ardumont closed T2023: Re-implement gnu loader with package loader, a subtask of T1389: Implement a base "package" loader for package managers, as Resolved.
Nov 19 2019, 12:26 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage
ardumont closed T2022: Re-implement npm loader with base loader, a subtask of T1389: Implement a base "package" loader for package managers, as Resolved.
Nov 19 2019, 12:25 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage
ardumont closed T2021: Re-implement pypi loader with package loader as Resolved.
Nov 19 2019, 12:25 PM · Origin-Pypi, Archive coverage
ardumont closed T2021: Re-implement pypi loader with package loader, a subtask of T1389: Implement a base "package" loader for package managers, as Resolved.
Nov 19 2019, 12:25 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage
ardumont triaged T2098: Deploy package loaders as Normal priority.
Nov 19 2019, 12:25 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage

Oct 13 2019

ardumont changed the status of T2025: Re-implement debian loader with package loader mechanism, a subtask of T1389: Implement a base "package" loader for package managers, from Open to Work in Progress.
Oct 13 2019, 6:37 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage

Oct 5 2019

ardumont added a project to T1389: Implement a base "package" loader for package managers: Origin-Debian.
Oct 5 2019, 2:30 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage
ardumont added a project to T1389: Implement a base "package" loader for package managers: Origin-CRAN.
Oct 5 2019, 2:29 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage

Oct 3 2019

ardumont changed the status of T2024: Re-implement deposit loader with package loader mechanism, a subtask of T1389: Implement a base "package" loader for package managers, from Open to Work in Progress.
Oct 3 2019, 6:03 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage

Oct 1 2019

ardumont closed T2017: package loader: Discuss revision metadata normalization, a subtask of T1389: Implement a base "package" loader for package managers, as Resolved.
Oct 1 2019, 6:46 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage
ardumont added a subtask for T1389: Implement a base "package" loader for package managers: T2017: package loader: Discuss revision metadata normalization.
Oct 1 2019, 6:46 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage
ardumont added a project to T1389: Implement a base "package" loader for package managers: Origin-GNU.
Oct 1 2019, 6:40 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage
ardumont triaged T2026: Implement cran loader with package manager mechanism as Normal priority.
Oct 1 2019, 1:29 PM · Origin-CRAN, Archive coverage
ardumont added a comment to T1389: Implement a base "package" loader for package managers.

Current work is in the swh-loader-repository within the package-loader branch.

Oct 1 2019, 1:23 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage
ardumont renamed T1389: Implement a base "package" loader for package managers from Implement a base loader for package managers to Implement a base "package" loader for package managers.
Oct 1 2019, 1:20 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage
ardumont changed the status of T1389: Implement a base "package" loader for package managers from Open to Work in Progress.
Oct 1 2019, 1:20 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage
ardumont changed the status of T2021: Re-implement pypi loader with package loader from Open to Work in Progress.
Oct 1 2019, 1:19 PM · Origin-Pypi, Archive coverage
ardumont changed the status of T2021: Re-implement pypi loader with package loader, a subtask of T1389: Implement a base "package" loader for package managers, from Open to Work in Progress.
Oct 1 2019, 1:19 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage
ardumont changed the status of T2022: Re-implement npm loader with base loader, a subtask of T1389: Implement a base "package" loader for package managers, from Open to Work in Progress.
Oct 1 2019, 1:19 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage
ardumont renamed T2021: Re-implement pypi loader with package loader from Re-implement pypi loader with base loader to Re-implement pypi loader with package loader.
Oct 1 2019, 1:18 PM · Origin-Pypi, Archive coverage
ardumont changed the status of T2023: Re-implement gnu loader with package loader from Open to Work in Progress.
Oct 1 2019, 1:14 PM · Origin-GNU, Archive coverage
ardumont triaged T2022: Re-implement npm loader with base loader as Normal priority.
Oct 1 2019, 1:12 PM · Origin-npm, Archive coverage
ardumont triaged T2021: Re-implement pypi loader with package loader as Normal priority.
Oct 1 2019, 1:11 PM · Origin-Pypi, Archive coverage