
Sep 13 2019

nahimilega added a comment to T1389: Implement a base "package" loader for package managers.

@douardda We have a separate task (T1777) for the RubyGems lister. I suggest adding a column to @anlambert's table showing the status of each lister implementation and its related tasks.

Sep 13 2019, 9:03 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage

Sep 12 2019

douardda added a comment to T1389: Implement a base "package" loader for package managers.
| Package manager | Packages listing URL | Package metadata URL | Package source tarball URL |
| --- | --- | --- | --- |
| Rubygems | No public API endpoint available | https://rubygems.org/api/v2/rubygems/[package]/versions/[version].json | Available in the metadata |
Sep 12 2019, 11:08 AM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage
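Because RubyGems offers no listing endpoint, a loader would receive the gem name and version from elsewhere and only needs to build the metadata URL from the pattern in the table. A minimal sketch, assuming a hypothetical helper name `rubygems_metadata_url`; the field holding the tarball URL inside the metadata is not named in the table, so it is not extracted here.

```python
from urllib.parse import quote

# URL pattern copied from the table above.
RUBYGEMS_METADATA_URL = (
    "https://rubygems.org/api/v2/rubygems/{package}/versions/{version}.json"
)


def rubygems_metadata_url(package: str, version: str) -> str:
    """Build the per-version metadata URL for a gem (hypothetical helper)."""
    # safe="" also escapes "/" so a name cannot inject extra path segments.
    return RUBYGEMS_METADATA_URL.format(
        package=quote(package, safe=""),
        version=quote(version, safe=""),
    )


print(rubygems_metadata_url("rails", "5.2.3"))
```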

Sep 11 2019

ardumont closed T419: ingest PyPI into the Software Heritage archive (meta task) as Resolved.
Sep 11 2019, 2:55 PM · Archive coverage, Origin-Pypi
ardumont removed a parent task for T1246: pypi loader: Analyze existing errors: T419: ingest PyPI into the Software Heritage archive (meta task).
Sep 11 2019, 2:54 PM · Archive coverage, Origin-Pypi
ardumont removed a subtask for T419: ingest PyPI into the Software Heritage archive (meta task): T1246: pypi loader: Analyze existing errors.
Sep 11 2019, 2:54 PM · Archive coverage, Origin-Pypi
ardumont added a comment to T1246: pypi loader: Analyze existing errors.

how many are left?

Sep 11 2019, 10:34 AM · Archive coverage, Origin-Pypi

Jul 18 2019

nahimilega closed T1890: pypi lister: Add tests as Resolved by committing rDLS08ade29e6de0: swh.lister.pypi: Add tests.
Jul 18 2019, 6:50 PM · Origin-Pypi, Lister

Jul 14 2019

nahimilega added a revision to T1890: pypi lister: Add tests: D1733: swh.lister.core: Add tests for simple lister.
Jul 14 2019, 8:28 PM · Origin-Pypi, Lister

Jul 8 2019

ardumont triaged T1890: pypi lister: Add tests as Normal priority.
Jul 8 2019, 10:29 AM · Origin-Pypi, Lister

Jul 7 2019

nahimilega added a revision to T1389: Implement a base "package" loader for package managers: D1694: swh.loader.package: Implement a method to prepare package visit.
Jul 7 2019, 10:10 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage

Jul 2 2019

nahimilega added a comment to T1389: Implement a base "package" loader for package managers.

Building on the plan by @olasd, here are some of my thoughts on implementing the base loader.

Jul 2 2019, 12:46 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage

Jun 17 2019

olasd added a comment to T1389: Implement a base "package" loader for package managers.
In T1389#33215, @zack wrote:

Thanks @olasd, @ardumont, and @anlambert for this, it's a great plan and I like it a lot!

Just a few comments on the sidelines:

The lister will generate a one-shot task to load each package for the given repository, with the full information needed to do the data fetching.

This seemed clear from a different part of the description, but fwiw here I'm assuming the plan is to only load the versions of the packages not already known/ingested in the past.

Jun 17 2019, 4:19 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage
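The assumption above (one one-shot task per package, covering only versions not yet archived) can be sketched as follows. The function name, the task type `load-package` and the argument layout are all hypothetical, not the actual swh-scheduler task format.

```python
from typing import Dict, Iterable, Optional, Set


def make_load_task(
    origin_url: str,
    listed_versions: Iterable[str],
    known_versions: Set[str],
) -> Optional[Dict[str, object]]:
    """Build a one-shot load task for unseen versions (hypothetical format)."""
    # Keep the lister's order; drop versions ingested in a previous visit.
    new_versions = [v for v in listed_versions if v not in known_versions]
    if not new_versions:
        return None  # nothing new: no task needs to be scheduled
    return {
        "type": "load-package",  # hypothetical task type name
        "policy": "oneshot",
        "arguments": {"url": origin_url, "versions": new_versions},
    }
```

Returning `None` for fully known packages keeps the scheduler from queuing empty visits.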

Jun 12 2019

zack added a comment to T1389: Implement a base "package" loader for package managers.

Thanks @olasd, @ardumont, and @anlambert for this, it's a great plan and I like it a lot!

Jun 12 2019, 1:58 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage

May 25 2019

zack added a comment to T1246: pypi loader: Analyze existing errors.

How many are left? Can we close this, as well as T419, now that the PyPI lister and loader have been in production for a while?

May 25 2019, 5:00 PM · Archive coverage, Origin-Pypi

May 20 2019

olasd updated subscribers of T1389: Implement a base "package" loader for package managers.

We've discussed a plausible plan for a "base package manager loader" with @ardumont and, to some extent, @anlambert.

May 20 2019, 6:03 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage

Feb 7 2019

anlambert added a comment to T1389: Implement a base "package" loader for package managers.

The table below summarizes how to list all packages and get their metadata from well-known package managers.

Feb 7 2019, 4:32 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage

Feb 5 2019

anlambert raised the priority of T1389: Implement a base "package" loader for package managers from Wishlist to Normal.
Feb 5 2019, 2:31 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage

Dec 18 2018

vlorentz added a parent task for T1389: Implement a base "package" loader for package managers: T1425: refactor the loader stack for package managers.
Dec 18 2018, 4:57 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage

Nov 27 2018

anlambert triaged T1389: Implement a base "package" loader for package managers as Wishlist priority.
Nov 27 2018, 12:23 PM · Origin-Debian, Origin-CRAN, Origin-GNU, Origin-npm, Origin-Pypi, Archive coverage

Oct 22 2018

ardumont updated the task description for T1246: pypi loader: Analyze existing errors.
Oct 22 2018, 10:24 AM · Archive coverage, Origin-Pypi

Oct 18 2018

ardumont added a comment to T1246: pypi loader: Analyze existing errors.

OK, I reworked the group_by_exception snippet to produce more sensible output:

Oct 18 2018, 11:27 AM · Archive coverage, Origin-Pypi
ardumont updated the title for P319 reschedule.pypi.csv - first origins with missing author resulted in T1246 issue (fixed since) from reschedule.pypi.csv - origin with missing author resulted in issue (fixed since) to reschedule.pypi.csv - first origins with missing author resulted in T1246 issue (fixed since).
Oct 18 2018, 9:34 AM · Origin-Pypi, PyPI loader
ardumont updated the title for P319 reschedule.pypi.csv - first origins with missing author resulted in T1246 issue (fixed since) from reschedule.pypi.csv to reschedule.pypi.csv - origin with missing author resulted in issue (fixed since).
Oct 18 2018, 9:33 AM · Origin-Pypi, PyPI loader

Oct 17 2018

ardumont added a comment to T1246: pypi loader: Analyze existing errors.

In any case, as I said in [2], we will first reschedule
the 1409 origins in error.

Oct 17 2018, 4:22 PM · Archive coverage, Origin-Pypi
ardumont created P319 reschedule.pypi.csv - first origins with missing author resulted in T1246 issue (fixed since).
Oct 17 2018, 4:21 PM · Origin-Pypi, PyPI loader

Oct 16 2018

ardumont added a comment to T1246: pypi loader: Analyze existing errors.

Here is the PyPI report on the loading errors.

Oct 16 2018, 2:03 PM · Archive coverage, Origin-Pypi

Oct 5 2018

ardumont renamed T1246: pypi loader: Analyze existing errors from Analyze pypi errors to pypi loader: Analyze existing errors.
Oct 5 2018, 6:31 PM · Archive coverage, Origin-Pypi
ardumont added a comment to T1246: pypi loader: Analyze existing errors.

The Kibana dashboard will help with that (details in P311, since the output is noisy).

Oct 5 2018, 6:30 PM · Archive coverage, Origin-Pypi
ardumont triaged T1246: pypi loader: Analyze existing errors as Normal priority.
Oct 5 2018, 6:28 PM · Archive coverage, Origin-Pypi

Sep 21 2018

ardumont closed T421: PyPI loader as Resolved.
Sep 21 2018, 6:35 PM · PyPI loader, Origin-Pypi
ardumont closed T421: PyPI loader, a subtask of T419: ingest PyPI into the Software Heritage archive (meta task), as Resolved.
Sep 21 2018, 6:35 PM · Archive coverage, Origin-Pypi

Sep 20 2018

ardumont closed T1181: pypi: Schedule ingestion, a subtask of T419: ingest PyPI into the Software Heritage archive (meta task), as Resolved.
Sep 20 2018, 11:17 AM · Archive coverage, Origin-Pypi
ardumont closed T1181: pypi: Schedule ingestion as Resolved.
Sep 20 2018, 11:17 AM · Archive coverage, Origin-Pypi
ardumont added a comment to T1181: pypi: Schedule ingestion.

It's now scheduled. We just need to wait for swh-scheduler-runner.service to finish its loop over task_types.

Sep 20 2018, 9:52 AM · Archive coverage, Origin-Pypi
ardumont added a comment to T1181: pypi: Schedule ingestion.
swhscheduler@saatchi:~$ python3 -m swh.scheduler.cli task list-pending -t swh-lister-pypi
Found 1 tasks
Sep 20 2018, 9:48 AM · Archive coverage, Origin-Pypi
ardumont updated the task description for T1181: pypi: Schedule ingestion.
Sep 20 2018, 9:47 AM · Archive coverage, Origin-Pypi
ardumont added a comment to T1181: pypi: Schedule ingestion.

Schedule the lister-pypi:

Sep 20 2018, 9:47 AM · Archive coverage, Origin-Pypi

Sep 19 2018

ardumont changed the status of T1181: pypi: Schedule ingestion, a subtask of T419: ingest PyPI into the Software Heritage archive (meta task), from Open to Work in Progress.
Sep 19 2018, 7:52 PM · Archive coverage, Origin-Pypi
ardumont changed the status of T1181: pypi: Schedule ingestion from Open to Work in Progress.
Sep 19 2018, 7:52 PM · Archive coverage, Origin-Pypi

Sep 6 2018

ardumont updated the task description for T1181: pypi: Schedule ingestion.
Sep 6 2018, 5:38 PM · Archive coverage, Origin-Pypi
ardumont renamed T1181: pypi: Schedule ingestion from pypi: Trigger listing task to pypi: Schedule ingestion.
Sep 6 2018, 5:37 PM · Archive coverage, Origin-Pypi
ardumont triaged T1181: pypi: Schedule ingestion as Normal priority.
Sep 6 2018, 5:31 PM · Archive coverage, Origin-Pypi
ardumont closed T422: PyPI lister, a subtask of T419: ingest PyPI into the Software Heritage archive (meta task), as Resolved.
Sep 6 2018, 5:31 PM · Archive coverage, Origin-Pypi
ardumont closed T422: PyPI lister as Resolved.
Sep 6 2018, 5:31 PM · Developers, Origin-Pypi

Sep 4 2018

zack edited projects for T421: PyPI loader, added: PyPI loader; removed Developers.
Sep 4 2018, 10:30 AM · PyPI loader, Origin-Pypi

Aug 27 2018

ardumont added a comment to T421: PyPI loader.

Then again, I'll check the PyPI API documentation. Hopefully it's explained somewhere ;)

Aug 27 2018, 9:50 AM · PyPI loader, Origin-Pypi

Aug 23 2018

olasd added a comment to T421: PyPI loader.
In T421#21696, @zack wrote:

Still, we should probably have a "master" branch, to ease navigation, shouldn't we? (What do we do for Debian packages on this?)

Aug 23 2018, 3:47 PM · PyPI loader, Origin-Pypi
ardumont added a comment to T421: PyPI loader.

So, having one branch in the snapshot per distribution format (tar/zip/etc.) is a nice and clean way of handling this.

Aug 23 2018, 3:10 PM · PyPI loader, Origin-Pypi
zack added a comment to T421: PyPI loader.
In T421#21693, @olasd wrote:
  • Unpack all the sdist formats
  • If all goes well, the contents are identical. In that case, the revision objects would end up with the same id; we can ignore that there were ever multiple formats, and just have a single branch pointing to a single revision for that version of the package in the snapshot
  • If the contents differ, load both and give the snapshot one branch per format.
Aug 23 2018, 9:40 AM · PyPI loader, Origin-Pypi
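The sdist rule above reduces to comparing revision ids across formats. A minimal sketch, assuming each format's archive has already been unpacked and turned into a revision id; the function name and the `releases/<version>` branch naming are hypothetical.

```python
from typing import Dict


def sdist_branches(version: str, revision_by_format: Dict[str, bytes]) -> Dict[str, bytes]:
    """Map snapshot branch names to revision ids for one package version.

    revision_by_format maps a format label ("tar.gz", "zip", ...) to the id
    of the revision built from that archive (hypothetical input shape).
    """
    distinct = set(revision_by_format.values())
    if len(distinct) == 1:
        # Identical contents yield identical revision ids: one branch suffices.
        return {f"releases/{version}": distinct.pop()}
    # Contents differ: keep one branch per format, in a stable order.
    return {
        f"releases/{version}/{fmt}": rev_id
        for fmt, rev_id in sorted(revision_by_format.items())
    }
```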

Aug 22 2018

ardumont added a comment to T421: PyPI loader.

The Debian loader doesn't create release objects. Our data model doesn't allow attaching arbitrary structured metadata to release objects (neither does Git), so we've skipped this level of indirection.

Aug 22 2018, 4:16 PM · PyPI loader, Origin-Pypi
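Skipping the release level leaves a two-hop chain, snapshot branch -> revision -> root directory. The toy dataclasses below only illustrate that shape; they are hypothetical and are not the Software Heritage model classes. In this sketch the package metadata rides on the revision, since the release cannot carry it.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Directory:
    id: str  # hypothetical id of the unpacked tarball's root directory


@dataclass
class Revision:
    id: str
    directory: Directory
    # Sketch only: structured package metadata attached to the revision.
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Snapshot:
    # Branch name -> revision, with no release object in between.
    branches: Dict[str, Revision] = field(default_factory=dict)


def root_directory(snapshot: Snapshot, branch: str) -> Directory:
    """Follow snapshot -> revision -> directory (two hops, not three)."""
    return snapshot.branches[branch].directory


root = Directory(id="dir-1")
rev = Revision(id="rev-1", directory=root, metadata={"version": "1.0"})
snap = Snapshot(branches={"releases/1.0": rev})
```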

Aug 21 2018

olasd added a comment to T421: PyPI loader.
In T421#21639, @zack wrote:

The basic loader will be the tarball loader, yes. In addition to that there are two aspects to be defined:

  1. the stack of objects to be added to the DAG
  2. the metadata to extract

For (1), I think what we currently do for Debian packages is as you said, i.e., snapshot -> release -> revision -> tarball root dir. Maybe you can check for comparison (or @olasd can chime in?). We should do the same here.

Aug 21 2018, 6:28 PM · PyPI loader, Origin-Pypi
ardumont added a comment to T421: PyPI loader.

Three actions remain before the current implementation is complete:

Aug 21 2018, 4:41 PM · PyPI loader, Origin-Pypi

Aug 2 2018

ardumont added a comment to T421: PyPI loader.

As far as I can tell from those examples, the metadata PyPI returns are the most recent ones, probably extracted from the latest version, so it would be incorrect to associate them with other releases.

Aug 2 2018, 3:56 PM · PyPI loader, Origin-Pypi
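One way to avoid attaching the latest metadata to every release is to read the per-version file lists from the project payload's `releases` mapping and fetch each version's own metadata from the version-level JSON endpoint. A sketch under that assumption; the helper names are hypothetical, and the payload below stands in for a real API response.

```python
from typing import Any, Dict, List


def project_json_url(project: str, version: str = "") -> str:
    # Project level: "info" describes the latest release only.
    # Version level: "info" describes that specific release.
    if version:
        return f"https://pypi.org/pypi/{project}/{version}/json"
    return f"https://pypi.org/pypi/{project}/json"


def sdists_per_version(payload: Dict[str, Any]) -> Dict[str, List[str]]:
    """Return sdist filenames for each version listed in a project payload."""
    return {
        version: [f["filename"] for f in files if f.get("packagetype") == "sdist"]
        for version, files in payload.get("releases", {}).items()
    }


payload = {
    "info": {"version": "2.0"},  # latest release only: not valid for 1.0
    "releases": {
        "1.0": [
            {"filename": "foo-1.0.tar.gz", "packagetype": "sdist"},
            {"filename": "foo-1.0-py3-none-any.whl", "packagetype": "bdist_wheel"},
        ],
        "2.0": [],
    },
}
```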
zack added a comment to T421: PyPI loader.

The PyPI API already provides quite a lot of information (see P288 and P289 for examples).
For now, the current implementation relies on it.

Aug 2 2018, 3:45 PM · PyPI loader, Origin-Pypi
ardumont added a comment to T421: PyPI loader.

For (1), I think what we currently do for Debian packages is as you said, i.e., snapshot -> release -> revision -> tarball root dir. Maybe you can check for comparison (or @olasd can chime in?). We should do the same here.

Aug 2 2018, 3:31 PM · PyPI loader, Origin-Pypi
zack updated subscribers of T421: PyPI loader.

The basic loader will be the tarball loader, yes. In addition to that there are two aspects to be defined:

  1. the stack of objects to be added to the DAG
  2. the metadata to extract
Aug 2 2018, 2:37 PM · PyPI loader, Origin-Pypi

Aug 1 2018

ardumont changed the status of T422: PyPI lister, a subtask of T419: ingest PyPI into the Software Heritage archive (meta task), from Open to Work in Progress.
Aug 1 2018, 3:10 PM · Archive coverage, Origin-Pypi
ardumont changed the status of T422: PyPI lister from Open to Work in Progress.
Aug 1 2018, 3:10 PM · Developers, Origin-Pypi
ardumont changed the status of T421: PyPI loader from Open to Work in Progress.
Aug 1 2018, 3:10 PM · PyPI loader, Origin-Pypi
ardumont changed the status of T421: PyPI loader, a subtask of T419: ingest PyPI into the Software Heritage archive (meta task), from Open to Work in Progress.
Aug 1 2018, 3:10 PM · Archive coverage, Origin-Pypi
ardumont added a comment to T421: PyPI loader.

capable of extracting upstream metadata that are meaningful to (and specific to) PyPI.

Aug 1 2018, 3:10 PM · PyPI loader, Origin-Pypi

Jul 26 2018

ardumont closed T420: mirror PyPI, a subtask of T419: ingest PyPI into the Software Heritage archive (meta task), as Wontfix.
Jul 26 2018, 3:33 PM · Archive coverage, Origin-Pypi
ardumont closed T420: mirror PyPI as Wontfix.

As per comment [1], closing this as we will not implement mirroring.

Jul 26 2018, 3:33 PM · Origin-Pypi
ardumont added a comment to T420: mirror PyPI.

the outdatedness is most likely because, since I opened this task 2 years ago...

Jul 26 2018, 12:02 PM · Origin-Pypi

Jul 25 2018

zack added a comment to T422: PyPI lister.

If your consumer is actually an organization or service that will be downloading a lot of packages from PyPI, consider using your own index mirror or cache.

That's not a sustainable approach. If we took that path for every forge we need to archive, infrastructure and maintenance would become difficult.
Jul 25 2018, 10:28 AM · Developers, Origin-Pypi
zack added a comment to T420: mirror PyPI.

better LWN link to the actual article covering this: https://lwn.net/Articles/751458/

Jul 25 2018, 10:25 AM · Origin-Pypi
zack added a comment to T420: mirror PyPI.

Looking at the FAQ [4], they also (now?) recommend bandersnatch. Quoting it:

Jul 25 2018, 10:25 AM · Origin-Pypi

Jul 24 2018

ardumont added a comment to T422: PyPI lister.

They have multiple APIs:

  • a basic JSON API [1], which returns information on a per-project basis (no listing); I foresee using this one for the loader
  • a deprecated XML-RPC API [2], which can list packages; that one would be for the lister
  • an HTML page listing all packages
  • an RSS feed of update events
Jul 24 2018, 4:25 PM · Developers, Origin-Pypi
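Assuming the HTML page listing all packages is the PEP 503 "simple" index (https://pypi.org/simple/), where each project is one anchor whose text is the project name, a lister could extract names with the standard library's HTML parser. A minimal sketch; the class and function names are hypothetical and fetching the page is left out.

```python
from html.parser import HTMLParser
from typing import List


class SimpleIndexParser(HTMLParser):
    """Collect project names from a PEP 503 simple index page."""

    def __init__(self) -> None:
        super().__init__()
        self.projects: List[str] = []
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        # PEP 503: the anchor text is the project name.
        if self._in_anchor and data.strip():
            self.projects.append(data.strip())


def list_projects(html: str) -> List[str]:
    parser = SimpleIndexParser()
    parser.feed(html)
    parser.close()
    return parser.projects
```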
ardumont added a comment to T420: mirror PyPI.

> PyPI is easy to mirror [1], and a network of public mirrors [2] already exists. The Python package bandersnatch [3] automates the task of setting up an initial mirror and keeping it up to date.

Jul 24 2018, 2:44 PM · Origin-Pypi

Jun 19 2018

zack edited projects for T419: ingest PyPI into the Software Heritage archive (meta task), added: Archive coverage; removed Archive content.
Jun 19 2018, 3:29 PM · Archive coverage, Origin-Pypi

Apr 7 2017

zack added a project to T419: ingest PyPI into the Software Heritage archive (meta task): Archive content.
Apr 7 2017, 11:06 AM · Archive coverage, Origin-Pypi

Feb 10 2017

zack renamed T419: ingest PyPI into the Software Heritage archive (meta task) from add PyPI to Software Heritage to ingest PyPI into the Software Heritage archive (meta task).
Feb 10 2017, 8:49 AM · Archive coverage, Origin-Pypi

Jul 28 2016

xenon removed a watcher for Origin-Pypi: xenon.
Jul 28 2016, 11:05 PM
xenon added a watcher for Origin-Pypi: xenon.
Jul 28 2016, 11:05 PM

May 26 2016

zack added a comment to T420: mirror PyPI.

A full mirror of PyPI, taken 1 day ago, is currently available on the Debsources machine under /srv/pypi. Ask @zack if you don't have access and would like to look at its structure.

May 26 2016, 9:29 AM · Origin-Pypi
zack created T422: PyPI lister.
May 26 2016, 9:28 AM · Developers, Origin-Pypi
zack created T421: PyPI loader.
May 26 2016, 9:27 AM · PyPI loader, Origin-Pypi
zack created T420: mirror PyPI.
May 26 2016, 9:26 AM · Origin-Pypi
zack created T419: ingest PyPI into the Software Heritage archive (meta task).
May 26 2016, 9:21 AM · Archive coverage, Origin-Pypi
zack created Origin-Pypi.
May 26 2016, 9:19 AM