@douardda We have a separate task (T1777) for the rubygem lister. I guess we can add a separate column in the table by @anlambert showing the status of the lister implementation and the tasks related to it.
Sep 13 2019
Sep 12 2019
In T1389#27957, @anlambert wrote:
| Package manager | Packages listing URL | Package metadata URL | Package source tarball URL |
|-----------------|----------------------|----------------------|----------------------------|
| Rubygems | No public API endpoint available | https://rubygems.org/api/v2/rubygems/[package]/versions/[version].json | available in the metadata |
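For illustration, here is a minimal sketch of fetching that per-version metadata over the endpoint quoted above (the package name, version, and the `gem_uri` field name are assumptions, only to show the shape of the call):

```python
import requests

# Illustrative only: package name, version, and the "gem_uri" field name are
# assumptions; the endpoint is the one quoted in the table above.
package = "rails"
version = "6.0.0"
url = f"https://rubygems.org/api/v2/rubygems/{package}/versions/{version}.json"

response = requests.get(url)
response.raise_for_status()
metadata = response.json()

# The source tarball location is expected to be part of this metadata payload.
print(metadata.get("gem_uri"))
```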
Sep 11 2019
how many are left?
Jul 18 2019
Jul 14 2019
Jul 8 2019
Jul 7 2019
Jul 2 2019
Extending on the plan by @olasd, here are some of my thoughts on the implementation of the base loader.
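To make the discussion concrete, here is a rough sketch of what such a base loader interface could look like (method names and structure are hypothetical, not the actual swh loader API):

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable

class BasePackageLoader(ABC):
    """Hypothetical outline of a base package-manager loader.

    Method names are illustrative only and do not claim to match the
    actual swh loader code.
    """

    def __init__(self, origin_url: str):
        self.origin_url = origin_url

    @abstractmethod
    def get_versions(self) -> Iterable[str]:
        """List the published versions for this origin."""

    @abstractmethod
    def get_artifacts(self, version: str) -> Iterable[Dict[str, Any]]:
        """Yield download URL and metadata for each artifact of a version."""

    def load(self) -> Dict[str, Any]:
        """Fetch every version and summarize what was loaded."""
        loaded = {v: list(self.get_artifacts(v)) for v in self.get_versions()}
        return {"origin": self.origin_url, "versions": loaded}
```

Concrete loaders (PyPI, Debian, npm, Rubygems, ...) would then only have to implement the listing and artifact-retrieval parts.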
Jun 17 2019
In T1389#33215, @zack wrote:
Thanks @olasd, @ardumont, and @anlambert for this, it's a great plan and I like it a lot!
Just a few comments on the sidelines:
The lister will generate a one-shot task to load each package for the given repository, with the full information needed to do the data fetching.
This seemed clear from a different part of the description, but fwiw here I'm assuming the plan is to only load the versions of the packages not already known/ingested in the past.
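Assuming that is indeed the intent, the incremental part boils down to filtering the available versions against what was already archived before generating the one-shot load tasks; a toy illustration (not the real scheduler or archive interface):

```python
from typing import Iterable, List

def versions_to_load(available: Iterable[str], already_ingested: Iterable[str]) -> List[str]:
    """Keep only the versions never ingested before.

    Toy helper: the real loader would compare against artifacts recorded in
    previous snapshots, not against a plain set of version strings.
    """
    known = set(already_ingested)
    return [v for v in available if v not in known]

# A hypothetical package with one new release since the last visit:
print(versions_to_load(["1.0", "1.1", "2.0"], ["1.0", "1.1"]))  # ['2.0']
```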
Jun 12 2019
Thanks @olasd, @ardumont, and @anlambert for this, it's a great plan and I like it a lot!
May 25 2019
how many are left? can we close this as well as T419 now that the PyPI listers/loaders have been in production for a while?
May 20 2019
We've discussed a plausible plan for a "base package manager loader" with @ardumont and, to some extent, @anlambert.
Feb 7 2019
The table below summarizes how to list all packages and get their metadata from well-known package managers.
Feb 5 2019
Dec 18 2018
Nov 27 2018
Oct 22 2018
Oct 18 2018
Ok, so I reworked the group_by_exception snippet to have a more sensible output:
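The snippet itself isn't reproduced here; as an indication of the kind of processing involved, a sketch that groups failing origins by the exception named in their error message could look like this (the input format is an assumption):

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def group_by_exception(errors: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group failing origins by the exception name found in their error message.

    `errors` is an iterable of (origin_url, error_message) pairs; the exception
    name is assumed to be the part of the message before the first colon.
    This is only a sketch of the kind of snippet mentioned above, not the real one.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for origin, message in errors:
        exception_name = message.split(":", 1)[0].strip()
        groups[exception_name].append(origin)
    return dict(groups)

report = group_by_exception([
    ("https://pypi.org/project/foo/", "ValueError: bad sdist"),
    ("https://pypi.org/project/bar/", "requests.exceptions.HTTPError: 404"),
    ("https://pypi.org/project/baz/", "ValueError: bad sdist"),
])
for exception_name, origins in report.items():
    print(f"{exception_name}: {len(origins)} origins")
```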
Oct 17 2018
In any case, for now, as I said in [2], we will first reschedule those 1409 origins in error.
Oct 16 2018
Here is the PyPI report about the loading errors.
Oct 5 2018
A Kibana dashboard will help in that matter (P311, because it's noisy).
Sep 21 2018
Sep 20 2018
Now, it's scheduled. Just need to wait for the swh-scheduler-runner.service to finish its loop on task_types.
swhscheduler@saatchi:~$ python3 -m swh.scheduler.cli task list-pending -t swh-lister-pypi
Found 1 tasks
Schedule the lister-pypi:
Sep 19 2018
Sep 6 2018
Sep 4 2018
Aug 27 2018
Then again, I'll check the PyPI API's documentation. Hopefully, it's explained somewhere ;)
Aug 23 2018
In T421#21696, @zack wrote:
Still, we should probably have a "master" branch, to ease navigation, shouldn't we? (What do we do for Debian packages on this?)
So, having one branch in the snapshot per distribution format (tar/zip/etc.) is a nice and clean way of handling this.
In T421#21693, @olasd wrote:
- Unpack all the sdist formats
- If all goes well, the contents are identical. In that case, the revision objects would end up with the same id; we can ignore that there ever were multiple formats, and just have a single branch pointing to a single revision for that version of the package in the snapshot
- If the contents are different, load both and make the snapshot have a branch pointing to each format (a rough sketch of this logic follows below).
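Here is a rough sketch of that per-format handling, using a plain content hash as a stand-in for the real revision identifiers (the directory walking and branch naming are illustrative assumptions):

```python
import hashlib
import os
from typing import Dict

def directory_fingerprint(root: str) -> str:
    """Cheap stand-in for the intrinsic identifier of an unpacked directory.

    The real loader would compute SWH directory/revision identifiers; here we
    just hash relative paths and file contents to illustrate the comparison.
    """
    digest = hashlib.sha1()
    for dirpath, _dirnames, filenames in sorted(os.walk(root)):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            digest.update(os.path.relpath(path, root).encode())
            with open(path, "rb") as f:
                digest.update(f.read())
    return digest.hexdigest()

def branches_for_version(version: str, unpacked: Dict[str, str]) -> Dict[str, str]:
    """Map snapshot branch names to fingerprints, one branch per distinct content.

    `unpacked` maps a distribution format (e.g. 'tar', 'zip') to the directory
    where its sdist was unpacked. If all formats unpack to identical contents,
    a single branch is produced; otherwise one branch per format.
    """
    fingerprints = {fmt: directory_fingerprint(path) for fmt, path in unpacked.items()}
    if len(set(fingerprints.values())) == 1:
        return {f"releases/{version}": next(iter(fingerprints.values()))}
    return {f"releases/{version}/{fmt}": fp for fmt, fp in fingerprints.items()}
```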
Aug 22 2018
The Debian loader doesn't create release objects. Our data model doesn't allow attaching arbitrary structured metadata to release objects (as Git doesn't either), so we've shortcut this level of indirection.
Aug 21 2018
In T421#21639, @zack wrote:
The basic loader will be the tarball loader, yes. In addition to that there are two aspects to be defined:
- the stack of objects to be added to the DAG
- the metadata to extract
For (1), I think what we currently do for Debian packages is as you said, i.e., snapshot -> release -> revision -> tarball root dir. Maybe you can check for comparison (or @olasd can chime in?). We should do the same here.
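To visualize that stack of objects, here is a toy model of the snapshot -> release -> revision -> root directory chain (plain dataclasses, not the swh.model types; as noted elsewhere in the thread, the Debian loader actually skips the release level):

```python
from dataclasses import dataclass, field
from typing import Dict

# Toy types only, to picture the levels of indirection discussed above.

@dataclass
class Directory:
    id: str  # root directory of the unpacked tarball

@dataclass
class Revision:
    id: str
    directory: Directory
    metadata: Dict[str, str] = field(default_factory=dict)

@dataclass
class Release:
    name: str
    target: Revision

@dataclass
class Snapshot:
    branches: Dict[str, Release]

root = Directory(id="root-dir-of-sdist")
rev = Revision(id="rev-1.0.0", directory=root, metadata={"version": "1.0.0"})
snapshot = Snapshot(branches={"releases/1.0.0": Release(name="1.0.0", target=rev)})
```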
There remain 3 actions for the current implementation to be complete:
Aug 2 2018
As far as I can tell from those examples, the metadata that PyPI gives you are the most recent ones, probably extracted from the most recent version, so it would be incorrect to associate them with other releases.
In T421#21642, @ardumont wrote:
For (1), I think what we currently do for Debian packages is as you said, i.e., snapshot -> release -> revision -> tarball root dir. Maybe you can check for comparison (or @olasd can chime in?). We should do the same here.
The basic loader will be the tarball loader, yes. In addition to that there are two aspects to be defined:
- the stack of objects to be added to the DAG
- the metadata to extract
Aug 1 2018
capable of extracting upstream metadata that are meaningful for (and specific to) PyPI.
Jul 26 2018
As per comment [1], closing this as we will not implement mirroring.
the out-of-dated-ness is most likely due to the fact that, since I opened this task 2 years ago...
Jul 25 2018
In T422#21473, @ardumont wrote:
If your consumer is actually an organization or service that will be downloading a lot of packages from PyPI, consider using your own index mirror or cache.
That's not a sustainable approach. If we choose that path for all the forges we need to archive... that will be difficult in terms of infrastructure and maintenance.
better LWN link to the actual article covering this: https://lwn.net/Articles/751458/
In T420#21471, @ardumont wrote:
Looking at the FAQ [4], they also (now?) recommend bandersnatch. Quoting it:
Jul 24 2018
They have multiple APIs:
- a basic JSON one [1], which allows requesting information on a per-project basis (no listing); we foresee using this one for the loader (see the sketch after this list)
- a deprecated XML-RPC one [2]; this one supports listing, so it would be for the lister
- an HTML page listing all packages
- an RSS feed of update events
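As an illustration of the JSON API mentioned in the first item above, a minimal sketch of fetching a project's releases and their sdist URLs (the project name is just an example; the key names are taken from the JSON API responses as I recall them):

```python
import requests

# Sketch of the per-project JSON API; "flask" is just an example project, and
# key names ("releases", "packagetype", "url") are assumptions to be checked
# against the API documentation.
project = "flask"
resp = requests.get(f"https://pypi.org/pypi/{project}/json")
resp.raise_for_status()
data = resp.json()

# Each published version maps to the list of distribution files it provides.
for version, files in sorted(data.get("releases", {}).items()):
    sdists = [f["url"] for f in files if f.get("packagetype") == "sdist"]
    print(version, sdists)
```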
> PyPI is easy to mirror [1], and a network of public mirrors [2] already exists. The Python package bandersnatch [3] automates the task of setting up an initial mirror and keeping it up to date.
Jun 19 2018
Apr 7 2017
Feb 10 2017
Jul 28 2016
May 26 2016
A full mirror of PyPI, taken 1 day ago, is currently available on the Debsources machine under /srv/pypi. Ask @zack if you don't have access and would like to have a look at its structure.