Software Heritage

Origin-Pypi (Tag)
Active · Public


Recent Activity

Tue, Nov 27

anlambert triaged T1389: Implement a base loader for package managers as Wishlist priority.
Tue, Nov 27, 12:23 PM · Origin-npm, Origin-Pypi, Archive coverage

Oct 22 2018

ardumont updated the task description for T1246: pypi loader: Analyze existing errors.
Oct 22 2018, 10:24 AM · Archive coverage, Origin-Pypi

Oct 18 2018

ardumont added a comment to T1246: pypi loader: Analyze existing errors.

OK, so I reworked the group_by_exception snippet to produce a more sensible output:

Oct 18 2018, 11:27 AM · Archive coverage, Origin-Pypi
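
The reworked snippet itself is not reproduced in this feed. As a rough illustration only, grouping loading errors by exception type could look like the sketch below. The record layout (`origin` and `error` keys) and the sample data are assumptions made for this example, not the format of the actual logs.

```python
from collections import defaultdict


def group_by_exception(records):
    """Group origin URLs by the exception class found in each error message.

    `records` is a hypothetical list of dicts with "origin" and "error" keys.
    """
    groups = defaultdict(list)
    for record in records:
        # Keep only the text before the first colon, e.g. "KeyError"
        key = record["error"].split(":", 1)[0].strip()
        groups[key].append(record["origin"])
    # Most frequent exception type first
    return sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)


# Made-up sample records, for illustration only
sample_records = [
    {"origin": "https://pypi.org/project/a/", "error": "KeyError: 'author'"},
    {"origin": "https://pypi.org/project/b/", "error": "ValueError: bad archive"},
    {"origin": "https://pypi.org/project/c/", "error": "KeyError: 'author'"},
]

for exception_name, origins in group_by_exception(sample_records):
    print(exception_name, len(origins))
```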
ardumont updated the title for P319 reschedule.pypi.csv - first origins with missing author resulted in T1246 issue (fixed since) from reschedule.pypi.csv - origin with missing author resulted in issue (fixed since) to reschedule.pypi.csv - first origins with missing author resulted in T1246 issue (fixed since).
Oct 18 2018, 9:34 AM · Origin-Pypi, PyPI loader
ardumont updated the title for P319 reschedule.pypi.csv - first origins with missing author resulted in T1246 issue (fixed since) from reschedule.pypi.csv to reschedule.pypi.csv - origin with missing author resulted in issue (fixed since).
Oct 18 2018, 9:33 AM · Origin-Pypi, PyPI loader

Oct 17 2018

ardumont added a comment to T1246: pypi loader: Analyze existing errors.

In any case, for now, as I said in [2], we will first reschedule
those 1409 origins in error.

Oct 17 2018, 4:22 PM · Archive coverage, Origin-Pypi
ardumont created P319 reschedule.pypi.csv - first origins with missing author resulted in T1246 issue (fixed since).
Oct 17 2018, 4:21 PM · Origin-Pypi, PyPI loader

Oct 16 2018

ardumont added a comment to T1246: pypi loader: Analyze existing errors.

Here is the pypi report about the loading errors.

Oct 16 2018, 2:03 PM · Archive coverage, Origin-Pypi

Oct 5 2018

ardumont renamed T1246: pypi loader: Analyze existing errors from Analyze pypi errors to pypi loader: Analyze existing errors.
Oct 5 2018, 6:31 PM · Archive coverage, Origin-Pypi
ardumont added a comment to T1246: pypi loader: Analyze existing errors.

A Kibana dashboard will help in that matter (P311, because it's noisy).

Oct 5 2018, 6:30 PM · Archive coverage, Origin-Pypi
ardumont triaged T1246: pypi loader: Analyze existing errors as Normal priority.
Oct 5 2018, 6:28 PM · Archive coverage, Origin-Pypi

Sep 21 2018

ardumont closed T421: PyPI loader as Resolved.
Sep 21 2018, 6:35 PM · PyPI loader, Origin-Pypi
ardumont closed T421: PyPI loader, a subtask of T419: ingest PyPI into the Software Heritage archive (meta task), as Resolved.
Sep 21 2018, 6:35 PM · Archive coverage, Origin-Pypi

Sep 20 2018

ardumont closed T1181: pypi: Schedule ingestion, a subtask of T419: ingest PyPI into the Software Heritage archive (meta task), as Resolved.
Sep 20 2018, 11:17 AM · Archive coverage, Origin-Pypi
ardumont closed T1181: pypi: Schedule ingestion as Resolved.
Sep 20 2018, 11:17 AM · Archive coverage, Origin-Pypi
ardumont added a comment to T1181: pypi: Schedule ingestion.

Now, it's scheduled. Just need to wait for the swh-scheduler-runner.service to finish its loop on task_types.

Sep 20 2018, 9:52 AM · Archive coverage, Origin-Pypi
ardumont added a comment to T1181: pypi: Schedule ingestion.
swhscheduler@saatchi:~$ python3 -m swh.scheduler.cli task list-pending -t swh-lister-pypi
Found 1 tasks
Sep 20 2018, 9:48 AM · Archive coverage, Origin-Pypi
ardumont updated the task description for T1181: pypi: Schedule ingestion.
Sep 20 2018, 9:47 AM · Archive coverage, Origin-Pypi
ardumont added a comment to T1181: pypi: Schedule ingestion.

Schedule the lister-pypi:

Sep 20 2018, 9:47 AM · Archive coverage, Origin-Pypi

Sep 19 2018

ardumont changed the status of T1181: pypi: Schedule ingestion, a subtask of T419: ingest PyPI into the Software Heritage archive (meta task), from Open to Work in Progress.
Sep 19 2018, 7:52 PM · Archive coverage, Origin-Pypi
ardumont changed the status of T1181: pypi: Schedule ingestion from Open to Work in Progress.
Sep 19 2018, 7:52 PM · Archive coverage, Origin-Pypi

Sep 6 2018

ardumont updated the task description for T1181: pypi: Schedule ingestion.
Sep 6 2018, 5:38 PM · Archive coverage, Origin-Pypi
ardumont renamed T1181: pypi: Schedule ingestion from pypi: Trigger listing task to pypi: Schedule ingestion.
Sep 6 2018, 5:37 PM · Archive coverage, Origin-Pypi
ardumont triaged T1181: pypi: Schedule ingestion as Normal priority.
Sep 6 2018, 5:31 PM · Archive coverage, Origin-Pypi
ardumont closed T422: PyPI lister, a subtask of T419: ingest PyPI into the Software Heritage archive (meta task), as Resolved.
Sep 6 2018, 5:31 PM · Archive coverage, Origin-Pypi
ardumont closed T422: PyPI lister as Resolved.
Sep 6 2018, 5:31 PM · Developers, Origin-Pypi

Sep 4 2018

zack edited projects for T421: PyPI loader, added: PyPI loader; removed Developers.
Sep 4 2018, 10:30 AM · PyPI loader, Origin-Pypi

Aug 27 2018

ardumont added a comment to T421: PyPI loader.

Then again, I'll check the PyPI API's documentation. Hopefully, it's explained somewhere ;)

Aug 27 2018, 9:50 AM · PyPI loader, Origin-Pypi

Aug 23 2018

olasd added a comment to T421: PyPI loader.
In T421#21696, @zack wrote:

Still, we should probably have a "master" branch, to ease navigation, shouldn't we? (What do we do for Debian packages on this?)

Aug 23 2018, 3:47 PM · PyPI loader, Origin-Pypi
ardumont added a comment to T421: PyPI loader.

So, having one branch in the snapshot per distribution format (tar/zip/etc.) is a nice and clean way of handling this.

Aug 23 2018, 3:10 PM · PyPI loader, Origin-Pypi
zack added a comment to T421: PyPI loader.
In T421#21693, @olasd wrote:
  • Unpack all the sdist formats
  • If all goes well, the contents are identical. In that case, the revision objects would end up with the same id; we can ignore that there ever were multiple formats, and just have a single branch pointing to a single revision for that version of the package in the snapshot
  • If the contents are different, load both and make the snapshot have a branch pointing to each format.
Aug 23 2018, 9:40 AM · PyPI loader, Origin-Pypi
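
The proposal quoted above can be sketched as follows. This is a minimal illustration, not the loader's code: each unpacked sdist is modelled as a plain mapping of relative path to file bytes, a SHA-256 digest stands in for the real revision id, and the branch naming scheme (`<version>` or `<version>/<format>`) is hypothetical.

```python
import hashlib


def fingerprint(tree):
    """Return a digest of an unpacked tree (mapping of path -> bytes).

    Stand-in for a real revision id: identical contents give identical digests.
    """
    digest = hashlib.sha256()
    for path in sorted(tree):
        data = tree[path]
        digest.update(path.encode("utf-8") + b"\0")
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


def snapshot_branches(version, trees_by_format):
    """Map branch names to ids for one package version.

    One branch if every format unpacks to the same contents,
    otherwise one branch per distribution format.
    """
    ids = {fmt: fingerprint(tree) for fmt, tree in trees_by_format.items()}
    if len(set(ids.values())) == 1:
        # All formats agree: forget that there were several formats
        return {version: next(iter(ids.values()))}
    return {f"{version}/{fmt}": rev_id for fmt, rev_id in ids.items()}
```

For instance, passing `{"tar.gz": {"setup.py": b"x"}, "zip": {"setup.py": b"x"}}` for version `"1.0"` yields a single `"1.0"` branch, while differing contents yield `"1.0/tar.gz"` and `"1.0/zip"`.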

Aug 22 2018

ardumont added a comment to T421: PyPI loader.

The Debian loader doesn't create release objects. Our data model doesn't allow attaching arbitrary structured metadata to release objects (as Git doesn't either), so we've shortcut this level of indirection.

Aug 22 2018, 4:16 PM · PyPI loader, Origin-Pypi

Aug 21 2018

olasd added a comment to T421: PyPI loader.
In T421#21639, @zack wrote:

The basic loader will be the tarball loader, yes. In addition to that there are two aspects to be defined:

  1. the stack of objects to be added to the DAG
  2. the metadata to extract

    For (1), I think what we currently do for Debian packages is as you said, i.e., snapshot -> release -> revision -> tarball root dir. Maybe you can check for comparison (or @olasd can chime in?). We should do the same here.
Aug 21 2018, 6:28 PM · PyPI loader, Origin-Pypi
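
To make the stack discussed in (1) concrete, here is a small sketch of the chain snapshot -> release -> revision -> tarball root directory using plain dataclasses. The class and field names are illustrative assumptions and do not reproduce the actual Software Heritage data model API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Directory:
    # Root directory of the unpacked tarball
    name: str


@dataclass(frozen=True)
class Revision:
    directory: Directory
    message: str


@dataclass(frozen=True)
class Release:
    name: str
    target: Revision


@dataclass
class Snapshot:
    # Branch name -> release (hypothetical layout)
    branches: Dict[str, Release] = field(default_factory=dict)


def root_directory(snapshot, branch):
    """Walk snapshot -> release -> revision -> directory for one branch."""
    return snapshot.branches[branch].target.directory


# Illustrative objects for a made-up package version
root = Directory(name="example-1.0")
revision = Revision(directory=root, message="example 1.0")
release = Release(name="1.0", target=revision)
snapshot = Snapshot(branches={"1.0": release})
print(root_directory(snapshot, "1.0").name)
```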
ardumont added a comment to T421: PyPI loader.

There remain 3 actions to complete the current implementation:

Aug 21 2018, 4:41 PM · PyPI loader, Origin-Pypi

Aug 2 2018

ardumont added a comment to T421: PyPI loader.

As far as I can tell from those examples, the metadata that PyPI gives you are the most recent ones, probably the ones extracted from the most recent version, so it would be incorrect to associate them to other releases.

Aug 2 2018, 3:56 PM · PyPI loader, Origin-Pypi
zack added a comment to T421: PyPI loader.

The PyPI API already provides quite a lot of information (see P288 and P289 for examples).
For now, the current implementation leverages it.

Aug 2 2018, 3:45 PM · PyPI loader, Origin-Pypi
ardumont added a comment to T421: PyPI loader.

For (1), I think what we currently do for Debian packages is as you said, i.e., snapshot -> release -> revision -> tarball root dir. Maybe you can check for comparison (or @olasd can chime in?). We should do the same here.

Aug 2 2018, 3:31 PM · PyPI loader, Origin-Pypi
zack updated subscribers of T421: PyPI loader.

The basic loader will be the tarball loader, yes. In addition to that there are two aspects to be defined:

  1. the stack of objects to be added to the DAG
  2. the metadata to extract
Aug 2 2018, 2:37 PM · PyPI loader, Origin-Pypi

Aug 1 2018

ardumont changed the status of T422: PyPI lister, a subtask of T419: ingest PyPI into the Software Heritage archive (meta task), from Open to Work in Progress.
Aug 1 2018, 3:10 PM · Archive coverage, Origin-Pypi
ardumont changed the status of T422: PyPI lister from Open to Work in Progress.
Aug 1 2018, 3:10 PM · Developers, Origin-Pypi
ardumont changed the status of T421: PyPI loader from Open to Work in Progress.
Aug 1 2018, 3:10 PM · PyPI loader, Origin-Pypi
ardumont changed the status of T421: PyPI loader, a subtask of T419: ingest PyPI into the Software Heritage archive (meta task), from Open to Work in Progress.
Aug 1 2018, 3:10 PM · Archive coverage, Origin-Pypi
ardumont added a comment to T421: PyPI loader.

capable of extracting upstream metadata that are meaningful (and specific to) PyPI.

Aug 1 2018, 3:10 PM · PyPI loader, Origin-Pypi

Jul 26 2018

ardumont closed T420: mirror PyPI, a subtask of T419: ingest PyPI into the Software Heritage archive (meta task), as Wontfix.
Jul 26 2018, 3:33 PM · Archive coverage, Origin-Pypi
ardumont closed T420: mirror PyPI as Wontfix.

As per comment [1], closing this as we will not implement mirroring.

Jul 26 2018, 3:33 PM · Origin-Pypi
ardumont added a comment to T420: mirror PyPI.

The outdatedness is most likely due to the fact that, since I opened this task 2 years ago...

Jul 26 2018, 12:02 PM · Origin-Pypi

Jul 25 2018

zack added a comment to T422: PyPI lister.

If your consumer is actually an organization or service that will be downloading a lot of packages from PyPI, consider using your own index mirror or cache.

That's not a sustainable way. If we choose that path for all the forges we need to archive... that will be difficult in terms of infrastructure and maintenance.
Jul 25 2018, 10:28 AM · Developers, Origin-Pypi
zack added a comment to T420: mirror PyPI.

better LWN link to the actual article covering this: https://lwn.net/Articles/751458/

Jul 25 2018, 10:25 AM · Origin-Pypi
zack added a comment to T420: mirror PyPI.

Looking at the faq [4], they also (now?) recommend bandersnatch. Quoting it:

Jul 25 2018, 10:25 AM · Origin-Pypi

Jul 24 2018

ardumont added a comment to T422: PyPI lister.

They have multiple APIs:

  • a basic JSON one [1], which allows requesting information on a per-project basis (no listing) (~> foresee using this one for the loader)
  • a deprecated XML-RPC one [2] (this one lists ~> that would be for the lister)
  • an HTML page (listing all packages)
  • an RSS feed (update events)
Jul 24 2018, 4:25 PM · Developers, Origin-Pypi
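
As an illustration of the per-project JSON API listed above, the sketch below builds the project URL and extracts the source distributions from a response. It assumes the endpoint `https://pypi.org/pypi/<project>/json` and a payload whose `releases` key maps each version to a list of file entries with `packagetype`, `filename` and `url` fields. The helper names are made up for this example and are not taken from the loader.

```python
import json
from urllib.request import urlopen

# Assumed endpoint pattern for the per-project JSON API
PYPI_JSON_URL = "https://pypi.org/pypi/{project}/json"


def project_url(project):
    """Build the JSON API URL for one project."""
    return PYPI_JSON_URL.format(project=project)


def fetch_project(project):
    """Download and decode the JSON document for one project (network call)."""
    with urlopen(project_url(project)) as response:
        return json.load(response)


def sdist_files(payload):
    """Return (version, filename, url) for every sdist in a payload."""
    found = []
    for version, files in payload.get("releases", {}).items():
        for entry in files:
            # Keep source distributions only, skip wheels and other builds
            if entry.get("packagetype") == "sdist":
                found.append((version, entry["filename"], entry["url"]))
    return found
```

A loader could then call `sdist_files(fetch_project("some-project"))` to obtain the tarballs to download. Listing all projects still requires one of the other interfaces, since this API works per project only.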