Software Heritage

Developers · Group
Active · Public

Recent Activity

May 28 2020

ardumont removed a member for Developers: aeviso.
May 28 2020, 9:52 AM
ardumont added a member for Developers: aeviso.
May 28 2020, 9:39 AM

Oct 18 2018

ardumont added a project to T1279: swh-journal: The migration problem: Journal.
Oct 18 2018, 4:27 PM · Journal
ardumont added a project to T424: swh-journal: persistent journal infrastructure to record additions to the swh-storage: Journal.
Oct 18 2018, 4:15 PM · Journal
ardumont triaged T1279: swh-journal: The migration problem as Normal priority.
Oct 18 2018, 4:07 PM · Journal
ardumont triaged T1278: swh-journal: the monitoring tool question! as Normal priority.
Oct 18 2018, 4:04 PM · Journal
ardumont triaged T1277: swh-journal: Create a journal client for listing origin visits as Normal priority.
Oct 18 2018, 4:03 PM · Indexer, Journal
ardumont closed T527: Insert newly created objects in the journal, a subtask of T424: swh-journal: persistent journal infrastructure to record additions to the swh-storage, as Resolved.
Oct 18 2018, 3:48 PM · Journal
ardumont claimed T424: swh-journal: persistent journal infrastructure to record additions to the swh-storage.
Oct 18 2018, 3:46 PM · Journal
ardumont claimed T1276: swh-journal: Add tests.
Oct 18 2018, 3:46 PM · Continuous Integration, Journal
ardumont triaged T1276: swh-journal: Add tests as Normal priority.
Oct 18 2018, 3:45 PM · Continuous Integration, Journal
ardumont triaged T1275: swh-journal: Complete missing snapshot insertion event from storage to journal as Normal priority.
Oct 18 2018, 3:40 PM · Storage manager, Journal
ardumont renamed T424: swh-journal: persistent journal infrastructure to record additions to the swh-storage from persistent journal infrastructure to record additions to the swh-storage to swh-journal: persistent journal infrastructure to record additions to the swh-storage.
Oct 18 2018, 9:37 AM · Journal

Oct 5 2018

ardumont renamed T439: Indexers: compute (and maintain up-to-date) the filetype of all blobs from compute (and maintain up-to-date) the filetype of all blobs to Indexers: compute (and maintain up-to-date) the filetype of all blobs.
Oct 5 2018, 2:47 PM · Indexer

Oct 1 2018

ardumont added a member for Developers: vlorentz.
Oct 1 2018, 12:11 PM

Sep 6 2018

ardumont closed T422: PyPI lister as Resolved.
Sep 6 2018, 5:31 PM · Developers, Origin-Pypi

Aug 27 2018

ardumont added a comment to T421: PyPI loader.

Then again, I'll check the PyPI API's documentation. Hopefully, it's explained somewhere ;)

Aug 27 2018, 9:50 AM · PyPI loader, Origin-Pypi

Aug 23 2018

olasd added a comment to T421: PyPI loader.
In T421#21696, @zack wrote:

Still, we should probably have a "master" branch, to ease navigation, shouldn't we? (What do we do for Debian packages on this?)

Aug 23 2018, 3:47 PM · PyPI loader, Origin-Pypi
ardumont added a comment to T421: PyPI loader.

So, having one branch in the snapshot per distribution format (tar/zip/etc.) is a nice and clean way of handling this.

Aug 23 2018, 3:10 PM · PyPI loader, Origin-Pypi
zack added a comment to T421: PyPI loader.
In T421#21693, @olasd wrote:
  • Unpack all the sdist formats
  • If all is well, the contents are identical. In that case, the revision objects would end up with the same id; we can ignore that there ever were multiple formats, and just have a single branch pointing to a single revision for that version of the package in the snapshot
  • If the contents are different, load both and make the snapshot have a branch pointing to each format.
Aug 23 2018, 9:40 AM · PyPI loader, Origin-Pypi
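
The strategy quoted above can be sketched as follows. This is only an illustration under stated assumptions, not the loader's actual code: `tree_id` is a hypothetical stand-in for the revision id computation, and the branch naming scheme is invented for the example.

```python
import hashlib
from typing import Dict


def tree_id(files: Dict[str, bytes]) -> str:
    """Hypothetical stand-in for a revision id: hash of sorted paths and contents."""
    h = hashlib.sha1()
    for path in sorted(files):
        h.update(path.encode())
        h.update(b"\0")
        h.update(hashlib.sha1(files[path]).digest())
    return h.hexdigest()


def snapshot_branches(version: str, unpacked: Dict[str, Dict[str, bytes]]) -> Dict[str, str]:
    """Map branch names to ids for one package version.

    `unpacked` maps an sdist format name (e.g. "tar.gz", "zip") to its
    unpacked files (path -> content).
    """
    ids = {fmt: tree_id(files) for fmt, files in unpacked.items()}
    if len(set(ids.values())) == 1:
        # Identical contents: a single branch, the formats are ignored.
        return {version: next(iter(ids.values()))}
    # Contents differ: one branch per format (assumed naming scheme).
    return {f"{version}/{fmt}": rev for fmt, rev in sorted(ids.items())}
```

With identical archives the result has a single branch named after the version; with differing archives it has one branch per format.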

Aug 22 2018

ardumont added a comment to T421: PyPI loader.

The Debian loader doesn't create release objects. Our data model doesn't allow attaching arbitrary structured metadata to release objects (Git doesn't either), so we've skipped this level of indirection.

Aug 22 2018, 4:16 PM · PyPI loader, Origin-Pypi

Aug 21 2018

olasd added a comment to T421: PyPI loader.
In T421#21639, @zack wrote:

The basic loader will be the tarball loader, yes. In addition to that there are two aspects to be defined:

  1. the stack of objects to be added to the DAG
  2. the metadata to extract

For (1), I think what we currently do for Debian packages is as you said, i.e., snapshot -> release -> revision -> tarball root dir. Maybe you can check for comparison (or @olasd can chime in?). We should do the same here.

Aug 21 2018, 6:28 PM · PyPI loader, Origin-Pypi
ardumont added a comment to T421: PyPI loader.

There remain 3 actions to do for the current implementation to be complete:

Aug 21 2018, 4:41 PM · PyPI loader, Origin-Pypi

Aug 2 2018

ardumont added a comment to T421: PyPI loader.

As far as I can tell from those examples, the metadata that PyPI gives you are the most recent ones, probably extracted from the most recent version, so it would be incorrect to associate them with other releases.

Aug 2 2018, 3:56 PM · PyPI loader, Origin-Pypi
zack added a comment to T421: PyPI loader.

The PyPI API already provides quite a lot of information (see P288 and P289 for examples).
For now, the current implementation leverages it.

Aug 2 2018, 3:45 PM · PyPI loader, Origin-Pypi
ardumont added a comment to T421: PyPI loader.

For (1), I think what we currently do for Debian packages is as you said, i.e., snapshot -> release -> revision -> tarball root dir. Maybe you can check for comparison (or @olasd can chime in?). We should do the same here.

Aug 2 2018, 3:31 PM · PyPI loader, Origin-Pypi
zack updated subscribers of T421: PyPI loader.

The basic loader will be the tarball loader, yes. In addition to that there are two aspects to be defined:

  1. the stack of objects to be added to the DAG
  2. the metadata to extract
Aug 2 2018, 2:37 PM · PyPI loader, Origin-Pypi

Aug 1 2018

ardumont changed the status of T422: PyPI lister from Open to Work in Progress.
Aug 1 2018, 3:10 PM · Developers, Origin-Pypi
ardumont changed the status of T421: PyPI loader from Open to Work in Progress.
Aug 1 2018, 3:10 PM · PyPI loader, Origin-Pypi
ardumont added a comment to T421: PyPI loader.

capable of extracting upstream metadata that are meaningful for (and specific to) PyPI.

Aug 1 2018, 3:10 PM · PyPI loader, Origin-Pypi

Jul 25 2018

zack added a comment to T422: PyPI lister.

If your consumer is actually an organization or service that will be downloading a lot of packages from PyPI, consider using your own index mirror or cache.

That's not a sustainable approach. If we choose that path for all the forges we need to archive... that will be difficult in terms of infrastructure and maintenance.
Jul 25 2018, 10:28 AM · Developers, Origin-Pypi

Jul 24 2018

ardumont added a comment to T422: PyPI lister.

They have multiple APIs:

  • a basic JSON one [1], which allows requesting information on a per-project basis (no listing) (~> I foresee using this one for the loader)
  • a deprecated XML-RPC one [2] (this one lists ~> that would be for the lister)
  • an HTML page (listing all packages)
  • an RSS feed (update events)
Jul 24 2018, 4:25 PM · Developers, Origin-Pypi
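
As a rough sketch of how a loader might consume the per-project JSON API: the URL pattern and the `releases`, `packagetype` and `filename` fields below are assumptions about the response shape and should be checked against the PyPI documentation. The sample payload is hand-written, not real data, and no network call is made.

```python
import json
from typing import Dict, List


def project_json_url(project: str) -> str:
    # Assumed URL pattern of the per-project JSON API.
    return f"https://pypi.org/pypi/{project}/json"


def sdists_by_version(payload: dict) -> Dict[str, List[str]]:
    """Return, for each version, the filenames of its source distributions."""
    result: Dict[str, List[str]] = {}
    for version, files in payload.get("releases", {}).items():
        names = [f["filename"] for f in files if f.get("packagetype") == "sdist"]
        if names:
            result[version] = sorted(names)
    return result


# Hand-written sample shaped like the assumed response (hypothetical project).
sample = json.loads("""
{
  "info": {"name": "example"},
  "releases": {
    "1.0": [
      {"filename": "example-1.0.tar.gz", "packagetype": "sdist"},
      {"filename": "example-1.0.zip", "packagetype": "sdist"},
      {"filename": "example-1.0-py3-none-any.whl", "packagetype": "bdist_wheel"}
    ],
    "0.9": []
  }
}
""")

sdists = sdists_by_version(sample)
```

Versions without any source distribution are left out of the result, so the loader only sees versions it can actually unpack.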

Jul 10 2018

ardumont added a member for Developers: douardda.
Jul 10 2018, 2:38 PM

Sep 14 2017

zack removed a member for Developers: sushant5.
Sep 14 2017, 10:40 AM

Sep 7 2017

zack added a member for Developers: ftigeot.
Sep 7 2017, 11:41 AM

Jul 3 2017

zack added a member for Developers: anlambert.
Jul 3 2017, 10:14 AM

May 15 2017

ardumont added a member for Developers: sushant5.
May 15 2017, 2:27 PM

Feb 14 2017

zack added a member for Developers: moranegg.
Feb 14 2017, 11:00 AM

Feb 8 2017

zack added a member for Developers: fiendish.
Feb 8 2017, 11:46 AM

Feb 7 2017

zack added a member for Developers: seirl.
Feb 7 2017, 3:41 PM

Jan 24 2017

zack added a member for Developers: rdicosmo.
Jan 24 2017, 12:17 PM
zack removed a member for Developers: qcampos.
Jan 24 2017, 12:16 PM
zack removed a member for Developers: jbertran.
Jan 24 2017, 12:16 PM

Aug 23 2016

olasd changed the status of T527: Insert newly created objects in the journal, a subtask of T424: swh-journal: persistent journal infrastructure to record additions to the swh-storage, from Open to Work in Progress.
Aug 23 2016, 6:14 PM · Journal

Aug 19 2016

olasd closed T526: Add notifications support to swh.storage, a subtask of T424: swh-journal: persistent journal infrastructure to record additions to the swh-storage, as Resolved.
Aug 19 2016, 3:56 PM · Journal

Aug 16 2016

olasd removed a parent task for T424: swh-journal: persistent journal infrastructure to record additions to the swh-storage: T359: Indexers: batch content analyzer infrastructure.
Aug 16 2016, 6:35 PM · Journal
olasd added a comment to T424: swh-journal: persistent journal infrastructure to record additions to the swh-storage.

We have no guarantee that the internal object ids are monotonic: concurrent transactions can make object_ids of objects go backwards.

Aug 16 2016, 5:54 PM · Journal
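
The problem described in this comment can be illustrated with a small simulation. It assumes that ids are drawn from a shared counter at insert time while rows only become visible at commit time; the transaction names and the polling reader are hypothetical.

```python
from itertools import count

sequence = count(1)  # stand-in for a database sequence

# Two concurrent transactions draw their ids in order...
id_a = next(sequence)  # transaction A gets 1
id_b = next(sequence)  # transaction B gets 2

# ...but B commits first, so B's row becomes visible before A's.
commit_order = [("B", id_b), ("A", id_a)]
visible_ids = [obj_id for _, obj_id in commit_order]

# A reader polling for "id > last_seen" after seeing B would never pick up A.
last_seen = visible_ids[0]
missed = [i for i in visible_ids[1:] if i <= last_seen]
```

Here `visible_ids` goes backwards and `missed` is non-empty, which is why a client cannot simply track the highest object id it has seen.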
olasd closed T525: Allow bulk-listing of objects by content-id, a subtask of T424: swh-journal: persistent journal infrastructure to record additions to the swh-storage, as Wontfix.
Aug 16 2016, 12:35 PM · Journal

Aug 9 2016

qcampos added a subtask for T424: swh-journal: persistent journal infrastructure to record additions to the swh-storage: T494: swh-journal: archiver-client: Keep archiver table in sync with new contents.
Aug 9 2016, 5:47 PM · Journal

Jul 26 2016

olasd closed T507: document licensing of kafka and related client modules as Resolved.

Added the kafka-related products to the licensing page. Pretty much everything in the ecosystem is Apache2-licensed.

Jul 26 2016, 2:43 PM · Developers