(periodically) ingest GNU package releases
We have done only a one-off ingestion of GNU package releases (back in 2015).
We should periodically ingest new GNU package releases, automating the listing process.

ftp.gnu is now available only via HTTP (and no longer via FTP), but an up-to-date directory listing is available at (thanks Ludovic Courtès for the heads up on this).

zack created this task. Nov 16 2018, 12:08 PM
zack triaged this task as Normal priority.
zack added a project: Archive coverage.
zack renamed this task from periodically ingest GNU package releases to (periodically) ingest GNU package releases.

This should probably be split into 2 tasks:

  • implement a lister to create GNU origins in the scheduler (we most probably already have all the necessary code for that in the swh-lister repository).
  • adapt loader-tar to retrieve remote tarballs (it currently works only on local tarballs). It would also be the occasion to refactor that loader ;)
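
To make the second item concrete, here is a minimal sketch of fetching a remote tarball and unpacking it locally. It is not the loader-tar code or its API: the function name `fetch_and_extract` and its parameters are hypothetical, and it uses only the Python standard library.

```python
import shutil
import tarfile
import tempfile
import urllib.request
from pathlib import Path


def fetch_and_extract(url, dest_dir):
    """Download the tarball at `url` and unpack it under `dest_dir`.

    Hypothetical helper for illustration only; a real loader would also
    verify checksums/signatures and validate member paths before
    extracting archives from untrusted sources.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Spool the download to a temporary file so tarfile gets a seekable
    # stream and can auto-detect the compression (gz, bz2, xz, ...).
    with tempfile.TemporaryFile() as tmp:
        with urllib.request.urlopen(url) as response:
            shutil.copyfileobj(response, tmp)
        tmp.seek(0)
        with tarfile.open(fileobj=tmp, mode="r:*") as archive:
            archive.extractall(dest)
    return dest
```

Once the archive is unpacked, the existing local-tarball code path could presumably take over from `dest_dir`, so the remote case would only add the download step.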
ardumont added a comment. Edited Tue, Mar 12, 6:53 PM

@pombreda on #swh-devel suggested using rsync -r, which seems to
provide what we want!

18:36 <pombreda> ardumont, stupid suggestion wrt gnu code: have considered using rsync -r rsync:// ?
18:37 <pombreda> ardumont: to get a directory listing


$ rsync -r rsync:// > full-listing-gnu.txt
$ tail full-listing-gnu.txt
-rw-r--r--      1,259,220 2012/02/09 00:49:44 zile/zile-2.4.5.tar.gz
-rw-r--r--            190 2012/02/09 00:49:44 zile/zile-2.4.5.tar.gz.sig
-rw-r--r--      1,257,698 2012/02/18 16:34:58 zile/zile-2.4.6.tar.gz
-rw-r--r--            190 2012/02/18 16:34:59 zile/zile-2.4.6.tar.gz.sig
-rw-r--r--      1,254,385 2012/03/20 21:19:44 zile/zile-2.4.7.tar.gz
-rw-r--r--            190 2012/03/20 21:19:45 zile/zile-2.4.7.tar.gz.sig
-rw-r--r--      1,184,855 2012/07/13 13:15:48 zile/zile-2.4.8.tar.gz
-rw-r--r--            190 2012/07/13 13:15:49 zile/zile-2.4.8.tar.gz.sig
-rw-r--r--      1,192,776 2012/10/01 23:08:02 zile/zile-2.4.9.tar.gz
-rw-r--r--            190 2012/10/01 23:08:03 zile/zile-2.4.9.tar.gz.sig
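
A listing in this format could feed a lister fairly directly. The sketch below assumes each line has the five columns shown above (permissions, size, date, time, path) and that the top-level directory is the package name. The function name `parse_listing` and the set of tarball suffixes are assumptions for illustration, not existing swh-lister code.

```python
import re
from collections import defaultdict
from datetime import datetime

# One line of the listing: permissions, size, date, time, path.
LINE_RE = re.compile(
    r"^(?P<perms>\S{10})\s+(?P<size>[\d,]+)\s+"
    r"(?P<date>\d{4}/\d{2}/\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<path>.+)$"
)
# Assumed set of release archive suffixes; adjust as needed.
TARBALL_SUFFIXES = (".tar.gz", ".tar.bz2", ".tar.xz", ".tgz")


def parse_listing(lines):
    """Group tarball entries by top-level directory (taken as the package)."""
    packages = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None or not m["perms"].startswith("-"):
            continue  # skip directories, symlinks and unparseable lines
        path = m["path"]
        if "/" not in path or not path.endswith(TARBALL_SUFFIXES):
            continue  # skip top-level files, signatures and other files
        package = path.split("/", 1)[0]
        packages[package].append({
            "path": path,
            "size": int(m["size"].replace(",", "")),
            "mtime": datetime.strptime(
                f"{m['date']} {m['time']}", "%Y/%m/%d %H:%M:%S"
            ),
        })
    return dict(packages)
```

For example, `parse_listing(open("full-listing-gnu.txt"))` would return one entry per package, each holding its release tarballs with size and modification time. A periodic run could then compare the modification times against the previous run to find new releases.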

