ingest Google Code Subversion repositories
Closed, Resolved (Public)

Description

Note:

  • Map each origin URL to the old Google Code one. The scheme is http://<project-name>.googlecode.com/svn/
  • We keep loader-svn's current behavior: a revision hash divergence or an svn:externals property raises an error, which is logged and stops the loading.
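
As an illustration of the URL scheme in the note above, here is a minimal sketch of the mapping. The function name `googlecode_svn_origin` is hypothetical and is not taken from loader-svn; only the URL pattern comes from this task.

```python
def googlecode_svn_origin(project_name: str) -> str:
    """Build the legacy Google Code SVN origin URL for a project.

    Hypothetical helper: only the URL scheme comes from the task description.
    """
    name = project_name.strip()
    if not name:
        raise ValueError("project name must not be empty")
    return f"http://{name}.googlecode.com/svn/"
```

For example, `googlecode_svn_origin("some-project")` gives `http://some-project.googlecode.com/svn/`.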

Event Timeline

Currently running on swh's internal infrastructure workers.

ardumont changed the task status from Open to Work in Progress. Jan 11 2017, 12:33 PM
zack renamed this task from Ingest googlecode's svn dump repositories to ingest Google Code Subversion repositories. Feb 12 2017, 6:15 PM
zack added a project: Restricted Project.
zack moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board. Feb 12 2017, 6:37 PM
zack lowered the priority of this task from High to Normal. Feb 14 2017, 9:52 AM

Command used to trigger the production of tasks:

cat INDEX.shuffle.svndump | ./bin/list-svndump-urls | SWH_WORKER_INSTANCE=swh_loader_svn python3 -u -m swh.loader.svn.producer svn-archive --visit-date 'Tue, 3 May 2016 17:16:32 +0200'

where:

zack removed projects: Restricted Project, SVN Loader. Apr 5 2017, 2:04 PM

An update: this is still work in progress.

status

~168.5k repositories to ingest out of 575k repositories.

These are already scheduled in the loader-svn queue.
The ingestion is on standby (see below).

issues

We regularly hit the following chain of issues:

  1. out of RAM -> 2. worker killed without any possibility of running the cleanup step -> 3. out of disk space -> 4. at least one worker idle (the one with no disk space left), which consumes the remaining queued jobs without doing anything useful.

Note: an important implementation detail is that loader-svn works on disk.
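
To illustrate step 4 of the chain above, here is a minimal sketch of a free-space check that a disk-based worker could run before accepting a job. This is an assumption for illustration only, not what loader-svn or the workers actually do; the function name `has_free_disk` and the 10 GiB default threshold are hypothetical.

```python
import shutil
import tempfile
from typing import Optional


def has_free_disk(path: Optional[str] = None,
                  min_free_bytes: int = 10 * 1024**3) -> bool:
    """Return True if the filesystem holding `path` has enough free space.

    Hypothetical guard: `path` defaults to the system temporary directory,
    and the 10 GiB default threshold is an arbitrary illustrative value.
    """
    target = path or tempfile.gettempdir()
    # shutil.disk_usage returns a named tuple (total, used, free) in bytes.
    return shutil.disk_usage(target).free >= min_free_bytes
```

A worker that skips or requeues jobs while this check fails would stop draining the queue once its disk is full. It does not address the root cause: a worker killed for running out of RAM still cannot run its cleanup step.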

workaround in progress

For now, @olasd and I are provisioning VMs on beaubourg (almost there) so that the disk-based workers (git-disk + svn) can run there.

The beaubourg hypervisor is not used as much as it could be, while louvre is quite the opposite.

This is only a workaround for now, as other tasks have higher priority (I'll reference them here once they exist :)

It got restarted 2 weeks ago (Monday 18th September 2017).
It just finished (Monday 2nd October 2017).

T676 now kicks in.

Reopened since a subtask (or child task) is still open (T676).

09:41:46      +zack | (as a rule of thumb, parent tasks should not be closed if there are still child tasks open, but there might be exceptions)
zack claimed this task.