ingest Google Code Subversion repositories
Open, Normal, Public

Description

Note:

  • Map each origin URL to the old Google Code one. The scheme is http://<project-name>.googlecode.com/svn/
  • We keep loader-svn's current behavior: a detected revision hash divergence or an svn:externals property raises an error, which is logged and stops the loading.
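The URL scheme in the first note can be sketched as a small helper. This is only an illustration: the function name `googlecode_svn_origin` is hypothetical and is not taken from loader-svn.

```python
def googlecode_svn_origin(project_name: str) -> str:
    """Build the legacy Google Code Subversion origin URL for a project.

    Follows the scheme http://<project-name>.googlecode.com/svn/ from the
    task description. The helper name itself is an assumption.
    """
    name = project_name.strip()
    if not name:
        raise ValueError("project name must not be empty")
    return f"http://{name}.googlecode.com/svn/"


if __name__ == "__main__":
    # "my-project" is a placeholder project name, not a real origin.
    print(googlecode_svn_origin("my-project"))
```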

Event Timeline

ardumont created this task. Jan 9 2017, 5:07 PM
ardumont updated the task description. Jan 10 2017, 10:18 AM

Currently running on swh's internal infrastructure workers.

ardumont changed the task status from Open to Work in Progress. Jan 11 2017, 12:33 PM
zack assigned this task to ardumont. Feb 10 2017, 12:29 PM
zack renamed this task from Ingest googlecode's svn dump repositories to ingest Google Code Subversion repositories. Feb 12 2017, 6:15 PM
zack added a project: Restricted Project.
zack moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board. Feb 12 2017, 6:37 PM
zack lowered the priority of this task from High to Normal. Feb 14 2017, 9:52 AM
ardumont added a comment. Edited Feb 15 2017, 12:42 PM

Command used to trigger the production of tasks:

cat INDEX.shuffle.svndump | ./bin/list-svndump-urls | SWH_WORKER_INSTANCE=swh_loader_svn python3 -u -m swh.loader.svn.producer svn-archive --visit-date 'Tue, 3 May 2016 17:16:32 +0200'

where:

zack removed projects: Restricted Project, SVN Loader. Apr 5 2017, 2:04 PM
ardumont added a subscriber: olasd. Edited Apr 26 2017, 10:16 AM

An update: this is still work in progress.

status

~168.5k repositories to ingest out of 575k repositories.

These repositories are already scheduled in the loader-svn queue.
The ingestion itself is on standby (see below).

issues

We regularly hit the following chain of failures:

  1. out of RAM -> 2. the worker is killed without any chance to run its cleaning step -> 3. the disk fills up -> 4. at least one worker (the one with no disk space left) sits idle while still consuming the remaining queued jobs without doing anything useful.

Note: an important implementation detail is that loader-svn works on disk.
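The failure chain above comes from leftovers that stay on disk when a worker is killed before its cleaning step. Below is a minimal sketch of a guard a disk-based worker could run before picking up a job. It is not loader-svn's actual code: the directory prefix `swh.loader.svn.`, the 10 GiB threshold and both function names are assumptions made for illustration.

```python
import shutil
from pathlib import Path

# Hypothetical threshold: refuse new jobs below 10 GiB of free space.
MIN_FREE_BYTES = 10 * 1024**3


def has_free_space(path, min_free_bytes=MIN_FREE_BYTES):
    """Return True if the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= min_free_bytes


def remove_stale_dirs(root, prefix="swh.loader.svn."):
    """Delete leftover working directories under `root`.

    `prefix` is an assumed naming convention for the loader's temporary
    directories. Returns the sorted names of the directories removed.
    """
    removed = []
    for entry in Path(root).iterdir():
        if entry.is_dir() and entry.name.startswith(prefix):
            shutil.rmtree(entry, ignore_errors=True)
            removed.append(entry.name)
    return sorted(removed)
```

A worker would call `remove_stale_dirs` at startup, then skip or requeue a job whenever `has_free_space` returns False, rather than consuming the job and failing on it.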

workaround in progress

For now, @olasd and I are provisioning VMs on beaubourg (almost there) so that the disk-based workers (git-disk + svn) run there.

The beaubourg hypervisor is underused, while louvre is quite the opposite.

This is only a workaround for now, as other tasks have higher priority (I'll reference them here once they exist :)

The ingestion was restarted 2 weeks ago (Monday 18th September 2017).
It has just finished (Monday 2nd October 2017).

T676 now kicks in.

ardumont closed this task as Resolved. Oct 2 2017, 4:22 PM
ardumont reopened this task as Open. Oct 3 2017, 9:49 AM

Reopened since a subtask (or child task) is still open (T676).

09:41:46      +zack | (as a rule of thumb, parent tasks should not be closed if there are still child tasks open, but there might be exceptions)
ardumont changed the status of subtask T879: Reschedule googlecode svn origins from scratch from Open to Work in Progress. Dec 11 2017, 11:03 AM