
backup: object storage — 2nd copy after first large batch import
Closed, ResolvedPublic

Description

see T239 and T240 for the regular/periodic backups of the object store

Event Timeline

zack created this task.Sep 14 2015, 4:55 PM
zack raised the priority of this task to Low.
zack updated the task description.
zack added a project: System administrators.
zack moved this task to Backlog on the System administrators board.
zack added a project: Staff.
zack renamed this task from backup: object storage (AKA file content) to backup: object storage — 2nd copy after first large batch import.Jan 18 2016, 3:25 PM
zack claimed this task.
zack updated the task description.
zack added a comment.Jan 18 2016, 4:22 PM

Now that the first batch import (github + snapshot.debian.org + gnu.org) is done and we won't be importing other sources for a while, a full object store backup from uffizi to banco has now started.

The backup is rather bare-bones, using the following script:

#!/bin/bash
# Copy one shard of the object store (all objects whose name starts with
# a given hex digit) to the backup host, streaming a tar over ssh.

SRCDIR="/srv/softwareheritage/objects"
DESTHOST="swhstorage@backup.softwareheritage.org"

if [ -z "$1" ] ; then
    echo "Usage: $0 OBJ_FIRST_DIGIT [TAR_OPTIONS]"
    echo "E.g.:  $0 c"
    exit 1
fi
digit="$1"
shift 1

echo "* $(date -R) considering objects starting with ${digit}"
for srcdir in "${SRCDIR}/${digit}"* ; do
    test -d "$srcdir" || continue
    echo "* $(date -R) sending $srcdir over"
    # Recreate the same path on the destination, then unpack the tar stream.
    (cd "$srcdir" && tar caf - "$@" .) \
        | ssh "$DESTHOST" "mkdir -p '$srcdir' && cd '$srcdir' && tar xaf -"
done

This script is executed 16 times in parallel, one instance per leading hex digit 0-9a-f.
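The parallel invocation can be sketched as follows; backup_shard here is a stand-in function for the per-shard script above, not the actual command line used:

```shell
#!/bin/bash
# Hedged sketch of the parallel launch; backup_shard stands in for the
# real per-shard script shown above.
backup_shard() {
    echo "shard $1 done"      # the real script tars and ships one shard
}

for digit in {0..9} {a..f}; do
    backup_shard "$digit" &   # one background job per leading hex digit
done
wait                          # block until all 16 shards are copied
```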

The processes are running in a script session of my user on uffizi.

Note that this means no integrity check is being done on individual objects; we will need to do that later on (and we need to do it periodically on all object copies anyhow).
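Such a later integrity pass could, for instance, recompute each object's checksum and compare it against its file name. A sketch, assuming objects are stored uncompressed under their hex SHA1 name (verify_objects is illustrative, not the project's actual tooling):

```shell
#!/bin/bash
# Illustrative integrity check: flag any object whose content's SHA1
# does not match its file name. Assumes uncompressed objects named by
# their hex SHA1; a sketch, not the actual Software Heritage tooling.
verify_objects() {
    local objroot="$1"
    find "$objroot" -type f | while read -r f; do
        expected=$(basename "$f")
        actual=$(sha1sum "$f" | awk '{print $1}')
        [ "$expected" = "$actual" ] || echo "CORRUPT: $f (got $actual)"
    done
}
```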

zack changed the task status from Open to Work in Progress.Jan 18 2016, 4:23 PM
zack changed the task status from Work in Progress to Open.Jan 21 2016, 10:16 PM

This is back on hold now, as we discovered that the read performance of the object store on uffizi is not as good as it should be.

zack changed the task status from Open to Work in Progress.Jan 22 2016, 6:47 PM
zack reassigned this task from zack to olasd.
zack added a subscriber: zack.

By looking at bonnie++ output and doing some math, we have concluded that the transfer slowness is essentially dominated by seek time.
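A back-of-the-envelope check (with hypothetical numbers, not the actual bonnie++ figures) shows why seek time dominates: at a few milliseconds per seek, reading many small objects caps throughput far below sequential disk bandwidth.

```shell
#!/bin/bash
# Back-of-the-envelope estimate with hypothetical numbers: an ~8 ms
# average seek allows about 125 random reads per second, so at ~4 KB
# per object a single reader moves only ~500 KB/s.
seek_ms=8
avg_obj_kb=4
reads_per_s=$(( 1000 / seek_ms ))              # ~125 objects/s
echo "throughput: $(( reads_per_s * avg_obj_kb )) KB/s"
```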

We have therefore switched to a different "backup" strategy: *cough* | dd if=/dev/* | nc | *cough*. By transferring 4 (out of 16) shards of objects we saturated the 1 Gbit/s link we have between the two machines.

olasd started the first 4 dd in a screen session.
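The raw-device transfer pattern looks roughly like this (host names, device names, and port are hypothetical; the local file-to-file demo below reproduces the same pipe shape without touching real devices):

```shell
#!/bin/bash
# On the receiver (e.g. banco):  nc -l -p 12345 | dd of=/dev/sdX bs=4M
# On the sender  (e.g. uffizi):  dd if=/dev/sdY bs=4M | nc banco 12345
# (device names and port are hypothetical, not the actual setup)
#
# The same dd-through-a-pipe shape, demonstrated locally on files:
src=$(mktemp); dst=$(mktemp)
head -c 1048576 /dev/urandom > "$src"              # 1 MiB of test data
dd if="$src" bs=64K 2>/dev/null | dd of="$dst" bs=64K 2>/dev/null
cmp -s "$src" "$dst" && echo "copy OK"
rm -f "$src" "$dst"
```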

olasd closed this task as Resolved.Feb 4 2016, 2:38 PM

We now have a backup of all the contents that were stored on uffizi at the end of our first batch import.

olasd changed the visibility from "All Users" to "Public (No Login Required)".May 13 2016, 5:04 PM