Investigate mass copy of data to azure
Closed, Migrated

Description

We want to copy all our contents to azure.

Reading contents one by one on the main storage will take a long time. We need to investigate the best way of doing a bulk copy to azure.

To be able to mount a full partition on azure, we need to create a virtual machine with enough data disks (eleven 1023 GB data disks), then make a RAID0 over the data disks to create a 10+ TB block device.
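A minimal sketch of what that assembly could look like, assuming the data disks show up as /dev/sdc through /dev/sdm (the device names, mount point and filesystem choice are assumptions, not from the thread):

```
# Assemble the 11 Azure data disks into a single RAID0 block device.
# Check `lsblk` first: /dev/sdc../dev/sdm is an assumed layout.
mdadm --create /dev/md0 --level=0 --raid-devices=11 /dev/sd[c-m]

# Optional: put a filesystem on the array and mount it; for a raw
# block-level copy with dd, /dev/md0 can also be used directly.
mkfs.ext4 /dev/md0
mkdir -p /mnt/bulk
mount /dev/md0 /mnt/bulk
```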

We then need to transfer the data, for instance using dd.
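For instance, the block-level copy could be piped over ssh; the host name and device paths below are placeholders, not the actual setup:

```
# Stream the source partition from the main storage host onto the
# RAID0 array. WARNING: this overwrites /dev/md0 wholesale.
ssh main-storage 'dd if=/dev/sdb1 bs=4M' \
    | dd of=/dev/md0 bs=4M status=progress
```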

Event Timeline

Azure virtual machines are i/o limited in two ways:

  • total bandwidth
  • total iops

(tests made with a DS3_v2 instance, over 8 data disks, maxing out 12,800 iops / 192 MBps)

We need to tune the RAID settings and the dd commands so that we max out the disk bandwidth without getting throttled for IOPS.

A stripe size of 512 kB (the mdadm default for a RAID over 8 disks) seems to be fully utilized with a dd block size of 4 MB (tested with block sizes from 512 kB to 64 MB).
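A sketch of how such a block-size sweep might be run (illustrative only; /dev/md0 and the ~1 GiB write per step are assumptions, and writing to the raw device destroys its contents):

```
# Sweep dd block sizes from 512 kB to 64 MB and report throughput.
for bs in 512K 1M 2M 4M 8M 16M 32M 64M; do
    count=$(( (1 << 30) / $(numfmt --from=iec "$bs") ))  # ~1 GiB per run
    echo "bs=$bs"
    dd if=/dev/zero of=/dev/md0 bs="$bs" count="$count" oflag=direct 2>&1 \
        | tail -n 1   # dd's last stderr line shows the throughput
done
```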

In T600#10262, @olasd wrote:

> Azure virtual machines are i/o limited in two ways:
>
>   • total bandwidth
>   • total iops
>
> (tests made with a DS3_v2 instance, over 8 data disks, maxing out 12,800 iops / 192 MBps)

This parenthetical means that the instance is _supposed_ to max out at 12,800 iops / 192 MBps. We haven't been able to reach that yet.

> We need to tune the RAID settings and the dd commands so that we max out the disk bandwidth without getting throttled for IOPS.
>
> A stripe size of 512 kB (the mdadm default for a RAID over 8 disks) seems to be fully utilized with a dd block size of 4 MB (tested with block sizes from 512 kB to 64 MB).

This gives us around 20-24 MB/s of sustained writes, which means roughly a week to transfer a complete 10 TB partition.
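As a back-of-the-envelope check of that figure (the arithmetic is ours, not from the thread):

```
# 10 TB at ~20 MB/s of sustained writes:
awk 'BEGIN { print 10e12 / 20e6 / 86400, "days" }'   # ~5.8 days
```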

Resizing to a bigger instance gives us the full blast: the network of smaller instances is throttled at 25 MBps.

We now have a complete partition (2) ready to go on a DS5_v2 instance.

We decided not to start indexing the data (i.e. to only inject the data into the azure object store).

I have started an archiver worker on the test machine (with a local rabbitmq) to start copying the data to the azure object storage.

An estimate for the time to copy should arrive in a few hours.

zack triaged this task as Normal priority. Jan 23 2017, 5:45 PM

The copy has completed and the virtual machine has been removed, allowing us to check the "steady state" spending.

zack added a subscriber: zack.

The investigation is now done and we seem to have found the optimal way to do this.
The task of actually completing a full mirror on Azure will be tracked in T691.