Vault: use objstorage streaming to store and fetch bundles
Closed, Migrated

Event Timeline

seirl removed seirl as the assignee of this task. Mar 22 2019, 1:33 PM

See T1964 for a concrete example where the lack of streaming causes problems (after the cooking, once the bundle is ready):

$ wget https://archive.softwareheritage.org/api/1/vault/revision/85678b0d6c52d6fd0af50c8e493c74dd15a7115d/gitfast/raw/
--2019-09-19 11:43:50--  https://archive.softwareheritage.org/api/1/vault/revision/85678b0d6c52d6fd0af50c8e493c74dd15a7115d/gitfast/raw/
Resolving archive.softwareheritage.org (archive.softwareheritage.org)... 128.93.193.31
Connecting to archive.softwareheritage.org (archive.softwareheritage.org)|128.93.193.31|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 539845226 (515M) [application/gzip]
Saving to: ‘index.html’

index.html                    31%[=============>                                ] 162,18M  2,66MB/s    in 66s     

2019-09-19 11:46:13 (2,46 MB/s) - Connection closed at byte 170059557. Retrying.

--2019-09-19 11:46:14--  (try: 2)  https://archive.softwareheritage.org/api/1/vault/revision/85678b0d6c52d6fd0af50c8e493c74dd15a7115d/gitfast/raw/
Connecting to archive.softwareheritage.org (archive.softwareheritage.org)|128.93.193.31|:443... connected.
HTTP request sent, awaiting response... 400 Bad Request
2019-09-19 11:50:29 ERROR 400: Bad Request.
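
For illustration, here is a minimal sketch of what streaming a bundle could look like on the serving side: the bundle is read and handed out in fixed-size chunks rather than loaded whole. It assumes the backend can expose the stored bundle as a file-like object; `stream_bundle` and `CHUNK_SIZE` are hypothetical names, not part of the existing objstorage API.

```python
import io
from typing import BinaryIO, Iterator

CHUNK_SIZE = 1024 * 1024  # 1 MiB per chunk; an arbitrary choice for this sketch


def stream_bundle(source: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield the bundle piece by piece instead of reading it in one go.

    `source` is assumed to be any file-like object opened in binary mode.
    """
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            # An empty read means the end of the bundle was reached.
            break
        yield chunk


if __name__ == "__main__":
    # Stand-in for a cooked bundle; a real one would come from the objstorage.
    fake_bundle = io.BytesIO(b"x" * (3 * CHUNK_SIZE + 10))
    for chunk in stream_bundle(fake_bundle):
        print(len(chunk))
```

With a generator like this, memory use stays bounded by the chunk size regardless of the bundle size, which is the property a 515M download needs.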

I wonder whether the best solution wouldn't be to just generate a redirect to a direct download URL from the Azure bucket, using a temporary shared access signature.

e.g. in the rocrail case: https://swhvaultstorage.blob.core.windows.net/contents/36489f4afbc3d2d3a43bf00d79f03deb4e9ed5f7?sp=r&st=2019-09-19T11:14:55Z&se=2019-09-19T19:14:55Z&spr=https&sv=2018-03-28&sig=IlIioroy1rkUxCRxLirH7newNos4AQbigrioxIpXpWA%3D&sr=b (expiry today at 19:15 UTC)
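
A rough sketch of the redirect idea, under stated assumptions: the query parameters mirror the ones visible in the example URL (`sp`, `st`, `se`, `spr`, `sv`, `sr`, `sig`), and the 8-hour lifetime matches its start and expiry times. The `sign` callable is a placeholder for the real signature computation, which would be better delegated to the Azure storage SDK's SAS helpers than hand-rolled; `sas_redirect_url` and `redirect_response` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, Optional, Tuple
from urllib.parse import urlencode

# Container URL taken from the example link; treat it as an assumption.
CONTAINER_URL = "https://swhvaultstorage.blob.core.windows.net/contents"
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def sas_redirect_url(
    blob_name: str,
    sign: Callable[[Dict[str, str]], str],
    now: Optional[datetime] = None,
    lifetime: timedelta = timedelta(hours=8),
) -> str:
    """Build a time-limited, read-only download URL for one blob.

    `sign` is a stand-in: it receives the parameters and returns the
    value of `sig`. It does NOT implement Azure's signing scheme.
    """
    now = now or datetime.now(timezone.utc)
    params = {
        "sp": "r",                                      # permission: read only
        "st": now.strftime(TIME_FORMAT),                # start of validity
        "se": (now + lifetime).strftime(TIME_FORMAT),   # expiry
        "spr": "https",                                 # HTTPS only
        "sv": "2018-03-28",                             # version copied from the example URL
        "sr": "b",                                      # resource type: a single blob
    }
    params["sig"] = sign(params)
    return f"{CONTAINER_URL}/{blob_name}?{urlencode(params)}"


def redirect_response(url: str) -> Tuple[int, Dict[str, str]]:
    """Return a (status, headers) pair for a temporary redirect."""
    return 302, {"Location": url}
```

The vault endpoint would then answer the `/raw/` request with the 302 instead of proxying the bytes itself, so the client downloads directly from the bucket.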

Of course, the current bundles are double-compressed, which makes this... not great.

Pluggable compression has been implemented for all objstorage backends, which means we could

  • store the (compressed) bundles in an uncompressed objstorage on Azure
  • when a user requests the bundle