diff --git a/talks-public/2018-02-04-deposit-vault-walkthorugh/deposit-vault-walkthrough.org b/talks-public/2018-02-04-deposit-vault-walkthorugh/deposit-vault-walkthrough.org
index 15a16b4..822fe9e 100644
--- a/talks-public/2018-02-04-deposit-vault-walkthorugh/deposit-vault-walkthrough.org
+++ b/talks-public/2018-02-04-deposit-vault-walkthorugh/deposit-vault-walkthrough.org
@@ -1,213 +1,185 @@
 #+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt)
 #+LATEX_HEADER_EXTRA: \usepackage{listings}
 #+INCLUDE: "../../common/modules/prelude.org" :minlevel 1
 #+TITLE: Source Code Deposit Walkthrough
 # does not allow short title, so we override it for beamer as follows :
 #+BEAMER_HEADER: \title[Software Heritage]{Software Heritage\\Source Code Deposit Walkthrough}
 #+BEAMER_HEADER: \author{Morane Gruenpeter}
 #+BEAMER_HEADER: \date[04/01/2018, Deposit walkthrough]{04 January 2018\\Deposit walkthrough\\Paris, France}
 #+AUTHOR: Morane Gruenpeter
 #+DATE: 04 January 2018
 #+EMAIL: morane@gmail.com
 #+DESCRIPTION: Software Heritage: Source Code Deposit Walkthrough
 #+KEYWORDS: software heritage preservation knowledge deposit technology sword
 #+BEAMER_HEADER: \institute[Software Heritage]{Metadata specialist\\Software Heritage\\\href{mailto:morane@softwareheritage.org}{\tt morane@softwareheritage.org}}
 * Source code deposit: From deposit to vault
 ** Source code deposit: From deposit to vault
   :PROPERTIES:
   :CUSTOM_ID: walkthrough
   :END:
 **** First version of our software deposit prototype \\
-  *\url{https://deposit.softwareheritage.org/}*
+  documentation is available at:\\
+  *\url{docs.softwareheritage.org/devel/swh-deposit/}*
 **** Features
   - pushing *deposits* to the Software Heritage archive
     - software source code + metadata
   - full *transparency* of the loading and downloading processes
   - download the deposit by cooking the bundle in the *vault*
+**** SWORD-compliant
+  - *SWORD v2* protocol for single and multi-part deposits
+  - deposit MUST, SHOULD and MAY contain certain metadata attributes
 
-* Deposit walkthrough
-** Deposit walkthrough
-  :PROPERTIES:
-  :CUSTOM_ID: depositwalkthrough
-  :END:
-*** Request service document
-#+BEAMER: \scriptsize
-#+BEGIN_SRC
-$ curl -i --user "$CREDS" \
-  https://deposit.softwareheritage.org/1/servicedocument/
-#+END_SRC
-
-#+BEAMER: \pause
-*** response
-#+BEAMER: \tiny
-#+BEGIN_SRC
-HTTP/1.0 200 OK
-Server: WSGIServer/0.2 CPython/3.5.3
-Content-Type: application/xml
-
-
-...
-2.0
-209715200
-
-  The Software Heritage (SWH) Archive
-
-  Software Collection
-  application/zip
-  Collection Policy
-  Software Heritage Archive
-  Collect, Preserve, Share
-  ...
-
-
-
-#+END_SRC
-
-
+* Deposit walkthrough
 ** Deposit walkthrough
 *** Pushing a single deposit with metadata
 #+BEAMER: \tiny
 #+BEGIN_SRC
-$ curl -i -u "$CREDS" \
+#!/usr/bin/env bash
+
+# Positional arguments, each with a default value
+ARCHIVE=${1-'je-suis-gpl.tar.gz'}
+MD5=${2-$(md5sum "$ARCHIVE" | cut -f 1 -d' ')}
+NAME=${3-$(basename "$ARCHIVE")}
+METADATA_ENTRY=${4-'metadata.xml'}
+EXTERNAL_ID=${5-'external-id'}
+COLLECTION=${6-'fsf_collection'}
+
+curl -i -u 'client_name':'client_password' \
   -X POST \
   --data-binary @${ARCHIVE} \
   -H "In-Progress: false" \
   -H "Content-MD5: ${MD5}" \
   -H "Content-Disposition: attachment; filename=${NAME}" \
   -H "Slug: ${EXTERNAL_ID}" \
-  -H "Packaging: http://purl.org/net/sword/package/SimpleZip" \
-  -H "Content-type: application/zip" \
   -F "atom=@${METADATA_ENTRY};type=application/atom+xml;type=entry" \
-  ${SERVER}/1/${COLLECTION}/
+  deposit.softwareheritage.org/1/${COLLECTION}
 #+END_SRC
 #+BEAMER: \pause
 *** response
 #+BEAMER: \tiny
 #+BEGIN_SRC
-
+
   11
   Jan. 4, 2018, 2:51 p.m.
-  swh-deposit.zip
+  je-suis-gpl.tar.gz
   ready-for-checks
-  ...
+  href="deposit.softwareheritage.org/1/${COLLECTION}/11/status/" />
 ...
 #+END_SRC
 ** Deposit walkthrough
 *** Multi-part deposit
-  - To create a multi-part deposit, the *In-Progress* header is /true/.
-  - The deposit will be completed and marked *ready-for-checks* when the header is /false/.
+  - set the *In-Progress* header to /true/ to create a multi-part deposit.
+  - the deposit is completed and marked *ready-for-checks* once a request sets the header to /false/.
+  - reuse the *DEPOSIT-ID* returned by the first deposit.
 #+BEAMER: \pause
 *** Updating a multi-part deposit
 #+BEAMER: \tiny
 #+BEGIN_SRC
-$ curl -i -u "$CREDS" \
+curl -i -u 'client_name':'client_password' \
   -X PUT \
   --data-binary @${ARCHIVE} \
   -H "In-Progress: true" \
   -H "Content-MD5: ${MD5}" \
   -H "Content-Disposition: attachment; filename=${NAME}" \
-  -H 'Slug: external-id' \
-  -H 'Packaging: http://purl.org/net/sword/package/SimpleZip' \
-  -H 'Content-type: application/zip' \
-  ${SERVER}/1/${COLLECTION}/${DEPOSIT_ID}/media/
+  -H "Slug: ${EXTERNAL_ID}" \
+  -H "Content-type: application/zip" \
+  deposit.softwareheritage.org/1/${COLLECTION}/${DEPOSIT_ID}/media/
 #+END_SRC
 ** Deposit walkthrough
 *** What's your status?
   - *partial*: multi-part deposit is still ongoing
   - *ready-for-checks*: deposit completed
   - *ready-for-load*: content and metadata verified
   - *success*: loading completed successfully
   - *failure*: loading failed
 #+BEAMER: \pause
 *** Checking the deposit's state
 #+BEAMER: \tiny
 #+BEGIN_SRC
-$ curl -i -u "${CREDS}" \
-  ${SERVER}/1/${COLLECTION}/${DEPOSIT_ID}/status/
+$ curl -i -u 'client_name':'client_password' \
+  deposit.softwareheritage.org/1/${COLLECTION}/${DEPOSIT_ID}/status/
 #+END_SRC
 #+BEAMER: \pause
 ***
 #+BEAMER: \tiny
 #+BEGIN_SRC
 HTTP/1.0 200 OK
 Date: Thu, 04 Jan 2018 15:20:12 GMT
 ...
   11
   success
-  Loading is successful
+  the deposited archive has been
+  successfully ingested into the
+  Software Heritage archive
   608757ea9bd8494d729732cc9a414948c160bd3c
 #+END_SRC
 ** The deposit was successfully pushed
 now we want to download the content with the
 #+BEAMER: \huge \centering
 *Vault*
 * Vault walkthrough
 ** Vault walkthrough
 *** Requesting download with swh-id
 #+BEAMER: \tiny
 #+BEGIN_SRC python
 from swh.vault.api.client import RemoteVaultClient
 c = RemoteVaultClient('http://orangeriedev.internal.softwareheritage.org:5005')
 c.cook('revision_gitfast', '608757ea9bd8494d729732cc9a414948c160bd3c')
 #+END_SRC
 *** Checking progress
 #+BEAMER: \tiny
 #+BEGIN_SRC python
 # Call this as many times as you want to check the cooking progress
 c.progress('revision_gitfast', '608757ea9bd8494d729732cc9a414948c160bd3c')
 #+END_SRC
 *** response
 #+BEAMER: \tiny
 #+BEGIN_SRC
 {
   'fetch_url': '/api/1/vault/revision_gitfast/594617d1cd9d9d6bc0cfbd531bbaa1ed19627e9b/raw/',
   'progress_message': None,
   'status': 'done',
   'id': 4,
   'obj_id': '608757ea9bd8494d729732cc9a414948c160bd3c',
   'obj_type': 'revision_gitfast'
 }
 #+END_SRC
 ** Vault walkthrough
 *** Download when status is marked /done/
 #+BEAMER: \tiny
 #+BEGIN_SRC sh
 $ curl https://archive.softwareheritage.org/api/1/vault/revision_gitfast/swh-id/raw/ \
     -o path/to/revision.gitfast.gz
 $ git init
 $ zcat path/to/revision.gitfast.gz | git fast-import
 $ git checkout HEAD
 #+END_SRC
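
Note on the "Checking progress" slide: it suggests calling the progress method repeatedly until the status is /done/. A minimal sketch of that loop follows. It is not part of the talk or of the swh.vault API: the helper name =wait_until_cooked= and its parameters are hypothetical, and the =failed= status value is an assumption, since the slides only show =done=. The helper takes any callable with the same shape as the =c.progress= call shown in the slides, so it can be tried without a vault server.

```python
import time


def wait_until_cooked(progress, obj_type, obj_id,
                      interval=5.0, max_attempts=60, sleep=time.sleep):
    """Poll progress(obj_type, obj_id) until its 'status' is 'done'.

    `progress` is any callable returning a dict with a 'status' key,
    for example the bound method c.progress used in the slides.
    """
    for _ in range(max_attempts):
        info = progress(obj_type, obj_id)
        status = info.get('status')
        if status == 'done':
            return info
        if status == 'failed':  # assumed failure value, not shown in the slides
            raise RuntimeError(
                'cooking failed: %r' % (info.get('progress_message'),))
        sleep(interval)
    raise TimeoutError('bundle not ready after %d attempts' % max_attempts)


if __name__ == '__main__':
    # Stand-in for c.progress so the sketch runs without a vault server.
    replies = iter([{'status': 'pending'}, {'status': 'done'}])
    info = wait_until_cooked(lambda obj_type, obj_id: next(replies),
                             'revision_gitfast',
                             '608757ea9bd8494d729732cc9a414948c160bd3c',
                             interval=0)
    print(info['status'])
```

With a real client this would be called as =wait_until_cooked(c.progress, 'revision_gitfast', swh_id)=, and the returned dict would carry the =fetch_url= to download from.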