diff --git a/docs/metadata.rst b/docs/metadata.rst index 8999c6b3..02b6f048 100644 --- a/docs/metadata.rst +++ b/docs/metadata.rst @@ -1,179 +1,178 @@ .. _deposit-metadata: Deposit metadata ================ When making a software deposit into the SWH archive, one can add information describing the software artifact and the software project. .. _metadata-requirements: Metadata requirements --------------------- - **the schema/vocabulary** used *MUST* be specified with a persistent url (DublinCore, DOAP, CodeMeta, etc.) .. code:: xml or or - **the name** of the software deposit *MUST* be provided [atom:title, codemeta:name, dcterms:title] - **the authors** of the software deposit *MUST* be provided - **the url** representing the location of the source *MAY* be provided under the url tag. The url will be used for creating an origin object in the archive. .. code:: xml www.url-example.com - **the description** of the software deposit *SHOULD* be provided [codemeta:description]: short or long description of the software - **the license/s** of the software deposit *SHOULD* be provided [codemeta:license] - other metadata *MAY* be added with terms defined by the schema in use. Examples -------- Using only Atom ~~~~~~~~~~~~~~~ .. code:: xml Awesome Compiler urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a 2017-10-07T15:17:08Z some awesome author Using Atom with CodeMeta ~~~~~~~~~~~~~~~~~~~~~~~~ .. 
code:: xml Awesome Compiler urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a 1785io25c695 origin url other identifier, DOI, ARK Domain description key-word 1 key-word 2 creation date publication date comment article name article id Collaboration/Projet project name id see also Sponsor A Sponsor B Platform/OS dependencies Version active license url spdx .Net Framework 3.0 Python2.3 author1 Inria UPMC author2 Inria UPMC http://code.com language 1 language 2 http://issuetracker.com Using Atom with DublinCore and CodeMeta (multi-schema entry) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code:: xml Awesome Compiler - hal urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a hal-01587361 doi:10.5281/zenodo.438684 The assignment problem AffectationRO author [INFO] Computer Science [cs] [INFO.INFO-RO] Computer Science [cs]/Operations Research [cs.RO] SOFTWARE Project in OR: The assignment problemA java implementation for the assignment problem first release description fr 2015-06-01 2017-10-19 en origin url 1.0.0 key word Comment Rfrence interne link Sponsor Platform/OS dependencies Ended license url spdx http://code.com language 1 language 2 Note ---- We aim on harmonizing the metadata from different origins and thus metadata will be translated to the `CodeMeta v.2 `__ vocabulary if possible. diff --git a/docs/specs/spec-loading.rst b/docs/specs/spec-loading.rst index de60cef6..f646f474 100644 --- a/docs/specs/spec-loading.rst +++ b/docs/specs/spec-loading.rst @@ -1,472 +1,471 @@ .. _swh-loading-specs: Loading specification ===================== An important part of the deposit specifications is the loading procedure where a deposit is ingested into the Software Heritage Archive (SWH) using the deposit loader and the complete process of software artifacts creation in the archive. Deposit Loading --------------- The ``swh.loader.package.deposit`` module is able to inject zipfile/tarball's content in SWH with its metadata. 
The loading of the deposit will use the deposit's associated data: * the metadata * the archive file(s) Artifacts creation ------------------ Deposit to artifacts mapping ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This is a global view of the deposit ingestion +------------------------------------+-----------------------------------------+ | swh artifact | representation in deposit | +====================================+=========================================+ | origin | https://hal.inria.fr/hal-id | +------------------------------------+-----------------------------------------+ | raw_extrinsic_metadata | aggregated metadata | +------------------------------------+-----------------------------------------+ | snapshot | reception of all occurrences (branches) | +------------------------------------+-----------------------------------------+ | branches | master & tags for releases | | | (not yet implemented) | +------------------------------------+-----------------------------------------+ | release | (optional) synthetic release created | | | from metadata (not yet implemented) | +------------------------------------+-----------------------------------------+ | revision | synthetic revision pointing to | | | the directory (see below) | +------------------------------------+-----------------------------------------+ | directory | root directory of the expanded submitted| | | tarball | +------------------------------------+-----------------------------------------+ Origin artifact ~~~~~~~~~~~~~~~ We create an origin URL by concatenating the client's `provider_url` and the value of the Slug header of the initial POST request of the deposit. For examples: .. code-block:: bash $ http -pb https://archive.softwareheritage.org/api/1/origin/https://hal.archives-ouvertes.fr/hal-02560320/get/ would result in: .. 
code-block:: json { "origin_visits_url": "https://archive.softwareheritage.org/api/1/origin/https://hal.archives-ouvertes.fr/hal-02560320/visits/", "url": "https://hal.archives-ouvertes.fr/hal-02560320" } Visits ~~~~~~ We identify with a visit each deposit push of the same `external_id`. Here in the example below, two snapshots are identified by two different visits. For examples: .. code-block:: bash $ http -pb https://archive.softwareheritage.org/api/1/origin/https://hal.archives-ouvertes.fr/hal-02560320/visits/ would result in: .. code-block:: json [ { "date": "2020-05-14T11:59:55.942964+00:00", "metadata": {}, "origin": "https://hal.archives-ouvertes.fr/hal-02560320", "origin_visit_url": "https://archive.softwareheritage.org/api/1/origin/https://hal.archives-ouvertes.fr/hal-02560320/visit/2/", "snapshot": "e5e82d064a9c3df7464223042e0c55d72ccff7f0", "snapshot_url": "https://archive.softwareheritage.org/api/1/snapshot/e5e82d064a9c3df7464223042e0c55d72ccff7f0/", "status": "full", "type": "deposit", "visit": 2 }, { "date": "2020-05-14T11:59:41.094260+00:00", "metadata": {}, "origin": "https://hal.archives-ouvertes.fr/hal-02560320", "origin_visit_url": "https://archive.softwareheritage.org/api/1/origin/https://hal.archives-ouvertes.fr/hal-02560320/visit/1/", "snapshot": "3e95ef6e04c381a34cc2f314576bc5644f2c797f", "snapshot_url": "https://archive.softwareheritage.org/api/1/snapshot/3e95ef6e04c381a34cc2f314576bc5644f2c797f/", "status": "full", "type": "deposit", "visit": 1 } ] Snapshot artifact ~~~~~~~~~~~~~~~~~ The snapshot represents one deposit push. The ``HEAD`` branch points to a synthetic revision. For example: .. code-block:: bash $ http -pb https://archive.softwareheritage.org/api/1/snapshot/3e95ef6e04c381a34cc2f314576bc5644f2c797f/ would result in: .. 
code-block:: json { "branches": { "HEAD": { "target": "2122424b547a8eca9282ba3131ec61ff1d8df7d4", "target_type": "revision", "target_url": "https://archive.softwareheritage.org/api/1/revision/2122424b547a8eca9282ba3131ec61ff1d8df7d4/" } }, "id": "3e95ef6e04c381a34cc2f314576bc5644f2c797f", "next_branch": null } Note that previous versions of the deposit-loader named the branch ``master`` instead, and created release branches under certain conditions. Release artifact ~~~~~~~~~~~~~~~~ .. warning:: This part of the specification is not implemented yet, only releases are currently being created. The content is deposited with a set of descriptive metadata in the CodeMeta vocabulary. The following CodeMeta terms implies that the artifact is a release: - `releaseNotes` - `softwareVersion` If present, a release artifact will be created with the mapping below: +-------------------+-----------------------------------+-----------------+----------------+ | SWH release field | Description | CodeMeta term | Fallback value | +===================+===================================+=================+================+ | target | revision containing all metadata | X |X | +-------------------+-----------------------------------+-----------------+----------------+ | target_type | revision | X |X | +-------------------+-----------------------------------+-----------------+----------------+ | name | release or tag name (mandatory) | softwareVersion | X | +-------------------+-----------------------------------+-----------------+----------------+ | message | message associated with release | releaseNotes | X | +-------------------+-----------------------------------+-----------------+----------------+ | date | release date = publication date | datePublished | deposit_date | +-------------------+-----------------------------------+-----------------+----------------+ -| author | deposit client | author | client | +| author | deposit client | author | X | 
+-------------------+-----------------------------------+-----------------+----------------+ .. code-block:: json { "release": { "author": { "email": "hal@ccsd.cnrs.fr", "fullname": "HAL ", "name": "HAL" }, "author_url": "/api/1/person/x/", "date": "2019-05-27T16:28:33+02:00", "id": "a9f3396f372ed4a51d75e15ca16c1c2df1fc5c97", "message": "AffectationRO Version 1.1 - added new feature\n", "name": "1.1", "synthetic": true, "target": "396b1ff29f7c75a0a3cc36f30e24ff7bae70bb52", "target_type": "revision", "target_url": "/api/1/revision/396b1ff29f7c75a0a3cc36f30e24ff7bae70bb52/" } } Revision artifact ~~~~~~~~~~~~~~~~~ The metadata sent with the deposit is stored outside the revision, and does not affect the hash computation. It contains the same fields as any revision object; in particular: +-------------------+-----------------------------------------+ | SWH revision field| Description | +===================+=========================================+ | message | synthetic message, containing the name | | | of the deposit client and an internal | | | identifier of the deposit. For example: | | | ``hal: Deposit 817 in collection hal`` | +-------------------+-----------------------------------------+ | author | synthetic author (SWH itself, for now) | +-------------------+-----------------------------------------+ | committer | same as the author (for now) | +-------------------+-----------------------------------------+ | date | see below | +-------------------+-----------------------------------------+ | committer_date | see below | +-------------------+-----------------------------------------+ The date mapping ^^^^^^^^^^^^^^^^ A deposit may contain 4 different dates concerning the software artifacts. The deposit's revision will reflect the most accurate point in time available. 
Here are all dates that can be available in a deposit: +----------------+---------------------------------+------------------------------------------------+ | dates | location | Description | +================+=================================+================================================+ | reception_date | On SWORD reception (automatic) | the deposit was received at this ts | +----------------+---------------------------------+------------------------------------------------+ | complete_date | On SWH ingestion (automatic) | the ingestion was completed by SWH at this ts | +----------------+---------------------------------+------------------------------------------------+ | dateCreated | metadata in codeMeta (optional) | the software artifact was created at this ts | +----------------+---------------------------------+------------------------------------------------+ | datePublished | metadata in codeMeta (optional) | the software was published (contributed in HAL)| +----------------+---------------------------------+------------------------------------------------+ A visit targeting a snapshot contains one date: +-------------------+----------------------------------------------+----------------+ | SWH visit field | Description | value | +===================+==============================================+================+ | date | the origin pushed the deposit at this date | reception_date | +-------------------+----------------------------------------------+----------------+ A revision contains two dates: +-------------------+-----------------------------------------+----------------+----------------+ | SWH revision field| Description | CodeMeta term | Fallback value | +===================+=========================================+================+================+ | date | date of software artifact modification | dateCreated | reception_date | +-------------------+-----------------------------------------+----------------+----------------+ | committer_date | 
date of the commit in VCS | datePublished | reception_date | +-------------------+-----------------------------------------+----------------+----------------+ A release contains one date: +-------------------+----------------------------------+----------------+-----------------+ | SWH release field |Description | CodeMeta term | Fallback value | +===================+==================================+================+=================+ | date |release date = publication date | datePublished | reception_date | +-------------------+----------------------------------+----------------+-----------------+ .. code-block:: json { "revision": { "author": { "email": "robot@softwareheritage.org", "fullname": "Software Heritage", "id": 18233048, "name": "Software Heritage" }, "author_url": "/api/1/person/18233048/", "committer": { "email": "robot@softwareheritage.org", "fullname": "Software Heritage", "id": 18233048, "name": "Software Heritage" }, "committer_date": "2019-05-27T16:28:33+02:00", "committer_url": "/api/1/person/18233048/", "date": "2012-01-01T00:00:00+00:00", "directory": "fb13b51abbcfd13de85d9ba8d070a23679576cd7", "directory_url": "/api/1/directory/fb13b51abbcfd13de85d9ba8d070a23679576cd7/", "history_url": "/api/1/revision/396b1ff29f7c75a0a3cc36f30e24ff7bae70bb52/log/", "id": "396b1ff29f7c75a0a3cc36f30e24ff7bae70bb52", "merge": false, "message": "hal: Deposit 282 in collection hal", "metadata": { "@xmlns": "http://www.w3.org/2005/Atom", "@xmlns:codemeta": "https://doi.org/10.5063/SCHEMA/CODEMETA-2.0", "author": { "email": "hal@ccsd.cnrs.fr", "name": "HAL" }, - "client": "hal", "codemeta:applicationCategory": "info", "codemeta:author": { "codemeta:name": "Morane Gruenpeter" }, "codemeta:codeRepository": "www.code-repository.com", "codemeta:contributor": "Morane Gruenpeter", "codemeta:dateCreated": "2012", "codemeta:datePublished": "2019-05-27T16:28:33+02:00", "codemeta:description": "description\\_en test v2", "codemeta:developmentStatus": "Inactif", 
"codemeta:keywords": "mot_cle_en,mot_cle_2_en,mot_cle_fr", "codemeta:license": [ { "codemeta:name": "MIT License" }, { "codemeta:name": "CeCILL Free Software License Agreement v1.1" } ], "codemeta:name": "Test\\_20190527\\_01", "codemeta:operatingSystem": "OS", "codemeta:programmingLanguage": "Java", "codemeta:referencePublication": null, "codemeta:relatedLink": null, "codemeta:releaseNotes": "releaseNote", "codemeta:runtimePlatform": "outil", "codemeta:softwareVersion": "1.0.1", "codemeta:url": "https://hal.archives-ouvertes.fr/hal-02140606", "codemeta:version": "2", "external_identifier": "hal-02140606", "id": "hal-02140606", "original_artifact": [ { "archive_type": "zip", "blake2s256": "96be3ddedfcee9669ad9c42b0bb3a706daf23824d04311c63505a4d8db02df00", "length": 193072, "name": "archive.zip", "sha1": "5b6ecc9d5bb113ff69fc275dcc9b0d993a8194f1", "sha1_git": "bd10e4d3ede17162692d7e211e08e87e67994488", "sha256": "3e2ce93384251ce6d6da7b8f2a061a8ebdaf8a28b8d8513223ca79ded8a10948" } ] }, "parents": [ { "id": "a9fdc3937d2b704b915852a64de2ab1b4b481003", "url": "/api/1/revision/a9fdc3937d2b704b915852a64de2ab1b4b481003/" } ], "synthetic": true, "type": "tar", "url": "/api/1/revision/396b1ff29f7c75a0a3cc36f30e24ff7bae70bb52/" } } Directory artifact ~~~~~~~~~~~~~~~~~~ The directory artifact is the archive(s)' raw content deposited. .. code-block:: json { "directory": [ { "dir_id": "fb13b51abbcfd13de85d9ba8d070a23679576cd7", "length": null, "name": "AffectationRO", "perms": 16384, "target": "fbc418f9ac2c39e8566b04da5dc24b14e65b23b1", "target_url": "/api/1/directory/fbc418f9ac2c39e8566b04da5dc24b14e65b23b1/", "type": "dir" } ] } Questions raised concerning loading ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - A deposit has one origin, yet an origin can have multiple deposits? No, an origin can have multiple requests for the same deposit. Which should end up in one single deposit (when the client pushes its final request saying deposit 'done' through the header In-Progress). 
Only updates of an existing 'partial' deposit are permitted. Other than that, the deposit 'update' operation is not permitted. To create a new version of a software (already deposited), the client must create a new deposit. Illustration First deposit loading: HAL's deposit 01535619 = SWH's deposit **01535619-1** :: + 1 origin with url:https://hal.inria.fr/medihal-01535619 + 1 synthetic revision + 1 directory HAL's update on deposit 01535619 = SWH's deposit **01535619-2** (\*with HAL, updates can only be on the metadata and a new version is required if the content changes) :: + 1 origin with url:https://hal.inria.fr/medihal-01535619 + new synthetic revision (with new metadata) + same directory HAL's deposit 01535619-v2 = SWH's deposit **01535619-v2-1** :: + same origin + new revision + new directory Scheduling loading ~~~~~~~~~~~~~~~~~~ All ``archive`` and ``metadata`` deposit requests should be aggregated before loading. The loading should be scheduled via the scheduler's api. Only ``deposited`` deposits are concerned by the loading. When the loading is done and successful, the deposit entry is updated: - ``status`` is updated to ``done`` - ``swh-id`` is populated with the resulting :ref:`SWHID ` - ``complete_date`` is updated to the loading's finished time When the loading has failed, the deposit entry is updated: - ``status`` is updated to ``failed`` - ``swh-id`` and ``complete_date`` remain as is *Note:* As a further improvement, we may prefer having a retry policy with graceful delays for further scheduling. Metadata loading ~~~~~~~~~~~~~~~~ - the metadata received with the deposit are kept in a dedicated table ``raw_extrinsic_metadata``, distinct from the ``revision`` and ``origin`` tables. - ``authority`` is computed from the deposit client information, and ``fetcher`` is the deposit loader.
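The status update described under "Scheduling loading" can be sketched in Python as follows. This is a minimal illustration, not the actual ``swh.deposit`` code: the ``Deposit`` dataclass and the ``update_after_loading`` function are hypothetical names, and only the fields (``status``, ``swh-id`` written here as ``swh_id``, ``complete_date``) and the ``done``/``failed`` values come from the text above.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Deposit:
    """Hypothetical stand-in for a deposit entry (not the real model)."""

    status: str
    swh_id: Optional[str] = None
    complete_date: Optional[datetime] = None


def update_after_loading(
    deposit: Deposit,
    success: bool,
    swh_id: Optional[str] = None,
    finished: Optional[datetime] = None,
) -> Deposit:
    """Return the deposit entry as it should look once loading has ended."""
    if not success:
        # On failure only the status changes; swh_id and complete_date stay as is.
        return replace(deposit, status="failed")
    if swh_id is None:
        raise ValueError("a successful loading must provide the resulting SWHID")
    return replace(
        deposit,
        status="done",
        swh_id=swh_id,
        # Assumption: fall back to the current UTC time if no end time is given.
        complete_date=finished or datetime.now(timezone.utc),
    )
```

Returning a new entry instead of mutating the old one keeps the failure case easy to check: everything except ``status`` is guaranteed to be untouched.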
diff --git a/docs/specs/spec-meta-deposit.rst b/docs/specs/spec-meta-deposit.rst index c63f8371..074bf268 100644 --- a/docs/specs/spec-meta-deposit.rst +++ b/docs/specs/spec-meta-deposit.rst @@ -1,135 +1,134 @@ The metadata-deposit ==================== Goal ---- A client wishes to deposit only metadata about an origin or object already present in the Software Heritage archive. The metadata-deposit is a special deposit where no content is provided and the data transferred to Software Heritage is only the metadata about an object in the archive. Requirements ------------ 1. Create a metadata-only deposit through a :ref:`POST request` 2. It is composed of ONLY one xml metadata file 3. It MUST comply with :ref:`the metadata requirements` 4. It MUST reference an **object** or an **origin** in a deposit tag 5. The reference SHOULD exist in the SWH archive 6. The **object** reference MUST be a SWHID on one of the following artifact types: - origin - snapshot - release - revision - directory - content 7. The SWHID MAY be a `core identifier`_ with or without `qualifiers`_ 8. The SWHID MUST NOT reference a fragment of code with the classifier `lines` .. _core identifier: https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html#core-identifiers .. _qualifiers: https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html#qualifiers A complete metadata example --------------------------- The reference element is included in the metadata xml atomEntry under the swh namespace: .. code:: xml HAL hal@ccsd.cnrs.fr - hal The assignment problem https://hal.archives-ouvertes.fr/hal-01243573 other identifier, DOI, ARK Domain description author1 Inria UPMC author2 Inria UPMC References ^^^^^^^^^^ The metadata reference can be either on: - an origin - a graph object (core SWHID with or without qualifiers) Origins ======= The metadata may be on an origin, identified by the origin's URL: .. 
code:: xml Graph objects ============= It may also reference an object in the `SWH graph `: contents, directories, revisions, releases, and snapshots: .. code:: xml .. code:: xml The value of the ``swhid`` attribute must be a `SWHID `, with any context qualifiers in this list: * ``origin`` * ``visit`` * ``anchor`` * ``path`` and they should be provided whenever relevant, especially ``origin``. Other qualifiers are not allowed (for example, ``line`` isn't because SWH cannot store metadata at a finer level than entire contents). Loading procedure ------------------ In this case, the metadata-deposit will be injected as a metadata entry of the relevant object, with the information about the contributor of the deposit. Contrary to the complete and sparse deposit, there will be no object creation. diff --git a/docs/specs/spec-sparse-deposit.rst b/docs/specs/spec-sparse-deposit.rst index 2d0df06f..c8fbcc8a 100644 --- a/docs/specs/spec-sparse-deposit.rst +++ b/docs/specs/spec-sparse-deposit.rst @@ -1,101 +1,100 @@ The sparse-deposit ================== Goal ---- A client wishes to transfer a tarball for which part of the content is already in the SWH archive. Requirements ------------ To do so, a list of paths with targets must be provided in the metadata and the paths to the missing directories/content should not be included in the tarball. The list will be referred to as the manifest list using the entry name 'bindings' in the metadata. +----------------------+-------------------------------------+ | path | swh-id | +======================+=====================================+ | path/to/file.txt | swh:1:cnt:aaaaaaaaaaaaaaaaaaaaa... | +----------------------+-------------------------------------+ | path/to/dir/ | swh:1:dir:aaaaaaaaaaaaaaaaaaaaa... | +----------------------+-------------------------------------+ Note: the *name* of the file or the directory is given by the path and is not part of the identified object. 
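The path/swh-id pairs in the table above could be checked along these lines. This is a minimal Python sketch, not the deposit server's implementation: ``check_binding`` is a hypothetical helper, the regular expression covers only core SWHIDs (``swh:1:<type>:<40 hex digits>``, no qualifiers), and it assumes that a trailing "/" marks a directory path, a convention this specification still leaves open.

```python
import re

# Core SWHID syntax: swh:1:<object type>:<40 lowercase hex digits>.
SWHID_RE = re.compile(r"^swh:1:(cnt|dir|rev|rel|snp):([0-9a-f]{40})$")


def check_binding(path: str, swhid: str) -> str:
    """Validate one manifest entry and return the SWHID object type.

    Assumption: a path ending with "/" must point to a directory (dir),
    any other path must point to a content (cnt).
    """
    match = SWHID_RE.match(swhid)
    if match is None:
        raise ValueError(f"malformed swh-id for {path!r}: {swhid!r}")
    object_type = match.group(1)
    expected = "dir" if path.endswith("/") else "cnt"
    if object_type != expected:
        raise ValueError(f"{path!r} expects a {expected} object, got {object_type}")
    return object_type
```

A check like this only covers syntax and path/type agreement; locating the identifiers in the archive would need a separate lookup.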
TODO: see if a trailing "/" is mandatory for implementation. A concrete example ------------------ The manifest list is included in the metadata xml atomEntry under the swh namespace: TODO: publish schema at https://www.softwareheritage.org/schema/2018/deposit .. code:: xml HALit mandatory to have a trailing "/", hal@ccsd.cnrs.fr - hal The assignment problem https://hal.archives-ouvertes.fr/hal-01243573 other identifier, DOI, ARK Domain description author1 Inria UPMC author2 Inria UPMC Deposit verification -------------------- After checking the integrity of the deposit content and metadata, the following checks should be added: 1. validate the manifest list structure with a correct swh-id for each path (syntax check on the swh-id format) 2. verify that the path name corresponds to the object type 3. locate the identifiers in the SWH archive Each failing check should return a different error with the deposit and result in a 'rejected' deposit. Loading procedure ------------------ The injection procedure should include: - load the tarball new data - create new objects using the path name and create links from the path to the SWH object using the identifier - calculate identifier of the new objects at each level - return final swh-id of the new revision Invariant: the same content should yield the same swh-id, that's why a complete deposit with all the content and a sparse-deposit with the correct links will result with the same root directory swh-id. The same is expected with the revision swh-id if the metadata provided is identical. diff --git a/docs/user-manual.rst b/docs/user-manual.rst index e2c9782d..0ba231d2 100644 --- a/docs/user-manual.rst +++ b/docs/user-manual.rst @@ -1,410 +1,409 @@ .. _user-manual: User Manual =========== This is a guide for how to prepare and push a software deposit with the `swh deposit` commands. Requirements ------------ You need to have an account on the Software Heritage deposit application to be able to use the service. 
Please `contact the Software Heritage team `_ for more information on how to get access to this service. For testing purposes, a test instance `is available `_ [#f1]_ and will be used in the examples below. Once you have an account, you should get a set of access credentials as a `login` and a `password` (identified as ```` and ```` in the remainder of this document). A deposit account also comes with a "provider URL" which is used by SWH to build the :term:`Origin URL` of deposits created using this account. Installation ------------ To install the `swh.deposit` command line tools, you need a working Python 3.7+ environment. It is strongly recommended you use a `virtualenv `_ for this. .. code:: console $ python3 -m virtualenv deposit [...] $ source deposit/bin/activate (deposit)$ pip install swh.deposit [...] (deposit)$ swh deposit --help Usage: swh deposit [OPTIONS] COMMAND [ARGS]... Deposit main command Options: -h, --help Show this message and exit. Commands: admin Server administration tasks (manipulate user or... status Deposit's status upload Software Heritage Public Deposit Client Create/Update... (deposit)$ Note: in the examples below, we use the `jq`_ tool to make json outputs nicer. If you do not have it already, you may install it using your distribution's packaging system. For example, on a Debian system: .. _jq: https://stedolan.github.io/jq/ .. code:: console $ sudo apt install jq Prepare a deposit ----------------- * compress the files in a supported archive format: - zip: common zip archive (no multi-disk zip files). - tar: tar archive without compression or optionally any of the following compression algorithms: gzip (`.tar.gz`, `.tgz`), bzip2 (`.tar.bz2`), or lzma (`.tar.lzma`) * (Optional) prepare a metadata file (more details :ref:`deposit-metadata`): Example: Assuming you want to deposit the source code of `belenios `_ version 1.12 .. code:: console (deposit)$ wget https://gitlab.inria.fr/belenios/belenios/-/archive/1.12/belenios-1.12.zip [...] 
2020-10-28 11:40:37 (4,56 MB/s) - ‘belenios-1.12.zip’ saved [449880/449880] (deposit)$ Then you need to prepare a metadata file allowing you to give detailed information on your deposited source code. A rather minimal Atom with Codemeta file could be: .. code:: console (deposit)$ cat metadata.xml Verifiable online voting system - belenios belenios-01243065 https://gitlab.inria.fr/belenios/belenios test Online voting Verifiable online voting system 1.12 opam stable ocaml GNU Affero General Public License Belenios belenios@example.com Belenios Test User (deposit)$ Please read the :ref:`deposit-metadata` page for a more detailed view on the metadata file formats and semantics. Push a deposit -------------- You can push a deposit with: * a single deposit (archive + metadata): The user posts in one query a software source code archive and associated metadata. The deposit is directly marked with status ``deposited``. * a multisteps deposit: 1. Create an incomplete deposit (marked with status ``partial``) 2. Add data to a deposit (in multiple requests if needed) 3. Finalize deposit (the status becomes ``deposited``) Overall, a deposit goes through a series of steps as follows: .. figure:: images/status.svg :alt: The important thing to note for now is that a deposit can be: partial: the deposit is partially received expired: deposit has been there too long and is now deemed ready to be garbage collected deposited: deposit is complete and is ready to be checked to ensure data consistency verified: deposit is fully received, checked, and ready for loading loading: loading is ongoing on swh's side done: loading is successful failed: loading has failed When you push a deposit, it is either in the `deposited` state or in the `partial` state if you asked for a partial upload. Single deposit ^^^^^^^^^^^^^^ Once the files are ready for deposit, we want to do the actual deposit in one shot, i.e. sending both the archive (zip) file and the metadata file. 
* 1 archive (content-type ``application/zip`` or ``application/x-tar``) * 1 metadata file in atom xml format (``content-type: application/atom+xml;type=entry``) For this, we need to provide the: * arguments: ``--username 'name' --password 'pass'`` as credentials * archive's path (example: ``--archive path/to/archive-name.tgz``) * metadata file path (example: ``--metadata path/to/metadata.xml``) to the `swh deposit upload` command. Example: To push the Belenios 1.12 we prepared previously on the testing instance of the deposit: .. code:: console (deposit)$ ls belenios-1.12.zip metadata.xml deposit (deposit)$ swh deposit upload --username --password \ --url https://deposit.staging.swh.network/1 \ --slug belenios-01243065 \ --archive belenios.zip \ --metadata metadata.xml \ --format json | jq { 'deposit_status': 'deposited', 'deposit_id': '1', 'deposit_date': 'Oct. 28, 2020, 1:52 p.m.', 'deposit_status_detail': None } (deposit)$ You just posted a deposit to your main collection on Software Heritage (staging area)! The returned value is a JSON dict, in which you will notably find the deposit id (needed to check for its status later on) and the current status, which should be `deposited` if no error has occurred. Note: As the deposit is in ``deposited`` status, you can no longer update the deposit after this query. It will be answered with a 403 (Forbidden) answer. If something went wrong, an equivalent response will be given with the `error` and `detail` keys explaining the issue, e.g.: .. code:: console { 'error': 'Unknown collection name xyz', 'detail': None, 'deposit_status': None, 'deposit_status_detail': None, 'deposit_swh_id': None, 'status': 404 } Once the deposit has been done, you can check its status using the `swh deposit status` command: .. 
code:: console (deposit)$ swh deposit status --username --password \ --url https://deposit.staging.swh.network/1 \ --deposit-id 1 -f json | jq { "deposit_id": "1", "deposit_status": "done", "deposit_status_detail": "The deposit has been successfully loaded into the Software Heritage archive", "deposit_swh_id": "swh:1:dir:63a6fc0ed8f69bf66ccbf99fc0472e30ef0a895a", "deposit_swh_id_context": "swh:1:dir:63a6fc0ed8f69bf66ccbf99fc0472e30ef0a895a;origin=https://softwareheritage.org/belenios-01234065;visit=swh:1:snp:0ae536667689da7047bfb7aa9f37f5958e9f4647;anchor=swh:1:rev:17ad98c940104d45b6b6bd6fba9aa832eeb95638;path=/", "deposit_external_id": "belenios-01234065" } Multisteps deposit ^^^^^^^^^^^^^^^^^^ In this case, the deposit is created by several requests, uploading objects piece by piece. The steps to create a multisteps deposit: 1. Create a partial deposit ~~~~~~~~~~~~~~~~~~~~~~~~~~~ First use the ``--partial`` argument to declare there is more to come: .. code:: console $ swh deposit upload --username name --password secret \ --archive foo.tar.gz \ --partial 2. Add content or metadata to the deposit ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Continue the deposit by passing the deposit id returned by the first step to the ``--deposit-id`` argument. You can continue adding content or metadata as long as you use the ``--partial`` argument. To only add one new archive to the deposit: .. code:: console $ swh deposit upload --username name --password secret \ --archive add-foo.tar.gz \ --deposit-id 42 \ --partial To only add metadata to the deposit: .. code:: console $ swh deposit upload --username name --password secret \ --metadata add-foo.tar.gz.metadata.xml \ --deposit-id 42 \ --partial 3. Finalize deposit ~~~~~~~~~~~~~~~~~~~ On your last addition (same command as before), omit ``--partial`` and the deposit will be considered complete. Its status will be changed to ``deposited``: .. 
code:: console $ swh deposit upload --username name --password secret \ --metadata add-foo.tar.gz.metadata.xml \ --deposit-id 42 Update deposit -------------- * replace deposit: - only possible if the deposit status is ``partial`` and ``--deposit-id `` is provided - by using the ``--replace`` flag - ``--metadata-deposit`` replaces associated existing metadata - ``--archive-deposit`` replaces associated archive(s) - by default, with no flag or both, you'll replace associated metadata and archive(s): .. code:: console $ swh deposit upload --username name --password secret \ --deposit-id 11 \ --archive updated-je-suis-gpl.tgz \ --replace * update a loaded deposit with a new version: - by using the external-id with the ``--slug`` argument, you will link the new deposit with its parent deposit: .. code:: console $ swh deposit upload --username name --password secret \ --archive je-suis-gpl-v2.tgz \ --slug 'je-suis-gpl' \ Check the deposit's status -------------------------- You can check the status of the deposit by using the ``--deposit-id`` argument: .. code:: console $ swh deposit status --username name --password secret \ --deposit-id 11 .. code:: json { "deposit_id": 11, "deposit_status": "deposited", "deposit_swh_id": null, "deposit_status_detail": "Deposit is ready for additional checks \ (tarball ok, metadata, etc...)" } When the deposit has been loaded into the archive, the status will be marked ``done``. In the response, will also be available the , . For example: .. 
code:: json { "deposit_id": 11, "deposit_status": "done", "deposit_swh_id": "swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9", "deposit_swh_id_context": "swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9;\ origin=https://forge.softwareheritage.org/source/jesuisgpl/;\ visit=swh:1:snp:68c0d26104d47e278dd6be07ed61fafb561d0d20;\ anchor=swh:1:rev:e76ea49c9ffbb7f73611087ba6e999b19e5d71eb;path=/", "deposit_status_detail": "The deposit has been successfully \ loaded into the Software Heritage archive" } .. rubric:: Footnotes .. [#f1] the test instance of the deposit is not yet available to external users, but it should be available soon. diff --git a/swh/deposit/tests/data/atom/entry-data-fail-metadata-functional-checks.xml b/swh/deposit/tests/data/atom/entry-data-fail-metadata-functional-checks.xml index 4f06396a..38be30e0 100644 --- a/swh/deposit/tests/data/atom/entry-data-fail-metadata-functional-checks.xml +++ b/swh/deposit/tests/data/atom/entry-data-fail-metadata-functional-checks.xml @@ -1,7 +1,6 @@ - hal urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a 2017-10-07T15:17:08Z diff --git a/swh/deposit/tests/data/atom/entry-data0.xml b/swh/deposit/tests/data/atom/entry-data0.xml index 48a5ef2f..a221f441 100644 --- a/swh/deposit/tests/data/atom/entry-data0.xml +++ b/swh/deposit/tests/data/atom/entry-data0.xml @@ -1,25 +1,24 @@ Awesome Compiler - hal urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a 2017-10-07T15:17:08Z some awesome author something awesome-compiler This is an awesome compiler destined to awesomely compile stuff and other stuff compiler,programming,language 2005-10-07T17:17:08Z 2005-10-07T17:17:08Z release note related link Awesome https://hoster.org/awesome-compiler GNU/Linux 0.0.1 running all diff --git a/swh/deposit/tests/data/atom/entry-data1.xml b/swh/deposit/tests/data/atom/entry-data1.xml index e4f415c7..e1bccc7a 100644 --- a/swh/deposit/tests/data/atom/entry-data1.xml +++ b/swh/deposit/tests/data/atom/entry-data1.xml @@ -1,24 +1,23 @@ - hal 
urn:uuid:2225c695-cfb8-4ebb-aaaa-80da344efa6a 2017-10-07T15:17:08Z some awesome author something awesome-compiler This is an awesome compiler destined to awesomely compile stuff and other stuff compiler,programming,language 2005-10-07T17:17:08Z 2005-10-07T17:17:08Z release note related link Awesome https://hoster.org/awesome-compiler GNU/Linux 0.0.1 running all
diff --git a/swh/deposit/tests/data/atom/error-with-decimal.xml b/swh/deposit/tests/data/atom/error-with-decimal.xml
index b2c0c05b..6ba3f4f0 100644
--- a/swh/deposit/tests/data/atom/error-with-decimal.xml
+++ b/swh/deposit/tests/data/atom/error-with-decimal.xml
@@ -1,37 +1,36 @@
Composing a Web of Audio Applications
- hal
hal-01243065 https://hal-test.archives-ouvertes.fr/hal-01243065 test DSP programming,Web,Composability,Faust 2017-05-03T16:08:47+02:00 The Web offers a great opportunity to share, deploy and use programs without installation difficulties. In this article we explore the idea of freely combining/composing real-time audio applications deployed on the Web using Faust audio DSP language. 1 10.4 phpstorm stable linux php python C GNU General Public License v3.0 only CeCILL Free Software License Agreement v1.1 HAL hal@ccsd.cnrs.fr Someone Nice someone@nice.fr FFJ
diff --git a/swh/deposit/tests/data/atom/error-with-external-identifier.xml b/swh/deposit/tests/data/atom/error-with-external-identifier.xml
index beeb4d8d..e6711a17 100644
--- a/swh/deposit/tests/data/atom/error-with-external-identifier.xml
+++ b/swh/deposit/tests/data/atom/error-with-external-identifier.xml
@@ -1,7 +1,6 @@
Composing a Web of Audio Applications
- hal
hal-01243065 hal-01243065
diff --git a/swh/deposit/tests/data/atom/metadata.xml b/swh/deposit/tests/data/atom/metadata.xml
index e2a16aac..f00484b2 100644
--- a/swh/deposit/tests/data/atom/metadata.xml
+++ b/swh/deposit/tests/data/atom/metadata.xml
@@ -1,31 +1,30 @@
Composing a Web of Audio Applications
- hal
hal-01243065 https://hal-test.archives-ouvertes.fr/hal-01243065 test DSP programming this is the description 1 phpstorm stable php python C GNU General Public License v3.0 only CeCILL Free Software License Agreement v1.1 HAL hal@ccsd.cnrs.fr Morane Gruenpeter %s