diff --git a/docs/getting-started.rst b/docs/getting-started.rst index 3ecc9615..48257e5b 100644 --- a/docs/getting-started.rst +++ b/docs/getting-started.rst @@ -1,309 +1,309 @@ Getting Started =============== This guide explains how to prepare and push a software deposit with the swh-deposit commands. The API is rooted at https://deposit.softwareheritage.org/1. For more details, see the `main documentation <./index.html>`__. Requirements ------------ You need to be referenced on SWH's client list to have: * credentials (needed for the basic authentication step) - in this document we reference ```` as the client's name and ```` as its associated authentication password. * an associated collection `Contact us for more information. `__ Prepare a deposit ----------------- * compress the files in a supported archive format: - zip: common zip archive (no multi-disk zip files). - tar: tar archive, either uncompressed or compressed with one of the following algorithms: gzip (.tar.gz, .tgz), bzip2 (.tar.bz2), or lzma (.tar.lzma) * prepare a metadata file (`more details <./metadata.html>`__): - specify metadata schema/vocabulary (CodeMeta is recommended) - specify *MUST* metadata (url, authors, software name and the external\_identifier) - add all available information under the compatible metadata term An example of an Atom entry file with CodeMeta terms: .. code:: xml Je suis GPL swh je-suis-gpl https://forge.softwareheritage.org/source/jesuisgpl/ 2018-01-05 Je suis GPL is a modified version of GNU Hello whose sole purpose is to showcase the usage of Software Heritage for license compliance purposes. 0.1 GNU/Linux stable C GNU General Public License v3.0 or later https://spdx.org/licenses/GPL-3.0-or-later.html Stefano Zacchiroli Maintainer Push deposit ------------ You can push a deposit with: * a single deposit (archive + metadata): The user posts in one query a software source code archive and associated metadata. 
The deposit is directly marked with status ``deposited``. * a multisteps deposit: 1. Create an incomplete deposit (marked with status ``partial``) 2. Add data to a deposit (in multiple requests if needed) 3. Finalize deposit (the status becomes ``deposited``) Single deposit ^^^^^^^^^^^^^^ Once the files are ready for deposit, we want to do the actual deposit in one shot, sending exactly one POST query: * 1 archive (content-type ``application/zip`` or ``application/x-tar``) * 1 metadata file in Atom XML format (``content-type: application/atom+xml;type=entry``) For this, we need to provide: * arguments: ``--username 'name' --password 'pass'`` as credentials * archive's path (example: ``--archive path/to/archive-name.tgz``) * (optionally) metadata file's path ``--metadata path/to/file.metadata.xml``. If not provided, the archive's filename will be used to determine the metadata file, e.g. ``path/to/archive-name.tgz.metadata.xml`` * (optionally) ``--slug 'your-id'`` argument, a reference to a unique identifier the client uses for the software object. You can do this with the following command: minimal deposit .. code:: shell - $ swh-deposit client --username name --password secret \ - --archive je-suis-gpl.tgz + $ swh-deposit deposit --username name --password secret \ + --archive je-suis-gpl.tgz with client's external identifier (``slug``) .. code:: shell - $ swh-deposit client --username name --password secret \ - --archive je-suis-gpl.tgz \ - --slug je-suis-gpl + $ swh-deposit deposit --username name --password secret \ + --archive je-suis-gpl.tgz \ + --slug je-suis-gpl to a specific client's collection .. 
code:: shell - $ swh-deposit client --username name --password secret \ - --archive je-suis-gpl.tgz \ - --collection 'second-collection' + $ swh-deposit deposit --username name --password secret \ + --archive je-suis-gpl.tgz \ + --collection 'second-collection' You just posted a deposit to your collection on Software Heritage If everything went well, the successful response will contain the elements below: .. code:: shell { 'deposit_status': 'deposited', 'deposit_id': '7', 'deposit_date': 'Jan. 29, 2018, 12:29 p.m.' } Note: As the deposit is in ``deposited`` status, you can no longer update the deposit after this query. It will be answered with a 403 forbidden answer. If something went wrong, an equivalent response will be given with the `error` and `detail` keys explaining the issue, e.g.: .. code:: shell { 'error': 'Unknown collection name xyz', 'detail': None, 'deposit_status': None, 'deposit_status_detail': None, 'deposit_swh_id': None, 'status': 404 } multisteps deposit ^^^^^^^^^^^^^^^^^^^^^^^^^ The steps to create a multisteps deposit: 1. Create an incomplete deposit ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ First use the ``--partial`` argument to declare there is more to come .. code:: shell - $ swh-deposit client --username name --password secret \ - --archive foo.tar.gz \ - --partial + $ swh-deposit deposit --username name --password secret \ + --archive foo.tar.gz \ + --partial 2. Add content or metadata to the deposit ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Continue the deposit by using the ``--deposit-id`` argument given as a response for the first step. You can continue adding content or metadata while you use the ``--partial`` argument. .. code:: shell - $ swh-deposit client --username name --password secret \ - --archive add-foo.tar.gz \ - --deposit-id 42 \ - --partial + $ swh-deposit deposit --username name --password secret \ + --archive add-foo.tar.gz \ + --deposit-id 42 \ + --partial In case you want to add only one new archive without metadata: .. 
code:: shell - $ swh-deposit client --username name --password secret \ - --archive add-foo.tar.gz \ - --archive-deposit \ - --deposit-id 42 \ - --partial \ + $ swh-deposit deposit --username name --password secret \ + --archive add-foo.tar.gz \ + --archive-deposit \ + --deposit-id 42 \ + --partial If you want to add only metadata, use: .. code:: shell - $ swh-deposit client --username name --password secret \ - --metadata add-foo.tar.gz.metadata.xml \ - --metadata-deposit \ - --deposit-id 42 \ - --partial + $ swh-deposit deposit --username name --password secret \ + --metadata add-foo.tar.gz.metadata.xml \ + --metadata-deposit \ + --deposit-id 42 \ + --partial 3. Finalize deposit -~~~~~~~~~~~~~~~~~~~ + ~~~~~~~~~~~~~~~~~~~ If you omit ``--partial`` on your last addition, the deposit is considered complete and its status changes to ``deposited``. Update deposit ---------------- * replace deposit: - only possible if the deposit status is ``partial`` and ``--deposit-id `` is provided - by using the ``--replace`` flag - ``--metadata-deposit`` replaces associated existing metadata - ``--archive-deposit`` replaces associated archive(s) - by default, with no flag or both, you'll replace associated metadata and archive(s) .. code:: shell - $ swh-deposit client --username name --password secret \ - --deposit-id 11 \ - --archive updated-je-suis-gpl.tgz \ - --replace + $ swh-deposit deposit --username name --password secret \ + --deposit-id 11 \ + --archive updated-je-suis-gpl.tgz \ + --replace * update a loaded deposit with a new version: - by using the external-id with the ``--slug`` argument, you will link the new deposit with its parent deposit .. 
code:: shell - $ swh-deposit client --username name --password secret \ - --archive je-suis-gpl-v2.tgz \ - --slug 'je-suis-gpl' \ + $ swh-deposit deposit --username name --password secret \ + --archive je-suis-gpl-v2.tgz \ + --slug 'je-suis-gpl' Check the deposit's status -------------------------- You can check the status of the deposit by using the ``--status`` flag together with the ``--deposit-id`` argument: .. code:: shell - $ swh-deposit client --username name --password secret --deposit-id '11' --status + $ swh-deposit deposit --username name --password secret --deposit-id '11' --status .. code:: json { 'deposit_id': '11', 'deposit_status': 'deposited', 'deposit_swh_id': None, 'deposit_status_detail': 'Deposit is ready for additional checks \ (tarball ok, metadata, etc...)' } The different statuses: - **partial**: multipart deposit is still ongoing - **deposited**: deposit completed - **rejected**: deposit failed the checks - **verified**: content and metadata verified - **loading**: loading in-progress - **done**: loading completed successfully - **failed**: the deposit loading has failed When the deposit has been loaded into the archive, the status will be marked ``done``. In the response, will also be available the , , , . For example: .. 
code:: json { 'deposit_id': '11', 'deposit_status': 'done', 'deposit_swh_id': 'swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9', 'deposit_swh_id_context': 'swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9;origin=https://forge.softwareheritage.org/source/jesuisgpl/', 'deposit_swh_anchor_id': 'swh:1:rev:e76ea49c9ffbb7f73611087ba6e999b19e5d71eb', 'deposit_swh_anchor_id_context': 'swh:1:rev:e76ea49c9ffbb7f73611087ba6e999b19e5d71eb;origin=https://forge.softwareheritage.org/source/jesuisgpl/', 'deposit_status_detail': 'The deposit has been successfully \ loaded into the Software Heritage archive' } diff --git a/swh/deposit/cli.py b/swh/deposit/cli.py index 282a4695..e5d48e09 100644 --- a/swh/deposit/cli.py +++ b/swh/deposit/cli.py @@ -1,453 +1,453 @@ # Copyright (C) 2017-2019 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information import click import os import logging import uuid from swh.deposit.config import setup_django_for try: from swh.deposit.client import PublicApiDepositClient except ImportError: logging.warning("Optional client subcommand unavailable. 
" "Install swh.deposit.client to be able to use it.") CONTEXT_SETTINGS = dict(help_option_names=['-h', '--help']) @click.group(context_settings=CONTEXT_SETTINGS) @click.option('--config-file', '-C', default=None, type=click.Path(exists=True, dir_okay=False,), help="Optional extra configuration file.") @click.option('--platform', default='development', type=click.Choice(['development', 'production']), help='development or production platform') @click.option('--verbose/--no-verbose', default=False, help='Verbose mode') @click.pass_context def cli(ctx, config_file, platform, verbose): logger = logging.getLogger(__name__) logger.addHandler(logging.StreamHandler()) _loglevel = logging.DEBUG if verbose else logging.INFO logger.setLevel(_loglevel) ctx.ensure_object(dict) # configuration happens here setup_django_for(platform, config_file=config_file) ctx.obj = {'loglevel': _loglevel} @cli.group('user') @click.pass_context def user(ctx): """Manipulate user.""" pass def _create_collection(name): """Create the collection with name if it does not exist. Args: name (str): collection's name Returns: collection (DepositCollection): the existing collection object (created or not) """ # to avoid loading too early django namespaces from swh.deposit.models import DepositCollection try: collection = DepositCollection.objects.get(name=name) click.echo('Collection %s exists, nothing to do.' 
% name) except DepositCollection.DoesNotExist: click.echo('Create new collection %s' % name) collection = DepositCollection.objects.create(name=name) click.echo('Collection %s created' % name) return collection @user.command('create') @click.option('--username', required=True, help="User's name") @click.option('--password', required=True, help="Desired user's password (plain).") @click.option('--firstname', default='', help="User's first name") @click.option('--lastname', default='', help="User's last name") @click.option('--email', default='', help="User's email") @click.option('--collection', help="User's collection") @click.pass_context def user_create(ctx, username, password, firstname, lastname, email, collection): """Create a user with some needed information (password, collection). If the collection does not exist, it is created alongside. The password is stored encrypted using Django's utilities. """ # to avoid loading too early django namespaces from swh.deposit.models import DepositClient click.echo('collection: %s' % collection) # create the collection if it does not exist collection = _create_collection(collection) # user create/update try: user = DepositClient.objects.get(username=username) click.echo('User %s exists, updating information.' % user) user.set_password(password) except DepositClient.DoesNotExist: click.echo('Create new user %s' % username) user = DepositClient.objects.create_user( username=username, password=password) user.collections = [collection.id] user.first_name = firstname user.last_name = lastname user.email = email user.is_active = True user.save() click.echo('Information registered for user %s' % user) @user.command('list') @click.pass_context def user_list(ctx): """List existing users. This entrypoint is not paginated yet as there are not many entries. 
""" # to avoid loading too early django namespaces from swh.deposit.models import DepositClient users = DepositClient.objects.all() if not users: output = 'Empty user list' else: output = '\n'.join((user.username for user in users)) click.echo(output) @cli.group('collection') @click.pass_context def collection(ctx): """Manipulate collection.""" pass @collection.command('create') @click.option('--name', required=True, help="Collection's name") @click.pass_context def collection_create(ctx, name): _create_collection(name) @collection.command('list') @click.pass_context def collection_list(ctx): """List existing collections. This entrypoint is not paginated yet as there are not many entries. """ # to avoid loading too early django namespaces from swh.deposit.models import DepositCollection collections = DepositCollection.objects.all() if not collections: output = 'Empty collection list' else: output = '\n'.join((col.name for col in collections)) click.echo(output) class InputError(ValueError): """Input script error """ pass def generate_slug(prefix='swh-sample'): """Generate a slug (sample purposes). """ return '%s-%s' % (prefix, uuid.uuid4()) def client_command_parse_input( username, password, archive, metadata, archive_deposit, metadata_deposit, collection, slug, partial, deposit_id, replace, url, status): """Parse the client subcommand options and make sure the combination is acceptable*. If not, an InputError exception is raised explaining the issue. By acceptable, we mean: - A multipart deposit (create or update) needs both an existing software archive and an existing metadata file - A binary deposit (create/update) needs an existing software archive - A metadata deposit (create/update) needs an existing metadata file - A deposit update needs a deposit_id to be provided This won't prevent all failure cases though. The remaining errors are already dealt with by the underlying API client. 
Raises: InputError explaining the issue Returns: dict with the following keys: 'archive': the software archive to deposit 'username': username 'password': associated password 'metadata': the metadata file to deposit 'collection': the user's associated collection 'slug': the slug or external id identifying the deposit to make 'partial': if the deposit is partial or not 'client': instantiated PublicApiDepositClient 'url': deposit's server main entry point 'deposit_id': optional deposit identifier 'replace': if the update replaces the existing deposit content or not """ if status and not deposit_id: raise InputError("Deposit id must be provided for status check") if status and deposit_id: # status takes priority over deposit archive_deposit = False metadata_deposit = False archive = None metadata = None if archive_deposit and metadata_deposit: # too many flags used, remove redundant ones (-> multipart deposit) archive_deposit = False metadata_deposit = False if archive and not os.path.exists(archive): raise InputError('Software Archive %s must exist!' % archive) if archive and not metadata: metadata = '%s.metadata.xml' % archive if metadata_deposit: archive = None if archive_deposit: metadata = None if metadata_deposit and not metadata: raise InputError( "Metadata deposit filepath must be provided for metadata deposit") if metadata and not os.path.exists(metadata): raise InputError('Software Archive metadata %s must exist!' % metadata) if not status and not archive and not metadata: raise InputError( 'Please provide an actionable command. 
See --help for more ' 'information.') if replace and not deposit_id: raise InputError( 'To update an existing deposit, you must provide its id') client = PublicApiDepositClient({ 'url': url, 'auth': { 'username': username, 'password': password }, }) if not collection: # retrieve user's collection sd_content = client.service_document() if 'error' in sd_content: raise InputError('Service document retrieval: %s' % ( sd_content['error'], )) collection = sd_content['collection'] if not slug: # generate slug slug = generate_slug() return { 'archive': archive, 'username': username, 'password': password, 'metadata': metadata, 'collection': collection, 'slug': slug, 'partial': partial, 'client': client, 'url': url, 'deposit_id': deposit_id, 'replace': replace, } def deposit_status(config, dry_run, logger): logger.debug('Status deposit') client = config['client'] collection = config['collection'] deposit_id = config['deposit_id'] if not dry_run: r = client.deposit_status(collection, deposit_id, logger) return r return {} def deposit_create(config, dry_run, logger): """Delegate the actual deposit to the deposit client. """ logger.debug('Create deposit') client = config['client'] collection = config['collection'] archive_path = config['archive'] metadata_path = config['metadata'] slug = config['slug'] in_progress = config['partial'] if not dry_run: r = client.deposit_create(collection, slug, archive_path, metadata_path, in_progress, logger) return r return {} def deposit_update(config, dry_run, logger): """Delegate the deposit update to the deposit client. 
""" logger.debug('Update deposit') client = config['client'] collection = config['collection'] deposit_id = config['deposit_id'] archive_path = config['archive'] metadata_path = config['metadata'] slug = config['slug'] in_progress = config['partial'] replace = config['replace'] if not dry_run: r = client.deposit_update(collection, deposit_id, slug, archive_path, metadata_path, in_progress, replace, logger) return r return {} @cli.command() @click.option('--username', required=True, help="(Mandatory) User's name") @click.option('--password', required=True, help="(Mandatory) User's associated password") @click.option('--archive', help='(Optional) Software archive to deposit') @click.option('--metadata', help="(Optional) Path to xml metadata file. If not provided, this will use a file named .metadata.xml") # noqa @click.option('--archive-deposit/--no-archive-deposit', default=False, help='(Optional) Software archive only deposit') @click.option('--metadata-deposit/--no-metadata-deposit', default=False, help='(Optional) Metadata only deposit') @click.option('--collection', help="(Optional) User's collection. If not provided, this will be fetched.") # noqa @click.option('--slug', help="""(Optional) External system information identifier. If not provided, it will be generated""") # noqa @click.option('--partial/--no-partial', default=False, help='(Optional) The deposit will be partial, other deposits will have to take place to finalize it.') # noqa @click.option('--deposit-id', default=None, help='(Optional) Update an existing partial deposit with its identifier') # noqa @click.option('--replace/--no-replace', default=False, help='(Optional) Update by replacing the existing metadata of a deposit') # noqa @click.option('--url', default='https://deposit.softwareheritage.org/1', help="(Optional) Deposit server api endpoint. 
By default, https://deposit.softwareheritage.org/1") # noqa @click.option('--status/--no-status', default=False, help="(Optional) Deposit's status") @click.option('--dry-run/--no-dry-run', default=False, help='(Optional) No-op deposit') @click.option('--verbose/--no-verbose', default=False, help='Verbose mode') @click.pass_context -def client(ctx, - username, password, archive=None, metadata=None, - archive_deposit=False, metadata_deposit=False, - collection=None, slug=None, partial=False, deposit_id=None, - replace=False, status=False, - url='https://deposit.softwareheritage.org/1', dry_run=True, - verbose=False): +def deposit(ctx, + username, password, archive=None, metadata=None, + archive_deposit=False, metadata_deposit=False, + collection=None, slug=None, partial=False, deposit_id=None, + replace=False, status=False, + url='https://deposit.softwareheritage.org/1', dry_run=True, + verbose=False): """Software Heritage Public Deposit Client Create/Update deposit through the command line or access its status. More documentation can be found at https://docs.softwareheritage.org/devel/swh-deposit/getting-started.html. """ logger = logging.getLogger(__name__) if dry_run: logger.info("**DRY RUN**") config = {} try: logger.debug('Parsing cli options') config = client_command_parse_input( username, password, archive, metadata, archive_deposit, metadata_deposit, collection, slug, partial, deposit_id, replace, url, status) except InputError as e: msg = 'Problem during parsing options: %s' % e r = { 'error': msg, } logger.info(r) return 1 if verbose: logger.info("Parsed configuration: %s" % ( config, )) deposit_id = config['deposit_id'] if status and deposit_id: r = deposit_status(config, dry_run, logger) elif not status and deposit_id: r = deposit_update(config, dry_run, logger) elif not status and not deposit_id: r = deposit_create(config, dry_run, logger) logger.info(r) def main(): return cli(auto_envvar_prefix='SWH_DEPOSIT') if __name__ == '__main__': main()
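The flag handling in ``client_command_parse_input`` can be exercised without a deposit server. The block below is a minimal sketch of those checks, not the project's implementation: ``resolve_deposit_type`` and ``demo`` are hypothetical names introduced here, the ``'status'`` label is an assumption made for illustration, and the service document lookup and slug generation are left out on purpose. The ``multipart``, ``binary`` and ``metadata`` labels come from the function's docstring.

```python
import os
import tempfile


class InputError(ValueError):
    """Raised when the option combination is not acceptable."""


def resolve_deposit_type(archive=None, metadata=None, archive_deposit=False,
                         metadata_deposit=False, deposit_id=None,
                         status=False):
    """Return (deposit_type, archive, metadata) for a flag combination.

    Hypothetical helper: it follows the same order of checks as
    client_command_parse_input but never contacts the deposit server.
    """
    if status:
        if not deposit_id:
            raise InputError('Deposit id must be provided for status check')
        # A status check ignores any archive or metadata option.
        return 'status', None, None
    if archive_deposit and metadata_deposit:
        # Both flags together behave like no flag: multipart deposit.
        archive_deposit = metadata_deposit = False
    if archive and not os.path.exists(archive):
        raise InputError('Software Archive %s must exist!' % archive)
    if archive and not metadata:
        # The default metadata file name is derived from the archive name.
        metadata = '%s.metadata.xml' % archive
    if metadata_deposit:
        archive = None
    if archive_deposit:
        metadata = None
    if metadata_deposit and not metadata:
        raise InputError('Metadata deposit filepath must be provided')
    if metadata and not os.path.exists(metadata):
        raise InputError('Software Archive metadata %s must exist!' % metadata)
    if not archive and not metadata:
        raise InputError('Please provide an actionable command.')
    if archive and metadata:
        return 'multipart', archive, metadata
    return ('binary' if archive else 'metadata'), archive, metadata


def demo():
    """Exercise the helper on throwaway files; return the resolved types."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, 'foo.tar.gz')
        metadata = archive + '.metadata.xml'
        for path in (archive, metadata):
            open(path, 'w').close()  # empty placeholder files
        return [
            resolve_deposit_type(archive=archive)[0],
            resolve_deposit_type(archive=archive, archive_deposit=True)[0],
            resolve_deposit_type(archive=archive, metadata_deposit=True)[0],
            resolve_deposit_type(deposit_id=42, status=True)[0],
        ]


if __name__ == '__main__':
    print(demo())
```

Keeping the checks in a function that takes only paths and flags makes each rule easy to test in isolation; the real command would still need the client and collection lookup on top of it.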