diff --git a/docs/uri-scheme-browse-snapshot.rst b/docs/uri-scheme-browse-snapshot.rst index 2ce74be0..5f24373a 100644 --- a/docs/uri-scheme-browse-snapshot.rst +++ b/docs/uri-scheme-browse-snapshot.rst @@ -1,258 +1,258 @@ Snapshot ^^^^^^^^ .. http:get:: /browse/snapshot/(snapshot_id)/ HTML view that displays the content of a snapshot from its identifier - (see :func:`swh.model.identifiers.snapshot_identifier` + (see :func:`swh.model.git_objects.snapshot_git_object` in our data model module for details about how they are computed). A snapshot is a set of named branches, which are pointers to objects at any level of the Software Heritage DAG. It represents a full picture of an origin at a given time. Thus, multiple visits of different origins can point to the same snapshot (for instance, when several projects are forks of a common one). Currently, that endpoint simply performs a redirection to :http:get:`/browse/snapshot/(snapshot_id)/directory/` in order to display the root directory associated to the default snapshot branch (usually master). :param string snapshot_id: hexadecimal representation of the snapshot **sha1** identifier :statuscode 200: no error :statuscode 400: an invalid snapshot identifier has been provided :statuscode 404: requested snapshot can not be found in the archive **Examples:** .. parsed-literal:: :swh_web_browse:`snapshot/baebc2109e4a2ec22a1129a3859647e191d04df4/` :swh_web_browse:`snapshot/673156c31a876c5b99b2fe3e89615529de9a3c44/` Snapshot directory """""""""""""""""" .. http:get:: /browse/snapshot/(snapshot_id)/directory/ HTML view that displays the content of a directory reachable from a snapshot. The features offered by the view are similar to the one for browsing a directory in an origin context (see :http:get:`/browse/origin/(origin_url)/directory/[(path)/]`). 
:param string snapshot_id: hexadecimal representation of the snapshot **sha1** identifier :query string path: optional parameter used to specify the path of a directory reachable from the snapshot root one :query string branch: specify the snapshot branch name from which to retrieve the root directory :query string release: specify the snapshot release name from which to retrieve the root directory :query string revision: specify the snapshot revision, identified by the hexadecimal representation of its **sha1_git** value, from which to retrieve the root directory :statuscode 200: no error :statuscode 400: an invalid snapshot identifier has been provided :statuscode 404: requested snapshot can not be found in the archive **Examples:** .. parsed-literal:: :swh_web_browse:`snapshot/baebc2109e4a2ec22a1129a3859647e191d04df4/directory/?path=drivers/gpu` :swh_web_browse:`snapshot/673156c31a876c5b99b2fe3e89615529de9a3c44/directory/?path=src/opengl` :swh_web_browse:`snapshot/673156c31a876c5b99b2fe3e89615529de9a3c44/directory/?release=v5.7.0` .. http:get:: /browse/snapshot/(snapshot_id)/directory/(path)/ :deprecated: .. warning:: That endpoint is deprecated, use :http:get:`/browse/snapshot/(snapshot_id)/directory/` instead. HTML view that displays the content of a directory reachable from a snapshot. The features offered by the view are similar to the one for browsing a directory in an origin context (see :http:get:`/browse/origin/(origin_url)/directory/[(path)/]`). 
:param string snapshot_id: hexadecimal representation of the snapshot **sha1** identifier :param string path: optional parameter used to specify the path of a directory reachable from the snapshot root one :query string branch: specify the snapshot branch name from which to retrieve the root directory :query string release: specify the snapshot release name from which to retrieve the root directory :query string revision: specify the snapshot revision, identified by the hexadecimal representation of its **sha1_git** value, from which to retrieve the root directory :statuscode 200: no error :statuscode 400: an invalid snapshot identifier has been provided :statuscode 404: requested snapshot can not be found in the archive **Examples:** .. parsed-literal:: :swh_web_browse:`snapshot/baebc2109e4a2ec22a1129a3859647e191d04df4/directory/drivers/gpu/` :swh_web_browse:`snapshot/673156c31a876c5b99b2fe3e89615529de9a3c44/directory/src/opengl/` :swh_web_browse:`snapshot/673156c31a876c5b99b2fe3e89615529de9a3c44/directory/?release=v5.7.0` Snapshot content """""""""""""""" .. http:get:: /browse/snapshot/(snapshot_id)/content/ HTML view that produces a display of a content reachable from a snapshot. The features offered by the view are similar to the one for browsing a content in an origin context (see :http:get:`/browse/origin/(origin_url)/content/`). 
:param string snapshot_id: hexadecimal representation of the snapshot **sha1** identifier :query string path: path of a content reachable from the snapshot root directory :query string branch: specify the snapshot branch name from which to retrieve the content :query string release: specify the snapshot release name from which to retrieve the content :query string revision: specify the snapshot revision, identified by the hexadecimal representation of its **sha1_git** value, from which to retrieve the content :statuscode 200: no error :statuscode 400: an invalid snapshot identifier has been provided :statuscode 404: requested snapshot can not be found in the archive, or the provided content path does not exist from the origin root directory **Examples:** .. parsed-literal:: :swh_web_browse:`snapshot/baebc2109e4a2ec22a1129a3859647e191d04df4/content/?path=init/initramfs.c` :swh_web_browse:`snapshot/673156c31a876c5b99b2fe3e89615529de9a3c44/content/?path=src/opengl/qglbuffer.h` :swh_web_browse:`snapshot/673156c31a876c5b99b2fe3e89615529de9a3c44/content/?path=src/opengl/qglbuffer.h&?release=v5.0.0` .. http:get:: /browse/snapshot/(snapshot_id)/content/(path)/ :deprecated: .. warning:: That endpoint is deprecated, use :http:get:`/browse/snapshot/(snapshot_id)/content/` instead. HTML view that produces a display of a content reachable from a snapshot. The features offered by the view are similar to the one for browsing a content in an origin context (see :http:get:`/browse/origin/(origin_url)/content/(path)/`). 
:param string snapshot_id: hexadecimal representation of the snapshot **sha1** identifier :param string path: path of a content reachable from the snapshot root directory :query string branch: specify the snapshot branch name from which to retrieve the content :query string release: specify the snapshot release name from which to retrieve the content :query string revision: specify the snapshot revision, identified by the hexadecimal representation of its **sha1_git** value, from which to retrieve the content :statuscode 200: no error :statuscode 400: an invalid snapshot identifier has been provided :statuscode 404: requested snapshot can not be found in the archive, or the provided content path does not exist from the origin root directory **Examples:** .. parsed-literal:: :swh_web_browse:`snapshot/baebc2109e4a2ec22a1129a3859647e191d04df4/content/init/initramfs.c` :swh_web_browse:`snapshot/673156c31a876c5b99b2fe3e89615529de9a3c44/content/src/opengl/qglbuffer.h/` :swh_web_browse:`snapshot/673156c31a876c5b99b2fe3e89615529de9a3c44/content/src/opengl/qglbuffer.h/?release=v5.0.0` Snapshot history """""""""""""""" .. http:get:: /browse/snapshot/(snapshot_id)/log/ HTML view that produces a display of revisions history (aka the commit log) heading to the last revision collected in a snapshot. The features offered by the view are similar to the one for browsing the history in an origin context (see :http:get:`/browse/origin/(origin_url)/log/`). 
:param string snapshot_id: hexadecimal representation of the snapshot **sha1** identifier :query int per_page: the number of log entries to display per page (default is 20, max is 50) :query string branch: specify the snapshot branch name from which to retrieve the commit log :query string release: specify the snapshot release name from which to retrieve the commit log :query string revision: specify the snapshot revision, identified by the hexadecimal representation of its **sha1_git** value, from which to retrieve the commit log :statuscode 200: no error :statuscode 400: an invalid snapshot identifier has been provided :statuscode 404: requested snapshot can not be found in the archive **Examples:** .. parsed-literal:: :swh_web_browse:`snapshot/a274b44111f777209556e94920b7e71cf5c305cd/log/` :swh_web_browse:`snapshot/9ca9e75279df5f4e3fee19bf5190ed672dcdfb33/log/?branch=refs/heads/emacs-unicode` Snapshot branches """"""""""""""""" .. http:get:: /browse/snapshot/(snapshot_id)/branches/ HTML view that produces a display of the list of branches collected in a snapshot. The features offered by the view are similar to the one for browsing the list of branches in an origin context (see :http:get:`/browse/origin/(origin_url)/branches/`). :param string snapshot_id: hexadecimal representation of the snapshot **sha1** identifier :statuscode 200: no error :statuscode 400: an invalid snapshot identifier has been provided :statuscode 404: requested snapshot can not be found in the archive **Examples:** .. parsed-literal:: :swh_web_browse:`snapshot/03d7897352541e78ee7b13a580dc836778e8126a/branches/` :swh_web_browse:`snapshot/f37563b953327f8fd83e39af6ebb929ef85103d5/branches/` Snapshot releases """"""""""""""""" .. http:get:: /browse/snapshot/(snapshot_id)/releases/ HTML view that produces a display of the list of releases collected in a snapshot. 
The features offered by the view are similar to the one for browsing the list of releases in an origin context (see :http:get:`/browse/origin/(origin_url)/releases/`). :param string snapshot_id: hexadecimal representation of the snapshot **sha1** identifier :statuscode 200: no error :statuscode 400: an invalid snapshot identifier has been provided :statuscode 404: requested snapshot can not be found in the archive **Examples:** .. parsed-literal:: :swh_web_browse:`snapshot/673156c31a876c5b99b2fe3e89615529de9a3c44/releases/` :swh_web_browse:`snapshot/23e6fb084a60cc909b9e222d80d89fdb98756dee/releases/` diff --git a/swh/web/api/views/metadata.py b/swh/web/api/views/metadata.py index 70d5a45e..353bf795 100644 --- a/swh/web/api/views/metadata.py +++ b/swh/web/api/views/metadata.py @@ -1,254 +1,254 @@ # Copyright (C) 2021 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import base64 import re import iso8601 from django.http import HttpResponse -from swh.model import hashutil, identifiers +from swh.model import hashutil, swhids from swh.model.model import MetadataAuthority, MetadataAuthorityType from swh.web.api.apidoc import api_doc, format_docstring from swh.web.api.apiurls import api_route from swh.web.common import archive, converters from swh.web.common.exc import BadInputExc, NotFoundExc from swh.web.common.utils import reverse SWHID_RE = "swh:1:[a-z]{3}:[0-9a-z]{40}" @api_route( f"/raw-extrinsic-metadata/swhid/(?P<target>{SWHID_RE})/", "api-1-raw-extrinsic-metadata-swhid", ) @api_doc("/raw-extrinsic-metadata/swhid/") @format_docstring() def api_raw_extrinsic_metadata_swhid(request, target): """ .. http:get:: /api/1/raw-extrinsic-metadata/swhid/(target) Returns raw `extrinsic metadata `__ collected on a given object.
:param string target: The core SWHID of the object whose metadata should be returned :query string authority: A metadata authority identifier, formatted as `` ``. Required. :query string after: An ISO representation of the minimum timestamp of metadata to fetch. Defaults to allowing all metadata. :query int limit: Maximum number of metadata objects to return. {common_headers} :>jsonarr string target: SWHID of the object described by this metadata :>jsonarr string discovery_date: ISO8601 timestamp of the moment this metadata was collected. :>jsonarr object authority: authority this metadata is coming from :>jsonarr object fetcher: tool used to fetch the metadata :>jsonarr string format: short identifier of the format of the metadata :>jsonarr string metadata_url: link to download the metadata "blob" itself :>jsonarr string origin: URL of the origin in which context's the metadata is valid, if any :>jsonarr int visit: identifier of the visit in which context's the metadata is valid, if any :>jsonarr string snapshot: SWHID of the snapshot in which context's the metadata is valid, if any :>jsonarr string release: SWHID of the release in which context's the metadata is valid, if any :>jsonarr string revision: SWHID of the revision in which context's the metadata is valid, if any :>jsonarr string path: SWHID of the path in which context's is valid if any, relative to a release or revision as anchor :>jsonarr string directory: SWHID of the directory in which context's the metadata is valid, if any :statuscode 200: no error **Example:** .. 
parsed-literal:: :swh_web_api:`raw-extrinsic-metadata/swhid/swh:1:dir:a2faa28028657859c16ff506924212b33f0e1307/?authority=forge%20https://pypi.org/` """ # noqa authority_str: str = request.query_params.get("authority") after_str: str = request.query_params.get("after") limit_str: str = request.query_params.get("limit", "100") page_token_str: str = request.query_params.get("page_token") if not authority_str: raise BadInputExc("The 'authority' query parameter is required.") if " " not in authority_str.strip(): raise BadInputExc("The 'authority' query parameter should contain a space.") (authority_type_str, authority_url) = authority_str.split(" ", 1) try: authority_type = MetadataAuthorityType(authority_type_str) except ValueError: raise BadInputExc( f"Invalid 'authority' type, should be one of: " f"{', '.join(member.value for member in MetadataAuthorityType)}" ) authority = MetadataAuthority(authority_type, authority_url) if after_str: try: after = iso8601.parse_date(after_str) except iso8601.ParseError: raise BadInputExc("Invalid format for 'after' parameter.") from None else: after = None try: limit = int(limit_str) except ValueError: raise BadInputExc("'limit' parameter must be an integer.") from None limit = min(limit, 10000) try: - target = identifiers.CoreSWHID.from_string(target).to_extended() - except identifiers.ValidationError as e: + target = swhids.CoreSWHID.from_string(target).to_extended() + except swhids.ValidationError as e: raise BadInputExc(f"Invalid target SWHID: {e.args[0]}") from None if page_token_str: page_token = base64.urlsafe_b64decode(page_token_str) else: page_token = None result_page = archive.storage.raw_extrinsic_metadata_get( target=target, authority=authority, after=after, page_token=page_token, limit=limit, ) results = [] for metadata in result_page.results: result = converters.from_raw_extrinsic_metadata(metadata) # We can't reliably send metadata directly, because it is a bytestring, # and we have to return JSON documents. 
result["metadata_url"] = reverse( "api-1-raw-extrinsic-metadata-get", url_args={"id": hashutil.hash_to_hex(metadata.id)}, query_params={"filename": f"{target}_metadata"}, request=request, ) results.append(result) response = { "results": results, "headers": {}, } if result_page.next_page_token is not None: response["headers"]["link-next"] = reverse( "api-1-raw-extrinsic-metadata-swhid", url_args={"target": target}, query_params=dict( authority=authority_str, after=after_str, limit=limit_str, page_token=base64.urlsafe_b64encode( result_page.next_page_token.encode() ), ), request=request, ) return response @api_route( "/raw-extrinsic-metadata/get/(?P<id>[0-9a-z]+)/", "api-1-raw-extrinsic-metadata-get", ) def api_raw_extrinsic_metadata_get(request, id): # This is an internal endpoint that should only be accessed via URLs given # by /raw-extrinsic-metadata/swhid/; so it is not documented. metadata = archive.storage.raw_extrinsic_metadata_get_by_ids( [hashutil.hash_to_bytes(id)] ) if not metadata: raise NotFoundExc( "Metadata not found. Use /raw-extrinsic-metadata/swhid/ to access metadata." ) response = HttpResponse( metadata[0].metadata, content_type="application/octet-stream" ) filename = request.query_params.get("filename") if filename and re.match("[a-zA-Z0-9:._-]+", filename): response["Content-disposition"] = f'attachment; filename="{filename}"' else: # It should always be not-None and match the regexp if the URL was created by # /raw-extrinsic-metadata/swhid/, but we're better safe than sorry. response["Content-disposition"] = "attachment" return response @api_route( f"/raw-extrinsic-metadata/swhid/(?P<target>{SWHID_RE})/authorities/", "api-1-raw-extrinsic-metadata-swhid-authorities", ) @api_doc("/raw-extrinsic-metadata/swhid/authorities/") @format_docstring() def api_raw_extrinsic_metadata_swhid_authorities(request, target): """ ..
http:get:: /api/1/raw-extrinsic-metadata/swhid/(target)/authorities/ Returns a list of metadata authorities that provided metadata on the given target. They can then be used to get the raw `extrinsic metadata `__ collected on that object from each of the authorities. :param string target: The core SWHID of the object whose metadata-providing authorities should be returned {common_headers} :>jsonarr string type: Type of authority (deposit_client, forge, registry) :>jsonarr string url: Unique IRI identifying the authority :>jsonarr object metadata_list_url: URL to get the list of metadata objects on the given object from this authority :statuscode 200: no error **Example:** .. parsed-literal:: :swh_web_api:`raw-extrinsic-metadata/swhid/swh:1:dir:a2faa28028657859c16ff506924212b33f0e1307/authorities/` """ # noqa target_str = target try: - target = identifiers.CoreSWHID.from_string(target_str).to_extended() - except identifiers.ValidationError as e: + target = swhids.CoreSWHID.from_string(target_str).to_extended() + except swhids.ValidationError as e: raise BadInputExc(f"Invalid target SWHID: {e.args[0]}") from None authorities = archive.storage.raw_extrinsic_metadata_get_authorities(target=target) results = [ { **authority.to_dict(), "metadata_list_url": reverse( "api-1-raw-extrinsic-metadata-swhid", url_args={"target": target_str}, query_params={"authority": f"{authority.type.value} {authority.url}"}, request=request, ), } for authority in authorities ] return { "results": results, "headers": {}, } diff --git a/swh/web/api/views/revision.py b/swh/web/api/views/revision.py index 4d40d467..5a54b057 100644 --- a/swh/web/api/views/revision.py +++ b/swh/web/api/views/revision.py @@ -1,215 +1,215 @@ # Copyright (C) 2015-2019 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from django.http 
import HttpResponse from swh.web.api import utils from swh.web.api.apidoc import api_doc, format_docstring from swh.web.api.apiurls import api_route from swh.web.api.views.utils import api_lookup from swh.web.common import archive DOC_RETURN_REVISION = """ :>json object author: information about the author of the revision :>json object committer: information about the committer of the revision :>json string committer_date: ISO representation of the commit date (in UTC) :>json string date: ISO representation of the revision date (in UTC) :>json string directory: the unique identifier that revision points to :>json string directory_url: link to :http:get:`/api/1/directory/(sha1_git)/[(path)/]` to get information about the directory associated to the revision :>json string id: the revision unique identifier :>json boolean merge: whether or not the revision corresponds to a merge commit :>json string message: the message associated to the revision :>json array parents: the parents of the revision, i.e. the previous revisions that head directly to it, each entry of that array contains an unique parent revision identifier but also a link to :http:get:`/api/1/revision/(sha1_git)/` to get more information about it :>json string type: the type of the revision """ DOC_RETURN_REVISION_ARRAY = DOC_RETURN_REVISION.replace(":>json", ":>jsonarr") @api_route( r"/revision/(?P<sha1_git>[0-9a-f]+)/", "api-1-revision", checksum_args=["sha1_git"] ) @api_doc("/revision/") @format_docstring(return_revision=DOC_RETURN_REVISION) def api_revision(request, sha1_git): """ .. http:get:: /api/1/revision/(sha1_git)/ Get information about a revision in the archive. Revisions are identified by **sha1** checksums, compatible with Git commit identifiers. - See :func:`swh.model.identifiers.revision_identifier` in our data model + See :func:`swh.model.git_objects.revision_git_object` in our data model module for details about how they are computed.
:param string sha1_git: hexadecimal representation of the revision **sha1_git** identifier {common_headers} {return_revision} :statuscode 200: no error :statuscode 400: an invalid **sha1_git** value has been provided :statuscode 404: requested revision can not be found in the archive **Example:** .. parsed-literal:: :swh_web_api:`revision/aafb16d69fd30ff58afdd69036a26047f3aebdc6/` """ return api_lookup( archive.lookup_revision, sha1_git, notfound_msg="Revision with sha1_git {} not found.".format(sha1_git), enrich_fn=utils.enrich_revision, request=request, ) @api_route( r"/revision/(?P<sha1_git>[0-9a-f]+)/raw/", "api-1-revision-raw-message", checksum_args=["sha1_git"], ) @api_doc("/revision/raw/", tags=["hidden"]) def api_revision_raw_message(request, sha1_git): """Return the raw data of the message of revision identified by sha1_git """ raw = archive.lookup_revision_message(sha1_git) response = HttpResponse(raw["message"], content_type="application/octet-stream") response["Content-disposition"] = "attachment;filename=rev_%s_raw" % sha1_git return response @api_route( r"/revision/(?P<sha1_git>[0-9a-f]+)/directory/", "api-1-revision-directory", checksum_args=["sha1_git"], ) @api_route( r"/revision/(?P<sha1_git>[0-9a-f]+)/directory/(?P<dir_path>.+)/", "api-1-revision-directory", checksum_args=["sha1_git"], ) @api_doc("/revision/directory/") @format_docstring() def api_revision_directory(request, sha1_git, dir_path=None, with_data=False): """ .. http:get:: /api/1/revision/(sha1_git)/directory/[(path)/] Get information about directory (entry) objects associated to revisions. Each revision is associated to a single "root" directory. This endpoint behaves like :http:get:`/api/1/directory/(sha1_git)/[(path)/]`, but operates on the root directory associated to a given revision.
:param string sha1_git: hexadecimal representation of the revision **sha1_git** identifier :param string path: optional parameter to get information about the directory entry pointed by that relative path {common_headers} :>json array content: directory entries as returned by :http:get:`/api/1/directory/(sha1_git)/[(path)/]` :>json string path: path of directory from the revision root one :>json string revision: the unique revision identifier :>json string type: the type of the directory :statuscode 200: no error :statuscode 400: an invalid **sha1_git** value has been provided :statuscode 404: requested revision can not be found in the archive **Example:** .. parsed-literal:: :swh_web_api:`revision/f1b94134a4b879bc55c3dacdb496690c8ebdc03f/directory/` """ rev_id, result = archive.lookup_directory_through_revision( {"sha1_git": sha1_git}, dir_path, with_data=with_data ) content = result["content"] if result["type"] == "dir": # dir_entries result["content"] = [ utils.enrich_directory_entry(entry, request=request) for entry in content ] elif result["type"] == "file": # content result["content"] = utils.enrich_content(content, request=request) elif result["type"] == "rev": # revision result["content"] = utils.enrich_revision(content, request=request) return result @api_route( r"/revision/(?P<sha1_git>[0-9a-f]+)/log/", "api-1-revision-log", checksum_args=["sha1_git"], ) @api_doc("/revision/log/") @format_docstring(return_revision_array=DOC_RETURN_REVISION_ARRAY) def api_revision_log(request, sha1_git): """ .. http:get:: /api/1/revision/(sha1_git)/log/ Get a list of all revisions heading to a given one, in other words show the commit log. The revisions are returned in the breadth-first search order while visiting the revision graph. The number of revisions to return is also bounded by the **limit** query parameter. ..
warning:: To get the full BFS traversal of the revision graph when the total number of revisions is greater than 1000, it is up to the client to keep track of the multiple branches of history when there's merge revisions in the returned objects. In other words, identify all the continuation points that need to be followed to get the full history through recursion. :param string sha1_git: hexadecimal representation of the revision **sha1_git** identifier :query int limit: maximum number of revisions to return when performing BFS traversal on the revision graph (default to 10, can not exceed 1000) {common_headers} {return_revision_array} :statuscode 200: no error :statuscode 400: an invalid **sha1_git** value has been provided :statuscode 404: head revision can not be found in the archive **Example:** .. parsed-literal:: :swh_web_api:`revision/e1a315fa3fa734e2a6154ed7b5b9ae0eb8987aad/log/` """ limit = int(request.query_params.get("limit", "10")) limit = min(limit, 1000) error_msg = "Revision with sha1_git %s not found." % sha1_git revisions = api_lookup( archive.lookup_revision_log, sha1_git, limit, notfound_msg=error_msg, enrich_fn=utils.enrich_revision, request=request, ) return {"results": revisions}
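The warning in the `api_revision_log` docstring leaves it to the client to follow continuation points when the history exceeds the `limit` cap. The sketch below shows one way a client might do that. It is not part of this change: `walk_full_history`, `fake_fetch_log` and `GRAPH` are hypothetical names, `fetch_log` stands in for whatever function the client uses to call `/api/1/revision/(sha1_git)/log/`, and the `"id"` key inside each `parents` entry is an assumption about the response shape.

```python
from collections import deque
from typing import Any, Callable, Dict, Iterator, List, Set

# Assumed shape of one log entry: {"id": str, "parents": [{"id": str}, ...]}.
Revision = Dict[str, Any]
# Hypothetical client callable: (sha1_git, limit) -> list of log entries.
FetchLog = Callable[[str, int], List[Revision]]


def walk_full_history(head: str, fetch_log: FetchLog, limit: int = 1000) -> Iterator[str]:
    """Yield each revision id reachable from ``head`` exactly once."""
    seen: Set[str] = set()       # ids already yielded
    requested: Set[str] = set()  # ids already used as a log starting point
    pending = deque([head])
    while pending:
        start = pending.popleft()
        if start in seen or start in requested:
            continue
        requested.add(start)
        page = fetch_log(start, limit)
        for rev in page:
            if rev["id"] not in seen:
                seen.add(rev["id"])
                yield rev["id"]
        # Continuation points: parents referenced by this page but not
        # returned yet, because the page was truncated by ``limit``.
        for rev in page:
            for parent in rev["parents"]:
                if parent["id"] not in seen:
                    pending.append(parent["id"])


def fake_fetch_log(graph: Dict[str, List[str]]) -> FetchLog:
    """In-memory stand-in for the endpoint: bounded BFS over ``graph``."""
    def fetch_log(sha1_git: str, limit: int) -> List[Revision]:
        page: List[Revision] = []
        queue, visited = deque([sha1_git]), {sha1_git}
        while queue and len(page) < limit:
            rev = queue.popleft()
            page.append({"id": rev, "parents": [{"id": p} for p in graph[rev]]})
            for p in graph[rev]:
                if p not in visited:
                    visited.add(p)
                    queue.append(p)
        return page
    return fetch_log


# Toy history with one merge revision ("a" has two parents).
GRAPH = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
history = list(walk_full_history("a", fake_fetch_log(GRAPH), limit=2))
```

With `limit=2` the walker needs several requests to cover the toy graph, which is the situation the docstring warns about. Tracking `requested` separately from `seen` keeps the loop from re-requesting a starting point even if a server response omits it.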