diff --git a/docs/developers-info.rst b/docs/developers-info.rst index f9f35bb2..5909d3a6 100644 --- a/docs/developers-info.rst +++ b/docs/developers-info.rst @@ -1,127 +1,127 @@ Developers Information ====================== Sample configuration -------------------- The configuration will be taken from the default configuration file: ``~/.config/swh/web/web.yml``. The following introduces a default configuration file: .. sourcecode:: yaml storage: cls: remote args: url: http://localhost:5002 debug: false throttling: cache_uri: None scopes: swh_api: limiter_rate: default: 120/h exempted_networks: - 127.0.0.0/8 Run server ---------- Either use the django manage script directly (useful in development mode as it offers various commands): .. sourcecode:: shell $ python3 -m swh.web.manage runserver or use the following shortcut: .. sourcecode:: shell $ make run Modules description ------------------- Common to all web applications ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Configuration and settings """""""""""""""""""""""""" * :mod:`swh.web.config`: holds the configuration for the web applications. * :mod:`swh.web.doc_config`: utility module used to extend the sphinx configuration when building the documentation. * :mod:`swh.web.manage`: Django management module for developers. * :mod:`swh.web.urls`: module that holds the whole URI scheme of all the web applications. * :mod:`swh.web.settings.common`: Common Django settings * :mod:`swh.web.settings.development`: Django settings for development * :mod:`swh.web.settings.production`: Django settings for production * :mod:`swh.web.settings.tests`: Django settings for tests Common utilities """""""""""""""" * :mod:`swh.web.common.converters`: conversion module used to transform raw data - to serializable ones. It is used by :mod:`swh.web.common.service`: to convert data + to serializable ones. It is used by :mod:`swh.web.common.archive`: to convert data before transmitting then to Django views. 
* :mod:`swh.web.common.exc`: module defining exceptions used in the web applications. * :mod:`swh.web.common.highlightjs`: utility module to ease the use of the highlightjs_ library in produced Django views. * :mod:`swh.web.common.query`: Utilities to parse data from HTTP endpoints. It is used - by :mod:`swh.web.common.service`. - * :mod:`swh.web.common.service`: Orchestration layer used by views module + by :mod:`swh.web.common.archive`. + * :mod:`swh.web.common.archive`: Orchestration layer used by views module in charge of communication with :mod:`swh.storage` to retrieve information and perform conversion for the upper layer. * :mod:`swh.web.common.swh_templatetags`: Custom Django template tags library for swh. * :mod:`swh.web.common.urlsindex`: Utilities to help the registering of endpoints for the web applications * :mod:`swh.web.common.utils`: Utility functions used in the web applications implementation swh-web API application ^^^^^^^^^^^^^^^^^^^^^^^ * :mod:`swh.web.api.apidoc`: Utilities to document the web api for its html browsable rendering. * :mod:`swh.web.api.apiresponse`: Utility module to ease the generation of web api responses. * :mod:`swh.web.api.apiurls`: Utilities to facilitate the registration of web api endpoints. * :mod:`swh.web.api.throttling`: Custom request rate limiter to use with the `Django REST Framework `_ * :mod:`swh.web.api.urls`: Module that defines the whole URI scheme for the api endpoints * :mod:`swh.web.api.utils`: Utility functions used in the web api implementation. * :mod:`swh.web.api.views.content`: Implementation of API endpoints for getting information about contents. * :mod:`swh.web.api.views.directory`: Implementation of API endpoints for getting information about directories. * :mod:`swh.web.api.views.origin`: Implementation of API endpoints for getting information about origins. * :mod:`swh.web.api.views.person`: Implementation of API endpoints for getting information about persons. 
* :mod:`swh.web.api.views.release`: Implementation of API endpoints for getting information about releases. * :mod:`swh.web.api.views.revision`: Implementation of API endpoints for getting information about revisions. * :mod:`swh.web.api.views.snapshot`: Implementation of API endpoints for getting information about snapshots. * :mod:`swh.web.api.views.stat`: Implementation of API endpoints for getting information about archive statistics. * :mod:`swh.web.api.views.utils`: Utilities used in the web api endpoints implementation. swh-web browse application ^^^^^^^^^^^^^^^^^^^^^^^^^^ * :mod:`swh.web.browse.browseurls`: Utilities to facilitate the registration of browse endpoints. * :mod:`swh.web.browse.urls`: Module that defines the whole URI scheme for the browse endpoints. * :mod:`swh.web.browse.utils`: Utilities functions used throughout the browse endpoints implementation. * :mod:`swh.web.browse.views.content`: Implementation of endpoints for browsing contents. * :mod:`swh.web.browse.views.directory`: Implementation of endpoints for browsing directories. * :mod:`swh.web.browse.views.identifiers`: Implementation of endpoints for browsing objects through :ref:`persistent-identifiers`. * :mod:`swh.web.browse.views.origin`: Implementation of endpoints for browsing origins. * :mod:`swh.web.browse.views.person`: Implementation of endpoints for browsing persons. * :mod:`swh.web.browse.views.release`: Implementation of endpoints for browsing releases. * :mod:`swh.web.browse.views.revision`: Implementation of endpoints for browsing revisions. * :mod:`swh.web.browse.views.snapshot`: Implementation of endpoints for browsing snapshots. .. 
_highlightjs: https://highlightjs.org/ diff --git a/swh/web/api/views/content.py b/swh/web/api/views/content.py index 7e9a77e0..39c85547 100644 --- a/swh/web/api/views/content.py +++ b/swh/web/api/views/content.py @@ -1,409 +1,409 @@ # Copyright (C) 2015-2019 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import functools from django.http import HttpResponse from swh.web.api import utils from swh.web.api.apidoc import api_doc, format_docstring from swh.web.api.apiurls import api_route from swh.web.api.views.utils import api_lookup -from swh.web.common import service +from swh.web.common import archive from swh.web.common.exc import NotFoundExc from swh.web.common.utils import reverse @api_route( r"/content/(?P<q>[0-9a-z_:]*[0-9a-f]+)/filetype/", "api-1-content-filetype", checksum_args=["q"], ) @api_doc("/content/filetype/") @format_docstring() def api_content_filetype(request, q): """ .. http:get:: /api/1/content/[(hash_type):](hash)/filetype/ Get information about the detected MIME type of a content object. :param string hash_type: optional parameter specifying which hashing algorithm has been used to compute the content checksum. It can be either ``sha1``, ``sha1_git``, ``sha256`` or ``blake2s256``. If that parameter is not provided, it is assumed that the hashing algorithm used is `sha1`. :param string hash: hexadecimal representation of the checksum value computed with the specified hashing algorithm.
:>json object content_url: link to :http:get:`/api/1/content/[(hash_type):](hash)/` for getting information about the content :>json string encoding: the detected content encoding :>json string id: the **sha1** identifier of the content :>json string mimetype: the detected MIME type of the content :>json object tool: information about the tool used to detect the content filetype {common_headers} :statuscode 200: no error :statuscode 400: an invalid **hash_type** or **hash** has been provided :statuscode 404: requested content can not be found in the archive **Example:** .. parsed-literal:: :swh_web_api:`content/sha1:dc2830a9e72f23c1dfebef4413003221baa5fb62/filetype/` """ return api_lookup( - service.lookup_content_filetype, + archive.lookup_content_filetype, q, notfound_msg="No filetype information found for content {}.".format(q), enrich_fn=utils.enrich_metadata_endpoint, request=request, ) @api_route( r"/content/(?P<q>[0-9a-z_:]*[0-9a-f]+)/language/", "api-1-content-language", checksum_args=["q"], ) @api_doc("/content/language/") @format_docstring() def api_content_language(request, q): """ .. http:get:: /api/1/content/[(hash_type):](hash)/language/ Get information about the programming language used in a content object. Note: this endpoint currently returns no data. :param string hash_type: optional parameter specifying which hashing algorithm has been used to compute the content checksum. It can be either ``sha1``, ``sha1_git``, ``sha256`` or ``blake2s256``. If that parameter is not provided, it is assumed that the hashing algorithm used is ``sha1``. :param string hash: hexadecimal representation of the checksum value computed with the specified hashing algorithm.
:>json object content_url: link to :http:get:`/api/1/content/[(hash_type):](hash)/` for getting information about the content :>json string id: the **sha1** identifier of the content :>json string lang: the detected programming language if any :>json object tool: information about the tool used to detect the programming language {common_headers} :statuscode 200: no error :statuscode 400: an invalid **hash_type** or **hash** has been provided :statuscode 404: requested content can not be found in the archive **Example:** .. parsed-literal:: :swh_web_api:`content/sha1:dc2830a9e72f23c1dfebef4413003221baa5fb62/language/` """ return api_lookup( - service.lookup_content_language, + archive.lookup_content_language, q, notfound_msg="No language information found for content {}.".format(q), enrich_fn=utils.enrich_metadata_endpoint, request=request, ) @api_route( r"/content/(?P<q>[0-9a-z_:]*[0-9a-f]+)/license/", "api-1-content-license", checksum_args=["q"], ) @api_doc("/content/license/") @format_docstring() def api_content_license(request, q): """ .. http:get:: /api/1/content/[(hash_type):](hash)/license/ Get information about the license of a content object. :param string hash_type: optional parameter specifying which hashing algorithm has been used to compute the content checksum. It can be either ``sha1``, ``sha1_git``, ``sha256`` or ``blake2s256``. If that parameter is not provided, it is assumed that the hashing algorithm used is ``sha1``. :param string hash: hexadecimal representation of the checksum value computed with the specified hashing algorithm.
:>json object content_url: link to :http:get:`/api/1/content/[(hash_type):](hash)/` for getting information about the content :>json string id: the **sha1** identifier of the content :>json array licenses: array of strings containing the detected license names :>json object tool: information about the tool used to detect the license {common_headers} :statuscode 200: no error :statuscode 400: an invalid **hash_type** or **hash** has been provided :statuscode 404: requested content can not be found in the archive **Example:** .. parsed-literal:: :swh_web_api:`content/sha1:dc2830a9e72f23c1dfebef4413003221baa5fb62/license/` """ return api_lookup( - service.lookup_content_license, + archive.lookup_content_license, q, notfound_msg="No license information found for content {}.".format(q), enrich_fn=utils.enrich_metadata_endpoint, request=request, ) @api_route(r"/content/(?P<q>[0-9a-z_:]*[0-9a-f]+)/ctags/", "api-1-content-ctags") @api_doc("/content/ctags/", tags=["hidden"]) def api_content_ctags(request, q): """ Get information about all `Ctags `_-style symbols defined in a content object. """ return api_lookup( - service.lookup_content_ctags, + archive.lookup_content_ctags, q, notfound_msg="No ctags symbol found for content {}.".format(q), enrich_fn=utils.enrich_metadata_endpoint, request=request, ) @api_route( r"/content/(?P<q>[0-9a-z_:]*[0-9a-f]+)/raw/", "api-1-content-raw", checksum_args=["q"], ) @api_doc("/content/raw/", handle_response=True) def api_content_raw(request, q): """ .. http:get:: /api/1/content/[(hash_type):](hash)/raw/ Get the raw content of a content object (aka a "blob"), as a byte sequence. :param string hash_type: optional parameter specifying which hashing algorithm has been used to compute the content checksum. It can be either ``sha1``, ``sha1_git``, ``sha256`` or ``blake2s256``. If that parameter is not provided, it is assumed that the hashing algorithm used is ``sha1``.
:param string hash: hexadecimal representation of the checksum value computed with the specified hashing algorithm. :query string filename: if provided, the downloaded content will get that filename :resheader Content-Type: application/octet-stream :statuscode 200: no error :statuscode 400: an invalid **hash_type** or **hash** has been provided :statuscode 404: requested content can not be found in the archive **Example:** .. parsed-literal:: :swh_web_api:`content/sha1:dc2830a9e72f23c1dfebef4413003221baa5fb62/raw/` """ def generate(content): yield content["data"] - content_raw = service.lookup_content_raw(q) + content_raw = archive.lookup_content_raw(q) if not content_raw: raise NotFoundExc("Content %s is not found." % q) filename = request.query_params.get("filename") if not filename: filename = "content_%s_raw" % q.replace(":", "_") response = HttpResponse( generate(content_raw), content_type="application/octet-stream" ) response["Content-disposition"] = "attachment; filename=%s" % filename return response @api_route(r"/content/symbol/(?P<q>.+)/", "api-1-content-symbol") @api_doc("/content/symbol/", tags=["hidden"]) def api_content_symbol(request, q=None): """Search content objects by `Ctags `_-style symbol (e.g., function name, data type, method, ...).
""" result = {} last_sha1 = request.query_params.get("last_sha1", None) per_page = int(request.query_params.get("per_page", "10")) def lookup_exp(exp, last_sha1=last_sha1, per_page=per_page): - exp = list(service.lookup_expression(exp, last_sha1, per_page)) + exp = list(archive.lookup_expression(exp, last_sha1, per_page)) return exp if exp else None symbols = api_lookup( lookup_exp, q, notfound_msg="No indexed raw content match expression '{}'.".format(q), enrich_fn=functools.partial(utils.enrich_content, top_url=True), request=request, ) if symbols: nb_symbols = len(symbols) if nb_symbols == per_page: query_params = {} new_last_sha1 = symbols[-1]["sha1"] query_params["last_sha1"] = new_last_sha1 if request.query_params.get("per_page"): query_params["per_page"] = per_page result["headers"] = { "link-next": reverse( "api-1-content-symbol", url_args={"q": q}, query_params=query_params, request=request, ) } result.update({"results": symbols}) return result @api_route(r"/content/known/search/", "api-1-content-known", methods=["POST"]) @api_route(r"/content/known/(?P(?!search).*)/", "api-1-content-known") @api_doc("/content/known/", tags=["hidden"]) @format_docstring() def api_check_content_known(request, q=None): """ .. http:get:: /api/1/content/known/(sha1)[,(sha1), ...,(sha1)]/ Check whether some content(s) (aka "blob(s)") is present in the archive based on its **sha1** checksum. :param string sha1: hexadecimal representation of the **sha1** checksum value for the content to check existence. Multiple values can be provided separated by ','. {common_headers} :>json array search_res: array holding the search result for each provided **sha1** :>json object search_stats: some statistics regarding the number of **sha1** provided and the percentage of those found in the archive :statuscode 200: no error :statuscode 400: an invalid **sha1** has been provided **Example:** .. 
parsed-literal:: :swh_web_api:`content/known/dc2830a9e72f23c1dfebef4413003221baa5fb62,0c3f19cb47ebfbe643fb19fa94c874d18fa62d12/` """ response = {"search_res": None, "search_stats": None} search_stats = {"nbfiles": 0, "pct": 0} search_res = None queries = [] # GET: Many hash separated values request if q: hashes = q.split(",") for v in hashes: queries.append({"filename": None, "sha1": v}) # POST: Many hash requests in post form submission elif request.method == "POST": data = request.data # Remove potential inputs with no associated value for k, v in data.items(): if v is not None: if k == "q" and len(v) > 0: queries.append({"filename": None, "sha1": v}) elif v != "": queries.append({"filename": k, "sha1": v}) if queries: - lookup = service.lookup_multiple_hashes(queries) + lookup = archive.lookup_multiple_hashes(queries) result = [] nb_queries = len(queries) for el in lookup: res_d = {"sha1": el["sha1"], "found": el["found"]} if "filename" in el and el["filename"]: res_d["filename"] = el["filename"] result.append(res_d) search_res = result nbfound = len([x for x in lookup if x["found"]]) search_stats["nbfiles"] = nb_queries search_stats["pct"] = (nbfound / nb_queries) * 100 response["search_res"] = search_res response["search_stats"] = search_stats return response @api_route( r"/content/(?P<q>[0-9a-z_:]*[0-9a-f]+)/", "api-1-content", checksum_args=["q"] ) @api_doc("/content/") @format_docstring() def api_content_metadata(request, q): """ .. http:get:: /api/1/content/[(hash_type):](hash)/ Get information about a content (aka a "blob") object. In the archive, a content object is identified based on checksum values computed using various hashing algorithms. :param string hash_type: optional parameter specifying which hashing algorithm has been used to compute the content checksum. It can be either ``sha1``, ``sha1_git``, ``sha256`` or ``blake2s256``. If that parameter is not provided, it is assumed that the hashing algorithm used is ``sha1``.
:param string hash: hexadecimal representation of the checksum value computed with the specified hashing algorithm. {common_headers} :>json object checksums: object holding the computed checksum values for the requested content :>json string data_url: link to :http:get:`/api/1/content/[(hash_type):](hash)/raw/` for downloading the content raw bytes :>json string filetype_url: link to :http:get:`/api/1/content/[(hash_type):](hash)/filetype/` for getting information about the content MIME type :>json string language_url: link to :http:get:`/api/1/content/[(hash_type):](hash)/language/` for getting information about the programming language used in the content :>json number length: length of the content in bytes :>json string license_url: link to :http:get:`/api/1/content/[(hash_type):](hash)/license/` for getting information about the license of the content :statuscode 200: no error :statuscode 400: an invalid **hash_type** or **hash** has been provided :statuscode 404: requested content can not be found in the archive **Example:** .. 
parsed-literal:: :swh_web_api:`content/sha1_git:fe95a46679d128ff167b7c55df5d02356c5a1ae1/` """ return api_lookup( - service.lookup_content, + archive.lookup_content, q, notfound_msg="Content with {} not found.".format(q), enrich_fn=functools.partial(utils.enrich_content, query_string=q), request=request, ) diff --git a/swh/web/api/views/directory.py b/swh/web/api/views/directory.py index afc51e97..73668262 100644 --- a/swh/web/api/views/directory.py +++ b/swh/web/api/views/directory.py @@ -1,93 +1,93 @@ # Copyright (C) 2015-2019 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from swh.web.api import utils from swh.web.api.apidoc import api_doc, format_docstring from swh.web.api.apiurls import api_route from swh.web.api.views.utils import api_lookup -from swh.web.common import service +from swh.web.common import archive @api_route( r"/directory/(?P<sha1_git>[0-9a-f]+)/", "api-1-directory", checksum_args=["sha1_git"], ) @api_route( r"/directory/(?P<sha1_git>[0-9a-f]+)/(?P<path>.+)/", "api-1-directory", checksum_args=["sha1_git"], ) @api_doc("/directory/") @format_docstring() def api_directory(request, sha1_git, path=None): """ .. http:get:: /api/1/directory/(sha1_git)/[(path)/] Get information about directory objects. Directories are identified by **sha1** checksums, compatible with Git directory identifiers. See :func:`swh.model.identifiers.directory_identifier` in our data model module for details about how they are computed. When given only a directory identifier, this endpoint returns information about the directory itself, returning its content (usually a list of directory entries). When given a directory identifier and a path, this endpoint returns information about the directory entry pointed by the relative path, starting path resolution from the given directory.
:param string sha1_git: hexadecimal representation of the directory **sha1_git** identifier :param string path: optional parameter to get information about the directory entry pointed by that relative path {common_headers} :>jsonarr object checksums: object holding the computed checksum values for a directory entry (only for file entries) :>jsonarr string dir_id: **sha1_git** identifier of the requested directory :>jsonarr number length: length of a directory entry in bytes (only for file entries) for getting information about the content MIME type :>jsonarr string name: the directory entry name :>jsonarr number perms: permissions for the directory entry :>jsonarr string target: **sha1_git** identifier of the directory entry :>jsonarr string target_url: link to :http:get:`/api/1/content/[(hash_type):](hash)/` or :http:get:`/api/1/directory/(sha1_git)/[(path)/]` depending on the directory entry type :>jsonarr string type: the type of the directory entry, can be either ``dir``, ``file`` or ``rev`` :statuscode 200: no error :statuscode 400: an invalid **hash_type** or **hash** has been provided :statuscode 404: requested directory can not be found in the archive **Example:** .. parsed-literal:: :swh_web_api:`directory/977fc4b98c0e85816348cebd3b12026407c368b6/` """ if path: error_msg_path = ( "Entry with path %s relative to directory " "with sha1_git %s not found." ) % (path, sha1_git) return api_lookup( - service.lookup_directory_with_path, + archive.lookup_directory_with_path, sha1_git, path, notfound_msg=error_msg_path, enrich_fn=utils.enrich_directory, request=request, ) else: error_msg_nopath = "Directory with sha1_git %s not found." 
% sha1_git return api_lookup( - service.lookup_directory, + archive.lookup_directory, sha1_git, notfound_msg=error_msg_nopath, enrich_fn=utils.enrich_directory, request=request, ) diff --git a/swh/web/api/views/graph.py b/swh/web/api/views/graph.py index 7a18676c..e91b7903 100644 --- a/swh/web/api/views/graph.py +++ b/swh/web/api/views/graph.py @@ -1,144 +1,144 @@ # Copyright (C) 2020 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from distutils.util import strtobool import json from typing import Dict import requests from rest_framework.decorators import renderer_classes from rest_framework.request import Request from rest_framework.response import Response from swh.model.identifiers import ORIGIN, parse_swhid from swh.web.api.apidoc import api_doc from swh.web.api.apiurls import api_route from swh.web.api.renderers import PlainTextRenderer -from swh.web.common import service +from swh.web.common import archive from swh.web.config import get_config API_GRAPH_PERM = "swh.web.api.graph" def _resolve_origin_swhid(swhid: str, origin_urls: Dict[str, str]) -> str: """ Resolve origin url from its swhid sha1 representation. """ parsed_swhid = parse_swhid(swhid) if parsed_swhid.object_type == ORIGIN: if parsed_swhid.object_id in origin_urls: return origin_urls[parsed_swhid.object_id] else: origin_info = list( - service.lookup_origins_by_sha1s([parsed_swhid.object_id]) + archive.lookup_origins_by_sha1s([parsed_swhid.object_id]) )[0] assert origin_info is not None origin_urls[parsed_swhid.object_id] = origin_info["url"] return origin_info["url"] else: return swhid def _resolve_origin_swhids_in_graph_response(response: requests.Response) -> str: """ Resolve origin urls from their swhid sha1 representations in graph service responses. 
""" content_type = response.headers["Content-Type"] origin_urls: Dict[str, str] = {} if content_type == "application/x-ndjson": processed_response = [] for line in response.text.split("\n")[:-1]: swhids = json.loads(line) processed_line = [] for swhid in swhids: processed_line.append(_resolve_origin_swhid(swhid, origin_urls)) processed_response.append(json.dumps(processed_line)) return "\n".join(processed_response) + "\n" elif content_type == "text/plain": processed_response = [] for line in response.text.split("\n")[:-1]: processed_line = [] swhids = line.split(" ") for swhid in swhids: processed_line.append(_resolve_origin_swhid(swhid, origin_urls)) processed_response.append(" ".join(processed_line)) return "\n".join(processed_response) + "\n" return response.text @api_route(r"/graph/", "api-1-graph-doc") @api_doc("/graph/") def api_graph(request: Request) -> None: """ .. http:get:: /api/1/graph/(graph_query)/ Provide fast access to the graph representation of the Software Heritage archive. That endpoint acts as a proxy for the `Software Heritage Graph service `_. It provides fast access to the `graph representation `_ of the Software Heritage archive. The full documentation of the available Graph REST API can be found `here `_. .. warning:: That endpoint is not publicly available and requires authentication and special user permission in order to be able to request it. :param string graph_query: query to forward to the Software Heritage Graph - service (see its `documentation + archive (see its `documentation `_) :query boolean resolve_origins: extra parameter defined by that proxy enabling to resolve origin urls from their sha1 representations :statuscode 200: no error :statuscode 400: an invalid graph query has been provided :statuscode 404: provided graph node cannot be found **Examples:** .. 
parsed-literal:: :swh_web_api:`graph/leaves/swh:1:dir:432d1b21c1256f7408a07c577b6974bbdbcc1323/` :swh_web_api:`graph/neighbors/swh:1:rev:f39d7d78b70e0f39facb1e4fab77ad3df5c52a35/` :swh_web_api:`graph/randomwalk/swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2/ori?direction=backward` :swh_web_api:`graph/randomwalk/swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2/ori?direction=backward&limit=-2` :swh_web_api:`graph/visit/nodes/swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc?direction=backward&resolve_origins=true` :swh_web_api:`graph/visit/edges/swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc?direction=backward&resolve_origins=true` :swh_web_api:`graph/visit/paths/swh:1:dir:644dd466d8ad527ea3a609bfd588a3244e6dafcb?direction=backward&resolve_origins=true` """ return None @api_route(r"/graph/(?P<graph_query>.+)/", "api-1-graph") @renderer_classes([PlainTextRenderer]) def api_graph_proxy(request: Request, graph_query: str) -> Response: if not bool(request.user and request.user.is_authenticated): return Response("Authentication credentials were not provided.", status=401) if not request.user.has_perm(API_GRAPH_PERM): return Response( "You do not have permission to perform this action.", status=403 ) graph_query_url = get_config()["graph"]["server_url"] graph_query_url += graph_query if request.GET: graph_query_url += "?"
+ request.GET.urlencode(safe="/;:") response = requests.get(graph_query_url) response_text = response.text resolve_origins = strtobool(request.GET.get("resolve_origins", "false")) if response.status_code == 200 and resolve_origins: response_text = _resolve_origin_swhids_in_graph_response(response) return Response( response_text, status=response.status_code, content_type=response.headers["Content-Type"], ) diff --git a/swh/web/api/views/identifiers.py b/swh/web/api/views/identifiers.py index af4497f9..c0bcf8aa 100644 --- a/swh/web/api/views/identifiers.py +++ b/swh/web/api/views/identifiers.py @@ -1,112 +1,112 @@ # Copyright (C) 2018-2020 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from swh.web.api.apidoc import api_doc, format_docstring from swh.web.api.apiurls import api_route -from swh.web.common import service +from swh.web.common import archive from swh.web.common.exc import LargePayloadExc from swh.web.common.identifiers import get_swhid, group_swhids, resolve_swhid @api_route(r"/resolve/(?P<swhid>.*)/", "api-1-resolve-swhid") @api_doc("/resolve/") @format_docstring() def api_resolve_swhid(request, swhid): """ .. http:get:: /api/1/resolve/(swhid)/ Resolve a SoftWare Heritage persistent IDentifier (SWHID) Try to resolve a provided `SoftWare Heritage persistent IDentifier `_ into an url for browsing the pointed archive object. If the provided identifier is valid, the existence of the object in the archive will also be checked.
:param string swhid: a SoftWare Heritage persistent IDentifier :>json string browse_url: the url for browsing the pointed object :>json object metadata: object holding optional parts of the SWHID :>json string namespace: the SWHID namespace :>json string object_id: the hash identifier of the pointed object :>json string object_type: the type of the pointed object :>json number scheme_version: the scheme version of the SWHID {common_headers} :statuscode 200: no error :statuscode 400: an invalid SWHID has been provided :statuscode 404: the pointed object does not exist in the archive **Example:** .. parsed-literal:: :swh_web_api:`resolve/swh:1:rev:96db9023b881d7cd9f379b0c154650d6c108e9a3;origin=https://github.com/openssl/openssl/` """ # try to resolve the provided swhid swhid_resolved = resolve_swhid(swhid) # id is well-formed, now check that the pointed # object is present in the archive, NotFoundExc # will be raised otherwise swhid_parsed = swhid_resolved["swhid_parsed"] object_type = swhid_parsed.object_type object_id = swhid_parsed.object_id - service.lookup_object(object_type, object_id) + archive.lookup_object(object_type, object_id) # id is well-formed and the pointed object exists swhid_data = swhid_parsed.to_dict() swhid_data["browse_url"] = request.build_absolute_uri(swhid_resolved["browse_url"]) return swhid_data @api_route(r"/known/", "api-1-known", methods=["POST"]) @api_doc("/known/") @format_docstring() def api_swhid_known(request): """ .. http:post:: /api/1/known/ Check if a list of objects are present in the Software Heritage archive. The objects to check existence must be provided using `SoftWare Heritage persistent IDentifiers `_. 
:json object : an object whose keys are input SWHIDs and values objects with the following keys: * **known (bool)**: whether the object was found {common_headers} :statuscode 200: no error :statuscode 400: an invalid SWHID was provided :statuscode 413: the input array of SWHIDs is too large """ limit = 1000 if len(request.data) > limit: raise LargePayloadExc( "The maximum number of SWHIDs this endpoint can receive is %s" % limit ) swhids = [get_swhid(swhid) for swhid in request.data] response = {str(swhid): {"known": False} for swhid in swhids} # group swhids by their type swhids_by_type = group_swhids(swhids) # search for hashes not present in the storage - missing_hashes = service.lookup_missing_hashes(swhids_by_type) + missing_hashes = archive.lookup_missing_hashes(swhids_by_type) for swhid in swhids: if swhid.object_id not in missing_hashes: response[str(swhid)]["known"] = True return response diff --git a/swh/web/api/views/origin.py b/swh/web/api/views/origin.py index d5865fbd..c76e0b35 100644 --- a/swh/web/api/views/origin.py +++ b/swh/web/api/views/origin.py @@ -1,468 +1,468 @@ # Copyright (C) 2015-2019 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from distutils.util import strtobool from functools import partial from swh.web.api.apidoc import api_doc, format_docstring from swh.web.api.apiurls import api_route from swh.web.api.utils import enrich_origin, enrich_origin_visit from swh.web.api.views.utils import api_lookup -from swh.web.common import service +from swh.web.common import archive from swh.web.common.exc import BadInputExc from swh.web.common.origin_visits import get_origin_visits from swh.web.common.utils import reverse DOC_RETURN_ORIGIN = """ :>json string origin_visits_url: link to in order to get information about the visits for that origin :>json string url: 
the origin canonical url """ DOC_RETURN_ORIGIN_ARRAY = DOC_RETURN_ORIGIN.replace(":>json", ":>jsonarr") DOC_RETURN_ORIGIN_VISIT = """ :>json string date: ISO representation of the visit date (in UTC) :>json str origin: the origin canonical url :>json string origin_url: link to get information about the origin :>jsonarr string snapshot: the snapshot identifier of the visit (may be null if status is not **full**). :>jsonarr string snapshot_url: link to :http:get:`/api/1/snapshot/(snapshot_id)/` in order to get information about the snapshot of the visit (may be null if status is not **full**). :>json string status: status of the visit (either **full**, **partial** or **ongoing**) :>json number visit: the unique identifier of the visit """ DOC_RETURN_ORIGIN_VISIT_ARRAY = DOC_RETURN_ORIGIN_VISIT.replace(":>json", ":>jsonarr") DOC_RETURN_ORIGIN_VISIT_ARRAY += """ :>jsonarr number id: the unique identifier of the origin :>jsonarr string origin_visit_url: link to :http:get:`/api/1/origin/(origin_url)/visit/(visit_id)/` in order to get information about the visit """ @api_route(r"/origins/", "api-1-origins") @api_doc("/origins/", noargs=True) @format_docstring(return_origin_array=DOC_RETURN_ORIGIN_ARRAY) def api_origins(request): """ .. http:get:: /api/1/origins/ Get list of archived software origins. .. warning:: This endpoint used to provide an `origin_from` query parameter, and guarantee an order on results. This is no longer true, and only the Link header should be used for paginating through results. :query int origin_count: The maximum number of origins to return (default to 100, can not exceed 10000) {return_origin_array} {common_headers} {resheader_link} :statuscode 200: no error **Example:** .. 
parsed-literal:: :swh_web_api:`origins?origin_count=500` """ old_param_origin_from = request.query_params.get("origin_from") if old_param_origin_from: raise BadInputExc("Please use the Link header to browse through result") page_token = request.query_params.get("page_token", None) limit = min(int(request.query_params.get("origin_count", "100")), 10000) - page_result = service.lookup_origins(page_token, limit) + page_result = archive.lookup_origins(page_token, limit) origins = [enrich_origin(o, request=request) for o in page_result.results] next_page_token = page_result.next_page_token response = {"results": origins, "headers": {}} if next_page_token is not None: response["headers"]["link-next"] = reverse( "api-1-origins", query_params={"page_token": next_page_token, "origin_count": limit}, request=request, ) return response @api_route(r"/origin/(?P.+)/get/", "api-1-origin") @api_doc("/origin/") @format_docstring(return_origin=DOC_RETURN_ORIGIN) def api_origin(request, origin_url): """ .. http:get:: /api/1/origin/(origin_url)/get/ Get information about a software origin. :param string origin_url: the origin url {return_origin} {common_headers} :statuscode 200: no error :statuscode 404: requested origin can not be found in the archive **Example:** .. parsed-literal:: :swh_web_api:`origin/https://github.com/python/cpython/get/` """ ori_dict = {"url": origin_url} error_msg = "Origin with url %s not found." % ori_dict["url"] return api_lookup( - service.lookup_origin, + archive.lookup_origin, ori_dict, notfound_msg=error_msg, enrich_fn=enrich_origin, request=request, ) @api_route( r"/origin/search/(?P.+)/", "api-1-origin-search", throttle_scope="swh_api_origin_search", ) @api_doc("/origin/search/") @format_docstring(return_origin_array=DOC_RETURN_ORIGIN_ARRAY) def api_origin_search(request, url_pattern): """ .. http:get:: /api/1/origin/search/(url_pattern)/ Search for software origins whose urls contain a provided string pattern or match a provided regular expression. 
The search is performed in a case insensitive way. .. warning:: This endpoint used to provide an `offset` query parameter, and guarantee an order on results. This is no longer true, and only the Link header should be used for paginating through results. :param string url_pattern: a string pattern :query int limit: the maximum number of found origins to return (bounded to 1000) :query boolean with_visit: if true, only return origins with at least one visit by Software heritage {return_origin_array} {common_headers} {resheader_link} :statuscode 200: no error **Example:** .. parsed-literal:: :swh_web_api:`origin/search/python/?limit=2` """ result = {} limit = min(int(request.query_params.get("limit", "70")), 1000) page_token = request.query_params.get("page_token") with_visit = request.query_params.get("with_visit", "false") (results, page_token) = api_lookup( - service.search_origin, + archive.search_origin, url_pattern, limit, bool(strtobool(with_visit)), page_token, enrich_fn=enrich_origin, request=request, ) if page_token is not None: query_params = {} query_params["limit"] = limit query_params["page_token"] = page_token result["headers"] = { "link-next": reverse( "api-1-origin-search", url_args={"url_pattern": url_pattern}, query_params=query_params, request=request, ) } result.update({"results": results}) return result @api_route(r"/origin/metadata-search/", "api-1-origin-metadata-search") @api_doc("/origin/metadata-search/", noargs=True) @format_docstring(return_origin_array=DOC_RETURN_ORIGIN_ARRAY) def api_origin_metadata_search(request): """ .. http:get:: /api/1/origin/metadata-search/ Search for software origins whose metadata (expressed as a JSON-LD/CodeMeta dictionary) match the provided criteria. For now, only full-text search on this dictionary is supported. :query str fulltext: a string that will be matched against origin metadata; results are ranked and ordered starting with the best ones. 
:query int limit: the maximum number of found origins to return (bounded to 100) {return_origin_array} {common_headers} :statuscode 200: no error **Example:** .. parsed-literal:: :swh_web_api:`origin/metadata-search/?limit=2&fulltext=Jane%20Doe` """ fulltext = request.query_params.get("fulltext", None) limit = min(int(request.query_params.get("limit", "70")), 100) if not fulltext: content = '"fulltext" must be provided and non-empty.' raise BadInputExc(content) results = api_lookup( - service.search_origin_metadata, fulltext, limit, request=request + archive.search_origin_metadata, fulltext, limit, request=request ) return { "results": results, } @api_route(r"/origin/(?P.*)/visits/", "api-1-origin-visits") @api_doc("/origin/visits/") @format_docstring(return_origin_visit_array=DOC_RETURN_ORIGIN_VISIT_ARRAY) def api_origin_visits(request, origin_url): """ .. http:get:: /api/1/origin/(origin_url)/visits/ Get information about all visits of a software origin. Visits are returned sorted in descending order according to their date. :param str origin_url: a software origin URL :query int per_page: specify the number of visits to list, for pagination purposes :query int last_visit: visit to start listing from, for pagination purposes {common_headers} {resheader_link} {return_origin_visit_array} :statuscode 200: no error :statuscode 404: requested origin can not be found in the archive **Example:** .. 
parsed-literal:: :swh_web_api:`origin/https://github.com/hylang/hy/visits/` """ result = {} origin_query = {"url": origin_url} notfound_msg = "No origin {} found".format(origin_url) url_args_next = {"origin_url": origin_url} per_page = int(request.query_params.get("per_page", "10")) last_visit = request.query_params.get("last_visit") if last_visit: last_visit = int(last_visit) def _lookup_origin_visits(origin_query, last_visit=last_visit, per_page=per_page): all_visits = get_origin_visits(origin_query) all_visits.reverse() visits = [] if not last_visit: visits = all_visits[:per_page] else: for i, v in enumerate(all_visits): if v["visit"] == last_visit: visits = all_visits[i + 1 : i + 1 + per_page] break for v in visits: yield v results = api_lookup( _lookup_origin_visits, origin_query, notfound_msg=notfound_msg, enrich_fn=partial( enrich_origin_visit, with_origin_link=False, with_origin_visit_link=True ), request=request, ) if results: nb_results = len(results) if nb_results == per_page: new_last_visit = results[-1]["visit"] query_params = {} query_params["last_visit"] = new_last_visit if request.query_params.get("per_page"): query_params["per_page"] = per_page result["headers"] = { "link-next": reverse( "api-1-origin-visits", url_args=url_args_next, query_params=query_params, request=request, ) } result.update({"results": results}) return result @api_route( r"/origin/(?P.*)/visit/latest/", "api-1-origin-visit-latest", throttle_scope="swh_api_origin_visit_latest", ) @api_doc("/origin/visit/latest/") @format_docstring(return_origin_visit=DOC_RETURN_ORIGIN_VISIT) def api_origin_visit_latest(request, origin_url=None): """ .. http:get:: /api/1/origin/(origin_url)/visit/latest/ Get information about the latest visit of a software origin. 
:param str origin_url: a software origin URL :query boolean require_snapshot: if true, only return a visit with a snapshot {common_headers} {return_origin_visit} :statuscode 200: no error :statuscode 404: requested origin or visit can not be found in the archive **Example:** .. parsed-literal:: :swh_web_api:`origin/https://github.com/hylang/hy/visit/latest/` """ require_snapshot = request.query_params.get("require_snapshot", "false") return api_lookup( - service.lookup_origin_visit_latest, + archive.lookup_origin_visit_latest, origin_url, bool(strtobool(require_snapshot)), notfound_msg=("No visit for origin {} found".format(origin_url)), enrich_fn=partial( enrich_origin_visit, with_origin_link=True, with_origin_visit_link=False ), request=request, ) @api_route( r"/origin/(?P.*)/visit/(?P[0-9]+)/", "api-1-origin-visit" ) @api_doc("/origin/visit/") @format_docstring(return_origin_visit=DOC_RETURN_ORIGIN_VISIT) def api_origin_visit(request, visit_id, origin_url): """ .. http:get:: /api/1/origin/(origin_url)/visit/(visit_id)/ Get information about a specific visit of a software origin. :param str origin_url: a software origin URL :param int visit_id: a visit identifier {common_headers} {return_origin_visit} :statuscode 200: no error :statuscode 404: requested origin or visit can not be found in the archive **Example:** .. parsed-literal:: :swh_web_api:`origin/https://github.com/hylang/hy/visit/1/` """ return api_lookup( - service.lookup_origin_visit, + archive.lookup_origin_visit, origin_url, int(visit_id), notfound_msg=("No visit {} for origin {} found".format(visit_id, origin_url)), enrich_fn=partial( enrich_origin_visit, with_origin_link=True, with_origin_visit_link=False ), request=request, ) @api_route( r"/origin/(?P.+)" "/intrinsic-metadata", "api-origin-intrinsic-metadata" ) @api_doc("/origin/intrinsic-metadata/") @format_docstring() def api_origin_intrinsic_metadata(request, origin_url): """ .. 
http:get:: /api/1/origin/(origin_url)/intrinsic-metadata Get intrinsic metadata of a software origin (as a JSON-LD/CodeMeta dictionary). :param string origin_url: the origin url :>json string ???: intrinsic metadata field of the origin {common_headers} :statuscode 200: no error :statuscode 404: requested origin can not be found in the archive **Example:** .. parsed-literal:: :swh_web_api:`origin/https://github.com/python/cpython/intrinsic-metadata` """ return api_lookup( - service.lookup_origin_intrinsic_metadata, + archive.lookup_origin_intrinsic_metadata, origin_url, notfound_msg=f"Origin with url {origin_url} not found", enrich_fn=enrich_origin, request=request, ) diff --git a/swh/web/api/views/release.py b/swh/web/api/views/release.py index 8e2ea03f..a41b51c8 100644 --- a/swh/web/api/views/release.py +++ b/swh/web/api/views/release.py @@ -1,60 +1,60 @@ # Copyright (C) 2015-2019 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from swh.web.api import utils from swh.web.api.apidoc import api_doc, format_docstring from swh.web.api.apiurls import api_route from swh.web.api.views.utils import api_lookup -from swh.web.common import service +from swh.web.common import archive @api_route( r"/release/(?P[0-9a-f]+)/", "api-1-release", checksum_args=["sha1_git"] ) @api_doc("/release/") @format_docstring() def api_release(request, sha1_git): """ .. http:get:: /api/1/release/(sha1_git)/ Get information about a release in the archive. Releases are identified by **sha1** checksums, compatible with Git tag identifiers. See :func:`swh.model.identifiers.release_identifier` in our data model module for details about how they are computed. 
:param string sha1_git: hexadecimal representation of the release **sha1_git** identifier {common_headers} :>json object author: information about the author of the release :>json string date: ISO representation of the release date (in UTC) :>json string id: the release unique identifier :>json string message: the message associated to the release :>json string name: the name of the release :>json string target: the target identifier of the release :>json string target_type: the type of the target, can be either **release**, **revision**, **content**, **directory** :>json string target_url: a link to the adequate api url based on the target type :statuscode 200: no error :statuscode 400: an invalid **sha1_git** value has been provided :statuscode 404: requested release can not be found in the archive **Example:** .. parsed-literal:: :swh_web_api:`release/208f61cc7a5dbc9879ae6e5c2f95891e270f09ef/` """ error_msg = "Release with sha1_git %s not found." % sha1_git return api_lookup( - service.lookup_release, + archive.lookup_release, sha1_git, notfound_msg=error_msg, enrich_fn=utils.enrich_release, request=request, ) diff --git a/swh/web/api/views/revision.py b/swh/web/api/views/revision.py index c3927806..f5cf924f 100644 --- a/swh/web/api/views/revision.py +++ b/swh/web/api/views/revision.py @@ -1,237 +1,237 @@ # Copyright (C) 2015-2019 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from django.http import HttpResponse from swh.web.api import utils from swh.web.api.apidoc import api_doc, format_docstring from swh.web.api.apiurls import api_route from swh.web.api.views.utils import api_lookup -from swh.web.common import service +from swh.web.common import archive DOC_RETURN_REVISION = """ :>json object author: information about the author of the revision :>json object committer: 
information about the committer of the revision :>json string committer_date: ISO representation of the commit date (in UTC) :>json string date: ISO representation of the revision date (in UTC) :>json string directory: the unique identifier that revision points to :>json string directory_url: link to :http:get:`/api/1/directory/(sha1_git)/[(path)/]` to get information about the directory associated to the revision :>json string id: the revision unique identifier :>json boolean merge: whether or not the revision corresponds to a merge commit :>json string message: the message associated to the revision :>json array parents: the parents of the revision, i.e. the previous revisions that head directly to it, each entry of that array contains an unique parent revision identifier but also a link to :http:get:`/api/1/revision/(sha1_git)/` to get more information about it :>json string type: the type of the revision """ DOC_RETURN_REVISION_ARRAY = DOC_RETURN_REVISION.replace(":>json", ":>jsonarr") def _revision_directory_by(revision, path, request_path, limit=100, with_data=False): """ Compute the revision matching criterion's directory or content data. Args: revision: dictionary of criterions representing a revision to lookup path: directory's path to lookup request_path: request path which holds the original context to limit: optional query parameter to limit the revisions log (default to 100). For now, note that this limit could impede the transitivity conclusion about sha1_git not being an ancestor of with_data: indicate to retrieve the content's raw data if path resolves to a content. 
""" def enrich_directory_local(dir, context_url=request_path): return utils.enrich_directory(dir, context_url) - rev_id, result = service.lookup_directory_through_revision( + rev_id, result = archive.lookup_directory_through_revision( revision, path, limit=limit, with_data=with_data ) content = result["content"] if result["type"] == "dir": # dir_entries result["content"] = list(map(enrich_directory_local, content)) elif result["type"] == "file": # content result["content"] = utils.enrich_content(content) elif result["type"] == "rev": # revision result["content"] = utils.enrich_revision(content) return result @api_route( r"/revision/(?P[0-9a-f]+)/", "api-1-revision", checksum_args=["sha1_git"] ) @api_doc("/revision/") @format_docstring(return_revision=DOC_RETURN_REVISION) def api_revision(request, sha1_git): """ .. http:get:: /api/1/revision/(sha1_git)/ Get information about a revision in the archive. Revisions are identified by **sha1** checksums, compatible with Git commit identifiers. See :func:`swh.model.identifiers.revision_identifier` in our data model module for details about how they are computed. :param string sha1_git: hexadecimal representation of the revision **sha1_git** identifier {common_headers} {return_revision} :statuscode 200: no error :statuscode 400: an invalid **sha1_git** value has been provided :statuscode 404: requested revision can not be found in the archive **Example:** .. 
parsed-literal:: :swh_web_api:`revision/aafb16d69fd30ff58afdd69036a26047f3aebdc6/` """ return api_lookup( - service.lookup_revision, + archive.lookup_revision, sha1_git, notfound_msg="Revision with sha1_git {} not found.".format(sha1_git), enrich_fn=utils.enrich_revision, request=request, ) @api_route( r"/revision/(?P[0-9a-f]+)/raw/", "api-1-revision-raw-message", checksum_args=["sha1_git"], ) @api_doc("/revision/raw/", tags=["hidden"], handle_response=True) def api_revision_raw_message(request, sha1_git): """Return the raw data of the message of revision identified by sha1_git """ - raw = service.lookup_revision_message(sha1_git) + raw = archive.lookup_revision_message(sha1_git) response = HttpResponse(raw["message"], content_type="application/octet-stream") response["Content-disposition"] = "attachment;filename=rev_%s_raw" % sha1_git return response @api_route( r"/revision/(?P[0-9a-f]+)/directory/", "api-1-revision-directory", checksum_args=["sha1_git"], ) @api_route( r"/revision/(?P[0-9a-f]+)/directory/(?P.+)/", "api-1-revision-directory", checksum_args=["sha1_git"], ) @api_doc("/revision/directory/") @format_docstring() def api_revision_directory(request, sha1_git, dir_path=None, with_data=False): """ .. http:get:: /api/1/revision/(sha1_git)/directory/[(path)/] Get information about directory (entry) objects associated to revisions. Each revision is associated to a single "root" directory. This endpoint behaves like :http:get:`/api/1/directory/(sha1_git)/[(path)/]`, but operates on the root directory associated to a given revision. 
:param string sha1_git: hexadecimal representation of the revision **sha1_git** identifier :param string path: optional parameter to get information about the directory entry pointed by that relative path {common_headers} :>json array content: directory entries as returned by :http:get:`/api/1/directory/(sha1_git)/[(path)/]` :>json string path: path of directory from the revision root one :>json string revision: the unique revision identifier :>json string type: the type of the directory :statuscode 200: no error :statuscode 400: an invalid **sha1_git** value has been provided :statuscode 404: requested revision can not be found in the archive **Example:** .. parsed-literal:: :swh_web_api:`revision/f1b94134a4b879bc55c3dacdb496690c8ebdc03f/directory/` """ return _revision_directory_by( {"sha1_git": sha1_git}, dir_path, request.path, with_data=with_data ) @api_route( r"/revision/(?P[0-9a-f]+)/log/", "api-1-revision-log", checksum_args=["sha1_git"], ) @api_doc("/revision/log/") @format_docstring(return_revision_array=DOC_RETURN_REVISION_ARRAY) def api_revision_log(request, sha1_git): """ .. http:get:: /api/1/revision/(sha1_git)/log/ Get a list of all revisions heading to a given one, in other words show the commit log. The revisions are returned in the breadth-first search order while visiting the revision graph. The number of revisions to return is also bounded by the **limit** query parameter. .. warning:: To get the full BFS traversal of the revision graph when the total number of revisions is greater than 1000, it is up to the client to keep track of the multiple branches of history when there's merge revisions in the returned objects. In other words, identify all the continuation points that need to be followed to get the full history through recursion. 
:param string sha1_git: hexadecimal representation of the revision **sha1_git** identifier :query int limit: maximum number of revisions to return when performing BFS traversal on the revision graph (default to 10, can not exceed 1000) {common_headers} {return_revision_array} :statuscode 200: no error :statuscode 400: an invalid **sha1_git** value has been provided :statuscode 404: head revision can not be found in the archive **Example:** .. parsed-literal:: :swh_web_api:`revision/e1a315fa3fa734e2a6154ed7b5b9ae0eb8987aad/log/` """ limit = int(request.query_params.get("limit", "10")) limit = min(limit, 1000) error_msg = "Revision with sha1_git %s not found." % sha1_git revisions = api_lookup( - service.lookup_revision_log, + archive.lookup_revision_log, sha1_git, limit, notfound_msg=error_msg, enrich_fn=utils.enrich_revision, request=request, ) return {"results": revisions} diff --git a/swh/web/api/views/snapshot.py b/swh/web/api/views/snapshot.py index 02b0a969..c6994f29 100644 --- a/swh/web/api/views/snapshot.py +++ b/swh/web/api/views/snapshot.py @@ -1,103 +1,103 @@ # Copyright (C) 2018-2019 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from swh.web.api.apidoc import api_doc, format_docstring from swh.web.api.apiurls import api_route from swh.web.api.utils import enrich_snapshot from swh.web.api.views.utils import api_lookup -from swh.web.common import service +from swh.web.common import archive from swh.web.common.utils import reverse from swh.web.config import get_config @api_route( r"/snapshot/(?P[0-9a-f]+)/", "api-1-snapshot", checksum_args=["snapshot_id"], ) @api_doc("/snapshot/") @format_docstring() def api_snapshot(request, snapshot_id): """ .. http:get:: /api/1/snapshot/(snapshot_id)/ Get information about a snapshot in the archive. 
A snapshot is a set of named branches, which are pointers to objects at any level of the Software Heritage DAG. It represents a full picture of an origin at a given time. As well as pointing to other objects in the Software Heritage DAG, branches can also be aliases, in which case their target is the name of another branch in the same snapshot, or dangling, in which case the target is unknown. A snapshot identifier is a salted sha1. See :func:`swh.model.identifiers.snapshot_identifier` in our data model module for details about how they are computed. :param sha1 snapshot_id: a snapshot identifier :query str branches_from: optional parameter used to skip branches whose name is lesser than it before returning them :query int branches_count: optional parameter used to restrain the amount of returned branches (default to 1000) :query str target_types: optional comma separated list parameter used to filter the target types of branch to return (possible values that can be contained in that list are ``content``, ``directory``, ``revision``, ``release``, ``snapshot`` or ``alias``) {common_headers} {resheader_link} :>json object branches: object containing all branches associated to the snapshot,for each of them the associated target type and id are given but also a link to get information about that target :>json string id: the unique identifier of the snapshot :statuscode 200: no error :statuscode 400: an invalid snapshot identifier has been provided :statuscode 404: requested snapshot can not be found in the archive **Example:** .. 
parsed-literal:: :swh_web_api:`snapshot/6a3a2cf0b2b90ce7ae1cf0a221ed68035b686f5a/` """ snapshot_content_max_size = get_config()["snapshot_content_max_size"] branches_from = request.GET.get("branches_from", "") branches_count = int(request.GET.get("branches_count", snapshot_content_max_size)) target_types = request.GET.get("target_types", None) target_types = target_types.split(",") if target_types else None results = api_lookup( - service.lookup_snapshot, + archive.lookup_snapshot, snapshot_id, branches_from, branches_count, target_types, notfound_msg="Snapshot with id {} not found.".format(snapshot_id), enrich_fn=enrich_snapshot, request=request, ) response = {"results": results, "headers": {}} if results["next_branch"] is not None: response["headers"]["link-next"] = reverse( "api-1-snapshot", url_args={"snapshot_id": snapshot_id}, query_params={ "branches_from": results["next_branch"], "branches_count": branches_count, "target_types": target_types, }, request=request, ) return response diff --git a/swh/web/api/views/stat.py b/swh/web/api/views/stat.py index 0bb70d39..704e3e94 100644 --- a/swh/web/api/views/stat.py +++ b/swh/web/api/views/stat.py @@ -1,50 +1,50 @@ # Copyright (C) 2015-2019 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from swh.web.api.apidoc import api_doc, format_docstring from swh.web.api.apiurls import api_route -from swh.web.common import service +from swh.web.common import archive @api_route(r"/stat/counters/", "api-1-stat-counters") @api_doc("/stat/counters/", noargs=True) @format_docstring() def api_stats(request): """ .. http:get:: /api/1/stat/counters/ Get statistics about the content of the archive. 
:>json number content: current number of content objects (aka files) in the archive :>json number directory: current number of directory objects in the archive :>json number origin: current number of software origins (an origin is a "place" where source code can be found, e.g. a git repository, a tarball, ...) in the archive :>json number origin_visit: current number of visits on software origins to fill the archive :>json number person: current number of persons (source code authors or committers) in the archive :>json number release: current number of release objects in the archive :>json number revision: current number of revision objects (aka commits) in the archive :>json number skipped_content: current number of content objects (aka files) which were not inserted in the archive :>json number snapshot: current number of snapshot objects (aka set of named branches) in the archive {common_headers} :statuscode 200: no error **Example:** .. parsed-literal:: :swh_web_api:`stat/counters/` """ - return service.stat_counters() + return archive.stat_counters() diff --git a/swh/web/api/views/vault.py b/swh/web/api/views/vault.py index 952f7a70..deef227d 100644 --- a/swh/web/api/views/vault.py +++ b/swh/web/api/views/vault.py @@ -1,272 +1,272 @@ # Copyright (C) 2015-2020 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from django.http import HttpResponse from django.shortcuts import redirect from django.views.decorators.cache import never_cache from swh.model import hashutil from swh.web.api.apidoc import api_doc, format_docstring from swh.web.api.apiurls import api_route from swh.web.api.views.utils import api_lookup -from swh.web.common import query, service +from swh.web.common import archive, query from swh.web.common.utils import reverse # XXX: a bit spaghetti.
Would be better with class-based views. def _dispatch_cook_progress(request, obj_type, obj_id): hex_id = hashutil.hash_to_hex(obj_id) object_name = obj_type.split("_")[0] if request.method == "GET": return api_lookup( - service.vault_progress, + archive.vault_progress, obj_type, obj_id, notfound_msg=( "Cooking of {} '{}' was never requested.".format(object_name, hex_id) ), request=request, ) elif request.method == "POST": email = request.POST.get("email", request.GET.get("email", None)) return api_lookup( - service.vault_cook, + archive.vault_cook, obj_type, obj_id, email, notfound_msg=("{} '{}' not found.".format(object_name.title(), hex_id)), request=request, ) @api_route( r"/vault/directory/(?P[0-9a-f]+)/", "api-1-vault-cook-directory", methods=["GET", "POST"], checksum_args=["dir_id"], throttle_scope="swh_vault_cooking", ) @never_cache @api_doc("/vault/directory/") @format_docstring() def api_vault_cook_directory(request, dir_id): """ .. http:get:: /api/1/vault/directory/(dir_id)/ .. http:post:: /api/1/vault/directory/(dir_id)/ Request the cooking of an archive for a directory or check its cooking status. That endpoint enables to create a vault cooking task for a directory through a POST request or check the status of a previously created one through a GET request. Once the cooking task has been executed, the resulting archive can be downloaded using the dedicated endpoint :http:get:`/api/1/vault/directory/(dir_id)/raw/`. 
Then to extract the cooked directory in the current one, use:: $ tar xvf path/to/directory.tar.gz :param string dir_id: the directory's sha1 identifier :query string email: e-mail to notify when the archive is ready {common_headers} :>json string fetch_url: the url from which to download the archive once it has been cooked (see :http:get:`/api/1/vault/directory/(dir_id)/raw/`) :>json string obj_type: the type of object to cook (directory or revision) :>json string progress_message: message describing the cooking task progress :>json number id: the cooking task id :>json string status: the cooking task status (either **new**, **pending**, **done** or **failed**) :>json string obj_id: the identifier of the object to cook :statuscode 200: no error :statuscode 400: an invalid directory identifier has been provided :statuscode 404: requested directory did not receive any cooking request yet (in case of GET) or can not be found in the archive (in case of POST) """ _, obj_id = query.parse_hash_with_algorithms_or_throws( dir_id, ["sha1"], "Only sha1_git is supported." ) res = _dispatch_cook_progress(request, "directory", obj_id) res["fetch_url"] = reverse( "api-1-vault-fetch-directory", url_args={"dir_id": dir_id} ) return res @api_route( r"/vault/directory/(?P[0-9a-f]+)/raw/", "api-1-vault-fetch-directory", checksum_args=["dir_id"], ) @api_doc("/vault/directory/raw/", handle_response=True) def api_vault_fetch_directory(request, dir_id): """ .. http:get:: /api/1/vault/directory/(dir_id)/raw/ Fetch the cooked archive for a directory. See :http:get:`/api/1/vault/directory/(dir_id)/` to get more details on directory cooking. 
:param string dir_id: the directory's sha1 identifier :resheader Content-Type: application/octet-stream :statuscode 200: no error :statuscode 400: an invalid directory identifier has been provided :statuscode 404: requested directory did not receive any cooking request yet (in case of GET) or can not be found in the archive (in case of POST) """ _, obj_id = query.parse_hash_with_algorithms_or_throws( dir_id, ["sha1"], "Only sha1_git is supported." ) res = api_lookup( - service.vault_fetch, + archive.vault_fetch, "directory", obj_id, notfound_msg="Cooked archive for directory '{}' not found.".format(dir_id), request=request, ) fname = "{}.tar.gz".format(dir_id) response = HttpResponse(res, content_type="application/gzip") response["Content-disposition"] = "attachment; filename={}".format(fname) return response @api_route( r"/vault/revision/(?P[0-9a-f]+)/gitfast/", "api-1-vault-cook-revision_gitfast", methods=["GET", "POST"], checksum_args=["rev_id"], throttle_scope="swh_vault_cooking", ) @never_cache @api_doc("/vault/revision/gitfast/") @format_docstring() def api_vault_cook_revision_gitfast(request, rev_id): """ .. http:get:: /api/1/vault/revision/(rev_id)/gitfast/ .. http:post:: /api/1/vault/revision/(rev_id)/gitfast/ Request the cooking of a gitfast archive for a revision or check its cooking status. That endpoint enables to create a vault cooking task for a revision through a POST request or check the status of a previously created one through a GET request. Once the cooking task has been executed, the resulting gitfast archive can be downloaded using the dedicated endpoint :http:get:`/api/1/vault/revision/(rev_id)/gitfast/raw/`. 
Then to import the revision in the current directory, use:: $ git init $ zcat path/to/revision.gitfast.gz | git fast-import $ git checkout HEAD :param string rev_id: the revision's sha1 identifier :query string email: e-mail to notify when the gitfast archive is ready {common_headers} :>json string fetch_url: the url from which to download the archive once it has been cooked (see :http:get:`/api/1/vault/revision/(rev_id)/gitfast/raw/`) :>json string obj_type: the type of object to cook (directory or revision) :>json string progress_message: message describing the cooking task progress :>json number id: the cooking task id :>json string status: the cooking task status (new/pending/done/failed) :>json string obj_id: the identifier of the object to cook :statuscode 200: no error :statuscode 400: an invalid revision identifier has been provided :statuscode 404: requested directory did not receive any cooking request yet (in case of GET) or can not be found in the archive (in case of POST) """ _, obj_id = query.parse_hash_with_algorithms_or_throws( rev_id, ["sha1"], "Only sha1_git is supported." ) res = _dispatch_cook_progress(request, "revision_gitfast", obj_id) res["fetch_url"] = reverse( "api-1-vault-fetch-revision_gitfast", url_args={"rev_id": rev_id} ) return res @api_route( r"/vault/revision/(?P<rev_id>[0-9a-f]+)/gitfast/raw/", "api-1-vault-fetch-revision_gitfast", checksum_args=["rev_id"], ) @api_doc("/vault/revision/gitfast/raw/", handle_response=True) def api_vault_fetch_revision_gitfast(request, rev_id): """ .. http:get:: /api/1/vault/revision/(rev_id)/gitfast/raw/ Fetch the cooked gitfast archive for a revision. See :http:get:`/api/1/vault/revision/(rev_id)/gitfast/` to get more details on directory cooking.
:param string rev_id: the revision's sha1 identifier :resheader Content-Type: application/octet-stream :statuscode 200: no error :statuscode 400: an invalid revision identifier has been provided :statuscode 404: requested directory did not receive any cooking request yet (in case of GET) or can not be found in the archive (in case of POST) """ _, obj_id = query.parse_hash_with_algorithms_or_throws( rev_id, ["sha1"], "Only sha1_git is supported." ) res = api_lookup( - service.vault_fetch, + archive.vault_fetch, "revision_gitfast", obj_id, notfound_msg="Cooked archive for revision '{}' not found.".format(rev_id), request=request, ) fname = "{}.gitfast.gz".format(rev_id) response = HttpResponse(res, content_type="application/gzip") response["Content-disposition"] = "attachment; filename={}".format(fname) return response @api_route( r"/vault/revision_gitfast/(?P<rev_id>[0-9a-f]+)/raw/", "api-1-vault-revision_gitfast-raw", checksum_args=["rev_id"], ) @api_doc("/vault/revision_gitfast/raw/", tags=["hidden"], handle_response=True) def _api_vault_revision_gitfast_raw(request, rev_id): """ The vault backend sends an email containing an invalid url to fetch a gitfast archive. So setup a redirection to the correct one as a temporary workaround. """ rev_gitfast_raw_url = reverse( "api-1-vault-fetch-revision_gitfast", url_args={"rev_id": rev_id} ) return redirect(rev_gitfast_raw_url) diff --git a/swh/web/browse/snapshot_context.py b/swh/web/browse/snapshot_context.py index a11fdf67..b37871f4 100644 --- a/swh/web/browse/snapshot_context.py +++ b/swh/web/browse/snapshot_context.py @@ -1,1459 +1,1459 @@ # Copyright (C) 2018-2020 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information # Utility module for browsing the archive in a snapshot context.
from collections import defaultdict from copy import copy from typing import Any, Dict, List, Optional, Tuple from django.core.cache import cache from django.shortcuts import render from django.template.defaultfilters import filesizeformat from django.utils.html import escape from swh.model.identifiers import ( CONTENT, DIRECTORY, RELEASE, REVISION, SNAPSHOT, snapshot_identifier, swhid, ) from swh.web.browse.utils import ( content_display_max_size, format_log_entries, gen_content_link, gen_directory_link, gen_release_link, gen_revision_link, gen_revision_log_link, gen_revision_url, gen_snapshot_link, get_directory_entries, get_readme_to_display, prepare_content_for_display, request_content, ) -from swh.web.common import highlightjs, service +from swh.web.common import archive, highlightjs from swh.web.common.exc import BadInputExc, NotFoundExc, handle_view_exception from swh.web.common.identifiers import get_swhids_info from swh.web.common.origin_visits import get_origin_visit from swh.web.common.typing import ( ContentMetadata, DirectoryMetadata, OriginInfo, SnapshotBranchInfo, SnapshotContext, SnapshotReleaseInfo, SWHObjectInfo, ) from swh.web.common.utils import ( format_utc_iso_date, gen_path_info, reverse, swh_object_icons, ) from swh.web.config import get_config _empty_snapshot_id = snapshot_identifier({"branches": {}}) def _get_branch(branches, branch_name, snapshot_id): """ Utility function to get a specific branch from a branches list. Its purpose is to get the default HEAD branch as some software origin (e.g those with svn type) does not have it. In that latter case, check if there is a master branch instead and returns it. 
""" filtered_branches = [b for b in branches if b["name"] == branch_name] if filtered_branches: return filtered_branches[0] elif branch_name == "HEAD": filtered_branches = [b for b in branches if b["name"].endswith("master")] if filtered_branches: return filtered_branches[0] elif branches: return branches[0] else: # case where a large branches list has been truncated - snp = service.lookup_snapshot( + snp = archive.lookup_snapshot( snapshot_id, branches_from=branch_name, branches_count=1, target_types=["revision", "alias"], ) snp_branch, _ = process_snapshot_branches(snp) if snp_branch and snp_branch[0]["name"] == branch_name: branches.append(snp_branch[0]) return snp_branch[0] def _get_release(releases, release_name, snapshot_id): """ Utility function to get a specific release from a releases list. Returns None if the release can not be found in the list. """ filtered_releases = [r for r in releases if r["name"] == release_name] if filtered_releases: return filtered_releases[0] else: # case where a large branches list has been truncated try: # git origins have specific branches for releases - snp = service.lookup_snapshot( + snp = archive.lookup_snapshot( snapshot_id, branches_from=f"refs/tags/{release_name}", branches_count=1, target_types=["release"], ) except NotFoundExc: - snp = service.lookup_snapshot( + snp = archive.lookup_snapshot( snapshot_id, branches_from=release_name, branches_count=1, target_types=["release"], ) _, snp_release = process_snapshot_branches(snp) if snp_release and snp_release[0]["name"] == release_name: releases.append(snp_release[0]) return snp_release[0] def _branch_not_found( branch_type, branch, snapshot_id, snapshot_sizes, origin_info, timestamp, visit_id ): """ Utility function to raise an exception when a specified branch/release can not be found. 
""" if branch_type == "branch": branch_type = "Branch" branch_type_plural = "branches" target_type = "revision" else: branch_type = "Release" branch_type_plural = "releases" target_type = "release" if snapshot_id and snapshot_sizes[target_type] == 0: msg = "Snapshot with id %s has an empty list" " of %s!" % ( snapshot_id, branch_type_plural, ) elif snapshot_id: msg = "%s %s for snapshot with id %s" " not found!" % ( branch_type, branch, snapshot_id, ) elif visit_id and snapshot_sizes[target_type] == 0: msg = ( "Origin with url %s" " for visit with id %s has an empty list" " of %s!" % (origin_info["url"], visit_id, branch_type_plural) ) elif visit_id: msg = ( "%s %s associated to visit with" " id %s for origin with url %s" " not found!" % (branch_type, branch, visit_id, origin_info["url"]) ) elif snapshot_sizes[target_type] == 0: msg = ( "Origin with url %s" " for visit with timestamp %s has an empty list" " of %s!" % (origin_info["url"], timestamp, branch_type_plural) ) else: msg = ( "%s %s associated to visit with" " timestamp %s for origin with " "url %s not found!" % (branch_type, branch, timestamp, origin_info["url"]) ) raise NotFoundExc(escape(msg)) def process_snapshot_branches( snapshot: Dict[str, Any] ) -> Tuple[List[SnapshotBranchInfo], List[SnapshotReleaseInfo]]: """ Process a dictionary describing snapshot branches: extract those targeting revisions and releases, put them in two different lists, then sort those lists in lexicographical order of the branches' names. 
Args: snapshot: A dict describing a snapshot as returned for instance by - :func:`swh.web.common.service.lookup_snapshot` + :func:`swh.web.common.archive.lookup_snapshot` Returns: A tuple whose first member is the sorted list of branches targeting revisions and second member the sorted list of branches targeting releases """ snapshot_branches = snapshot["branches"] branches: Dict[str, SnapshotBranchInfo] = {} branch_aliases: Dict[str, str] = {} releases: Dict[str, SnapshotReleaseInfo] = {} revision_to_branch = defaultdict(set) revision_to_release = defaultdict(set) release_to_branch = defaultdict(set) for branch_name, target in snapshot_branches.items(): if not target: # FIXME: display branches with an unknown target anyway continue target_id = target["target"] target_type = target["target_type"] if target_type == "revision": branches[branch_name] = SnapshotBranchInfo( name=branch_name, revision=target_id, date=None, directory=None, message=None, url=None, ) revision_to_branch[target_id].add(branch_name) elif target_type == "release": release_to_branch[target_id].add(branch_name) elif target_type == "alias": branch_aliases[branch_name] = target_id # FIXME: handle pointers to other object types def _add_release_info(branch, release): releases[branch] = SnapshotReleaseInfo( name=release["name"], branch_name=branch, date=format_utc_iso_date(release["date"]), directory=None, id=release["id"], message=release["message"], target_type=release["target_type"], target=release["target"], url=None, ) def _add_branch_info(branch, revision): branches[branch] = SnapshotBranchInfo( name=branch, revision=revision["id"], directory=revision["directory"], date=format_utc_iso_date(revision["date"]), message=revision["message"], url=None, ) - releases_info = service.lookup_release_multiple(release_to_branch.keys()) + releases_info = archive.lookup_release_multiple(release_to_branch.keys()) for release in releases_info: if release is None: continue branches_to_update = 
release_to_branch[release["id"]] for branch in branches_to_update: _add_release_info(branch, release) if release["target_type"] == "revision": revision_to_release[release["target"]].update(branches_to_update) - revisions = service.lookup_revision_multiple( + revisions = archive.lookup_revision_multiple( set(revision_to_branch.keys()) | set(revision_to_release.keys()) ) for revision in revisions: if not revision: continue for branch in revision_to_branch[revision["id"]]: _add_branch_info(branch, revision) for release_id in revision_to_release[revision["id"]]: releases[release_id]["directory"] = revision["directory"] for branch_alias, branch_target in branch_aliases.items(): if branch_target in branches: branches[branch_alias] = copy(branches[branch_target]) else: - snp = service.lookup_snapshot( + snp = archive.lookup_snapshot( snapshot["id"], branches_from=branch_target, branches_count=1 ) if snp and branch_target in snp["branches"]: if snp["branches"][branch_target] is None: continue target_type = snp["branches"][branch_target]["target_type"] target = snp["branches"][branch_target]["target"] if target_type == "revision": branches[branch_alias] = snp["branches"][branch_target] - revision = service.lookup_revision(target) + revision = archive.lookup_revision(target) _add_branch_info(branch_alias, revision) elif target_type == "release": - release = service.lookup_release(target) + release = archive.lookup_release(target) _add_release_info(branch_alias, release) if branch_alias in branches: branches[branch_alias]["name"] = branch_alias ret_branches = list(sorted(branches.values(), key=lambda b: b["name"])) ret_releases = list(sorted(releases.values(), key=lambda b: b["name"])) return ret_branches, ret_releases def get_snapshot_content( snapshot_id: str, ) -> Tuple[List[SnapshotBranchInfo], List[SnapshotReleaseInfo]]: """Returns the lists of branches and releases associated to a swh snapshot. 
That list is put in cache in order to speedup the navigation in the swh-web/browse ui. .. warning:: At most 1000 branches contained in the snapshot will be returned for performance reasons. Args: snapshot_id: hexadecimal representation of the snapshot identifier Returns: A tuple with two members. The first one is a list of dict describing the snapshot branches. The second one is a list of dict describing the snapshot releases. Raises: NotFoundExc if the snapshot does not exist """ cache_entry_id = "swh_snapshot_%s" % snapshot_id cache_entry = cache.get(cache_entry_id) if cache_entry: return cache_entry["branches"], cache_entry["releases"] branches: List[SnapshotBranchInfo] = [] releases: List[SnapshotReleaseInfo] = [] snapshot_content_max_size = get_config()["snapshot_content_max_size"] if snapshot_id: - snapshot = service.lookup_snapshot( + snapshot = archive.lookup_snapshot( snapshot_id, branches_count=snapshot_content_max_size ) branches, releases = process_snapshot_branches(snapshot) cache.set(cache_entry_id, {"branches": branches, "releases": releases,}) return branches, releases def get_origin_visit_snapshot( origin_info: OriginInfo, visit_ts: Optional[str] = None, visit_id: Optional[int] = None, snapshot_id: Optional[str] = None, ) -> Tuple[List[SnapshotBranchInfo], List[SnapshotReleaseInfo]]: """Returns the lists of branches and releases associated to an origin for a given visit. The visit is expressed by either: * a snapshot identifier * a timestamp, if no visit with that exact timestamp is found, the closest one from the provided timestamp will be used. If no visit parameter is provided, it returns the list of branches found for the latest visit. That list is put in cache in order to speedup the navigation in the swh-web/browse ui. .. warning:: At most 1000 branches contained in the snapshot will be returned for performance reasons. 
Args: origin_info: a dict filled with origin information visit_ts: an ISO 8601 datetime string to parse visit_id: visit id for disambiguation in case several visits have the same timestamp snapshot_id: if provided, visit associated to the snapshot will be processed Returns: A tuple with two members. The first one is a list of dict describing the origin branches for the given visit. The second one is a list of dict describing the origin releases for the given visit. Raises: NotFoundExc if the origin or its visit are not found """ visit_info = get_origin_visit(origin_info, visit_ts, visit_id, snapshot_id) return get_snapshot_content(visit_info["snapshot"]) def get_snapshot_context( snapshot_id: Optional[str] = None, origin_url: Optional[str] = None, timestamp: Optional[str] = None, visit_id: Optional[int] = None, branch_name: Optional[str] = None, release_name: Optional[str] = None, revision_id: Optional[str] = None, path: Optional[str] = None, browse_context: str = "directory", ) -> SnapshotContext: """ Utility function to compute relevant information when navigating the archive in a snapshot context. The snapshot is either referenced by its id or it will be retrieved from an origin visit. Args: snapshot_id: hexadecimal representation of a snapshot identifier origin_url: an origin_url timestamp: a datetime string for retrieving the closest visit of the origin visit_id: optional visit id for disambiguation in case of several visits with the same timestamp branch_name: optional branch name set when browsing the snapshot in that scope (will default to "HEAD" if not provided) release_name: optional release name set when browsing the snapshot in that scope revision_id: optional revision identifier set when browsing the snapshot in that scope path: optional path of the object currently browsed in the snapshot browse_context: indicates which type of object is currently browsed Returns: A dict filled with snapshot context information. 
Raises: swh.web.common.exc.NotFoundExc: if no snapshot is found for the visit of an origin. """ assert origin_url is not None or snapshot_id is not None origin_info = None visit_info = None url_args = {} query_params: Dict[str, Any] = {} origin_visits_url = None if origin_url: if visit_id is not None: query_params["visit_id"] = visit_id elif snapshot_id is not None: query_params["snapshot"] = snapshot_id - origin_info = service.lookup_origin({"url": origin_url}) + origin_info = archive.lookup_origin({"url": origin_url}) visit_info = get_origin_visit(origin_info, timestamp, visit_id, snapshot_id) formatted_date = format_utc_iso_date(visit_info["date"]) visit_info["formatted_date"] = formatted_date snapshot_id = visit_info["snapshot"] if not snapshot_id: raise NotFoundExc( "No snapshot associated to the visit of origin " "%s on %s" % (escape(origin_url), formatted_date) ) # provided timestamp is not necessarily equals to the one # of the retrieved visit, so get the exact one in order # to use it in the urls generated below if timestamp: timestamp = visit_info["date"] branches, releases = get_origin_visit_snapshot( origin_info, timestamp, visit_id, snapshot_id ) query_params["origin_url"] = origin_info["url"] origin_visits_url = reverse( "browse-origin-visits", query_params={"origin_url": origin_info["url"]} ) if timestamp is not None: query_params["timestamp"] = format_utc_iso_date( timestamp, "%Y-%m-%dT%H:%M:%SZ" ) visit_url = reverse("browse-origin-directory", query_params=query_params) visit_info["url"] = visit_url branches_url = reverse("browse-origin-branches", query_params=query_params) releases_url = reverse("browse-origin-releases", query_params=query_params) else: assert snapshot_id is not None branches, releases = get_snapshot_content(snapshot_id) url_args = {"snapshot_id": snapshot_id} branches_url = reverse("browse-snapshot-branches", url_args=url_args) releases_url = reverse("browse-snapshot-releases", url_args=url_args) releases = 
list(reversed(releases)) - snapshot_sizes = service.lookup_snapshot_sizes(snapshot_id) + snapshot_sizes = archive.lookup_snapshot_sizes(snapshot_id) is_empty = sum(snapshot_sizes.values()) == 0 swh_snp_id = swhid("snapshot", snapshot_id) if visit_info: timestamp = format_utc_iso_date(visit_info["date"]) if origin_info: browse_view_name = f"browse-origin-{browse_context}" else: browse_view_name = f"browse-snapshot-{browse_context}" release_id = None root_directory = None snapshot_total_size = sum(snapshot_sizes.values()) if path is not None: query_params["path"] = path if snapshot_total_size and revision_id is not None: - revision = service.lookup_revision(revision_id) + revision = archive.lookup_revision(revision_id) root_directory = revision["directory"] branches.append( SnapshotBranchInfo( name=revision_id, revision=revision_id, directory=root_directory, date=revision["date"], message=revision["message"], url=None, ) ) branch_name = revision_id query_params["revision"] = revision_id elif snapshot_total_size and release_name: release = _get_release(releases, release_name, snapshot_id) if release is None: _branch_not_found( "release", release_name, snapshot_id, snapshot_sizes, origin_info, timestamp, visit_id, ) else: root_directory = release["directory"] revision_id = release["target"] release_id = release["id"] query_params["release"] = release_name elif snapshot_total_size: if branch_name: query_params["branch"] = branch_name branch = _get_branch(branches, branch_name or "HEAD", snapshot_id) if branch is None: _branch_not_found( "branch", branch_name, snapshot_id, snapshot_sizes, origin_info, timestamp, visit_id, ) else: branch_name = branch["name"] revision_id = branch["revision"] root_directory = branch["directory"] for b in branches: branch_query_params = dict(query_params) branch_query_params.pop("release", None) if b["name"] != b["revision"]: branch_query_params.pop("revision", None) branch_query_params["branch"] = b["name"] b["url"] = reverse( 
browse_view_name, url_args=url_args, query_params=branch_query_params ) for r in releases: release_query_params = dict(query_params) release_query_params.pop("branch", None) release_query_params.pop("revision", None) release_query_params["release"] = r["name"] r["url"] = reverse( browse_view_name, url_args=url_args, query_params=release_query_params, ) revision_info = None if revision_id: try: - revision_info = service.lookup_revision(revision_id) + revision_info = archive.lookup_revision(revision_id) except NotFoundExc: pass else: revision_info["date"] = format_utc_iso_date(revision_info["date"]) revision_info["committer_date"] = format_utc_iso_date( revision_info["committer_date"] ) if revision_info["message"]: message_lines = revision_info["message"].split("\n") revision_info["message_header"] = message_lines[0] else: revision_info["message_header"] = "" snapshot_context = SnapshotContext( branch=branch_name, branches=branches, branches_url=branches_url, is_empty=is_empty, origin_info=origin_info, origin_visits_url=origin_visits_url, release=release_name, release_id=release_id, query_params=query_params, releases=releases, releases_url=releases_url, revision_id=revision_id, revision_info=revision_info, root_directory=root_directory, snapshot_id=snapshot_id, snapshot_sizes=snapshot_sizes, snapshot_swhid=swh_snp_id, url_args=url_args, visit_info=visit_info, ) if revision_info: revision_info["revision_url"] = gen_revision_url(revision_id, snapshot_context) return snapshot_context def _build_breadcrumbs(snapshot_context: SnapshotContext, path: str): origin_info = snapshot_context["origin_info"] url_args = snapshot_context["url_args"] query_params = dict(snapshot_context["query_params"]) root_directory = snapshot_context["root_directory"] path_info = gen_path_info(path) if origin_info: browse_view_name = "browse-origin-directory" else: browse_view_name = "browse-snapshot-directory" breadcrumbs = [] if root_directory: query_params.pop("path", None) breadcrumbs.append( 
{ "name": root_directory[:7], "url": reverse( browse_view_name, url_args=url_args, query_params=query_params ), } ) for pi in path_info: query_params["path"] = pi["path"] breadcrumbs.append( { "name": pi["name"], "url": reverse( browse_view_name, url_args=url_args, query_params=query_params ), } ) return breadcrumbs def _check_origin_url(snapshot_id, origin_url): if snapshot_id is None and origin_url is None: raise BadInputExc("An origin URL must be provided as query parameter.") def browse_snapshot_directory( request, snapshot_id=None, origin_url=None, timestamp=None, path=None ): """ Django view implementation for browsing a directory in a snapshot context. """ try: _check_origin_url(snapshot_id, origin_url) snapshot_context = get_snapshot_context( snapshot_id=snapshot_id, origin_url=origin_url, timestamp=timestamp, visit_id=request.GET.get("visit_id"), path=path, browse_context="directory", branch_name=request.GET.get("branch"), release_name=request.GET.get("release"), revision_id=request.GET.get("revision"), ) root_directory = snapshot_context["root_directory"] sha1_git = root_directory if root_directory and path: - dir_info = service.lookup_directory_with_path(root_directory, path) + dir_info = archive.lookup_directory_with_path(root_directory, path) sha1_git = dir_info["target"] dirs = [] files = [] if sha1_git: dirs, files = get_directory_entries(sha1_git) except Exception as exc: return handle_view_exception(request, exc) origin_info = snapshot_context["origin_info"] visit_info = snapshot_context["visit_info"] url_args = snapshot_context["url_args"] query_params = dict(snapshot_context["query_params"]) revision_id = snapshot_context["revision_id"] snapshot_id = snapshot_context["snapshot_id"] if origin_info: browse_view_name = "browse-origin-directory" else: browse_view_name = "browse-snapshot-directory" breadcrumbs = _build_breadcrumbs(snapshot_context, path) path = "" if path is None else (path + "/") for d in dirs: if d["type"] == "rev": d["url"] = 
reverse("browse-revision", url_args={"sha1_git": d["target"]}) else: query_params["path"] = path + d["name"] d["url"] = reverse( browse_view_name, url_args=url_args, query_params=query_params ) sum_file_sizes = 0 readmes = {} if origin_info: browse_view_name = "browse-origin-content" else: browse_view_name = "browse-snapshot-content" for f in files: query_params["path"] = path + f["name"] f["url"] = reverse( browse_view_name, url_args=url_args, query_params=query_params ) if f["length"] is not None: sum_file_sizes += f["length"] f["length"] = filesizeformat(f["length"]) if f["name"].lower().startswith("readme"): readmes[f["name"]] = f["checksums"]["sha1"] readme_name, readme_url, readme_html = get_readme_to_display(readmes) if origin_info: browse_view_name = "browse-origin-log" else: browse_view_name = "browse-snapshot-log" history_url = None if snapshot_id != _empty_snapshot_id: query_params.pop("path", None) history_url = reverse( browse_view_name, url_args=url_args, query_params=query_params ) nb_files = None nb_dirs = None dir_path = None if root_directory: nb_files = len(files) nb_dirs = len(dirs) sum_file_sizes = filesizeformat(sum_file_sizes) dir_path = "/" + path browse_dir_link = gen_directory_link(sha1_git) browse_rev_link = gen_revision_link(revision_id) browse_snp_link = gen_snapshot_link(snapshot_id) revision_found = True if sha1_git is None and revision_id is not None: try: - service.lookup_revision(revision_id) + archive.lookup_revision(revision_id) except NotFoundExc: revision_found = False swh_objects = [ SWHObjectInfo(object_type=DIRECTORY, object_id=sha1_git), SWHObjectInfo(object_type=REVISION, object_id=revision_id), SWHObjectInfo(object_type=SNAPSHOT, object_id=snapshot_id), ] visit_date = None visit_type = None if visit_info: visit_date = format_utc_iso_date(visit_info["date"]) visit_type = visit_info["type"] release_id = snapshot_context["release_id"] browse_rel_link = None if release_id: swh_objects.append(SWHObjectInfo(object_type=RELEASE, 
object_id=release_id)) browse_rel_link = gen_release_link(release_id) dir_metadata = DirectoryMetadata( object_type=DIRECTORY, object_id=sha1_git, directory=sha1_git, directory_url=browse_dir_link, nb_files=nb_files, nb_dirs=nb_dirs, sum_file_sizes=sum_file_sizes, root_directory=root_directory, path=dir_path, revision=revision_id, revision_found=revision_found, revision_url=browse_rev_link, release=release_id, release_url=browse_rel_link, snapshot=snapshot_id, snapshot_url=browse_snp_link, origin_url=origin_url, visit_date=visit_date, visit_type=visit_type, ) vault_cooking = { "directory_context": True, "directory_id": sha1_git, "revision_context": True, "revision_id": revision_id, } swhids_info = get_swhids_info(swh_objects, snapshot_context, dir_metadata) dir_path = "/".join([bc["name"] for bc in breadcrumbs]) + "/" context_found = "snapshot: %s" % snapshot_context["snapshot_id"] if origin_info: context_found = "origin: %s" % origin_info["url"] heading = "Directory - %s - %s - %s" % ( dir_path, snapshot_context["branch"], context_found, ) top_right_link = None if not snapshot_context["is_empty"]: top_right_link = { "url": history_url, "icon": swh_object_icons["revisions history"], "text": "History", } return render( request, "browse/directory.html", { "heading": heading, "swh_object_name": "Directory", "swh_object_metadata": dir_metadata, "dirs": dirs, "files": files, "breadcrumbs": breadcrumbs if root_directory else [], "top_right_link": top_right_link, "readme_name": readme_name, "readme_url": readme_url, "readme_html": readme_html, "snapshot_context": snapshot_context, "vault_cooking": vault_cooking, "show_actions": True, "swhids_info": swhids_info, }, ) def browse_snapshot_content( request, snapshot_id=None, origin_url=None, timestamp=None, path=None, selected_language=None, ): """ Django view implementation for browsing a content in a snapshot context. 
""" try: _check_origin_url(snapshot_id, origin_url) if path is None: raise BadInputExc("The path of a content must be given as query parameter.") snapshot_context = get_snapshot_context( snapshot_id=snapshot_id, origin_url=origin_url, timestamp=timestamp, visit_id=request.GET.get("visit_id"), path=path, browse_context="content", branch_name=request.GET.get("branch"), release_name=request.GET.get("release"), revision_id=request.GET.get("revision"), ) root_directory = snapshot_context["root_directory"] sha1_git = None query_string = None content_data = {} directory_id = None split_path = path.split("/") filename = split_path[-1] filepath = path[: -len(filename)] if root_directory: - content_info = service.lookup_directory_with_path(root_directory, path) + content_info = archive.lookup_directory_with_path(root_directory, path) sha1_git = content_info["target"] query_string = "sha1_git:" + sha1_git content_data = request_content(query_string, raise_if_unavailable=False) if filepath: - dir_info = service.lookup_directory_with_path(root_directory, filepath) + dir_info = archive.lookup_directory_with_path(root_directory, filepath) directory_id = dir_info["target"] else: directory_id = root_directory except Exception as exc: return handle_view_exception(request, exc) revision_id = snapshot_context["revision_id"] origin_info = snapshot_context["origin_info"] visit_info = snapshot_context["visit_info"] snapshot_id = snapshot_context["snapshot_id"] if content_data.get("raw_data") is not None: content_display_data = prepare_content_for_display( content_data["raw_data"], content_data["mimetype"], path ) content_data.update(content_display_data) # Override language with user-selected language if selected_language is not None: content_data["language"] = selected_language available_languages = None if content_data.get("mimetype") is not None and "text/" in content_data["mimetype"]: available_languages = highlightjs.get_supported_languages() breadcrumbs = 
_build_breadcrumbs(snapshot_context, filepath) breadcrumbs.append({"name": filename, "url": None}) browse_content_link = gen_content_link(sha1_git) content_raw_url = None if query_string: content_raw_url = reverse( "browse-content-raw", url_args={"query_string": query_string}, query_params={"filename": filename}, ) browse_rev_link = gen_revision_link(revision_id) browse_dir_link = gen_directory_link(directory_id) content_checksums = content_data.get("checksums", {}) swh_objects = [ SWHObjectInfo(object_type=CONTENT, object_id=content_checksums.get("sha1_git")), SWHObjectInfo(object_type=DIRECTORY, object_id=directory_id), SWHObjectInfo(object_type=REVISION, object_id=revision_id), SWHObjectInfo(object_type=SNAPSHOT, object_id=snapshot_id), ] visit_date = None visit_type = None if visit_info: visit_date = format_utc_iso_date(visit_info["date"]) visit_type = visit_info["type"] release_id = snapshot_context["release_id"] browse_rel_link = None if release_id: swh_objects.append(SWHObjectInfo(object_type=RELEASE, object_id=release_id)) browse_rel_link = gen_release_link(release_id) content_metadata = ContentMetadata( object_type=CONTENT, object_id=content_checksums.get("sha1_git"), sha1=content_checksums.get("sha1"), sha1_git=content_checksums.get("sha1_git"), sha256=content_checksums.get("sha256"), blake2s256=content_checksums.get("blake2s256"), content_url=browse_content_link, mimetype=content_data.get("mimetype"), encoding=content_data.get("encoding"), size=filesizeformat(content_data.get("length", 0)), language=content_data.get("language"), root_directory=root_directory, path=f"/{filepath}", filename=filename, directory=directory_id, directory_url=browse_dir_link, revision=revision_id, revision_url=browse_rev_link, release=release_id, release_url=browse_rel_link, snapshot=snapshot_id, snapshot_url=gen_snapshot_link(snapshot_id), origin_url=origin_url, visit_date=visit_date, visit_type=visit_type, ) swhids_info = get_swhids_info(swh_objects, snapshot_context, 
content_metadata) content_path = "/".join([bc["name"] for bc in breadcrumbs]) context_found = "snapshot: %s" % snapshot_context["snapshot_id"] if origin_info: context_found = "origin: %s" % origin_info["url"] heading = "Content - %s - %s - %s" % ( content_path, snapshot_context["branch"], context_found, ) top_right_link = None if not snapshot_context["is_empty"]: top_right_link = { "url": content_raw_url, "icon": swh_object_icons["content"], "text": "Raw File", } return render( request, "browse/content.html", { "heading": heading, "swh_object_name": "Content", "swh_object_metadata": content_metadata, "content": content_data.get("content_data"), "content_size": content_data.get("length"), "max_content_size": content_display_max_size, "filename": filename, "encoding": content_data.get("encoding"), "mimetype": content_data.get("mimetype"), "language": content_data.get("language"), "available_languages": available_languages, "breadcrumbs": breadcrumbs if root_directory else [], "top_right_link": top_right_link, "snapshot_context": snapshot_context, "vault_cooking": None, "show_actions": True, "swhids_info": swhids_info, "error_code": content_data.get("error_code"), "error_message": content_data.get("error_message"), "error_description": content_data.get("error_description"), }, status=content_data.get("error_code", 200), ) PER_PAGE = 100 def browse_snapshot_log(request, snapshot_id=None, origin_url=None, timestamp=None): """ Django view implementation for browsing a revision history in a snapshot context. 
""" try: _check_origin_url(snapshot_id, origin_url) snapshot_context = get_snapshot_context( snapshot_id=snapshot_id, origin_url=origin_url, timestamp=timestamp, visit_id=request.GET.get("visit_id"), browse_context="log", branch_name=request.GET.get("branch"), release_name=request.GET.get("release"), revision_id=request.GET.get("revision"), ) revision_id = snapshot_context["revision_id"] per_page = int(request.GET.get("per_page", PER_PAGE)) offset = int(request.GET.get("offset", 0)) revs_ordering = request.GET.get("revs_ordering", "committer_date") session_key = "rev_%s_log_ordering_%s" % (revision_id, revs_ordering) rev_log_session = request.session.get(session_key, None) rev_log = [] revs_walker_state = None if rev_log_session: rev_log = rev_log_session["rev_log"] revs_walker_state = rev_log_session["revs_walker_state"] if len(rev_log) < offset + per_page: - revs_walker = service.get_revisions_walker( + revs_walker = archive.get_revisions_walker( revs_ordering, revision_id, max_revs=offset + per_page + 1, state=revs_walker_state, ) rev_log += [rev["id"] for rev in revs_walker] revs_walker_state = revs_walker.export_state() revs = rev_log[offset : offset + per_page] - revision_log = service.lookup_revision_multiple(revs) + revision_log = archive.lookup_revision_multiple(revs) request.session[session_key] = { "rev_log": rev_log, "revs_walker_state": revs_walker_state, } except Exception as exc: return handle_view_exception(request, exc) origin_info = snapshot_context["origin_info"] visit_info = snapshot_context["visit_info"] url_args = snapshot_context["url_args"] query_params = snapshot_context["query_params"] snapshot_id = snapshot_context["snapshot_id"] query_params["per_page"] = per_page revs_ordering = request.GET.get("revs_ordering", "") query_params["revs_ordering"] = revs_ordering or None if origin_info: browse_view_name = "browse-origin-log" else: browse_view_name = "browse-snapshot-log" prev_log_url = None if len(rev_log) > offset + per_page: 
query_params["offset"] = offset + per_page prev_log_url = reverse( browse_view_name, url_args=url_args, query_params=query_params ) next_log_url = None if offset != 0: query_params["offset"] = offset - per_page next_log_url = reverse( browse_view_name, url_args=url_args, query_params=query_params ) revision_log_data = format_log_entries(revision_log, per_page, snapshot_context) browse_rev_link = gen_revision_link(revision_id) browse_log_link = gen_revision_log_link(revision_id) browse_snp_link = gen_snapshot_link(snapshot_id) revision_metadata = { "context-independent revision": browse_rev_link, "context-independent revision history": browse_log_link, "context-independent snapshot": browse_snp_link, "snapshot": snapshot_id, } if origin_info: revision_metadata["origin url"] = origin_info["url"] revision_metadata["origin visit date"] = format_utc_iso_date(visit_info["date"]) revision_metadata["origin visit type"] = visit_info["type"] swh_objects = [ SWHObjectInfo(object_type=REVISION, object_id=revision_id), SWHObjectInfo(object_type=SNAPSHOT, object_id=snapshot_id), ] release_id = snapshot_context["release_id"] if release_id: swh_objects.append(SWHObjectInfo(object_type=RELEASE, object_id=release_id)) browse_rel_link = gen_release_link(release_id) revision_metadata["release"] = release_id revision_metadata["context-independent release"] = browse_rel_link swhids_info = get_swhids_info(swh_objects, snapshot_context) context_found = "snapshot: %s" % snapshot_context["snapshot_id"] if origin_info: context_found = "origin: %s" % origin_info["url"] heading = "Revision history - %s - %s" % (snapshot_context["branch"], context_found) return render( request, "browse/revision-log.html", { "heading": heading, "swh_object_name": "Revisions history", "swh_object_metadata": revision_metadata, "revision_log": revision_log_data, "revs_ordering": revs_ordering, "next_log_url": next_log_url, "prev_log_url": prev_log_url, "breadcrumbs": None, "top_right_link": None, 
"snapshot_context": snapshot_context, "vault_cooking": None, "show_actions": True, "swhids_info": swhids_info, }, ) def browse_snapshot_branches( request, snapshot_id=None, origin_url=None, timestamp=None ): """ Django view implementation for browsing a list of branches in a snapshot context. """ try: _check_origin_url(snapshot_id, origin_url) snapshot_context = get_snapshot_context( snapshot_id=snapshot_id, origin_url=origin_url, timestamp=timestamp, visit_id=request.GET.get("visit_id"), ) branches_bc = request.GET.get("branches_breadcrumbs", "") branches_bc = branches_bc.split(",") if branches_bc else [] branches_from = branches_bc[-1] if branches_bc else "" origin_info = snapshot_context["origin_info"] url_args = snapshot_context["url_args"] query_params = snapshot_context["query_params"] if origin_info: browse_view_name = "browse-origin-directory" else: browse_view_name = "browse-snapshot-directory" - snapshot = service.lookup_snapshot( + snapshot = archive.lookup_snapshot( snapshot_context["snapshot_id"], branches_from, PER_PAGE + 1, target_types=["revision", "alias"], ) displayed_branches, _ = process_snapshot_branches(snapshot) except Exception as exc: return handle_view_exception(request, exc) for branch in displayed_branches: rev_query_params = {} if origin_info: rev_query_params["origin_url"] = origin_info["url"] revision_url = reverse( "browse-revision", url_args={"sha1_git": branch["revision"]}, query_params=query_params, ) query_params["branch"] = branch["name"] directory_url = reverse( browse_view_name, url_args=url_args, query_params=query_params ) del query_params["branch"] branch["revision_url"] = revision_url branch["directory_url"] = directory_url if origin_info: browse_view_name = "browse-origin-branches" else: browse_view_name = "browse-snapshot-branches" prev_branches_url = None next_branches_url = None if branches_bc: query_params_prev = dict(query_params) query_params_prev["branches_breadcrumbs"] = ",".join(branches_bc[:-1]) 
prev_branches_url = reverse( browse_view_name, url_args=url_args, query_params=query_params_prev ) elif branches_from: prev_branches_url = reverse( browse_view_name, url_args=url_args, query_params=query_params ) if snapshot["next_branch"] is not None: query_params_next = dict(query_params) next_branch = displayed_branches[-1]["name"] del displayed_branches[-1] branches_bc.append(next_branch) query_params_next["branches_breadcrumbs"] = ",".join(branches_bc) next_branches_url = reverse( browse_view_name, url_args=url_args, query_params=query_params_next ) heading = "Branches - " if origin_info: heading += "origin: %s" % origin_info["url"] else: heading += "snapshot: %s" % snapshot_id return render( request, "browse/branches.html", { "heading": heading, "swh_object_name": "Branches", "swh_object_metadata": {}, "top_right_link": None, "displayed_branches": displayed_branches, "prev_branches_url": prev_branches_url, "next_branches_url": next_branches_url, "snapshot_context": snapshot_context, }, ) def browse_snapshot_releases( request, snapshot_id=None, origin_url=None, timestamp=None ): """ Django view implementation for browsing a list of releases in a snapshot context. 
""" try: _check_origin_url(snapshot_id, origin_url) snapshot_context = get_snapshot_context( snapshot_id=snapshot_id, origin_url=origin_url, timestamp=timestamp, visit_id=request.GET.get("visit_id"), ) rel_bc = request.GET.get("releases_breadcrumbs", "") rel_bc = rel_bc.split(",") if rel_bc else [] rel_from = rel_bc[-1] if rel_bc else "" origin_info = snapshot_context["origin_info"] url_args = snapshot_context["url_args"] query_params = snapshot_context["query_params"] - snapshot = service.lookup_snapshot( + snapshot = archive.lookup_snapshot( snapshot_context["snapshot_id"], rel_from, PER_PAGE + 1, target_types=["release", "alias"], ) _, displayed_releases = process_snapshot_branches(snapshot) except Exception as exc: return handle_view_exception(request, exc) for release in displayed_releases: query_params_tgt = {"snapshot": snapshot_id} if origin_info: query_params_tgt["origin_url"] = origin_info["url"] release_url = reverse( "browse-release", url_args={"sha1_git": release["id"]}, query_params=query_params_tgt, ) target_url = "" if release["target_type"] == "revision": target_url = reverse( "browse-revision", url_args={"sha1_git": release["target"]}, query_params=query_params_tgt, ) elif release["target_type"] == "directory": target_url = reverse( "browse-directory", url_args={"sha1_git": release["target"]}, query_params=query_params_tgt, ) elif release["target_type"] == "content": target_url = reverse( "browse-content", url_args={"query_string": release["target"]}, query_params=query_params_tgt, ) elif release["target_type"] == "release": target_url = reverse( "browse-release", url_args={"sha1_git": release["target"]}, query_params=query_params_tgt, ) release["release_url"] = release_url release["target_url"] = target_url if origin_info: browse_view_name = "browse-origin-releases" else: browse_view_name = "browse-snapshot-releases" prev_releases_url = None next_releases_url = None if rel_bc: query_params_prev = dict(query_params) 
query_params_prev["releases_breadcrumbs"] = ",".join(rel_bc[:-1]) prev_releases_url = reverse( browse_view_name, url_args=url_args, query_params=query_params_prev ) elif rel_from: prev_releases_url = reverse( browse_view_name, url_args=url_args, query_params=query_params ) if snapshot["next_branch"] is not None: query_params_next = dict(query_params) next_rel = displayed_releases[-1]["branch_name"] del displayed_releases[-1] rel_bc.append(next_rel) query_params_next["releases_breadcrumbs"] = ",".join(rel_bc) next_releases_url = reverse( browse_view_name, url_args=url_args, query_params=query_params_next ) heading = "Releases - " if origin_info: heading += "origin: %s" % origin_info["url"] else: heading += "snapshot: %s" % snapshot_id return render( request, "browse/releases.html", { "heading": heading, "top_panel_visible": False, "top_panel_collapsible": False, "swh_object_name": "Releases", "swh_object_metadata": {}, "top_right_link": None, "displayed_releases": displayed_releases, "prev_releases_url": prev_releases_url, "next_releases_url": next_releases_url, "snapshot_context": snapshot_context, "vault_cooking": None, "show_actions": False, }, ) diff --git a/swh/web/browse/utils.py b/swh/web/browse/utils.py index ac0dbfea..bb3401da 100644 --- a/swh/web/browse/utils.py +++ b/swh/web/browse/utils.py @@ -1,736 +1,736 @@ # Copyright (C) 2017-2020 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import base64 import stat import textwrap from threading import Lock import magic import sentry_sdk from django.core.cache import cache from django.utils.html import escape from django.utils.safestring import mark_safe -from swh.web.common import highlightjs, service +from swh.web.common import archive, highlightjs from swh.web.common.exc import http_status_code_message from swh.web.common.utils 
import ( browsers_supported_image_mimes, format_utc_iso_date, reverse, rst_to_html, ) from swh.web.config import get_config def get_directory_entries(sha1_git): """Function that retrieves the content of a directory from the archive. The directories entries are first sorted in lexicographical order. Sub-directories and regular files are then extracted. Args: sha1_git: sha1_git identifier of the directory Returns: A tuple whose first member corresponds to the sub-directories list and second member the regular files list Raises: NotFoundExc if the directory is not found """ cache_entry_id = "directory_entries_%s" % sha1_git cache_entry = cache.get(cache_entry_id) if cache_entry: return cache_entry - entries = list(service.lookup_directory(sha1_git)) + entries = list(archive.lookup_directory(sha1_git)) for e in entries: e["perms"] = stat.filemode(e["perms"]) if e["type"] == "rev": # modify dir entry name to explicitly show it points # to a revision e["name"] = "%s @ %s" % (e["name"], e["target"][:7]) dirs = [e for e in entries if e["type"] in ("dir", "rev")] files = [e for e in entries if e["type"] == "file"] dirs = sorted(dirs, key=lambda d: d["name"]) files = sorted(files, key=lambda f: f["name"]) cache.set(cache_entry_id, (dirs, files)) return dirs, files _lock = Lock() def get_mimetype_and_encoding_for_content(content): """Function that returns the mime type and the encoding associated to a content buffer using the magic module under the hood. Args: content (bytes): a content buffer Returns: A tuple (mimetype, encoding), for instance ('text/plain', 'us-ascii'), associated to the provided content. 
""" m = magic.Magic(mime=True, mime_encoding=True) mime_encoding = m.from_buffer(content) mime_type, encoding = mime_encoding.split(";") encoding = encoding.replace(" charset=", "") return mime_type, encoding # maximum authorized content size in bytes for HTML display # with code highlighting content_display_max_size = get_config()["content_display_max_size"] def _re_encode_content(mimetype, encoding, content_data): # encode textual content to utf-8 if needed if mimetype.startswith("text/"): # probably a malformed UTF-8 content, re-encode it # by replacing invalid chars with a substitution one if encoding == "unknown-8bit": content_data = content_data.decode("utf-8", "replace").encode("utf-8") elif encoding not in ["utf-8", "binary"]: content_data = content_data.decode(encoding, "replace").encode("utf-8") elif mimetype.startswith("application/octet-stream"): # file may detect a text content as binary # so try to decode it for display encodings = ["us-ascii", "utf-8"] encodings += ["iso-8859-%s" % i for i in range(1, 17)] for enc in encodings: try: content_data = content_data.decode(enc).encode("utf-8") except Exception as exc: sentry_sdk.capture_exception(exc) else: # ensure display in content view encoding = enc mimetype = "text/plain" break return mimetype, encoding, content_data def request_content( query_string, max_size=content_display_max_size, raise_if_unavailable=True, re_encode=True, ): """Function that retrieves a content from the archive. Raw bytes content is first retrieved, then the content mime type. If the mime type is not stored in the archive, it will be computed using Python magic module. 
Args: query_string: a string of the form "[ALGO_HASH:]HASH" where optional ALGO_HASH can be either ``sha1``, ``sha1_git``, ``sha256``, or ``blake2s256`` (default to ``sha1``) and HASH the hexadecimal representation of the hash value max_size: the maximum size for a content to retrieve (default to 1MB, no size limit if None) raise_if_unavailable: whether to raise an exception when the content bytes cannot be retrieved (if False, the error is reported in the returned data instead) re_encode: whether to re-encode textual content to UTF-8 Returns: A dict describing the content, with its raw bytes under the key 'raw_data' (None if they were not retrieved) and its mime type under the key 'mimetype' Raises: NotFoundExc if the content is not found """ - content_data = service.lookup_content(query_string) + content_data = archive.lookup_content(query_string) filetype = None language = None # requests to the indexer db may fail so properly handle # those cases in order to avoid content display errors try: - filetype = service.lookup_content_filetype(query_string) - language = service.lookup_content_language(query_string) + filetype = archive.lookup_content_filetype(query_string) + language = archive.lookup_content_language(query_string) except Exception as exc: sentry_sdk.capture_exception(exc) mimetype = "unknown" encoding = "unknown" if filetype: mimetype = filetype["mimetype"] encoding = filetype["encoding"] # workaround when encountering corrupted data due to implicit # conversion from bytea to text in the indexer db (see T818) # TODO: Remove that code when all data have been correctly converted if mimetype.startswith("\\"): filetype = None content_data["error_code"] = 200 content_data["error_message"] = "" content_data["error_description"] = "" if not max_size or content_data["length"] < max_size: try: - content_raw = service.lookup_content_raw(query_string) + content_raw = archive.lookup_content_raw(query_string) except Exception as exc: if raise_if_unavailable: raise exc else: sentry_sdk.capture_exception(exc) content_data["raw_data"] = None content_data["error_code"] = 404 content_data["error_description"] = ( "The bytes of the content are currently not available " "in the archive." 
) content_data["error_message"] = http_status_code_message[ content_data["error_code"] ] else: content_data["raw_data"] = content_raw["data"] if not filetype: mimetype, encoding = get_mimetype_and_encoding_for_content( content_data["raw_data"] ) if re_encode: mimetype, encoding, raw_data = _re_encode_content( mimetype, encoding, content_data["raw_data"] ) content_data["raw_data"] = raw_data else: content_data["raw_data"] = None content_data["mimetype"] = mimetype content_data["encoding"] = encoding if language: content_data["language"] = language["lang"] else: content_data["language"] = "not detected" return content_data def prepare_content_for_display(content_data, mime_type, path): """Function that prepares a content for HTML display. The function tries to associate a programming language to a content in order to perform syntax highlighting client-side using highlightjs. The language is determined using either the content filename or its mime type. If the mime type corresponds to an image format supported by web browsers, the content will be encoded in base64 for displaying the image. Args: content_data (bytes): raw bytes of the content mime_type (string): mime type of the content path (string): path of the content including filename Returns: A dict containing the content bytes (possibly different from the one provided as parameter if it is an image) under the key 'content_data', the corresponding highlightjs language class under the key 'language' and the (possibly adjusted) mime type under the key 'mimetype'. 
""" language = highlightjs.get_hljs_language_from_filename(path) if not language: language = highlightjs.get_hljs_language_from_mime_type(mime_type) if not language: language = "nohighlight" elif mime_type.startswith("application/"): mime_type = mime_type.replace("application/", "text/") if mime_type.startswith("image/"): if mime_type in browsers_supported_image_mimes: content_data = base64.b64encode(content_data).decode("ascii") if mime_type.startswith("image/svg"): mime_type = "image/svg+xml" if mime_type.startswith("text/"): content_data = content_data.decode("utf-8", errors="replace") return {"content_data": content_data, "language": language, "mimetype": mime_type} def gen_link(url, link_text=None, link_attrs=None): """ Utility function for generating an HTML link to insert in Django templates. Args: url (str): a URL link_text (str): optional text for the produced link, if not provided the url will be used link_attrs (dict): optional attributes (e.g. class) to add to the link Returns: An HTML link in the form '<a href="url">link_text</a>' """ attrs = " " if link_attrs: for k, v in link_attrs.items(): attrs += '%s="%s" ' % (k, v) if not link_text: link_text = url link = '<a%shref="%s">%s</a>' % (attrs, escape(url), escape(link_text)) return mark_safe(link) def _snapshot_context_query_params(snapshot_context): query_params = {} if not snapshot_context: return query_params if snapshot_context and snapshot_context["origin_info"]: origin_info = snapshot_context["origin_info"] snp_query_params = snapshot_context["query_params"] query_params = {"origin_url": origin_info["url"]} if "timestamp" in snp_query_params: query_params["timestamp"] = snp_query_params["timestamp"] if "visit_id" in snp_query_params: query_params["visit_id"] = snp_query_params["visit_id"] if "snapshot" in snp_query_params and "visit_id" not in query_params: query_params["snapshot"] = snp_query_params["snapshot"] elif snapshot_context: query_params = {"snapshot": snapshot_context["snapshot_id"]} if snapshot_context["release"]: 
query_params["release"] = snapshot_context["release"] elif snapshot_context["branch"] and snapshot_context["branch"] not in ( "HEAD", snapshot_context["revision_id"], ): query_params["branch"] = snapshot_context["branch"] elif snapshot_context["revision_id"]: query_params["revision"] = snapshot_context["revision_id"] return query_params def gen_revision_url(revision_id, snapshot_context=None): """ Utility function for generating an url to a revision. Args: revision_id (str): a revision id snapshot_context (dict): if provided, generate snapshot-dependent browsing url Returns: str: The url to browse the revision """ query_params = _snapshot_context_query_params(snapshot_context) query_params.pop("revision", None) return reverse( "browse-revision", url_args={"sha1_git": revision_id}, query_params=query_params ) def gen_revision_link( revision_id, shorten_id=False, snapshot_context=None, link_text="Browse", link_attrs={"class": "btn btn-default btn-sm", "role": "button"}, ): """ Utility function for generating a link to a revision HTML view to insert in Django templates. Args: revision_id (str): a revision id shorten_id (boolean): whether to shorten the revision id to 7 characters for the link text snapshot_context (dict): if provided, generate snapshot-dependent browsing link link_text (str): optional text for the generated link (the revision id will be used by default) link_attrs (dict): optional attributes (e.g. 
class) to add to the link Returns: str: An HTML link in the form 'revision_id' """ if not revision_id: return None revision_url = gen_revision_url(revision_id, snapshot_context) if shorten_id: return gen_link(revision_url, revision_id[:7], link_attrs) else: if not link_text: link_text = revision_id return gen_link(revision_url, link_text, link_attrs) def gen_directory_link( sha1_git, snapshot_context=None, link_text="Browse", link_attrs={"class": "btn btn-default btn-sm", "role": "button"}, ): """ Utility function for generating a link to a directory HTML view to insert in Django templates. Args: sha1_git (str): directory identifier link_text (str): optional text for the generated link (the directory id will be used by default) link_attrs (dict): optional attributes (e.g. class) to add to the link Returns: An HTML link in the form 'link_text' """ if not sha1_git: return None query_params = _snapshot_context_query_params(snapshot_context) directory_url = reverse( "browse-directory", url_args={"sha1_git": sha1_git}, query_params=query_params ) if not link_text: link_text = sha1_git return gen_link(directory_url, link_text, link_attrs) def gen_snapshot_link( snapshot_id, snapshot_context=None, link_text="Browse", link_attrs={"class": "btn btn-default btn-sm", "role": "button"}, ): """ Utility function for generating a link to a snapshot HTML view to insert in Django templates. Args: snapshot_id (str): snapshot identifier link_text (str): optional text for the generated link (the snapshot id will be used by default) link_attrs (dict): optional attributes (e.g. 
class) to add to the link Returns: An HTML link in the form 'link_text' """ query_params = _snapshot_context_query_params(snapshot_context) snapshot_url = reverse( "browse-snapshot", url_args={"snapshot_id": snapshot_id}, query_params=query_params, ) if not link_text: link_text = snapshot_id return gen_link(snapshot_url, link_text, link_attrs) def gen_content_link( sha1_git, snapshot_context=None, link_text="Browse", link_attrs={"class": "btn btn-default btn-sm", "role": "button"}, ): """ Utility function for generating a link to a content HTML view to insert in Django templates. Args: sha1_git (str): content identifier link_text (str): optional text for the generated link (the content sha1_git will be used by default) link_attrs (dict): optional attributes (e.g. class) to add to the link Returns: An HTML link in the form 'link_text' """ if not sha1_git: return None query_params = _snapshot_context_query_params(snapshot_context) content_url = reverse( "browse-content", url_args={"query_string": "sha1_git:" + sha1_git}, query_params=query_params, ) if not link_text: link_text = sha1_git return gen_link(content_url, link_text, link_attrs) def get_revision_log_url(revision_id, snapshot_context=None): """ Utility function for getting the URL for a revision log HTML view (possibly in the context of an origin). 
Args: revision_id (str): revision identifier the history heads to snapshot_context (dict): if provided, generate snapshot-dependent browsing link Returns: The revision log view URL """ query_params = {} if snapshot_context: query_params = _snapshot_context_query_params(snapshot_context) query_params["revision"] = revision_id if snapshot_context and snapshot_context["origin_info"]: revision_log_url = reverse("browse-origin-log", query_params=query_params) elif snapshot_context: url_args = {"snapshot_id": snapshot_context["snapshot_id"]} del query_params["snapshot"] revision_log_url = reverse( "browse-snapshot-log", url_args=url_args, query_params=query_params ) else: revision_log_url = reverse( "browse-revision-log", url_args={"sha1_git": revision_id} ) return revision_log_url def gen_revision_log_link( revision_id, snapshot_context=None, link_text="Browse", link_attrs={"class": "btn btn-default btn-sm", "role": "button"}, ): """ Utility function for generating a link to a revision log HTML view (possibly in the context of an origin) to insert in Django templates. Args: revision_id (str): revision identifier the history heads to snapshot_context (dict): if provided, generate snapshot-dependent browsing link link_text (str): optional text to use for the generated link (the revision id will be used by default) link_attrs (dict): optional attributes (e.g. class) to add to the link Returns: An HTML link in the form 'link_text' """ if not revision_id: return None revision_log_url = get_revision_log_url(revision_id, snapshot_context) if not link_text: link_text = revision_id return gen_link(revision_log_url, link_text, link_attrs) def gen_person_mail_link(person, link_text=None): """ Utility function for generating a mail link to a person to insert in Django templates. 
Args: person (dict): dictionary containing person data (*name*, *email*, *fullname*) link_text (str): optional text to use for the generated mail link (the person name will be used by default) Returns: str: A mail link to the person or the person name if no email is present in person data """ person_name = person["name"] or person["fullname"] or "None" if link_text is None: link_text = person_name person_email = person["email"] if person["email"] else None if person_email is None and "@" in person_name and " " not in person_name: person_email = person_name if person_email: return gen_link(url="mailto:%s" % person_email, link_text=link_text) else: return person_name def gen_release_link( sha1_git, snapshot_context=None, link_text="Browse", link_attrs={"class": "btn btn-default btn-sm", "role": "button"}, ): """ Utility function for generating a link to a release HTML view to insert in Django templates. Args: sha1_git (str): release identifier link_text (str): optional text for the generated link (the release id will be used by default) link_attrs (dict): optional attributes (e.g. class) to add to the link Returns: An HTML link in the form 'link_text' """ query_params = _snapshot_context_query_params(snapshot_context) release_url = reverse( "browse-release", url_args={"sha1_git": sha1_git}, query_params=query_params ) if not link_text: link_text = sha1_git return gen_link(release_url, link_text, link_attrs) def format_log_entries(revision_log, per_page, snapshot_context=None): """ Utility functions that process raw revision log data for HTML display. 
Its purpose is to: * add links to relevant browse views * format date in human readable format * truncate the message log Args: revision_log (list): raw revision log as returned by the swh-web api per_page (int): number of log entries per page snapshot_context (dict): if provided, generate snapshot-dependent browsing link """ revision_log_data = [] for i, rev in enumerate(revision_log): if i == per_page: break author_name = "None" author_fullname = "None" committer_fullname = "None" if rev["author"]: author_name = gen_person_mail_link(rev["author"]) author_fullname = rev["author"]["fullname"] if rev["committer"]: committer_fullname = rev["committer"]["fullname"] author_date = format_utc_iso_date(rev["date"]) committer_date = format_utc_iso_date(rev["committer_date"]) tooltip = "revision %s\n" % rev["id"] tooltip += "author: %s\n" % author_fullname tooltip += "author date: %s\n" % author_date tooltip += "committer: %s\n" % committer_fullname tooltip += "committer date: %s\n\n" % committer_date if rev["message"]: tooltip += textwrap.indent(rev["message"], " " * 4) revision_log_data.append( { "author": author_name, "id": rev["id"][:7], "message": rev["message"], "date": author_date, "commit_date": committer_date, "url": gen_revision_url(rev["id"], snapshot_context), "tooltip": tooltip, } ) return revision_log_data # list of common readme names ordered by preference # (lower indices have higher priority) _common_readme_names = [ "readme.markdown", "readme.md", "readme.rst", "readme.txt", "readme", ] def get_readme_to_display(readmes): """ Process a list of readme files found in a directory in order to find the adequate one to display. 
Args: readmes: a dict where keys are readme file names and values are readme sha1s Returns: A tuple (readme_name, readme_url, readme_html) """ readme_name = None readme_url = None readme_sha1 = None readme_html = None lc_readmes = {k.lower(): {"orig_name": k, "sha1": v} for k, v in readmes.items()} # look for readme names according to the preference order # defined by the _common_readme_names list for common_readme_name in _common_readme_names: if common_readme_name in lc_readmes: readme_name = lc_readmes[common_readme_name]["orig_name"] readme_sha1 = lc_readmes[common_readme_name]["sha1"] readme_url = reverse( "browse-content-raw", url_args={"query_string": readme_sha1}, query_params={"re_encode": "true"}, ) break # otherwise pick the first readme like file if any if not readme_name and len(readmes.items()) > 0: readme_name = next(iter(readmes)) readme_sha1 = readmes[readme_name] readme_url = reverse( "browse-content-raw", url_args={"query_string": readme_sha1}, query_params={"re_encode": "true"}, ) # convert rst README to html server side as there is # no viable solution to perform that task client side if readme_name and readme_name.endswith(".rst"): cache_entry_id = "readme_%s" % readme_sha1 cache_entry = cache.get(cache_entry_id) if cache_entry: readme_html = cache_entry else: try: rst_doc = request_content(readme_sha1) readme_html = rst_to_html(rst_doc["raw_data"]) cache.set(cache_entry_id, readme_html) except Exception as exc: sentry_sdk.capture_exception(exc) readme_html = "Readme bytes are not available" return readme_name, readme_url, readme_html diff --git a/swh/web/browse/views/content.py b/swh/web/browse/views/content.py index 9d17693c..eec4958d 100644 --- a/swh/web/browse/views/content.py +++ b/swh/web/browse/views/content.py @@ -1,407 +1,407 @@ # Copyright (C) 2017-2020 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See 
top-level LICENSE file for more information import difflib from distutils.util import strtobool import sentry_sdk from django.http import HttpResponse, JsonResponse from django.shortcuts import render from django.template.defaultfilters import filesizeformat from swh.model.hashutil import hash_to_hex from swh.model.identifiers import CONTENT, DIRECTORY, RELEASE, REVISION, SNAPSHOT from swh.web.browse.browseurls import browse_route from swh.web.browse.snapshot_context import get_snapshot_context from swh.web.browse.utils import ( content_display_max_size, gen_directory_link, gen_link, prepare_content_for_display, request_content, ) -from swh.web.common import highlightjs, query, service +from swh.web.common import archive, highlightjs, query from swh.web.common.exc import NotFoundExc, handle_view_exception from swh.web.common.identifiers import get_swhids_info from swh.web.common.typing import ContentMetadata, SWHObjectInfo from swh.web.common.utils import gen_path_info, reverse, swh_object_icons @browse_route( r"content/(?P<query_string>[0-9a-z_:]*[0-9a-f]+.)/raw/", view_name="browse-content-raw", checksum_args=["query_string"], ) def content_raw(request, query_string): """Django view that produces a raw display of a content identified by its hash value.
The url that points to it is :http:get:`/browse/content/[(algo_hash):](hash)/raw/` """ try: re_encode = bool(strtobool(request.GET.get("re_encode", "false"))) algo, checksum = query.parse_hash(query_string) checksum = hash_to_hex(checksum) content_data = request_content(query_string, max_size=None, re_encode=re_encode) except Exception as exc: return handle_view_exception(request, exc) filename = request.GET.get("filename", None) if not filename: filename = "%s_%s" % (algo, checksum) if ( content_data["mimetype"].startswith("text/") or content_data["mimetype"] == "inode/x-empty" ): response = HttpResponse(content_data["raw_data"], content_type="text/plain") response["Content-disposition"] = "filename=%s" % filename else: response = HttpResponse( content_data["raw_data"], content_type="application/octet-stream" ) response["Content-disposition"] = "attachment; filename=%s" % filename return response _auto_diff_size_limit = 20000 @browse_route( r"content/(?P<from_query_string>.*)/diff/(?P<to_query_string>.*)", view_name="diff-contents", ) def _contents_diff(request, from_query_string, to_query_string): """ Browse endpoint used to compute unified diffs between two contents. Diffs are generated only if the two contents are textual. By default, diffs whose size are greater than 20 kB will not be generated. To force the generation of large diffs, the 'force' boolean query parameter must be used. Args: request: input django http request from_query_string: a string of the form "[ALGO_HASH:]HASH" where optional ALGO_HASH can be either ``sha1``, ``sha1_git``, ``sha256``, or ``blake2s256`` (default to ``sha1``) and HASH the hexadecimal representation of the hash value identifying the first content to_query_string: same as above for identifying the second content Returns: A JSON object containing the unified diff.
""" diff_data = {} content_from = None content_to = None content_from_size = 0 content_to_size = 0 content_from_lines = [] content_to_lines = [] force = request.GET.get("force", "false") path = request.GET.get("path", None) language = "nohighlight" force = bool(strtobool(force)) if from_query_string == to_query_string: diff_str = "File renamed without changes" else: try: text_diff = True if from_query_string: content_from = request_content(from_query_string, max_size=None) content_from_display_data = prepare_content_for_display( content_from["raw_data"], content_from["mimetype"], path ) language = content_from_display_data["language"] content_from_size = content_from["length"] if not ( content_from["mimetype"].startswith("text/") or content_from["mimetype"] == "inode/x-empty" ): text_diff = False if text_diff and to_query_string: content_to = request_content(to_query_string, max_size=None) content_to_display_data = prepare_content_for_display( content_to["raw_data"], content_to["mimetype"], path ) language = content_to_display_data["language"] content_to_size = content_to["length"] if not ( content_to["mimetype"].startswith("text/") or content_to["mimetype"] == "inode/x-empty" ): text_diff = False diff_size = abs(content_to_size - content_from_size) if not text_diff: diff_str = "Diffs are not generated for non textual content" language = "nohighlight" elif not force and diff_size > _auto_diff_size_limit: diff_str = "Large diffs are not automatically computed" language = "nohighlight" else: if content_from: content_from_lines = ( content_from["raw_data"].decode("utf-8").splitlines(True) ) if content_from_lines and content_from_lines[-1][-1] != "\n": content_from_lines[-1] += "[swh-no-nl-marker]\n" if content_to: content_to_lines = ( content_to["raw_data"].decode("utf-8").splitlines(True) ) if content_to_lines and content_to_lines[-1][-1] != "\n": content_to_lines[-1] += "[swh-no-nl-marker]\n" diff_lines = difflib.unified_diff(content_from_lines, content_to_lines) 
diff_str = "".join(list(diff_lines)[2:]) except Exception as exc: sentry_sdk.capture_exception(exc) diff_str = str(exc) diff_data["diff_str"] = diff_str diff_data["language"] = language return JsonResponse(diff_data) @browse_route( r"content/(?P<query_string>[0-9a-z_:]*[0-9a-f]+.)/", view_name="browse-content", checksum_args=["query_string"], ) def content_display(request, query_string): """Django view that produces an HTML display of a content identified by its hash value. The url that points to it is :http:get:`/browse/content/[(algo_hash):](hash)/` """ try: algo, checksum = query.parse_hash(query_string) checksum = hash_to_hex(checksum) content_data = request_content(query_string, raise_if_unavailable=False) origin_url = request.GET.get("origin_url") selected_language = request.GET.get("language") if not origin_url: origin_url = request.GET.get("origin") snapshot_id = request.GET.get("snapshot") path = request.GET.get("path") snapshot_context = None if origin_url is not None or snapshot_id is not None: try: snapshot_context = get_snapshot_context( origin_url=origin_url, snapshot_id=snapshot_id, branch_name=request.GET.get("branch"), release_name=request.GET.get("release"), revision_id=request.GET.get("revision"), path=path, browse_context=CONTENT, ) except NotFoundExc as e: if str(e).startswith("Origin"): raw_cnt_url = reverse( "browse-content", url_args={"query_string": query_string} ) error_message = ( "The Software Heritage archive has a content " "with the hash you provided but the origin " "mentioned in your request appears broken: %s. 
" "Please check the URL and try again.\n\n" "Nevertheless, you can still browse the content " "without origin information: %s" % (gen_link(origin_url), gen_link(raw_cnt_url)) ) raise NotFoundExc(error_message) else: raise e except Exception as exc: return handle_view_exception(request, exc) content = None language = None mimetype = None if content_data["raw_data"] is not None: content_display_data = prepare_content_for_display( content_data["raw_data"], content_data["mimetype"], path ) content = content_display_data["content_data"] language = content_display_data["language"] mimetype = content_display_data["mimetype"] # Override language with user-selected language if selected_language is not None: language = selected_language available_languages = None if mimetype and "text/" in mimetype: available_languages = highlightjs.get_supported_languages() filename = None path_info = None directory_id = None directory_url = None root_dir = None if snapshot_context: root_dir = snapshot_context.get("root_directory") query_params = snapshot_context["query_params"] if snapshot_context else {} breadcrumbs = [] if path: split_path = path.split("/") root_dir = root_dir or split_path[0] filename = split_path[-1] if root_dir != path: path = path.replace(root_dir + "/", "") path = path[: -len(filename)] path_info = gen_path_info(path) query_params.pop("path", None) dir_url = reverse( "browse-directory", url_args={"sha1_git": root_dir}, query_params=query_params, ) breadcrumbs.append({"name": root_dir[:7], "url": dir_url}) for pi in path_info: query_params["path"] = pi["path"] dir_url = reverse( "browse-directory", url_args={"sha1_git": root_dir}, query_params=query_params, ) breadcrumbs.append({"name": pi["name"], "url": dir_url}) breadcrumbs.append({"name": filename, "url": None}) if path and root_dir != path: try: - dir_info = service.lookup_directory_with_path(root_dir, path) + dir_info = archive.lookup_directory_with_path(root_dir, path) directory_id = dir_info["target"] except 
Exception as exc: return handle_view_exception(request, exc) elif root_dir != path: directory_id = root_dir else: root_dir = None if directory_id: directory_url = gen_directory_link(directory_id) query_params = {"filename": filename} content_checksums = content_data["checksums"] content_url = reverse( "browse-content", url_args={"query_string": f'sha1_git:{content_checksums["sha1_git"]}'}, ) content_raw_url = reverse( "browse-content-raw", url_args={"query_string": query_string}, query_params=query_params, ) content_metadata = ContentMetadata( object_type=CONTENT, object_id=content_checksums["sha1_git"], sha1=content_checksums["sha1"], sha1_git=content_checksums["sha1_git"], sha256=content_checksums["sha256"], blake2s256=content_checksums["blake2s256"], content_url=content_url, mimetype=content_data["mimetype"], encoding=content_data["encoding"], size=filesizeformat(content_data["length"]), language=content_data["language"], root_directory=root_dir, path=f"/{path}" if path else None, filename=filename or "", directory=directory_id, directory_url=directory_url, revision=None, release=None, snapshot=None, origin_url=origin_url, ) swh_objects = [ SWHObjectInfo(object_type=CONTENT, object_id=content_checksums["sha1_git"]) ] if directory_id: swh_objects.append(SWHObjectInfo(object_type=DIRECTORY, object_id=directory_id)) if snapshot_context: swh_objects.append( SWHObjectInfo( object_type=REVISION, object_id=snapshot_context["revision_id"] ) ) swh_objects.append( SWHObjectInfo( object_type=SNAPSHOT, object_id=snapshot_context["snapshot_id"] ) ) if snapshot_context["release_id"]: swh_objects.append( SWHObjectInfo( object_type=RELEASE, object_id=snapshot_context["release_id"] ) ) swhids_info = get_swhids_info( swh_objects, snapshot_context, extra_context=content_metadata, ) heading = "Content - %s" % content_checksums["sha1_git"] if breadcrumbs: content_path = "/".join([bc["name"] for bc in breadcrumbs]) heading += " - %s" % content_path return render( request, 
"browse/content.html", { "heading": heading, "swh_object_id": swhids_info[0]["swhid"], "swh_object_name": "Content", "swh_object_metadata": content_metadata, "content": content, "content_size": content_data["length"], "max_content_size": content_display_max_size, "filename": filename, "encoding": content_data["encoding"], "mimetype": mimetype, "language": language, "available_languages": available_languages, "breadcrumbs": breadcrumbs, "top_right_link": { "url": content_raw_url, "icon": swh_object_icons["content"], "text": "Raw File", }, "snapshot_context": snapshot_context, "vault_cooking": None, "show_actions": True, "swhids_info": swhids_info, "error_code": content_data["error_code"], "error_message": content_data["error_message"], "error_description": content_data["error_description"], }, status=content_data["error_code"], ) diff --git a/swh/web/browse/views/directory.py b/swh/web/browse/views/directory.py index b4bcf3a5..ffb223b8 100644 --- a/swh/web/browse/views/directory.py +++ b/swh/web/browse/views/directory.py @@ -1,272 +1,272 @@ # Copyright (C) 2017-2020 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import os import sentry_sdk from django.http import HttpResponse from django.shortcuts import redirect, render from django.template.defaultfilters import filesizeformat from swh.model.identifiers import DIRECTORY, RELEASE, REVISION, SNAPSHOT from swh.web.browse.browseurls import browse_route from swh.web.browse.snapshot_context import get_snapshot_context from swh.web.browse.utils import gen_link, get_directory_entries, get_readme_to_display -from swh.web.common import service +from swh.web.common import archive from swh.web.common.exc import NotFoundExc, handle_view_exception from swh.web.common.identifiers import get_swhids_info from swh.web.common.typing import 
DirectoryMetadata, SWHObjectInfo from swh.web.common.utils import gen_path_info, reverse, swh_object_icons def _directory_browse(request, sha1_git, path=None): root_sha1_git = sha1_git try: if path: - dir_info = service.lookup_directory_with_path(sha1_git, path) + dir_info = archive.lookup_directory_with_path(sha1_git, path) sha1_git = dir_info["target"] dirs, files = get_directory_entries(sha1_git) origin_url = request.GET.get("origin_url") if not origin_url: origin_url = request.GET.get("origin") snapshot_id = request.GET.get("snapshot") snapshot_context = None if origin_url is not None or snapshot_id is not None: try: snapshot_context = get_snapshot_context( snapshot_id=snapshot_id, origin_url=origin_url, branch_name=request.GET.get("branch"), release_name=request.GET.get("release"), revision_id=request.GET.get("revision"), path=path, ) except NotFoundExc as e: if str(e).startswith("Origin"): raw_dir_url = reverse( "browse-directory", url_args={"sha1_git": sha1_git} ) error_message = ( "The Software Heritage archive has a directory " "with the hash you provided but the origin " "mentioned in your request appears broken: %s. 
" "Please check the URL and try again.\n\n" "Nevertheless, you can still browse the directory " "without origin information: %s" % (gen_link(origin_url), gen_link(raw_dir_url)) ) raise NotFoundExc(error_message) else: raise e except Exception as exc: return handle_view_exception(request, exc) path_info = gen_path_info(path) query_params = snapshot_context["query_params"] if snapshot_context else {} breadcrumbs = [] breadcrumbs.append( { "name": root_sha1_git[:7], "url": reverse( "browse-directory", url_args={"sha1_git": root_sha1_git}, query_params=query_params, ), } ) for pi in path_info: breadcrumbs.append( { "name": pi["name"], "url": reverse( "browse-directory", url_args={"sha1_git": root_sha1_git}, query_params={"path": pi["path"], **query_params}, ), } ) path = "" if path is None else (path + "/") for d in dirs: if d["type"] == "rev": d["url"] = reverse( "browse-revision", url_args={"sha1_git": d["target"]}, query_params=query_params, ) else: d["url"] = reverse( "browse-directory", url_args={"sha1_git": root_sha1_git}, query_params={"path": path + d["name"], **query_params}, ) sum_file_sizes = 0 readmes = {} for f in files: query_string = "sha1_git:" + f["target"] f["url"] = reverse( "browse-content", url_args={"query_string": query_string}, query_params={ "path": root_sha1_git + "/" + path + f["name"], **query_params, }, ) if f["length"] is not None: sum_file_sizes += f["length"] f["length"] = filesizeformat(f["length"]) if f["name"].lower().startswith("readme"): readmes[f["name"]] = f["checksums"]["sha1"] readme_name, readme_url, readme_html = get_readme_to_display(readmes) sum_file_sizes = filesizeformat(sum_file_sizes) dir_metadata = DirectoryMetadata( object_type=DIRECTORY, object_id=sha1_git, directory=root_sha1_git, nb_files=len(files), nb_dirs=len(dirs), sum_file_sizes=sum_file_sizes, root_directory=root_sha1_git, path=f"/{path}" if path else None, revision=None, revision_found=None, release=None, snapshot=None, ) vault_cooking = { 
"directory_context": True, "directory_id": sha1_git, "revision_context": False, "revision_id": None, } swh_objects = [SWHObjectInfo(object_type=DIRECTORY, object_id=sha1_git)] if snapshot_context: swh_objects.append( SWHObjectInfo( object_type=REVISION, object_id=snapshot_context["revision_id"] ) ) swh_objects.append( SWHObjectInfo( object_type=SNAPSHOT, object_id=snapshot_context["snapshot_id"] ) ) if snapshot_context["release_id"]: swh_objects.append( SWHObjectInfo( object_type=RELEASE, object_id=snapshot_context["release_id"] ) ) swhids_info = get_swhids_info(swh_objects, snapshot_context, dir_metadata) heading = "Directory - %s" % sha1_git if breadcrumbs: dir_path = "/".join([bc["name"] for bc in breadcrumbs]) + "/" heading += " - %s" % dir_path top_right_link = None if snapshot_context is not None and not snapshot_context["is_empty"]: history_url = reverse( "browse-revision-log", url_args={"sha1_git": snapshot_context["revision_id"]}, query_params=query_params, ) top_right_link = { "url": history_url, "icon": swh_object_icons["revisions history"], "text": "History", } return render( request, "browse/directory.html", { "heading": heading, "swh_object_id": swhids_info[0]["swhid"], "swh_object_name": "Directory", "swh_object_metadata": dir_metadata, "dirs": dirs, "files": files, "breadcrumbs": breadcrumbs, "top_right_link": top_right_link, "readme_name": readme_name, "readme_url": readme_url, "readme_html": readme_html, "snapshot_context": snapshot_context, "vault_cooking": vault_cooking, "show_actions": True, "swhids_info": swhids_info, }, ) @browse_route( r"directory/(?P<sha1_git>[0-9a-f]+)/", view_name="browse-directory", checksum_args=["sha1_git"], ) def directory_browse(request, sha1_git): """Django view for browsing the content of a directory identified by its sha1_git value.
The url that points to it is :http:get:`/browse/directory/(sha1_git)/` """ return _directory_browse(request, sha1_git, request.GET.get("path")) @browse_route( r"directory/(?P<sha1_git>[0-9a-f]+)/(?P<path>.+)/", view_name="browse-directory-legacy", checksum_args=["sha1_git"], ) def directory_browse_legacy(request, sha1_git, path): """Django view for browsing the content of a directory identified by its sha1_git value. The url that points to it is :http:get:`/browse/directory/(sha1_git)/(path)/` """ return _directory_browse(request, sha1_git, path) @browse_route( r"directory/resolve/content-path/(?P<sha1_git>[0-9a-f]+)/", view_name="browse-directory-resolve-content-path", checksum_args=["sha1_git"], ) def _directory_resolve_content_path(request, sha1_git): """ Internal endpoint redirecting to data url for a specific file path relative to a root directory. """ try: path = os.path.normpath(request.GET.get("path")) if not path.startswith("../"): - dir_info = service.lookup_directory_with_path(sha1_git, path) + dir_info = archive.lookup_directory_with_path(sha1_git, path) if dir_info["type"] == "file": sha1 = dir_info["checksums"]["sha1"] data_url = reverse( "browse-content-raw", url_args={"query_string": sha1} ) return redirect(data_url) except Exception as exc: sentry_sdk.capture_exception(exc) return HttpResponse(status=404) diff --git a/swh/web/browse/views/origin.py b/swh/web/browse/views/origin.py index 1ebcee5d..a8cac672 100644 --- a/swh/web/browse/views/origin.py +++ b/swh/web/browse/views/origin.py @@ -1,319 +1,319 @@ # Copyright (C) 2017-2020 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from django.shortcuts import redirect, render from swh.web.browse.browseurls import browse_route from swh.web.browse.snapshot_context import ( browse_snapshot_branches, browse_snapshot_content, browse_snapshot_directory,
browse_snapshot_log, browse_snapshot_releases, get_snapshot_context, ) -from swh.web.common import service +from swh.web.common import archive from swh.web.common.exc import BadInputExc, handle_view_exception from swh.web.common.origin_visits import get_origin_visits from swh.web.common.utils import format_utc_iso_date, parse_iso8601_date_to_utc, reverse @browse_route( r"origin/directory/", view_name="browse-origin-directory", ) def origin_directory_browse(request): """Django view for browsing the content of a directory associated to an origin for a given visit. The URL that points to it is :http:get:`/browse/origin/directory/` """ return browse_snapshot_directory( request, origin_url=request.GET.get("origin_url"), snapshot_id=request.GET.get("snapshot"), timestamp=request.GET.get("timestamp"), path=request.GET.get("path"), ) @browse_route( r"origin/(?P<origin_url>.+)/visit/(?P<timestamp>.+)/directory/", r"origin/(?P<origin_url>.+)/visit/(?P<timestamp>.+)/directory/(?P<path>.+)/", r"origin/(?P<origin_url>.+)/directory/(?P<path>.+)/", r"origin/(?P<origin_url>.+)/directory/", view_name="browse-origin-directory-legacy", ) def origin_directory_browse_legacy(request, origin_url, timestamp=None, path=None): """Django view for browsing the content of a directory associated to an origin for a given visit. The URLs that point to it are :http:get:`/browse/origin/(origin_url)/directory/[(path)/]` and :http:get:`/browse/origin/(origin_url)/visit/(timestamp)/directory/[(path)/]` """ return browse_snapshot_directory( request, origin_url=origin_url, snapshot_id=request.GET.get("snapshot"), timestamp=timestamp, path=path, ) @browse_route( r"origin/content/", view_name="browse-origin-content", ) def origin_content_browse(request): """Django view that produces an HTML display of a content associated to an origin for a given visit.
The URL that points to it is :http:get:`/browse/origin/content/` """ return browse_snapshot_content( request, origin_url=request.GET.get("origin_url"), snapshot_id=request.GET.get("snapshot"), timestamp=request.GET.get("timestamp"), path=request.GET.get("path"), selected_language=request.GET.get("language"), ) @browse_route( r"origin/(?P<origin_url>.+)/visit/(?P<timestamp>.+)/content/(?P<path>.+)/", r"origin/(?P<origin_url>.+)/content/(?P<path>.+)/", r"origin/(?P<origin_url>.+)/content/", view_name="browse-origin-content-legacy", ) def origin_content_browse_legacy(request, origin_url, path=None, timestamp=None): """Django view that produces an HTML display of a content associated to an origin for a given visit. The URLs that point to it are :http:get:`/browse/origin/(origin_url)/content/(path)/` and :http:get:`/browse/origin/(origin_url)/visit/(timestamp)/content/(path)/` """ return browse_snapshot_content( request, origin_url=origin_url, snapshot_id=request.GET.get("snapshot"), timestamp=timestamp, path=path, selected_language=request.GET.get("language"), ) @browse_route( r"origin/log/", view_name="browse-origin-log", ) def origin_log_browse(request): """Django view that produces an HTML display of revisions history (aka the commit log) associated to a software origin. The URL that points to it is :http:get:`/browse/origin/log/` """ return browse_snapshot_log( request, origin_url=request.GET.get("origin_url"), snapshot_id=request.GET.get("snapshot"), timestamp=request.GET.get("timestamp"), ) @browse_route( r"origin/(?P<origin_url>.+)/visit/(?P<timestamp>.+)/log/", r"origin/(?P<origin_url>.+)/log/", view_name="browse-origin-log-legacy", ) def origin_log_browse_legacy(request, origin_url, timestamp=None): """Django view that produces an HTML display of revisions history (aka the commit log) associated to a software origin.
The URLs that point to it are :http:get:`/browse/origin/(origin_url)/log/` and :http:get:`/browse/origin/(origin_url)/visit/(timestamp)/log/` """ return browse_snapshot_log( request, origin_url=origin_url, snapshot_id=request.GET.get("snapshot"), timestamp=timestamp, ) @browse_route( r"origin/branches/", view_name="browse-origin-branches", ) def origin_branches_browse(request): """Django view that produces an HTML display of the list of branches associated to an origin for a given visit. The URL that points to it is :http:get:`/browse/origin/branches/` """ return browse_snapshot_branches( request, origin_url=request.GET.get("origin_url"), snapshot_id=request.GET.get("snapshot"), timestamp=request.GET.get("timestamp"), ) @browse_route( r"origin/(?P<origin_url>.+)/visit/(?P<timestamp>.+)/branches/", r"origin/(?P<origin_url>.+)/branches/", view_name="browse-origin-branches-legacy", ) def origin_branches_browse_legacy(request, origin_url, timestamp=None): """Django view that produces an HTML display of the list of branches associated to an origin for a given visit. The URLs that point to it are :http:get:`/browse/origin/(origin_url)/branches/` and :http:get:`/browse/origin/(origin_url)/visit/(timestamp)/branches/` """ return browse_snapshot_branches( request, origin_url=origin_url, snapshot_id=request.GET.get("snapshot"), timestamp=timestamp, ) @browse_route( r"origin/releases/", view_name="browse-origin-releases", ) def origin_releases_browse(request): """Django view that produces an HTML display of the list of releases associated to an origin for a given visit.
The URL that points to it is :http:get:`/browse/origin/releases/` """ return browse_snapshot_releases( request, origin_url=request.GET.get("origin_url"), snapshot_id=request.GET.get("snapshot"), timestamp=request.GET.get("timestamp"), ) @browse_route( r"origin/(?P<origin_url>.+)/visit/(?P<timestamp>.+)/releases/", r"origin/(?P<origin_url>.+)/releases/", view_name="browse-origin-releases-legacy", ) def origin_releases_browse_legacy(request, origin_url, timestamp=None): """Django view that produces an HTML display of the list of releases associated to an origin for a given visit. The URLs that point to it are :http:get:`/browse/origin/(origin_url)/releases/` and :http:get:`/browse/origin/(origin_url)/visit/(timestamp)/releases/` """ return browse_snapshot_releases( request, origin_url=origin_url, snapshot_id=request.GET.get("snapshot"), timestamp=timestamp, ) def _origin_visits_browse(request, origin_url): try: if origin_url is None: raise BadInputExc("An origin URL must be provided as query parameter.") - origin_info = service.lookup_origin({"url": origin_url}) + origin_info = archive.lookup_origin({"url": origin_url}) origin_visits = get_origin_visits(origin_info) snapshot_context = get_snapshot_context(origin_url=origin_url) except Exception as exc: return handle_view_exception(request, exc) for i, visit in enumerate(origin_visits): url_date = format_utc_iso_date(visit["date"], "%Y-%m-%dT%H:%M:%SZ") visit["formatted_date"] = format_utc_iso_date(visit["date"]) query_params = {"origin_url": origin_url, "timestamp": url_date} if i < len(origin_visits) - 1: if visit["date"] == origin_visits[i + 1]["date"]: query_params = {"visit_id": visit["visit"]} if i > 0: if visit["date"] == origin_visits[i - 1]["date"]: query_params = {"visit_id": visit["visit"]} snapshot = visit["snapshot"] if visit["snapshot"] else "" visit["url"] = reverse("browse-origin-directory", query_params=query_params,) if not snapshot: visit["snapshot"] = "" visit["date"] = parse_iso8601_date_to_utc(visit["date"]).timestamp() heading =
"Origin visits - %s" % origin_url return render( request, "browse/origin-visits.html", { "heading": heading, "swh_object_name": "Visits", "swh_object_metadata": origin_info, "origin_visits": origin_visits, "origin_info": origin_info, "snapshot_context": snapshot_context, "vault_cooking": None, "show_actions": False, }, ) @browse_route(r"origin/visits/", view_name="browse-origin-visits") def origin_visits_browse(request): """Django view that produces an HTML display of visits reporting for a given origin. The URL that points to it is :http:get:`/browse/origin/visits/`. """ return _origin_visits_browse(request, request.GET.get("origin_url")) @browse_route( r"origin/(?P<origin_url>.+)/visits/", view_name="browse-origin-visits-legacy" ) def origin_visits_browse_legacy(request, origin_url): """Django view that produces an HTML display of visits reporting for a given origin. The URL that points to it is :http:get:`/browse/origin/(origin_url)/visits/`. """ return _origin_visits_browse(request, origin_url) @browse_route(r"origin/", view_name="browse-origin") def origin_browse(request): """Django view that redirects to the display of the latest archived snapshot for a given software origin. """ last_snapshot_url = reverse("browse-origin-directory", query_params=request.GET,) return redirect(last_snapshot_url) @browse_route(r"origin/(?P<origin_url>.+)/", view_name="browse-origin-legacy") def origin_browse_legacy(request, origin_url): """Django view that redirects to the display of the latest archived snapshot for a given software origin.
""" last_snapshot_url = reverse( "browse-origin-directory", query_params={"origin_url": origin_url, **request.GET}, ) return redirect(last_snapshot_url) diff --git a/swh/web/browse/views/release.py b/swh/web/browse/views/release.py index c3162865..4f7a5b9e 100644 --- a/swh/web/browse/views/release.py +++ b/swh/web/browse/views/release.py @@ -1,242 +1,242 @@ # Copyright (C) 2017-2020 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import sentry_sdk from django.shortcuts import render from swh.model.identifiers import CONTENT, DIRECTORY, RELEASE, REVISION, SNAPSHOT from swh.web.browse.browseurls import browse_route from swh.web.browse.snapshot_context import get_snapshot_context from swh.web.browse.utils import ( gen_content_link, gen_directory_link, gen_link, gen_person_mail_link, gen_release_link, gen_revision_link, gen_snapshot_link, ) -from swh.web.common import service +from swh.web.common import archive from swh.web.common.exc import NotFoundExc, handle_view_exception from swh.web.common.identifiers import get_swhids_info from swh.web.common.typing import ReleaseMetadata, SWHObjectInfo from swh.web.common.utils import format_utc_iso_date, reverse @browse_route( r"release/(?P<sha1_git>[0-9a-f]+)/", view_name="browse-release", checksum_args=["sha1_git"], ) def release_browse(request, sha1_git): """ Django view that produces an HTML display of a release identified by its id. The url that points to it is :http:get:`/browse/release/(sha1_git)/`. 
""" try: - release = service.lookup_release(sha1_git) + release = archive.lookup_release(sha1_git) snapshot_context = {} origin_info = None snapshot_id = request.GET.get("snapshot_id") if not snapshot_id: snapshot_id = request.GET.get("snapshot") origin_url = request.GET.get("origin_url") if not origin_url: origin_url = request.GET.get("origin") timestamp = request.GET.get("timestamp") visit_id = request.GET.get("visit_id") if origin_url: try: snapshot_context = get_snapshot_context( snapshot_id, origin_url, timestamp, visit_id ) except NotFoundExc as e: raw_rel_url = reverse("browse-release", url_args={"sha1_git": sha1_git}) error_message = ( "The Software Heritage archive has a release " "with the hash you provided but the origin " "mentioned in your request appears broken: %s. " "Please check the URL and try again.\n\n" "Nevertheless, you can still browse the release " "without origin information: %s" % (gen_link(origin_url), gen_link(raw_rel_url)) ) if str(e).startswith("Origin"): raise NotFoundExc(error_message) else: raise e origin_info = snapshot_context["origin_info"] elif snapshot_id: snapshot_context = get_snapshot_context(snapshot_id) except Exception as exc: return handle_view_exception(request, exc) target_url = None if release["target_type"] == REVISION: target_url = gen_revision_link(release["target"]) elif release["target_type"] == CONTENT: target_url = gen_content_link(release["target"]) elif release["target_type"] == DIRECTORY: target_url = gen_directory_link(release["target"]) elif release["target_type"] == RELEASE: target_url = gen_release_link(release["target"]) snapshot_id = None browse_snp_link = None if snapshot_context: snapshot_id = snapshot_context["snapshot_id"] browse_snp_link = gen_snapshot_link(snapshot_id) release_metadata = ReleaseMetadata( object_type=RELEASE, object_id=sha1_git, release=sha1_git, release_url=gen_release_link(release["id"]), author=release["author"]["fullname"] if release["author"] else "None", 
author_url=gen_person_mail_link(release["author"]) if release["author"] else "None", date=format_utc_iso_date(release["date"]), name=release["name"], synthetic=release["synthetic"], target=release["target"], target_type=release["target_type"], target_url=target_url, snapshot=snapshot_context.get("snapshot_id", None), snapshot_url=browse_snp_link, origin_url=origin_url, ) release_note_lines = [] if release["message"]: release_note_lines = release["message"].split("\n") vault_cooking = None rev_directory = None target_link = None if release["target_type"] == REVISION: target_link = gen_revision_link( release["target"], snapshot_context=snapshot_context, link_text=None, link_attrs=None, ) try: - revision = service.lookup_revision(release["target"]) + revision = archive.lookup_revision(release["target"]) rev_directory = revision["directory"] vault_cooking = { "directory_context": True, "directory_id": rev_directory, "revision_context": True, "revision_id": release["target"], } except Exception as exc: sentry_sdk.capture_exception(exc) elif release["target_type"] == DIRECTORY: target_link = gen_directory_link( release["target"], snapshot_context=snapshot_context, link_text=None, link_attrs=None, ) try: # check directory exists - service.lookup_directory(release["target"]) + archive.lookup_directory(release["target"]) vault_cooking = { "directory_context": True, "directory_id": release["target"], "revision_context": False, "revision_id": None, } except Exception as exc: sentry_sdk.capture_exception(exc) elif release["target_type"] == CONTENT: target_link = gen_content_link( release["target"], snapshot_context=snapshot_context, link_text=None, link_attrs=None, ) elif release["target_type"] == RELEASE: target_link = gen_release_link( release["target"], snapshot_context=snapshot_context, link_text=None, link_attrs=None, ) rev_directory_url = None if rev_directory is not None: if origin_info: rev_directory_url = reverse( "browse-origin-directory", query_params={ 
"origin_url": origin_info["url"], "release": release["name"], "snapshot": snapshot_id, }, ) elif snapshot_id: rev_directory_url = reverse( "browse-snapshot-directory", url_args={"snapshot_id": snapshot_id}, query_params={"release": release["name"]}, ) else: rev_directory_url = reverse( "browse-directory", url_args={"sha1_git": rev_directory} ) directory_link = None if rev_directory_url is not None: directory_link = gen_link(rev_directory_url, rev_directory) release["directory_link"] = directory_link release["target_link"] = target_link swh_objects = [SWHObjectInfo(object_type=RELEASE, object_id=sha1_git)] if snapshot_context: snapshot_id = snapshot_context["snapshot_id"] if snapshot_id: swh_objects.append(SWHObjectInfo(object_type=SNAPSHOT, object_id=snapshot_id)) swhids_info = get_swhids_info(swh_objects, snapshot_context) note_header = "None" if len(release_note_lines) > 0: note_header = release_note_lines[0] release["note_header"] = note_header release["note_body"] = "\n".join(release_note_lines[1:]) heading = "Release - %s" % release["name"] if snapshot_context: context_found = "snapshot: %s" % snapshot_context["snapshot_id"] if origin_info: context_found = "origin: %s" % origin_info["url"] heading += " - %s" % context_found return render( request, "browse/release.html", { "heading": heading, "swh_object_id": swhids_info[0]["swhid"], "swh_object_name": "Release", "swh_object_metadata": release_metadata, "release": release, "snapshot_context": snapshot_context, "show_actions": True, "breadcrumbs": None, "vault_cooking": vault_cooking, "top_right_link": None, "swhids_info": swhids_info, }, ) diff --git a/swh/web/browse/views/revision.py b/swh/web/browse/views/revision.py index 915dc45e..493a4198 100644 --- a/swh/web/browse/views/revision.py +++ b/swh/web/browse/views/revision.py @@ -1,599 +1,599 @@ # Copyright (C) 2017-2020 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public 
License version 3, or any later version # See top-level LICENSE file for more information import hashlib import json import textwrap from django.http import JsonResponse from django.shortcuts import render from django.template.defaultfilters import filesizeformat from django.utils.safestring import mark_safe from swh.model.identifiers import CONTENT, DIRECTORY, REVISION, SNAPSHOT, swhid from swh.web.browse.browseurls import browse_route from swh.web.browse.snapshot_context import get_snapshot_context from swh.web.browse.utils import ( content_display_max_size, format_log_entries, gen_directory_link, gen_link, gen_person_mail_link, gen_revision_link, gen_revision_url, gen_snapshot_link, get_directory_entries, get_readme_to_display, get_revision_log_url, prepare_content_for_display, request_content, ) -from swh.web.common import service +from swh.web.common import archive from swh.web.common.exc import NotFoundExc, handle_view_exception from swh.web.common.identifiers import get_swhids_info from swh.web.common.typing import RevisionMetadata, SWHObjectInfo from swh.web.common.utils import ( format_utc_iso_date, gen_path_info, reverse, swh_object_icons, ) def _gen_content_url(revision, query_string, path, snapshot_context): if snapshot_context: query_params = snapshot_context["query_params"] query_params["path"] = path query_params["revision"] = revision["id"] content_url = reverse("browse-origin-content", query_params=query_params) else: content_path = "%s/%s" % (revision["directory"], path) content_url = reverse( "browse-content", url_args={"query_string": query_string}, query_params={"path": content_path}, ) return content_url def _gen_diff_link(idx, diff_anchor, link_text): if idx < _max_displayed_file_diffs: return gen_link(diff_anchor, link_text) else: return link_text # TODO: put in conf _max_displayed_file_diffs = 1000 def _gen_revision_changes_list(revision, changes, snapshot_context): """ Returns a HTML string describing the file changes introduced in a 
revision. As this string will be displayed in the browse revision view, links to adequate file diffs are also generated. Args: revision (str): hexadecimal representation of a revision identifier changes (list): list of file changes in the revision snapshot_context (dict): optional origin context used to reverse the content urls Returns: A string to insert in a revision HTML view. """ changes_msg = [] for i, change in enumerate(changes): hasher = hashlib.sha1() from_query_string = "" to_query_string = "" diff_id = "diff-" if change["from"]: from_query_string = "sha1_git:" + change["from"]["target"] diff_id += change["from"]["target"] + "-" + change["from_path"] diff_id += "-" if change["to"]: to_query_string = "sha1_git:" + change["to"]["target"] diff_id += change["to"]["target"] + change["to_path"] change["path"] = change["to_path"] or change["from_path"] url_args = { "from_query_string": from_query_string, "to_query_string": to_query_string, } query_params = {"path": change["path"]} change["diff_url"] = reverse( "diff-contents", url_args=url_args, query_params=query_params ) hasher.update(diff_id.encode("utf-8")) diff_id = hasher.hexdigest() change["id"] = diff_id diff_link = "#diff_" + diff_id if change["type"] == "modify": change["content_url"] = _gen_content_url( revision, to_query_string, change["to_path"], snapshot_context ) changes_msg.append( "modified: %s" % _gen_diff_link(i, diff_link, change["to_path"]) ) elif change["type"] == "insert": change["content_url"] = _gen_content_url( revision, to_query_string, change["to_path"], snapshot_context ) changes_msg.append( "new file: %s" % _gen_diff_link(i, diff_link, change["to_path"]) ) elif change["type"] == "delete": - parent = service.lookup_revision(revision["parents"][0]) + parent = archive.lookup_revision(revision["parents"][0]) change["content_url"] = _gen_content_url( parent, from_query_string, change["from_path"], snapshot_context ) changes_msg.append( "deleted: %s" % _gen_diff_link(i, diff_link, 
change["from_path"]) ) elif change["type"] == "rename": change["content_url"] = _gen_content_url( revision, to_query_string, change["to_path"], snapshot_context ) link_text = change["from_path"] + " → " + change["to_path"] changes_msg.append( "renamed: %s" % _gen_diff_link(i, diff_link, link_text) ) if not changes: changes_msg.append("No changes") return mark_safe("\n".join(changes_msg)) @browse_route( r"revision/(?P<sha1_git>[0-9a-f]+)/diff/", view_name="diff-revision", checksum_args=["sha1_git"], ) def _revision_diff(request, sha1_git): """ Browse internal endpoint to compute revision diff """ try: - revision = service.lookup_revision(sha1_git) + revision = archive.lookup_revision(sha1_git) snapshot_context = None origin_url = request.GET.get("origin_url", None) if not origin_url: origin_url = request.GET.get("origin", None) timestamp = request.GET.get("timestamp", None) visit_id = request.GET.get("visit_id", None) if origin_url: snapshot_context = get_snapshot_context( origin_url=origin_url, timestamp=timestamp, visit_id=visit_id ) except Exception as exc: return handle_view_exception(request, exc) - changes = service.diff_revision(sha1_git) + changes = archive.diff_revision(sha1_git) changes_msg = _gen_revision_changes_list(revision, changes, snapshot_context) diff_data = { "total_nb_changes": len(changes), "changes": changes[:_max_displayed_file_diffs], "changes_msg": changes_msg, } return JsonResponse(diff_data) NB_LOG_ENTRIES = 100 @browse_route( r"revision/(?P<sha1_git>[0-9a-f]+)/log/", view_name="browse-revision-log", checksum_args=["sha1_git"], ) def revision_log_browse(request, sha1_git): """ Django view that produces an HTML display of the history log for a revision identified by its id. 
The url that points to it is :http:get:`/browse/revision/(sha1_git)/log/` """ try: origin_url = request.GET.get("origin_url") snapshot_id = request.GET.get("snapshot") snapshot_context = None if origin_url or snapshot_id: snapshot_context = get_snapshot_context( snapshot_id=snapshot_id, origin_url=origin_url, timestamp=request.GET.get("timestamp"), visit_id=request.GET.get("visit_id"), branch_name=request.GET.get("branch"), release_name=request.GET.get("release"), revision_id=sha1_git, ) per_page = int(request.GET.get("per_page", NB_LOG_ENTRIES)) offset = int(request.GET.get("offset", 0)) revs_ordering = request.GET.get("revs_ordering", "committer_date") session_key = "rev_%s_log_ordering_%s" % (sha1_git, revs_ordering) rev_log_session = request.session.get(session_key, None) rev_log = [] revs_walker_state = None if rev_log_session: rev_log = rev_log_session["rev_log"] revs_walker_state = rev_log_session["revs_walker_state"] if len(rev_log) < offset + per_page: - revs_walker = service.get_revisions_walker( + revs_walker = archive.get_revisions_walker( revs_ordering, sha1_git, max_revs=offset + per_page + 1, state=revs_walker_state, ) rev_log += [rev["id"] for rev in revs_walker] revs_walker_state = revs_walker.export_state() revs = rev_log[offset : offset + per_page] - revision_log = service.lookup_revision_multiple(revs) + revision_log = archive.lookup_revision_multiple(revs) request.session[session_key] = { "rev_log": rev_log, "revs_walker_state": revs_walker_state, } except Exception as exc: return handle_view_exception(request, exc) revs_ordering = request.GET.get("revs_ordering", "") prev_log_url = None if len(rev_log) > offset + per_page: prev_log_url = reverse( "browse-revision-log", url_args={"sha1_git": sha1_git}, query_params={ "per_page": per_page, "offset": offset + per_page, "revs_ordering": revs_ordering or None, }, ) next_log_url = None if offset != 0: next_log_url = reverse( "browse-revision-log", url_args={"sha1_git": sha1_git}, query_params={ 
"per_page": per_page, "offset": offset - per_page, "revs_ordering": revs_ordering or None, }, ) revision_log_data = format_log_entries(revision_log, per_page) swh_rev_id = swhid("revision", sha1_git) return render( request, "browse/revision-log.html", { "heading": "Revision history", "swh_object_id": swh_rev_id, "swh_object_name": "Revisions history", "swh_object_metadata": None, "revision_log": revision_log_data, "revs_ordering": revs_ordering, "next_log_url": next_log_url, "prev_log_url": prev_log_url, "breadcrumbs": None, "top_right_link": None, "snapshot_context": snapshot_context, "vault_cooking": None, "show_actions": True, "swhids_info": None, }, ) @browse_route( r"revision/(?P<sha1_git>[0-9a-f]+)/", view_name="browse-revision", checksum_args=["sha1_git"], ) def revision_browse(request, sha1_git): """ Django view that produces an HTML display of a revision identified by its id. The url that points to it is :http:get:`/browse/revision/(sha1_git)/`. """ try: - revision = service.lookup_revision(sha1_git) + revision = archive.lookup_revision(sha1_git) origin_info = None snapshot_context = None origin_url = request.GET.get("origin_url") if not origin_url: origin_url = request.GET.get("origin") timestamp = request.GET.get("timestamp") visit_id = request.GET.get("visit_id") snapshot_id = request.GET.get("snapshot_id") if not snapshot_id: snapshot_id = request.GET.get("snapshot") path = request.GET.get("path") dir_id = None dirs, files = None, None content_data = {} if origin_url: try: snapshot_context = get_snapshot_context( snapshot_id=snapshot_id, origin_url=origin_url, timestamp=timestamp, visit_id=visit_id, branch_name=request.GET.get("branch"), release_name=request.GET.get("release"), revision_id=sha1_git, ) except NotFoundExc as e: raw_rev_url = reverse( "browse-revision", url_args={"sha1_git": sha1_git} ) error_message = ( "The Software Heritage archive has a revision " "with the hash you provided but the origin " "mentioned in your request appears broken: %s. 
" "Please check the URL and try again.\n\n" "Nevertheless, you can still browse the revision " "without origin information: %s" % (gen_link(origin_url), gen_link(raw_rev_url)) ) if str(e).startswith("Origin"): raise NotFoundExc(error_message) else: raise e origin_info = snapshot_context["origin_info"] snapshot_id = snapshot_context["snapshot_id"] elif snapshot_id: snapshot_context = get_snapshot_context(snapshot_id) if path: - file_info = service.lookup_directory_with_path(revision["directory"], path) + file_info = archive.lookup_directory_with_path(revision["directory"], path) if file_info["type"] == "dir": dir_id = file_info["target"] else: query_string = "sha1_git:" + file_info["target"] content_data = request_content(query_string, raise_if_unavailable=False) else: dir_id = revision["directory"] if dir_id: path = "" if path is None else (path + "/") dirs, files = get_directory_entries(dir_id) except Exception as exc: return handle_view_exception(request, exc) revision_metadata = RevisionMetadata( object_type=REVISION, object_id=sha1_git, revision=sha1_git, revision_url=gen_revision_link(sha1_git), author=revision["author"]["fullname"] if revision["author"] else "None", author_url=gen_person_mail_link(revision["author"]) if revision["author"] else "None", committer=revision["committer"]["fullname"] if revision["committer"] else "None", committer_url=gen_person_mail_link(revision["committer"]) if revision["committer"] else "None", committer_date=format_utc_iso_date(revision["committer_date"]), date=format_utc_iso_date(revision["date"]), directory=revision["directory"], directory_url=gen_directory_link(revision["directory"]), merge=revision["merge"], metadata=json.dumps( revision["metadata"], sort_keys=True, indent=4, separators=(",", ": ") ), parents=revision["parents"], synthetic=revision["synthetic"], type=revision["type"], snapshot=snapshot_id, snapshot_url=gen_snapshot_link(snapshot_id) if snapshot_id else None, origin_url=origin_url, ) message_lines = 
["None"] if revision["message"]: message_lines = revision["message"].split("\n") parents = [] for p in revision["parents"]: parent_url = gen_revision_url(p, snapshot_context) parents.append({"id": p, "url": parent_url}) path_info = gen_path_info(path) query_params = snapshot_context["query_params"] if snapshot_context else {} breadcrumbs = [] breadcrumbs.append( { "name": revision["directory"][:7], "url": reverse( "browse-revision", url_args={"sha1_git": sha1_git}, query_params=query_params, ), } ) for pi in path_info: query_params["path"] = pi["path"] breadcrumbs.append( { "name": pi["name"], "url": reverse( "browse-revision", url_args={"sha1_git": sha1_git}, query_params=query_params, ), } ) vault_cooking = { "directory_context": False, "directory_id": None, "revision_context": True, "revision_id": sha1_git, } swh_objects = [SWHObjectInfo(object_type=REVISION, object_id=sha1_git)] content = None content_size = None filename = None mimetype = None language = None readme_name = None readme_url = None readme_html = None readmes = {} error_code = 200 error_message = "" error_description = "" extra_context = dict(revision_metadata) extra_context["path"] = f"/{path}" if path else None if content_data: breadcrumbs[-1]["url"] = None content_size = content_data["length"] mimetype = content_data["mimetype"] if content_data["raw_data"]: content_display_data = prepare_content_for_display( content_data["raw_data"], content_data["mimetype"], path ) content = content_display_data["content_data"] language = content_display_data["language"] mimetype = content_display_data["mimetype"] if path: filename = path_info[-1]["name"] query_params["filename"] = filename filepath = "/".join(pi["name"] for pi in path_info[:-1]) extra_context["path"] = f"/{filepath}/" if filepath else "/" extra_context["filename"] = filename top_right_link = { "url": reverse( "browse-content-raw", url_args={"query_string": query_string}, query_params={"filename": filename}, ), "icon": 
swh_object_icons["content"], "text": "Raw File", } swh_objects.append( SWHObjectInfo(object_type=CONTENT, object_id=file_info["target"]) ) error_code = content_data["error_code"] error_message = content_data["error_message"] error_description = content_data["error_description"] else: for d in dirs: if d["type"] == "rev": d["url"] = reverse( "browse-revision", url_args={"sha1_git": d["target"]} ) else: query_params["path"] = path + d["name"] d["url"] = reverse( "browse-revision", url_args={"sha1_git": sha1_git}, query_params=query_params, ) for f in files: query_params["path"] = path + f["name"] f["url"] = reverse( "browse-revision", url_args={"sha1_git": sha1_git}, query_params=query_params, ) if f["length"] is not None: f["length"] = filesizeformat(f["length"]) if f["name"].lower().startswith("readme"): readmes[f["name"]] = f["checksums"]["sha1"] readme_name, readme_url, readme_html = get_readme_to_display(readmes) top_right_link = { "url": get_revision_log_url(sha1_git, snapshot_context), "icon": swh_object_icons["revisions history"], "text": "History", } vault_cooking["directory_context"] = True vault_cooking["directory_id"] = dir_id swh_objects.append(SWHObjectInfo(object_type=DIRECTORY, object_id=dir_id)) query_params.pop("path", None) diff_revision_url = reverse( "diff-revision", url_args={"sha1_git": sha1_git}, query_params=query_params, ) if snapshot_id: swh_objects.append(SWHObjectInfo(object_type=SNAPSHOT, object_id=snapshot_id)) swhids_info = get_swhids_info(swh_objects, snapshot_context, extra_context) heading = "Revision - %s - %s" % ( sha1_git[:7], textwrap.shorten(message_lines[0], width=70), ) if snapshot_context: context_found = "snapshot: %s" % snapshot_context["snapshot_id"] if origin_info: context_found = "origin: %s" % origin_info["url"] heading += " - %s" % context_found return render( request, "browse/revision.html", { "heading": heading, "swh_object_id": swhids_info[0]["swhid"], "swh_object_name": "Revision", "swh_object_metadata": 
revision_metadata, "message_header": message_lines[0], "message_body": "\n".join(message_lines[1:]), "parents": parents, "snapshot_context": snapshot_context, "dirs": dirs, "files": files, "content": content, "content_size": content_size, "max_content_size": content_display_max_size, "filename": filename, "encoding": content_data.get("encoding"), "mimetype": mimetype, "language": language, "readme_name": readme_name, "readme_url": readme_url, "readme_html": readme_html, "breadcrumbs": breadcrumbs, "top_right_link": top_right_link, "vault_cooking": vault_cooking, "diff_revision_url": diff_revision_url, "show_actions": True, "swhids_info": swhids_info, "error_code": error_code, "error_message": error_message, "error_description": error_description, }, status=error_code, ) diff --git a/swh/web/common/service.py b/swh/web/common/archive.py similarity index 100% rename from swh/web/common/service.py rename to swh/web/common/archive.py diff --git a/swh/web/common/identifiers.py b/swh/web/common/identifiers.py index 14cbb46f..9d2f28e1 100644 --- a/swh/web/common/identifiers.py +++ b/swh/web/common/identifiers.py @@ -1,388 +1,388 @@ # Copyright (C) 2020 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from typing import Any, Dict, Iterable, List, Optional, cast from urllib.parse import quote, unquote from typing_extensions import TypedDict from django.http import QueryDict from swh.model.exceptions import ValidationError from swh.model.hashutil import hash_to_bytes from swh.model.identifiers import ( CONTENT, DIRECTORY, ORIGIN, RELEASE, REVISION, SNAPSHOT, SWHID, parse_swhid, swhid, ) -from swh.web.common import service +from swh.web.common import archive from swh.web.common.exc import BadInputExc from swh.web.common.typing import ( QueryParameters, SnapshotContext, SWHIDContext, SWHIDInfo, 
SWHObjectInfo, ) from swh.web.common.utils import reverse def gen_swhid( object_type: str, object_id: str, scheme_version: int = 1, metadata: SWHIDContext = {}, ) -> str: """ Returns the SoftWare Heritage persistent IDentifier for a swh object based on: * the object type * the object id * the SWHID scheme version Args: object_type: the swh object type (content/directory/release/revision/snapshot) object_id: the swh object id (hexadecimal representation of its hash value) scheme_version: the scheme version of the SWHIDs Returns: the SWHID of the object Raises: BadInputExc: if the provided parameters do not enable to generate a valid identifier """ try: obj_swhid = swhid( object_type, object_id, scheme_version, cast(Dict[str, Any], {k: v for k, v in metadata.items() if v is not None}), ) except ValidationError as e: raise BadInputExc("Invalid object (%s) for SWHID. %s" % (object_id, e)) else: return obj_swhid class ResolvedSWHID(TypedDict): """parsed SWHID with context""" swhid_parsed: SWHID """URL to browse object according to SWHID context""" browse_url: Optional[str] def resolve_swhid( swhid: str, query_params: Optional[QueryParameters] = None ) -> ResolvedSWHID: """ Try to resolve a SoftWare Heritage persistent IDentifier into an url for browsing the targeted object. 
Args: swhid: a SoftWare Heritage persistent IDentifier query_params: optional dict filled with query parameters to append to the browse url Returns: a dict with the following keys: * **swhid_parsed**: the parsed identifier * **browse_url**: the url for browsing the targeted object """ swhid_parsed = get_swhid(swhid) object_type = swhid_parsed.object_type object_id = swhid_parsed.object_id browse_url = None url_args = {} query_dict = QueryDict("", mutable=True) fragment = "" anchor_swhid_parsed = None process_lines = object_type is CONTENT if query_params and len(query_params) > 0: for k in sorted(query_params.keys()): query_dict[k] = query_params[k] if "origin" in swhid_parsed.metadata: query_dict["origin_url"] = unquote(swhid_parsed.metadata["origin"]) if "anchor" in swhid_parsed.metadata: anchor_swhid_parsed = get_swhid(swhid_parsed.metadata["anchor"]) if "path" in swhid_parsed.metadata and swhid_parsed.metadata["path"] != "/": query_dict["path"] = unquote(swhid_parsed.metadata["path"]) if anchor_swhid_parsed: directory = "" if anchor_swhid_parsed.object_type == DIRECTORY: directory = anchor_swhid_parsed.object_id elif anchor_swhid_parsed.object_type == REVISION: - revision = service.lookup_revision(anchor_swhid_parsed.object_id) + revision = archive.lookup_revision(anchor_swhid_parsed.object_id) directory = revision["directory"] elif anchor_swhid_parsed.object_type == RELEASE: - release = service.lookup_release(anchor_swhid_parsed.object_id) + release = archive.lookup_release(anchor_swhid_parsed.object_id) if release["target_type"] == REVISION: - revision = service.lookup_revision(release["target"]) + revision = archive.lookup_revision(release["target"]) directory = revision["directory"] if object_type == CONTENT: if "origin" not in swhid_parsed.metadata: # when no origin context, content objects need to have their # path prefixed by root directory id for proper breadcrumbs display query_dict["path"] = directory + query_dict["path"] else: # remove leading slash 
from SWHID content path query_dict["path"] = query_dict["path"][1:] elif object_type == DIRECTORY: object_id = directory # remove leading and trailing slashes from SWHID directory path query_dict["path"] = query_dict["path"][1:-1] # snapshot context if "visit" in swhid_parsed.metadata: snp_swhid_parsed = get_swhid(swhid_parsed.metadata["visit"]) if snp_swhid_parsed.object_type != SNAPSHOT: raise BadInputExc("Visit must be a snapshot SWHID.") query_dict["snapshot"] = snp_swhid_parsed.object_id if anchor_swhid_parsed: if anchor_swhid_parsed.object_type == REVISION: # check if the anchor revision is the tip of a branch - branch_name = service.lookup_snapshot_branch_name_from_tip_revision( + branch_name = archive.lookup_snapshot_branch_name_from_tip_revision( snp_swhid_parsed.object_id, anchor_swhid_parsed.object_id ) if branch_name: query_dict["branch"] = branch_name elif object_type != REVISION: query_dict["revision"] = anchor_swhid_parsed.object_id elif anchor_swhid_parsed.object_type == RELEASE: - release = service.lookup_release(anchor_swhid_parsed.object_id) + release = archive.lookup_release(anchor_swhid_parsed.object_id) if release: query_dict["release"] = release["name"] if object_type == REVISION and "release" not in query_dict: - branch_name = service.lookup_snapshot_branch_name_from_tip_revision( + branch_name = archive.lookup_snapshot_branch_name_from_tip_revision( snp_swhid_parsed.object_id, object_id ) if branch_name: query_dict["branch"] = branch_name # browsing content or directory without snapshot context elif object_type in (CONTENT, DIRECTORY) and anchor_swhid_parsed: if anchor_swhid_parsed.object_type == REVISION: # anchor revision, objects are browsed from its view object_type = REVISION object_id = anchor_swhid_parsed.object_id elif object_type == DIRECTORY and anchor_swhid_parsed.object_type == DIRECTORY: # a directory is browsed from its root object_id = anchor_swhid_parsed.object_id if object_type == CONTENT: url_args["query_string"] = 
f"sha1_git:{object_id}" elif object_type == DIRECTORY: url_args["sha1_git"] = object_id elif object_type == RELEASE: url_args["sha1_git"] = object_id elif object_type == REVISION: url_args["sha1_git"] = object_id elif object_type == SNAPSHOT: url_args["snapshot_id"] = object_id elif object_type == ORIGIN: raise BadInputExc( ( "Origin SWHIDs are not publicly resolvable because they are for " "internal usage only" ) ) if "lines" in swhid_parsed.metadata and process_lines: lines = swhid_parsed.metadata["lines"].split("-") fragment += "#L" + lines[0] if len(lines) > 1: fragment += "-L" + lines[1] if url_args: browse_url = ( reverse( f"browse-{object_type}", url_args=url_args, query_params=query_dict, ) + fragment ) return ResolvedSWHID(swhid_parsed=swhid_parsed, browse_url=browse_url) def get_swhid(swhid: str) -> SWHID: """Check if a SWHID is valid and return it parsed. Args: swhid: a SoftWare Heritage persistent IDentifier. Raises: BadInputExc: if the provided SWHID can not be parsed. Return: A parsed SWHID. """ try: swhid_parsed = parse_swhid(swhid) except ValidationError as ve: raise BadInputExc("Error when parsing identifier: %s" % " ".join(ve.messages)) else: return swhid_parsed def group_swhids(swhids: Iterable[SWHID],) -> Dict[str, List[bytes]]: """ Groups many SoftWare Heritage persistent IDentifiers into a dictionary depending on their type. 
Args: swhids: an iterable of SoftWare Heritage persistent IDentifier objects Returns: A dictionary with: keys: object types values: object hashes """ swhids_by_type: Dict[str, List[bytes]] = { CONTENT: [], DIRECTORY: [], REVISION: [], RELEASE: [], SNAPSHOT: [], } for obj_swhid in swhids: obj_id = obj_swhid.object_id obj_type = obj_swhid.object_type swhids_by_type[obj_type].append(hash_to_bytes(obj_id)) return swhids_by_type def get_swhids_info( swh_objects: Iterable[SWHObjectInfo], snapshot_context: Optional[SnapshotContext] = None, extra_context: Optional[Dict[str, Any]] = None, ) -> List[SWHIDInfo]: """ Returns a list of dict containing info related to SWHIDs of objects. Args: swh_objects: an iterable of dict describing archived objects snapshot_context: optional dict parameter describing the snapshot in which the objects have been found extra_context: optional dict filled with extra contextual info about the objects Returns: a list of dict containing SWHIDs info """ swhids_info = [] for swh_object in swh_objects: if not swh_object["object_id"]: swhids_info.append( SWHIDInfo( object_type=swh_object["object_type"], object_id="", swhid="", swhid_url="", context={}, swhid_with_context=None, swhid_with_context_url=None, ) ) continue object_type = swh_object["object_type"] object_id = swh_object["object_id"] swhid_context: SWHIDContext = {} if snapshot_context: if snapshot_context["origin_info"] is not None: swhid_context["origin"] = quote( snapshot_context["origin_info"]["url"], safe="/?:@&" ) if object_type != SNAPSHOT: swhid_context["visit"] = gen_swhid( SNAPSHOT, snapshot_context["snapshot_id"] ) if object_type in (CONTENT, DIRECTORY): if snapshot_context["release_id"] is not None: swhid_context["anchor"] = gen_swhid( RELEASE, snapshot_context["release_id"] ) elif snapshot_context["revision_id"] is not None: swhid_context["anchor"] = gen_swhid( REVISION, snapshot_context["revision_id"] ) if object_type in (CONTENT, DIRECTORY): if ( extra_context and "revision" in 
extra_context and extra_context["revision"] and "anchor" not in swhid_context ): swhid_context["anchor"] = gen_swhid(REVISION, extra_context["revision"]) elif ( extra_context and "root_directory" in extra_context and extra_context["root_directory"] and "anchor" not in swhid_context and ( object_type != DIRECTORY or extra_context["root_directory"] != object_id ) ): swhid_context["anchor"] = gen_swhid( DIRECTORY, extra_context["root_directory"] ) path = None if extra_context and "path" in extra_context: path = extra_context["path"] or "/" if "filename" in extra_context and object_type == CONTENT: path += extra_context["filename"] if object_type == DIRECTORY and path == "/": path = None if path: swhid_context["path"] = quote(path, safe="/?:@&") swhid = gen_swhid(object_type, object_id) swhid_url = reverse("browse-swhid", url_args={"swhid": swhid}) swhid_with_context = None swhid_with_context_url = None if swhid_context: swhid_with_context = gen_swhid( object_type, object_id, metadata=swhid_context ) swhid_with_context_url = reverse( "browse-swhid", url_args={"swhid": swhid_with_context} ) swhids_info.append( SWHIDInfo( object_type=object_type, object_id=object_id, swhid=swhid, swhid_url=swhid_url, context=swhid_context, swhid_with_context=swhid_with_context, swhid_with_context_url=swhid_with_context_url, ) ) return swhids_info diff --git a/swh/web/common/origin_save.py b/swh/web/common/origin_save.py index a4258bb3..edd1f55e 100644 --- a/swh/web/common/origin_save.py +++ b/swh/web/common/origin_save.py @@ -1,623 +1,623 @@ # Copyright (C) 2018-2020 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from bisect import bisect_right from datetime import datetime, timedelta, timezone from itertools import product import json import logging from typing import Any, Dict from prometheus_client 
import Gauge import requests import sentry_sdk from django.core.exceptions import ObjectDoesNotExist, ValidationError from django.core.validators import URLValidator from django.utils.html import escape from swh.scheduler.utils import create_oneshot_task_dict from swh.web import config -from swh.web.common import service +from swh.web.common import archive from swh.web.common.exc import BadInputExc, ForbiddenExc, NotFoundExc from swh.web.common.models import ( SAVE_REQUEST_ACCEPTED, SAVE_REQUEST_PENDING, SAVE_REQUEST_REJECTED, SAVE_TASK_FAILED, SAVE_TASK_NOT_CREATED, SAVE_TASK_NOT_YET_SCHEDULED, SAVE_TASK_RUNNING, SAVE_TASK_SCHEDULED, SAVE_TASK_SUCCEEDED, SaveAuthorizedOrigin, SaveOriginRequest, SaveUnauthorizedOrigin, ) from swh.web.common.origin_visits import get_origin_visits from swh.web.common.utils import SWH_WEB_METRICS_REGISTRY, parse_iso8601_date_to_utc scheduler = config.scheduler() logger = logging.getLogger(__name__) def get_origin_save_authorized_urls(): """ Get the list of origin url prefixes authorized to be immediately loaded into the archive (whitelist). Returns: list: The list of authorized origin url prefix """ return [origin.url for origin in SaveAuthorizedOrigin.objects.all()] def get_origin_save_unauthorized_urls(): """ Get the list of origin url prefixes forbidden to be loaded into the archive (blacklist). Returns: list: the list of unauthorized origin url prefix """ return [origin.url for origin in SaveUnauthorizedOrigin.objects.all()] def can_save_origin(origin_url): """ Check if a software origin can be saved into the archive. 
Based on the origin url, the save request will be either: * immediately accepted if the url is whitelisted * rejected if the url is blacklisted * put in pending state for manual review otherwise Args: origin_url (str): the software origin url to check Returns: str: the origin save request status, either **accepted**, **rejected** or **pending** """ # origin url may be blacklisted for url_prefix in get_origin_save_unauthorized_urls(): if origin_url.startswith(url_prefix): return SAVE_REQUEST_REJECTED # if the origin url is in the white list, it can be immediately saved for url_prefix in get_origin_save_authorized_urls(): if origin_url.startswith(url_prefix): return SAVE_REQUEST_ACCEPTED # otherwise, the origin url needs to be manually verified return SAVE_REQUEST_PENDING # map visit type to scheduler task # TODO: do not hardcode the task name here (T1157) _visit_type_task = {"git": "load-git", "hg": "load-hg", "svn": "load-svn"} # map scheduler task status to origin save status _save_task_status = { "next_run_not_scheduled": SAVE_TASK_NOT_YET_SCHEDULED, "next_run_scheduled": SAVE_TASK_SCHEDULED, "completed": SAVE_TASK_SUCCEEDED, "disabled": SAVE_TASK_FAILED, } def get_savable_visit_types(): return sorted(list(_visit_type_task.keys())) def _check_visit_type_savable(visit_type): """ Get the list of visit types that can be performed through a save request. Returns: list: the list of saveable visit types """ allowed_visit_types = ", ".join(get_savable_visit_types()) if visit_type not in _visit_type_task: raise BadInputExc( "Visit of type %s can not be saved! " "Allowed types are the following: %s" % (visit_type, allowed_visit_types) ) _validate_url = URLValidator(schemes=["http", "https", "svn", "git"]) def _check_origin_url_valid(origin_url): try: _validate_url(origin_url) except ValidationError: raise BadInputExc( "The provided origin url (%s) is not valid!" 
% escape(origin_url) ) def _get_visit_info_for_save_request(save_request): visit_date = None visit_status = None time_now = datetime.now(tz=timezone.utc) time_delta = time_now - save_request.request_date # stop trying to find a visit date one month after save request submission # as those requests to storage are expensive and associated loading task # surely ended up with errors if time_delta.days <= 30: try: origin = {"url": save_request.origin_url} - origin_info = service.lookup_origin(origin) + origin_info = archive.lookup_origin(origin) origin_visits = get_origin_visits(origin_info) visit_dates = [parse_iso8601_date_to_utc(v["date"]) for v in origin_visits] i = bisect_right(visit_dates, save_request.request_date) if i != len(visit_dates): visit_date = visit_dates[i] visit_status = origin_visits[i]["status"] if origin_visits[i]["status"] not in ("full", "partial"): visit_date = None except Exception as exc: sentry_sdk.capture_exception(exc) return visit_date, visit_status def _check_visit_update_status(save_request, save_task_status): visit_date, visit_status = _get_visit_info_for_save_request(save_request) save_request.visit_date = visit_date # visit has been performed, mark the saving task as succeed if visit_date and visit_status is not None: save_task_status = SAVE_TASK_SUCCEEDED elif visit_status in ("created", "ongoing"): save_task_status = SAVE_TASK_RUNNING else: time_now = datetime.now(tz=timezone.utc) time_delta = time_now - save_request.request_date # consider the task as failed if it is still in scheduled state # 30 days after its submission if time_delta.days > 30: save_task_status = SAVE_TASK_FAILED return visit_date, save_task_status def _save_request_dict(save_request, task=None): must_save = False visit_date = save_request.visit_date # save task still in scheduler db if task: save_task_status = _save_task_status[task["status"]] # Consider request from which a visit date has already been found # as succeeded to avoid retrieving it again if 
save_task_status == SAVE_TASK_SCHEDULED and visit_date: save_task_status = SAVE_TASK_SUCCEEDED if ( save_task_status in (SAVE_TASK_FAILED, SAVE_TASK_SUCCEEDED) and not visit_date ): visit_date, _ = _get_visit_info_for_save_request(save_request) save_request.visit_date = visit_date must_save = True # Check tasks still marked as scheduled / not yet scheduled if save_task_status in (SAVE_TASK_SCHEDULED, SAVE_TASK_NOT_YET_SCHEDULED): visit_date, save_task_status = _check_visit_update_status( save_request, save_task_status ) # save task may have been archived else: save_task_status = save_request.loading_task_status if save_task_status in (SAVE_TASK_SCHEDULED, SAVE_TASK_NOT_YET_SCHEDULED): visit_date, save_task_status = _check_visit_update_status( save_request, save_task_status ) else: save_task_status = save_request.loading_task_status if save_request.loading_task_status != save_task_status: save_request.loading_task_status = save_task_status must_save = True if must_save: save_request.save() return { "id": save_request.id, "visit_type": save_request.visit_type, "origin_url": save_request.origin_url, "save_request_date": save_request.request_date.isoformat(), "save_request_status": save_request.status, "save_task_status": save_task_status, "visit_date": visit_date.isoformat() if visit_date else None, } def create_save_origin_request(visit_type, origin_url): """ Create a loading task to save a software origin into the archive. This function aims to create a software origin loading task through the use of the swh-scheduler component. First, some checks are performed to see if the visit type and origin url are valid but also if the save request can be accepted. If those checks pass, the loading task is then created. Otherwise, the save request is put in pending or rejected state. All the submitted save requests are logged into the swh-web database to keep track of them.
Args: visit_type (str): the type of visit to perform (``git``, ``hg`` or ``svn``) origin_url (str): the url of the origin to save Raises: BadInputExc: the visit type or origin url is invalid ForbiddenExc: the provided origin url is blacklisted Returns: dict: A dict describing the save request with the following keys: * **visit_type**: the type of visit to perform * **origin_url**: the url of the origin * **save_request_date**: the date the request was submitted * **save_request_status**: the request status, either **accepted**, **rejected** or **pending** * **save_task_status**: the origin loading task status, either **not created**, **not yet scheduled**, **scheduled**, **succeed** or **failed** """ _check_visit_type_savable(visit_type) _check_origin_url_valid(origin_url) save_request_status = can_save_origin(origin_url) task = None # if the origin save request is accepted, create a scheduler # task to load it into the archive if save_request_status == SAVE_REQUEST_ACCEPTED: # create a task with high priority kwargs = { "priority": "high", "url": origin_url, } sor = None # get list of previously submitted save requests current_sors = list( SaveOriginRequest.objects.filter( visit_type=visit_type, origin_url=origin_url ) ) can_create_task = False # if no save requests previously submitted, create the scheduler task if not current_sors: can_create_task = True else: # get the latest submitted save request sor = current_sors[0] # if it was in pending state, we need to create the scheduler task # and update the save request info in the database if sor.status == SAVE_REQUEST_PENDING: can_create_task = True # a task has already been created to load the origin elif sor.loading_task_id != -1: # get the scheduler task and its status tasks = scheduler.get_tasks([sor.loading_task_id]) task = tasks[0] if tasks else None task_status = _save_request_dict(sor, task)["save_task_status"] # create a new scheduler task only if the previous one
has been # already executed if ( task_status == SAVE_TASK_FAILED or task_status == SAVE_TASK_SUCCEEDED ): can_create_task = True sor = None else: can_create_task = False if can_create_task: # effectively create the scheduler task task_dict = create_oneshot_task_dict(_visit_type_task[visit_type], **kwargs) task = scheduler.create_tasks([task_dict])[0] # pending save request has been accepted if sor: sor.status = SAVE_REQUEST_ACCEPTED sor.loading_task_id = task["id"] sor.save() else: sor = SaveOriginRequest.objects.create( visit_type=visit_type, origin_url=origin_url, status=save_request_status, loading_task_id=task["id"], ) # save request must be manually reviewed for acceptance elif save_request_status == SAVE_REQUEST_PENDING: # check if such a save request has already been submitted, # no need to add it to the database in that case try: sor = SaveOriginRequest.objects.get( visit_type=visit_type, origin_url=origin_url, status=save_request_status ) # if not add it to the database except ObjectDoesNotExist: sor = SaveOriginRequest.objects.create( visit_type=visit_type, origin_url=origin_url, status=save_request_status ) # origin can not be saved as its url is blacklisted, # log the request to the database anyway else: sor = SaveOriginRequest.objects.create( visit_type=visit_type, origin_url=origin_url, status=save_request_status ) if save_request_status == SAVE_REQUEST_REJECTED: raise ForbiddenExc( ( 'The "save code now" request has been rejected ' "because the provided origin url is blacklisted." ) ) return _save_request_dict(sor, task) def get_save_origin_requests_from_queryset(requests_queryset): """ Get all save requests from a SaveOriginRequest queryset.
Args: requests_queryset (django.db.models.QuerySet): input SaveOriginRequest queryset Returns: list: A list of save origin requests dict as described in :func:`swh.web.common.origin_save.create_save_origin_request` """ task_ids = [] for sor in requests_queryset: task_ids.append(sor.loading_task_id) save_requests = [] if task_ids: tasks = scheduler.get_tasks(task_ids) tasks = {task["id"]: task for task in tasks} for sor in requests_queryset: sr_dict = _save_request_dict(sor, tasks.get(sor.loading_task_id)) save_requests.append(sr_dict) return save_requests def get_save_origin_requests(visit_type, origin_url): """ Get all save requests for a given software origin. Args: visit_type (str): the type of visit origin_url (str): the url of the origin Raises: BadInputExc: the visit type or origin url is invalid swh.web.common.exc.NotFoundExc: no save requests can be found for the given origin Returns: list: A list of save origin requests dict as described in :func:`swh.web.common.origin_save.create_save_origin_request` """ _check_visit_type_savable(visit_type) _check_origin_url_valid(origin_url) sors = SaveOriginRequest.objects.filter( visit_type=visit_type, origin_url=origin_url ) if sors.count() == 0: raise NotFoundExc( ("No save requests found for visit of type " "%s on origin with url %s.") % (visit_type, origin_url) ) return get_save_origin_requests_from_queryset(sors) def get_save_origin_task_info( save_request_id: int, full_info: bool = True ) -> Dict[str, Any]: """ Get detailed information about an accepted save origin request and its associated loading task. If the associated loading task info is archived and removed from the scheduler database, returns an empty dictionary. 
Args: save_request_id: identifier of a save origin request full_info: whether to return detailed info for staff users Returns: A dictionary with the following keys: - **type**: loading task type - **arguments**: loading task arguments - **id**: loading task database identifier - **backend_id**: loading task celery identifier - **scheduled**: loading task scheduling date - **ended**: loading task termination date - **status**: loading task execution status Depending on the availability of the task logs in the elasticsearch cluster of Software Heritage, the returned dictionary may also contain the following keys: - **name**: associated celery task name - **message**: relevant log message from task execution - **duration**: task execution time (only if it succeeded) - **worker**: name of the worker that executed the task """ try: save_request = SaveOriginRequest.objects.get(id=save_request_id) except ObjectDoesNotExist: return {} task = scheduler.get_tasks([save_request.loading_task_id]) task = task[0] if task else None if task is None: return {} task_run = scheduler.get_task_runs([task["id"]]) task_run = task_run[0] if task_run else None if task_run is None: return {} task_run["type"] = task["type"] task_run["arguments"] = task["arguments"] task_run["id"] = task_run["task"] del task_run["task"] del task_run["metadata"] es_workers_index_url = config.get_config()["es_workers_index_url"] if not es_workers_index_url: return task_run es_workers_index_url += "/_search" if save_request.visit_date: min_ts = save_request.visit_date max_ts = min_ts + timedelta(days=7) else: min_ts = save_request.request_date max_ts = min_ts + timedelta(days=30) min_ts_unix = int(min_ts.timestamp()) * 1000 max_ts_unix = int(max_ts.timestamp()) * 1000 save_task_status = _save_task_status[task["status"]] priority = "3" if save_task_status == SAVE_TASK_FAILED else "6" query = { "bool": { "must": [ {"match_phrase": {"priority": {"query": priority}}}, {"match_phrase": {"swh_task_id": {"query": 
task_run["backend_id"]}}}, { "range": { "@timestamp": { "gte": min_ts_unix, "lte": max_ts_unix, "format": "epoch_millis", } } }, ] } } try: response = requests.post( es_workers_index_url, json={"query": query, "sort": ["@timestamp"]}, timeout=30, ) results = json.loads(response.text) if results["hits"]["total"]["value"] >= 1: task_run_info = results["hits"]["hits"][-1]["_source"] if "swh_logging_args_runtime" in task_run_info: duration = task_run_info["swh_logging_args_runtime"] task_run["duration"] = duration if "message" in task_run_info: task_run["message"] = task_run_info["message"] if "swh_logging_args_name" in task_run_info: task_run["name"] = task_run_info["swh_logging_args_name"] elif "swh_task_name" in task_run_info: task_run["name"] = task_run_info["swh_task_name"] if "hostname" in task_run_info: task_run["worker"] = task_run_info["hostname"] elif "host" in task_run_info: task_run["worker"] = task_run_info["host"] except Exception as exc: logger.warning("Request to Elasticsearch failed\n%s", exc) sentry_sdk.capture_exception(exc) if not full_info: for field in ("id", "backend_id", "worker"): # remove some staff only fields task_run.pop(field, None) if "message" in task_run and "Loading failure" in task_run["message"]: # hide traceback for non staff users, only display exception message_lines = task_run["message"].split("\n") message = "" for line in message_lines: if line.startswith("Traceback"): break message += f"{line}\n" message += message_lines[-1] task_run["message"] = message return task_run SUBMITTED_SAVE_REQUESTS_METRIC = "swh_web_submitted_save_requests" _submitted_save_requests_gauge = Gauge( name=SUBMITTED_SAVE_REQUESTS_METRIC, documentation="Number of submitted origin save requests", labelnames=["status", "visit_type"], registry=SWH_WEB_METRICS_REGISTRY, ) ACCEPTED_SAVE_REQUESTS_METRIC = "swh_web_accepted_save_requests" _accepted_save_requests_gauge = Gauge( name=ACCEPTED_SAVE_REQUESTS_METRIC, documentation="Number of accepted origin save 
requests", labelnames=["load_task_status", "visit_type"], registry=SWH_WEB_METRICS_REGISTRY, ) def compute_save_requests_metrics(): """Compute a couple of Prometheus metrics related to origin save requests""" request_statuses = ( SAVE_REQUEST_ACCEPTED, SAVE_REQUEST_REJECTED, SAVE_REQUEST_PENDING, ) load_task_statuses = ( SAVE_TASK_NOT_CREATED, SAVE_TASK_NOT_YET_SCHEDULED, SAVE_TASK_SCHEDULED, SAVE_TASK_SUCCEEDED, SAVE_TASK_FAILED, SAVE_TASK_RUNNING, ) visit_types = get_savable_visit_types() labels_set = product(request_statuses, visit_types) for labels in labels_set: _submitted_save_requests_gauge.labels(*labels).set(0) labels_set = product(load_task_statuses, visit_types) for labels in labels_set: _accepted_save_requests_gauge.labels(*labels).set(0) for sor in SaveOriginRequest.objects.all(): if sor.status == SAVE_REQUEST_ACCEPTED: _accepted_save_requests_gauge.labels( load_task_status=sor.loading_task_status, visit_type=sor.visit_type ).inc() _submitted_save_requests_gauge.labels( status=sor.status, visit_type=sor.visit_type ).inc() diff --git a/swh/web/common/origin_visits.py b/swh/web/common/origin_visits.py index 54b8cecc..6dc2201a 100644 --- a/swh/web/common/origin_visits.py +++ b/swh/web/common/origin_visits.py @@ -1,197 +1,197 @@ # Copyright (C) 2018-2019 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import math from typing import List, Optional from django.core.cache import cache -from swh.web.common import service +from swh.web.common import archive from swh.web.common.exc import NotFoundExc from swh.web.common.typing import OriginInfo, OriginVisitInfo from swh.web.common.utils import parse_iso8601_date_to_utc def get_origin_visits(origin_info: OriginInfo) -> List[OriginVisitInfo]: """Function that returns the list of visits for a swh origin. 
That list is put in cache in order to speedup the navigation in the swh web browse ui. The returned visits are sorted according to their date in ascending order. Args: origin_info: dict describing the origin to fetch visits from Returns: A list of dict describing the origin visits Raises: swh.web.common.exc.NotFoundExc: if the origin is not found """ - from swh.web.common import service + from swh.web.common import archive if "url" in origin_info: origin_url = origin_info["url"] else: - origin_url = service.lookup_origin(origin_info)["url"] + origin_url = archive.lookup_origin(origin_info)["url"] cache_entry_id = "origin_visits_%s" % origin_url cache_entry = cache.get(cache_entry_id) if cache_entry: last_visit = cache_entry[-1]["visit"] new_visits = list( - service.lookup_origin_visits(origin_url, last_visit=last_visit) + archive.lookup_origin_visits(origin_url, last_visit=last_visit) ) if not new_visits: - last_snp = service.lookup_latest_origin_snapshot(origin_url) + last_snp = archive.lookup_latest_origin_snapshot(origin_url) if not last_snp or last_snp["id"] == cache_entry[-1]["snapshot"]: return cache_entry origin_visits = [] - per_page = service.MAX_LIMIT + per_page = archive.MAX_LIMIT last_visit = None while 1: visits = list( - service.lookup_origin_visits( + archive.lookup_origin_visits( origin_url, last_visit=last_visit, per_page=per_page ) ) origin_visits += visits if len(visits) < per_page: break else: if not last_visit: last_visit = per_page else: last_visit += per_page def _visit_sort_key(visit): ts = parse_iso8601_date_to_utc(visit["date"]).timestamp() return ts + (float(visit["visit"]) / 10e3) origin_visits = sorted(origin_visits, key=lambda v: _visit_sort_key(v)) cache.set(cache_entry_id, origin_visits) return origin_visits def get_origin_visit( origin_info: OriginInfo, visit_ts: Optional[str] = None, visit_id: Optional[int] = None, snapshot_id: Optional[str] = None, ) -> OriginVisitInfo: """Function that returns information about a visit for a 
given origin. If a timestamp is provided, the closest visit from that timestamp is returned. If a snapshot identifier is provided, the first visit with that snapshot is returned. If no search hints are provided, return the most recent full visit with a valid snapshot or the most recent partial visit with a valid snapshot otherwise. Args: origin_info: a dict filled with origin information visit_ts: an ISO 8601 datetime string to parse snapshot_id: a snapshot identifier Returns: A dict containing the visit info. Raises: swh.web.common.exc.NotFoundExc: if no visit can be found """ # returns the latest full visit with a valid snapshot - visit = service.lookup_origin_visit_latest( + visit = archive.lookup_origin_visit_latest( origin_info["url"], allowed_statuses=["full"], require_snapshot=True ) if not visit: # or the latest partial visit with a valid snapshot otherwise - visit = service.lookup_origin_visit_latest( + visit = archive.lookup_origin_visit_latest( origin_info["url"], allowed_statuses=["partial"], require_snapshot=True ) if not visit_ts and not visit_id and not snapshot_id: if visit: return visit else: raise NotFoundExc( f"No valid visit for origin with url {origin_info['url']} found!" ) # no need to fetch all visits list and search in it if the latest # visit matches some criteria if visit and (visit["snapshot"] == snapshot_id or visit["visit"] == visit_id): return visit visits = get_origin_visits(origin_info) if not visits: raise NotFoundExc( f"No visits associated to origin with url {origin_info['url']}!" ) if snapshot_id: visits = [v for v in visits if v["snapshot"] == snapshot_id] if len(visits) == 0: raise NotFoundExc( ( "Visit for snapshot with id %s for origin with" " url %s not found!" % (snapshot_id, origin_info["url"]) ) ) return visits[0] if visit_id: visits = [v for v in visits if v["visit"] == int(visit_id)] if len(visits) == 0: raise NotFoundExc( ( "Visit with id %s for origin with" " url %s not found!" 
% (visit_id, origin_info["url"]) ) ) return visits[0] if visit_ts: target_visit_ts = math.floor(parse_iso8601_date_to_utc(visit_ts).timestamp()) # Find the visit with date closest to the target (in absolute value) (abs_time_delta, visit_idx) = min( ( (math.floor(parse_iso8601_date_to_utc(visit["date"]).timestamp()), i) for (i, visit) in enumerate(visits) ), key=lambda ts_and_i: abs(ts_and_i[0] - target_visit_ts), ) if visit_idx is not None: visit = visits[visit_idx] # If multiple visits have the same date, select the one with # the largest id. while ( visit_idx < len(visits) - 1 and visit["date"] == visits[visit_idx + 1]["date"] ): visit_idx = visit_idx + 1 visit = visits[visit_idx] return visit else: raise NotFoundExc( ( "Visit with timestamp %s for origin with " "url %s not found!" % (visit_ts, origin_info["url"]) ) ) return visits[-1] diff --git a/swh/web/common/utils.py b/swh/web/common/utils.py index 27931f95..09d5109c 100644 --- a/swh/web/common/utils.py +++ b/swh/web/common/utils.py @@ -1,350 +1,350 @@ # Copyright (C) 2017-2020 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from datetime import datetime, timezone import re from typing import Any, Dict, Optional from bs4 import BeautifulSoup from docutils.core import publish_parts import docutils.parsers.rst import docutils.utils from docutils.writers.html5_polyglot import HTMLTranslator, Writer from iso8601 import ParseError, parse_date from prometheus_client.registry import CollectorRegistry from django.http import HttpRequest, QueryDict from django.urls import reverse as django_reverse from rest_framework.authentication import SessionAuthentication from swh.web.common.exc import BadInputExc from swh.web.common.typing import QueryParameters from swh.web.config import get_config SWH_WEB_METRICS_REGISTRY = 
CollectorRegistry(auto_describe=True) swh_object_icons = { "branch": "mdi mdi-source-branch", "branches": "mdi mdi-source-branch", "content": "mdi mdi-file-document", "directory": "mdi mdi-folder", "origin": "mdi mdi-source-repository", "person": "mdi mdi-account", "revisions history": "mdi mdi-history", "release": "mdi mdi-tag", "releases": "mdi mdi-tag", "revision": "mdi mdi-rotate-90 mdi-source-commit", "snapshot": "mdi mdi-camera", "visits": "mdi mdi-calendar-month", } def reverse( viewname: str, url_args: Optional[Dict[str, Any]] = None, query_params: Optional[QueryParameters] = None, current_app: Optional[str] = None, urlconf: Optional[str] = None, request: Optional[HttpRequest] = None, ) -> str: """An override of django reverse function supporting query parameters. Args: viewname: the name of the django view from which to compute a url url_args: dictionary of url arguments indexed by their names query_params: dictionary of query parameters to append to the reversed url current_app: the name of the django app tighten to the view urlconf: url configuration module request: build an absolute URI if provided Returns: str: the url of the requested view with processed arguments and query parameters """ if url_args: url_args = {k: v for k, v in url_args.items() if v is not None} url = django_reverse( viewname, urlconf=urlconf, kwargs=url_args, current_app=current_app ) if query_params: query_params = {k: v for k, v in query_params.items() if v is not None} if query_params and len(query_params) > 0: query_dict = QueryDict("", mutable=True) for k in sorted(query_params.keys()): query_dict[k] = query_params[k] url += "?" 
+ query_dict.urlencode(safe="/;:") if request is not None: url = request.build_absolute_uri(url) return url def datetime_to_utc(date): """Returns datetime in UTC without timezone info Args: date (datetime.datetime): input datetime with timezone info Returns: datetime.datetime: datetime in UTC without timezone info """ if date.tzinfo and date.tzinfo != timezone.utc: return date.astimezone(tz=timezone.utc) else: return date def parse_iso8601_date_to_utc(iso_date: str) -> datetime: """Given an ISO 8601 datetime string, parse the result as UTC datetime. Returns: a timezone-aware datetime representing the parsed date Raises: swh.web.common.exc.BadInputExc: provided date does not respect ISO 8601 format Samples: - 2016-01-12 - 2016-01-12T09:19:12+0100 - 2007-01-14T20:34:22Z """ try: date = parse_date(iso_date) return datetime_to_utc(date) except ParseError as e: raise BadInputExc(e) def shorten_path(path): """Shorten the given path: for each hash present, only return the first 8 characters followed by an ellipsis""" sha256_re = r"([0-9a-f]{8})[0-9a-z]{56}" sha1_re = r"([0-9a-f]{8})[0-9a-f]{32}" ret = re.sub(sha256_re, r"\1...", path) return re.sub(sha1_re, r"\1...", ret) def format_utc_iso_date(iso_date, fmt="%d %B %Y, %H:%M UTC"): """Turns a string representation of an ISO 8601 datetime string to UTC and format it into a more human readable one. For instance, from the following input string: '2017-05-04T13:27:13+02:00' the following one is returned: '04 May 2017, 11:27 UTC'. Custom format string may also be provided as parameter Args: iso_date (str): a string representation of an ISO 8601 date fmt (str): optional date formatting string Returns: str: a formatted string representation of the input iso date """ if not iso_date: return iso_date date = parse_iso8601_date_to_utc(iso_date) return date.strftime(fmt) def gen_path_info(path): """Function to generate path data navigation for use with a breadcrumb in the swh web ui. 
For instance, from a path /folder1/folder2/folder3, it returns the following list:: [{'name': 'folder1', 'path': 'folder1'}, {'name': 'folder2', 'path': 'folder1/folder2'}, {'name': 'folder3', 'path': 'folder1/folder2/folder3'}] Args: path: a filesystem path Returns: list: a list of path data for navigation as illustrated above. """ path_info = [] if path: sub_paths = path.strip("/").split("/") path_from_root = "" for p in sub_paths: path_from_root += "/" + p path_info.append({"name": p, "path": path_from_root.strip("/")}) return path_info def parse_rst(text, report_level=2): """ Parse a reStructuredText string with docutils. Args: text (str): string with reStructuredText markups in it report_level (int): level of docutils report messages to print (1 info 2 warning 3 error 4 severe 5 none) Returns: docutils.nodes.document: a parsed docutils document """ parser = docutils.parsers.rst.Parser() components = (docutils.parsers.rst.Parser,) settings = docutils.frontend.OptionParser( components=components ).get_default_values() settings.report_level = report_level document = docutils.utils.new_document("rst-doc", settings=settings) parser.parse(text, document) return document def get_client_ip(request): """ Return the client IP address from an incoming HTTP request. Args: request (django.http.HttpRequest): the incoming HTTP request Returns: str: The client IP address """ x_forwarded_for = request.META.get("HTTP_X_FORWARDED_FOR") if x_forwarded_for: ip = x_forwarded_for.split(",")[0] else: ip = request.META.get("REMOTE_ADDR") return ip browsers_supported_image_mimes = set( [ "image/gif", "image/png", "image/jpeg", "image/bmp", "image/webp", "image/svg", "image/svg+xml", ] ) def context_processor(request): """ Django context processor used to inject variables in all swh-web templates. 
""" config = get_config() if ( hasattr(request, "user") and request.user.is_authenticated and not hasattr(request.user, "backend") ): # To avoid django.template.base.VariableDoesNotExist errors # when rendering templates when standard Django user is logged in. request.user.backend = "django.contrib.auth.backends.ModelBackend" return { "swh_object_icons": swh_object_icons, "available_languages": None, "swh_client_config": config["client_config"], "oidc_enabled": bool(config["keycloak"]["server_url"]), "browsers_supported_image_mimes": browsers_supported_image_mimes, } class EnforceCSRFAuthentication(SessionAuthentication): """ Helper class to enforce CSRF validation on a DRF view when a user is not authenticated. """ def authenticate(self, request): user = getattr(request._request, "user", None) self.enforce_csrf(request) return (user, None) def resolve_branch_alias( snapshot: Dict[str, Any], branch: Optional[Dict[str, Any]] ) -> Optional[Dict[str, Any]]: """ Resolve branch alias in snapshot content. Args: snapshot: a full snapshot content branch: a branch alias contained in the snapshot Returns: The real snapshot branch that got aliased. 
""" while branch and branch["target_type"] == "alias": if branch["target"] in snapshot["branches"]: branch = snapshot["branches"][branch["target"]] else: - from swh.web.common import service + from swh.web.common import archive - snp = service.lookup_snapshot( + snp = archive.lookup_snapshot( snapshot["id"], branches_from=branch["target"], branches_count=1 ) if snp and branch["target"] in snp["branches"]: branch = snp["branches"][branch["target"]] else: branch = None return branch class _NoHeaderHTMLTranslator(HTMLTranslator): """ Docutils translator subclass to customize the generation of HTML from reST-formatted docstrings """ def __init__(self, document): super().__init__(document) self.body_prefix = [] self.body_suffix = [] _HTML_WRITER = Writer() _HTML_WRITER.translator_class = _NoHeaderHTMLTranslator def rst_to_html(rst: str) -> str: """ Convert reStructuredText document into HTML. Args: rst: A string containing a reStructuredText document Returns: Body content of the produced HTML conversion. """ settings = { "initial_header_level": 2, } pp = publish_parts(rst, writer=_HTML_WRITER, settings_overrides=settings) return f'
{pp["html_body"]}
' def prettify_html(html: str) -> str: """ Prettify an HTML document. Args: html: Input HTML document Returns: The prettified HTML document """ return BeautifulSoup(html, "lxml").prettify() diff --git a/swh/web/misc/badges.py b/swh/web/misc/badges.py index 6281c2c8..13248063 100644 --- a/swh/web/misc/badges.py +++ b/swh/web/misc/badges.py @@ -1,166 +1,166 @@ # Copyright (C) 2019-2020 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from base64 import b64encode from typing import Optional, cast from pybadges import badge from django.conf.urls import url from django.contrib.staticfiles import finders from django.http import HttpRequest, HttpResponse from swh.model.exceptions import ValidationError from swh.model.identifiers import ( CONTENT, DIRECTORY, ORIGIN, RELEASE, REVISION, SNAPSHOT, parse_swhid, swhid, ) -from swh.web.common import service +from swh.web.common import archive from swh.web.common.exc import BadInputExc, NotFoundExc from swh.web.common.identifiers import resolve_swhid from swh.web.common.utils import reverse _orange = "#f36a24" _blue = "#0172b2" _red = "#cd5741" _swh_logo_data = None _badge_config = { CONTENT: {"color": _blue, "title": "Archived source file",}, DIRECTORY: {"color": _blue, "title": "Archived source tree",}, ORIGIN: {"color": _orange, "title": "Archived software repository",}, RELEASE: {"color": _blue, "title": "Archived software release",}, REVISION: {"color": _blue, "title": "Archived commit",}, SNAPSHOT: {"color": _blue, "title": "Archived software repository snapshot",}, "error": {"color": _red, "title": "An error occurred when generating the badge"}, } def _get_logo_data() -> str: """ Get data-URI for Software Heritage SVG logo to embed it in the generated badges. 
""" global _swh_logo_data if _swh_logo_data is None: swh_logo_path = cast(str, finders.find("img/swh-logo-white.svg")) with open(swh_logo_path, "rb") as swh_logo_file: _swh_logo_data = "data:image/svg+xml;base64,%s" % b64encode( swh_logo_file.read() ).decode("ascii") return _swh_logo_data def _swh_badge( request: HttpRequest, object_type: str, object_id: str, object_swhid: Optional[str] = "", ) -> HttpResponse: """ Generate a Software Heritage badge for a given object type and id. Args: request: input http request object_type: The type of swh object to generate a badge for, either *content*, *directory*, *revision*, *release*, *origin* or *snapshot* object_id: The id of the swh object, either an url for origin type or a *sha1* for other object types object_swhid: If provided, the object SWHID will not be recomputed Returns: HTTP response with content type *image/svg+xml* containing the SVG badge data. If the provided parameters are invalid, HTTP 400 status code will be returned. If the object can not be found in the archive, HTTP 404 status code will be returned. 
""" left_text = "error" whole_link = None try: if object_type == ORIGIN: - service.lookup_origin({"url": object_id}) + archive.lookup_origin({"url": object_id}) right_text = "repository" whole_link = reverse( "browse-origin", query_params={"origin_url": object_id} ) else: # when SWHID is provided, object type and id will be parsed # from it if object_swhid: parsed_swhid = parse_swhid(object_swhid) object_type = parsed_swhid.object_type object_id = parsed_swhid.object_id - swh_object = service.lookup_object(object_type, object_id) + swh_object = archive.lookup_object(object_type, object_id) if object_swhid: right_text = object_swhid else: right_text = swhid(object_type, object_id) whole_link = resolve_swhid(right_text)["browse_url"] # remove SWHID metadata if any for badge text if object_swhid: right_text = right_text.split(";")[0] # use release name for badge text if object_type == RELEASE: right_text = "release %s" % swh_object["name"] left_text = "archived" except (BadInputExc, ValidationError): right_text = f'invalid {object_type if object_type else "object"} id' object_type = "error" except NotFoundExc: right_text = f'{object_type if object_type else "object"} not found' object_type = "error" badge_data = badge( left_text=left_text, right_text=right_text, right_color=_badge_config[object_type]["color"], whole_link=request.build_absolute_uri(whole_link), whole_title=_badge_config[object_type]["title"], logo=_get_logo_data(), embed_logo=True, ) return HttpResponse(badge_data, content_type="image/svg+xml") def _swh_badge_swhid(request: HttpRequest, object_swhid: str) -> HttpResponse: """ Generate a Software Heritage badge for a given object SWHID. Args: request (django.http.HttpRequest): input http request object_swhid (str): a SWHID of an archived object Returns: django.http.HttpResponse: An http response with content type *image/svg+xml* containing the SVG badge data. If any error occurs, a status code of 400 will be returned. 
""" return _swh_badge(request, "", "", object_swhid) urlpatterns = [ url( r"^badge/(?P[a-z]+)/(?P.+)/$", _swh_badge, name="swh-badge", ), url( r"^badge/(?Pswh:[0-9]+:[a-z]+:[0-9a-f]+.*)/$", _swh_badge_swhid, name="swh-badge-swhid", ), ] diff --git a/swh/web/misc/urls.py b/swh/web/misc/urls.py index cc00273c..fa78ff83 100644 --- a/swh/web/misc/urls.py +++ b/swh/web/misc/urls.py @@ -1,102 +1,102 @@ # Copyright (C) 2019 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import json import requests import sentry_sdk from django.conf.urls import include, url from django.contrib.staticfiles import finders from django.http import JsonResponse from django.shortcuts import render -from swh.web.common import service +from swh.web.common import archive from swh.web.config import get_config from swh.web.misc.metrics import prometheus_metrics def _jslicenses(request): jslicenses_file = finders.find("jssources/jslicenses.json") jslicenses_data = json.load(open(jslicenses_file)) jslicenses_data = sorted( jslicenses_data.items(), key=lambda item: item[0].split("/")[-1] ) return render(request, "misc/jslicenses.html", {"jslicenses_data": jslicenses_data}) def _stat_counters(request): - stat_counters = service.stat_counters() + stat_counters = archive.stat_counters() url = get_config()["history_counters_url"] stat_counters_history = "null" if url: try: response = requests.get(url, timeout=5) stat_counters_history = json.loads(response.text) except Exception as exc: sentry_sdk.capture_exception(exc) counters = { "stat_counters": stat_counters, "stat_counters_history": stat_counters_history, } return JsonResponse(counters) urlpatterns = [ url(r"^", include("swh.web.misc.coverage")), url(r"^jslicenses/$", _jslicenses, name="jslicenses"), url(r"^", include("swh.web.misc.origin_save")), url(r"^stat_counters/", 
_stat_counters, name="stat-counters"), url(r"^", include("swh.web.misc.badges")), url(r"^metrics/prometheus/$", prometheus_metrics, name="metrics-prometheus"), ] # when running end to end tests through cypress, declare some extra # endpoints to provide input data for some of those tests if get_config()["e2e_tests_mode"]: from swh.web.tests.views import ( get_content_code_data_all_exts, get_content_code_data_all_filenames, get_content_code_data_by_ext, get_content_code_data_by_filename, get_content_other_data_by_ext, ) urlpatterns.append( url( r"^tests/data/content/code/extension/(?P.+)/$", get_content_code_data_by_ext, name="tests-content-code-extension", ) ) urlpatterns.append( url( r"^tests/data/content/other/extension/(?P.+)/$", get_content_other_data_by_ext, name="tests-content-other-extension", ) ) urlpatterns.append( url( r"^tests/data/content/code/extensions/$", get_content_code_data_all_exts, name="tests-content-code-extensions", ) ) urlpatterns.append( url( r"^tests/data/content/code/filename/(?P.+)/$", get_content_code_data_by_filename, name="tests-content-code-filename", ) ) urlpatterns.append( url( r"^tests/data/content/code/filenames/$", get_content_code_data_all_filenames, name="tests-content-code-filenames", ) ) diff --git a/swh/web/tests/api/views/test_origin.py b/swh/web/tests/api/views/test_origin.py index e6037856..55e906ab 100644 --- a/swh/web/tests/api/views/test_origin.py +++ b/swh/web/tests/api/views/test_origin.py @@ -1,682 +1,682 @@ # Copyright (C) 2015-2020 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from datetime import timedelta from hypothesis import given import pytest from requests.utils import parse_header_links from swh.model.model import Origin, OriginVisit, OriginVisitStatus from swh.storage.exc import StorageAPIError, StorageDBError from 
swh.storage.utils import now from swh.web.api.utils import enrich_origin, enrich_origin_visit from swh.web.common.exc import BadInputExc from swh.web.common.origin_visits import get_origin_visits from swh.web.common.utils import reverse from swh.web.tests.api.views import check_api_get_responses from swh.web.tests.strategies import new_origin, new_snapshots, origin, visit_dates def _scroll_results(api_client, url): """Iterates through pages of results, and returns them all.""" results = [] while True: rv = api_client.get(url) assert rv.status_code == 200, rv.data assert rv["Content-Type"] == "application/json" results.extend(rv.data) if "Link" in rv: for link in parse_header_links(rv["Link"]): if link["rel"] == "next": # Found link to next page of results url = link["url"] break else: # No link with 'rel=next' break else: # No Link header break return results def test_api_lookup_origin_visits_raise_error(api_client, mocker): mock_get_origin_visits = mocker.patch("swh.web.api.views.origin.get_origin_visits") err_msg = "voluntary error to check the bad request middleware." mock_get_origin_visits.side_effect = BadInputExc(err_msg) url = reverse("api-1-origin-visits", url_args={"origin_url": "http://foo"}) rv = check_api_get_responses(api_client, url, status_code=400) assert rv.data == {"exception": "BadInputExc", "reason": err_msg} def test_api_lookup_origin_visits_raise_swh_storage_error_db(api_client, mocker): mock_get_origin_visits = mocker.patch("swh.web.api.views.origin.get_origin_visits") err_msg = "Storage exploded! Will be back online shortly!" 
mock_get_origin_visits.side_effect = StorageDBError(err_msg) url = reverse("api-1-origin-visits", url_args={"origin_url": "http://foo"}) rv = check_api_get_responses(api_client, url, status_code=503) assert rv.data == { "exception": "StorageDBError", "reason": "An unexpected error occurred in the backend: %s" % err_msg, } def test_api_lookup_origin_visits_raise_swh_storage_error_api(api_client, mocker): mock_get_origin_visits = mocker.patch("swh.web.api.views.origin.get_origin_visits") err_msg = "Storage API dropped dead! Will resurrect asap!" mock_get_origin_visits.side_effect = StorageAPIError(err_msg) url = reverse("api-1-origin-visits", url_args={"origin_url": "http://foo"}) rv = check_api_get_responses(api_client, url, status_code=503) assert rv.data == { "exception": "StorageAPIError", "reason": "An unexpected error occurred in the api backend: %s" % err_msg, } @given(new_origin(), visit_dates(3), new_snapshots(3)) def test_api_lookup_origin_visits( api_client, archive_data, new_origin, visit_dates, new_snapshots ): archive_data.origin_add([new_origin]) for i, visit_date in enumerate(visit_dates): origin_visit = archive_data.origin_visit_add( [OriginVisit(origin=new_origin.url, date=visit_date, type="git",)] )[0] archive_data.snapshot_add([new_snapshots[i]]) visit_status = OriginVisitStatus( origin=new_origin.url, visit=origin_visit.visit, date=now(), status="full", snapshot=new_snapshots[i].id, ) archive_data.origin_visit_status_add([visit_status]) all_visits = list(reversed(get_origin_visits(new_origin.to_dict()))) for last_visit, expected_visits in ( (None, all_visits[:2]), (all_visits[1]["visit"], all_visits[2:]), ): url = reverse( "api-1-origin-visits", url_args={"origin_url": new_origin.url}, query_params={"per_page": 2, "last_visit": last_visit}, ) rv = check_api_get_responses(api_client, url, status_code=200) for i in range(len(expected_visits)): expected_visits[i] = enrich_origin_visit( expected_visits[i], with_origin_link=False, 
with_origin_visit_link=True, request=rv.wsgi_request, ) assert rv.data == expected_visits @given(new_origin(), visit_dates(3), new_snapshots(3)) def test_api_lookup_origin_visits_by_id( api_client, archive_data, new_origin, visit_dates, new_snapshots ): archive_data.origin_add([new_origin]) for i, visit_date in enumerate(visit_dates): origin_visit = archive_data.origin_visit_add( [OriginVisit(origin=new_origin.url, date=visit_date, type="git",)] )[0] archive_data.snapshot_add([new_snapshots[i]]) visit_status = OriginVisitStatus( origin=new_origin.url, visit=origin_visit.visit, date=now(), status="full", snapshot=new_snapshots[i].id, ) archive_data.origin_visit_status_add([visit_status]) all_visits = list(reversed(get_origin_visits(new_origin.to_dict()))) for last_visit, expected_visits in ( (None, all_visits[:2]), (all_visits[1]["visit"], all_visits[2:4]), ): url = reverse( "api-1-origin-visits", url_args={"origin_url": new_origin.url}, query_params={"per_page": 2, "last_visit": last_visit}, ) rv = check_api_get_responses(api_client, url, status_code=200) for i in range(len(expected_visits)): expected_visits[i] = enrich_origin_visit( expected_visits[i], with_origin_link=False, with_origin_visit_link=True, request=rv.wsgi_request, ) assert rv.data == expected_visits @given(new_origin(), visit_dates(3), new_snapshots(3)) def test_api_lookup_origin_visit( api_client, archive_data, new_origin, visit_dates, new_snapshots ): archive_data.origin_add([new_origin]) for i, visit_date in enumerate(visit_dates): origin_visit = archive_data.origin_visit_add( [OriginVisit(origin=new_origin.url, date=visit_date, type="git",)] )[0] visit_id = origin_visit.visit archive_data.snapshot_add([new_snapshots[i]]) visit_status = OriginVisitStatus( origin=new_origin.url, visit=origin_visit.visit, date=visit_date + timedelta(minutes=5), status="full", snapshot=new_snapshots[i].id, ) archive_data.origin_visit_status_add([visit_status]) url = reverse( "api-1-origin-visit", 
url_args={"origin_url": new_origin.url, "visit_id": visit_id}, ) rv = check_api_get_responses(api_client, url, status_code=200) expected_visit = archive_data.origin_visit_get_by(new_origin.url, visit_id) expected_visit = enrich_origin_visit( expected_visit, with_origin_link=True, with_origin_visit_link=False, request=rv.wsgi_request, ) assert rv.data == expected_visit @given(new_origin()) def test_api_lookup_origin_visit_latest_no_visit(api_client, archive_data, new_origin): archive_data.origin_add([new_origin]) url = reverse("api-1-origin-visit-latest", url_args={"origin_url": new_origin.url}) rv = check_api_get_responses(api_client, url, status_code=404) assert rv.data == { "exception": "NotFoundExc", "reason": "No visit for origin %s found" % new_origin.url, } @given(new_origin(), visit_dates(2), new_snapshots(1)) def test_api_lookup_origin_visit_latest( api_client, archive_data, new_origin, visit_dates, new_snapshots ): archive_data.origin_add([new_origin]) visit_dates.sort() visit_ids = [] for i, visit_date in enumerate(visit_dates): origin_visit = archive_data.origin_visit_add( [OriginVisit(origin=new_origin.url, date=visit_date, type="git",)] )[0] visit_ids.append(origin_visit.visit) archive_data.snapshot_add([new_snapshots[0]]) visit_status = OriginVisitStatus( origin=new_origin.url, visit=visit_ids[0], date=now(), status="full", snapshot=new_snapshots[0].id, ) archive_data.origin_visit_status_add([visit_status]) url = reverse("api-1-origin-visit-latest", url_args={"origin_url": new_origin.url}) rv = check_api_get_responses(api_client, url, status_code=200) expected_visit = archive_data.origin_visit_get_by(new_origin.url, visit_ids[1]) expected_visit = enrich_origin_visit( expected_visit, with_origin_link=True, with_origin_visit_link=False, request=rv.wsgi_request, ) assert rv.data == expected_visit @given(new_origin(), visit_dates(2), new_snapshots(1)) def test_api_lookup_origin_visit_latest_with_snapshot( api_client, archive_data, new_origin, visit_dates, 
new_snapshots ): archive_data.origin_add([new_origin]) visit_dates.sort() visit_ids = [] for i, visit_date in enumerate(visit_dates): origin_visit = archive_data.origin_visit_add( [OriginVisit(origin=new_origin.url, date=visit_date, type="git",)] )[0] visit_ids.append(origin_visit.visit) archive_data.snapshot_add([new_snapshots[0]]) # Add snapshot to the latest visit visit_id = visit_ids[-1] visit_status = OriginVisitStatus( origin=new_origin.url, visit=visit_id, date=now(), status="full", snapshot=new_snapshots[0].id, ) archive_data.origin_visit_status_add([visit_status]) url = reverse( "api-1-origin-visit-latest", url_args={"origin_url": new_origin.url}, query_params={"require_snapshot": True}, ) rv = check_api_get_responses(api_client, url, status_code=200) expected_visit = archive_data.origin_visit_status_get_latest( new_origin.url, type="git", require_snapshot=True ) expected_visit = enrich_origin_visit( expected_visit, with_origin_link=True, with_origin_visit_link=False, request=rv.wsgi_request, ) assert rv.data == expected_visit @given(origin()) def test_api_lookup_origin_visit_not_found(api_client, origin): all_visits = list(reversed(get_origin_visits(origin))) max_visit_id = max([v["visit"] for v in all_visits]) url = reverse( "api-1-origin-visit", url_args={"origin_url": origin["url"], "visit_id": max_visit_id + 1}, ) rv = check_api_get_responses(api_client, url, status_code=404) assert rv.data == { "exception": "NotFoundExc", "reason": "Origin %s or its visit with id %s not found!" % (origin["url"], max_visit_id + 1), } def test_api_origins_wrong_input(api_client, archive_data): """Should fail with 400 if the input is deprecated. 
""" # fail if wrong input url = reverse("api-1-origins", query_params={"origin_from": 1}) rv = check_api_get_responses(api_client, url, status_code=400) assert rv.data == { "exception": "BadInputExc", "reason": "Please use the Link header to browse through result", } def test_api_origins(api_client, archive_data): page_result = archive_data.origin_list(limit=10000) origins = page_result.results origin_urls = {origin.url for origin in origins} # Get only one url = reverse("api-1-origins", query_params={"origin_count": 1}) rv = check_api_get_responses(api_client, url, status_code=200) assert len(rv.data) == 1 assert {origin["url"] for origin in rv.data} <= origin_urls # Get all url = reverse("api-1-origins", query_params={"origin_count": len(origins)}) rv = check_api_get_responses(api_client, url, status_code=200) assert len(rv.data) == len(origins) assert {origin["url"] for origin in rv.data} == origin_urls # Get "all + 10" url = reverse("api-1-origins", query_params={"origin_count": len(origins) + 10}) rv = check_api_get_responses(api_client, url, status_code=200) assert len(rv.data) == len(origins) assert {origin["url"] for origin in rv.data} == origin_urls @pytest.mark.parametrize("origin_count", [1, 2, 10, 100]) def test_api_origins_scroll(api_client, archive_data, origin_count): page_result = archive_data.origin_list(limit=10000) origins = page_result.results origin_urls = {origin.url for origin in origins} url = reverse("api-1-origins", query_params={"origin_count": origin_count}) results = _scroll_results(api_client, url) assert len(results) == len(origins) assert {origin["url"] for origin in results} == origin_urls @given(origin()) def test_api_origin_by_url(api_client, archive_data, origin): origin_url = origin["url"] url = reverse("api-1-origin", url_args={"origin_url": origin_url}) rv = check_api_get_responses(api_client, url, status_code=200) expected_origin = archive_data.origin_get([origin_url])[0] expected_origin = enrich_origin(expected_origin, 
rv.wsgi_request) assert rv.data == expected_origin @given(new_origin()) def test_api_origin_not_found(api_client, new_origin): url = reverse("api-1-origin", url_args={"origin_url": new_origin.url}) rv = check_api_get_responses(api_client, url, status_code=404) assert rv.data == { "exception": "NotFoundExc", "reason": "Origin with url %s not found!" % new_origin.url, } @pytest.mark.parametrize("backend", ["swh-search", "swh-storage"]) def test_api_origin_search(api_client, mocker, backend): if backend != "swh-search": # equivalent to not configuring search in the config - mocker.patch("swh.web.common.service.search", None) + mocker.patch("swh.web.common.archive.search", None) expected_origins = { "https://github.com/wcoder/highlightjs-line-numbers.js", "https://github.com/memononen/libtess2", } # Search for 'github.com', get only one url = reverse( "api-1-origin-search", url_args={"url_pattern": "github.com"}, query_params={"limit": 1}, ) rv = check_api_get_responses(api_client, url, status_code=200) assert len(rv.data) == 1 assert {origin["url"] for origin in rv.data} <= expected_origins # Search for 'github.com', get all url = reverse( "api-1-origin-search", url_args={"url_pattern": "github.com"}, query_params={"limit": 2}, ) rv = check_api_get_responses(api_client, url, status_code=200) assert {origin["url"] for origin in rv.data} == expected_origins # Search for 'github.com', get more than available url = reverse( "api-1-origin-search", url_args={"url_pattern": "github.com"}, query_params={"limit": 10}, ) rv = check_api_get_responses(api_client, url, status_code=200) assert {origin["url"] for origin in rv.data} == expected_origins @pytest.mark.parametrize("backend", ["swh-search", "swh-storage"]) def test_api_origin_search_words(api_client, mocker, backend): if backend != "swh-search": # equivalent to not configuring search in the config - mocker.patch("swh.web.common.service.search", None) + mocker.patch("swh.web.common.archive.search", None) expected_origins = 
{ "https://github.com/wcoder/highlightjs-line-numbers.js", "https://github.com/memononen/libtess2", } url = reverse( "api-1-origin-search", url_args={"url_pattern": "github com"}, query_params={"limit": 2}, ) rv = check_api_get_responses(api_client, url, status_code=200) assert {origin["url"] for origin in rv.data} == expected_origins url = reverse( "api-1-origin-search", url_args={"url_pattern": "com github"}, query_params={"limit": 2}, ) rv = check_api_get_responses(api_client, url, status_code=200) assert {origin["url"] for origin in rv.data} == expected_origins url = reverse( "api-1-origin-search", url_args={"url_pattern": "memononen libtess2"}, query_params={"limit": 2}, ) rv = check_api_get_responses(api_client, url, status_code=200) assert len(rv.data) == 1 assert {origin["url"] for origin in rv.data} == { "https://github.com/memononen/libtess2" } url = reverse( "api-1-origin-search", url_args={"url_pattern": "libtess2 memononen"}, query_params={"limit": 2}, ) rv = check_api_get_responses(api_client, url, status_code=200) assert len(rv.data) == 1 assert {origin["url"] for origin in rv.data} == { "https://github.com/memononen/libtess2" } @pytest.mark.parametrize("backend", ["swh-search", "swh-storage"]) @pytest.mark.parametrize("limit", [1, 2, 3, 10]) def test_api_origin_search_scroll(api_client, archive_data, mocker, limit, backend): if backend != "swh-search": # equivalent to not configuring search in the config - mocker.patch("swh.web.common.service.search", None) + mocker.patch("swh.web.common.archive.search", None) expected_origins = { "https://github.com/wcoder/highlightjs-line-numbers.js", "https://github.com/memononen/libtess2", } url = reverse( "api-1-origin-search", url_args={"url_pattern": "github.com"}, query_params={"limit": limit}, ) results = _scroll_results(api_client, url) assert {origin["url"] for origin in results} == expected_origins @pytest.mark.parametrize("backend", ["swh-search", "swh-storage"]) def 
test_api_origin_search_limit(api_client, archive_data, tests_data, mocker, backend): if backend == "swh-search": tests_data["search"].origin_update( [{"url": "http://foobar/{}".format(i)} for i in range(2000)] ) else: # equivalent to not configuring search in the config - mocker.patch("swh.web.common.service.search", None) + mocker.patch("swh.web.common.archive.search", None) archive_data.origin_add( [Origin(url="http://foobar/{}".format(i)) for i in range(2000)] ) url = reverse( "api-1-origin-search", url_args={"url_pattern": "foobar"}, query_params={"limit": 1050}, ) rv = check_api_get_responses(api_client, url, status_code=200) assert len(rv.data) == 1000 @given(origin()) def test_api_origin_metadata_search(api_client, mocker, origin): - mock_idx_storage = mocker.patch("swh.web.common.service.idx_storage") + mock_idx_storage = mocker.patch("swh.web.common.archive.idx_storage") oimsft = mock_idx_storage.origin_intrinsic_metadata_search_fulltext oimsft.side_effect = lambda conjunction, limit: [ { "from_revision": ( b"p&\xb7\xc1\xa2\xafVR\x1e\x95\x1c\x01\xed " b"\xf2U\xfa\x05B8" ), "metadata": {"author": "Jane Doe"}, "id": origin["url"], "tool": { "configuration": { "context": ["NpmMapping", "CodemetaMapping"], "type": "local", }, "id": 3, "name": "swh-metadata-detector", "version": "0.0.1", }, } ] url = reverse("api-1-origin-metadata-search", query_params={"fulltext": "Jane Doe"}) rv = check_api_get_responses(api_client, url, status_code=200) expected_data = [ { "url": origin["url"], "metadata": { "metadata": {"author": "Jane Doe"}, "from_revision": ("7026b7c1a2af56521e951c01ed20f255fa054238"), "tool": { "configuration": { "context": ["NpmMapping", "CodemetaMapping"], "type": "local", }, "id": 3, "name": "swh-metadata-detector", "version": "0.0.1", }, }, } ] assert rv.data == expected_data oimsft.assert_called_with(conjunction=["Jane Doe"], limit=70) @given(origin()) def test_api_origin_metadata_search_limit(api_client, mocker, origin): - mock_idx_storage = 
mocker.patch("swh.web.common.service.idx_storage") + mock_idx_storage = mocker.patch("swh.web.common.archive.idx_storage") oimsft = mock_idx_storage.origin_intrinsic_metadata_search_fulltext oimsft.side_effect = lambda conjunction, limit: [ { "from_revision": ( b"p&\xb7\xc1\xa2\xafVR\x1e\x95\x1c\x01\xed " b"\xf2U\xfa\x05B8" ), "metadata": {"author": "Jane Doe"}, "id": origin["url"], "tool": { "configuration": { "context": ["NpmMapping", "CodemetaMapping"], "type": "local", }, "id": 3, "name": "swh-metadata-detector", "version": "0.0.1", }, } ] url = reverse("api-1-origin-metadata-search", query_params={"fulltext": "Jane Doe"}) rv = check_api_get_responses(api_client, url, status_code=200) assert len(rv.data) == 1 oimsft.assert_called_with(conjunction=["Jane Doe"], limit=70) url = reverse( "api-1-origin-metadata-search", query_params={"fulltext": "Jane Doe", "limit": 10}, ) rv = check_api_get_responses(api_client, url, status_code=200) assert len(rv.data) == 1 oimsft.assert_called_with(conjunction=["Jane Doe"], limit=10) url = reverse( "api-1-origin-metadata-search", query_params={"fulltext": "Jane Doe", "limit": 987}, ) rv = check_api_get_responses(api_client, url, status_code=200) assert len(rv.data) == 1 oimsft.assert_called_with(conjunction=["Jane Doe"], limit=100) @given(origin()) def test_api_origin_intrinsic_metadata(api_client, mocker, origin): - mock_idx_storage = mocker.patch("swh.web.common.service.idx_storage") + mock_idx_storage = mocker.patch("swh.web.common.archive.idx_storage") oimg = mock_idx_storage.origin_intrinsic_metadata_get oimg.side_effect = lambda origin_urls: [ { "from_revision": ( b"p&\xb7\xc1\xa2\xafVR\x1e\x95\x1c\x01\xed " b"\xf2U\xfa\x05B8" ), "metadata": {"author": "Jane Doe"}, "id": origin["url"], "tool": { "configuration": { "context": ["NpmMapping", "CodemetaMapping"], "type": "local", }, "id": 3, "name": "swh-metadata-detector", "version": "0.0.1", }, } ] url = reverse( "api-origin-intrinsic-metadata", url_args={"origin_url": 
origin["url"]} ) rv = check_api_get_responses(api_client, url, status_code=200) oimg.assert_called_with([origin["url"]]) expected_data = {"author": "Jane Doe"} assert rv.data == expected_data def test_api_origin_metadata_search_invalid(api_client, mocker): - mock_idx_storage = mocker.patch("swh.web.common.service.idx_storage") + mock_idx_storage = mocker.patch("swh.web.common.archive.idx_storage") url = reverse("api-1-origin-metadata-search") check_api_get_responses(api_client, url, status_code=400) mock_idx_storage.assert_not_called() diff --git a/swh/web/tests/api/views/test_stat.py b/swh/web/tests/api/views/test_stat.py index 6c603f9c..6030147a 100644 --- a/swh/web/tests/api/views/test_stat.py +++ b/swh/web/tests/api/views/test_stat.py @@ -1,59 +1,59 @@ # Copyright (C) 2015-2019 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from swh.storage.exc import StorageAPIError, StorageDBError from swh.web.common.exc import BadInputExc from swh.web.common.utils import reverse from swh.web.tests.api.views import check_api_get_responses def test_api_1_stat_counters_raise_error(api_client, mocker): - mock_service = mocker.patch("swh.web.api.views.stat.service") - mock_service.stat_counters.side_effect = BadInputExc( + mock_archive = mocker.patch("swh.web.api.views.stat.archive") + mock_archive.stat_counters.side_effect = BadInputExc( "voluntary error to check the bad request middleware." 
) url = reverse("api-1-stat-counters") rv = check_api_get_responses(api_client, url, status_code=400) assert rv.data == { "exception": "BadInputExc", "reason": "voluntary error to check the bad request middleware.", } def test_api_1_stat_counters_raise_from_db(api_client, mocker): - mock_service = mocker.patch("swh.web.api.views.stat.service") - mock_service.stat_counters.side_effect = StorageDBError( + mock_archive = mocker.patch("swh.web.api.views.stat.archive") + mock_archive.stat_counters.side_effect = StorageDBError( "Storage exploded! Will be back online shortly!" ) url = reverse("api-1-stat-counters") rv = check_api_get_responses(api_client, url, status_code=503) assert rv.data == { "exception": "StorageDBError", "reason": "An unexpected error occurred in the backend: " "Storage exploded! Will be back online shortly!", } def test_api_1_stat_counters_raise_from_api(api_client, mocker): - mock_service = mocker.patch("swh.web.api.views.stat.service") - mock_service.stat_counters.side_effect = StorageAPIError( + mock_archive = mocker.patch("swh.web.api.views.stat.archive") + mock_archive.stat_counters.side_effect = StorageAPIError( "Storage API dropped dead! Will resurrect from its ashes asap!" ) url = reverse("api-1-stat-counters") rv = check_api_get_responses(api_client, url, status_code=503) assert rv.data == { "exception": "StorageAPIError", "reason": "An unexpected error occurred in the api backend: " "Storage API dropped dead! 
Will resurrect from its ashes asap!", } def test_api_1_stat_counters(api_client, archive_data): url = reverse("api-1-stat-counters") rv = check_api_get_responses(api_client, url, status_code=200) assert rv.data == archive_data.stat_counters() diff --git a/swh/web/tests/api/views/test_vault.py b/swh/web/tests/api/views/test_vault.py index 9a939679..71772bb5 100644 --- a/swh/web/tests/api/views/test_vault.py +++ b/swh/web/tests/api/views/test_vault.py @@ -1,166 +1,166 @@ # Copyright (C) 2017-2020 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from hypothesis import given from swh.model import hashutil from swh.vault.exc import NotFoundExc from swh.web.common.utils import reverse from swh.web.tests.api.views import check_api_get_responses, check_api_post_responses from swh.web.tests.strategies import ( directory, revision, unknown_directory, unknown_revision, ) @given(directory(), revision()) def test_api_vault_cook(api_client, mocker, directory, revision): - mock_service = mocker.patch("swh.web.api.views.vault.service") + mock_archive = mocker.patch("swh.web.api.views.vault.archive") for obj_type, obj_id in ( ("directory", directory), ("revision_gitfast", revision), ): fetch_url = reverse( f"api-1-vault-fetch-{obj_type}", url_args={f"{obj_type[:3]}_id": obj_id}, ) stub_cook = { "fetch_url": fetch_url, "obj_id": obj_id, "obj_type": obj_type, "progress_message": None, "status": "done", "task_uuid": "de75c902-5ee5-4739-996e-448376a93eff", } stub_fetch = b"content" - mock_service.vault_cook.return_value = stub_cook - mock_service.vault_fetch.return_value = stub_fetch + mock_archive.vault_cook.return_value = stub_cook + mock_archive.vault_fetch.return_value = stub_fetch email = "test@test.mail" url = reverse( f"api-1-vault-cook-{obj_type}", url_args={f"{obj_type[:3]}_id": obj_id}, 
query_params={"email": email}, ) rv = check_api_post_responses(api_client, url, data=None, status_code=200) stub_cook["fetch_url"] = rv.wsgi_request.build_absolute_uri( stub_cook["fetch_url"] ) assert rv.data == stub_cook - mock_service.vault_cook.assert_called_with( + mock_archive.vault_cook.assert_called_with( obj_type, hashutil.hash_to_bytes(obj_id), email ) rv = api_client.get(fetch_url) assert rv.status_code == 200 assert rv["Content-Type"] == "application/gzip" assert rv.content == stub_fetch - mock_service.vault_fetch.assert_called_with( + mock_archive.vault_fetch.assert_called_with( obj_type, hashutil.hash_to_bytes(obj_id) ) @given(directory(), revision()) def test_api_vault_cook_uppercase_hash(api_client, directory, revision): for obj_type, obj_id in ( ("directory", directory), ("revision_gitfast", revision), ): url = reverse( f"api-1-vault-cook-{obj_type}-uppercase-checksum", url_args={f"{obj_type[:3]}_id": obj_id.upper()}, ) rv = api_client.post(url, {"email": "test@test.mail"}) assert rv.status_code == 302 redirect_url = reverse( f"api-1-vault-cook-{obj_type}", url_args={f"{obj_type[:3]}_id": obj_id} ) assert rv["location"] == redirect_url fetch_url = reverse( f"api-1-vault-fetch-{obj_type}-uppercase-checksum", url_args={f"{obj_type[:3]}_id": obj_id.upper()}, ) rv = api_client.get(fetch_url) assert rv.status_code == 302 redirect_url = reverse( f"api-1-vault-fetch-{obj_type}", url_args={f"{obj_type[:3]}_id": obj_id}, ) assert rv["location"] == redirect_url @given(directory(), revision(), unknown_directory(), unknown_revision()) def test_api_vault_cook_notfound( api_client, mocker, directory, revision, unknown_directory, unknown_revision ): - mock_vault = mocker.patch("swh.web.common.service.vault") + mock_vault = mocker.patch("swh.web.common.archive.vault") mock_vault.cook.side_effect = NotFoundExc("object not found") mock_vault.fetch.side_effect = NotFoundExc("cooked archive not found") mock_vault.progress.side_effect = NotFoundExc("cooking request not 
found") for obj_type, obj_id in ( ("directory", directory), ("revision_gitfast", revision), ): obj_name = obj_type.split("_")[0] url = reverse( f"api-1-vault-cook-{obj_type}", url_args={f"{obj_type[:3]}_id": obj_id}, ) rv = check_api_get_responses(api_client, url, status_code=404) assert rv.data["exception"] == "NotFoundExc" assert ( rv.data["reason"] == f"Cooking of {obj_name} '{obj_id}' was never requested." ) mock_vault.progress.assert_called_with(obj_type, hashutil.hash_to_bytes(obj_id)) for obj_type, obj_id in ( ("directory", unknown_directory), ("revision_gitfast", unknown_revision), ): obj_name = obj_type.split("_")[0] url = reverse( f"api-1-vault-cook-{obj_type}", url_args={f"{obj_type[:3]}_id": obj_id} ) rv = check_api_post_responses(api_client, url, data=None, status_code=404) assert rv.data["exception"] == "NotFoundExc" assert rv.data["reason"] == f"{obj_name.title()} '{obj_id}' not found." mock_vault.cook.assert_called_with( obj_type, hashutil.hash_to_bytes(obj_id), email=None ) fetch_url = reverse( f"api-1-vault-fetch-{obj_type}", url_args={f"{obj_type[:3]}_id": obj_id}, ) rv = check_api_get_responses(api_client, fetch_url, status_code=404) assert rv.data["exception"] == "NotFoundExc" assert ( rv.data["reason"] == f"Cooked archive for {obj_name} '{obj_id}' not found." 
     )
     mock_vault.fetch.assert_called_with(obj_type, hashutil.hash_to_bytes(obj_id))
diff --git a/swh/web/tests/browse/views/test_content.py b/swh/web/tests/browse/views/test_content.py
index 0af95b97..835c306a 100644
--- a/swh/web/tests/browse/views/test_content.py
+++ b/swh/web/tests/browse/views/test_content.py
@@ -1,589 +1,589 @@
 # Copyright (C) 2017-2020 The Software Heritage developers
 # See the AUTHORS file at the top-level directory of this distribution
 # License: GNU Affero General Public License version 3, or any later version
 # See top-level LICENSE file for more information

 import random

 from hypothesis import given

 from django.utils.html import escape

 from swh.model.identifiers import CONTENT, DIRECTORY, RELEASE, REVISION, SNAPSHOT
 from swh.web.browse.snapshot_context import process_snapshot_branches
 from swh.web.browse.utils import (
     _re_encode_content,
     get_mimetype_and_encoding_for_content,
     prepare_content_for_display,
 )
 from swh.web.common.exc import NotFoundExc
 from swh.web.common.identifiers import gen_swhid
 from swh.web.common.utils import gen_path_info, reverse
 from swh.web.tests.django_asserts import (
     assert_contains,
     assert_not_contains,
     assert_template_used,
 )
 from swh.web.tests.strategies import (
     content,
     content_image_type,
     content_text,
     content_text_no_highlight,
     content_text_non_utf8,
     content_unsupported_image_type_rendering,
     content_utf8_detected_as_binary,
     invalid_sha1,
     origin_with_multiple_visits,
     unknown_content,
 )


 @given(content_text())
 def test_content_view_text(client, archive_data, content):
     sha1_git = content["sha1_git"]

     url = reverse(
         "browse-content",
         url_args={"query_string": content["sha1"]},
         query_params={"path": content["path"]},
     )

     url_raw = reverse("browse-content-raw", url_args={"query_string": content["sha1"]})

     resp = client.get(url)

     content_display = _process_content_for_display(archive_data, content)
     mimetype = content_display["mimetype"]

     assert resp.status_code == 200
     assert_template_used(resp, "browse/content.html")

     if mimetype.startswith("text/"):
         assert_contains(resp, '' % content_display["language"])
         assert_contains(resp, escape(content_display["content_data"]))
     assert_contains(resp, url_raw)

     swh_cnt_id = gen_swhid(CONTENT, sha1_git)
     swh_cnt_id_url = reverse("browse-swhid", url_args={"swhid": swh_cnt_id})
     assert_contains(resp, swh_cnt_id)
     assert_contains(resp, swh_cnt_id_url)


 @given(content_text_no_highlight())
 def test_content_view_text_no_highlight(client, archive_data, content):
     sha1_git = content["sha1_git"]

     url = reverse("browse-content", url_args={"query_string": content["sha1"]})

     url_raw = reverse("browse-content-raw", url_args={"query_string": content["sha1"]})

     resp = client.get(url)

     content_display = _process_content_for_display(archive_data, content)

     assert resp.status_code == 200
     assert_template_used(resp, "browse/content.html")

     assert_contains(resp, '')
     assert_contains(resp, escape(content_display["content_data"]))
     assert_contains(resp, url_raw)

     swh_cnt_id = gen_swhid(CONTENT, sha1_git)
     swh_cnt_id_url = reverse("browse-swhid", url_args={"swhid": swh_cnt_id})

     assert_contains(resp, swh_cnt_id)
     assert_contains(resp, swh_cnt_id_url)


 @given(content_text_non_utf8())
 def test_content_view_no_utf8_text(client, archive_data, content):
     sha1_git = content["sha1_git"]

     url = reverse("browse-content", url_args={"query_string": content["sha1"]})

     resp = client.get(url)

     content_display = _process_content_for_display(archive_data, content)

     assert resp.status_code == 200
     assert_template_used(resp, "browse/content.html")

     swh_cnt_id = gen_swhid(CONTENT, sha1_git)
     swh_cnt_id_url = reverse("browse-swhid", url_args={"swhid": swh_cnt_id})
     assert_contains(resp, swh_cnt_id_url)
     assert_contains(resp, escape(content_display["content_data"]))


 @given(content_image_type())
 def test_content_view_image(client, archive_data, content):
     url = reverse("browse-content", url_args={"query_string": content["sha1"]})

     url_raw = reverse("browse-content-raw", url_args={"query_string": content["sha1"]})

     resp = client.get(url)

     content_display = _process_content_for_display(archive_data, content)
     mimetype = content_display["mimetype"]
     content_data = content_display["content_data"]

     assert resp.status_code == 200
     assert_template_used(resp, "browse/content.html")

     assert_contains(resp, '' % (mimetype, content_data))
     assert_contains(resp, url_raw)


 @given(content_unsupported_image_type_rendering())
 def test_content_view_image_no_rendering(client, archive_data, content):
     url = reverse("browse-content", url_args={"query_string": content["sha1"]})

     resp = client.get(url)

     mimetype = content["mimetype"]
     encoding = content["encoding"]

     assert resp.status_code == 200
     assert_template_used(resp, "browse/content.html")
     assert_contains(
         resp,
         (
             f"Content with mime type {mimetype} and encoding {encoding} "
             "cannot be displayed."
         ),
     )


 @given(content_text())
 def test_content_view_text_with_path(client, archive_data, content):
     path = content["path"]

     url = reverse(
         "browse-content",
         url_args={"query_string": content["sha1"]},
         query_params={"path": path},
     )

     resp = client.get(url)

     assert resp.status_code == 200
     assert_template_used(resp, "browse/content.html")

     assert_contains(resp, '