diff --git a/docs/developers-info.rst b/docs/developers-info.rst
index 876acf87..0e063e3e 100644
--- a/docs/developers-info.rst
+++ b/docs/developers-info.rst
@@ -1,126 +1,126 @@
 Developers Information
 ======================

 Sample configuration
 --------------------

 The configuration will be taken from the default configuration file:
 ``~/.config/swh/web/web.yml``. The following shows a default configuration file:

 .. sourcecode:: yaml

    storage:
      cls: remote
      args:
        url: http://localhost:5002

    debug: false

    throttling:
      cache_uri: None
      scopes:
        swh_api:
          limiter_rate:
            default: 120/h
          exempted_networks:
            - 127.0.0.0/8

 Run server
 ----------

 Either use the Django manage script directly (useful in development mode as it
 offers various commands):

 .. sourcecode:: shell

    $ python3 -m swh.web.manage runserver

 or use the following shortcut:

 .. sourcecode:: shell

    $ make run

 Modules description
 -------------------

 Common to all web applications
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Configuration and settings
 """"""""""""""""""""""""""

   * :mod:`swh.web.config`: holds the configuration for the web applications.
   * :mod:`swh.web.doc_config`: utility module used to extend the sphinx
     configuration when building the documentation.
   * :mod:`swh.web.manage`: Django management module for developers.
   * :mod:`swh.web.urls`: module that holds the whole URI scheme of all the web
     applications.
   * :mod:`swh.web.settings.common`: Common Django settings
   * :mod:`swh.web.settings.development`: Django settings for development
   * :mod:`swh.web.settings.production`: Django settings for production
   * :mod:`swh.web.settings.tests`: Django settings for tests

 Common utilities
 """"""""""""""""

-  * :mod:`swh.web.common.converters`: conversion module used to transform raw data
-    to serializable ones. It is used by :mod:`swh.web.common.archive`: to convert data
+  * :mod:`swh.web.utils.converters`: conversion module used to transform raw data
+    into serializable data. It is used by :mod:`swh.web.utils.archive` to convert data
     before transmitting them to Django views.
-  * :mod:`swh.web.common.exc`: module defining exceptions used in the web applications.
-  * :mod:`swh.web.common.highlightjs`: utility module to ease the use of the highlightjs_
+  * :mod:`swh.web.utils.exc`: module defining exceptions used in the web applications.
+  * :mod:`swh.web.utils.highlightjs`: utility module to ease the use of the highlightjs_
     library in produced Django views.
-  * :mod:`swh.web.common.query`: Utilities to parse data from HTTP endpoints. It is used
-    by :mod:`swh.web.common.archive`.
-  * :mod:`swh.web.common.archive`: Orchestration layer used by views module
+  * :mod:`swh.web.utils.query`: Utilities to parse data from HTTP endpoints. It is used
+    by :mod:`swh.web.utils.archive`.
+  * :mod:`swh.web.utils.archive`: Orchestration layer used by the views modules,
     in charge of communication with :mod:`swh.storage` to retrieve information
     and perform conversion for the upper layer.
-  * :mod:`swh.web.common.swh_templatetags`: Custom Django template tags library for swh.
-  * :mod:`swh.web.common.urlsindex`: Utilities to help the registering of endpoints
+  * :mod:`swh.web.utils.swh_templatetags`: Custom Django template tags library for swh.
+  * :mod:`swh.web.utils.urlsindex`: Utilities to ease the registration of endpoints
     for the web applications.
-  * :mod:`swh.web.common.utils`: Utility functions used in the web applications implementation
+  * :mod:`swh.web.utils`: Utility functions used in the web applications implementation.

 swh-web API application
 ^^^^^^^^^^^^^^^^^^^^^^^

   * :mod:`swh.web.api.apidoc`: Utilities to document the web API for its HTML
     browsable rendering.
   * :mod:`swh.web.api.apiresponse`: Utility module to ease the generation of web API
     responses.
   * :mod:`swh.web.api.apiurls`: Utilities to facilitate the registration of web API
     endpoints.
   * :mod:`swh.web.api.throttling`: Custom request rate limiter to use with the
     `Django REST Framework <https://www.django-rest-framework.org/>`_.
   * :mod:`swh.web.api.urls`: Module that defines the whole URI scheme for the API
     endpoints.
   * :mod:`swh.web.api.utils`: Utility functions used in the web API implementation.
   * :mod:`swh.web.api.views.content`: Implementation of API endpoints for getting
     information about contents.
   * :mod:`swh.web.api.views.directory`: Implementation of API endpoints for getting
     information about directories.
   * :mod:`swh.web.api.views.origin`: Implementation of API endpoints for getting
     information about origins.
   * :mod:`swh.web.api.views.person`: Implementation of API endpoints for getting
     information about persons.
   * :mod:`swh.web.api.views.release`: Implementation of API endpoints for getting
     information about releases.
   * :mod:`swh.web.api.views.revision`: Implementation of API endpoints for getting
     information about revisions.
   * :mod:`swh.web.api.views.snapshot`: Implementation of API endpoints for getting
     information about snapshots.
   * :mod:`swh.web.api.views.stat`: Implementation of API endpoints for getting
     information about archive statistics.
   * :mod:`swh.web.api.views.utils`: Utilities used in the web API endpoints
     implementation.

 swh-web browse application
 ^^^^^^^^^^^^^^^^^^^^^^^^^^

   * :mod:`swh.web.browse.browseurls`: Utilities to facilitate the registration of
     browse endpoints.
   * :mod:`swh.web.browse.urls`: Module that defines the whole URI scheme for the
     browse endpoints.
   * :mod:`swh.web.browse.utils`: Utility functions used throughout the browse
     endpoints implementation.
   * :mod:`swh.web.browse.views.content`: Implementation of endpoints for browsing
     contents.
   * :mod:`swh.web.browse.views.directory`: Implementation of endpoints for browsing
     directories.
   * :mod:`swh.web.browse.views.identifiers`: Implementation of endpoints for browsing
     objects through :ref:`persistent-identifiers`.
   * :mod:`swh.web.browse.views.origin`: Implementation of endpoints for browsing
     origins.
   * :mod:`swh.web.browse.views.person`: Implementation of endpoints for browsing
     persons.
   * :mod:`swh.web.browse.views.release`: Implementation of endpoints for browsing
     releases.
   * :mod:`swh.web.browse.views.revision`: Implementation of endpoints for browsing
     revisions.
   * :mod:`swh.web.browse.views.snapshot`: Implementation of endpoints for browsing
     snapshots.

 ..
_highlightjs: https://highlightjs.org/ diff --git a/swh/web/add_forge_now/api_views.py b/swh/web/add_forge_now/api_views.py index 83479d31..99fd8a93 100644 --- a/swh/web/add_forge_now/api_views.py +++ b/swh/web/add_forge_now/api_views.py @@ -1,391 +1,391 @@ # Copyright (C) 2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import json from typing import Any, Dict, Union from django.core.exceptions import ObjectDoesNotExist from django.core.paginator import Paginator from django.db import transaction from django.forms import CharField, ModelForm from django.http import HttpResponseBadRequest from django.http.request import HttpRequest from django.http.response import HttpResponse, HttpResponseForbidden from rest_framework import serializers from rest_framework.request import Request from rest_framework.response import Response from swh.web.add_forge_now.models import Request as AddForgeRequest from swh.web.add_forge_now.models import RequestActorRole as AddForgeNowRequestActorRole from swh.web.add_forge_now.models import RequestHistory as AddForgeNowRequestHistory from swh.web.add_forge_now.models import RequestStatus as AddForgeNowRequestStatus from swh.web.api.apidoc import api_doc, format_docstring from swh.web.api.apiurls import api_route from swh.web.auth.utils import is_add_forge_now_moderator -from swh.web.common.exc import BadInputExc -from swh.web.common.utils import reverse +from swh.web.utils import reverse +from swh.web.utils.exc import BadInputExc def _block_while_testing(): """Replaced by tests to check concurrency behavior""" pass class AddForgeNowRequestForm(ModelForm): forge_contact_comment = CharField( required=False, ) class Meta: model = AddForgeRequest fields = ( "forge_type", "forge_url", "forge_contact_email", "forge_contact_name", "forge_contact_comment", "submitter_forward_username", ) class AddForgeNowRequestHistoryForm(ModelForm): new_status = CharField( max_length=200, required=False, ) class Meta: model = AddForgeNowRequestHistory fields = ("text", "new_status") class AddForgeNowRequestSerializer(serializers.ModelSerializer): inbound_email_address = serializers.CharField() forge_domain = serializers.CharField() class Meta: model = AddForgeRequest fields = "__all__" class AddForgeNowRequestPublicSerializer(serializers.ModelSerializer): """Serializes AddForgeRequest without private fields.""" class Meta: model = AddForgeRequest fields = ("id", "forge_url", "forge_type", "status", "submission_date") class AddForgeNowRequestHistorySerializer(serializers.ModelSerializer): message_source_url = serializers.SerializerMethodField() class Meta: model = AddForgeNowRequestHistory exclude = ("request", "message_source") def get_message_source_url(self, request_history): if request_history.message_source is None: return None return reverse( "forge-add-message-source", url_args={"id": request_history.pk}, request=self.context["request"], ) class AddForgeNowRequestHistoryPublicSerializer(serializers.ModelSerializer): class Meta: model = AddForgeNowRequestHistory fields = ("id", "date", "new_status", "actor_role") @api_route( r"/add-forge/request/create/", "api-1-add-forge-request-create", methods=["POST"], ) @api_doc("/add-forge/request/create") @format_docstring() @transaction.atomic def api_add_forge_request_create(request: Union[HttpRequest, Request]) -> HttpResponse: """ .. 
http:post:: /api/1/add-forge/request/create/ Create a new request to add a forge to the list of those crawled regularly by Software Heritage. .. warning:: That endpoint is not publicly available and requires authentication in order to be able to request it. {common_headers} :[0-9]+)/update/", "api-1-add-forge-request-update", methods=["POST"], ) @api_doc("/add-forge/request/update", tags=["hidden"]) @format_docstring() @transaction.atomic def api_add_forge_request_update( request: Union[HttpRequest, Request], id: int ) -> HttpResponse: """ .. http:post:: /api/1/add-forge/request/update/ Update a request to add a forge to the list of those crawled regularly by Software Heritage. .. warning:: That endpoint is not publicly available and requires authentication in order to be able to request it. {common_headers} :[0-9]+)/get/", "api-1-add-forge-request-get", methods=["GET"], ) @api_doc("/add-forge/request/get") @format_docstring() def api_add_forge_request_get(request: Request, id: int): """ .. http:get:: /api/1/add-forge/request/get/ Return all details about an add-forge request. {common_headers} :param int id: add-forge request identifier :statuscode 200: request details successfully returned :statuscode 400: request identifier does not exist """ try: add_forge_request = AddForgeRequest.objects.get(id=id) except ObjectDoesNotExist: raise BadInputExc("Request id does not exist") request_history = AddForgeNowRequestHistory.objects.filter( request=add_forge_request ).order_by("id") if is_add_forge_now_moderator(request.user): data = AddForgeNowRequestSerializer(add_forge_request).data history = AddForgeNowRequestHistorySerializer( request_history, many=True, context={"request": request} ).data else: data = AddForgeNowRequestPublicSerializer(add_forge_request).data history = AddForgeNowRequestHistoryPublicSerializer( request_history, many=True ).data return {"request": data, "history": history} diff --git a/swh/web/api/apidoc.py b/swh/web/api/apidoc.py index 74c3bc77..fefef39e 100644 --- a/swh/web/api/apidoc.py +++ b/swh/web/api/apidoc.py @@ -1,472 +1,472 @@ # Copyright (C) 2015-2020 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from collections import defaultdict import functools from functools import wraps import os import re import textwrap from typing import List import docutils.nodes import docutils.parsers.rst import docutils.utils from rest_framework.decorators import api_view from swh.web.api.apiresponse import make_api_response from swh.web.api.apiurls import APIUrls -from swh.web.common.utils import parse_rst +from swh.web.utils import parse_rst class _HTTPDomainDocVisitor(docutils.nodes.NodeVisitor): """ docutils visitor for walking on a parsed docutils document containing sphinx httpdomain roles. 
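    As a sketch of how this visitor is driven (``data`` is the accumulator
    dict built by :func:`get_doc_data`; ``filtered_docstring`` is an
    illustrative name for the docstring with httpdomain directives removed),
    :func:`_parse_httpdomain_doc` below does roughly::

        document = parse_rst(filtered_docstring, report_level=5)
        visitor = _HTTPDomainDocVisitor(document, data)
        document.walkabout(visitor)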
Its purpose is to extract relevant info regarding swh api endpoints
    (for instance url arguments) from their docstring written using sphinx
    httpdomain, and to produce the main description back as a ReST string
    """

    # httpdomain roles we want to parse (based on sphinxcontrib.httpdomain 1.6)
    parameter_roles = ("param", "parameter", "arg", "argument")
    request_json_object_roles = ("reqjsonobj", "reqjson", "<jsonobj", "<json")
    request_json_array_roles = ("reqjsonarr", "<jsonarr")
    response_json_object_roles = ("resjsonobj", "resjson", ">jsonobj", ">json")
    response_json_array_roles = ("resjsonarr", ">jsonarr")
    query_parameter_roles = ("queryparameter", "queryparam", "qparam", "query")
    request_header_roles = ("<header", "reqheader", "requestheader")
    response_header_roles = (">header", "resheader", "responseheader")
    status_code_roles = ("statuscode", "status", "code")

    def __init__(self, document, data):
        super().__init__(document)
        self.data = data
        self.args_set = set()
        self.params_set = set()
        self.inputs_set = set()
        self.returns_set = set()
        self.status_codes_set = set()
        self.reqheaders_set = set()
        self.resheaders_set = set()
        self.current_json_obj = None
        self.current_field_name = None

    def _default_visit(self, node: docutils.nodes.Element) -> str:
        """Simply visits a text node, drops its start and end tags, visits
        the children, and concatenates their results."""
        return "".join(map(self.dispatch_visit, node.children))

    def visit_emphasis(self, node: docutils.nodes.emphasis) -> str:
        return f"*{self._default_visit(node)}*"

    def visit_strong(self, node: docutils.nodes.strong) -> str:
        return f"**{self._default_visit(node)}**"

    def visit_reference(self, node: docutils.nodes.reference) -> str:
        text = self._default_visit(node)
        refuri = node.attributes.get("refuri")
        if refuri is not None:
            return f"`{text} <{refuri}>`__"
        else:
            return f"`{text}`_"

    def visit_target(self, node: docutils.nodes.target) -> str:
        parts = ["\n"]
        parts.extend(
            f".. _{name}: {node.attributes['refuri']}"
            for name in node.attributes["names"]
        )
        return "\n".join(parts)

    def visit_literal(self, node: docutils.nodes.literal) -> str:
        return f"``{self._default_visit(node)}``"

    def visit_field_name(self, node: docutils.nodes.field_name) -> str:
        self.current_field_name = node.astext()
        return ""

    def visit_field_body(self, node: docutils.nodes.field_body) -> str:
        text = self._default_visit(node).strip()
        assert text, str(node)
        field_data = self.current_field_name.split(" ")
        # Parameters
        if field_data[0] in self.parameter_roles:
            if field_data[2] not in self.args_set:
                self.data["args"].append(
                    {"name": field_data[2], "type": field_data[1], "doc": text}
                )
                self.args_set.add(field_data[2])
        # Query Parameters
        if field_data[0] in self.query_parameter_roles:
            if field_data[2] not in self.params_set:
                self.data["params"].append(
                    {"name": field_data[2], "type": field_data[1], "doc": text}
                )
                self.params_set.add(field_data[2])
        # Request data type
        if (
            field_data[0] in self.request_json_array_roles
            or field_data[0] in self.request_json_object_roles
        ):
            # array
            if field_data[0] in self.request_json_array_roles:
                self.data["input_type"] = "array"
            # object
            else:
                self.data["input_type"] = "object"
            # input object field
            if field_data[2] not in self.inputs_set:
                self.data["inputs"].append(
                    {"name": field_data[2], "type": field_data[1], "doc": text}
                )
                self.inputs_set.add(field_data[2])
                self.current_json_obj = self.data["inputs"][-1]
        # Response type
        if (
            field_data[0] in self.response_json_array_roles
            or field_data[0] in self.response_json_object_roles
        ):
            # array
            if field_data[0] in self.response_json_array_roles:
                self.data["return_type"] = "array"
            # object
            else:
                self.data["return_type"] = "object"
            # returned object field
            if field_data[2] not in self.returns_set:
self.data["returns"].append( {"name": field_data[2], "type": field_data[1], "doc": text} ) self.returns_set.add(field_data[2]) self.current_json_obj = self.data["returns"][-1] # Status Codes if field_data[0] in self.status_code_roles: if field_data[1] not in self.status_codes_set: self.data["status_codes"].append({"code": field_data[1], "doc": text}) self.status_codes_set.add(field_data[1]) # Request Headers if field_data[0] in self.request_header_roles: if field_data[1] not in self.reqheaders_set: self.data["reqheaders"].append({"name": field_data[1], "doc": text}) self.reqheaders_set.add(field_data[1]) # Response Headers if field_data[0] in self.response_header_roles: if field_data[1] not in self.resheaders_set: resheader = {"name": field_data[1], "doc": text} self.data["resheaders"].append(resheader) self.resheaders_set.add(field_data[1]) if ( resheader["name"] == "Content-Type" and resheader["doc"] == "application/octet-stream" ): self.data["return_type"] = "octet stream" # Don't return anything in the description; these nodes only add text # to other fields return "" # We ignore these nodes and handle their subtrees directly in # visit_field_name and visit_field_body visit_field = visit_field_list = _default_visit def visit_paragraph(self, node: docutils.nodes.paragraph) -> str: """ Visit relevant paragraphs to parse """ # only parsed top level paragraphs text = self._default_visit(node) return "\n\n" + text def visit_literal_block(self, node: docutils.nodes.literal_block) -> str: """ Visit literal blocks """ text = node.astext() return f"\n\n::\n\n{textwrap.indent(text, ' ')}\n" def visit_bullet_list(self, node: docutils.nodes.bullet_list) -> str: parts = ["\n\n"] for child in node.traverse(): # process list item if isinstance(child, docutils.nodes.paragraph): line_text = self.dispatch_visit(child) parts.append("\t* %s\n" % textwrap.indent(line_text, "\t ").strip()) return "".join(parts) # visit_bullet_list collects and handles this with a more global view: visit_list_item = _default_visit def visit_warning(self, node: docutils.nodes.warning) -> str: text = self._default_visit(node) return "\n\n.. warning::\n%s\n" % textwrap.indent(text, "\t") def visit_Text(self, node: docutils.nodes.Text) -> str: """Leaf node""" return str(node).replace("\n", " ") # Prettier in generated HTML def visit_problematic(self, node: docutils.nodes.problematic) -> str: # api urls cleanup to generate valid links afterwards text = self._default_visit(node) subs_made = 1 while subs_made: (text, subs_made) = re.subn(r"(:http:.*)(\(\w+\))", r"\1", text) subs_made = 1 while subs_made: (text, subs_made) = re.subn(r"(:http:.*)(\[.*\])", r"\1", text) text = re.sub(r"([^:])//", r"\1/", text) # transform references to api endpoints doc into valid rst links text = re.sub(":http:get:`([^,`]*)`", r"`\1 <\1doc/>`_", text) # transform references to some elements into bold text text = re.sub(":http:header:`(.*)`", r"**\1**", text) text = re.sub(":func:`(.*)`", r"**\1**", text) text = re.sub(":mod:`(.*)`", r"**\1**", text) # extract example urls if ":swh_web_api:" in text: # Extract examples to their own section examples_str = re.sub(":swh_web_api:`(.+)`.*", r"/api/1/\1", text) self.data["examples"] += examples_str.split("\n") return text def visit_block_quote(self, node: docutils.nodes.block_quote) -> str: return self._default_visit(node) return ( f".. 
code-block::\n" f"{textwrap.indent(self._default_visit(node), ' ')}\n" ) def visit_title_reference(self, node: docutils.nodes.title_reference) -> str: text = self._default_visit(node) raise Exception( f"Unexpected title reference. " f"Possible cause: you used `{text}` instead of ``{text}``" ) def visit_document(self, node: docutils.nodes.document) -> None: text = self._default_visit(node) # Strip examples; they are displayed separately text = re.split("\n\\*\\*Examples?:\\*\\*\n", text)[0] self.data["description"] = text.strip() def visit_system_message(self, node): return "" def unknown_visit(self, node) -> str: raise NotImplementedError( f"Unknown node type: {node.__class__.__name__}. Value: {node}" ) def unknown_departure(self, node): pass def _parse_httpdomain_doc(doc, data): doc_lines = doc.split("\n") doc_lines_filtered = [] urls = defaultdict(list) default_http_methods = ["HEAD", "OPTIONS"] # httpdomain is a sphinx extension that is unknown to docutils but # fortunately we can still parse its directives' content, # so remove lines with httpdomain directives before executing the # rst parser from docutils for doc_line in doc_lines: if ".. http" not in doc_line: doc_lines_filtered.append(doc_line) else: url = doc_line[doc_line.find("/") :] # emphasize url arguments for html rendering url = re.sub(r"\((\w+)\)", r" **\(\1\)** ", url) method = re.search(r"http:(\w+)::", doc_line).group(1) urls[url].append(method.upper()) for url, methods in urls.items(): data["urls"].append({"rule": url, "methods": methods + default_http_methods}) # parse the rst docstring and do not print system messages about # unknown httpdomain roles document = parse_rst("\n".join(doc_lines_filtered), report_level=5) # remove the system_message nodes from the parsed document for node in document.traverse(docutils.nodes.system_message): node.parent.remove(node) # visit the document nodes to extract relevant endpoint info visitor = _HTTPDomainDocVisitor(document, data) document.walkabout(visitor) class APIDocException(Exception): """ Custom exception to signal errors in the use of the APIDoc decorators """ def api_doc( route: str, noargs: bool = False, tags: List[str] = [], api_version: str = "1", ): """ Decorator for an API endpoint implementation used to generate a dedicated view displaying its HTML documentation. The documentation will be generated from the endpoint docstring based on sphinxcontrib-httpdomain format. Args: route: documentation page's route noargs: set to True if the route has no arguments, and its result should be displayed anytime its documentation is requested. Default to False tags: Further information on api endpoints. 
Three values are possibly expected:

            * hidden: remove the entry points from the listing
            * upcoming: display the entry point but it is not followable
            * deprecated: display the entry point as deprecated in the index

        api_version: api version string
    """
    tags_set = set(tags)

    # @api_doc() Decorator call
    def decorator(f):
        # if the route is not hidden, add it to the index
        if "hidden" not in tags_set:
            doc_data = get_doc_data(f, route, noargs)
            doc_desc = doc_data["description"]
            APIUrls.add_doc_route(
                route,
                re.split(r"\.\s", doc_desc)[0],
                noargs=noargs,
                api_version=api_version,
                tags=tags_set,
            )

        # create a dedicated view to display endpoint HTML doc
        @api_view(["GET", "HEAD"])
        @wraps(f)
        def doc_view(request):
            doc_data = get_doc_data(f, route, noargs)
            return make_api_response(request, None, doc_data)

        route_name = "%s-doc" % route[1:-1].replace("/", "-")
        urlpattern = f"^{api_version}{route}doc/$"
        view_name = "api-%s-%s" % (api_version, route_name)
        APIUrls.add_url_pattern(urlpattern, doc_view, view_name)

        @wraps(f)
        def documented_view(request, **kwargs):
            doc_data = get_doc_data(f, route, noargs)
            try:
                return {"data": f(request, **kwargs), "doc_data": doc_data}
            except Exception as exc:
                exc.doc_data = doc_data
                raise exc

        return documented_view

    return decorator


@functools.lru_cache(maxsize=32)
def get_doc_data(f, route, noargs):
    """
    Build documentation data for the decorated api endpoint function
    """
    data = {
        "description": "",
        "response_data": None,
        "urls": [],
        "args": [],
        "params": [],
        "input_type": "",
        "inputs": [],
        "resheaders": [],
        "reqheaders": [],
        "return_type": "",
        "returns": [],
        "status_codes": [],
        "examples": [],
        "route": route,
        "noargs": noargs,
    }

    if not f.__doc__:
        raise APIDocException(
            "apidoc: expected a docstring for function %s" % (f.__name__,)
        )

    # use raw docstring as endpoint documentation if sphinx
    # httpdomain is not used
    if ".. http" not in f.__doc__:
        data["description"] = f.__doc__
    # else parse the sphinx httpdomain docstring with docutils
    # (except when building the swh-web documentation through autodoc
    # sphinx extension, not needed and raise errors with sphinx >= 1.7)
    elif "SWH_DOC_BUILD" not in os.environ:
        _parse_httpdomain_doc(f.__doc__, data)

    # process input/returned object info for nicer html display
    inputs_list = ""
    returns_list = ""
    for inp in data["inputs"]:
        # special case for array of non object type, for instance
        # :<jsonarr string -: an array of string
        if inp["name"] != "-":
            inputs_list += "\t* **%s (%s)**: %s\n" % (
                inp["name"],
                inp["type"],
                textwrap.indent(inp["doc"], "\t "),
            )
    for ret in data["returns"]:
        # special case for array of non object type, for instance
        # :>jsonarr string -: an array of string
        if ret["name"] != "-":
            returns_list += "\t* **%s (%s)**: %s\n" % (
                ret["name"],
                ret["type"],
                textwrap.indent(ret["doc"], "\t "),
            )
    data["inputs_list"] = inputs_list
    data["returns_list"] = returns_list

    return data


DOC_COMMON_HEADERS = """
    :reqheader Accept: the requested response content type,
        either ``application/json`` (default) or ``application/yaml``
    :resheader Content-Type: this depends on :http:header:`Accept`
        header of request"""

DOC_RESHEADER_LINK = """
    :resheader Link: indicates that a subsequent result page is available
        and contains the url pointing to it
"""

DEFAULT_SUBSTITUTIONS = {
    "common_headers": DOC_COMMON_HEADERS,
    "resheader_link": DOC_RESHEADER_LINK,
}


def format_docstring(**substitutions):
    def decorator(f):
        f.__doc__ = f.__doc__.format(**{**DEFAULT_SUBSTITUTIONS, **substitutions})
        return f

    return decorator


diff --git a/swh/web/api/apiresponse.py b/swh/web/api/apiresponse.py
index 5503c2b0..2c90205f 100644
--- a/swh/web/api/apiresponse.py
+++ b/swh/web/api/apiresponse.py
@@ -1,230 +1,230 @@
 # Copyright (C) 2017-2021 The Software Heritage developers
 # See the AUTHORS file at the top-level directory of this distribution
 # License: GNU Affero General Public License version 3, or any later version
 # See top-level LICENSE file for more information

 import json
 import logging
 import traceback

 from typing import Any, Dict, Optional

 from django.http import HttpResponse
 from django.shortcuts import render
 from django.utils.cache import add_never_cache_headers
 from django.utils.html import escape

 from rest_framework.exceptions import APIException
 from rest_framework.request import Request
 from rest_framework.response import Response
 from rest_framework.utils.encoders import JSONEncoder

 from swh.storage.exc import StorageAPIError, StorageDBError
 from swh.web.api import utils
-from swh.web.common.exc import (
+from swh.web.config import get_config
+from swh.web.utils import gen_path_info, shorten_path
+from swh.web.utils.exc import (
     BadInputExc,
     ForbiddenExc,
     LargePayloadExc,
     NotFoundExc,
     sentry_capture_exception,
 )
-from swh.web.common.utils import gen_path_info, shorten_path
-from swh.web.config import get_config

 logger = logging.getLogger("django")


 def compute_link_header(rv: Dict[str, Any], options: Dict[str, Any]) -> Dict[str, Any]:
     """Add Link header in returned value results.
Args: request: a DRF Request object rv (dict): dictionary with keys: - headers: potential headers with 'link-next' and 'link-prev' keys - results: containing the result to return options (dict): the initial dict to update with result if any Returns: dict: dictionary with optional keys 'link-next' and 'link-prev' """ link_headers = [] if "headers" not in rv: return {} rv_headers = rv["headers"] if "link-next" in rv_headers: link_headers.append('<%s>; rel="next"' % rv_headers["link-next"]) if "link-prev" in rv_headers: link_headers.append('<%s>; rel="previous"' % rv_headers["link-prev"]) if link_headers: link_header_str = ",".join(link_headers) headers = options.get("headers", {}) headers.update({"Link": link_header_str}) return headers return {} def filter_by_fields(request: Request, data: Dict[str, Any]) -> Dict[str, Any]: """Extract a request parameter 'fields' if it exists to permit the filtering on the data dict's keys. If such field is not provided, returns the data as is. """ fields = request.query_params.get("fields") if fields: data = utils.filter_field_keys(data, set(fields.split(","))) return data def transform(rv: Dict[str, Any]) -> Dict[str, Any]: """Transform an eventual returned value with multiple layer of information with only what's necessary. If the returned value rv contains the 'results' key, this is the associated value which is returned. Otherwise, return the initial dict without the potential 'headers' key. """ if "results" in rv: return rv["results"] if "headers" in rv: rv.pop("headers") return rv def make_api_response( request: Request, data: Dict[str, Any], doc_data: Optional[Dict[str, Any]] = None, options: Optional[Dict[str, Any]] = None, ) -> HttpResponse: """Generates an API response based on the requested mimetype. Args: request: a DRF Request object data: raw data to return in the API response doc_data: documentation data for HTML response options: optional data that can be used to generate the response Returns: a DRF Response a object """ options = options or {} if data: options["headers"] = compute_link_header(data, options) data = transform(data) data = filter_by_fields(request, data) doc_data = doc_data or {} headers = {} if "headers" in options: doc_data["headers_data"] = options["headers"] headers = options["headers"] # get request status code doc_data["status_code"] = options.get("status", 200) accepted_media_type = getattr(request, "accepted_media_type", "application/json") # when requesting HTML, typically when browsing the API through its # documented views, we need to enrich the input data with documentation # and render the apidoc HTML template if accepted_media_type == "text/html": doc_data["response_data"] = data if data is not None: doc_data["response_data"] = json.dumps( data, cls=JSONEncoder, sort_keys=True, indent=4, separators=(",", ": ") ) doc_data["heading"] = shorten_path(str(request.path)) # generate breadcrumbs data if "route" in doc_data: doc_data["endpoint_path"] = gen_path_info(doc_data["route"]) for i in range(len(doc_data["endpoint_path"]) - 1): doc_data["endpoint_path"][i]["path"] += "/doc/" if not doc_data["noargs"]: doc_data["endpoint_path"][-1]["path"] += "/doc/" response = render( request, "apidoc.html", doc_data, status=doc_data["status_code"] ) # otherwise simply return the raw data and let DRF picks # the correct renderer (JSON or YAML) else: response = Response( data, headers=headers, content_type=accepted_media_type, status=doc_data["status_code"], ) if getattr(request, "never_cache", False): 
add_never_cache_headers(response) return response def error_response( request: Request, exception: Exception, doc_data: Dict[str, Any] ) -> HttpResponse: """Private function to create a custom error response. Args: request: a DRF Request object error: the exception that caused the error doc_data: documentation data for HTML response """ error_code = 500 if isinstance(exception, BadInputExc): error_code = 400 elif isinstance(exception, NotFoundExc): error_code = 404 elif isinstance(exception, ForbiddenExc): error_code = 403 elif isinstance(exception, LargePayloadExc): error_code = 413 elif isinstance(exception, StorageDBError): error_code = 503 elif isinstance(exception, StorageAPIError): error_code = 503 elif isinstance(exception, APIException): error_code = exception.status_code error_opts = {"status": error_code} error_data = { "exception": exception.__class__.__name__, "reason": str(exception), } if getattr(request, "accepted_media_type", None) == "text/html": error_data["reason"] = escape(error_data["reason"]) if get_config()["debug"]: error_data["traceback"] = traceback.format_exc() logger.debug(error_data["traceback"]) return make_api_response(request, error_data, doc_data, options=error_opts) def error_response_handler( exc: Exception, context: Dict[str, Any] ) -> Optional[HttpResponse]: """Custom DRF exception handler used to generate API error responses.""" sentry_capture_exception(exc) doc_data = getattr(exc, "doc_data", {}) return error_response(context["request"], exc, doc_data) diff --git a/swh/web/api/apiurls.py b/swh/web/api/apiurls.py index 5d639fd2..eac41e28 100644 --- a/swh/web/api/apiurls.py +++ b/swh/web/api/apiurls.py @@ -1,125 +1,125 @@ # Copyright (C) 2017-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import functools from typing import Dict, List, Optional from django.http.response import HttpResponseBase from rest_framework.decorators import api_view from swh.web.api import throttling from swh.web.api.apiresponse import make_api_response -from swh.web.common.urlsindex import UrlsIndex +from swh.web.utils.urlsindex import UrlsIndex class APIUrls(UrlsIndex): """ Class to manage API documentation URLs. - Indexes all routes documented using apidoc's decorators. 
- Tracks endpoint/request processing method relationships for use in generating related urls in API documentation """ _apidoc_routes: Dict[str, Dict[str, str]] = {} scope = "api" @classmethod def get_app_endpoints(cls) -> Dict[str, Dict[str, str]]: return cls._apidoc_routes @classmethod def add_doc_route( cls, route: str, docstring: str, noargs: bool = False, api_version: str = "1", **kwargs, ) -> None: """ Add a route to the self-documenting API reference """ route_name = route[1:-1].replace("/", "-") if not noargs: route_name = "%s-doc" % route_name route_view_name = "api-%s-%s" % (api_version, route_name) if route not in cls._apidoc_routes: d = { "docstring": docstring, "route": "/api/%s%s" % (api_version, route), "route_view_name": route_view_name, } for k, v in kwargs.items(): d[k] = v cls._apidoc_routes[route] = d def api_route( url_pattern: str, view_name: Optional[str] = None, methods: List[str] = ["GET", "HEAD", "OPTIONS"], throttle_scope: str = "swh_api", api_version: str = "1", checksum_args: Optional[List[str]] = None, never_cache: bool = False, ): """ Decorator to ease the registration of an API endpoint using the Django REST Framework. Args: url_pattern: the url pattern used by DRF to identify the API route view_name: the name of the API view associated to the route used to reverse the url methods: array of HTTP methods supported by the API route throttle_scope: Named scope for rate limiting api_version: web API version checksum_args: list of view argument names holding checksum values never_cache: define if api response must be cached """ url_pattern = "^api/" + api_version + url_pattern + "$" def decorator(f): # create a DRF view from the wrapped function @api_view(methods) @throttling.throttle_scope(throttle_scope) @functools.wraps(f) def api_view_f(request, **kwargs): # never_cache will be handled in apiresponse module request.never_cache = never_cache response = f(request, **kwargs) doc_data = None # check if response has been forwarded by api_doc decorator if isinstance(response, dict) and "doc_data" in response: doc_data = response["doc_data"] response = response["data"] # check if HTTP response needs to be created if not isinstance(response, HttpResponseBase): api_response = make_api_response( request, data=response, doc_data=doc_data ) else: api_response = response return api_response # small hacks for correctly generating API endpoints index doc api_view_f.__name__ = f.__name__ api_view_f.http_method_names = methods # register the route and its view in the endpoints index APIUrls.add_url_pattern(url_pattern, api_view_f, view_name) if checksum_args: APIUrls.add_redirect_for_checksum_args( view_name, [url_pattern], checksum_args ) return f return decorator diff --git a/swh/web/api/throttling.py b/swh/web/api/throttling.py index b054a070..ddabe2ea 100644 --- a/swh/web/api/throttling.py +++ b/swh/web/api/throttling.py @@ -1,223 +1,223 @@ # Copyright (C) 2017-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from ipaddress import IPv4Network, IPv6Network, ip_address, ip_network from typing import Callable, List, TypeVar, Union from django.core.exceptions import ImproperlyConfigured import rest_framework from rest_framework.throttling import ScopedRateThrottle from swh.web.auth.utils import API_RAW_OBJECT_PERMISSION, API_SAVE_ORIGIN_PERMISSION -from swh.web.common.exc import 
sentry_capture_exception from swh.web.config import get_config +from swh.web.utils.exc import sentry_capture_exception APIView = TypeVar("APIView", bound="rest_framework.views.APIView") Request = rest_framework.request.Request API_THROTTLING_EXEMPTED_PERM = "swh.web.api.throttling_exempted" class SwhWebRateThrottle(ScopedRateThrottle): """Custom DRF request rate limiter for anonymous users Requests are grouped into scopes. It enables to apply different requests rate limiting based on the scope name but also the input HTTP request types. To associate a scope to requests, one must add a 'throttle_scope' attribute when using a class based view, or call the 'throttle_scope' decorator when using a function based view. By default, requests do not have an associated scope and are not rate limited. Rate limiting can also be configured according to the type of the input HTTP requests for fine grained tuning. For instance, the following YAML configuration section sets a rate of: - 1 per minute for POST requests - 60 per minute for other request types for the 'swh_api' scope while exempting those coming from the 127.0.0.0/8 ip network. .. code-block:: yaml throttling: scopes: swh_api: limiter_rate: default: 60/m POST: 1/m exempted_networks: - 127.0.0.0/8 """ scope = None def __init__(self): super().__init__() self.exempted_networks = None self.num_requests = 0 self.duration = 0 def get_cache_key(self, request, view): # do not handle throttling if user is authenticated if request.user.is_authenticated: return None else: return super().get_cache_key(request, view) def get_exempted_networks( self, scope_name: str ) -> List[Union[IPv4Network, IPv6Network]]: if not self.exempted_networks: scopes = get_config()["throttling"]["scopes"] scope = scopes.get(scope_name) if scope: networks = scope.get("exempted_networks") if networks: self.exempted_networks = [ ip_network(network) for network in networks ] return self.exempted_networks def get_scope(self, view: APIView): if not self.scope: # class based view case return getattr(view, self.scope_attr, None) else: # function based view case return self.scope def allow_request(self, request: Request, view: APIView) -> bool: # class based view case if not self.scope: default_scope = getattr(view, self.scope_attr, None) request_allowed = None if default_scope is not None: # check if there is a specific rate limiting associated # to the request type assert request.method is not None request_scope = f"{default_scope}_{request.method.lower()}" setattr(view, self.scope_attr, request_scope) try: request_allowed = super().allow_request(request, view) # use default rate limiting otherwise except ImproperlyConfigured as exc: sentry_capture_exception(exc) setattr(view, self.scope_attr, default_scope) if request_allowed is None: request_allowed = super().allow_request(request, view) # function based view case else: default_scope = self.scope # check if there is a specific rate limiting associated # to the request type self.scope = default_scope + "_" + request.method.lower() try: self.rate = self.get_rate() # use default rate limiting otherwise except ImproperlyConfigured: self.scope = default_scope self.rate = self.get_rate() self.num_requests, self.duration = self.parse_rate(self.rate) request_allowed = super(ScopedRateThrottle, self).allow_request( request, view ) self.scope = default_scope exempted_networks = self.get_exempted_networks(default_scope) exempted_ip = False if exempted_networks: remote_address = ip_address(self.get_ident(request)) exempted_ip = any( 
remote_address in network for network in exempted_networks
            )
            request_allowed = exempted_ip or request_allowed

        # set throttling related data in the request metadata
        # in order for the ThrottlingHeadersMiddleware to
        # add X-RateLimit-* headers in the HTTP response
        if not exempted_ip and hasattr(self, "history"):
            hit_count = len(self.history)
            request.META["RateLimit-Limit"] = self.num_requests
            request.META["RateLimit-Remaining"] = self.num_requests - hit_count
            wait = self.wait()
            if wait is not None:
                request.META["RateLimit-Reset"] = int(self.now + wait)

        return request_allowed


class SwhWebUserRateThrottle(SwhWebRateThrottle):
    """Custom DRF request rate limiter for authenticated users

    It has the same behavior as :class:`swh.web.api.throttling.SwhWebRateThrottle`
    except the number of allowed requests for each throttle scope is increased
    by a 10x factor.
    """

    NUM_REQUESTS_FACTOR = 10

    def get_cache_key(self, request, view):
        # do not handle throttling if user is not authenticated
        if request.user.is_authenticated:
            return super(SwhWebRateThrottle, self).get_cache_key(request, view)
        else:
            return None

    def parse_rate(self, rate):
        # increase number of allowed requests
        num_requests, duration = super().parse_rate(rate)
        return (num_requests * self.NUM_REQUESTS_FACTOR, duration)

    def allow_request(self, request: Request, view: APIView) -> bool:
        if request.user.is_staff or request.user.has_perm(
            API_THROTTLING_EXEMPTED_PERM
        ):
            # no throttling for staff users or users with adequate permission
            return True
        scope = self.get_scope(view)
        if scope == "swh_save_origin" and request.user.has_perm(
            API_SAVE_ORIGIN_PERMISSION
        ):
            # no throttling on save origin endpoint for users with adequate permission
            return True
        if scope == "swh_raw_object" and request.user.has_perm(
            API_RAW_OBJECT_PERMISSION
        ):
            # no throttling on raw object endpoint for users with adequate permission
            return True
        return super().allow_request(request, view)


def throttle_scope(scope: str) -> Callable[..., APIView]:
    """Decorator that allows the throttle scope of a DRF
    function based view to be set::

        @api_view(['GET', ])
        @throttle_scope('scope')
        def view(request):
            ...

    """

    def decorator(func: APIView) -> APIView:
        SwhScopeRateThrottle = type(
            "SwhWebScopeRateThrottle", (SwhWebRateThrottle,), {"scope": scope}
        )
        SwhScopeUserRateThrottle = type(
            "SwhWebScopeUserRateThrottle",
            (SwhWebUserRateThrottle,),
            {"scope": scope},
        )
        func.throttle_classes = (SwhScopeRateThrottle, SwhScopeUserRateThrottle)
        return func

    return decorator

diff --git a/swh/web/api/utils.py b/swh/web/api/utils.py
index fd783b53..9abb5e28 100644
--- a/swh/web/api/utils.py
+++ b/swh/web/api/utils.py
@@ -1,357 +1,357 @@
 # Copyright (C) 2015-2022 The Software Heritage developers
 # See the AUTHORS file at the top-level directory of this distribution
 # License: GNU Affero General Public License version 3, or any later version
 # See top-level LICENSE file for more information

 from typing import Any, Dict, List, Optional, Tuple, Union

 from django.http import HttpRequest

 from swh.model.model import Origin
-from swh.web.common.query import parse_hash
-from swh.web.common.typing import OriginInfo
-from swh.web.common.utils import resolve_branch_alias, reverse
+from swh.web.utils import resolve_branch_alias, reverse
+from swh.web.utils.query import parse_hash
+from swh.web.utils.typing import OriginInfo


 def filter_field_keys(data, field_keys):
     """Given an object instance (dictionary or list) and a csv or set of field
     keys to filter on, return the object instance with only the filtered keys.
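    For example (illustrative values)::

        filter_field_keys({"id": 1, "name": "x", "url": "u"}, {"id", "url"})
        # -> {'id': 1, 'url': 'u'}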
Note: Returns obj as is if it's an instance of types not in (dictionary, list) Args: - data: one object (dictionary, list...) to filter. - field_keys: csv or set of keys to filter the object on Returns: obj filtered on field_keys """ if isinstance(data, map): return map(lambda x: filter_field_keys(x, field_keys), data) if isinstance(data, list): return [filter_field_keys(x, field_keys) for x in data] if isinstance(data, dict): return {k: v for (k, v) in data.items() if k in field_keys} return data def person_to_string(person): """Map a person (person, committer, tagger, etc...) to a string.""" return "".join([person["name"], " <", person["email"], ">"]) def enrich_object( object: Dict[str, str], request: Optional[HttpRequest] = None ) -> Dict[str, str]: """Enrich an object (revision, release) with link to the 'target' of type 'target_type'. Args: object: An object with target and target_type keys (e.g. release, revision) request: Absolute URIs will be generated if provided Returns: Object enriched with target object url (revision, release, content, directory) """ if "target" in object and "target_type" in object: if object["target_type"] in ("revision", "release", "directory"): object["target_url"] = reverse( "api-1-%s" % object["target_type"], url_args={"sha1_git": object["target"]}, request=request, ) elif object["target_type"] == "content": object["target_url"] = reverse( "api-1-content", url_args={"q": "sha1_git:" + object["target"]}, request=request, ) elif object["target_type"] == "snapshot": object["target_url"] = reverse( "api-1-snapshot", url_args={"snapshot_id": object["target"]}, request=request, ) return object enrich_release = enrich_object def enrich_directory_entry( directory: Dict[str, str], request: Optional[HttpRequest] = None ) -> Dict[str, str]: """Enrich directory entry with url to target. Args: directory: dict of data associated to a swh directory entry request: Absolute URIs will be generated if provided Returns: An enriched directory dict filled with additional url """ if "type" in directory: target_type = directory["type"] target = directory["target"] if target_type == "file": directory["target_url"] = reverse( "api-1-content", url_args={"q": "sha1_git:%s" % target}, request=request ) elif target_type == "dir": directory["target_url"] = reverse( "api-1-directory", url_args={"sha1_git": target}, request=request ) else: directory["target_url"] = reverse( "api-1-revision", url_args={"sha1_git": target}, request=request ) return directory def enrich_metadata_endpoint( content_metadata: Dict[str, str], request: Optional[HttpRequest] = None ) -> Dict[str, str]: """Enrich content metadata dict with link to the upper metadata endpoint. 
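    For instance (illustrative value), metadata for the content whose **sha1**
    is ``dc2830a9e72f23c1dfebef4413003221baa5fb62`` gains a ``content_url`` of
    ``/api/1/content/sha1:dc2830a9e72f23c1dfebef4413003221baa5fb62/``.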
Args: content_metadata: dict of data associated to a swh content metadata request: Absolute URIs will be generated if provided Returns: An enriched content metadata dict filled with an additional url """ c = content_metadata c["content_url"] = reverse( "api-1-content", url_args={"q": "sha1:%s" % c["id"]}, request=request ) return c def enrich_content( content: Dict[str, Any], request: Optional[HttpRequest] = None, top_url: Optional[bool] = False, query_string: Optional[str] = None, ) -> Dict[str, str]: """Enrich content with links to: - data_url: its raw data - filetype_url: its filetype information - language_url: its programming language information - license_url: its licensing information Args: content: dict of data associated to a swh content object top_url: whether or not to include the content url in the enriched data query_string: optional query string of type ':' used when requesting the content, it acts as a hint for picking the same hash method when computing the url listed above request: Absolute URIs will be generated if provided Returns: An enriched content dict filled with additional urls """ checksums = content if "checksums" in content: checksums = content["checksums"] hash_algo = "sha1" if query_string: hash_algo = parse_hash(query_string)[0] if hash_algo in checksums: q = "%s:%s" % (hash_algo, checksums[hash_algo]) if top_url: content["content_url"] = reverse("api-1-content", url_args={"q": q}) content["data_url"] = reverse( "api-1-content-raw", url_args={"q": q}, request=request ) content["filetype_url"] = reverse( "api-1-content-filetype", url_args={"q": q}, request=request ) content["language_url"] = reverse( "api-1-content-language", url_args={"q": q}, request=request ) content["license_url"] = reverse( "api-1-content-license", url_args={"q": q}, request=request ) return content def enrich_revision( revision: Dict[str, Any], request: Optional[HttpRequest] = None ) -> Dict[str, Any]: """Enrich revision with links where it makes sense (directory, parents). Keep track of the navigation breadcrumbs if they are specified. 
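    For instance (illustrative), a revision dict gains a ``url`` of the form
    ``/api/1/revision/(sha1_git)/`` and a ``history_url`` of the form
    ``/api/1/revision/(sha1_git)/log/``; if a ``directory`` key is present, a
    ``directory_url`` of the form ``/api/1/directory/(sha1_git)/`` is added too.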
Args: revision: the revision as a dict request: Absolute URIs will be generated if provided Returns: An enriched revision dict filled with additional urls """ revision["url"] = reverse( "api-1-revision", url_args={"sha1_git": revision["id"]}, request=request ) revision["history_url"] = reverse( "api-1-revision-log", url_args={"sha1_git": revision["id"]}, request=request ) if "directory" in revision: revision["directory_url"] = reverse( "api-1-directory", url_args={"sha1_git": revision["directory"]}, request=request, ) if "parents" in revision: parents = [] for parent in revision["parents"]: parents.append( { "id": parent, "url": reverse( "api-1-revision", url_args={"sha1_git": parent}, request=request ), } ) revision["parents"] = tuple(parents) if "children" in revision: children = [] for child in revision["children"]: children.append( reverse("api-1-revision", url_args={"sha1_git": child}, request=request) ) revision["children_urls"] = children if "decoding_failures" in revision and "message" in revision["decoding_failures"]: revision["message_url"] = reverse( "api-1-revision-raw-message", url_args={"sha1_git": revision["id"]}, request=request, ) return revision def enrich_snapshot( snapshot: Dict[str, Any], request: Optional[HttpRequest] = None ) -> Dict[str, Any]: """Enrich snapshot with links to the branch targets Args: snapshot: the snapshot as a dict request: Absolute URIs will be generated if provided Returns: An enriched snapshot dict filled with additional urls """ if "branches" in snapshot: snapshot["branches"] = { k: enrich_object(v, request) if v else None for k, v in snapshot["branches"].items() } for k, v in snapshot["branches"].items(): if v and v["target_type"] == "alias": branch = resolve_branch_alias(snapshot, v) if branch: branch = enrich_object(branch, request) v["target_url"] = branch["target_url"] return snapshot def enrich_origin( origin: Union[Dict[str, Any], OriginInfo], request: Optional[HttpRequest] = None ) -> Dict[str, Any]: """Enrich origin dict with link to its visits Args: origin: the origin as a dict request: Absolute URIs will be generated if provided Returns: An enriched origin dict filled with additional urls """ origin_dict = dict(origin) if "url" in origin_dict: origin_dict["origin_visits_url"] = reverse( "api-1-origin-visits", url_args={"origin_url": origin_dict["url"]}, request=request, ) origin_dict["metadata_authorities_url"] = reverse( "api-1-raw-extrinsic-metadata-swhid-authorities", url_args={"target": Origin(url=origin_dict["url"]).swhid()}, request=request, ) return origin_dict def enrich_origin_search_result( origin_search_result: Tuple[List[Dict[str, Any]], Optional[str]], request: Optional[HttpRequest] = None, ) -> Tuple[List[Dict[str, Any]], Optional[str]]: """Enrich origin search result with additional links Args: origin_search_result: tuple returned when searching origins request: Absolute URIs will be generated if provided Returns: An enriched origin search result filled with additional urls """ origins, page_token = origin_search_result return [enrich_origin(origin, request=request) for origin in origins], page_token def enrich_origin_visit( origin_visit: Dict[str, Any], request: Optional[HttpRequest] = None, with_origin_link: bool = False, with_origin_visit_link: bool = False, ) -> Dict[str, Any]: """Enrich origin visit dict with additional links Args: origin_visit: the origin visit as a dict with_origin_link: whether to add link to origin with_origin_visit_link: whether to add link to origin visit request: Absolute URIs will be 
generated if provided Returns: An enriched origin visit dict filled with additional urls """ ov = origin_visit if with_origin_link: ov["origin_url"] = reverse( "api-1-origin", url_args={"origin_url": ov["origin"]}, request=request ) if with_origin_visit_link: ov["origin_visit_url"] = reverse( "api-1-origin-visit", url_args={"origin_url": ov["origin"], "visit_id": ov["visit"]}, request=request, ) snapshot = ov["snapshot"] if snapshot: ov["snapshot_url"] = reverse( "api-1-snapshot", url_args={"snapshot_id": snapshot}, request=request ) else: ov["snapshot_url"] = None return ov diff --git a/swh/web/api/views/content.py b/swh/web/api/views/content.py index 5536e883..a537ac11 100644 --- a/swh/web/api/views/content.py +++ b/swh/web/api/views/content.py @@ -1,342 +1,342 @@ # Copyright (C) 2015-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import functools import io from typing import Optional from django.http import FileResponse from rest_framework.request import Request from swh.web.api import utils from swh.web.api.apidoc import api_doc, format_docstring from swh.web.api.apiurls import api_route from swh.web.api.views.utils import api_lookup -from swh.web.common import archive -from swh.web.common.exc import NotFoundExc +from swh.web.utils import archive +from swh.web.utils.exc import NotFoundExc @api_route( r"/content/(?P[0-9a-z_:]*[0-9a-f]+)/filetype/", "api-1-content-filetype", checksum_args=["q"], ) @api_doc("/content/filetype/") @format_docstring() def api_content_filetype(request: Request, q: str): """ .. http:get:: /api/1/content/[(hash_type):](hash)/filetype/ Get information about the detected MIME type of a content object. :param string hash_type: optional parameter specifying which hashing algorithm has been used to compute the content checksum. It can be either ``sha1``, ``sha1_git``, ``sha256`` or ``blake2s256``. If that parameter is not provided, it is assumed that the hashing algorithm used is ``sha1``. :param string hash: hexadecimal representation of the checksum value computed with the specified hashing algorithm. :>json object content_url: link to :http:get:`/api/1/content/[(hash_type):](hash)/` for getting information about the content :>json string encoding: the detected content encoding :>json string id: the **sha1** identifier of the content :>json string mimetype: the detected MIME type of the content :>json object tool: information about the tool used to detect the content filetype {common_headers} :statuscode 200: no error :statuscode 400: an invalid **hash_type** or **hash** has been provided :statuscode 404: requested content can not be found in the archive **Example:** .. parsed-literal:: :swh_web_api:`content/sha1:dc2830a9e72f23c1dfebef4413003221baa5fb62/filetype/` """ return api_lookup( archive.lookup_content_filetype, q, notfound_msg="No filetype information found for content {}.".format(q), enrich_fn=utils.enrich_metadata_endpoint, request=request, ) @api_route( r"/content/(?P[0-9a-z_:]*[0-9a-f]+)/language/", "api-1-content-language", checksum_args=["q"], ) @api_doc("/content/language/") @format_docstring() def api_content_language(request: Request, q: str): """ .. http:get:: /api/1/content/[(hash_type):](hash)/language/ Get information about the programming language used in a content object. Note: this endpoint currently returns no data. 
:param string hash_type: optional parameter specifying which hashing algorithm has been used to compute the content checksum. It can be either ``sha1``, ``sha1_git``, ``sha256`` or ``blake2s256``. If that parameter is not provided, it is assumed that the hashing algorithm used is ``sha1``. :param string hash: hexadecimal representation of the checksum value computed with the specified hashing algorithm. :>json object content_url: link to :http:get:`/api/1/content/[(hash_type):](hash)/` for getting information about the content :>json string id: the **sha1** identifier of the content :>json string lang: the detected programming language if any :>json object tool: information about the tool used to detect the programming language {common_headers} :statuscode 200: no error :statuscode 400: an invalid **hash_type** or **hash** has been provided :statuscode 404: requested content can not be found in the archive **Example:** .. parsed-literal:: :swh_web_api:`content/sha1:dc2830a9e72f23c1dfebef4413003221baa5fb62/language/` """ return api_lookup( archive.lookup_content_language, q, notfound_msg="No language information found for content {}.".format(q), enrich_fn=utils.enrich_metadata_endpoint, request=request, ) @api_route( r"/content/(?P[0-9a-z_:]*[0-9a-f]+)/license/", "api-1-content-license", checksum_args=["q"], ) @api_doc("/content/license/") @format_docstring() def api_content_license(request: Request, q: str): """ .. http:get:: /api/1/content/[(hash_type):](hash)/license/ Get information about the license of a content object. :param string hash_type: optional parameter specifying which hashing algorithm has been used to compute the content checksum. It can be either ``sha1``, ``sha1_git``, ``sha256`` or ``blake2s256``. If that parameter is not provided, it is assumed that the hashing algorithm used is ``sha1``. :param string hash: hexadecimal representation of the checksum value computed with the specified hashing algorithm. :>json object content_url: link to :http:get:`/api/1/content/[(hash_type):](hash)/` for getting information about the content :>json string id: the **sha1** identifier of the content :>json array licenses: array of strings containing the detected license names :>json object tool: information about the tool used to detect the license {common_headers} :statuscode 200: no error :statuscode 400: an invalid **hash_type** or **hash** has been provided :statuscode 404: requested content can not be found in the archive **Example:** .. parsed-literal:: :swh_web_api:`content/sha1:dc2830a9e72f23c1dfebef4413003221baa5fb62/license/` """ return api_lookup( archive.lookup_content_license, q, notfound_msg="No license information found for content {}.".format(q), enrich_fn=utils.enrich_metadata_endpoint, request=request, ) @api_route( r"/content/(?P[0-9a-z_:]*[0-9a-f]+)/raw/", "api-1-content-raw", checksum_args=["q"], ) @api_doc("/content/raw/") def api_content_raw(request: Request, q: str): """ .. http:get:: /api/1/content/[(hash_type):](hash)/raw/ Get the raw content of a content object (aka a "blob"), as a byte sequence. :param string hash_type: optional parameter specifying which hashing algorithm has been used to compute the content checksum. It can be either ``sha1``, ``sha1_git``, ``sha256`` or ``blake2s256``. If that parameter is not provided, it is assumed that the hashing algorithm used is ``sha1``. :param string hash: hexadecimal representation of the checksum value computed with the specified hashing algorithm. 
:query string filename: if provided, the downloaded content will get that filename :resheader Content-Type: application/octet-stream :statuscode 200: no error :statuscode 400: an invalid **hash_type** or **hash** has been provided :statuscode 404: requested content can not be found in the archive **Example:** .. parsed-literal:: :swh_web_api:`content/sha1:dc2830a9e72f23c1dfebef4413003221baa5fb62/raw/` """ content_raw = archive.lookup_content_raw(q) if not content_raw: raise NotFoundExc("Content %s is not found." % q) filename = request.query_params.get("filename") if not filename: filename = "content_%s_raw" % q.replace(":", "_") return FileResponse( io.BytesIO(content_raw["data"]), # not copied, as this is never modified filename=filename, content_type="application/octet-stream", as_attachment=True, ) @api_route(r"/content/known/search/", "api-1-content-known", methods=["POST"]) @api_route(r"/content/known/(?P(?!search).+)/", "api-1-content-known") @api_doc("/content/known/", tags=["hidden"]) @format_docstring() def api_check_content_known(request: Request, q: Optional[str] = None): """ .. http:get:: /api/1/content/known/(sha1)[,(sha1), ...,(sha1)]/ Check whether some content(s) (aka "blob(s)") is present in the archive based on its **sha1** checksum. :param string sha1: hexadecimal representation of the **sha1** checksum value for the content to check existence. Multiple values can be provided separated by ','. {common_headers} :>json array search_res: array holding the search result for each provided **sha1** :>json object search_stats: some statistics regarding the number of **sha1** provided and the percentage of those found in the archive :statuscode 200: no error :statuscode 400: an invalid **sha1** has been provided **Example:** .. parsed-literal:: :swh_web_api:`content/known/dc2830a9e72f23c1dfebef4413003221baa5fb62,0c3f19cb47ebfbe643fb19fa94c874d18fa62d12/` """ search_stats = {"nbfiles": 0, "pct": 0} search_res = None queries = [] # GET: Many hash separated values request if q: hashes = q.split(",") for v in hashes: queries.append({"filename": None, "sha1": v}) # POST: Many hash requests in post form submission elif request.method == "POST": data = request.data # Remove potential inputs with no associated value for k, v in data.items(): if v is not None: if k == "q" and len(v) > 0: queries.append({"filename": None, "sha1": v}) elif v != "": queries.append({"filename": k, "sha1": v}) if queries: lookup = archive.lookup_multiple_hashes(queries) result = [] nb_queries = len(queries) for el in lookup: res_d = {"sha1": el["sha1"], "found": el["found"]} if "filename" in el and el["filename"]: res_d["filename"] = el["filename"] result.append(res_d) search_res = result nbfound = len([x for x in lookup if x["found"]]) search_stats["nbfiles"] = nb_queries search_stats["pct"] = int((nbfound / nb_queries) * 100) return {"search_res": search_res, "search_stats": search_stats} @api_route( r"/content/(?P[0-9a-z_:]*[0-9a-f]+)/", "api-1-content", checksum_args=["q"] ) @api_doc("/content/") @format_docstring() def api_content_metadata(request: Request, q: str): """ .. http:get:: /api/1/content/[(hash_type):](hash)/ Get information about a content (aka a "blob") object. In the archive, a content object is identified based on checksum values computed using various hashing algorithms. :param string hash_type: optional parameter specifying which hashing algorithm has been used to compute the content checksum. It can be either ``sha1``, ``sha1_git``, ``sha256`` or ``blake2s256``. 
If that parameter is not provided, it is assumed that the hashing algorithm used is ``sha1``. :param string hash: hexadecimal representation of the checksum value computed with the specified hashing algorithm. {common_headers} :>json object checksums: object holding the computed checksum values for the requested content :>json string data_url: link to :http:get:`/api/1/content/[(hash_type):](hash)/raw/` for downloading the content raw bytes :>json string filetype_url: link to :http:get:`/api/1/content/[(hash_type):](hash)/filetype/` for getting information about the content MIME type :>json string language_url: link to :http:get:`/api/1/content/[(hash_type):](hash)/language/` for getting information about the programming language used in the content :>json number length: length of the content in bytes :>json string license_url: link to :http:get:`/api/1/content/[(hash_type):](hash)/license/` for getting information about the license of the content :statuscode 200: no error :statuscode 400: an invalid **hash_type** or **hash** has been provided :statuscode 404: requested content can not be found in the archive **Example:** .. parsed-literal:: :swh_web_api:`content/sha1_git:fe95a46679d128ff167b7c55df5d02356c5a1ae1/` """ return api_lookup( archive.lookup_content, q, notfound_msg="Content with {} not found.".format(q), enrich_fn=functools.partial(utils.enrich_content, query_string=q), request=request, ) diff --git a/swh/web/api/views/directory.py b/swh/web/api/views/directory.py index f84f9a7f..3bafcf95 100644 --- a/swh/web/api/views/directory.py +++ b/swh/web/api/views/directory.py @@ -1,97 +1,97 @@ # Copyright (C) 2015-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from typing import Optional from rest_framework.request import Request from swh.web.api import utils from swh.web.api.apidoc import api_doc, format_docstring from swh.web.api.apiurls import api_route from swh.web.api.views.utils import api_lookup -from swh.web.common import archive +from swh.web.utils import archive @api_route( r"/directory/(?P[0-9a-f]+)/", "api-1-directory", checksum_args=["sha1_git"], ) @api_route( r"/directory/(?P[0-9a-f]+)/(?P.+)/", "api-1-directory", checksum_args=["sha1_git"], ) @api_doc("/directory/") @format_docstring() def api_directory(request: Request, sha1_git: str, path: Optional[str] = None): """ .. http:get:: /api/1/directory/(sha1_git)/[(path)/] Get information about directory objects. Directories are identified by **sha1** checksums, compatible with Git directory identifiers. See :func:`swh.model.git_objects.directory_git_object` in our data model module for details about how they are computed. When given only a directory identifier, this endpoint returns information about the directory itself, returning its content (usually a list of directory entries). When given a directory identifier and a path, this endpoint returns information about the directory entry pointed by the relative path, starting path resolution from the given directory. 
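        Both lookup modes can be driven from a small client sketch (the helper
        name is illustrative; it assumes the public instance at
        ``https://archive.softwareheritage.org`` and the ``requests`` library):

        .. sourcecode:: python

            import requests

            API = "https://archive.softwareheritage.org/api/1"  # assumed instance

            def directory_entry(sha1_git: str, path: str = ""):
                # without a path: a JSON array listing the directory itself;
                # with a path: the single entry resolved relative to that directory
                url = f"{API}/directory/{sha1_git}/" + (f"{path}/" if path else "")
                resp = requests.get(url)
                resp.raise_for_status()  # 404 when the directory or entry is unknown
                return resp.json()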
        :param string sha1_git: hexadecimal representation of the directory
            **sha1_git** identifier
        :param string path: optional parameter to get information about the
            directory entry pointed by that relative path

        {common_headers}

        :>jsonarr object checksums: object holding the computed checksum values for
            a directory entry (only for file entries)
        :>jsonarr string dir_id: **sha1_git** identifier of the requested directory
        :>jsonarr number length: length of a directory entry in bytes
            (only for file entries)
        :>jsonarr string name: the directory entry name
        :>jsonarr number perms: permissions for the directory entry
        :>jsonarr string target: **sha1_git** identifier of the directory entry
        :>jsonarr string target_url: link to
            :http:get:`/api/1/content/[(hash_type):](hash)/`
            or :http:get:`/api/1/directory/(sha1_git)/[(path)/]`
            depending on the directory entry type
        :>jsonarr string type: the type of the directory entry,
            can be either ``dir``, ``file`` or ``rev``

        :statuscode 200: no error
        :statuscode 400: an invalid **hash_type** or **hash** has been provided
        :statuscode 404: requested directory can not be found in the archive

        **Example:**

        .. parsed-literal::

            :swh_web_api:`directory/977fc4b98c0e85816348cebd3b12026407c368b6/`
    """
    if path:
        error_msg_path = (
            "Entry with path %s relative to directory "
            "with sha1_git %s not found."
        ) % (path, sha1_git)
        return api_lookup(
            archive.lookup_directory_with_path,
            sha1_git,
            path,
            notfound_msg=error_msg_path,
            enrich_fn=utils.enrich_directory_entry,
            request=request,
        )
    else:
        error_msg_nopath = "Directory with sha1_git %s not found." % sha1_git
        return api_lookup(
            archive.lookup_directory,
            sha1_git,
            notfound_msg=error_msg_nopath,
            enrich_fn=utils.enrich_directory_entry,
            request=request,
        )
diff --git a/swh/web/api/views/graph.py b/swh/web/api/views/graph.py
index ac8427cf..216542f8 100644
--- a/swh/web/api/views/graph.py
+++ b/swh/web/api/views/graph.py
@@ -1,193 +1,193 @@
# Copyright (C) 2020-2022 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU Affero General Public License version 3, or any later version
# See top-level LICENSE file for more information

from distutils.util import strtobool
import json
from typing import Dict, Iterator, Union
from urllib.parse import unquote, urlparse, urlunparse

import requests

from django.http import QueryDict
from django.http.response import StreamingHttpResponse
from rest_framework.decorators import renderer_classes
from rest_framework.renderers import JSONRenderer
from rest_framework.request import Request
from rest_framework.response import Response

from swh.model.hashutil import hash_to_hex
from swh.model.model import Sha1Git
from swh.model.swhids import ExtendedObjectType, ExtendedSWHID
from swh.web.api.apidoc import api_doc
from swh.web.api.apiurls import api_route
from swh.web.api.renderers import PlainTextRenderer
-from swh.web.common import archive
from swh.web.config import SWH_WEB_INTERNAL_SERVER_NAME, get_config
+from swh.web.utils import archive

API_GRAPH_PERM = "swh.web.api.graph"


def _resolve_origin_swhid(swhid: str, origin_urls: Dict[Sha1Git, str]) -> str:
    """
    Resolve origin url from its swhid sha1 representation.
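    A sketch of the invariant this helper relies on (an assumption stated here,
    not in the original code: ``swh.model`` computes an origin SWHID's
    ``object_id`` as the sha1 of its URL, which is what makes the reverse lookup
    through ``lookup_origins_by_sha1s`` possible):

    .. sourcecode:: python

        import hashlib

        from swh.model.model import Origin

        origin = Origin(url="https://github.com/python/cpython")
        # the origin SWHID embeds the sha1 of the URL, not of any file content
        assert origin.swhid().object_id == hashlib.sha1(origin.url.encode()).digest()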
""" parsed_swhid = ExtendedSWHID.from_string(swhid) if parsed_swhid.object_type == ExtendedObjectType.ORIGIN: if parsed_swhid.object_id in origin_urls: return origin_urls[parsed_swhid.object_id] else: origin_info = list( archive.lookup_origins_by_sha1s([hash_to_hex(parsed_swhid.object_id)]) )[0] assert origin_info is not None origin_urls[parsed_swhid.object_id] = origin_info["url"] return origin_info["url"] else: return swhid def _resolve_origin_swhids_in_graph_response( response: requests.Response, ) -> Iterator[bytes]: """ Resolve origin urls from their swhid sha1 representations in graph service responses. """ content_type = response.headers["Content-Type"] origin_urls: Dict[Sha1Git, str] = {} if content_type == "application/x-ndjson": for line in response.iter_lines(): swhids = json.loads(line.decode("utf-8")) processed_line = [] for swhid in swhids: processed_line.append(_resolve_origin_swhid(swhid, origin_urls)) yield (json.dumps(processed_line) + "\n").encode() elif content_type == "text/plain": for line in response.iter_lines(): processed_line = [] swhids = line.decode("utf-8").split(" ") for swhid in swhids: processed_line.append(_resolve_origin_swhid(swhid, origin_urls)) yield (" ".join(processed_line) + "\n").encode() else: for line in response.iter_lines(): yield line + b"\n" @api_route(r"/graph/", "api-1-graph-doc") @api_doc("/graph/") def api_graph(request: Request) -> None: """ .. http:get:: /api/1/graph/(graph_query)/ Provide fast access to the graph representation of the Software Heritage archive. That endpoint acts as a proxy for the `Software Heritage Graph service `_. It provides fast access to the `graph representation `_ of the Software Heritage archive. For more details please refer to the `Graph RPC API documentation `_. .. warning:: That endpoint is not publicly available and requires authentication and special user permission in order to be able to request it. :param string graph_query: query to forward to the Software Heritage Graph archive (see its `documentation `_) :query boolean resolve_origins: extra parameter defined by that proxy enabling to resolve origin urls from their sha1 representations :statuscode 200: no error :statuscode 400: an invalid graph query has been provided :statuscode 404: provided graph node cannot be found **Examples:** .. 
parsed-literal:: :swh_web_api:`graph/leaves/swh:1:dir:432d1b21c1256f7408a07c577b6974bbdbcc1323/` :swh_web_api:`graph/neighbors/swh:1:rev:f39d7d78b70e0f39facb1e4fab77ad3df5c52a35/` :swh_web_api:`graph/randomwalk/swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2/ori?direction=backward` :swh_web_api:`graph/randomwalk/swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2/ori?direction=backward&limit=-2` :swh_web_api:`graph/visit/nodes/swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc?direction=backward&resolve_origins=true` :swh_web_api:`graph/visit/edges/swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc?direction=backward&resolve_origins=true` :swh_web_api:`graph/visit/paths/swh:1:dir:644dd466d8ad527ea3a609bfd588a3244e6dafcb?direction=backward&resolve_origins=true` """ return None @api_route(r"/graph/(?P.+)/", "api-1-graph") @renderer_classes([JSONRenderer, PlainTextRenderer]) def api_graph_proxy( request: Request, graph_query: str ) -> Union[Response, StreamingHttpResponse]: if request.get_host() != SWH_WEB_INTERNAL_SERVER_NAME: if not bool(request.user and request.user.is_authenticated): return Response("Authentication credentials were not provided.", status=401) if not request.user.has_perm(API_GRAPH_PERM): return Response( "You do not have permission to perform this action.", status=403 ) graph_config = get_config()["graph"] graph_query = unquote(graph_query) graph_query_url = graph_config["server_url"] graph_query_url += graph_query parsed_url = urlparse(graph_query_url) query_dict = QueryDict(parsed_url.query, mutable=True) query_dict.update(request.GET) # clamp max_edges query parameter according to authentication if request.user.is_staff: max_edges = graph_config["max_edges"]["staff"] elif request.user.is_authenticated: max_edges = graph_config["max_edges"]["user"] else: max_edges = graph_config["max_edges"]["anonymous"] query_dict["max_edges"] = min( max_edges, int(query_dict.get("max_edges", max_edges + 1)) ) if query_dict: graph_query_url = urlunparse( parsed_url._replace(query=query_dict.urlencode(safe="/;:")) ) response = requests.get(graph_query_url, stream=True) if response.status_code != 200: return Response( response.content, status=response.status_code, content_type=response.headers["Content-Type"], ) # graph stats and counter endpoint responses are not streamed if response.headers.get("Transfer-Encoding") != "chunked": return Response( response.json(), status=response.status_code, content_type=response.headers["Content-Type"], ) # other endpoint responses are streamed else: resolve_origins = strtobool(request.GET.get("resolve_origins", "false")) if response.status_code == 200 and resolve_origins: response_stream = _resolve_origin_swhids_in_graph_response(response) else: response_stream = map(lambda line: line + b"\n", response.iter_lines()) return StreamingHttpResponse( response_stream, status=response.status_code, content_type=response.headers["Content-Type"], ) diff --git a/swh/web/api/views/identifiers.py b/swh/web/api/views/identifiers.py index f50ed721..021d1614 100644 --- a/swh/web/api/views/identifiers.py +++ b/swh/web/api/views/identifiers.py @@ -1,126 +1,126 @@ # Copyright (C) 2018-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from typing import Dict, Set from rest_framework.request import Request from swh.model.hashutil import hash_to_bytes, hash_to_hex from swh.model.swhids 
import ObjectType from swh.web.api.apidoc import api_doc, format_docstring from swh.web.api.apiurls import api_route -from swh.web.common import archive -from swh.web.common.exc import LargePayloadExc -from swh.web.common.identifiers import get_swhid, group_swhids, resolve_swhid +from swh.web.utils import archive +from swh.web.utils.exc import LargePayloadExc +from swh.web.utils.identifiers import get_swhid, group_swhids, resolve_swhid @api_route(r"/resolve/(?P.+)/", "api-1-resolve-swhid") @api_doc("/resolve/") @format_docstring() def api_resolve_swhid(request: Request, swhid: str): """ .. http:get:: /api/1/resolve/(swhid)/ Resolve a SoftWare Heritage persistent IDentifier (SWHID) Try to resolve a provided `SoftWare Heritage persistent IDentifier `_ into an url for browsing the pointed archive object. If the provided identifier is valid, the existence of the object in the archive will also be checked. :param string swhid: a SoftWare Heritage persistent IDentifier :>json string browse_url: the url for browsing the pointed object :>json object metadata: object holding optional parts of the SWHID :>json string namespace: the SWHID namespace :>json string object_id: the hash identifier of the pointed object :>json string object_type: the type of the pointed object :>json number scheme_version: the scheme version of the SWHID {common_headers} :statuscode 200: no error :statuscode 400: an invalid SWHID has been provided :statuscode 404: the pointed object does not exist in the archive **Example:** .. parsed-literal:: :swh_web_api:`resolve/swh:1:rev:96db9023b881d7cd9f379b0c154650d6c108e9a3;origin=https://github.com/openssl/openssl/` """ # try to resolve the provided swhid swhid_resolved = resolve_swhid(swhid) # id is well-formed, now check that the pointed # object is present in the archive, NotFoundExc # will be raised otherwise swhid_parsed = swhid_resolved["swhid_parsed"] object_type = swhid_parsed.object_type object_id = hash_to_hex(swhid_parsed.object_id) archive.lookup_object(swhid_parsed.object_type, object_id) # id is well-formed and the pointed object exists return { "namespace": swhid_parsed.namespace, "scheme_version": swhid_parsed.scheme_version, "object_type": object_type.name.lower(), "object_id": object_id, "metadata": swhid_parsed.qualifiers(), "browse_url": request.build_absolute_uri(swhid_resolved["browse_url"]), } @api_route(r"/known/", "api-1-known", methods=["POST"]) @api_doc("/known/") @format_docstring() def api_swhid_known(request: Request): """ .. http:post:: /api/1/known/ Check if a list of objects are present in the Software Heritage archive. The objects to check existence must be provided using `SoftWare Heritage persistent IDentifiers `_. 
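        A batch existence check can thus be issued as a plain JSON array of SWHIDs
        (a sketch assuming the public instance at
        ``https://archive.softwareheritage.org`` and the ``requests`` library;
        the second SWHID is a deliberately unknown placeholder):

        .. sourcecode:: python

            import requests

            API = "https://archive.softwareheritage.org/api/1"  # assumed instance

            swhids = [
                "swh:1:rev:96db9023b881d7cd9f379b0c154650d6c108e9a3",
                "swh:1:cnt:0000000000000000000000000000000000000000",
            ]
            resp = requests.post(f"{API}/known/", json=swhids)  # capped server-side
            resp.raise_for_status()
            for swhid, status in resp.json().items():
                print(swhid, status["known"])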
        :>json object <swhid>: an object whose keys are input SWHIDs and values
            objects with the following keys:

                * **known (bool)**: whether the object was found

        {common_headers}

        :statuscode 200: no error
        :statuscode 400: an invalid SWHID was provided
        :statuscode 413: the input array of SWHIDs is too large

    """
    limit = 1000
    if len(request.data) > limit:
        raise LargePayloadExc(
            "The maximum number of SWHIDs this endpoint can receive is %s" % limit
        )

    swhids = [get_swhid(swhid) for swhid in request.data]

    response = {str(swhid): {"known": False} for swhid in swhids}

    # group swhids by their type
    swhids_by_type = group_swhids(swhids)
    # search for hashes not present in the storage
    missing_hashes: Dict[ObjectType, Set[bytes]] = {
        k: set(map(hash_to_bytes, archive.lookup_missing_hashes({k: v})))
        for k, v in swhids_by_type.items()
    }

    for swhid in swhids:
        if swhid.object_id not in missing_hashes[swhid.object_type]:
            response[str(swhid)]["known"] = True

    return response
diff --git a/swh/web/api/views/metadata.py b/swh/web/api/views/metadata.py
index c987c654..edabb6a2 100644
--- a/swh/web/api/views/metadata.py
+++ b/swh/web/api/views/metadata.py
@@ -1,325 +1,324 @@
# Copyright (C) 2021-2022 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU Affero General Public License version 3, or any later version
# See top-level LICENSE file for more information

import base64
import re
from typing import Dict, Optional

import iso8601
from django.http import HttpResponse
from django.shortcuts import redirect
from rest_framework.request import Request

from swh.model import hashutil, swhids
from swh.model.model import MetadataAuthority, MetadataAuthorityType, Origin
from swh.web.api.apidoc import api_doc, format_docstring
from swh.web.api.apiurls import api_route
-from swh.web.common import archive, converters
-from swh.web.common.exc import BadInputExc, NotFoundExc
-from swh.web.common.utils import SWHID_RE, reverse
+from swh.web.utils import SWHID_RE, archive, converters, reverse
+from swh.web.utils.exc import BadInputExc, NotFoundExc


@api_route(
    f"/raw-extrinsic-metadata/swhid/(?P<target>{SWHID_RE})/",
    "api-1-raw-extrinsic-metadata-swhid",
)
@api_doc("/raw-extrinsic-metadata/swhid/")
@format_docstring()
def api_raw_extrinsic_metadata_swhid(request: Request, target: str):
    """
    .. http:get:: /api/1/raw-extrinsic-metadata/swhid/(target)

        Returns raw `extrinsic metadata `__ collected on a given object.

        :param string target: The core SWHID of the object whose metadata
            should be returned
        :query string authority: A metadata authority identifier, formatted as
            ``<type> <url>``. Required.
        :query string after: ISO8601 representation of the minimum timestamp of
            metadata to fetch. Defaults to allowing all metadata.
        :query int limit: Maximum number of metadata objects to return.

        {common_headers}

        :>jsonarr string target: SWHID of the object described by this metadata
            (absent when ``target`` is not a core SWHID, i.e. it does not have
            type ``cnt``/``dir``/``rev``/``rel``/``snp``)
        :>jsonarr string discovery_date: ISO8601/RFC3339 timestamp of the moment
            this metadata was collected.
:>jsonarr object authority: authority this metadata is coming from :>jsonarr object fetcher: tool used to fetch the metadata :>jsonarr string format: short identifier of the format of the metadata :>jsonarr string metadata_url: link to download the metadata "blob" itself :>jsonarr string origin: URL of the origin in which context's the metadata is valid, if any :>jsonarr int visit: identifier of the visit in which context's the metadata is valid, if any :>jsonarr string snapshot: SWHID of the snapshot in which context's the metadata is valid, if any :>jsonarr string release: SWHID of the release in which context's the metadata is valid, if any :>jsonarr string revision: SWHID of the revision in which context's the metadata is valid, if any :>jsonarr string path: SWHID of the path in which context's is valid if any, relative to a release or revision as anchor :>jsonarr string directory: SWHID of the directory in which context's the metadata is valid, if any :statuscode 200: no error **Example:** .. parsed-literal:: :swh_web_api:`raw-extrinsic-metadata/swhid/swh:1:dir:a2faa28028657859c16ff506924212b33f0e1307/?authority=forge%20https://pypi.org/` """ # noqa authority_str: Optional[str] = request.query_params.get("authority") after_str: Optional[str] = request.query_params.get("after") limit_str: str = request.query_params.get("limit", "100") page_token_str: Optional[str] = request.query_params.get("page_token") if authority_str is None: raise BadInputExc("The 'authority' query parameter is required.") if " " not in authority_str.strip(): raise BadInputExc("The 'authority' query parameter should contain a space.") (authority_type_str, authority_url) = authority_str.split(" ", 1) try: authority_type = MetadataAuthorityType(authority_type_str) except ValueError: raise BadInputExc( f"Invalid 'authority' type, should be one of: " f"{', '.join(member.value for member in MetadataAuthorityType)}" ) authority = MetadataAuthority(authority_type, authority_url) if after_str: try: after = iso8601.parse_date(after_str) except iso8601.ParseError: raise BadInputExc("Invalid format for 'after' parameter.") from None else: after = None try: limit = int(limit_str) except ValueError: raise BadInputExc("'limit' parameter must be an integer.") from None limit = min(limit, 10000) try: parsed_target = swhids.ExtendedSWHID.from_string(target) except swhids.ValidationError as e: raise BadInputExc(f"Invalid target SWHID: {e}") from None try: swhids.CoreSWHID.from_string(target) except swhids.ValidationError: # Can be parsed as an extended SWHID, but not as a core SWHID extended_swhid = True else: extended_swhid = False if page_token_str is not None: page_token = base64.urlsafe_b64decode(page_token_str) else: page_token = None result_page = archive.storage.raw_extrinsic_metadata_get( target=parsed_target, authority=authority, after=after, page_token=page_token, limit=limit, ) filename = None if parsed_target.object_type == swhids.ExtendedObjectType.ORIGIN: origin_sha1 = hashutil.hash_to_hex(parsed_target.object_id) (origin_info,) = list(archive.lookup_origins_by_sha1s([origin_sha1])) if origin_info is not None: filename = re.sub("[:/_.]+", "_", origin_info["url"]) + "_metadata" if filename is None: filename = f"{target}_metadata" results = [] for metadata in result_page.results: result = converters.from_raw_extrinsic_metadata(metadata) if extended_swhid: # Keep extended SWHIDs away from the public API as much as possible. 
# (It is part of the URL, but not documented, and only accessed via # the link in the response of api-1-origin) del result["target"] # We can't reliably send metadata directly, because it is a bytestring, # and we have to return JSON documents. result["metadata_url"] = reverse( "api-1-raw-extrinsic-metadata-get", url_args={"id": hashutil.hash_to_hex(metadata.id)}, query_params={"filename": filename}, request=request, ) results.append(result) headers: Dict[str, str] = {} if result_page.next_page_token is not None: headers["link-next"] = reverse( "api-1-raw-extrinsic-metadata-swhid", url_args={"target": target}, query_params=dict( authority=authority_str, after=after_str, limit=limit_str, page_token=base64.urlsafe_b64encode( result_page.next_page_token.encode() ).decode(), ), request=request, ) return { "results": results, "headers": headers, } @api_route( "/raw-extrinsic-metadata/get/(?P[0-9a-z]+)/", "api-1-raw-extrinsic-metadata-get", ) def api_raw_extrinsic_metadata_get(request: Request, id: str): # This is an internal endpoint that should only be accessed via URLs given # by /raw-extrinsic-metadata/swhid/; so it is not documented. metadata = archive.storage.raw_extrinsic_metadata_get_by_ids( [hashutil.hash_to_bytes(id)] ) if not metadata: raise NotFoundExc( "Metadata not found. Use /raw-extrinsic-metadata/swhid/ to access metadata." ) response = HttpResponse( metadata[0].metadata, content_type="application/octet-stream" ) filename = request.query_params.get("filename") if filename and re.match("[a-zA-Z0-9:._-]+", filename): response["Content-disposition"] = f'attachment; filename="{filename}"' else: # It should always be not-None and match the regexp if the URL was created by # /raw-extrinsic-metadata/swhid/, but we're better safe than sorry. response["Content-disposition"] = "attachment" return response @api_route( f"/raw-extrinsic-metadata/swhid/(?P{SWHID_RE})/authorities/", "api-1-raw-extrinsic-metadata-swhid-authorities", ) @api_doc("/raw-extrinsic-metadata/swhid/authorities/") @format_docstring() def api_raw_extrinsic_metadata_swhid_authorities(request: Request, target: str): """ .. http:get:: /api/1/raw-extrinsic-metadata/swhid/(target)/authorities/ Returns a list of metadata authorities that provided metadata on the given target. They can then be used to get the raw `extrinsic metadata `__ collected on that object from each of the authorities. This endpoint should only be used directly to retrieve metadata from core SWHIDs (with type ``cnt``, ``dir``, ``rev``, ``rel``, and ``snp``). For "extended" SWHIDs such as origins, :http:get:`/api/1/raw-extrinsic-metadata/origin/(origin_url)/authorities/` should be used instead of building this URL directly. :param string target: The core SWHID of the object whose metadata-providing authorities should be returned {common_headers} :>jsonarr string type: Type of authority (deposit_client, forge, registry) :>jsonarr string url: Unique IRI identifying the authority :>jsonarr object metadata_list_url: URL to get the list of metadata objects on the given object from this authority :statuscode 200: no error **Example:** .. 
parsed-literal:: :swh_web_api:`raw-extrinsic-metadata/swhid/swh:1:dir:a2faa28028657859c16ff506924212b33f0e1307/authorities/` """ # noqa try: parsed_target = swhids.ExtendedSWHID.from_string(target) except swhids.ValidationError as e: raise BadInputExc(f"Invalid target SWHID: {e}") from None authorities = archive.storage.raw_extrinsic_metadata_get_authorities( target=parsed_target ) results = [ { **authority.to_dict(), "metadata_list_url": reverse( "api-1-raw-extrinsic-metadata-swhid", url_args={"target": target}, query_params={"authority": f"{authority.type.value} {authority.url}"}, request=request, ), } for authority in authorities ] return { "results": results, "headers": {}, } @api_route( "/raw-extrinsic-metadata/origin/(?P.*)/authorities/", "api-1-raw-extrinsic-metadata-origin-authorities", ) @api_doc("/raw-extrinsic-metadata/origin/authorities/") @format_docstring() def api_raw_extrinsic_metadata_origin_authorities(request: Request, origin_url: str): """ .. http:get:: /api/1/raw-extrinsic-metadata/origin/(origin_url)/authorities/ Similar to :http:get:`/api/1/raw-extrinsic-metadata/swhid/(target)/authorities/` but to get metadata on origins instead of objects :param string origin_url: The URL of the origin whose metadata-providing authorities should be returned {common_headers} :>jsonarr string type: Type of authority (deposit_client, forge, registry) :>jsonarr string url: Unique IRI identifying the authority :>jsonarr object metadata_list_url: URL to get the list of metadata objects on the given object from this authority :statuscode 200: no error **Example:** .. parsed-literal:: :swh_web_api:`raw-extrinsic-metadata/origin/https://github.com/rdicosmo/parmap/authorities/` """ # noqa url = reverse( "api-1-raw-extrinsic-metadata-swhid-authorities", url_args={"target": Origin(url=origin_url).swhid()}, ) return redirect(url) diff --git a/swh/web/api/views/origin.py b/swh/web/api/views/origin.py index 307cbf0a..6cbf6088 100644 --- a/swh/web/api/views/origin.py +++ b/swh/web/api/views/origin.py @@ -1,510 +1,509 @@ # Copyright (C) 2015-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from distutils.util import strtobool from functools import partial from typing import Dict from rest_framework.request import Request from swh.search.exc import SearchQuerySyntaxError from swh.web.api.apidoc import api_doc, format_docstring from swh.web.api.apiurls import api_route from swh.web.api.utils import ( enrich_origin, enrich_origin_search_result, enrich_origin_visit, ) from swh.web.api.views.utils import api_lookup -from swh.web.common import archive -from swh.web.common.exc import BadInputExc -from swh.web.common.origin_visits import get_origin_visits -from swh.web.common.typing import OriginInfo -from swh.web.common.utils import origin_visit_types, reverse +from swh.web.utils import archive, origin_visit_types, reverse +from swh.web.utils.exc import BadInputExc +from swh.web.utils.origin_visits import get_origin_visits +from swh.web.utils.typing import OriginInfo DOC_RETURN_ORIGIN = """ :>json string origin_visits_url: link to in order to get information about the visits for that origin :>json string url: the origin canonical url """ DOC_RETURN_ORIGIN_ARRAY = DOC_RETURN_ORIGIN.replace(":>json", ":>jsonarr") DOC_RETURN_ORIGIN_VISIT = """ :>json string date: ISO8601/RFC3339 representation of the visit date (in UTC) 
:>json str origin: the origin canonical url :>json string origin_url: link to get information about the origin :>jsonarr string snapshot: the snapshot identifier of the visit (may be null if status is not **full**). :>jsonarr string snapshot_url: link to :http:get:`/api/1/snapshot/(snapshot_id)/` in order to get information about the snapshot of the visit (may be null if status is not **full**). :>json string status: status of the visit (either **full**, **partial** or **ongoing**) :>json number visit: the unique identifier of the visit :>json string metadata_authorities_url: link to :http:get:`/api/1/raw-extrinsic-metadata/swhid/(target)/authorities/` to get the list of metadata authorities providing extrinsic metadata on this origin (and, indirectly, to the origin's extrinsic metadata itself) """ DOC_RETURN_ORIGIN_VISIT_ARRAY = DOC_RETURN_ORIGIN_VISIT.replace(":>json", ":>jsonarr") DOC_RETURN_ORIGIN_VISIT_ARRAY += """ :>jsonarr number id: the unique identifier of the origin :>jsonarr string origin_visit_url: link to :http:get:`/api/1/origin/(origin_url)/visit/(visit_id)/` in order to get information about the visit """ @api_route(r"/origins/", "api-1-origins") @api_doc("/origins/", noargs=True) @format_docstring(return_origin_array=DOC_RETURN_ORIGIN_ARRAY) def api_origins(request: Request): """ .. http:get:: /api/1/origins/ Get list of archived software origins. .. warning:: This endpoint used to provide an ``origin_from`` query parameter, and guarantee an order on results. This is no longer true, and only the Link header should be used for paginating through results. :query int origin_count: The maximum number of origins to return (default to 100, can not exceed 10000) {return_origin_array} {common_headers} {resheader_link} :statuscode 200: no error **Example:** .. parsed-literal:: :swh_web_api:`origins?origin_count=500` """ old_param_origin_from = request.query_params.get("origin_from") if old_param_origin_from: raise BadInputExc("Please use the Link header to browse through result") page_token = request.query_params.get("page_token", None) limit = min(int(request.query_params.get("origin_count", "100")), 10000) page_result = archive.lookup_origins(page_token, limit) origins = [enrich_origin(o, request=request) for o in page_result.results] next_page_token = page_result.next_page_token headers: Dict[str, str] = {} if next_page_token is not None: headers["link-next"] = reverse( "api-1-origins", query_params={"page_token": next_page_token, "origin_count": str(limit)}, request=request, ) return {"results": origins, "headers": headers} @api_route(r"/origin/(?P.+)/get/", "api-1-origin") @api_doc("/origin/") @format_docstring(return_origin=DOC_RETURN_ORIGIN) def api_origin(request: Request, origin_url: str): """ .. http:get:: /api/1/origin/(origin_url)/get/ Get information about a software origin. :param string origin_url: the origin url {return_origin} {common_headers} :statuscode 200: no error :statuscode 404: requested origin can not be found in the archive **Example:** .. parsed-literal:: :swh_web_api:`origin/https://github.com/python/cpython/get/` """ ori_dict = {"url": origin_url} error_msg = "Origin with url %s not found." 
% ori_dict["url"] return api_lookup( archive.lookup_origin, ori_dict, lookup_similar_urls=False, notfound_msg=error_msg, enrich_fn=enrich_origin, request=request, ) def _visit_types() -> str: docstring = "" # available visit types are queried using swh-search so we do it in a try # block in case of failure (for instance in docker environment when # elasticsearch service is not available) try: visit_types = [f"**{visit_type}**" for visit_type in origin_visit_types()] docstring = ", ".join(visit_types[:-1]) + f", and {visit_types[-1]}" except Exception: docstring = "???" pass return docstring @api_route( r"/origin/search/(?P.*)/", "api-1-origin-search", throttle_scope="swh_api_origin_search", ) @api_doc("/origin/search/") @format_docstring( return_origin_array=DOC_RETURN_ORIGIN_ARRAY, visit_types=_visit_types() ) def api_origin_search(request: Request, url_pattern: str): """ .. http:get:: /api/1/origin/search/(url_pattern)/ Search for software origins whose urls contain a provided string pattern or match a provided regular expression. The search is performed in a case insensitive way. .. warning:: This endpoint used to provide an ``offset`` query parameter, and guarantee an order on results. This is no longer true, and only the Link header should be used for paginating through results. :param string url_pattern: a string pattern :query boolean use_ql: whether to use swh search query language or not :query int limit: the maximum number of found origins to return (bounded to 1000) :query boolean with_visit: if true, only return origins with at least one visit by Software heritage :query string visit_type: if provided, only return origins with that specific visit type (currently the supported types are {visit_types}) {return_origin_array} {common_headers} {resheader_link} :statuscode 200: no error **Example:** .. parsed-literal:: :swh_web_api:`origin/search/python/?limit=2` """ result = {} limit = min(int(request.query_params.get("limit", "70")), 1000) page_token = request.query_params.get("page_token") use_ql = request.query_params.get("use_ql", "false") with_visit = request.query_params.get("with_visit", "false") visit_type = request.query_params.get("visit_type") try: (results, page_token) = api_lookup( archive.search_origin, url_pattern, bool(strtobool(use_ql)), limit, bool(strtobool(with_visit)), [visit_type] if visit_type else None, page_token, enrich_fn=enrich_origin_search_result, request=request, ) except SearchQuerySyntaxError as e: raise BadInputExc(f"Syntax error in search query: {e.args[0]}") if page_token is not None: query_params = {k: v for (k, v) in request.GET.dict().items()} query_params["page_token"] = page_token result["headers"] = { "link-next": reverse( "api-1-origin-search", url_args={"url_pattern": url_pattern}, query_params=query_params, request=request, ) } result.update({"results": results}) return result @api_route(r"/origin/metadata-search/", "api-1-origin-metadata-search") @api_doc("/origin/metadata-search/", noargs=True) @format_docstring(return_origin_array=DOC_RETURN_ORIGIN_ARRAY) def api_origin_metadata_search(request: Request): """ .. http:get:: /api/1/origin/metadata-search/ Search for software origins whose metadata (expressed as a JSON-LD/CodeMeta dictionary) match the provided criteria. For now, only full-text search on this dictionary is supported. :query str fulltext: a string that will be matched against origin metadata; results are ranked and ordered starting with the best ones. 
:query int limit: the maximum number of found origins to return (bounded to 100) {return_origin_array} {common_headers} :statuscode 200: no error **Example:** .. parsed-literal:: :swh_web_api:`origin/metadata-search/?limit=2&fulltext=Jane%20Doe` """ fulltext = request.query_params.get("fulltext", None) limit = min(int(request.query_params.get("limit", "70")), 100) if not fulltext: content = '"fulltext" must be provided and non-empty.' raise BadInputExc(content) results = api_lookup( archive.search_origin_metadata, fulltext, limit, request=request ) return { "results": results, } @api_route(r"/origin/(?P.+)/visits/", "api-1-origin-visits") @api_doc("/origin/visits/") @format_docstring(return_origin_visit_array=DOC_RETURN_ORIGIN_VISIT_ARRAY) def api_origin_visits(request: Request, origin_url: str): """ .. http:get:: /api/1/origin/(origin_url)/visits/ Get information about all visits of a software origin. Visits are returned sorted in descending order according to their date. :param str origin_url: a software origin URL :query int per_page: specify the number of visits to list, for pagination purposes :query int last_visit: visit to start listing from, for pagination purposes {common_headers} {resheader_link} {return_origin_visit_array} :statuscode 200: no error :statuscode 404: requested origin can not be found in the archive **Example:** .. parsed-literal:: :swh_web_api:`origin/https://github.com/hylang/hy/visits/` """ result = {} origin_query = OriginInfo(url=origin_url) notfound_msg = "No origin {} found".format(origin_url) url_args_next = {"origin_url": origin_url} per_page = int(request.query_params.get("per_page", "10")) last_visit_str = request.query_params.get("last_visit") last_visit = int(last_visit_str) if last_visit_str else None def _lookup_origin_visits(origin_query, last_visit=last_visit, per_page=per_page): all_visits = get_origin_visits(origin_query, lookup_similar_urls=False) all_visits.reverse() visits = [] if not last_visit: visits = all_visits[:per_page] else: for i, v in enumerate(all_visits): if v["visit"] == last_visit: visits = all_visits[i + 1 : i + 1 + per_page] break for v in visits: yield v results = api_lookup( _lookup_origin_visits, origin_query, notfound_msg=notfound_msg, enrich_fn=partial( enrich_origin_visit, with_origin_link=False, with_origin_visit_link=True ), request=request, ) if results: nb_results = len(results) if nb_results == per_page: new_last_visit = results[-1]["visit"] query_params = {} query_params["last_visit"] = new_last_visit if request.query_params.get("per_page"): query_params["per_page"] = per_page result["headers"] = { "link-next": reverse( "api-1-origin-visits", url_args=url_args_next, query_params=query_params, request=request, ) } result.update({"results": results}) return result @api_route( r"/origin/(?P.+)/visit/latest/", "api-1-origin-visit-latest", throttle_scope="swh_api_origin_visit_latest", ) @api_doc("/origin/visit/latest/") @format_docstring(return_origin_visit=DOC_RETURN_ORIGIN_VISIT) def api_origin_visit_latest(request: Request, origin_url: str): """ .. http:get:: /api/1/origin/(origin_url)/visit/latest/ Get information about the latest visit of a software origin. :param str origin_url: a software origin URL :query boolean require_snapshot: if true, only return a visit with a snapshot {common_headers} {return_origin_visit} :statuscode 200: no error :statuscode 404: requested origin or visit can not be found in the archive **Example:** .. 
parsed-literal:: :swh_web_api:`origin/https://github.com/hylang/hy/visit/latest/` """ require_snapshot = request.query_params.get("require_snapshot", "false") return api_lookup( archive.lookup_origin_visit_latest, origin_url, bool(strtobool(require_snapshot)), lookup_similar_urls=False, notfound_msg=("No visit for origin {} found".format(origin_url)), enrich_fn=partial( enrich_origin_visit, with_origin_link=True, with_origin_visit_link=False ), request=request, ) @api_route( r"/origin/(?P.+)/visit/(?P[0-9]+)/", "api-1-origin-visit" ) @api_doc("/origin/visit/") @format_docstring(return_origin_visit=DOC_RETURN_ORIGIN_VISIT) def api_origin_visit(request: Request, visit_id: str, origin_url: str): """ .. http:get:: /api/1/origin/(origin_url)/visit/(visit_id)/ Get information about a specific visit of a software origin. :param str origin_url: a software origin URL :param int visit_id: a visit identifier {common_headers} {return_origin_visit} :statuscode 200: no error :statuscode 404: requested origin or visit can not be found in the archive **Example:** .. parsed-literal:: :swh_web_api:`origin/https://github.com/hylang/hy/visit/1/` """ return api_lookup( archive.lookup_origin_visit, origin_url, int(visit_id), lookup_similar_urls=False, notfound_msg=("No visit {} for origin {} found".format(visit_id, origin_url)), enrich_fn=partial( enrich_origin_visit, with_origin_link=True, with_origin_visit_link=False ), request=request, ) @api_route( r"/origin/(?P.+)/intrinsic-metadata/", "api-origin-intrinsic-metadata" ) @api_doc("/origin/intrinsic-metadata/") @format_docstring() def api_origin_intrinsic_metadata(request: Request, origin_url: str): """ .. http:get:: /api/1/origin/(origin_url)/intrinsic-metadata Get intrinsic metadata of a software origin (as a JSON-LD/CodeMeta dictionary). :param string origin_url: the origin url :>json string ???: intrinsic metadata field of the origin {common_headers} :statuscode 200: no error :statuscode 404: requested origin can not be found in the archive **Example:** .. 
parsed-literal:: :swh_web_api:`origin/https://github.com/python/cpython/intrinsic-metadata` """ return api_lookup( archive.lookup_origin_intrinsic_metadata, origin_url, notfound_msg=f"Origin with url {origin_url} not found", enrich_fn=enrich_origin, request=request, ) diff --git a/swh/web/api/views/raw.py b/swh/web/api/views/raw.py index 05697221..e3f72019 100644 --- a/swh/web/api/views/raw.py +++ b/swh/web/api/views/raw.py @@ -1,119 +1,118 @@ # Copyright (C) 2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from django.http import HttpResponse from rest_framework.exceptions import PermissionDenied from rest_framework.request import Request from swh.model import model from swh.model.git_objects import ( content_git_object, directory_git_object, release_git_object, revision_git_object, snapshot_git_object, ) from swh.model.hashutil import hash_to_hex from swh.model.swhids import CoreSWHID, ObjectType from swh.storage.algos.directory import directory_get from swh.storage.algos.snapshot import snapshot_get_all_branches from swh.web.api.apidoc import api_doc, format_docstring from swh.web.api.apiurls import api_route from swh.web.auth.utils import API_RAW_OBJECT_PERMISSION -from swh.web.common import archive -from swh.web.common.exc import NotFoundExc -from swh.web.common.utils import SWHID_RE +from swh.web.utils import SWHID_RE, archive +from swh.web.utils.exc import NotFoundExc @api_route( f"/raw/(?P{SWHID_RE})/", "api-1-raw-object", throttle_scope="swh_raw_object", ) @api_doc("/raw/") @format_docstring() def api_raw_object(request: Request, swhid: str): """ .. http:get:: /api/1/raw/(swhid)/ Get the object corresponding to the SWHID in raw form. This endpoint exposes the internal representation (see the ``*_git_object`` functions in :mod:`swh.model.git_objects`), and so can be used to fetch a binary blob which hashes to the same identifier. .. warning:: That endpoint is not publicly available and requires authentication and special user permission in order to be able to request it. :param string swhid: the object's SWHID :resheader Content-Type: application/octet-stream :statuscode 200: no error :statuscode 404: the requested object can not be found in the archive **Example:** .. parsed-literal:: :swh_web_api:`raw/swh:1:snp:6a3a2cf0b2b90ce7ae1cf0a221ed68035b686f5a` """ if not (request.user.is_staff or request.user.has_perm(API_RAW_OBJECT_PERMISSION)): raise PermissionDenied() parsed_swhid = CoreSWHID.from_string(swhid) object_id = parsed_swhid.object_id object_type = parsed_swhid.object_type def not_found(): return NotFoundExc(f"Object with id {swhid} not found.") if object_type == ObjectType.CONTENT: results = archive.storage.content_find({"sha1_git": object_id}) if len(results) == 0: raise not_found() cnt = results[0] # `cnt.with_data()` unfortunately doesn't seem to work. 
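        # Sketch of the fallback below: the storage may hand back a Content row
        # without its raw bytes, so the data is fetched explicitly by sha1 and a
        # complete model object is rebuilt before git-object serialization.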
if cnt.data is None: d = cnt.to_dict() d["data"] = archive.storage.content_get_data(cnt.sha1) cnt = model.Content.from_dict(d) assert cnt.data, f"Content {hash_to_hex(cnt.sha1)} ceased to exist" result = content_git_object(cnt) elif object_type == ObjectType.DIRECTORY: dir_ = directory_get(archive.storage, object_id) if dir_ is None: raise not_found() result = directory_git_object(dir_) elif object_type == ObjectType.REVISION: rev = archive.storage.revision_get([object_id], ignore_displayname=True)[0] if rev is None: raise not_found() result = revision_git_object(rev) elif object_type == ObjectType.RELEASE: rel = archive.storage.release_get([object_id], ignore_displayname=True)[0] if rel is None: raise not_found() result = release_git_object(rel) elif object_type == ObjectType.SNAPSHOT: snp = snapshot_get_all_branches(archive.storage, object_id) if snp is None: raise not_found() result = snapshot_git_object(snp) else: raise ValueError(f"Unexpected object type variant: {object_type}") response = HttpResponse(result, content_type="application/octet-stream") filename = swhid.replace(":", "_") + "_raw" response["Content-disposition"] = f"attachment; filename={filename}" return response diff --git a/swh/web/api/views/release.py b/swh/web/api/views/release.py index 3ce995bd..a276a7b4 100644 --- a/swh/web/api/views/release.py +++ b/swh/web/api/views/release.py @@ -1,62 +1,62 @@ # Copyright (C) 2015-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from rest_framework.request import Request from swh.web.api import utils from swh.web.api.apidoc import api_doc, format_docstring from swh.web.api.apiurls import api_route from swh.web.api.views.utils import api_lookup -from swh.web.common import archive +from swh.web.utils import archive @api_route( r"/release/(?P[0-9a-f]+)/", "api-1-release", checksum_args=["sha1_git"] ) @api_doc("/release/") @format_docstring() def api_release(request: Request, sha1_git: str): """ .. http:get:: /api/1/release/(sha1_git)/ Get information about a release in the archive. Releases are identified by **sha1** checksums, compatible with Git tag identifiers. See :func:`swh.model.git_objects.release_git_object` in our data model module for details about how they are computed. :param string sha1_git: hexadecimal representation of the release **sha1_git** identifier {common_headers} :>json object author: information about the author of the release :>json string date: RFC3339 representation of the release date :>json string id: the release unique identifier :>json string message: the message associated to the release :>json string name: the name of the release :>json string target: the target identifier of the release :>json string target_type: the type of the target, can be either **release**, **revision**, **content**, **directory** :>json string target_url: a link to the adequate api url based on the target type :statuscode 200: no error :statuscode 400: an invalid **sha1_git** value has been provided :statuscode 404: requested release can not be found in the archive **Example:** .. parsed-literal:: :swh_web_api:`release/208f61cc7a5dbc9879ae6e5c2f95891e270f09ef/` """ error_msg = "Release with sha1_git %s not found." 
% sha1_git return api_lookup( archive.lookup_release, sha1_git, notfound_msg=error_msg, enrich_fn=utils.enrich_release, request=request, ) diff --git a/swh/web/api/views/revision.py b/swh/web/api/views/revision.py index 7708a071..09a74ac6 100644 --- a/swh/web/api/views/revision.py +++ b/swh/web/api/views/revision.py @@ -1,218 +1,218 @@ # Copyright (C) 2015-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from typing import Optional from django.http import HttpResponse from rest_framework.request import Request from swh.web.api import utils from swh.web.api.apidoc import api_doc, format_docstring from swh.web.api.apiurls import api_route from swh.web.api.views.utils import api_lookup -from swh.web.common import archive +from swh.web.utils import archive DOC_RETURN_REVISION = """ :>json object author: information about the author of the revision :>json object committer: information about the committer of the revision :>json string committer_date: RFC3339 representation of the commit date :>json string date: RFC3339 representation of the revision date :>json string directory: the unique identifier that revision points to :>json string directory_url: link to :http:get:`/api/1/directory/(sha1_git)/[(path)/]` to get information about the directory associated to the revision :>json string id: the revision unique identifier :>json boolean merge: whether or not the revision corresponds to a merge commit :>json string message: the message associated to the revision :>json array parents: the parents of the revision, i.e. the previous revisions that head directly to it, each entry of that array contains an unique parent revision identifier but also a link to :http:get:`/api/1/revision/(sha1_git)/` to get more information about it :>json string type: the type of the revision """ DOC_RETURN_REVISION_ARRAY = DOC_RETURN_REVISION.replace(":>json", ":>jsonarr") @api_route( r"/revision/(?P[0-9a-f]+)/", "api-1-revision", checksum_args=["sha1_git"] ) @api_doc("/revision/") @format_docstring(return_revision=DOC_RETURN_REVISION) def api_revision(request: Request, sha1_git: str): """ .. http:get:: /api/1/revision/(sha1_git)/ Get information about a revision in the archive. Revisions are identified by **sha1** checksums, compatible with Git commit identifiers. See :func:`swh.model.git_objects.revision_git_object` in our data model module for details about how they are computed. :param string sha1_git: hexadecimal representation of the revision **sha1_git** identifier {common_headers} {return_revision} :statuscode 200: no error :statuscode 400: an invalid **sha1_git** value has been provided :statuscode 404: requested revision can not be found in the archive **Example:** .. 
parsed-literal:: :swh_web_api:`revision/aafb16d69fd30ff58afdd69036a26047f3aebdc6/` """ return api_lookup( archive.lookup_revision, sha1_git, notfound_msg="Revision with sha1_git {} not found.".format(sha1_git), enrich_fn=utils.enrich_revision, request=request, ) @api_route( r"/revision/(?P[0-9a-f]+)/raw/", "api-1-revision-raw-message", checksum_args=["sha1_git"], ) @api_doc("/revision/raw/", tags=["hidden"]) def api_revision_raw_message(request: Request, sha1_git: str): """Return the raw data of the message of revision identified by sha1_git""" raw = archive.lookup_revision_message(sha1_git) response = HttpResponse(raw["message"], content_type="application/octet-stream") response["Content-disposition"] = "attachment;filename=rev_%s_raw" % sha1_git return response @api_route( r"/revision/(?P[0-9a-f]+)/directory/", "api-1-revision-directory", checksum_args=["sha1_git"], ) @api_route( r"/revision/(?P[0-9a-f]+)/directory/(?P.+)/", "api-1-revision-directory", checksum_args=["sha1_git"], ) @api_doc("/revision/directory/") @format_docstring() def api_revision_directory( request: Request, sha1_git: str, dir_path: Optional[str] = None ): """ .. http:get:: /api/1/revision/(sha1_git)/directory/[(path)/] Get information about directory (entry) objects associated to revisions. Each revision is associated to a single "root" directory. This endpoint behaves like :http:get:`/api/1/directory/(sha1_git)/[(path)/]`, but operates on the root directory associated to a given revision. :param string sha1_git: hexadecimal representation of the revision **sha1_git** identifier :param string path: optional parameter to get information about the directory entry pointed by that relative path {common_headers} :>json array content: directory entries as returned by :http:get:`/api/1/directory/(sha1_git)/[(path)/]` :>json string path: path of directory from the revision root one :>json string revision: the unique revision identifier :>json string type: the type of the directory :statuscode 200: no error :statuscode 400: an invalid **sha1_git** value has been provided :statuscode 404: requested revision can not be found in the archive **Example:** .. parsed-literal:: :swh_web_api:`revision/f1b94134a4b879bc55c3dacdb496690c8ebdc03f/directory/` """ rev_id, result = archive.lookup_directory_through_revision( {"sha1_git": sha1_git}, dir_path ) content = result["content"] if result["type"] == "dir": # dir_entries result["content"] = [ utils.enrich_directory_entry(entry, request=request) for entry in content ] elif result["type"] == "file": # content result["content"] = utils.enrich_content(content, request=request) elif result["type"] == "rev": # revision result["content"] = utils.enrich_revision(content, request=request) return result @api_route( r"/revision/(?P[0-9a-f]+)/log/", "api-1-revision-log", checksum_args=["sha1_git"], ) @api_doc("/revision/log/") @format_docstring(return_revision_array=DOC_RETURN_REVISION_ARRAY) def api_revision_log(request: Request, sha1_git: str): """ .. http:get:: /api/1/revision/(sha1_git)/log/ Get a list of all revisions heading to a given one, in other words show the commit log. The revisions are returned in the breadth-first search order while visiting the revision graph. The number of revisions to return is also bounded by the **limit** query parameter. .. 
warning:: To get the full BFS traversal of the revision graph when the total number of revisions is greater than 1000, it is up to the client to keep track of the multiple branches of history when there's merge revisions in the returned objects. In other words, identify all the continuation points that need to be followed to get the full history through recursion. :param string sha1_git: hexadecimal representation of the revision **sha1_git** identifier :query int limit: maximum number of revisions to return when performing BFS traversal on the revision graph (default to 10, can not exceed 1000) {common_headers} {return_revision_array} :statuscode 200: no error :statuscode 400: an invalid **sha1_git** value has been provided :statuscode 404: head revision can not be found in the archive **Example:** .. parsed-literal:: :swh_web_api:`revision/e1a315fa3fa734e2a6154ed7b5b9ae0eb8987aad/log/` """ limit = int(request.query_params.get("limit", "10")) limit = min(limit, 1000) error_msg = "Revision with sha1_git %s not found." % sha1_git revisions = api_lookup( archive.lookup_revision_log, sha1_git, limit, notfound_msg=error_msg, enrich_fn=utils.enrich_revision, request=request, ) return {"results": revisions} diff --git a/swh/web/api/views/snapshot.py b/swh/web/api/views/snapshot.py index 03244e0e..85e4771c 100644 --- a/swh/web/api/views/snapshot.py +++ b/swh/web/api/views/snapshot.py @@ -1,106 +1,105 @@ # Copyright (C) 2018-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from rest_framework.request import Request from swh.web.api.apidoc import api_doc, format_docstring from swh.web.api.apiurls import api_route from swh.web.api.utils import enrich_snapshot from swh.web.api.views.utils import api_lookup -from swh.web.common import archive -from swh.web.common.utils import reverse from swh.web.config import get_config +from swh.web.utils import archive, reverse @api_route( r"/snapshot/(?P[0-9a-f]+)/", "api-1-snapshot", checksum_args=["snapshot_id"], ) @api_doc("/snapshot/") @format_docstring() def api_snapshot(request: Request, snapshot_id: str): """ .. http:get:: /api/1/snapshot/(snapshot_id)/ Get information about a snapshot in the archive. A snapshot is a set of named branches, which are pointers to objects at any level of the Software Heritage DAG. It represents a full picture of an origin at a given time. As well as pointing to other objects in the Software Heritage DAG, branches can also be aliases, in which case their target is the name of another branch in the same snapshot, or dangling, in which case the target is unknown. A snapshot identifier is a salted sha1. See :func:`swh.model.git_objects.snapshot_git_object` in our data model module for details about how they are computed. 
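        Branch listing is cursor-paginated. A client sketch that walks every
        branch by following the ``Link`` header (two assumptions worth noting:
        the ``link-next`` header documented below is surfaced as a standard HTTP
        ``Link`` header, which ``requests`` exposes through ``resp.links``, and
        the public instance at ``https://archive.softwareheritage.org`` is used):

        .. sourcecode:: python

            import requests

            API = "https://archive.softwareheritage.org/api/1"  # assumed instance
            snp = "6a3a2cf0b2b90ce7ae1cf0a221ed68035b686f5a"

            url = f"{API}/snapshot/{snp}/"
            while url:
                resp = requests.get(url)
                resp.raise_for_status()
                for name, branch in resp.json()["branches"].items():
                    # dangling branches have a null target
                    print(name, branch and branch["target_type"])
                url = resp.links.get("next", {}).get("url")  # cursor over branch names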
:param sha1 snapshot_id: a snapshot identifier :query str branches_from: optional parameter used to skip branches whose name is lexicographically lower than the provided value before returning them :query int branches_count: optional parameter used to restrict the number of returned branches (default to 1000) :query str target_types: optional comma separated list parameter used to filter the target types of branch to return (possible values that can be contained in that list are ``content``, ``directory``, ``revision``, ``release``, ``snapshot`` or ``alias``) {common_headers} {resheader_link} :>json object branches: object containing all branches associated to the snapshot, for each of them the associated target type and id are given but also a link to get information about that target :>json string id: the unique identifier of the snapshot :statuscode 200: no error :statuscode 400: an invalid snapshot identifier has been provided :statuscode 404: requested snapshot can not be found in the archive **Example:** .. parsed-literal:: :swh_web_api:`snapshot/6a3a2cf0b2b90ce7ae1cf0a221ed68035b686f5a/` """ snapshot_content_max_size = get_config()["snapshot_content_max_size"] branches_from = request.GET.get("branches_from", "") branches_count = int(request.GET.get("branches_count", snapshot_content_max_size)) target_types_str = request.GET.get("target_types", None) target_types = target_types_str.split(",") if target_types_str else None results = api_lookup( archive.lookup_snapshot, snapshot_id, branches_from, branches_count, target_types, branch_name_exclude_prefix=None, notfound_msg="Snapshot with id {} not found.".format(snapshot_id), enrich_fn=enrich_snapshot, request=request, ) response = {"results": results, "headers": {}} if results["next_branch"] is not None: response["headers"]["link-next"] = reverse( "api-1-snapshot", url_args={"snapshot_id": snapshot_id}, query_params={ "branches_from": results["next_branch"], "branches_count": str(branches_count), "target_types": ",".join(target_types) if target_types else None, }, request=request, ) return response diff --git a/swh/web/api/views/stat.py b/swh/web/api/views/stat.py index be837f74..70bff321 100644 --- a/swh/web/api/views/stat.py +++ b/swh/web/api/views/stat.py @@ -1,52 +1,52 @@ # Copyright (C) 2015-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from rest_framework.request import Request from swh.web.api.apidoc import api_doc, format_docstring from swh.web.api.apiurls import api_route -from swh.web.common import archive +from swh.web.utils import archive @api_route(r"/stat/counters/", "api-1-stat-counters") @api_doc("/stat/counters/", noargs=True) @format_docstring() def api_stats(request: Request): """ .. http:get:: /api/1/stat/counters/ Get statistics about the content of the archive. :>json number content: current number of content objects (aka files) in the archive :>json number directory: current number of directory objects in the archive :>json number origin: current number of software origins (an origin is a "place" where source code can be found, e.g. a git repository, a tarball, ...)
in the archive :>json number origin_visit: current number of visits on software origins to fill the archive :>json number person: current number of persons (source code authors or committers) in the archive :>json number release: current number of release objects in the archive :>json number revision: current number of revision objects (aka commits) in the archive :>json number skipped_content: current number of content objects (aka files) which were not inserted in the archive :>json number snapshot: current number of snapshot objects (aka set of named branches) in the archive {common_headers} :statuscode 200: no error **Example:** .. parsed-literal:: :swh_web_api:`stat/counters/` """ return archive.stat_counters() diff --git a/swh/web/api/views/utils.py b/swh/web/api/views/utils.py index 2196fb57..8f0147d2 100644 --- a/swh/web/api/views/utils.py +++ b/swh/web/api/views/utils.py @@ -1,91 +1,91 @@ # Copyright (C) 2015-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from types import GeneratorType from typing import Any, Callable, Dict, List, Optional, Tuple, Union from django.http import HttpRequest from rest_framework.decorators import api_view from rest_framework.request import Request from rest_framework.response import Response from swh.web.api.apiurls import APIUrls, api_route -from swh.web.common.exc import NotFoundExc +from swh.web.utils.exc import NotFoundExc EnrichFunction = Callable[[Dict[str, str], Optional[HttpRequest]], Dict[str, str]] EnrichFunctionSearchResult = Callable[ [Tuple[List[Dict[str, Any]], Optional[str]], Optional[HttpRequest]], Tuple[List[Dict[str, Any]], Optional[str]], ] def api_lookup( lookup_fn: Callable[..., Any], *args: Any, notfound_msg: Optional[str] = "Object not found", enrich_fn: Optional[Union[EnrichFunction, EnrichFunctionSearchResult]] = None, request: Optional[HttpRequest] = None, **kwargs: Any, ): r""" Capture a redundant behavior of: - looking up the backend with a criterion (be it an identifier or checksum) passed to the function lookup_fn - if nothing is found, raise a NotFoundExc exception with error message notfound_msg. - Otherwise if something is returned: - if it is a list, map or generator, map the enrich_fn function to it and return the resulting data structure as a list. - if it is a dict, pass it to enrich_fn and return the enriched dict. Args: - lookup_fn: function expecting one criterion and optional supplementary \*args. - \*args: supplementary arguments to pass to lookup_fn. - notfound_msg: if nothing matching the criterion is found, raise NotFoundExc with this error message. - enrich_fn: Function to use to enrich the result returned by lookup_fn. Default to the identity function if not provided. - request: Input HTTP request that will be provided as parameter to enrich_fn. Raises: NotFoundExc or whatever `lookup_fn` raises.
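Example (a sketch mirroring the call sites in the endpoint modules; ``archive`` and ``utils`` are the modules imported by those endpoint implementations)::

    return api_lookup(
        archive.lookup_revision,
        sha1_git,
        notfound_msg="Revision with sha1_git {} not found.".format(sha1_git),
        enrich_fn=utils.enrich_revision,
        request=request,
    )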
""" def _enrich_fn_noop(x, request): return x if enrich_fn is None: enrich_fn = _enrich_fn_noop res = lookup_fn(*args, **kwargs) if res is None: raise NotFoundExc(notfound_msg) if isinstance(res, (list, GeneratorType)) or type(res) == map: return [enrich_fn(x, request) for x in res] return enrich_fn(res, request) @api_view(["GET", "HEAD"]) def api_home(request: Request): return Response({}, template_name="api.html") APIUrls.add_url_pattern(r"^api/$", api_home, view_name="api-1-homepage") @api_route(r"/", "api-1-endpoints") def api_endpoints(request): """Display the list of opened api endpoints.""" routes = APIUrls.get_app_endpoints().copy() for route, doc in routes.items(): doc["doc_intro"] = doc["docstring"].split("\n\n")[0] # Return a list of routes with consistent ordering env = {"doc_routes": sorted(routes.items())} return Response(env, template_name="api-endpoints.html") diff --git a/swh/web/api/views/vault.py b/swh/web/api/views/vault.py index b2807fb5..78e5abe4 100644 --- a/swh/web/api/views/vault.py +++ b/swh/web/api/views/vault.py @@ -1,513 +1,512 @@ # Copyright (C) 2015-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from typing import Any, Dict from django.http import HttpResponse from django.shortcuts import redirect from rest_framework.request import Request from swh.model.hashutil import hash_to_hex from swh.model.swhids import CoreSWHID, ObjectType from swh.web.api.apidoc import api_doc, format_docstring from swh.web.api.apiurls import api_route from swh.web.api.views.utils import api_lookup -from swh.web.common import archive, query -from swh.web.common.exc import BadInputExc -from swh.web.common.utils import SWHID_RE, reverse +from swh.web.utils import SWHID_RE, archive, query, reverse +from swh.web.utils.exc import BadInputExc ###################################################### # Common # XXX: a bit spaghetti. Would be better with class-based views. def _dispatch_cook_progress(request, bundle_type: str, swhid: CoreSWHID): if request.method == "GET": return api_lookup( archive.vault_progress, bundle_type, swhid, notfound_msg=f"Cooking of {swhid} was never requested.", request=request, ) elif request.method == "POST": email = request.POST.get("email", request.GET.get("email", None)) return api_lookup( archive.vault_cook, bundle_type, swhid, email, notfound_msg=f"{swhid} not found.", request=request, ) def _vault_response( vault_response: Dict[str, Any], add_legacy_items: bool ) -> Dict[str, Any]: d = { "fetch_url": vault_response["fetch_url"], "progress_message": vault_response["progress_msg"], "id": vault_response["task_id"], "status": vault_response["task_status"], "swhid": str(vault_response["swhid"]), } if add_legacy_items: d["obj_type"] = vault_response["swhid"].object_type.name.lower() d["obj_id"] = hash_to_hex(vault_response["swhid"].object_id) return d ###################################################### # Flat bundles @api_route( f"/vault/flat/(?P{SWHID_RE})/", "api-1-vault-cook-flat", methods=["GET", "POST"], throttle_scope="swh_vault_cooking", never_cache=True, ) @api_doc("/vault/flat/") @format_docstring() def api_vault_cook_flat(request: Request, swhid: str): """ .. http:get:: /api/1/vault/flat/(swhid)/ .. http:post:: /api/1/vault/flat/(swhid)/ Request the cooking of a simple archive, typically for a directory. 
This endpoint allows creating a vault cooking task for a directory through a POST request, or checking the status of a previously created one through a GET request. Once the cooking task has been executed, the resulting archive can be downloaded using the dedicated endpoint :http:get:`/api/1/vault/flat/(swhid)/raw/`. Then, to extract the cooked directory into the current one, use:: $ tar xvf path/to/swh_1_*.tar.gz :param string swhid: the object's SWHID :query string email: e-mail to notify when the archive is ready {common_headers} :>json string fetch_url: the url from which to download the archive once it has been cooked (see :http:get:`/api/1/vault/flat/(swhid)/raw/`) :>json string progress_message: message describing the cooking task progress :>json number id: the cooking task id :>json string status: the cooking task status (either **new**, **pending**, **done** or **failed**) :>json string swhid: the identifier of the object to cook :statuscode 200: no error :statuscode 400: an invalid directory identifier has been provided :statuscode 404: requested directory did not receive any cooking request yet (in case of GET) or can not be found in the archive (in case of POST) """ parsed_swhid = CoreSWHID.from_string(swhid) if parsed_swhid.object_type == ObjectType.DIRECTORY: res = _dispatch_cook_progress(request, "flat", parsed_swhid) res["fetch_url"] = reverse( "api-1-vault-fetch-flat", url_args={"swhid": swhid}, request=request, ) return _vault_response(res, add_legacy_items=False) elif parsed_swhid.object_type == ObjectType.CONTENT: raise BadInputExc( "Content objects do not need to be cooked, " "use `/api/1/content/raw/` instead." ) elif parsed_swhid.object_type == ObjectType.REVISION: # TODO: support revisions too? (the vault allows it) raise BadInputExc( "Only directories can be cooked as 'flat' bundles. " "Use `/api/1/vault/gitfast/` to cook revisions, as gitfast bundles." ) else: raise BadInputExc("Only directories can be cooked as 'flat' bundles.") @api_route( r"/vault/directory/(?P<dir_id>[0-9a-f]+)/", "api-1-vault-cook-directory", methods=["GET", "POST"], checksum_args=["dir_id"], throttle_scope="swh_vault_cooking", never_cache=True, ) @api_doc("/vault/directory/", tags=["deprecated"]) @format_docstring() def api_vault_cook_directory(request: Request, dir_id: str): """ .. http:get:: /api/1/vault/directory/(dir_id)/ This endpoint was replaced by :http:get:`/api/1/vault/flat/(swhid)/` """ _, obj_id = query.parse_hash_with_algorithms_or_throws( dir_id, ["sha1"], "Only sha1_git is supported." ) swhid = f"swh:1:dir:{obj_id.hex()}" res = _dispatch_cook_progress(request, "flat", CoreSWHID.from_string(swhid)) res["fetch_url"] = reverse( "api-1-vault-fetch-flat", url_args={"swhid": swhid}, request=request, ) return _vault_response(res, add_legacy_items=True) @api_route( f"/vault/flat/(?P<swhid>{SWHID_RE})/raw/", "api-1-vault-fetch-flat", ) @api_doc("/vault/flat/raw/") def api_vault_fetch_flat(request: Request, swhid: str): """ .. http:get:: /api/1/vault/flat/(swhid)/raw/ Fetch the cooked archive for a flat bundle. See :http:get:`/api/1/vault/flat/(swhid)/` to get more details on 'flat' bundle cooking.
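A minimal client-side sketch of the full cook-then-fetch cycle, using a hypothetical directory SWHID and the ``requests`` library for illustration (any HTTP client works)::

    import time

    import requests

    api = "https://archive.softwareheritage.org/api/1"
    swhid = "swh:1:dir:0000000000000000000000000000000000000000"  # hypothetical

    # POST requests the cooking; subsequent GETs poll the task status
    task = requests.post(f"{api}/vault/flat/{swhid}/").json()
    while task["status"] not in ("done", "failed"):
        time.sleep(1)
        task = requests.get(f"{api}/vault/flat/{swhid}/").json()

    # once done, download the tarball from the advertised fetch_url
    tarball = requests.get(task["fetch_url"]).content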
:param string swhid: the SWHID of the object to cook :resheader Content-Type: application/gzip :statuscode 200: no error :statuscode 404: requested directory did not receive any cooking request yet (in case of GET) or can not be found in the archive (in case of POST) """ res = api_lookup( archive.vault_fetch, "flat", CoreSWHID.from_string(swhid), notfound_msg=f"Cooked archive for {swhid} not found.", request=request, ) fname = "{}.tar.gz".format(swhid) response = HttpResponse(res, content_type="application/gzip") response["Content-disposition"] = "attachment; filename={}".format( fname.replace(":", "_") ) return response @api_route( r"/vault/directory/(?P<dir_id>[0-9a-f]+)/raw/", "api-1-vault-fetch-directory", checksum_args=["dir_id"], ) @api_doc("/vault/directory/raw/", tags=["hidden", "deprecated"]) def api_vault_fetch_directory(request: Request, dir_id: str): """ .. http:get:: /api/1/vault/directory/(dir_id)/raw/ This endpoint was replaced by :http:get:`/api/1/vault/flat/(swhid)/raw/` """ _, obj_id = query.parse_hash_with_algorithms_or_throws( dir_id, ["sha1"], "Only sha1_git is supported." ) rev_flat_raw_url = reverse( "api-1-vault-fetch-flat", url_args={"swhid": f"swh:1:dir:{dir_id}"} ) return redirect(rev_flat_raw_url) ###################################################### # gitfast bundles @api_route( f"/vault/gitfast/(?P<swhid>{SWHID_RE})/", "api-1-vault-cook-gitfast", methods=["GET", "POST"], throttle_scope="swh_vault_cooking", never_cache=True, ) @api_doc("/vault/gitfast/") @format_docstring() def api_vault_cook_gitfast(request: Request, swhid: str): """ .. http:get:: /api/1/vault/gitfast/(swhid)/ .. http:post:: /api/1/vault/gitfast/(swhid)/ Request the cooking of a gitfast archive for a revision or check its cooking status. This endpoint allows creating a vault cooking task for a revision through a POST request, or checking the status of a previously created one through a GET request. Once the cooking task has been executed, the resulting gitfast archive can be downloaded using the dedicated endpoint :http:get:`/api/1/vault/gitfast/(swhid)/raw/`. Then, to import the revision into the current directory, use:: $ git init $ zcat path/to/swh_1_rev_*.gitfast.gz | git fast-import $ git checkout HEAD :param string swhid: the revision's permanent identifier :query string email: e-mail to notify when the gitfast archive is ready {common_headers} :>json string fetch_url: the url from which to download the archive once it has been cooked (see :http:get:`/api/1/vault/gitfast/(swhid)/raw/`) :>json string progress_message: message describing the cooking task progress :>json number id: the cooking task id :>json string status: the cooking task status (new/pending/done/failed) :>json string swhid: the identifier of the object to cook :statuscode 200: no error :statuscode 404: requested revision did not receive any cooking request yet (in case of GET) or can not be found in the archive (in case of POST) """ parsed_swhid = CoreSWHID.from_string(swhid) if parsed_swhid.object_type == ObjectType.REVISION: res = _dispatch_cook_progress(request, "gitfast", parsed_swhid) res["fetch_url"] = reverse( "api-1-vault-fetch-gitfast", url_args={"swhid": swhid}, request=request, ) return _vault_response(res, add_legacy_items=False) elif parsed_swhid.object_type == ObjectType.CONTENT: raise BadInputExc( "Content objects do not need to be cooked, " "use `/api/1/content/raw/` instead." ) elif parsed_swhid.object_type == ObjectType.DIRECTORY: raise BadInputExc( "Only revisions can be cooked as 'gitfast' bundles.
" "Use `/api/1/vault/flat/` to cook directories, as flat bundles." ) else: raise BadInputExc("Only revisions can be cooked as 'gitfast' bundles.") @api_route( r"/vault/revision/(?P[0-9a-f]+)/gitfast/", "api-1-vault-cook-revision_gitfast", methods=["GET", "POST"], checksum_args=["rev_id"], throttle_scope="swh_vault_cooking", never_cache=True, ) @api_doc("/vault/revision/gitfast/", tags=["deprecated"]) @format_docstring() def api_vault_cook_revision_gitfast(request: Request, rev_id: str): """ .. http:get:: /api/1/vault/revision/(rev_id)/gitfast/ This endpoint was replaced by :http:get:`/api/1/vault/gitfast/(swhid)/` """ _, obj_id = query.parse_hash_with_algorithms_or_throws( rev_id, ["sha1"], "Only sha1_git is supported." ) swhid = f"swh:1:rev:{obj_id.hex()}" res = _dispatch_cook_progress(request, "gitfast", CoreSWHID.from_string(swhid)) res["fetch_url"] = reverse( "api-1-vault-fetch-gitfast", url_args={"swhid": swhid}, request=request, ) return _vault_response(res, add_legacy_items=True) @api_route( f"/vault/gitfast/(?P{SWHID_RE})/raw/", "api-1-vault-fetch-gitfast", ) @api_doc("/vault/gitfast/raw/") def api_vault_fetch_revision_gitfast(request: Request, swhid: str): """ .. http:get:: /api/1/vault/gitfast/(swhid)/raw/ Fetch the cooked gitfast archive for a revision. See :http:get:`/api/1/vault/gitfast/(swhid)/` to get more details on gitfast cooking. :param string rev_id: the revision's sha1 identifier :resheader Content-Type: application/gzip :statuscode 200: no error :statuscode 404: requested directory did not receive any cooking request yet (in case of GET) or can not be found in the archive (in case of POST) """ res = api_lookup( archive.vault_fetch, "gitfast", CoreSWHID.from_string(swhid), notfound_msg="Cooked archive for {} not found.".format(swhid), request=request, ) fname = "{}.gitfast.gz".format(swhid) response = HttpResponse(res, content_type="application/gzip") response["Content-disposition"] = "attachment; filename={}".format( fname.replace(":", "_") ) return response @api_route( r"/vault/revision/(?P[0-9a-f]+)/gitfast/raw/", "api-1-vault-fetch-revision_gitfast", checksum_args=["rev_id"], ) @api_doc("/vault/revision_gitfast/raw/", tags=["hidden", "deprecated"]) def _api_vault_revision_gitfast_raw(request: Request, rev_id: str): """ .. http:get:: /api/1/vault/revision/(rev_id)/gitfast/raw/ This endpoint was replaced by :http:get:`/api/1/vault/gitfast/(swhid)/raw/` """ rev_gitfast_raw_url = reverse( "api-1-vault-fetch-gitfast", url_args={"swhid": f"swh:1:rev:{rev_id}"} ) return redirect(rev_gitfast_raw_url) ###################################################### # git_bare bundles @api_route( f"/vault/git-bare/(?P{SWHID_RE})/", "api-1-vault-cook-git-bare", methods=["GET", "POST"], throttle_scope="swh_vault_cooking", never_cache=True, ) @api_doc("/vault/git-bare/") @format_docstring() def api_vault_cook_git_bare(request: Request, swhid: str): """ .. http:get:: /api/1/vault/git-bare/(swhid)/ .. http:post:: /api/1/vault/git-bare/(swhid)/ Request the cooking of a git-bare archive for a revision or check its cooking status. That endpoint enables to create a vault cooking task for a revision through a POST request or check the status of a previously created one through a GET request. Once the cooking task has been executed, the resulting git-bare archive can be downloaded using the dedicated endpoint :http:get:`/api/1/vault/git-bare/(swhid)/raw/`. 
Then, to import the revision into the current directory, use:: $ tar -xf path/to/swh_1_rev_*.git.tar $ git clone swh:1:rev:*.git new_repository (replace ``swh:1:rev:*`` with the SWHID of the requested revision) This will create a directory called ``new_repository``, which is a git repository containing the requested objects. :param string swhid: the revision's permanent identifier :query string email: e-mail to notify when the git-bare archive is ready {common_headers} :>json string fetch_url: the url from which to download the archive once it has been cooked (see :http:get:`/api/1/vault/git-bare/(swhid)/raw/`) :>json string progress_message: message describing the cooking task progress :>json number id: the cooking task id :>json string status: the cooking task status (new/pending/done/failed) :>json string swhid: the identifier of the object to cook :statuscode 200: no error :statuscode 404: requested revision did not receive any cooking request yet (in case of GET) or can not be found in the archive (in case of POST) """ parsed_swhid = CoreSWHID.from_string(swhid) if parsed_swhid.object_type == ObjectType.REVISION: res = _dispatch_cook_progress(request, "git_bare", parsed_swhid) res["fetch_url"] = reverse( "api-1-vault-fetch-git-bare", url_args={"swhid": swhid}, request=request, ) return _vault_response(res, add_legacy_items=False) elif parsed_swhid.object_type == ObjectType.CONTENT: raise BadInputExc( "Content objects do not need to be cooked, " "use `/api/1/content/raw/` instead." ) elif parsed_swhid.object_type == ObjectType.DIRECTORY: raise BadInputExc( "Only revisions can be cooked as 'git-bare' bundles. " "Use `/api/1/vault/flat/` to cook directories, as flat bundles." ) else: raise BadInputExc("Only revisions can be cooked as 'git-bare' bundles.") @api_route( f"/vault/git-bare/(?P<swhid>{SWHID_RE})/raw/", "api-1-vault-fetch-git-bare", ) @api_doc("/vault/git-bare/raw/") def api_vault_fetch_revision_git_bare(request: Request, swhid: str): """ .. http:get:: /api/1/vault/git-bare/(swhid)/raw/ Fetch the cooked git-bare archive for a revision. See :http:get:`/api/1/vault/git-bare/(swhid)/` to get more details on git-bare cooking.
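Once cooked, the bundle can be streamed to disk before running the ``tar`` and ``git clone`` steps shown above. A sketch with a hypothetical revision SWHID, again using ``requests`` for illustration::

    import requests

    api = "https://archive.softwareheritage.org/api/1"
    swhid = "swh:1:rev:0000000000000000000000000000000000000000"  # hypothetical

    # stream the tarball to disk to avoid holding it all in memory
    with requests.get(f"{api}/vault/git-bare/{swhid}/raw/", stream=True) as resp:
        resp.raise_for_status()
        with open("swh_bundle.git.tar", "wb") as f:
            for chunk in resp.iter_content(chunk_size=8192):
                f.write(chunk)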
:param string swhid: the revision's permanent identifier :resheader Content-Type: application/x-tar :statuscode 200: no error :statuscode 404: requested revision did not receive any cooking request yet (in case of GET) or can not be found in the archive (in case of POST) """ res = api_lookup( archive.vault_fetch, "git_bare", CoreSWHID.from_string(swhid), notfound_msg="Cooked archive for {} not found.".format(swhid), request=request, ) fname = "{}.git.tar".format(swhid) response = HttpResponse(res, content_type="application/x-tar") response["Content-disposition"] = "attachment; filename={}".format( fname.replace(":", "_") ) return response diff --git a/swh/web/auth/utils.py b/swh/web/auth/utils.py index 5d8da6ff..c710bc24 100644 --- a/swh/web/auth/utils.py +++ b/swh/web/auth/utils.py @@ -1,126 +1,124 @@ # Copyright (C) 2020-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from base64 import urlsafe_b64encode from typing import List from cryptography.fernet import Fernet from cryptography.hazmat.backends import default_backend from cryptography.hazmat.primitives import hashes from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC from django.contrib.auth.decorators import user_passes_test from django.http.request import HttpRequest -from swh.web.common.exc import ForbiddenExc - -OIDC_SWH_WEB_CLIENT_ID = "swh-web" +from swh.web.utils.exc import ForbiddenExc SWH_AMBASSADOR_PERMISSION = "swh.ambassador" API_SAVE_ORIGIN_PERMISSION = "swh.web.api.save_origin" ADMIN_LIST_DEPOSIT_PERMISSION = "swh.web.admin.list_deposits" MAILMAP_PERMISSION = "swh.web.mailmap" ADD_FORGE_MODERATOR_PERMISSION = "swh.web.add_forge_now.moderator" MAILMAP_ADMIN_PERMISSION = "swh.web.admin.mailmap" API_RAW_OBJECT_PERMISSION = "swh.web.api.raw_object" def _get_fernet(password: bytes, salt: bytes) -> Fernet: """ Instantiate a Fernet system from a password and a salt value (see https://cryptography.io/en/latest/fernet/). Args: password: user password that will be used to generate a Fernet key derivation function salt: value that will be used to generate a Fernet key derivation function Returns: The Fernet system """ kdf = PBKDF2HMAC( algorithm=hashes.SHA256(), length=32, salt=salt, iterations=100000, backend=default_backend(), ) key = urlsafe_b64encode(kdf.derive(password)) return Fernet(key) def encrypt_data(data: bytes, password: bytes, salt: bytes) -> bytes: """ Encrypt data using Fernet system (symmetric encryption). Args: data: input data to encrypt password: user password that will be used to generate a Fernet key derivation function salt: value that will be used to generate a Fernet key derivation function Returns: The encrypted data """ return _get_fernet(password, salt).encrypt(data) def decrypt_data(data: bytes, password: bytes, salt: bytes) -> bytes: """ Decrypt data using Fernet system (symmetric encryption). Args: data: input data to decrypt password: user password that will be used to generate a Fernet key derivation function salt: value that will be used to generate a Fernet key derivation function Returns: The decrypted data """ return _get_fernet(password, salt).decrypt(data) def privileged_user(request: HttpRequest, permissions: List[str] = []) -> bool: """Determine whether a user is authenticated and is a privileged one (e.g. ambassador). This allows such a user to access some additional actions (e.g.
bypass save code now review, access to 'archives' type...). A user is considered privileged if they are a staff member or have any of the permissions provided as parameters. Args: request: Input django HTTP request permissions: list of permission names to determine if user is privileged or not Returns: Whether the user is privileged or not. """ user = request.user return user.is_authenticated and ( user.is_staff or any([user.has_perm(perm) for perm in permissions]) ) def any_permission_required(*perms): """View decorator granting access if the user has at least one of the permissions passed as parameters. """ def check_perms(user): if any(user.has_perm(perm) for perm in perms): return True raise ForbiddenExc return user_passes_test(check_perms) def is_add_forge_now_moderator(user) -> bool: """Is a user considered an add-forge-now moderator? Returns True if a user is staff or has the add forge now moderator permission """ return user.is_staff or user.has_perm(ADD_FORGE_MODERATOR_PERMISSION) diff --git a/swh/web/auth/views.py b/swh/web/auth/views.py index b662b977..0efd1c99 100644 --- a/swh/web/auth/views.py +++ b/swh/web/auth/views.py @@ -1,155 +1,155 @@ # Copyright (C) 2020-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import json from typing import Any, Dict, Union, cast from cryptography.fernet import InvalidToken from django.contrib.auth.decorators import login_required from django.core.paginator import Paginator from django.http import HttpRequest from django.http.response import ( HttpResponse, HttpResponseBadRequest, HttpResponseForbidden, HttpResponseRedirect, JsonResponse, ) from django.shortcuts import render from django.views.decorators.http import require_http_methods from swh.auth.django.models import OIDCUser from swh.auth.django.utils import keycloak_oidc_client from swh.auth.django.views import get_oidc_login_data, oidc_login_view from swh.auth.keycloak import KeycloakError, keycloak_error_message from swh.web.auth.models import OIDCUserOfflineTokens from swh.web.auth.utils import decrypt_data, encrypt_data -from swh.web.common.exc import ForbiddenExc -from swh.web.common.utils import reverse from swh.web.config import get_config +from swh.web.utils import reverse +from swh.web.utils.exc import ForbiddenExc def oidc_generate_bearer_token(request: HttpRequest) -> HttpResponse: if not request.user.is_authenticated or not isinstance(request.user, OIDCUser): return HttpResponseForbidden() redirect_uri = reverse("oidc-generate-bearer-token-complete", request=request) return oidc_login_view( request, redirect_uri=redirect_uri, scope="openid offline_access" ) def oidc_generate_bearer_token_complete(request: HttpRequest) -> HttpResponse: if not request.user.is_authenticated or not isinstance(request.user, OIDCUser): raise ForbiddenExc("You are not allowed to generate bearer tokens.") if "error" in request.GET: raise Exception(request.GET["error"]) login_data = get_oidc_login_data(request) oidc_client = keycloak_oidc_client() oidc_profile = oidc_client.authorization_code( code=request.GET["code"], code_verifier=login_data["code_verifier"], redirect_uri=login_data["redirect_uri"], ) user = cast(OIDCUser, request.user) token = oidc_profile["refresh_token"] secret = get_config()["secret_key"].encode() salt = user.sub.encode() encrypted_token = encrypt_data(token.encode(), secret, salt)
OIDCUserOfflineTokens.objects.create( user_id=str(user.id), offline_token=encrypted_token ).save() return HttpResponseRedirect(reverse("oidc-profile") + "#tokens") def oidc_list_bearer_tokens(request: HttpRequest) -> HttpResponse: if not request.user.is_authenticated or not isinstance(request.user, OIDCUser): return HttpResponseForbidden() tokens = OIDCUserOfflineTokens.objects.filter(user_id=str(request.user.id)) tokens = tokens.order_by("-creation_date") length = int(request.GET["length"]) page = int(request.GET["start"]) / length + 1 paginator = Paginator(tokens, length) tokens_data = [ {"id": t.id, "creation_date": t.creation_date.isoformat()} for t in paginator.page(int(page)).object_list ] table_data: Dict[str, Any] = {} table_data["recordsTotal"] = len(tokens_data) table_data["draw"] = int(request.GET["draw"]) table_data["data"] = tokens_data table_data["recordsFiltered"] = len(tokens_data) return JsonResponse(table_data) def _encrypted_token_bytes(token: Union[bytes, memoryview]) -> bytes: # token has been retrieved from a PostgreSQL database if isinstance(token, memoryview): return token.tobytes() else: return token @require_http_methods(["POST"]) def oidc_get_bearer_token(request: HttpRequest) -> HttpResponse: if not request.user.is_authenticated or not isinstance(request.user, OIDCUser): return HttpResponseForbidden() try: data = json.loads(request.body.decode("ascii")) user = cast(OIDCUser, request.user) token_data = OIDCUserOfflineTokens.objects.get(id=data["token_id"]) secret = get_config()["secret_key"].encode() salt = user.sub.encode() decrypted_token = decrypt_data( _encrypted_token_bytes(token_data.offline_token), secret, salt ) refresh_token = decrypted_token.decode("ascii") # check token is still valid oidc_client = keycloak_oidc_client() oidc_client.refresh_token(refresh_token) return HttpResponse(refresh_token, content_type="text/plain") except InvalidToken: return HttpResponse(status=401) except KeycloakError as ke: error_msg = keycloak_error_message(ke) if error_msg in ( "invalid_grant: Offline session not active", "invalid_grant: Offline user session not found", ): error_msg = "Bearer token has expired, please generate a new one."
return HttpResponseBadRequest(error_msg, content_type="text/plain") @require_http_methods(["POST"]) def oidc_revoke_bearer_tokens(request: HttpRequest) -> HttpResponse: if not request.user.is_authenticated or not isinstance(request.user, OIDCUser): return HttpResponseForbidden() try: data = json.loads(request.body.decode("ascii")) user = cast(OIDCUser, request.user) for token_id in data["token_ids"]: token_data = OIDCUserOfflineTokens.objects.get(id=token_id) secret = get_config()["secret_key"].encode() salt = user.sub.encode() decrypted_token = decrypt_data( _encrypted_token_bytes(token_data.offline_token), secret, salt ) oidc_client = keycloak_oidc_client() oidc_client.logout(decrypted_token.decode("ascii")) token_data.delete() return HttpResponse(status=200) except InvalidToken: return HttpResponse(status=401) @login_required(login_url="/oidc/login/", redirect_field_name="next_path") def oidc_profile_view(request: HttpRequest) -> HttpResponse: return render(request, "profile.html") diff --git a/swh/web/browse/browseurls.py b/swh/web/browse/browseurls.py index fa22d466..3f67dfa5 100644 --- a/swh/web/browse/browseurls.py +++ b/swh/web/browse/browseurls.py @@ -1,48 +1,48 @@ # Copyright (C) 2017-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from typing import List, Optional -from swh.web.common.urlsindex import UrlsIndex +from swh.web.utils.urlsindex import UrlsIndex class BrowseUrls(UrlsIndex): """ Class to manage swh-web browse application urls. """ scope = "browse" def browse_route( *url_patterns: str, view_name: Optional[str] = None, checksum_args: Optional[List[str]] = None, ): """ Decorator to ease the registration of a swh-web browse endpoint Args: url_patterns: list of url patterns used by Django to identify the browse routes view_name: the name of the Django view associated to the routes used to reverse the url """ url_patterns = tuple("^browse/" + url_pattern + "$" for url_pattern in url_patterns) view_name = view_name def decorator(f): # register the route and its view in the browse endpoints index for url_pattern in url_patterns: BrowseUrls.add_url_pattern(url_pattern, f, view_name) if checksum_args: BrowseUrls.add_redirect_for_checksum_args( view_name, url_patterns, checksum_args ) return f return decorator diff --git a/swh/web/browse/identifiers.py b/swh/web/browse/identifiers.py index 8120b4f0..072a870c 100644 --- a/swh/web/browse/identifiers.py +++ b/swh/web/browse/identifiers.py @@ -1,20 +1,20 @@ # Copyright (C) 2017-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from django.http import HttpRequest, HttpResponse from django.shortcuts import redirect -from swh.web.common.identifiers import resolve_swhid +from swh.web.utils.identifiers import resolve_swhid def swhid_browse(request: HttpRequest, swhid: str) -> HttpResponse: """ Django view for browsing the archive using :ref:`persistent-identifiers`. The url that points to it is :http:get:`/(swhid)/`.
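A rough sketch of what the view does internally, with a hypothetical SWHID::

    swhid_resolved = resolve_swhid(
        "swh:1:dir:0000000000000000000000000000000000000000",  # hypothetical
        query_params={},
    )
    browse_url = swhid_resolved["browse_url"]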
""" swhid_resolved = resolve_swhid(swhid, query_params=request.GET.dict()) assert swhid_resolved["browse_url"] return redirect(swhid_resolved["browse_url"]) diff --git a/swh/web/browse/snapshot_context.py b/swh/web/browse/snapshot_context.py index c4e8a5d0..1baf66b7 100644 --- a/swh/web/browse/snapshot_context.py +++ b/swh/web/browse/snapshot_context.py @@ -1,1379 +1,1379 @@ # Copyright (C) 2018-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information # Utility module for browsing the archive in a snapshot context. from collections import defaultdict from typing import Any, Dict, List, Optional, Tuple from django.http import HttpRequest, HttpResponse from django.shortcuts import render from django.utils.html import escape from swh.model.hashutil import hash_to_bytes from swh.model.model import Snapshot from swh.model.swhids import CoreSWHID, ObjectType from swh.web.browse.utils import ( format_log_entries, gen_release_link, gen_revision_link, gen_revision_log_link, gen_revision_url, gen_snapshot_link, get_directory_entries, get_readme_to_display, ) -from swh.web.common import archive -from swh.web.common.exc import BadInputExc, NotFoundExc, http_status_code_message -from swh.web.common.identifiers import get_swhids_info -from swh.web.common.origin_visits import get_origin_visit -from swh.web.common.typing import ( +from swh.web.config import get_config +from swh.web.utils import ( + archive, + django_cache, + format_utc_iso_date, + gen_path_info, + reverse, + swh_object_icons, +) +from swh.web.utils.exc import BadInputExc, NotFoundExc, http_status_code_message +from swh.web.utils.identifiers import get_swhids_info +from swh.web.utils.origin_visits import get_origin_visit +from swh.web.utils.typing import ( DirectoryMetadata, OriginInfo, SnapshotBranchInfo, SnapshotContext, SnapshotReleaseInfo, SWHObjectInfo, ) -from swh.web.common.utils import ( - django_cache, - format_utc_iso_date, - gen_path_info, - reverse, - swh_object_icons, -) -from swh.web.config import get_config _empty_snapshot_id = Snapshot(branches={}).id.hex() def _get_branch( branches: List[SnapshotBranchInfo], branch_name: str, snapshot_id: str ) -> Optional[SnapshotBranchInfo]: """ Utility function to get a specific branch from a snapshot. Returns None if the branch cannot be found. """ filtered_branches = [b for b in branches if b["name"] == branch_name] if filtered_branches: return filtered_branches[0] else: # case where a large branches list has been truncated snp = archive.lookup_snapshot( snapshot_id, branches_from=branch_name, branches_count=1, target_types=["revision", "alias"], # pull request branches must be browsable even if they are hidden # by default in branches list branch_name_exclude_prefix=None, ) snp_branch, _, _ = process_snapshot_branches(snp) if snp_branch and snp_branch[0]["name"] == branch_name: branches.append(snp_branch[0]) return snp_branch[0] return None def _get_release( releases: List[SnapshotReleaseInfo], release_name: Optional[str], snapshot_id: str ) -> Optional[SnapshotReleaseInfo]: """ Utility function to get a specific release from a snapshot. Returns None if the release cannot be found. 
""" filtered_releases = [r for r in releases if r["name"] == release_name] if filtered_releases: return filtered_releases[0] elif release_name: # case where a large branches list has been truncated try: # git origins have specific branches for releases snp = archive.lookup_snapshot( snapshot_id, branches_from=f"refs/tags/{release_name}", branches_count=1, target_types=["release"], ) except NotFoundExc: snp = archive.lookup_snapshot( snapshot_id, branches_from=release_name, branches_count=1, target_types=["release", "alias"], ) _, snp_release, _ = process_snapshot_branches(snp) if snp_release and snp_release[0]["name"] == release_name: releases.append(snp_release[0]) return snp_release[0] return None def _branch_not_found( branch_type: str, branch: str, snapshot_id: str, snapshot_sizes: Dict[str, int], origin_info: Optional[OriginInfo], timestamp: Optional[str], visit_id: Optional[int], ) -> None: """ Utility function to raise an exception when a specified branch/release can not be found. """ if branch_type == "branch": branch_type = "Branch" branch_type_plural = "branches" target_type = "revision" else: branch_type = "Release" branch_type_plural = "releases" target_type = "release" if snapshot_id and snapshot_sizes[target_type] == 0: msg = "Snapshot with id %s has an empty list" " of %s!" % ( snapshot_id, branch_type_plural, ) elif snapshot_id: msg = "%s %s for snapshot with id %s" " not found!" % ( branch_type, branch, snapshot_id, ) elif visit_id and snapshot_sizes[target_type] == 0 and origin_info: msg = ( "Origin with url %s" " for visit with id %s has an empty list" " of %s!" % (origin_info["url"], visit_id, branch_type_plural) ) elif visit_id and origin_info: msg = ( "%s %s associated to visit with" " id %s for origin with url %s" " not found!" % (branch_type, branch, visit_id, origin_info["url"]) ) elif snapshot_sizes[target_type] == 0 and origin_info and timestamp: msg = ( "Origin with url %s" " for visit with timestamp %s has an empty list" " of %s!" % (origin_info["url"], timestamp, branch_type_plural) ) elif origin_info and timestamp: msg = ( "%s %s associated to visit with" " timestamp %s for origin with " "url %s not found!" % (branch_type, branch, timestamp, origin_info["url"]) ) raise NotFoundExc(escape(msg)) def process_snapshot_branches( snapshot: Dict[str, Any] ) -> Tuple[List[SnapshotBranchInfo], List[SnapshotReleaseInfo], Dict[str, Any]]: """ Process a dictionary describing snapshot branches: extract those targeting revisions and releases, put them in two different lists, then sort those lists in lexicographical order of the branches' names. Args: snapshot: A dict describing a snapshot as returned for instance by - :func:`swh.web.common.archive.lookup_snapshot` + :func:`swh.web.utils.archive.lookup_snapshot` Returns: A tuple whose first member is the sorted list of branches targeting revisions, second member the sorted list of branches targeting releases and third member a dict mapping resolved branch aliases to their real target. 
""" snapshot_branches = snapshot["branches"] branches: Dict[str, SnapshotBranchInfo] = {} branch_aliases: Dict[str, str] = {} releases: Dict[str, SnapshotReleaseInfo] = {} revision_to_branch = defaultdict(set) revision_to_release = defaultdict(set) release_to_branch = defaultdict(set) for branch_name, target in snapshot_branches.items(): if not target: # FIXME: display branches with an unknown target anyway continue target_id = target["target"] target_type = target["target_type"] if target_type == "revision": branches[branch_name] = SnapshotBranchInfo( name=branch_name, alias=False, revision=target_id, date=None, directory=None, message=None, url=None, ) revision_to_branch[target_id].add(branch_name) elif target_type == "release": release_to_branch[target_id].add(branch_name) elif target_type == "alias": branch_aliases[branch_name] = target_id # FIXME: handle pointers to other object types def _add_release_info(branch, release, alias=False): releases[branch] = SnapshotReleaseInfo( name=release["name"], alias=alias, branch_name=branch, date=format_utc_iso_date(release["date"]), directory=None, id=release["id"], message=release["message"], target_type=release["target_type"], target=release["target"], url=None, ) def _add_branch_info(branch, revision, alias=False): branches[branch] = SnapshotBranchInfo( name=branch, alias=alias, revision=revision["id"], directory=revision["directory"], date=format_utc_iso_date(revision["date"]), message=revision["message"], url=None, ) releases_info = archive.lookup_release_multiple(release_to_branch.keys()) for release in releases_info: if release is None: continue branches_to_update = release_to_branch[release["id"]] for branch in branches_to_update: _add_release_info(branch, release) if release["target_type"] == "revision": revision_to_release[release["target"]].update(branches_to_update) revisions = archive.lookup_revision_multiple( set(revision_to_branch.keys()) | set(revision_to_release.keys()) ) for revision in revisions: if not revision: continue for branch in revision_to_branch[revision["id"]]: _add_branch_info(branch, revision) for release_id in revision_to_release[revision["id"]]: releases[release_id]["directory"] = revision["directory"] resolved_aliases = {} for branch_alias, branch_target in branch_aliases.items(): resolved_alias = archive.lookup_snapshot_alias(snapshot["id"], branch_alias) resolved_aliases[branch_alias] = resolved_alias if resolved_alias is None: continue target_type = resolved_alias["target_type"] target = resolved_alias["target"] if target_type == "revision": revision = archive.lookup_revision(target) _add_branch_info(branch_alias, revision, alias=True) elif target_type == "release": release = archive.lookup_release(target) _add_release_info(branch_alias, release, alias=True) if branch_alias in branches: branches[branch_alias]["name"] = branch_alias ret_branches = list(sorted(branches.values(), key=lambda b: b["name"])) ret_releases = list(sorted(releases.values(), key=lambda b: b["name"])) return ret_branches, ret_releases, resolved_aliases @django_cache() def get_snapshot_content( snapshot_id: str, ) -> Tuple[List[SnapshotBranchInfo], List[SnapshotReleaseInfo], Dict[str, Any]]: """Returns the lists of branches and releases associated to a swh snapshot. That list is put in cache in order to speedup the navigation in the swh-web/browse ui. .. warning:: At most 1000 branches contained in the snapshot will be returned for performance reasons. 
Args: snapshot_id: hexadecimal representation of the snapshot identifier Returns: A tuple with three members. The first one is a list of dict describing the snapshot branches. The second one is a list of dict describing the snapshot releases. The third one is a dict mapping resolved branch aliases to their real target. Raises: NotFoundExc if the snapshot does not exist """ branches: List[SnapshotBranchInfo] = [] releases: List[SnapshotReleaseInfo] = [] aliases: Dict[str, Any] = {} snapshot_content_max_size = get_config()["snapshot_content_max_size"] if snapshot_id: snapshot = archive.lookup_snapshot( snapshot_id, branches_count=snapshot_content_max_size ) branches, releases, aliases = process_snapshot_branches(snapshot) return branches, releases, aliases def get_origin_visit_snapshot( origin_info: OriginInfo, visit_ts: Optional[str] = None, visit_id: Optional[int] = None, snapshot_id: Optional[str] = None, ) -> Tuple[List[SnapshotBranchInfo], List[SnapshotReleaseInfo], Dict[str, Any]]: """Returns the lists of branches and releases associated to an origin for a given visit. The visit is expressed by either: * a snapshot identifier * a timestamp, if no visit with that exact timestamp is found, the closest one from the provided timestamp will be used. If no visit parameter is provided, it returns the list of branches found for the latest visit. That list is put in cache in order to speed up the navigation in the swh-web/browse ui. .. warning:: At most 1000 branches contained in the snapshot will be returned for performance reasons. Args: origin_info: a dict filled with origin information visit_ts: an ISO 8601 datetime string to parse visit_id: visit id for disambiguation in case several visits have the same timestamp snapshot_id: if provided, visit associated to the snapshot will be processed Returns: A tuple with three members. The first one is a list of dict describing the origin branches for the given visit. The second one is a list of dict describing the origin releases for the given visit. The third one is a dict mapping resolved branch aliases to their real target. Raises: NotFoundExc if the origin or its visit are not found """ visit_info = get_origin_visit(origin_info, visit_ts, visit_id, snapshot_id) return get_snapshot_content(visit_info["snapshot"]) def get_snapshot_context( snapshot_id: Optional[str] = None, origin_url: Optional[str] = None, timestamp: Optional[str] = None, visit_id: Optional[int] = None, branch_name: Optional[str] = None, release_name: Optional[str] = None, revision_id: Optional[str] = None, path: Optional[str] = None, browse_context: str = "directory", ) -> SnapshotContext: """ Utility function to compute relevant information when navigating the archive in a snapshot context. The snapshot is either referenced by its id or it will be retrieved from an origin visit.
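Typical calls (a sketch; the origin URL and timestamp values are hypothetical)::

    # browse a snapshot directly through its identifier
    ctx = get_snapshot_context(snapshot_id="0" * 40)

    # or through an origin visit, optionally pinned to a date
    ctx = get_snapshot_context(
        origin_url="https://example.org/repo.git",
        timestamp="2022-01-01T00:00:00Z",
    )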
Args: snapshot_id: hexadecimal representation of a snapshot identifier origin_url: an origin_url timestamp: a datetime string for retrieving the closest visit of the origin visit_id: optional visit id for disambiguation in case of several visits with the same timestamp branch_name: optional branch name set when browsing the snapshot in that scope (will default to "HEAD" if not provided) release_name: optional release name set when browsing the snapshot in that scope revision_id: optional revision identifier set when browsing the snapshot in that scope path: optional path of the object currently browsed in the snapshot browse_context: indicates which type of object is currently browsed Returns: A dict filled with snapshot context information. Raises: - swh.web.common.exc.NotFoundExc: if no snapshot is found for the visit + swh.web.utils.exc.NotFoundExc: if no snapshot is found for the visit of an origin. """ assert origin_url is not None or snapshot_id is not None origin_info = None visit_info = None url_args = {} query_params: Dict[str, Any] = {} origin_visits_url = None if origin_url: if visit_id is not None: query_params["visit_id"] = visit_id elif snapshot_id is not None: query_params["snapshot"] = snapshot_id origin_info = archive.lookup_origin({"url": origin_url}) visit_info = get_origin_visit(origin_info, timestamp, visit_id, snapshot_id) formatted_date = format_utc_iso_date(visit_info["date"]) visit_info["formatted_date"] = formatted_date snapshot_id = visit_info["snapshot"] if not snapshot_id: raise NotFoundExc( "No snapshot associated to the visit of origin " "%s on %s" % (escape(origin_url), formatted_date) ) # provided timestamp is not necessarily equal to the one # of the retrieved visit, so get the exact one in order # to use it in the urls generated below if timestamp: timestamp = visit_info["date"] branches, releases, aliases = get_origin_visit_snapshot( origin_info, timestamp, visit_id, snapshot_id ) query_params["origin_url"] = origin_info["url"] origin_visits_url = reverse( "browse-origin-visits", query_params={"origin_url": origin_info["url"]} ) if timestamp is not None: query_params["timestamp"] = format_utc_iso_date( timestamp, "%Y-%m-%dT%H:%M:%SZ" ) visit_url = reverse("browse-origin-directory", query_params=query_params) visit_info["url"] = directory_url = visit_url branches_url = reverse("browse-origin-branches", query_params=query_params) releases_url = reverse("browse-origin-releases", query_params=query_params) else: assert snapshot_id is not None branches, releases, aliases = get_snapshot_content(snapshot_id) url_args = {"snapshot_id": snapshot_id} directory_url = reverse("browse-snapshot-directory", url_args=url_args) branches_url = reverse("browse-snapshot-branches", url_args=url_args) releases_url = reverse("browse-snapshot-releases", url_args=url_args) releases = list(reversed(releases)) @django_cache() def _get_snapshot_sizes(snapshot_id): return archive.lookup_snapshot_sizes(snapshot_id) snapshot_sizes = _get_snapshot_sizes(snapshot_id) is_empty = (snapshot_sizes["release"] + snapshot_sizes["revision"]) == 0 swh_snp_id = str( CoreSWHID(object_type=ObjectType.SNAPSHOT, object_id=hash_to_bytes(snapshot_id)) ) if visit_info: timestamp = format_utc_iso_date(visit_info["date"]) if origin_info: browse_view_name = f"browse-origin-{browse_context}" else: browse_view_name = f"browse-snapshot-{browse_context}" release_id = None root_directory = None snapshot_total_size = snapshot_sizes["release"] + snapshot_sizes["revision"] if path is not None:
query_params["path"] = path if snapshot_total_size and revision_id is not None: # browse specific revision for a snapshot requested revision = archive.lookup_revision(revision_id) root_directory = revision["directory"] branches.append( SnapshotBranchInfo( name=revision_id, alias=False, revision=revision_id, directory=root_directory, date=revision["date"], message=revision["message"], url=None, ) ) query_params["revision"] = revision_id elif snapshot_total_size and release_name: # browse specific release for a snapshot requested release = _get_release(releases, release_name, snapshot_id) if release is None: _branch_not_found( "release", release_name, snapshot_id, snapshot_sizes, origin_info, timestamp, visit_id, ) else: if release["target_type"] == "revision": revision = archive.lookup_revision(release["target"]) root_directory = revision["directory"] revision_id = release["target"] elif release["target_type"] == "directory": root_directory = release["target"] release_id = release["id"] query_params["release"] = release_name elif snapshot_total_size: head = aliases.get("HEAD") if branch_name: # browse specific branch for a snapshot requested query_params["branch"] = branch_name branch = _get_branch(branches, branch_name, snapshot_id) if branch is None: _branch_not_found( "branch", branch_name, snapshot_id, snapshot_sizes, origin_info, timestamp, visit_id, ) else: branch_name = branch["name"] revision_id = branch["revision"] root_directory = branch["directory"] elif head is not None: # otherwise, browse branch targeted by the HEAD alias if it exists if head["target_type"] == "revision": # HEAD alias targets a revision head_rev = archive.lookup_revision(head["target"]) branch_name = "HEAD" revision_id = head_rev["id"] root_directory = head_rev["directory"] else: # HEAD alias targets a release release_name = archive.lookup_release(head["target"])["name"] head_rel = _get_release(releases, release_name, snapshot_id) if head_rel is None: _branch_not_found( "release", str(release_name), snapshot_id, snapshot_sizes, origin_info, timestamp, visit_id, ) elif head_rel["target_type"] == "revision": revision = archive.lookup_revision(head_rel["target"]) root_directory = revision["directory"] revision_id = head_rel["target"] elif head_rel["target_type"] == "directory": root_directory = head_rel["target"] if head_rel is not None: release_id = head_rel["id"] elif branches: # fallback to browse first branch otherwise branch = branches[0] branch_name = branch["name"] revision_id = branch["revision"] root_directory = branch["directory"] elif releases: # fallback to browse last release otherwise release = releases[-1] if release["target_type"] == "revision": revision = archive.lookup_revision(release["target"]) root_directory = revision["directory"] revision_id = release["target"] elif release["target_type"] == "directory": root_directory = release["target"] release_id = release["id"] release_name = release["name"] for b in branches: branch_query_params = dict(query_params) branch_query_params.pop("release", None) if b["name"] != b["revision"]: branch_query_params.pop("revision", None) branch_query_params["branch"] = b["name"] b["url"] = reverse( browse_view_name, url_args=url_args, query_params=branch_query_params ) for r in releases: release_query_params = dict(query_params) release_query_params.pop("branch", None) release_query_params.pop("revision", None) release_query_params["release"] = r["name"] r["url"] = reverse( browse_view_name, url_args=url_args, query_params=release_query_params, ) revision_info = 
None if revision_id: try: revision_info = archive.lookup_revision(revision_id) except NotFoundExc: pass else: revision_info["date"] = format_utc_iso_date(revision_info["date"]) revision_info["committer_date"] = format_utc_iso_date( revision_info["committer_date"] ) if revision_info["message"]: message_lines = revision_info["message"].split("\n") revision_info["message_header"] = message_lines[0] else: revision_info["message_header"] = "" snapshot_context = SnapshotContext( directory_url=directory_url, branch=branch_name, branch_alias=branch_name in aliases, branches=branches, branches_url=branches_url, is_empty=is_empty, origin_info=origin_info, origin_visits_url=origin_visits_url, release=release_name, release_alias=release_name in aliases, release_id=release_id, query_params=query_params, releases=releases, releases_url=releases_url, revision_id=revision_id, revision_info=revision_info, root_directory=root_directory, snapshot_id=snapshot_id, snapshot_sizes=snapshot_sizes, snapshot_swhid=swh_snp_id, url_args=url_args, visit_info=visit_info, ) if revision_info: revision_info["revision_url"] = gen_revision_url( revision_info["id"], snapshot_context ) return snapshot_context def _build_breadcrumbs( snapshot_context: SnapshotContext, path: Optional[str] ) -> List[Dict[str, str]]: origin_info = snapshot_context["origin_info"] url_args = snapshot_context["url_args"] query_params = dict(snapshot_context["query_params"]) root_directory = snapshot_context["root_directory"] path_info = gen_path_info(path) if origin_info: browse_view_name = "browse-origin-directory" else: browse_view_name = "browse-snapshot-directory" breadcrumbs = [] if root_directory: query_params.pop("path", None) breadcrumbs.append( { "name": root_directory[:7], "url": reverse( browse_view_name, url_args=url_args, query_params=query_params ), } ) for pi in path_info: query_params["path"] = pi["path"] breadcrumbs.append( { "name": pi["name"], "url": reverse( browse_view_name, url_args=url_args, query_params=query_params ), } ) return breadcrumbs def _check_origin_url(snapshot_id: Optional[str], origin_url: Optional[str]) -> None: if snapshot_id is None and origin_url is None: raise BadInputExc("An origin URL must be provided as query parameter.") def browse_snapshot_directory( request: HttpRequest, snapshot_id: Optional[str] = None, origin_url: Optional[str] = None, timestamp: Optional[str] = None, path: Optional[str] = None, ) -> HttpResponse: """ Django view implementation for browsing a directory in a snapshot context. 
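It is typically reached through URLs reversed with the ``browse-origin-directory`` or ``browse-snapshot-directory`` view names, e.g. (a sketch; the path value is hypothetical)::

    url = reverse(
        "browse-snapshot-directory",
        url_args={"snapshot_id": snapshot_id},
        query_params={"path": "src/"},
    )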
""" _check_origin_url(snapshot_id, origin_url) visit_id = int(request.GET.get("visit_id", 0)) snapshot_context = get_snapshot_context( snapshot_id=snapshot_id, origin_url=origin_url, timestamp=timestamp, visit_id=visit_id or None, path=path, browse_context="directory", branch_name=request.GET.get("branch"), release_name=request.GET.get("release"), revision_id=request.GET.get("revision"), ) root_directory = snapshot_context["root_directory"] sha1_git = root_directory error_info: Dict[str, Any] = { "status_code": 200, "description": None, } if root_directory and path: try: dir_info = archive.lookup_directory_with_path(root_directory, path) sha1_git = dir_info["target"] except NotFoundExc as e: sha1_git = None error_info["status_code"] = 404 error_info["description"] = f"NotFoundExc: {str(e)}" dirs = [] files = [] if sha1_git: dirs, files = get_directory_entries(sha1_git) origin_info = snapshot_context["origin_info"] visit_info = snapshot_context["visit_info"] url_args = snapshot_context["url_args"] query_params = dict(snapshot_context["query_params"]) revision_id = snapshot_context["revision_id"] snapshot_id = snapshot_context["snapshot_id"] if origin_info: browse_view_name = "browse-origin-directory" else: browse_view_name = "browse-snapshot-directory" breadcrumbs = _build_breadcrumbs(snapshot_context, path) path = "" if path is None else (path + "/") for d in dirs: if d["type"] == "rev": d["url"] = reverse("browse-revision", url_args={"sha1_git": d["target"]}) else: query_params["path"] = path + d["name"] d["url"] = reverse( browse_view_name, url_args=url_args, query_params=query_params ) sum_file_sizes = 0 readmes = {} if origin_info: browse_view_name = "browse-origin-content" else: browse_view_name = "browse-snapshot-content" for f in files: query_params["path"] = path + f["name"] f["url"] = reverse( browse_view_name, url_args=url_args, query_params=query_params ) if f["length"] is not None: sum_file_sizes += f["length"] if f["name"].lower().startswith("readme"): readmes[f["name"]] = f["checksums"]["sha1"] readme_name, readme_url, readme_html = get_readme_to_display(readmes) if origin_info: browse_view_name = "browse-origin-log" else: browse_view_name = "browse-snapshot-log" history_url = None if snapshot_id != _empty_snapshot_id: query_params.pop("path", None) history_url = reverse( browse_view_name, url_args=url_args, query_params=query_params ) nb_files = None nb_dirs = None dir_path = None if root_directory: nb_files = len(files) nb_dirs = len(dirs) dir_path = "/" + path swh_objects = [] vault_cooking: Dict[str, Any] = { "directory_context": False, "directory_swhid": None, "revision_context": False, "revision_swhid": None, } revision_found = False if revision_id is not None: try: archive.lookup_revision(revision_id) except NotFoundExc: pass else: revision_found = True if sha1_git is not None: swh_objects.append( SWHObjectInfo(object_type=ObjectType.DIRECTORY, object_id=sha1_git) ) vault_cooking.update( { "directory_context": True, "directory_swhid": f"swh:1:dir:{sha1_git}", } ) if revision_id is not None and revision_found: swh_objects.append( SWHObjectInfo(object_type=ObjectType.REVISION, object_id=revision_id) ) vault_cooking.update( { "revision_context": True, "revision_swhid": f"swh:1:rev:{revision_id}", } ) swh_objects.append( SWHObjectInfo(object_type=ObjectType.SNAPSHOT, object_id=snapshot_id) ) visit_date = None visit_type = None if visit_info: visit_date = format_utc_iso_date(visit_info["date"]) visit_type = visit_info["type"] release_id = snapshot_context["release_id"] if 
    release_id = snapshot_context["release_id"]
    if release_id:
        swh_objects.append(
            SWHObjectInfo(object_type=ObjectType.RELEASE, object_id=release_id)
        )

    dir_metadata = DirectoryMetadata(
        object_type=ObjectType.DIRECTORY,
        object_id=sha1_git,
        directory=sha1_git,
        nb_files=nb_files,
        nb_dirs=nb_dirs,
        sum_file_sizes=sum_file_sizes,
        root_directory=root_directory,
        path=dir_path,
        revision=revision_id,
        revision_found=revision_found,
        release=release_id,
        snapshot=snapshot_id,
        origin_url=origin_url,
        visit_date=visit_date,
        visit_type=visit_type,
    )

    swhids_info = get_swhids_info(swh_objects, snapshot_context, dir_metadata)

    dir_path = "/".join([bc["name"] for bc in breadcrumbs]) + "/"
    context_found = "snapshot: %s" % snapshot_context["snapshot_id"]
    if origin_info:
        context_found = "origin: %s" % origin_info["url"]
    heading = "Directory - %s - %s - %s" % (
        dir_path,
        snapshot_context["branch"],
        context_found,
    )

    top_right_link = None
    if not snapshot_context["is_empty"] and revision_found:
        top_right_link = {
            "url": history_url,
            "icon": swh_object_icons["revisions history"],
            "text": "History",
        }

    return render(
        request,
        "browse/directory.html",
        {
            "heading": heading,
            "swh_object_name": "Directory",
            "swh_object_metadata": dir_metadata,
            "dirs": dirs,
            "files": files,
            "breadcrumbs": breadcrumbs if root_directory else [],
            "top_right_link": top_right_link,
            "readme_name": readme_name,
            "readme_url": readme_url,
            "readme_html": readme_html,
            "snapshot_context": snapshot_context,
            "vault_cooking": vault_cooking,
            "show_actions": True,
            "swhids_info": swhids_info,
            "error_code": error_info["status_code"],
            "error_message": http_status_code_message.get(error_info["status_code"]),
            "error_description": error_info["description"],
        },
        status=error_info["status_code"],
    )


PER_PAGE = 100
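
# The directory view above is typically reached through a URL such as
# (hypothetical origin and values):
#
#   /browse/origin/directory/?origin_url=https://example.org/repo.git&path=src
#
# optionally narrowed with the "branch", "release", "revision", "snapshot",
# "timestamp" or "visit_id" query parameters read from request.GET above.
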
""" _check_origin_url(snapshot_id, origin_url) visit_id = int(request.GET.get("visit_id", 0)) snapshot_context = get_snapshot_context( snapshot_id=snapshot_id, origin_url=origin_url, timestamp=timestamp, visit_id=visit_id or None, browse_context="log", branch_name=request.GET.get("branch"), release_name=request.GET.get("release"), revision_id=request.GET.get("revision"), ) revision_id = snapshot_context["revision_id"] if revision_id is None: raise NotFoundExc("No revisions history found in the current snapshot context.") per_page = int(request.GET.get("per_page", PER_PAGE)) offset = int(request.GET.get("offset", 0)) revs_ordering = request.GET.get("revs_ordering", "committer_date") session_key = "rev_%s_log_ordering_%s" % (revision_id, revs_ordering) rev_log_session = request.session.get(session_key, None) rev_log = [] revs_walker_state = None if rev_log_session: rev_log = rev_log_session["rev_log"] revs_walker_state = rev_log_session["revs_walker_state"] if len(rev_log) < offset + per_page: revs_walker = archive.get_revisions_walker( revs_ordering, revision_id, max_revs=offset + per_page + 1, state=revs_walker_state, ) rev_log += [rev["id"] for rev in revs_walker] revs_walker_state = revs_walker.export_state() revs = rev_log[offset : offset + per_page] revision_log = archive.lookup_revision_multiple(revs) request.session[session_key] = { "rev_log": rev_log, "revs_walker_state": revs_walker_state, } origin_info = snapshot_context["origin_info"] visit_info = snapshot_context["visit_info"] url_args = snapshot_context["url_args"] query_params = snapshot_context["query_params"] snapshot_id = snapshot_context["snapshot_id"] query_params["per_page"] = str(per_page) revs_ordering = request.GET.get("revs_ordering", "") if revs_ordering: query_params["revs_ordering"] = revs_ordering if origin_info: browse_view_name = "browse-origin-log" else: browse_view_name = "browse-snapshot-log" prev_log_url = None if len(rev_log) > offset + per_page: query_params["offset"] = str(offset + per_page) prev_log_url = reverse( browse_view_name, url_args=url_args, query_params=query_params ) next_log_url = None if offset != 0: query_params["offset"] = str(offset - per_page) next_log_url = reverse( browse_view_name, url_args=url_args, query_params=query_params ) revision_log_data = format_log_entries(revision_log, per_page, snapshot_context) browse_rev_link = gen_revision_link(revision_id) browse_log_link = gen_revision_log_link(revision_id) browse_snp_link = gen_snapshot_link(snapshot_id) revision_metadata = { "context-independent revision": browse_rev_link, "context-independent revision history": browse_log_link, "context-independent snapshot": browse_snp_link, "snapshot": snapshot_id, } if origin_info and visit_info: revision_metadata["origin url"] = origin_info["url"] revision_metadata["origin visit date"] = format_utc_iso_date(visit_info["date"]) revision_metadata["origin visit type"] = visit_info["type"] swh_objects = [ SWHObjectInfo(object_type=ObjectType.REVISION, object_id=revision_id), SWHObjectInfo(object_type=ObjectType.SNAPSHOT, object_id=snapshot_id), ] release_id = snapshot_context["release_id"] if release_id: swh_objects.append( SWHObjectInfo(object_type=ObjectType.RELEASE, object_id=release_id) ) browse_rel_link = gen_release_link(release_id) revision_metadata["release"] = release_id revision_metadata["context-independent release"] = browse_rel_link swhids_info = get_swhids_info(swh_objects, snapshot_context) context_found = "snapshot: %s" % snapshot_context["snapshot_id"] if origin_info: 
context_found = "origin: %s" % origin_info["url"] heading = "Revision history - %s - %s" % (snapshot_context["branch"], context_found) return render( request, "browse/revision-log.html", { "heading": heading, "swh_object_name": "Revisions history", "swh_object_metadata": revision_metadata, "revision_log": revision_log_data, "revs_ordering": revs_ordering, "next_log_url": next_log_url, "prev_log_url": prev_log_url, "breadcrumbs": None, "top_right_link": None, "snapshot_context": snapshot_context, "vault_cooking": None, "show_actions": True, "swhids_info": swhids_info, }, ) def browse_snapshot_branches( request: HttpRequest, snapshot_id: Optional[str] = None, origin_url: Optional[str] = None, timestamp: Optional[str] = None, branch_name_include: Optional[str] = None, ) -> HttpResponse: """ Django view implementation for browsing a list of branches in a snapshot context. """ _check_origin_url(snapshot_id, origin_url) visit_id = int(request.GET.get("visit_id", 0)) snapshot_context = get_snapshot_context( snapshot_id=snapshot_id, origin_url=origin_url, timestamp=timestamp, visit_id=visit_id or None, ) branches_bc_str = request.GET.get("branches_breadcrumbs", "") branches_bc = branches_bc_str.split(",") if branches_bc_str else [] branches_from = branches_bc[-1] if branches_bc else "" origin_info = snapshot_context["origin_info"] url_args = snapshot_context["url_args"] query_params = snapshot_context["query_params"] if origin_info: browse_view_name = "browse-origin-directory" else: browse_view_name = "browse-snapshot-directory" snapshot = archive.lookup_snapshot( snapshot_context["snapshot_id"], branches_from, PER_PAGE + 1, target_types=["revision", "alias"], branch_name_include_substring=branch_name_include, ) displayed_branches: List[Dict[str, Any]] = [] if snapshot: branches, _, _ = process_snapshot_branches(snapshot) displayed_branches = [dict(branch) for branch in branches] for branch in displayed_branches: rev_query_params = {} if origin_info: rev_query_params["origin_url"] = origin_info["url"] revision_url = reverse( "browse-revision", url_args={"sha1_git": branch["revision"]}, query_params=query_params, ) query_params["branch"] = branch["name"] directory_url = reverse( browse_view_name, url_args=url_args, query_params=query_params ) del query_params["branch"] branch["revision_url"] = revision_url branch["directory_url"] = directory_url if origin_info: browse_view_name = "browse-origin-branches" else: browse_view_name = "browse-snapshot-branches" prev_branches_url = None next_branches_url = None if branches_bc: query_params_prev = dict(query_params) query_params_prev["branches_breadcrumbs"] = ",".join(branches_bc[:-1]) prev_branches_url = reverse( browse_view_name, url_args=url_args, query_params=query_params_prev ) elif branches_from: prev_branches_url = reverse( browse_view_name, url_args=url_args, query_params=query_params ) if snapshot and snapshot["next_branch"] is not None: query_params_next = dict(query_params) next_branch = displayed_branches[-1]["name"] del displayed_branches[-1] branches_bc.append(next_branch) query_params_next["branches_breadcrumbs"] = ",".join(branches_bc) next_branches_url = reverse( browse_view_name, url_args=url_args, query_params=query_params_next ) heading = "Branches - " if origin_info: heading += "origin: %s" % origin_info["url"] else: heading += "snapshot: %s" % snapshot_id return render( request, "browse/branches.html", { "heading": heading, "swh_object_name": "Branches", "swh_object_metadata": {}, "top_right_link": None, "displayed_branches": 
displayed_branches, "prev_branches_url": prev_branches_url, "next_branches_url": next_branches_url, "snapshot_context": snapshot_context, "search_string": branch_name_include or "", }, ) def browse_snapshot_releases( request: HttpRequest, snapshot_id: Optional[str] = None, origin_url: Optional[str] = None, timestamp: Optional[str] = None, release_name_include: Optional[str] = None, ): """ Django view implementation for browsing a list of releases in a snapshot context. """ _check_origin_url(snapshot_id, origin_url) visit_id = int(request.GET.get("visit_id", 0)) snapshot_context = get_snapshot_context( snapshot_id=snapshot_id, origin_url=origin_url, timestamp=timestamp, visit_id=visit_id or None, ) rel_bc_str = request.GET.get("releases_breadcrumbs", "") rel_bc = rel_bc_str.split(",") if rel_bc_str else [] rel_from = rel_bc[-1] if rel_bc else "" origin_info = snapshot_context["origin_info"] url_args = snapshot_context["url_args"] query_params = snapshot_context["query_params"] snapshot = archive.lookup_snapshot( snapshot_context["snapshot_id"], rel_from, PER_PAGE + 1, target_types=["release", "alias"], branch_name_include_substring=release_name_include, ) displayed_releases: List[Dict[str, Any]] = [] if snapshot: _, releases, _ = process_snapshot_branches(snapshot) displayed_releases = [dict(release) for release in releases] for release in displayed_releases: query_params_tgt = {"snapshot": snapshot_id, "release": release["name"]} if origin_info: query_params_tgt["origin_url"] = origin_info["url"] release_url = reverse( "browse-release", url_args={"sha1_git": release["id"]}, query_params=query_params_tgt, ) target_url = "" tooltip = ( f"The release {release['name']} targets " f"{release['target_type']} {release['target']}" ) if release["target_type"] == "revision": target_url = reverse( "browse-revision", url_args={"sha1_git": release["target"]}, query_params=query_params_tgt, ) elif release["target_type"] == "directory": target_url = reverse( "browse-directory", url_args={"sha1_git": release["target"]}, query_params=query_params_tgt, ) elif release["target_type"] == "content": target_url = reverse( "browse-content", url_args={"query_string": release["target"]}, query_params=query_params_tgt, ) elif release["target_type"] == "release": target_url = reverse( "browse-release", url_args={"sha1_git": release["target"]}, query_params=query_params_tgt, ) tooltip = ( f"The release {release['name']} " f"is an alias for release {release['target']}" ) release["release_url"] = release_url release["target_url"] = target_url release["tooltip"] = tooltip if origin_info: browse_view_name = "browse-origin-releases" else: browse_view_name = "browse-snapshot-releases" prev_releases_url = None next_releases_url = None if rel_bc: query_params_prev = dict(query_params) query_params_prev["releases_breadcrumbs"] = ",".join(rel_bc[:-1]) prev_releases_url = reverse( browse_view_name, url_args=url_args, query_params=query_params_prev ) elif rel_from: prev_releases_url = reverse( browse_view_name, url_args=url_args, query_params=query_params ) if snapshot and snapshot["next_branch"] is not None: query_params_next = dict(query_params) next_rel = displayed_releases[-1]["branch_name"] del displayed_releases[-1] rel_bc.append(next_rel) query_params_next["releases_breadcrumbs"] = ",".join(rel_bc) next_releases_url = reverse( browse_view_name, url_args=url_args, query_params=query_params_next ) heading = "Releases - " if origin_info: heading += "origin: %s" % origin_info["url"] else: heading += "snapshot: %s" % snapshot_id 
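
# Target-dispatch sketch for the releases view above: target_url depends on
# release["target_type"] ("revision", "directory", "content" or "release"),
# e.g. for a hypothetical release targeting a revision:
#
#   >>> release = {"name": "v1.0", "target_type": "revision", "target": "abc1234..."}
#   >>> # target_url -> reverse("browse-revision", url_args={"sha1_git": "abc1234..."})
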
diff --git a/swh/web/browse/urls.py b/swh/web/browse/urls.py
index 39af8592..2e9762f0 100644
--- a/swh/web/browse/urls.py
+++ b/swh/web/browse/urls.py
@@ -1,64 +1,64 @@
 # Copyright (C) 2017-2022  The Software Heritage developers
 # See the AUTHORS file at the top-level directory of this distribution
 # License: GNU Affero General Public License version 3, or any later version
 # See top-level LICENSE file for more information

 from django.http import HttpRequest, HttpResponse
 from django.shortcuts import redirect, render
 from django.urls import re_path as url

 from swh.web.browse.browseurls import BrowseUrls
 from swh.web.browse.identifiers import swhid_browse
 import swh.web.browse.views.content  # noqa
 import swh.web.browse.views.directory  # noqa
 import swh.web.browse.views.origin  # noqa
 import swh.web.browse.views.release  # noqa
 import swh.web.browse.views.revision  # noqa
 import swh.web.browse.views.snapshot  # noqa
-from swh.web.common.utils import origin_visit_types, reverse
+from swh.web.utils import origin_visit_types, reverse


 def _browse_help_view(request: HttpRequest) -> HttpResponse:
     return render(
         request, "browse/help.html", {"heading": "How to browse the archive ?"}
     )


 def _browse_search_view(request: HttpRequest) -> HttpResponse:
     return render(
         request,
         "browse/search.html",
         {
             "heading": "Search software origins to browse",
             "visit_types": origin_visit_types(),
         },
     )


 def _browse_vault_view(request: HttpRequest) -> HttpResponse:
     return render(
         request,
         "browse/vault-ui.html",
         {"heading": "Download archive content from the Vault"},
     )


 def _browse_origin_save_view(request: HttpRequest) -> HttpResponse:
     return redirect(reverse("origin-save"))


 urlpatterns = [
     url(r"^browse/$", _browse_search_view),
     url(r"^browse/help/$", _browse_help_view, name="browse-help"),
     url(r"^browse/search/$", _browse_search_view, name="browse-search"),
     url(r"^browse/vault/$", _browse_vault_view, name="browse-vault"),
     # for backward compatibility
     url(r"^browse/origin/save/$", _browse_origin_save_view, name="browse-origin-save"),
     url(
         r"^browse/(?P<swhid>swh:[0-9]+:[a-z]+:[0-9a-f]+.*)/$",
         swhid_browse,
         name="browse-swhid-legacy",
     ),
 ]

 urlpatterns += BrowseUrls.get_url_patterns()
diff --git a/swh/web/browse/utils.py b/swh/web/browse/utils.py
index a39bfbf4..c0815281 100644
--- a/swh/web/browse/utils.py
+++ b/swh/web/browse/utils.py
@@ -1,773 +1,774 @@
 # Copyright (C) 2017-2022  The Software Heritage developers
 # See the AUTHORS file at the top-level directory of this distribution
 # License: GNU Affero General Public License version 3, or any later version
 # See top-level LICENSE file for more information

 import base64
 import stat
 import textwrap
 from typing import Any, Dict, Iterator, List, Optional, Tuple, Union

 import chardet
 import magic

 from django.utils.html import escape
 from django.utils.safestring import mark_safe

-from swh.web.common import archive, highlightjs
-from swh.web.common.exc import NotFoundExc, sentry_capture_exception
-from swh.web.common.typing import SnapshotContext
-from swh.web.common.utils import (
+from swh.web.config import get_config
+from swh.web.utils import (
+    archive,
     browsers_supported_image_mimes,
     django_cache,
     format_utc_iso_date,
+    highlightjs,
     reverse,
     rst_to_html,
 )
-from swh.web.config import get_config
+from swh.web.utils.exc import NotFoundExc, sentry_capture_exception
+from swh.web.utils.typing import SnapshotContext


@django_cache()
def get_directory_entries(
    sha1_git: str,
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Function that retrieves the content of a directory from the archive.

    The directories entries are first sorted in lexicographical order.
    Sub-directories and regular files are then extracted.

    Args:
        sha1_git: sha1_git identifier of the directory

    Returns:
        A tuple whose first member corresponds to the sub-directories list
        and second member the regular files list

    Raises:
        NotFoundExc if the directory is not found
    """
    entries: List[Dict[str, Any]] = list(archive.lookup_directory(sha1_git))
    for e in entries:
        e["perms"] = stat.filemode(e["perms"])
        if e["type"] == "rev":
            # modify dir entry name to explicitly show it points
            # to a revision
            e["name"] = "%s @ %s" % (e["name"], e["target"][:7])

    dirs = [e for e in entries if e["type"] in ("dir", "rev")]
    files = [e for e in entries if e["type"] == "file"]

    dirs = sorted(dirs, key=lambda d: d["name"])
    files = sorted(files, key=lambda f: f["name"])

    return dirs, files


def get_mimetype_and_encoding_for_content(content: bytes) -> Tuple[str, str]:
    """Function that returns the mime type and the encoding associated to
    a content buffer using the magic module under the hood.

    Args:
        content: a content buffer

    Returns:
        A tuple (mimetype, encoding), for instance ('text/plain', 'us-ascii'),
        associated to the provided content.
    """
    m = magic.Magic(mime=True, mime_encoding=True)
    mime_encoding = m.from_buffer(content)
    mime_type, encoding = mime_encoding.split(";")
    encoding = encoding.replace(" charset=", "")
    return mime_type, encoding


# maximum authorized content size in bytes for HTML display
# with code highlighting
content_display_max_size = get_config()["content_display_max_size"]
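
# Usage sketch for get_mimetype_and_encoding_for_content, assuming plain
# ASCII bytes (actual libmagic output may vary slightly across versions):
#
#   >>> get_mimetype_and_encoding_for_content(b"Hello, world!\n")
#   ('text/plain', 'us-ascii')
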
def re_encode_content(
    mimetype: str, encoding: str, content_data: bytes
) -> Tuple[str, str, bytes]:
    """Try to re-encode textual content if it is not encoded to UTF-8
    for proper display in the browse Web UI.

    Args:
        mimetype: content mimetype as detected by python-magic
        encoding: content encoding as detected by python-magic
        content_data: raw content bytes

    Returns:
        A tuple with 3 members: content mimetype, content encoding (possibly
        updated after processing), content raw bytes (possibly reencoded to
        UTF-8)
    """
    if mimetype.startswith("text/") and encoding not in ("us-ascii", "utf-8"):
        # first check if chardet detects an encoding with confidence
        result = chardet.detect(content_data)
        if result["confidence"] >= 0.9:
            encoding = result["encoding"]
            content_data = content_data.decode(encoding).encode("utf-8")
        elif encoding == "unknown-8bit":
            # probably a malformed UTF-8 content, re-encode it
            # by replacing invalid chars with a substitution one
            content_data = content_data.decode("utf-8", "replace").encode("utf-8")
        elif encoding not in ["utf-8", "binary"]:
            content_data = content_data.decode(encoding, "replace").encode("utf-8")
    elif mimetype.startswith("application/octet-stream"):
        # file may detect a text content as binary
        # so try to decode it for display
        encodings = ["us-ascii", "utf-8"]
        encodings += ["iso-8859-%s" % i for i in range(1, 17)]
        for enc in encodings:
            try:
                content_data = content_data.decode(enc).encode("utf-8")
            except Exception:
                pass
            else:
                # ensure display in content view
                encoding = enc
                mimetype = "text/plain"
                break

    return mimetype, encoding, content_data
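
# Usage sketch for re_encode_content, assuming latin-1 text whose declared
# encoding is used because chardet confidence on such a short buffer is low:
#
#   >>> data = "héllo".encode("iso-8859-1")
#   >>> re_encode_content("text/plain", "iso-8859-1", data)
#   ('text/plain', 'iso-8859-1', b'h\xc3\xa9llo')
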
) else: content_data["raw_data"] = content_raw["data"] if not filetype: mimetype, encoding = get_mimetype_and_encoding_for_content( content_data["raw_data"] ) if re_encode: mimetype, encoding, raw_data = re_encode_content( mimetype, encoding, content_data["raw_data"] ) content_data["raw_data"] = raw_data else: content_data["raw_data"] = None content_data["mimetype"] = mimetype content_data["encoding"] = encoding if language: content_data["language"] = language["lang"] else: content_data["language"] = "not detected" return content_data def prepare_content_for_display( content_data: bytes, mime_type: str, path: Optional[str] ) -> Dict[str, Any]: """Function that prepares a content for HTML display. The function tries to associate a programming language to a content in order to perform syntax highlighting client-side using highlightjs. The language is determined using either the content filename or its mime type. If the mime type corresponds to an image format supported by web browsers, the content will be encoded in base64 for displaying the image. Args: content_data: raw bytes of the content mime_type: mime type of the content path: path of the content including filename Returns: A dict containing the content bytes (possibly different from the one provided as parameter if it is an image) under the key 'content_data and the corresponding highlightjs language class under the key 'language'. """ language = None if path: language = highlightjs.get_hljs_language_from_filename(path.split("/")[-1]) if language is None: language = highlightjs.get_hljs_language_from_mime_type(mime_type) if language is None: language = "plaintext" processed_content: Union[bytes, str] = content_data if mime_type.startswith("image/"): if mime_type in browsers_supported_image_mimes: processed_content = base64.b64encode(content_data).decode("ascii") if mime_type.startswith("image/svg"): mime_type = "image/svg+xml" if mime_type.startswith("text/") or mime_type.startswith("application/"): processed_content = content_data.decode("utf-8", errors="replace") return { "content_data": processed_content, "language": language, "mimetype": mime_type, } def gen_link( url: str, link_text: Optional[str] = None, link_attrs: Optional[Dict[str, str]] = None, ) -> str: """ Utility function for generating an HTML link to insert in Django templates. Args: url: an url link_text: optional text for the produced link, if not provided the url will be used link_attrs: optional attributes (e.g. 
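
# Usage sketch for prepare_content_for_display, assuming a Python source
# file (the highlightjs language is resolved from the filename):
#
#   >>> prepare_content_for_display(b"print('hi')\n", "text/x-python", "demo/hello.py")
#   {'content_data': "print('hi')\n", 'language': 'python', 'mimetype': 'text/x-python'}
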
def gen_link(
    url: str,
    link_text: Optional[str] = None,
    link_attrs: Optional[Dict[str, str]] = None,
) -> str:
    """
    Utility function for generating an HTML link to insert
    in Django templates.

    Args:
        url: an url
        link_text: optional text for the produced link, if not
            provided the url will be used
        link_attrs: optional attributes (e.g. class)
            to add to the link

    Returns:
        An HTML link in the form '<a href="url">link_text</a>'
    """
    attrs = " "
    if link_attrs:
        for k, v in link_attrs.items():
            attrs += '%s="%s" ' % (k, v)
    if not link_text:
        link_text = url
    link = '<a%shref="%s">%s</a>' % (attrs, escape(url), escape(link_text))
    return mark_safe(link)


def _snapshot_context_query_params(
    snapshot_context: Optional[SnapshotContext],
) -> Dict[str, str]:
    query_params: Dict[str, str] = {}
    if not snapshot_context:
        return query_params
    if snapshot_context and snapshot_context["origin_info"]:
        origin_info = snapshot_context["origin_info"]
        snp_query_params = snapshot_context["query_params"]
        query_params = {"origin_url": origin_info["url"]}
        if "timestamp" in snp_query_params:
            query_params["timestamp"] = str(snp_query_params["timestamp"])
        if "visit_id" in snp_query_params:
            query_params["visit_id"] = str(snp_query_params["visit_id"])
        if "snapshot" in snp_query_params and "visit_id" not in query_params:
            query_params["snapshot"] = str(snp_query_params["snapshot"])
    elif snapshot_context:
        query_params = {"snapshot": snapshot_context["snapshot_id"]}

    if snapshot_context["release"]:
        query_params["release"] = snapshot_context["release"]
    elif snapshot_context["branch"] and snapshot_context["branch"] not in (
        "HEAD",
        snapshot_context["revision_id"],
    ):
        query_params["branch"] = snapshot_context["branch"]
    elif snapshot_context["revision_id"]:
        query_params["revision"] = snapshot_context["revision_id"]
    return query_params


def gen_revision_url(
    revision_id: str, snapshot_context: Optional[SnapshotContext] = None
) -> str:
    """
    Utility function for generating an url to a revision.

    Args:
        revision_id: a revision id
        snapshot_context: if provided, generate snapshot-dependent browsing url

    Returns:
        str: The url to browse the revision
    """
    query_params = _snapshot_context_query_params(snapshot_context)
    # remove query parameters not needed for a revision view
    query_params.pop("revision", None)
    query_params.pop("release", None)

    return reverse(
        "browse-revision", url_args={"sha1_git": revision_id}, query_params=query_params
    )


def gen_revision_link(
    revision_id: str,
    shorten_id: bool = False,
    snapshot_context: Optional[SnapshotContext] = None,
    link_text: Optional[str] = "Browse",
    link_attrs: Optional[Dict[str, str]] = {
        "class": "btn btn-default btn-sm",
        "role": "button",
    },
) -> Optional[str]:
    """
    Utility function for generating a link to a revision HTML view
    to insert in Django templates.

    Args:
        revision_id: a revision id
        shorten_id: whether to shorten the revision id to 7
            characters for the link text
        snapshot_context: if provided, generate snapshot-dependent
            browsing link
        link_text: optional text for the generated link
            (the revision id will be used by default)
        link_attrs: optional attributes (e.g. class)
            to add to the link

    Returns:
        str: An HTML link in the form '<a href="revision_url">revision_id</a>'
    """
    if not revision_id:
        return None

    revision_url = gen_revision_url(revision_id, snapshot_context)

    if shorten_id:
        return gen_link(revision_url, revision_id[:7], link_attrs)
    else:
        if not link_text:
            link_text = revision_id
        return gen_link(revision_url, link_text, link_attrs)
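
# Output sketch for gen_link (url and text are escaped and the result is
# marked safe for Django templates):
#
#   >>> gen_link("https://example.org", "example")
#   '<a href="https://example.org">example</a>'
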
def gen_directory_link(
    sha1_git: str,
    snapshot_context: Optional[SnapshotContext] = None,
    link_text: Optional[str] = "Browse",
    link_attrs: Optional[Dict[str, str]] = {
        "class": "btn btn-default btn-sm",
        "role": "button",
    },
) -> Optional[str]:
    """
    Utility function for generating a link to a directory HTML view
    to insert in Django templates.

    Args:
        sha1_git: directory identifier
        link_text: optional text for the generated link
            (the directory id will be used by default)
        link_attrs: optional attributes (e.g. class)
            to add to the link

    Returns:
        An HTML link in the form '<a href="url">link_text</a>'
    """
    if not sha1_git:
        return None

    query_params = _snapshot_context_query_params(snapshot_context)

    directory_url = reverse(
        "browse-directory", url_args={"sha1_git": sha1_git}, query_params=query_params
    )

    if not link_text:
        link_text = sha1_git
    return gen_link(directory_url, link_text, link_attrs)


def gen_snapshot_link(
    snapshot_id: str,
    snapshot_context: Optional[SnapshotContext] = None,
    link_text: Optional[str] = "Browse",
    link_attrs: Optional[Dict[str, str]] = {
        "class": "btn btn-default btn-sm",
        "role": "button",
    },
) -> str:
    """
    Utility function for generating a link to a snapshot HTML view
    to insert in Django templates.

    Args:
        snapshot_id: snapshot identifier
        link_text: optional text for the generated link
            (the snapshot id will be used by default)
        link_attrs: optional attributes (e.g. class)
            to add to the link

    Returns:
        An HTML link in the form '<a href="url">link_text</a>'
    """
    query_params = _snapshot_context_query_params(snapshot_context)

    snapshot_url = reverse(
        "browse-snapshot",
        url_args={"snapshot_id": snapshot_id},
        query_params=query_params,
    )
    if not link_text:
        link_text = snapshot_id
    return gen_link(snapshot_url, link_text, link_attrs)


def gen_content_link(
    sha1_git: str,
    snapshot_context: Optional[SnapshotContext] = None,
    link_text: Optional[str] = "Browse",
    link_attrs: Optional[Dict[str, str]] = {
        "class": "btn btn-default btn-sm",
        "role": "button",
    },
) -> Optional[str]:
    """
    Utility function for generating a link to a content HTML view
    to insert in Django templates.

    Args:
        sha1_git: content identifier
        link_text: optional text for the generated link
            (the content sha1_git will be used by default)
        link_attrs: optional attributes (e.g. class)
            to add to the link

    Returns:
        An HTML link in the form '<a href="url">link_text</a>'
    """
    if not sha1_git:
        return None

    query_params = _snapshot_context_query_params(snapshot_context)

    content_url = reverse(
        "browse-content",
        url_args={"query_string": "sha1_git:" + sha1_git},
        query_params=query_params,
    )
    if not link_text:
        link_text = sha1_git
    return gen_link(content_url, link_text, link_attrs)
def get_revision_log_url(
    revision_id: str, snapshot_context: Optional[SnapshotContext] = None
) -> str:
    """
    Utility function for getting the URL for a revision log HTML view
    (possibly in the context of an origin).

    Args:
        revision_id: revision identifier the history heads to
        snapshot_context: if provided, generate snapshot-dependent
            browsing link

    Returns:
        The revision log view URL
    """
    query_params = {}
    if snapshot_context:
        query_params = _snapshot_context_query_params(snapshot_context)

    query_params["revision"] = revision_id
    if snapshot_context and snapshot_context["origin_info"]:
        revision_log_url = reverse("browse-origin-log", query_params=query_params)
    elif snapshot_context:
        url_args = {"snapshot_id": snapshot_context["snapshot_id"]}
        del query_params["snapshot"]
        revision_log_url = reverse(
            "browse-snapshot-log", url_args=url_args, query_params=query_params
        )
    else:
        revision_log_url = reverse(
            "browse-revision-log", url_args={"sha1_git": revision_id}
        )
    return revision_log_url


def gen_revision_log_link(
    revision_id: str,
    snapshot_context: Optional[SnapshotContext] = None,
    link_text: Optional[str] = "Browse",
    link_attrs: Optional[Dict[str, str]] = {
        "class": "btn btn-default btn-sm",
        "role": "button",
    },
) -> Optional[str]:
    """
    Utility function for generating a link to a revision log HTML view
    (possibly in the context of an origin) to insert in Django templates.

    Args:
        revision_id: revision identifier the history heads to
        snapshot_context: if provided, generate snapshot-dependent
            browsing link
        link_text: optional text to use for the generated link
            (the revision id will be used by default)
        link_attrs: optional attributes (e.g. class)
            to add to the link

    Returns:
        An HTML link in the form '<a href="url">link_text</a>'
    """
    if not revision_id:
        return None

    revision_log_url = get_revision_log_url(revision_id, snapshot_context)

    if not link_text:
        link_text = revision_id
    return gen_link(revision_log_url, link_text, link_attrs)


def gen_person_mail_link(
    person: Dict[str, Any], link_text: Optional[str] = None
) -> str:
    """
    Utility function for generating a mail link to a person to insert
    in Django templates.

    Args:
        person: dictionary containing person data
            (*name*, *email*, *fullname*)
        link_text: optional text to use for the generated mail link
            (the person name will be used by default)

    Returns:
        str: A mail link to the person or the person name if no email is
            present in person data
    """
    person_name = person["name"] or person["fullname"] or "None"
    if link_text is None:
        link_text = person_name
    person_email = person["email"] if person["email"] else None
    if person_email is None and "@" in person_name and " " not in person_name:
        person_email = person_name
    if person_email:
        return gen_link(url="mailto:%s" % person_email, link_text=link_text)
    else:
        return person_name
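
# Behaviour sketch for gen_person_mail_link, with hypothetical person dicts:
#
#   >>> gen_person_mail_link(
#   ...     {"name": "Jane", "email": "jane@example.org", "fullname": "Jane"}
#   ... )
#   '<a href="mailto:jane@example.org">Jane</a>'
#   >>> gen_person_mail_link({"name": "Jane", "email": None, "fullname": "Jane"})
#   'Jane'
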
def gen_release_link(
    sha1_git: str,
    snapshot_context: Optional[SnapshotContext] = None,
    link_text: Optional[str] = "Browse",
    link_attrs: Optional[Dict[str, str]] = {
        "class": "btn btn-default btn-sm",
        "role": "button",
    },
) -> str:
    """
    Utility function for generating a link to a release HTML view
    to insert in Django templates.

    Args:
        sha1_git: release identifier
        link_text: optional text for the generated link
            (the release id will be used by default)
        link_attrs: optional attributes (e.g. class)
            to add to the link

    Returns:
        An HTML link in the form '<a href="url">link_text</a>'
    """
    query_params = _snapshot_context_query_params(snapshot_context)

    release_url = reverse(
        "browse-release", url_args={"sha1_git": sha1_git}, query_params=query_params
    )
    if not link_text:
        link_text = sha1_git
    return gen_link(release_url, link_text, link_attrs)


def format_log_entries(
    revision_log: Iterator[Optional[Dict[str, Any]]],
    per_page: int,
    snapshot_context: Optional[SnapshotContext] = None,
) -> List[Dict[str, str]]:
    """
    Utility functions that process raw revision log data for HTML display.
    Its purpose is to:

        * add links to relevant browse views
        * format date in human readable format
        * truncate the message log

    Args:
        revision_log: raw revision log as returned by the swh-web api
        per_page: number of log entries per page
        snapshot_context: if provided, generate snapshot-dependent
            browsing link
    """
    revision_log_data = []
    for i, rev in enumerate(revision_log):
        if rev is None:
            continue
        if i == per_page:
            break
        author_name = "None"
        author_fullname = "None"
        committer_fullname = "None"
        if rev["author"]:
            author_name = gen_person_mail_link(rev["author"])
            author_fullname = rev["author"]["fullname"]
        if rev["committer"]:
            committer_fullname = rev["committer"]["fullname"]
        author_date = format_utc_iso_date(rev["date"])
        committer_date = format_utc_iso_date(rev["committer_date"])

        tooltip = "revision %s\n" % rev["id"]
        tooltip += "author: %s\n" % author_fullname
        tooltip += "author date: %s\n" % author_date
        tooltip += "committer: %s\n" % committer_fullname
        tooltip += "committer date: %s\n\n" % committer_date
        if rev["message"]:
            tooltip += textwrap.indent(rev["message"], " " * 4)

        revision_log_data.append(
            {
                "author": author_name,
                "id": rev["id"][:7],
                "message": rev["message"],
                "date": author_date,
                "commit_date": committer_date,
                "url": gen_revision_url(rev["id"], snapshot_context),
                "tooltip": tooltip,
            }
        )
    return revision_log_data


# list of common readme names ordered by preference
# (lower indices have higher priority)
_common_readme_names = [
    "readme.markdown",
    "readme.md",
    "readme.rst",
    "readme.txt",
    "readme",
]


def get_readme_to_display(
    readmes: Dict[str, str]
) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """
    Process a list of readme files found in a directory
    in order to find the adequate one to display.

    Args:
        readmes: a dict where keys are readme file names and values
            are readme sha1s

    Returns:
        A tuple (readme_name, readme_url, readme_html)
    """
    readme_name = None
    readme_url = None
    readme_sha1 = None
    readme_html = None

    lc_readmes = {k.lower(): {"orig_name": k, "sha1": v} for k, v in readmes.items()}

    # look for readme names according to the preference order
    # defined by the _common_readme_names list
    for common_readme_name in _common_readme_names:
        if common_readme_name in lc_readmes:
            readme_name = lc_readmes[common_readme_name]["orig_name"]
            readme_sha1 = lc_readmes[common_readme_name]["sha1"]
            readme_url = reverse(
                "browse-content-raw",
                url_args={"query_string": readme_sha1},
                query_params={"re_encode": "true"},
            )
            break

    # otherwise pick the first readme like file if any
    if not readme_name and len(readmes.items()) > 0:
        readme_name = next(iter(readmes))
        readme_sha1 = readmes[readme_name]
        readme_url = reverse(
            "browse-content-raw",
            url_args={"query_string": readme_sha1},
            query_params={"re_encode": "true"},
        )

    # convert rst README to html server side as there is
    # no viable solution to perform that task client side
    if readme_name and readme_name.endswith(".rst"):

        @django_cache(
            catch_exception=True,
            exception_return_value="Readme bytes are not available",
        )
        def _rst_readme_to_html(readme_sha1):
            rst_doc = request_content(readme_sha1)
            return rst_to_html(rst_doc["raw_data"])

        readme_html = _rst_readme_to_html(readme_sha1)

    return readme_name, readme_url, readme_html
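
# Selection sketch for get_readme_to_display, following the preference order
# defined by _common_readme_names (hypothetical sha1 values):
#
#   >>> readme_name, readme_url, readme_html = get_readme_to_display(
#   ...     {"README.txt": "sha1_a", "README.md": "sha1_b"}
#   ... )
#   >>> readme_name  # "readme.md" ranks above "readme.txt"
#   'README.md'
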
diff --git a/swh/web/browse/views/content.py b/swh/web/browse/views/content.py
index aa9428a2..9eb8dd70 100644
--- a/swh/web/browse/views/content.py
+++ b/swh/web/browse/views/content.py
@@ -1,461 +1,467 @@
 # Copyright (C) 2017-2022  The Software Heritage developers
 # See the AUTHORS file at the top-level directory of this distribution
 # License: GNU Affero General Public License version 3, or any later version
 # See top-level LICENSE file for more information

 import difflib
 from distutils.util import strtobool
 from typing import Any, Dict, Optional

 from django.http import HttpRequest, HttpResponse, JsonResponse
 from django.shortcuts import redirect, render

 from swh.model.hashutil import hash_to_hex
 from swh.model.swhids import ObjectType
 from swh.web.browse.browseurls import browse_route
 from swh.web.browse.snapshot_context import get_snapshot_context
 from swh.web.browse.utils import (
     content_display_max_size,
     gen_link,
     prepare_content_for_display,
     request_content,
 )
-from swh.web.common import archive, highlightjs, query
-from swh.web.common.exc import (
+from swh.web.utils import (
+    archive,
+    gen_path_info,
+    highlightjs,
+    query,
+    reverse,
+    swh_object_icons,
+)
+from swh.web.utils.exc import (
     BadInputExc,
     NotFoundExc,
     http_status_code_message,
     sentry_capture_exception,
 )
-from swh.web.common.identifiers import get_swhids_info
-from swh.web.common.typing import ContentMetadata, SWHObjectInfo
-from swh.web.common.utils import gen_path_info, reverse, swh_object_icons
+from swh.web.utils.identifiers import get_swhids_info
+from swh.web.utils.typing import ContentMetadata, SWHObjectInfo
@browse_route(
    r"content/(?P<query_string>[0-9a-z_:]*[0-9a-f]+)/raw/",
    view_name="browse-content-raw",
    checksum_args=["query_string"],
)
def content_raw(request: HttpRequest, query_string: str) -> HttpResponse:
    """Django view that produces a raw display of a content identified
    by its hash value.

    The url that points to it is
    :http:get:`/browse/content/[(algo_hash):](hash)/raw/`
    """
    re_encode = bool(strtobool(request.GET.get("re_encode", "false")))
    algo, checksum = query.parse_hash(query_string)
    checksum = hash_to_hex(checksum)
    content_data = request_content(query_string, max_size=None, re_encode=re_encode)

    filename = request.GET.get("filename", None)
    if not filename:
        filename = "%s_%s" % (algo, checksum)

    if (
        content_data["mimetype"].startswith("text/")
        or content_data["mimetype"] == "inode/x-empty"
    ):
        response = HttpResponse(content_data["raw_data"], content_type="text/plain")
        response["Content-disposition"] = "filename=%s" % filename
    else:
        response = HttpResponse(
            content_data["raw_data"], content_type="application/octet-stream"
        )
        response["Content-disposition"] = "attachment; filename=%s" % filename
    return response


_auto_diff_size_limit = 20000


@browse_route(
    r"content/(?P<from_query_string>.*)/diff/(?P<to_query_string>.*)/",
    view_name="diff-contents",
)
def _contents_diff(
    request: HttpRequest, from_query_string: str, to_query_string: str
) -> HttpResponse:
    """
    Browse endpoint used to compute unified diffs between two contents.

    Diffs are generated only if the two contents are textual. By default,
    diffs whose size are greater than 20 kB will not be generated. To force
    the generation of large diffs, the 'force' boolean query parameter must
    be used.

    Args:
        request: input django http request
        from_query_string: a string of the form "[ALGO_HASH:]HASH" where
            optional ALGO_HASH can be either ``sha1``, ``sha1_git``,
            ``sha256``, or ``blake2s256`` (default to ``sha1``) and HASH
            the hexadecimal representation of the hash value identifying
            the first content
        to_query_string: same as above for identifying the second content

    Returns:
        A JSON object containing the unified diff.
    """
    diff_data = {}
    content_from = None
    content_to = None
    content_from_size = 0
    content_to_size = 0
    content_from_lines = []
    content_to_lines = []
    force_str = request.GET.get("force", "false")
    path = request.GET.get("path", None)
    language = "plaintext"

    force = bool(strtobool(force_str))

    if from_query_string == to_query_string:
        diff_str = "File renamed without changes"
    else:
        try:
            text_diff = True
            if from_query_string:
                content_from = request_content(from_query_string, max_size=None)
                content_from_display_data = prepare_content_for_display(
                    content_from["raw_data"], content_from["mimetype"], path
                )
                language = content_from_display_data["language"]
                content_from_size = content_from["length"]
                if not (
                    content_from["mimetype"].startswith("text/")
                    or content_from["mimetype"] == "inode/x-empty"
                ):
                    text_diff = False

            if text_diff and to_query_string:
                content_to = request_content(to_query_string, max_size=None)
                content_to_display_data = prepare_content_for_display(
                    content_to["raw_data"], content_to["mimetype"], path
                )
                language = content_to_display_data["language"]
                content_to_size = content_to["length"]
                if not (
                    content_to["mimetype"].startswith("text/")
                    or content_to["mimetype"] == "inode/x-empty"
                ):
                    text_diff = False

            diff_size = abs(content_to_size - content_from_size)

            if not text_diff:
                diff_str = "Diffs are not generated for non textual content"
                language = "plaintext"
            elif not force and diff_size > _auto_diff_size_limit:
                diff_str = "Large diffs are not automatically computed"
                language = "plaintext"
            else:
                if content_from:
                    content_from_lines = (
                        content_from["raw_data"].decode("utf-8").splitlines(True)
                    )
                    if content_from_lines and content_from_lines[-1][-1] != "\n":
                        content_from_lines[-1] += "[swh-no-nl-marker]\n"
content_to["raw_data"].decode("utf-8").splitlines(True) ) if content_to_lines and content_to_lines[-1][-1] != "\n": content_to_lines[-1] += "[swh-no-nl-marker]\n" diff_lines = difflib.unified_diff(content_from_lines, content_to_lines) diff_str = "".join(list(diff_lines)[2:]) except Exception as exc: sentry_capture_exception(exc) diff_str = str(exc) diff_data["diff_str"] = diff_str diff_data["language"] = language return JsonResponse(diff_data) def _get_content_from_request(request: HttpRequest) -> Dict[str, Any]: path = request.GET.get("path") if path is None: raise BadInputExc("The path query parameter must be provided.") snapshot = request.GET.get("snapshot") or request.GET.get("snapshot_id") origin_url = request.GET.get("origin_url") if snapshot is None and origin_url is None: raise BadInputExc( "The origin_url or snapshot query parameters must be provided." ) visit_id = int(request.GET.get("visit_id", 0)) snapshot_context = get_snapshot_context( snapshot_id=snapshot, origin_url=origin_url, path=path, timestamp=request.GET.get("timestamp"), visit_id=visit_id or None, branch_name=request.GET.get("branch"), release_name=request.GET.get("release"), browse_context="content", ) root_directory = snapshot_context["root_directory"] assert root_directory is not None # to keep mypy happy return archive.lookup_directory_with_path(root_directory, path) @browse_route( r"content/(?P[0-9a-z_:]*[0-9a-f]+)/", r"content/", view_name="browse-content", checksum_args=["query_string"], ) def content_display( request: HttpRequest, query_string: Optional[str] = None ) -> HttpResponse: """Django view that produces an HTML display of a content identified by its hash value. The URLs that points to it are :http:get:`/browse/content/[(algo_hash):](hash)/` :http:get:`/browse/content/` """ if query_string is None: # this case happens when redirected from origin/content or snapshot/content content_data = _get_content_from_request(request) return redirect( reverse( "browse-content", url_args={"query_string": f"sha1_git:{content_data['target']}"}, query_params=request.GET, ), ) algo, checksum = query.parse_hash(query_string) checksum = hash_to_hex(checksum) origin_url = request.GET.get("origin_url") selected_language = request.GET.get("language") if not origin_url: origin_url = request.GET.get("origin") snapshot_id = request.GET.get("snapshot") or request.GET.get("snapshot_id") path = request.GET.get("path") content_data = {} error_info: Dict[str, Any] = {"status_code": 200, "description": None} try: content_data = request_content(query_string) except NotFoundExc as e: error_info["status_code"] = 404 error_info["description"] = f"NotFoundExc: {str(e)}" snapshot_context = None if origin_url is not None or snapshot_id is not None: try: visit_id = int(request.GET.get("visit_id", 0)) snapshot_context = get_snapshot_context( origin_url=origin_url, snapshot_id=snapshot_id, timestamp=request.GET.get("timestamp"), visit_id=visit_id or None, branch_name=request.GET.get("branch"), release_name=request.GET.get("release"), revision_id=request.GET.get("revision"), path=path, browse_context="content", ) except NotFoundExc as e: if str(e).startswith("Origin") and origin_url is not None: raw_cnt_url = reverse( "browse-content", url_args={"query_string": query_string} ) error_message = ( "The Software Heritage archive has a content " "with the hash you provided but the origin " "mentioned in your request appears broken: %s. 
" "Please check the URL and try again.\n\n" "Nevertheless, you can still browse the content " "without origin information: %s" % (gen_link(origin_url), gen_link(raw_cnt_url)) ) raise NotFoundExc(error_message) else: raise e content = None language = None mimetype = None if content_data.get("raw_data") is not None: content_display_data = prepare_content_for_display( content_data["raw_data"], content_data["mimetype"], path ) content = content_display_data["content_data"] language = content_display_data["language"] mimetype = content_display_data["mimetype"] # Override language with user-selected language if selected_language is not None: language = selected_language available_languages = None if mimetype and "text/" in mimetype: available_languages = highlightjs.get_supported_languages() filename = None path_info = None directory_id = None root_dir = None if snapshot_context: root_dir = snapshot_context.get("root_directory") query_params = snapshot_context["query_params"] if snapshot_context else {} breadcrumbs = [] if path: split_path = path.split("/") root_dir = root_dir or split_path[0] filename = split_path[-1] if root_dir != path: path = path.replace(root_dir + "/", "") path = path[: -len(filename)] path_info = gen_path_info(path) query_params.pop("path", None) dir_url = reverse( "browse-directory", url_args={"sha1_git": root_dir}, query_params=query_params, ) breadcrumbs.append({"name": root_dir[:7], "url": dir_url}) for pi in path_info: query_params["path"] = pi["path"] dir_url = reverse( "browse-directory", url_args={"sha1_git": root_dir}, query_params=query_params, ) breadcrumbs.append({"name": pi["name"], "url": dir_url}) breadcrumbs.append({"name": filename, "url": ""}) if path and root_dir is not None and root_dir != path: dir_info = archive.lookup_directory_with_path(root_dir, path) directory_id = dir_info["target"] elif root_dir != path: directory_id = root_dir else: root_dir = None query_params = {"filename": filename} content_checksums = content_data.get("checksums", {}) content_url = reverse( "browse-content", url_args={"query_string": query_string}, ) content_raw_url = reverse( "browse-content-raw", url_args={"query_string": query_string}, query_params=query_params, ) content_metadata = ContentMetadata( object_type=ObjectType.CONTENT, object_id=content_checksums.get("sha1_git"), sha1=content_checksums.get("sha1"), sha1_git=content_checksums.get("sha1_git"), sha256=content_checksums.get("sha256"), blake2s256=content_checksums.get("blake2s256"), content_url=content_url, mimetype=content_data.get("mimetype", ""), encoding=content_data.get("encoding", ""), size=content_data.get("length", 0), language=content_data.get("language", ""), root_directory=root_dir, path=f"/{path}" if path else None, filename=filename or "", directory=directory_id, revision=None, release=None, snapshot=None, origin_url=origin_url, ) swh_objects = [] if content_checksums: swh_objects.append( SWHObjectInfo( object_type=ObjectType.CONTENT, object_id=content_checksums.get("sha1_git"), ) ) if directory_id: swh_objects.append( SWHObjectInfo(object_type=ObjectType.DIRECTORY, object_id=directory_id) ) if snapshot_context: if snapshot_context["revision_id"]: swh_objects.append( SWHObjectInfo( object_type=ObjectType.REVISION, object_id=snapshot_context["revision_id"], ) ) swh_objects.append( SWHObjectInfo( object_type=ObjectType.SNAPSHOT, object_id=snapshot_context["snapshot_id"], ) ) if snapshot_context["release_id"]: swh_objects.append( SWHObjectInfo( object_type=ObjectType.RELEASE, 
object_id=snapshot_context["release_id"], ) ) swhids_info = get_swhids_info( swh_objects, snapshot_context, extra_context=content_metadata, ) heading = "Content - %s" % content_checksums.get("sha1_git") if breadcrumbs: content_path = "/".join(bc["name"] for bc in breadcrumbs) heading += " - %s" % content_path return render( request, "browse/content.html", { "heading": heading, "swh_object_id": swhids_info[0]["swhid"] if swhids_info else "", "swh_object_name": "Content", "swh_object_metadata": content_metadata, "content": content, "content_size": content_data.get("length"), "max_content_size": content_display_max_size, "filename": filename, "encoding": content_data.get("encoding"), "mimetype": mimetype, "language": language, "available_languages": available_languages, "breadcrumbs": breadcrumbs, "top_right_link": { "url": content_raw_url, "icon": swh_object_icons["content"], "text": "Raw File", }, "snapshot_context": snapshot_context, "vault_cooking": None, "show_actions": True, "swhids_info": swhids_info, "error_code": error_info["status_code"], "error_message": http_status_code_message.get(error_info["status_code"]), "error_description": error_info["description"], }, status=error_info["status_code"], ) diff --git a/swh/web/browse/views/directory.py b/swh/web/browse/views/directory.py index 124470e1..7cc22729 100644 --- a/swh/web/browse/views/directory.py +++ b/swh/web/browse/views/directory.py @@ -1,304 +1,303 @@ # Copyright (C) 2017-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import os from typing import Any, Dict, Optional from django.http import HttpRequest, HttpResponse from django.shortcuts import redirect, render from swh.model.swhids import ObjectType from swh.web.browse.browseurls import browse_route from swh.web.browse.snapshot_context import get_snapshot_context from swh.web.browse.utils import gen_link, get_directory_entries, get_readme_to_display -from swh.web.common import archive -from swh.web.common.exc import ( +from swh.web.utils import archive, gen_path_info, reverse, swh_object_icons +from swh.web.utils.exc import ( NotFoundExc, http_status_code_message, sentry_capture_exception, ) -from swh.web.common.identifiers import get_swhids_info -from swh.web.common.typing import DirectoryMetadata, SWHObjectInfo -from swh.web.common.utils import gen_path_info, reverse, swh_object_icons +from swh.web.utils.identifiers import get_swhids_info +from swh.web.utils.typing import DirectoryMetadata, SWHObjectInfo def _directory_browse( request: HttpRequest, sha1_git: str, path: Optional[str] = None ) -> HttpResponse: root_sha1_git = sha1_git dir_sha1_git: Optional[str] = sha1_git error_info: Dict[str, Any] = {"status_code": 200, "description": None} if path: try: dir_info = archive.lookup_directory_with_path(sha1_git, path) dir_sha1_git = dir_info["target"] except NotFoundExc as e: error_info["status_code"] = 404 error_info["description"] = f"NotFoundExc: {str(e)}" dir_sha1_git = None dirs, files = [], [] if dir_sha1_git is not None: dirs, files = get_directory_entries(dir_sha1_git) origin_url = request.GET.get("origin_url") if not origin_url: origin_url = request.GET.get("origin") snapshot_id = request.GET.get("snapshot") snapshot_context = None if origin_url is not None or snapshot_id is not None: try: snapshot_context = get_snapshot_context( snapshot_id=snapshot_id, origin_url=origin_url, 
branch_name=request.GET.get("branch"), release_name=request.GET.get("release"), revision_id=request.GET.get("revision"), path=path, ) except NotFoundExc as e: if str(e).startswith("Origin") and origin_url is not None: raw_dir_url = reverse( "browse-directory", url_args={"sha1_git": dir_sha1_git} ) error_message = ( "The Software Heritage archive has a directory " "with the hash you provided but the origin " "mentioned in your request appears broken: %s. " "Please check the URL and try again.\n\n" "Nevertheless, you can still browse the directory " "without origin information: %s" % (gen_link(origin_url), gen_link(raw_dir_url)) ) raise NotFoundExc(error_message) else: raise e path_info = gen_path_info(path) query_params = snapshot_context["query_params"] if snapshot_context else {} breadcrumbs = [] breadcrumbs.append( { "name": root_sha1_git[:7], "url": reverse( "browse-directory", url_args={"sha1_git": root_sha1_git}, query_params={**query_params, "path": None}, ), } ) for pi in path_info: breadcrumbs.append( { "name": pi["name"], "url": reverse( "browse-directory", url_args={"sha1_git": root_sha1_git}, query_params={ **query_params, "path": pi["path"], }, ), } ) path = "" if path is None else (path + "/") for d in dirs: if d["type"] == "rev": d["url"] = reverse( "browse-revision", url_args={"sha1_git": d["target"]}, query_params=query_params, ) else: d["url"] = reverse( "browse-directory", url_args={"sha1_git": root_sha1_git}, query_params={ **query_params, "path": path + d["name"], }, ) sum_file_sizes = 0 readmes = {} for f in files: query_string = "sha1_git:" + f["target"] f["url"] = reverse( "browse-content", url_args={"query_string": query_string}, query_params={ **query_params, "path": root_sha1_git + "/" + path + f["name"], }, ) if f["length"] is not None: sum_file_sizes += f["length"] if f["name"].lower().startswith("readme"): readmes[f["name"]] = f["checksums"]["sha1"] readme_name, readme_url, readme_html = get_readme_to_display(readmes) dir_metadata = DirectoryMetadata( object_type=ObjectType.DIRECTORY, object_id=dir_sha1_git, directory=root_sha1_git, nb_files=len(files), nb_dirs=len(dirs), sum_file_sizes=sum_file_sizes, root_directory=root_sha1_git, path=f"/{path}" if path else None, revision=None, revision_found=None, release=None, snapshot=None, ) vault_cooking = { "directory_context": True, "directory_swhid": f"swh:1:dir:{dir_sha1_git}", "revision_context": False, "revision_swhid": None, } swh_objects = [ SWHObjectInfo(object_type=ObjectType.DIRECTORY, object_id=dir_sha1_git) ] if snapshot_context: if snapshot_context["revision_id"]: swh_objects.append( SWHObjectInfo( object_type=ObjectType.REVISION, object_id=snapshot_context["revision_id"], ) ) swh_objects.append( SWHObjectInfo( object_type=ObjectType.SNAPSHOT, object_id=snapshot_context["snapshot_id"], ) ) if snapshot_context["release_id"]: swh_objects.append( SWHObjectInfo( object_type=ObjectType.RELEASE, object_id=snapshot_context["release_id"], ) ) swhids_info = get_swhids_info(swh_objects, snapshot_context, dir_metadata) heading = "Directory - %s" % dir_sha1_git if breadcrumbs: dir_path = "/".join([bc["name"] for bc in breadcrumbs]) + "/" heading += " - %s" % dir_path top_right_link = None if ( snapshot_context is not None and not snapshot_context["is_empty"] and snapshot_context["revision_id"] is not None ): history_url = reverse( "browse-revision-log", url_args={"sha1_git": snapshot_context["revision_id"]}, query_params=query_params, ) top_right_link = { "url": history_url, "icon": swh_object_icons["revisions 
history"], "text": "History", } return render( request, "browse/directory.html", { "heading": heading, "swh_object_id": swhids_info[0]["swhid"], "swh_object_name": "Directory", "swh_object_metadata": dir_metadata, "dirs": dirs, "files": files, "breadcrumbs": breadcrumbs, "top_right_link": top_right_link, "readme_name": readme_name, "readme_url": readme_url, "readme_html": readme_html, "snapshot_context": snapshot_context, "vault_cooking": vault_cooking, "show_actions": True, "swhids_info": swhids_info, "error_code": error_info["status_code"], "error_message": http_status_code_message.get(error_info["status_code"]), "error_description": error_info["description"], }, status=error_info["status_code"], ) @browse_route( r"directory/(?P[0-9a-f]+)/", view_name="browse-directory", checksum_args=["sha1_git"], ) def directory_browse(request: HttpRequest, sha1_git: str) -> HttpResponse: """Django view for browsing the content of a directory identified by its sha1_git value. The url that points to it is :http:get:`/browse/directory/(sha1_git)/` """ return _directory_browse(request, sha1_git, request.GET.get("path")) @browse_route( r"directory/(?P[0-9a-f]+)/(?P.+)/", view_name="browse-directory-legacy", checksum_args=["sha1_git"], ) def directory_browse_legacy( request: HttpRequest, sha1_git: str, path: str ) -> HttpResponse: """Django view for browsing the content of a directory identified by its sha1_git value. The url that points to it is :http:get:`/browse/directory/(sha1_git)/(path)/` """ return _directory_browse(request, sha1_git, path) @browse_route( r"directory/resolve/content-path/(?P[0-9a-f]+)/", view_name="browse-directory-resolve-content-path", checksum_args=["sha1_git"], ) def _directory_resolve_content_path( request: HttpRequest, sha1_git: str ) -> HttpResponse: """ Internal endpoint redirecting to data url for a specific file path relative to a root directory. 
""" try: path = os.path.normpath(request.GET.get("path", "")) if not path.startswith("../"): dir_info = archive.lookup_directory_with_path(sha1_git, path) if dir_info["type"] == "file": sha1 = dir_info["checksums"]["sha1"] data_url = reverse( "browse-content-raw", url_args={"query_string": sha1} ) return redirect(data_url) except Exception as exc: sentry_capture_exception(exc) return HttpResponse(status=404) diff --git a/swh/web/browse/views/origin.py b/swh/web/browse/views/origin.py index 3f5ea8bc..63c4817b 100644 --- a/swh/web/browse/views/origin.py +++ b/swh/web/browse/views/origin.py @@ -1,332 +1,332 @@ # Copyright (C) 2021-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from typing import Any, Dict, List, Optional, cast from django.http import HttpRequest, HttpResponse from django.shortcuts import redirect, render from swh.web.browse.browseurls import browse_route from swh.web.browse.snapshot_context import ( browse_snapshot_directory, get_snapshot_context, ) -from swh.web.common import archive -from swh.web.common.exc import BadInputExc -from swh.web.common.origin_visits import get_origin_visits -from swh.web.common.utils import ( +from swh.web.utils import ( + archive, format_utc_iso_date, parse_iso8601_date_to_utc, redirect_to_new_route, reverse, ) +from swh.web.utils.exc import BadInputExc +from swh.web.utils.origin_visits import get_origin_visits @browse_route( r"origin/directory/", view_name="browse-origin-directory", ) def origin_directory_browse(request: HttpRequest) -> HttpResponse: """Django view for browsing the content of a directory associated to an origin for a given visit. The URL that points to it is :http:get:`/browse/origin/directory/` """ return browse_snapshot_directory( request, origin_url=request.GET.get("origin_url"), snapshot_id=request.GET.get("snapshot"), timestamp=request.GET.get("timestamp"), path=request.GET.get("path"), ) @browse_route( r"origin/(?P.+)/visit/(?P.+)/directory/", r"origin/(?P.+)/visit/(?P.+)/directory/(?P.+)/", r"origin/(?P.+)/directory/(?P.+)/", r"origin/(?P.+)/directory/", view_name="browse-origin-directory-legacy", ) def origin_directory_browse_legacy( request: HttpRequest, origin_url: str, timestamp: Optional[str] = None, path: Optional[str] = None, ) -> HttpResponse: """Django view for browsing the content of a directory associated to an origin for a given visit. The URLs that point to it are :http:get:`/browse/origin/(origin_url)/directory/[(path)/]` and :http:get:`/browse/origin/(origin_url)/visit/(timestamp)/directory/[(path)/]` """ return browse_snapshot_directory( request, origin_url=origin_url, snapshot_id=request.GET.get("snapshot"), timestamp=timestamp, path=path, ) @browse_route( r"origin/content/", view_name="browse-origin-content", ) def origin_content_browse(request: HttpRequest) -> HttpResponse: """ This route is deprecated; use http:get:`/browse/content` instead Django view that produces an HTML display of a content associated to an origin for a given visit. 
The URL that points to it is :http:get:`/browse/origin/content/` """ return redirect_to_new_route(request, "browse-content") @browse_route( r"origin/(?P.+)/visit/(?P.+)/content/(?P.+)/", r"origin/(?P.+)/content/(?P.+)/", r"origin/(?P.+)/content/", view_name="browse-origin-content-legacy", ) def origin_content_browse_legacy( request: HttpRequest, origin_url: str, path: Optional[str] = None, timestamp: Optional[str] = None, ) -> HttpResponse: """ This route is deprecated; use http:get:`/browse/content` instead Django view that produces an HTML display of a content associated to an origin for a given visit. The URLs that point to it are :http:get:`/browse/origin/(origin_url)/content/(path)/` and :http:get:`/browse/origin/(origin_url)/visit/(timestamp)/content/(path)/` """ return redirect_to_new_route(request, "browse-content") @browse_route( r"origin/log/", view_name="browse-origin-log", ) def origin_log_browse(request: HttpRequest) -> HttpResponse: """ This route is deprecated; use http:get:`/browse/snapshot/log` instead Django view that produces an HTML display of revisions history (aka the commit log) associated to a software origin. The URL that points to it is :http:get:`/browse/origin/log/` """ return redirect_to_new_route(request, "browse-snapshot-log") @browse_route( r"origin/(?P.+)/visit/(?P.+)/log/", r"origin/(?P.+)/log/", view_name="browse-origin-log-legacy", ) def origin_log_browse_legacy( request: HttpRequest, origin_url: str, timestamp: Optional[str] = None ) -> HttpResponse: """ This route is deprecated; use http:get:`/browse/snapshot/log` instead Django view that produces an HTML display of revisions history (aka the commit log) associated to a software origin. The URLs that point to it are :http:get:`/browse/origin/(origin_url)/log/` and :http:get:`/browse/origin/(origin_url)/visit/(timestamp)/log/` """ return redirect_to_new_route( request, "browse-snapshot-log", ) @browse_route( r"origin/branches/", view_name="browse-origin-branches", ) def origin_branches_browse(request: HttpRequest) -> HttpResponse: """ This route is deprecated; use http:get:`/browse/snapshot/branches` instead Django view that produces an HTML display of the list of branches associated to an origin for a given visit. The URL that points to it is :http:get:`/browse/origin/branches/` """ return redirect_to_new_route(request, "browse-snapshot-branches") @browse_route( r"origin/(?P.+)/visit/(?P.+)/branches/", r"origin/(?P.+)/branches/", view_name="browse-origin-branches-legacy", ) def origin_branches_browse_legacy( request: HttpRequest, origin_url: str, timestamp: Optional[str] = None ) -> HttpResponse: """ This route is deprecated; use http:get:`/browse/snapshot/branches` instead Django view that produces an HTML display of the list of branches associated to an origin for a given visit. The URLs that point to it are :http:get:`/browse/origin/(origin_url)/branches/` and :http:get:`/browse/origin/(origin_url)/visit/(timestamp)/branches/` """ return redirect_to_new_route(request, "browse-snapshot-branches") @browse_route( r"origin/releases/", view_name="browse-origin-releases", ) def origin_releases_browse(request: HttpRequest) -> HttpResponse: """ This route is deprecated; use http:get:`/browse/snapshot/releases` instead Django view that produces an HTML display of the list of releases associated to an origin for a given visit. 
The URL that points to it is :http:get:`/browse/origin/releases/` """ return redirect_to_new_route(request, "browse-snapshot-releases") @browse_route( r"origin/(?P.+)/visit/(?P.+)/releases/", r"origin/(?P.+)/releases/", view_name="browse-origin-releases-legacy", ) def origin_releases_browse_legacy( request: HttpRequest, origin_url: str, timestamp: Optional[str] = None ) -> HttpResponse: """ This route is deprecated; use http:get:`/browse/snapshot/releases` instead Django view that produces an HTML display of the list of releases associated to an origin for a given visit. The URLs that point to it are :http:get:`/browse/origin/(origin_url)/releases/` and :http:get:`/browse/origin/(origin_url)/visit/(timestamp)/releases/` """ return redirect_to_new_route(request, "browse-snapshot-releases") def _origin_visits_browse( request: HttpRequest, origin_url: Optional[str] ) -> HttpResponse: if origin_url is None: raise BadInputExc("An origin URL must be provided as query parameter.") origin_info = archive.lookup_origin({"url": origin_url}) origin_visits = cast(List[Dict[str, Any]], get_origin_visits(origin_info)) snapshot_context = get_snapshot_context(origin_url=origin_url) for i, visit in enumerate(origin_visits): url_date = format_utc_iso_date(visit["date"], "%Y-%m-%dT%H:%M:%SZ") visit["formatted_date"] = format_utc_iso_date(visit["date"]) query_params = {"origin_url": origin_url, "timestamp": url_date} if i < len(origin_visits) - 1: if visit["date"] == origin_visits[i + 1]["date"]: query_params = {"visit_id": visit["visit"]} if i > 0: if visit["date"] == origin_visits[i - 1]["date"]: query_params = {"visit_id": visit["visit"]} snapshot = visit["snapshot"] if visit["snapshot"] else "" visit["url"] = reverse( "browse-origin-directory", query_params=query_params, ) if not snapshot: visit["snapshot"] = "" visit["date"] = parse_iso8601_date_to_utc(visit["date"]).timestamp() heading = "Origin visits - %s" % origin_url return render( request, "browse/origin-visits.html", { "heading": heading, "swh_object_name": "Visits", "swh_object_metadata": origin_info, "origin_visits": origin_visits, "origin_info": origin_info, "snapshot_context": snapshot_context, "vault_cooking": None, "show_actions": False, }, ) @browse_route(r"origin/visits/", view_name="browse-origin-visits") def origin_visits_browse(request: HttpRequest) -> HttpResponse: """Django view that produces an HTML display of visits reporting for a given origin. The URL that points to it is :http:get:`/browse/origin/visits/`. """ return _origin_visits_browse(request, request.GET.get("origin_url")) @browse_route( r"origin/(?P.+)/visits/", view_name="browse-origin-visits-legacy" ) def origin_visits_browse_legacy(request: HttpRequest, origin_url: str) -> HttpResponse: """Django view that produces an HTML display of visits reporting for a given origin. The URL that points to it is :http:get:`/browse/origin/(origin_url)/visits/`. """ return _origin_visits_browse(request, origin_url) @browse_route(r"origin/", view_name="browse-origin") def origin_browse(request: HttpRequest) -> HttpResponse: """Django view that redirects to the display of the latest archived snapshot for a given software origin. 
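The ``_origin_visits_browse`` view above links each visit by its timestamp, except when a neighbouring visit shares the same date, in which case only the ``visit_id`` is unambiguous. A condensed restatement of that rule (function name and sample data are illustrative):

.. sourcecode:: python

   def visit_link_params(visits, i, origin_url):
       # a timestamp normally identifies the visit...
       params = {"origin_url": origin_url, "timestamp": visits[i]["date"]}
       # ...but if an adjacent visit has the same date, fall back to
       # the visit identifier
       if (i + 1 < len(visits) and visits[i]["date"] == visits[i + 1]["date"]) or (
           i > 0 and visits[i]["date"] == visits[i - 1]["date"]
       ):
           params = {"visit_id": visits[i]["visit"]}
       return params

   visits = [
       {"visit": 1, "date": "2022-01-01T00:00:00Z"},
       {"visit": 2, "date": "2022-01-01T00:00:00Z"},
       {"visit": 3, "date": "2022-02-01T00:00:00Z"},
   ]
   assert visit_link_params(visits, 0, "https://example.org")["visit_id"] == 1
   assert "timestamp" in visit_link_params(visits, 2, "https://example.org")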
""" last_snapshot_url = reverse( "browse-origin-directory", query_params=request.GET, ) return redirect(last_snapshot_url) @browse_route(r"origin/(?P.+)/", view_name="browse-origin-legacy") def origin_browse_legacy(request: HttpRequest, origin_url: str) -> HttpResponse: """Django view that redirects to the display of the latest archived snapshot for a given software origin. """ last_snapshot_url = reverse( "browse-origin-directory", query_params={"origin_url": origin_url, **request.GET}, ) return redirect(last_snapshot_url) diff --git a/swh/web/browse/views/release.py b/swh/web/browse/views/release.py index 16670106..d41d6a91 100644 --- a/swh/web/browse/views/release.py +++ b/swh/web/browse/views/release.py @@ -1,247 +1,246 @@ # Copyright (C) 2017-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from typing import Optional from django.http import HttpRequest, HttpResponse from django.shortcuts import render from swh.model.swhids import ObjectType from swh.web.browse.browseurls import browse_route from swh.web.browse.snapshot_context import get_snapshot_context from swh.web.browse.utils import ( gen_content_link, gen_directory_link, gen_link, gen_person_mail_link, gen_release_link, gen_revision_link, ) -from swh.web.common import archive -from swh.web.common.exc import NotFoundExc, sentry_capture_exception -from swh.web.common.identifiers import get_swhids_info -from swh.web.common.typing import ReleaseMetadata, SnapshotContext, SWHObjectInfo -from swh.web.common.utils import format_utc_iso_date, reverse +from swh.web.utils import archive, format_utc_iso_date, reverse +from swh.web.utils.exc import NotFoundExc, sentry_capture_exception +from swh.web.utils.identifiers import get_swhids_info +from swh.web.utils.typing import ReleaseMetadata, SnapshotContext, SWHObjectInfo @browse_route( r"release/(?P[0-9a-f]+)/", view_name="browse-release", checksum_args=["sha1_git"], ) def release_browse(request: HttpRequest, sha1_git: str) -> HttpResponse: """ Django view that produces an HTML display of a release identified by its id. The url that points to it is :http:get:`/browse/release/(sha1_git)/`. """ release = archive.lookup_release(sha1_git) snapshot_context: Optional[SnapshotContext] = None origin_info = None snapshot_id = request.GET.get("snapshot_id") if not snapshot_id: snapshot_id = request.GET.get("snapshot") origin_url = request.GET.get("origin_url") if not origin_url: origin_url = request.GET.get("origin") timestamp = request.GET.get("timestamp") visit_id = int(request.GET.get("visit_id", 0)) if origin_url: try: snapshot_context = get_snapshot_context( snapshot_id, origin_url, timestamp, visit_id or None, release_name=release["name"], ) except NotFoundExc as e: raw_rel_url = reverse("browse-release", url_args={"sha1_git": sha1_git}) error_message = ( "The Software Heritage archive has a release " "with the hash you provided but the origin " "mentioned in your request appears broken: %s. 
" "Please check the URL and try again.\n\n" "Nevertheless, you can still browse the release " "without origin information: %s" % (gen_link(origin_url), gen_link(raw_rel_url)) ) if str(e).startswith("Origin"): raise NotFoundExc(error_message) else: raise e origin_info = snapshot_context["origin_info"] elif snapshot_id: snapshot_context = get_snapshot_context( snapshot_id, release_name=release["name"] ) if snapshot_context is not None: snapshot_id = snapshot_context.get("snapshot_id", None) release_metadata = ReleaseMetadata( object_type=ObjectType.RELEASE, object_id=sha1_git, release=sha1_git, author=release["author"]["fullname"] if release["author"] else "None", author_url=gen_person_mail_link(release["author"]) if release["author"] else "None", date=format_utc_iso_date(release["date"]), name=release["name"], synthetic=release["synthetic"], target=release["target"], target_type=release["target_type"], snapshot=snapshot_id, origin_url=origin_url, ) release_note_lines = [] if release["message"]: release_note_lines = release["message"].split("\n") swh_objects = [SWHObjectInfo(object_type=ObjectType.RELEASE, object_id=sha1_git)] vault_cooking = None rev_directory = None target_link = None if release["target_type"] == ObjectType.REVISION.name.lower(): target_link = gen_revision_link( release["target"], snapshot_context=snapshot_context, link_text=None, link_attrs=None, ) try: revision = archive.lookup_revision(release["target"]) rev_directory = revision["directory"] vault_cooking = { "directory_context": True, "directory_swhid": f"swh:1:dir:{rev_directory}", "revision_context": True, "revision_swhid": f"swh:1:rev:{release['target']}", } swh_objects.append( SWHObjectInfo( object_type=ObjectType.REVISION, object_id=release["target"] ) ) swh_objects.append( SWHObjectInfo(object_type=ObjectType.DIRECTORY, object_id=rev_directory) ) except Exception as exc: sentry_capture_exception(exc) elif release["target_type"] == ObjectType.DIRECTORY.name.lower(): target_link = gen_directory_link( release["target"], snapshot_context=snapshot_context, link_text=None, link_attrs=None, ) try: # check directory exists archive.lookup_directory(release["target"]) vault_cooking = { "directory_context": True, "directory_swhid": f"swh:1:dir:{release['target']}", "revision_context": False, "revision_swhid": None, } swh_objects.append( SWHObjectInfo( object_type=ObjectType.DIRECTORY, object_id=release["target"] ) ) except Exception as exc: sentry_capture_exception(exc) elif release["target_type"] == ObjectType.CONTENT.name.lower(): target_link = gen_content_link( release["target"], snapshot_context=snapshot_context, link_text=None, link_attrs=None, ) swh_objects.append( SWHObjectInfo(object_type=ObjectType.CONTENT, object_id=release["target"]) ) elif release["target_type"] == ObjectType.RELEASE.name.lower(): target_link = gen_release_link( release["target"], snapshot_context=snapshot_context, link_text=None, link_attrs=None, ) rev_directory_url = None if rev_directory is not None: if origin_info: rev_directory_url = reverse( "browse-origin-directory", query_params={ "origin_url": origin_info["url"], "release": release["name"], "snapshot": snapshot_id, }, ) elif snapshot_id: rev_directory_url = reverse( "browse-snapshot-directory", url_args={"snapshot_id": snapshot_id}, query_params={"release": release["name"]}, ) else: rev_directory_url = reverse( "browse-directory", url_args={"sha1_git": rev_directory} ) directory_link = None if rev_directory_url is not None: directory_link = gen_link(rev_directory_url, rev_directory) 
release["directory_link"] = directory_link release["target_link"] = target_link if snapshot_context: snapshot_id = snapshot_context["snapshot_id"] if snapshot_id: swh_objects.append( SWHObjectInfo(object_type=ObjectType.SNAPSHOT, object_id=snapshot_id) ) swhids_info = get_swhids_info(swh_objects, snapshot_context) note_header = "None" if len(release_note_lines) > 0: note_header = release_note_lines[0] release["note_header"] = note_header release["note_body"] = "\n".join(release_note_lines[1:]) heading = "Release - %s" % release["name"] if snapshot_context: context_found = "snapshot: %s" % snapshot_context["snapshot_id"] if origin_info: context_found = "origin: %s" % origin_info["url"] heading += " - %s" % context_found return render( request, "browse/release.html", { "heading": heading, "swh_object_id": swhids_info[0]["swhid"], "swh_object_name": "Release", "swh_object_metadata": release_metadata, "release": release, "snapshot_context": snapshot_context, "show_actions": True, "breadcrumbs": None, "vault_cooking": vault_cooking, "top_right_link": None, "swhids_info": swhids_info, }, ) diff --git a/swh/web/browse/views/revision.py b/swh/web/browse/views/revision.py index f2fd0af2..14a5820b 100644 --- a/swh/web/browse/views/revision.py +++ b/swh/web/browse/views/revision.py @@ -1,600 +1,600 @@ # Copyright (C) 2017-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import hashlib import json import textwrap from typing import Any, Dict, List, Optional from django.http import HttpRequest, HttpResponse, JsonResponse from django.shortcuts import render from django.utils.safestring import mark_safe from swh.model.hashutil import hash_to_bytes from swh.model.swhids import CoreSWHID, ObjectType from swh.web.browse.browseurls import browse_route from swh.web.browse.snapshot_context import get_snapshot_context from swh.web.browse.utils import ( content_display_max_size, format_log_entries, gen_link, gen_person_mail_link, gen_revision_url, get_directory_entries, get_readme_to_display, get_revision_log_url, prepare_content_for_display, request_content, ) -from swh.web.common import archive -from swh.web.common.exc import NotFoundExc, http_status_code_message -from swh.web.common.identifiers import get_swhids_info -from swh.web.common.typing import RevisionMetadata, SnapshotContext, SWHObjectInfo -from swh.web.common.utils import ( +from swh.web.utils import ( + archive, format_utc_iso_date, gen_path_info, reverse, swh_object_icons, ) +from swh.web.utils.exc import NotFoundExc, http_status_code_message +from swh.web.utils.identifiers import get_swhids_info +from swh.web.utils.typing import RevisionMetadata, SnapshotContext, SWHObjectInfo def _gen_content_url( revision: Dict[str, Any], query_string: str, path: str, snapshot_context: Optional[SnapshotContext], ) -> str: if snapshot_context: query_params = snapshot_context["query_params"] query_params["path"] = path query_params["revision"] = revision["id"] content_url = reverse("browse-origin-content", query_params=query_params) else: content_path = "%s/%s" % (revision["directory"], path) content_url = reverse( "browse-content", url_args={"query_string": query_string}, query_params={"path": content_path}, ) return content_url def _gen_diff_link(idx: int, diff_anchor: str, link_text: str) -> str: if idx < _max_displayed_file_diffs: return gen_link(diff_anchor, link_text) 
else: return link_text # TODO: put in conf _max_displayed_file_diffs = 1000 def _gen_revision_changes_list( revision: Dict[str, Any], changes: List[Dict[str, Any]], snapshot_context: Optional[SnapshotContext], ) -> str: """ Returns a HTML string describing the file changes introduced in a revision. As this string will be displayed in the browse revision view, links to adequate file diffs are also generated. Args: revision (str): hexadecimal representation of a revision identifier changes (list): list of file changes in the revision snapshot_context (dict): optional origin context used to reverse the content urls Returns: A string to insert in a revision HTML view. """ changes_msg = [] for i, change in enumerate(changes): hasher = hashlib.sha1() from_query_string = "" to_query_string = "" diff_id = "diff-" if change["from"]: from_query_string = "sha1_git:" + change["from"]["target"] diff_id += change["from"]["target"] + "-" + change["from_path"] diff_id += "-" if change["to"]: to_query_string = "sha1_git:" + change["to"]["target"] diff_id += change["to"]["target"] + change["to_path"] change["path"] = change["to_path"] or change["from_path"] url_args = { "from_query_string": from_query_string, "to_query_string": to_query_string, } query_params = {"path": change["path"]} change["diff_url"] = reverse( "diff-contents", url_args=url_args, query_params=query_params ) hasher.update(diff_id.encode("utf-8")) diff_id = hasher.hexdigest() change["id"] = diff_id diff_link = "#diff_" + diff_id if change["type"] == "modify": change["content_url"] = _gen_content_url( revision, to_query_string, change["to_path"], snapshot_context ) changes_msg.append( "modified: %s" % _gen_diff_link(i, diff_link, change["to_path"]) ) elif change["type"] == "insert": change["content_url"] = _gen_content_url( revision, to_query_string, change["to_path"], snapshot_context ) changes_msg.append( "new file: %s" % _gen_diff_link(i, diff_link, change["to_path"]) ) elif change["type"] == "delete": parent = archive.lookup_revision(revision["parents"][0]) change["content_url"] = _gen_content_url( parent, from_query_string, change["from_path"], snapshot_context ) changes_msg.append( "deleted: %s" % _gen_diff_link(i, diff_link, change["from_path"]) ) elif change["type"] == "rename": change["content_url"] = _gen_content_url( revision, to_query_string, change["to_path"], snapshot_context ) link_text = change["from_path"] + " → " + change["to_path"] changes_msg.append( "renamed: %s" % _gen_diff_link(i, diff_link, link_text) ) if not changes: changes_msg.append("No changes") return mark_safe("\n".join(changes_msg)) @browse_route( r"revision/(?P[0-9a-f]+)/diff/", view_name="diff-revision", checksum_args=["sha1_git"], ) def _revision_diff(request: HttpRequest, sha1_git: str) -> HttpResponse: """ Browse internal endpoint to compute revision diff """ revision = archive.lookup_revision(sha1_git) snapshot_context = None origin_url = request.GET.get("origin_url", None) if not origin_url: origin_url = request.GET.get("origin", None) timestamp = request.GET.get("timestamp", None) visit_id_str = request.GET.get("visit_id", None) visit_id = int(visit_id_str) if visit_id_str is not None else None if origin_url: snapshot_context = get_snapshot_context( origin_url=origin_url, timestamp=timestamp, visit_id=visit_id ) changes = archive.diff_revision(sha1_git) changes_msg = _gen_revision_changes_list(revision, changes, snapshot_context) diff_data = { "total_nb_changes": len(changes), "changes": changes[:_max_displayed_file_diffs], "changes_msg": 
changes_msg, } return JsonResponse(diff_data) NB_LOG_ENTRIES = 100 @browse_route( r"revision/(?P[0-9a-f]+)/log/", view_name="browse-revision-log", checksum_args=["sha1_git"], ) def revision_log_browse(request: HttpRequest, sha1_git: str) -> HttpResponse: """ Django view that produces an HTML display of the history log for a revision identified by its id. The url that points to it is :http:get:`/browse/revision/(sha1_git)/log/` """ origin_url = request.GET.get("origin_url") snapshot_id = request.GET.get("snapshot") snapshot_context = None if origin_url or snapshot_id: visit_id = int(request.GET.get("visit_id", 0)) snapshot_context = get_snapshot_context( snapshot_id=snapshot_id, origin_url=origin_url, timestamp=request.GET.get("timestamp"), visit_id=visit_id or None, branch_name=request.GET.get("branch"), release_name=request.GET.get("release"), revision_id=sha1_git, ) per_page = int(request.GET.get("per_page", NB_LOG_ENTRIES)) offset = int(request.GET.get("offset", 0)) revs_ordering = request.GET.get("revs_ordering", "committer_date") session_key = "rev_%s_log_ordering_%s" % (sha1_git, revs_ordering) rev_log_session = request.session.get(session_key, None) rev_log = [] revs_walker_state = None if rev_log_session: rev_log = rev_log_session["rev_log"] revs_walker_state = rev_log_session["revs_walker_state"] if len(rev_log) < offset + per_page: revs_walker = archive.get_revisions_walker( revs_ordering, sha1_git, max_revs=offset + per_page + 1, state=revs_walker_state, ) rev_log += [rev["id"] for rev in revs_walker] revs_walker_state = revs_walker.export_state() revs = rev_log[offset : offset + per_page] revision_log = archive.lookup_revision_multiple(revs) request.session[session_key] = { "rev_log": rev_log, "revs_walker_state": revs_walker_state, } revs_ordering = request.GET.get("revs_ordering", "") prev_log_url = None if len(rev_log) > offset + per_page: prev_log_url = reverse( "browse-revision-log", url_args={"sha1_git": sha1_git}, query_params={ "per_page": str(per_page), "offset": str(offset + per_page), "revs_ordering": revs_ordering or None, }, ) next_log_url = None if offset != 0: next_log_url = reverse( "browse-revision-log", url_args={"sha1_git": sha1_git}, query_params={ "per_page": str(per_page), "offset": str(offset - per_page), "revs_ordering": revs_ordering or None, }, ) revision_log_data = format_log_entries(revision_log, per_page) swh_rev_id = str( CoreSWHID(object_type=ObjectType.REVISION, object_id=hash_to_bytes(sha1_git)) ) return render( request, "browse/revision-log.html", { "heading": "Revision history", "swh_object_id": swh_rev_id, "swh_object_name": "Revisions history", "swh_object_metadata": None, "revision_log": revision_log_data, "revs_ordering": revs_ordering, "next_log_url": next_log_url, "prev_log_url": prev_log_url, "breadcrumbs": None, "top_right_link": None, "snapshot_context": snapshot_context, "vault_cooking": None, "show_actions": True, "swhids_info": None, }, ) @browse_route( r"revision/(?P[0-9a-f]+)/", view_name="browse-revision", checksum_args=["sha1_git"], ) def revision_browse(request: HttpRequest, sha1_git: str) -> HttpResponse: """ Django view that produces an HTML display of a revision identified by its id. The url that points to it is :http:get:`/browse/revision/(sha1_git)/`. 
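The ``revision_log_browse`` view above pages through history with ``offset``/``per_page`` slices over a session-cached list of revision ids, so requesting a deeper page extends the existing walk instead of restarting it from the head revision. A toy illustration of the slicing (the walker and session handling are elided; the data is made up):

.. sourcecode:: python

   # hypothetical 250-revision history, newest first
   rev_log = [f"rev-{i}" for i in range(250)]

   def page(offset: int, per_page: int):
       # the same slice the view applies before looking up revision data
       return rev_log[offset : offset + per_page]

   assert page(0, 100) == rev_log[:100]        # first (newest) page
   assert page(100, 100) == rev_log[100:200]   # older page
   assert len(page(200, 100)) == 50            # last, partial page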
""" revision = archive.lookup_revision(sha1_git) origin_info = None snapshot_context = None origin_url = request.GET.get("origin_url") if not origin_url: origin_url = request.GET.get("origin") timestamp = request.GET.get("timestamp") visit_id = int(request.GET.get("visit_id", 0)) snapshot_id = request.GET.get("snapshot_id") if not snapshot_id: snapshot_id = request.GET.get("snapshot") path = request.GET.get("path") dir_id = None dirs, files = [], [] content_data = {} if origin_url: try: snapshot_context = get_snapshot_context( snapshot_id=snapshot_id, origin_url=origin_url, timestamp=timestamp, visit_id=visit_id or None, branch_name=request.GET.get("branch"), release_name=request.GET.get("release"), revision_id=sha1_git, path=path, ) except NotFoundExc as e: raw_rev_url = reverse("browse-revision", url_args={"sha1_git": sha1_git}) error_message = ( "The Software Heritage archive has a revision " "with the hash you provided but the origin " "mentioned in your request appears broken: %s. " "Please check the URL and try again.\n\n" "Nevertheless, you can still browse the revision " "without origin information: %s" % (gen_link(origin_url), gen_link(raw_rev_url)) ) if str(e).startswith("Origin"): raise NotFoundExc(error_message) else: raise e origin_info = snapshot_context["origin_info"] snapshot_id = snapshot_context["snapshot_id"] elif snapshot_id: snapshot_context = get_snapshot_context(snapshot_id) error_info: Dict[str, Any] = {"status_code": 200, "description": None} if path: try: file_info = archive.lookup_directory_with_path(revision["directory"], path) if file_info["type"] == "dir": dir_id = file_info["target"] else: query_string = "sha1_git:" + file_info["target"] content_data = request_content(query_string) except NotFoundExc as e: error_info["status_code"] = 404 error_info["description"] = f"NotFoundExc: {str(e)}" else: dir_id = revision["directory"] if dir_id: path = "" if path is None else (path + "/") dirs, files = get_directory_entries(dir_id) revision_metadata = RevisionMetadata( object_type=ObjectType.REVISION, object_id=sha1_git, revision=sha1_git, author=revision["author"]["fullname"] if revision["author"] else "None", author_url=gen_person_mail_link(revision["author"]) if revision["author"] else "None", committer=revision["committer"]["fullname"] if revision["committer"] else "None", committer_url=gen_person_mail_link(revision["committer"]) if revision["committer"] else "None", committer_date=format_utc_iso_date(revision["committer_date"]), date=format_utc_iso_date(revision["date"]), directory=revision["directory"], merge=revision["merge"], metadata=json.dumps( revision["metadata"], sort_keys=True, indent=4, separators=(",", ": ") ), parents=revision["parents"], synthetic=revision["synthetic"], type=revision["type"], snapshot=snapshot_id, origin_url=origin_url, ) message_lines = ["None"] if revision["message"]: message_lines = revision["message"].split("\n") parents = [] for p in revision["parents"]: parent_url = gen_revision_url(p, snapshot_context) parents.append({"id": p, "url": parent_url}) path_info = gen_path_info(path) query_params = snapshot_context["query_params"] if snapshot_context else {} breadcrumbs = [] breadcrumbs.append( { "name": revision["directory"][:7], "url": reverse( "browse-revision", url_args={"sha1_git": sha1_git}, query_params=query_params, ), } ) for pi in path_info: query_params["path"] = pi["path"] breadcrumbs.append( { "name": pi["name"], "url": reverse( "browse-revision", url_args={"sha1_git": sha1_git}, query_params=query_params, ), } ) 
vault_cooking = { "directory_context": False, "directory_swhid": None, "revision_context": True, "revision_swhid": f"swh:1:rev:{sha1_git}", } swh_objects = [SWHObjectInfo(object_type=ObjectType.REVISION, object_id=sha1_git)] content = None content_size = None filename = None mimetype = None language = None readme_name = None readme_url = None readme_html = None readmes = {} extra_context = dict(revision_metadata) extra_context["path"] = f"/{path}" if path else None if content_data: breadcrumbs[-1]["url"] = None content_size = content_data["length"] mimetype = content_data["mimetype"] if content_data["raw_data"]: content_display_data = prepare_content_for_display( content_data["raw_data"], content_data["mimetype"], path ) content = content_display_data["content_data"] language = content_display_data["language"] mimetype = content_display_data["mimetype"] if path: filename = path_info[-1]["name"] query_params["filename"] = filename filepath = "/".join(pi["name"] for pi in path_info[:-1]) extra_context["path"] = f"/{filepath}/" if filepath else "/" extra_context["filename"] = filename top_right_link = { "url": reverse( "browse-content-raw", url_args={"query_string": query_string}, query_params={"filename": filename}, ), "icon": swh_object_icons["content"], "text": "Raw File", } swh_objects.append( SWHObjectInfo(object_type=ObjectType.CONTENT, object_id=file_info["target"]) ) else: for d in dirs: if d["type"] == "rev": d["url"] = reverse( "browse-revision", url_args={"sha1_git": d["target"]} ) else: query_params["path"] = path + d["name"] d["url"] = reverse( "browse-revision", url_args={"sha1_git": sha1_git}, query_params=query_params, ) for f in files: query_params["path"] = path + f["name"] f["url"] = reverse( "browse-revision", url_args={"sha1_git": sha1_git}, query_params=query_params, ) if f["name"].lower().startswith("readme"): readmes[f["name"]] = f["checksums"]["sha1"] readme_name, readme_url, readme_html = get_readme_to_display(readmes) top_right_link = { "url": get_revision_log_url(sha1_git, snapshot_context), "icon": swh_object_icons["revisions history"], "text": "History", } vault_cooking["directory_context"] = True vault_cooking["directory_swhid"] = f"swh:1:dir:{dir_id}" swh_objects.append( SWHObjectInfo(object_type=ObjectType.DIRECTORY, object_id=dir_id) ) query_params.pop("path", None) diff_revision_url = reverse( "diff-revision", url_args={"sha1_git": sha1_git}, query_params=query_params, ) if snapshot_id: swh_objects.append( SWHObjectInfo(object_type=ObjectType.SNAPSHOT, object_id=snapshot_id) ) swhids_info = get_swhids_info(swh_objects, snapshot_context, extra_context) heading = "Revision - %s - %s" % ( sha1_git[:7], textwrap.shorten(message_lines[0], width=70), ) if snapshot_context: context_found = "snapshot: %s" % snapshot_context["snapshot_id"] if origin_info: context_found = "origin: %s" % origin_info["url"] heading += " - %s" % context_found return render( request, "browse/revision.html", { "heading": heading, "swh_object_id": swhids_info[0]["swhid"], "swh_object_name": "Revision", "swh_object_metadata": revision_metadata, "message_header": message_lines[0], "message_body": "\n".join(message_lines[1:]), "parents": parents, "snapshot_context": snapshot_context, "dirs": dirs, "files": files, "content": content, "content_size": content_size, "max_content_size": content_display_max_size, "filename": filename, "encoding": content_data.get("encoding"), "mimetype": mimetype, "language": language, "readme_name": readme_name, "readme_url": readme_url, "readme_html": readme_html, 
"breadcrumbs": breadcrumbs, "top_right_link": top_right_link, "vault_cooking": vault_cooking, "diff_revision_url": diff_revision_url, "show_actions": True, "swhids_info": swhids_info, "error_code": error_info["status_code"], "error_message": http_status_code_message.get(error_info["status_code"]), "error_description": error_info["description"], }, status=error_info["status_code"], ) diff --git a/swh/web/browse/views/snapshot.py b/swh/web/browse/views/snapshot.py index c02258f4..2e03568b 100644 --- a/swh/web/browse/views/snapshot.py +++ b/swh/web/browse/views/snapshot.py @@ -1,237 +1,237 @@ # Copyright (C) 2018-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from typing import Optional from django.http import HttpRequest, HttpResponse from django.shortcuts import redirect from swh.web.browse.browseurls import browse_route from swh.web.browse.snapshot_context import ( browse_snapshot_branches, browse_snapshot_directory, browse_snapshot_log, browse_snapshot_releases, get_snapshot_context, ) -from swh.web.common.exc import BadInputExc -from swh.web.common.utils import redirect_to_new_route, reverse +from swh.web.utils import redirect_to_new_route, reverse +from swh.web.utils.exc import BadInputExc def get_snapshot_from_request(request: HttpRequest) -> str: snapshot = request.GET.get("snapshot") if snapshot: return snapshot if request.GET.get("origin_url") is None: raise BadInputExc("An origin URL must be provided as a query parameter.") return get_snapshot_context( origin_url=request.GET.get("origin_url"), timestamp=request.GET.get("timestamp") )["snapshot_id"] @browse_route( r"snapshot/(?P[0-9a-f]+)/", view_name="browse-snapshot", checksum_args=["snapshot_id"], ) def snapshot_browse(request: HttpRequest, snapshot_id: str) -> HttpResponse: """Django view for browsing the content of a snapshot. The url that points to it is :http:get:`/browse/snapshot/(snapshot_id)/` """ browse_snapshot_url = reverse( "browse-snapshot-directory", url_args={"snapshot_id": snapshot_id}, query_params=request.GET.dict(), ) return redirect(browse_snapshot_url) @browse_route( r"snapshot/(?P[0-9a-f]+)/directory/", view_name="browse-snapshot-directory", checksum_args=["snapshot_id"], ) def snapshot_directory_browse(request: HttpRequest, snapshot_id: str) -> HttpResponse: """Django view for browsing the content of a directory collected in a snapshot. The URL that points to it is :http:get:`/browse/snapshot/(snapshot_id)/directory/` """ return browse_snapshot_directory( request, snapshot_id=snapshot_id, path=request.GET.get("path"), origin_url=request.GET.get("origin_url"), ) @browse_route( r"snapshot/(?P[0-9a-f]+)/directory/(?P.+)/", view_name="browse-snapshot-directory-legacy", checksum_args=["snapshot_id"], ) def snapshot_directory_browse_legacy( request: HttpRequest, snapshot_id: str, path: Optional[str] = None ) -> HttpResponse: """Django view for browsing the content of a directory collected in a snapshot. 
The URL that points to it is :http:get:`/browse/snapshot/(snapshot_id)/directory/(path)/` """ origin_url = request.GET.get("origin_url", None) if not origin_url: origin_url = request.GET.get("origin", None) return browse_snapshot_directory( request, snapshot_id=snapshot_id, path=path, origin_url=origin_url ) @browse_route( r"snapshot/(?P[0-9a-f]+)/content/", view_name="browse-snapshot-content", checksum_args=["snapshot_id"], ) def snapshot_content_browse(request: HttpRequest, snapshot_id: str) -> HttpResponse: """ This route is deprecated; use http:get:`/browse/content` instead Django view that produces an HTML display of a content collected in a snapshot. The url that points to it is :http:get:`/browse/snapshot/(snapshot_id)/content/` """ return redirect_to_new_route(request, "browse-content") @browse_route( r"snapshot/(?P[0-9a-f]+)/content/(?P.+)/", view_name="browse-snapshot-content-legacy", checksum_args=["snapshot_id"], ) def snapshot_content_browse_legacy( request: HttpRequest, snapshot_id: str, path: str ) -> HttpResponse: """ This route is deprecated; use http:get:`/browse/content` instead Django view that produces an HTML display of a content collected in a snapshot. The url that points to it is :http:get:`/browse/snapshot/(snapshot_id)/content/(path)/` """ return redirect_to_new_route(request, "browse-content") @browse_route( r"snapshot/(?P[0-9a-f]+)/log/", r"snapshot/log/", view_name="browse-snapshot-log", checksum_args=["snapshot_id"], ) def snapshot_log_browse( request: HttpRequest, snapshot_id: Optional[str] = None ) -> HttpResponse: """Django view that produces an HTML display of revisions history (aka the commit log) collected in a snapshot. The URLs that point to it are :http:get:`/browse/snapshot/(snapshot_id)/log/` and :http:get:`/browse/snapshot/log/` """ if snapshot_id is None: # This case happens when redirected from /origin/log snapshot_id = get_snapshot_from_request(request) # Redirect to the same route with snapshot_id return redirect( reverse( "browse-snapshot-log", url_args={"snapshot_id": snapshot_id}, query_params=request.GET, ), ) return browse_snapshot_log( request, snapshot_id=snapshot_id, origin_url=request.GET.get("origin_url"), timestamp=request.GET.get("timestamp"), ) @browse_route( r"snapshot/(?P[0-9a-f]+)/branches/", r"snapshot/branches/", view_name="browse-snapshot-branches", checksum_args=["snapshot_id"], ) def snapshot_branches_browse( request: HttpRequest, snapshot_id: Optional[str] = None ) -> HttpResponse: """Django view that produces an HTML display of the list of branches collected in a snapshot. 
The URLs that point to it are :http:get:`/browse/snapshot/(snapshot_id)/branches/` and :http:get:`/browse/snapshot/branches/` """ if snapshot_id is None: # This case happens when redirected from /origin/branches snapshot_id = get_snapshot_from_request(request) # Redirect to the same route with the newest snapshot_id # for the given origin return redirect( reverse( "browse-snapshot-branches", url_args={"snapshot_id": snapshot_id}, query_params=request.GET, ), ) return browse_snapshot_branches( request, snapshot_id=snapshot_id, origin_url=request.GET.get("origin_url"), timestamp=request.GET.get("timestamp"), branch_name_include=request.GET.get("name_include"), ) @browse_route( r"snapshot/(?P[0-9a-f]+)/releases/", r"snapshot/releases/", view_name="browse-snapshot-releases", checksum_args=["snapshot_id"], ) def snapshot_releases_browse( request: HttpRequest, snapshot_id: Optional[str] = None ) -> HttpResponse: """Django view that produces an HTML display of the list of releases collected in a snapshot. The URLs that point to it are :http:get:`/browse/snapshot/(snapshot_id)/releases/` :http:get:`/browse/snapshot/releases/` """ if snapshot_id is None: # This case happens when redirected from /origin/releases snapshot_id = get_snapshot_from_request(request) # Redirect to the same route with the newest snapshot_id # for the given origin return redirect( reverse( "browse-snapshot-releases", url_args={"snapshot_id": snapshot_id}, query_params=request.GET, ), ) return browse_snapshot_releases( request, snapshot_id=snapshot_id, origin_url=request.GET.get("origin_url"), timestamp=request.GET.get("timestamp"), release_name_include=request.GET.get("name_include"), ) diff --git a/swh/web/common/utils.py b/swh/web/common/utils.py deleted file mode 100644 index 828900f7..00000000 --- a/swh/web/common/utils.py +++ /dev/null @@ -1,523 +0,0 @@ -# Copyright (C) 2017-2022 The Software Heritage developers -# See the AUTHORS file at the top-level directory of this distribution -# License: GNU Affero General Public License version 3, or any later version -# See top-level LICENSE file for more information - -from datetime import datetime, timezone -import functools -import os -import re -from typing import Any, Callable, Dict, List, Mapping, Optional -import urllib.parse - -from bs4 import BeautifulSoup -from docutils.core import publish_parts -import docutils.parsers.rst -import docutils.utils -from docutils.writers.html5_polyglot import HTMLTranslator, Writer -from iso8601 import ParseError, parse_date -from pkg_resources import get_distribution -from prometheus_client.registry import CollectorRegistry -import requests -from requests.auth import HTTPBasicAuth - -from django.conf import settings -from django.core.cache import cache -from django.core.cache.backends.base import DEFAULT_TIMEOUT -from django.http import HttpRequest, QueryDict -from django.shortcuts import redirect -from django.urls import resolve -from django.urls import reverse as django_reverse - -from swh.web.auth.utils import ( - ADD_FORGE_MODERATOR_PERMISSION, - ADMIN_LIST_DEPOSIT_PERMISSION, - MAILMAP_ADMIN_PERMISSION, -) -from swh.web.common.exc import BadInputExc, sentry_capture_exception -from swh.web.config import SWH_WEB_SERVER_NAME, get_config, search - -SWH_WEB_METRICS_REGISTRY = CollectorRegistry(auto_describe=True) - -SWHID_RE = "swh:1:[a-z]{3}:[0-9a-z]{40}" - - -swh_object_icons = { - "alias": "mdi mdi-star", - "branch": "mdi mdi-source-branch", - "branches": "mdi mdi-source-branch", - "content": "mdi mdi-file-document", - "cnt": "mdi 
mdi-file-document", - "directory": "mdi mdi-folder", - "dir": "mdi mdi-folder", - "origin": "mdi mdi-source-repository", - "ori": "mdi mdi-source-repository", - "person": "mdi mdi-account", - "revisions history": "mdi mdi-history", - "release": "mdi mdi-tag", - "rel": "mdi mdi-tag", - "releases": "mdi mdi-tag", - "revision": "mdi mdi-rotate-90 mdi-source-commit", - "rev": "mdi mdi-rotate-90 mdi-source-commit", - "snapshot": "mdi mdi-camera", - "snp": "mdi mdi-camera", - "visits": "mdi mdi-calendar-month", -} - - -def reverse( - viewname: str, - url_args: Optional[Dict[str, Any]] = None, - query_params: Optional[Mapping[str, Optional[str]]] = None, - current_app: Optional[str] = None, - urlconf: Optional[str] = None, - request: Optional[HttpRequest] = None, -) -> str: - """An override of django reverse function supporting query parameters. - - Args: - viewname: the name of the django view from which to compute a url - url_args: dictionary of url arguments indexed by their names - query_params: dictionary of query parameters to append to the - reversed url - current_app: the name of the django app tighten to the view - urlconf: url configuration module - request: build an absolute URI if provided - - Returns: - str: the url of the requested view with processed arguments and - query parameters - """ - - if url_args: - url_args = {k: v for k, v in url_args.items() if v is not None} - - url = django_reverse( - viewname, urlconf=urlconf, kwargs=url_args, current_app=current_app - ) - - params: Dict[str, str] = {} - if query_params: - params = {k: v for k, v in query_params.items() if v is not None} - - if params: - query_dict = QueryDict("", mutable=True) - query_dict.update(dict(sorted(params.items()))) - url += "?" + query_dict.urlencode(safe="/;:") - - if request is not None: - url = request.build_absolute_uri(url) - - return url - - -def datetime_to_utc(date): - """Returns datetime in UTC without timezone info - - Args: - date (datetime.datetime): input datetime with timezone info - - Returns: - datetime.datetime: datetime in UTC without timezone info - """ - if date.tzinfo and date.tzinfo != timezone.utc: - return date.astimezone(tz=timezone.utc) - else: - return date - - -def parse_iso8601_date_to_utc(iso_date: str) -> datetime: - """Given an ISO 8601 datetime string, parse the result as UTC datetime. - - Returns: - a timezone-aware datetime representing the parsed date - - Raises: - swh.web.common.exc.BadInputExc: provided date does not respect ISO 8601 format - - Samples: - - 2016-01-12 - - 2016-01-12T09:19:12+0100 - - 2007-01-14T20:34:22Z - - """ - try: - date = parse_date(iso_date) - return datetime_to_utc(date) - except ParseError as e: - raise BadInputExc(e) - - -def shorten_path(path): - """Shorten the given path: for each hash present, only return the first - 8 characters followed by an ellipsis""" - - sha256_re = r"([0-9a-f]{8})[0-9a-z]{56}" - sha1_re = r"([0-9a-f]{8})[0-9a-f]{32}" - - ret = re.sub(sha256_re, r"\1...", path) - return re.sub(sha1_re, r"\1...", ret) - - -def format_utc_iso_date(iso_date, fmt="%d %B %Y, %H:%M:%S UTC"): - """Turns a string representation of an ISO 8601 datetime string - to UTC and format it into a more human readable one. - - For instance, from the following input - string: '2017-05-04T13:27:13+02:00' the following one - is returned: '04 May 2017, 11:27 UTC'. 
- Custom format string may also be provided - as parameter - - Args: - iso_date (str): a string representation of an ISO 8601 date - fmt (str): optional date formatting string - - Returns: - str: a formatted string representation of the input iso date - """ - if not iso_date: - return iso_date - date = parse_iso8601_date_to_utc(iso_date) - return date.strftime(fmt) - - -def gen_path_info(path): - """Function to generate path data navigation for use - with a breadcrumb in the swh web ui. - - For instance, from a path /folder1/folder2/folder3, - it returns the following list:: - - [{'name': 'folder1', 'path': 'folder1'}, - {'name': 'folder2', 'path': 'folder1/folder2'}, - {'name': 'folder3', 'path': 'folder1/folder2/folder3'}] - - Args: - path: a filesystem path - - Returns: - list: a list of path data for navigation as illustrated above. - - """ - path_info = [] - if path: - sub_paths = path.strip("/").split("/") - path_from_root = "" - for p in sub_paths: - path_from_root += "/" + p - path_info.append({"name": p, "path": path_from_root.strip("/")}) - return path_info - - -def parse_rst(text, report_level=2): - """ - Parse a reStructuredText string with docutils. - - Args: - text (str): string with reStructuredText markups in it - report_level (int): level of docutils report messages to print - (1 info 2 warning 3 error 4 severe 5 none) - - Returns: - docutils.nodes.document: a parsed docutils document - """ - parser = docutils.parsers.rst.Parser() - components = (docutils.parsers.rst.Parser,) - settings = docutils.frontend.OptionParser( - components=components - ).get_default_values() - settings.report_level = report_level - document = docutils.utils.new_document("rst-doc", settings=settings) - parser.parse(text, document) - return document - - -def get_client_ip(request): - """ - Return the client IP address from an incoming HTTP request. - - Args: - request (django.http.HttpRequest): the incoming HTTP request - - Returns: - str: The client IP address - """ - x_forwarded_for = request.META.get("HTTP_X_FORWARDED_FOR") - if x_forwarded_for: - ip = x_forwarded_for.split(",")[0] - else: - ip = request.META.get("REMOTE_ADDR") - return ip - - -def is_swh_web_development(request: HttpRequest) -> bool: - """Indicate if we are running a development version of swh-web.""" - site_base_url = request.build_absolute_uri("/") - return any( - host in site_base_url for host in ("localhost", "127.0.0.1", "testserver") - ) - - -def is_swh_web_staging(request: HttpRequest) -> bool: - """Indicate if we are running a staging version of swh-web.""" - config = get_config() - site_base_url = request.build_absolute_uri("/") - return any( - server_name in site_base_url for server_name in config["staging_server_names"] - ) - - -def is_swh_web_production(request: HttpRequest) -> bool: - """Indicate if we are running the public production version of swh-web.""" - return SWH_WEB_SERVER_NAME in request.build_absolute_uri("/") - - -browsers_supported_image_mimes = set( - [ - "image/gif", - "image/png", - "image/jpeg", - "image/bmp", - "image/webp", - "image/svg", - "image/svg+xml", - ] -) - - -def context_processor(request): - """ - Django context processor used to inject variables - in all swh-web templates. - """ - config = get_config() - if ( - hasattr(request, "user") - and request.user.is_authenticated - and not hasattr(request.user, "backend") - ): - # To avoid django.template.base.VariableDoesNotExist errors - # when rendering templates when standard Django user is logged in. 
- request.user.backend = "django.contrib.auth.backends.ModelBackend" - - return { - "swh_object_icons": swh_object_icons, - "available_languages": None, - "swh_client_config": config["client_config"], - "oidc_enabled": bool(config["keycloak"]["server_url"]), - "browsers_supported_image_mimes": browsers_supported_image_mimes, - "keycloak": config["keycloak"], - "site_base_url": request.build_absolute_uri("/"), - "DJANGO_SETTINGS_MODULE": os.environ["DJANGO_SETTINGS_MODULE"], - "status": config["status"], - "swh_web_dev": is_swh_web_development(request), - "swh_web_staging": is_swh_web_staging(request), - "swh_web_prod": is_swh_web_production(request), - "swh_web_version": get_distribution("swh.web").version, - "iframe_mode": False, - "ADMIN_LIST_DEPOSIT_PERMISSION": ADMIN_LIST_DEPOSIT_PERMISSION, - "ADD_FORGE_MODERATOR_PERMISSION": ADD_FORGE_MODERATOR_PERMISSION, - "MAILMAP_ADMIN_PERMISSION": MAILMAP_ADMIN_PERMISSION, - "lang": "en", - "sidebar_state": request.COOKIES.get("sidebar-state", "expanded"), - "SWH_DJANGO_APPS": settings.SWH_DJANGO_APPS, - } - - -def resolve_branch_alias( - snapshot: Dict[str, Any], branch: Optional[Dict[str, Any]] -) -> Optional[Dict[str, Any]]: - """ - Resolve branch alias in snapshot content. - - Args: - snapshot: a full snapshot content - branch: a branch alias contained in the snapshot - Returns: - The real snapshot branch that got aliased. - """ - while branch and branch["target_type"] == "alias": - if branch["target"] in snapshot["branches"]: - branch = snapshot["branches"][branch["target"]] - else: - from swh.web.common import archive - - snp = archive.lookup_snapshot( - snapshot["id"], branches_from=branch["target"], branches_count=1 - ) - if snp and branch["target"] in snp["branches"]: - branch = snp["branches"][branch["target"]] - else: - branch = None - return branch - - -class _NoHeaderHTMLTranslator(HTMLTranslator): - """ - Docutils translator subclass to customize the generation of HTML - from reST-formatted docstrings - """ - - def __init__(self, document): - super().__init__(document) - self.body_prefix = [] - self.body_suffix = [] - - -_HTML_WRITER = Writer() -_HTML_WRITER.translator_class = _NoHeaderHTMLTranslator - - -def rst_to_html(rst: str) -> str: - """ - Convert reStructuredText document into HTML. - - Args: - rst: A string containing a reStructuredText document - - Returns: - Body content of the produced HTML conversion. - - """ - settings = { - "initial_header_level": 2, - "halt_level": 4, - "traceback": True, - "file_insertion_enabled": False, - "raw_enabled": False, - } - pp = publish_parts(rst, writer=_HTML_WRITER, settings_overrides=settings) - return f'
{pp["html_body"]}
' - - -def prettify_html(html: str) -> str: - """ - Prettify an HTML document. - - Args: - html: Input HTML document - - Returns: - The prettified HTML document - """ - return BeautifulSoup(html, "lxml").prettify() - - -def django_cache( - timeout: int = DEFAULT_TIMEOUT, - catch_exception: bool = False, - exception_return_value: Any = None, - invalidate_cache_pred: Callable[[Any], bool] = lambda val: False, -): - """Decorator to put the result of a function call in Django cache, - subsequent calls will directly return the cached value. - - Args: - timeout: The number of seconds value will be hold in cache - catch_exception: If :const:`True`, any thrown exception by - the decorated function will be caught and not reraised - exception_return_value: The value to return if previous - parameter is set to :const:`True` - invalidate_cache_pred: A predicate function enabling to - invalidate the cache under certain conditions, decorated - function will then be called again - - Returns: - The returned value of the decorated function for the specified - parameters - - """ - - def inner(func): - @functools.wraps(func) - def wrapper(*args, **kwargs): - func_args = args + (0,) + tuple(sorted(kwargs.items())) - cache_key = str(hash((func.__module__, func.__name__) + func_args)) - ret = cache.get(cache_key) - if ret is None or invalidate_cache_pred(ret): - try: - ret = func(*args, **kwargs) - except Exception as exc: - if catch_exception: - sentry_capture_exception(exc) - return exception_return_value - else: - raise - else: - cache.set(cache_key, ret, timeout=timeout) - return ret - - return wrapper - - return inner - - -def _deposits_list_url( - deposits_list_base_url: str, page_size: int, username: Optional[str] -) -> str: - params = {"page_size": str(page_size)} - if username is not None: - params["username"] = username - return f"{deposits_list_base_url}?{urllib.parse.urlencode(params)}" - - -def get_deposits_list(username: Optional[str] = None) -> List[Dict[str, Any]]: - """Return the list of software deposits using swh-deposit API""" - config = get_config()["deposit"] - private_api_url = config["private_api_url"].rstrip("/") + "/" - deposits_list_base_url = private_api_url + "deposits" - deposits_list_auth = HTTPBasicAuth( - config["private_api_user"], config["private_api_password"] - ) - - deposits_list_url = _deposits_list_url( - deposits_list_base_url, page_size=1, username=username - ) - - nb_deposits = requests.get( - deposits_list_url, auth=deposits_list_auth, timeout=30 - ).json()["count"] - - @django_cache(invalidate_cache_pred=lambda data: data["count"] != nb_deposits) - def _get_deposits_data(): - deposits_list_url = _deposits_list_url( - deposits_list_base_url, page_size=nb_deposits, username=username - ) - return requests.get( - deposits_list_url, - auth=deposits_list_auth, - timeout=30, - ).json() - - deposits_data = _get_deposits_data() - - return deposits_data["results"] - - -_origin_visit_types_cache_timeout = 24 * 60 * 60 # 24 hours - - -@django_cache( - timeout=_origin_visit_types_cache_timeout, - catch_exception=True, - exception_return_value=[], -) -def origin_visit_types() -> List[str]: - """Return the exhaustive list of visit types for origins - ingested into the archive. - """ - return sorted(search().visit_types_count().keys()) - - -def redirect_to_new_route(request, new_route, permanent=True): - """Redirect a request to another route with url args and query parameters - eg: /origin//log?path=test can be redirected as - /log?url=&path=test. 
This can be used to deprecate routes - """ - request_path = resolve(request.path_info) - args = {**request_path.kwargs, **request.GET.dict()} - return redirect( - reverse(new_route, query_params=args), - permanent=permanent, - ) diff --git a/swh/web/inbound_email/management/commands/process_inbound_email.py b/swh/web/inbound_email/management/commands/process_inbound_email.py index fd7445c3..386b56c2 100644 --- a/swh/web/inbound_email/management/commands/process_inbound_email.py +++ b/swh/web/inbound_email/management/commands/process_inbound_email.py @@ -1,72 +1,72 @@ # Copyright (C) 2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import email import email.message import email.policy import logging import sys from typing import Callable from django.core.management.base import BaseCommand -from swh.web.common.exc import sentry_capture_exception from swh.web.inbound_email import signals +from swh.web.utils.exc import sentry_capture_exception logger = logging.getLogger(__name__) class Command(BaseCommand): help = "Process a new inbound email" def handle(self, *args, **options): raw_message = sys.stdin.buffer.read() try: message = email.message_from_bytes(raw_message, policy=email.policy.default) except Exception as exc: sentry_capture_exception(exc) self.handle_failed_message(raw_message) # XXX make sure having logging doesn't make postfix unhappy logger.exception("Could not convert email from bytes") return responses = signals.email_received.send_robust( sender=self.__class__, message=message ) handled = False for receiver, response in responses: if isinstance(response, Exception): sentry_capture_exception(response) self.handle_failing_receiver(message, receiver) logger.error( "Receiver produced the following exception", exc_info=response ) elif response is signals.EmailProcessingStatus.FAILED: self.handle_failing_receiver(message, receiver) elif response is signals.EmailProcessingStatus.PROCESSED: handled = True if not handled: self.handle_unhandled_message(message) def handle_failed_message(self, raw_message: bytes): # TODO: forward email as attachment for inspection logger.error("Failed message: %s", raw_message.decode("ascii", "replace")) def handle_failing_receiver( self, message: email.message.EmailMessage, receiver: Callable ): # TODO: forward email for inspection logger.error( "Failed receiver %s:%s; message: %s", receiver.__module__, receiver.__qualname__, str(message), ) def handle_unhandled_message(self, message: email.message.EmailMessage): # TODO: pass email through to a fallback alias? 
logger.error("Unhandled message: %s", str(message)) diff --git a/swh/web/misc/badges.py b/swh/web/misc/badges.py index 22dfe943..2e6006da 100644 --- a/swh/web/misc/badges.py +++ b/swh/web/misc/badges.py @@ -1,186 +1,185 @@ # Copyright (C) 2019-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from base64 import b64encode from typing import Optional, cast from pybadges import badge from django.contrib.staticfiles import finders from django.http import HttpRequest, HttpResponse from django.urls import re_path as url from swh.model.exceptions import ValidationError from swh.model.hashutil import hash_to_bytes, hash_to_hex from swh.model.swhids import CoreSWHID, ObjectType, QualifiedSWHID -from swh.web.common import archive -from swh.web.common.exc import BadInputExc, NotFoundExc -from swh.web.common.identifiers import parse_object_type, resolve_swhid -from swh.web.common.utils import reverse +from swh.web.utils import archive, reverse +from swh.web.utils.exc import BadInputExc, NotFoundExc +from swh.web.utils.identifiers import parse_object_type, resolve_swhid _orange = "#f36a24" _blue = "#0172b2" _red = "#cd5741" _swh_logo_data = None _badge_config = { "content": { "color": _blue, "title": "Archived source file", }, "directory": { "color": _blue, "title": "Archived source tree", }, "origin": { "color": _orange, "title": "Archived software repository", }, "release": { "color": _blue, "title": "Archived software release", }, "revision": { "color": _blue, "title": "Archived commit", }, "snapshot": { "color": _blue, "title": "Archived software repository snapshot", }, "error": {"color": _red, "title": "An error occurred when generating the badge"}, } def _get_logo_data() -> str: """ Get data-URI for Software Heritage SVG logo to embed it in the generated badges. """ global _swh_logo_data if _swh_logo_data is None: swh_logo_path = cast(str, finders.find("img/swh-logo-white.svg")) with open(swh_logo_path, "rb") as swh_logo_file: _swh_logo_data = "data:image/svg+xml;base64,%s" % b64encode( swh_logo_file.read() ).decode("ascii") return _swh_logo_data def _swh_badge( request: HttpRequest, object_type: str, object_id: str, object_swhid: Optional[str] = "", ) -> HttpResponse: """ Generate a Software Heritage badge for a given object type and id. Args: request: input http request object_type: The type of swh object to generate a badge for, either *content*, *directory*, *revision*, *release*, *origin* or *snapshot* object_id: The id of the swh object, either an url for origin type or a *sha1* for other object types object_swhid: If provided, the object SWHID will not be recomputed Returns: HTTP response with content type *image/svg+xml* containing the SVG badge data. If the provided parameters are invalid, HTTP 400 status code will be returned. If the object can not be found in the archive, HTTP 404 status code will be returned. 
""" left_text = "error" whole_link = None try: if object_type == "origin": archive.lookup_origin({"url": object_id}) right_text = "repository" whole_link = reverse( "browse-origin", query_params={"origin_url": object_id} ) else: # when SWHID is provided, object type and id will be parsed # from it if object_swhid: parsed_swhid = QualifiedSWHID.from_string(object_swhid) parsed_object_type = parsed_swhid.object_type object_id = hash_to_hex(parsed_swhid.object_id) swh_object = archive.lookup_object(parsed_swhid.object_type, object_id) # remove SWHID qualified if any for badge text right_text = str( CoreSWHID( object_type=parsed_swhid.object_type, object_id=parsed_swhid.object_id, ) ) object_type = parsed_swhid.object_type.name.lower() else: parsed_object_type = parse_object_type(object_type) right_text = str( CoreSWHID( object_type=parsed_object_type, object_id=hash_to_bytes(object_id), ) ) swh_object = archive.lookup_object(parsed_object_type, object_id) whole_link = resolve_swhid(str(right_text))["browse_url"] # use release name for badge text if parsed_object_type == ObjectType.RELEASE: right_text = "release %s" % swh_object["name"] left_text = "archived" except (BadInputExc, ValidationError): right_text = f'invalid {object_type if object_type else "object"} id' object_type = "error" except NotFoundExc: right_text = f'{object_type if object_type else "object"} not found' object_type = "error" badge_data = badge( left_text=left_text, right_text=right_text, right_color=_badge_config[object_type]["color"], whole_link=request.build_absolute_uri(whole_link), whole_title=_badge_config[object_type]["title"], logo=_get_logo_data(), embed_logo=True, ) return HttpResponse(badge_data, content_type="image/svg+xml") def _swh_badge_swhid(request: HttpRequest, object_swhid: str) -> HttpResponse: """ Generate a Software Heritage badge for a given object SWHID. Args: request (django.http.HttpRequest): input http request object_swhid (str): a SWHID of an archived object Returns: django.http.HttpResponse: An http response with content type *image/svg+xml* containing the SVG badge data. If any error occurs, a status code of 400 will be returned. 
""" return _swh_badge(request, "", "", object_swhid) urlpatterns = [ url( r"^badge/(?P[a-z]+)/(?P.+)/$", _swh_badge, name="swh-badge", ), url( r"^badge/(?Pswh:[0-9]+:[a-z]+:[0-9a-f]+.*)/$", _swh_badge_swhid, name="swh-badge-swhid", ), ] diff --git a/swh/web/misc/coverage.py b/swh/web/misc/coverage.py index 82c92179..9c28d0a7 100644 --- a/swh/web/misc/coverage.py +++ b/swh/web/misc/coverage.py @@ -1,501 +1,501 @@ # Copyright (C) 2018-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from collections import Counter, defaultdict from typing import Any, Dict, List, Tuple from urllib.parse import urlparse from django.http.request import HttpRequest from django.http.response import HttpResponse from django.shortcuts import render from django.urls import re_path as url from django.views.decorators.cache import never_cache from django.views.decorators.clickjacking import xframe_options_exempt from swh.scheduler.model import SchedulerMetrics -from swh.web.common import archive -from swh.web.common.utils import ( +from swh.web.config import scheduler +from swh.web.utils import ( + archive, django_cache, get_deposits_list, is_swh_web_development, is_swh_web_production, reverse, ) -from swh.web.config import scheduler _swh_arch_overview_doc = ( "https://docs.softwareheritage.org/devel/architecture/overview.html" ) # Current coverage list of the archive in a high level overview fashion, # categorized as follow: # - listed origins: origins discovered using a swh lister # - legacy: origins where public hosting service has closed # - deposited: origins coming from swh-deposit # # TODO: Store that list in a database table somewhere (swh-scheduler, swh-storage ?) # and retrieve it dynamically listed_origins: Dict[str, Any] = { "info": ( "These software origins get continuously discovered and archived using " f'the listers implemented by Software Heritage.' 
), "origins": [ { "type": "bitbucket", "info_url": "https://bitbucket.org", "info": "public repositories from Bitbucket", "search_pattern": { "default": "https://bitbucket.org/", }, }, { "type": "cgit", "info_url": "https://git.zx2c4.com/cgit/about", "info": "public repositories from cgit instances", "search_pattern": { "default": "cgit", }, }, { "type": "CRAN", "info_url": "https://cran.r-project.org", "info": "source packages from The Comprehensive R Archive Network", "search_pattern": { "default": "https://cran.r-project.org/", }, }, { "type": "debian", "info_url": "https://www.debian.org", "info": "source packages from Debian and Debian-based distributions", "search_pattern": { "default": "deb://", }, }, { "type": "gitea", "info_url": "https://gitea.io", "info": "public repositories from Gitea instances", "search_pattern": { "default": "gitea", }, }, { "type": "github", "info_url": "https://github.com", "info": "public repositories from GitHub", "search_pattern": { "default": "https://github.com/", }, }, { "type": "gitlab", "info_url": "https://gitlab.com", "info": "public repositories from multiple GitLab instances", "search_pattern": { "default": "gitlab", }, }, { "type": "guix", "info_url": "https://guix.gnu.org", "info": "source code tarballs used to build the Guix package collection", "visit_types": ["nixguix"], "search_pattern": { "default": "https://guix.gnu.org/sources.json", }, }, { "type": "GNU", "info_url": "https://www.gnu.org", "info": "releases from the GNU project (as of August 2015)", "search_pattern": { "default": "gnu", }, }, { "type": "heptapod", "info_url": "https://heptapod.net/", "info": "public repositories from multiple Heptapod instances", "search_pattern": { "default": "heptapod", }, }, { "type": "launchpad", "info_url": "https://launchpad.net", "logo": "img/logos/launchpad.png", "info": "public repositories from Launchpad", "search_pattern": { "default": "launchpad.net/", }, }, { "type": "maven", "info_url": "https://maven.apache.org/", "info": "java source packages from maven repositories", "search_pattern": { "default": "maven", "cvs": "", "git": "", "hg": "", "svn": "", }, }, { "type": "nixos", "info_url": "https://nixos.org", "info": "source code tarballs used to build the Nix package collection", "visit_types": ["nixguix"], "search_pattern": { "default": ( "https://nix-community.github.io/nixpkgs-swh/sources-unstable.json" ) }, }, { "type": "npm", "info_url": "https://www.npmjs.com", "info": "public packages from the package registry for javascript", "search_pattern": { "default": "https://www.npmjs.com", }, }, { "type": "opam", "info_url": "https://opam.ocaml.org/", "info": "public packages from the source-based package manager for OCaml", "search_pattern": { "default": "opam+https://", }, }, { "type": "Packagist", "info_url": "https://packagist.org/", "info": "source code repositories referenced by The PHP Package Repository", "search_pattern": { "default": "", }, }, { "type": "phabricator", "info_url": "https://www.phacility.com/phabricator", "info": "public repositories from multiple Phabricator instances", "search_pattern": { "default": "phabricator", }, }, { "type": "pypi", "info_url": "https://pypi.org", "info": "source packages from the Python Package Index", "search_pattern": { "default": "https://pypi.org", }, }, { "type": "sourceforge", "info_url": "https://sourceforge.net", "info": "public repositories from SourceForge", "search_pattern": { "default": "code.sf.net", "bzr": "bzr.sourceforge.net", "cvs": "cvs.sourceforge.net", }, }, ], } 
legacy_origins: Dict[str, Any] = { "info": ( "Discontinued hosting services. Those origins have been archived " "by Software Heritage." ), "origins": [ { "type": "gitorious", "info_url": "https://en.wikipedia.org/wiki/Gitorious", "info": ( "public repositories from the former Gitorious code hosting service" ), "visit_types": ["git"], "search_pattern": "https://gitorious.org", "count": "122,014", }, { "type": "googlecode", "info_url": "https://code.google.com/archive", "info": ( "public repositories from the former Google Code project " "hosting service" ), "visit_types": ["git", "hg", "svn"], "search_pattern": "googlecode.com", "count": "790,026", }, { "type": "bitbucket", "info_url": "https://bitbucket.org", "info": "public repositories from Bitbucket", "search_pattern": "https://bitbucket.org/", "visit_types": ["hg"], "count": "336,795", }, ], } deposited_origins: Dict[str, Any] = { "info": ( "These origins are directly pushed into the archive by trusted partners " f'using the deposit service of Software Heritage.' ), "origins": [ { "type": "elife", "info_url": "https://elifesciences.org", "info": ( "research software source code associated to the articles " "eLife publishes" ), "search_pattern": "elife.stencila.io", "visit_types": ["deposit"], }, { "type": "hal", "info_url": "https://hal.archives-ouvertes.fr", "info": "scientific software source code deposited in the open archive HAL", "visit_types": ["deposit"], "search_pattern": "hal.archives-ouvertes.fr", }, { "type": "ipol", "info_url": "https://www.ipol.im", "info": "software artifacts associated to the articles IPOL publishes", "visit_types": ["deposit"], "search_pattern": "doi.org/10.5201", }, ], } _cache_timeout = 60 * 60 # one hour def _get_listers_metrics( cache_metrics: bool = False, ) -> Dict[str, List[Tuple[str, SchedulerMetrics]]]: """Returns scheduler metrics in the following mapping: Dict[lister_name, List[Tuple[instance_name, SchedulerMetrics]]] as a lister instance has one SchedulerMetrics object per visit type. 
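# Illustrative shape of the mapping documented above (instance names and
# metrics values are made up):
#
#     {
#         "github": [
#             ("github", SchedulerMetrics(visit_type="git", origins_enabled=100, ...)),
#         ],
#         "gitlab": [
#             ("gitlab.com", SchedulerMetrics(visit_type="git", ...)),
#             ("gitlab.gnome.org", SchedulerMetrics(visit_type="git", ...)),
#         ],
#     }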
""" @django_cache( timeout=_cache_timeout, catch_exception=True, exception_return_value={}, invalidate_cache_pred=lambda m: not cache_metrics, ) def _get_listers_metrics_internal(): listers_metrics = defaultdict(list) listers = scheduler().get_listers() scheduler_metrics = scheduler().get_metrics() for lister in listers: for metrics in filter( lambda m: m.lister_id == lister.id, scheduler_metrics ): listers_metrics[lister.name].append((lister.instance_name, metrics)) return listers_metrics return _get_listers_metrics_internal() def _get_deposits_netloc_counts(cache_counts: bool = False) -> Counter: """Return deposit counts per origin url network location.""" def _process_origin_url(origin_url): parsed_url = urlparse(origin_url) netloc = parsed_url.netloc # special treatment for doi.org netloc as it is not specific enough # for origins mapping if parsed_url.netloc == "doi.org": netloc += "/" + parsed_url.path.split("/")[1] return netloc @django_cache( timeout=_cache_timeout, catch_exception=True, exception_return_value=Counter(), invalidate_cache_pred=lambda m: not cache_counts, ) def _get_deposits_netloc_counts_internal(): netlocs = [] deposits = get_deposits_list() netlocs = [ _process_origin_url(d["origin_url"]) for d in deposits if d["status"] == "done" ] deposits_netloc_counts = Counter(netlocs) return deposits_netloc_counts return _get_deposits_netloc_counts_internal() def _get_nixguix_origins_count(origin_url: str, cache_count: bool = False) -> int: """Returns number of archived tarballs for NixOS, aka the number of branches in a dedicated origin in the archive. """ @django_cache( timeout=_cache_timeout, catch_exception=True, exception_return_value=0, invalidate_cache_pred=lambda m: not cache_count, ) def _get_nixguix_origins_count_internal(): snapshot = archive.lookup_latest_origin_snapshot(origin_url) if snapshot: snapshot_sizes = archive.lookup_snapshot_sizes(snapshot["id"]) nixguix_origins_count = snapshot_sizes["release"] else: nixguix_origins_count = 0 return nixguix_origins_count return _get_nixguix_origins_count_internal() def _search_url(query: str, visit_type: str) -> str: return reverse( "browse-search", query_params={ "q": query, "visit_type": visit_type, "with_visit": "true", "with_content": "true", }, ) @xframe_options_exempt @never_cache def _swh_coverage(request: HttpRequest) -> HttpResponse: use_cache = is_swh_web_production(request) listers_metrics = _get_listers_metrics(use_cache) for origins in listed_origins["origins"]: origins["count"] = "0" origins["instances"] = {} origins_type = origins["type"] # special processing for nixos/guix origins as there is no # scheduler metrics for those if origins_type in ("nixos", "guix"): count = _get_nixguix_origins_count( origins["search_pattern"]["default"], use_cache ) origins["count"] = f"{count:,}" origins["instances"][origins_type] = {"nixguix": {"count": count}} if origins_type not in listers_metrics: continue count_total = sum( [metrics.origins_enabled for _, metrics in listers_metrics[origins_type]] ) count_never_visited = sum( [ metrics.origins_never_visited for _, metrics in listers_metrics[origins_type] ] ) count = count_total - count_never_visited origins["count"] = f"{count:,}" origins["instances"] = defaultdict(dict) for instance, metrics in listers_metrics[origins_type]: instance_count = metrics.origins_enabled - metrics.origins_never_visited # no archived origins for that visit type, skip it if instance_count == 0: continue origins["instances"][instance].update( {metrics.visit_type: {"count": 
f"{instance_count:,}"}} ) origins["visit_types"] = list( set(origins["instances"][instance].keys()) | set(origins.get("visit_types", [])) ) if origins_type == "CRAN": origins["instances"]["cran"]["cran"] = {"count": origins["count"]} # defaultdict cannot be iterated in django template origins["instances"] = dict(origins["instances"]) for origins in listed_origins["origins"]: instances = origins["instances"] nb_instances = len(instances) for instance_name, visit_types in instances.items(): for visit_type in visit_types: search_url = "" if visit_type in origins["search_pattern"]: search_pattern = origins["search_pattern"][visit_type] elif nb_instances > 1: search_pattern = instance_name else: search_pattern = origins["search_pattern"]["default"] if search_pattern: search_url = _search_url(search_pattern, visit_type) visit_types[visit_type]["search_url"] = search_url # filter out origin types without archived origins on production and staging if not is_swh_web_development(request): listed_origins["origins"] = list( filter(lambda o: o["count"] != "0", listed_origins["origins"]) ) for origins in legacy_origins["origins"]: origins["search_urls"] = {} for visit_type in origins["visit_types"]: origins["search_urls"][visit_type] = _search_url( origins["search_pattern"], visit_type ) deposits_counts = _get_deposits_netloc_counts(use_cache) for origins in deposited_origins["origins"]: origins["count"] = "0" if origins["search_pattern"] in deposits_counts: origins["count"] = f"{deposits_counts[origins['search_pattern']]:,}" origins["search_urls"] = { "deposit": _search_url(origins["search_pattern"], "deposit") } focus = [] focus_param = request.GET.get("focus") if focus_param: focus = focus_param.split(",") return render( request, "misc/coverage.html", { "origins": { "Regular crawling": listed_origins, "Discontinued hosting": legacy_origins, "On demand archival": deposited_origins, }, "focus": focus, }, ) urlpatterns = [ url(r"^coverage/$", _swh_coverage, name="swh-coverage"), ] diff --git a/swh/web/misc/iframe.py b/swh/web/misc/iframe.py index cc48c28e..43a01f5b 100644 --- a/swh/web/misc/iframe.py +++ b/swh/web/misc/iframe.py @@ -1,340 +1,339 @@ # Copyright (C) 2021-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from typing import Any, Dict, List, Optional, Tuple from django.shortcuts import render from django.urls import re_path as url from django.views.decorators.clickjacking import xframe_options_exempt from swh.model.hashutil import hash_to_bytes from swh.model.swhids import CoreSWHID, ObjectType, QualifiedSWHID from swh.web.browse.snapshot_context import get_snapshot_context from swh.web.browse.utils import ( content_display_max_size, get_directory_entries, prepare_content_for_display, request_content, ) -from swh.web.common import archive -from swh.web.common.exc import BadInputExc, NotFoundExc, http_status_code_message -from swh.web.common.identifiers import get_swhid, get_swhids_info -from swh.web.common.typing import SnapshotContext, SWHObjectInfo -from swh.web.common.utils import gen_path_info, reverse +from swh.web.utils import archive, gen_path_info, reverse +from swh.web.utils.exc import BadInputExc, NotFoundExc, http_status_code_message +from swh.web.utils.identifiers import get_swhid, get_swhids_info +from swh.web.utils.typing import SnapshotContext, SWHObjectInfo def 
_get_content_rendering_data(cnt_swhid: QualifiedSWHID, path: str) -> Dict[str, Any]: content_data = request_content(f"sha1_git:{cnt_swhid.object_id.hex()}") content = None language = None mimetype = None if content_data.get("raw_data") is not None: content_display_data = prepare_content_for_display( content_data["raw_data"], content_data["mimetype"], path ) content = content_display_data["content_data"] language = content_display_data["language"] mimetype = content_display_data["mimetype"] return { "content": content, "content_size": content_data.get("length"), "max_content_size": content_display_max_size, "filename": path.split("/")[-1], "encoding": content_data.get("encoding"), "mimetype": mimetype, "language": language, } def _get_directory_rendering_data( dir_swhid: QualifiedSWHID, focus_swhid: QualifiedSWHID, path: str, ) -> Dict[str, Any]: dirs, files = get_directory_entries(dir_swhid.object_id.hex()) for d in dirs: if d["type"] == "rev": d["url"] = None else: dir_swhid = QualifiedSWHID( object_type=ObjectType.DIRECTORY, object_id=hash_to_bytes(d["target"]), origin=dir_swhid.origin, visit=dir_swhid.visit, anchor=dir_swhid.anchor, path=(path or "/") + d["name"] + "/", ) d["url"] = reverse( "swhid-iframe", url_args={"swhid": str(dir_swhid)}, query_params={"focus_swhid": str(focus_swhid)}, ) for f in files: object_id = hash_to_bytes(f["target"]) cnt_swhid = QualifiedSWHID( object_type=ObjectType.CONTENT, object_id=object_id, origin=dir_swhid.origin, visit=dir_swhid.visit, anchor=dir_swhid.anchor, path=(path or "/") + f["name"], lines=(focus_swhid.lines if object_id == focus_swhid.object_id else None), ) f["url"] = reverse( "swhid-iframe", url_args={"swhid": str(cnt_swhid)}, query_params={"focus_swhid": str(focus_swhid)}, ) return {"dirs": dirs, "files": files} def _get_breacrumbs_data( swhid: QualifiedSWHID, focus_swhid: QualifiedSWHID, path: str, snapshot_context: Optional[SnapshotContext] = None, ) -> Tuple[List[Dict[str, Any]], Optional[str]]: breadcrumbs = [] filename = None # strip any leading or trailing slash from path qualifier of SWHID if path and path[0] == "/": path = path[1:] if path and path[-1] == "/": path = path[:-1] if swhid.object_type == ObjectType.CONTENT: split_path = path.split("/") filename = split_path[-1] path = path[: -len(filename)] path_info = gen_path_info(path) if path != "/" else [] root_dir = None if snapshot_context and snapshot_context["root_directory"]: root_dir = snapshot_context["root_directory"] elif swhid.anchor and swhid.anchor.object_type == ObjectType.DIRECTORY: root_dir = swhid.anchor.object_id.hex() elif focus_swhid.object_type == ObjectType.DIRECTORY: root_dir = focus_swhid.object_id.hex() if root_dir: root_dir_swhid = QualifiedSWHID( object_type=ObjectType.DIRECTORY, object_id=hash_to_bytes(root_dir), origin=swhid.origin, visit=swhid.visit, anchor=swhid.anchor, ) breadcrumbs.append( { "name": root_dir[:7], "object_id": root_dir_swhid.object_id.hex(), "path": "/", "url": reverse( "swhid-iframe", url_args={"swhid": str(root_dir_swhid)}, query_params={ "focus_swhid": str(focus_swhid) if focus_swhid != root_dir_swhid else None }, ), } ) for pi in path_info: dir_info = archive.lookup_directory_with_path(root_dir, pi["path"]) dir_swhid = QualifiedSWHID( object_type=ObjectType.DIRECTORY, object_id=hash_to_bytes(dir_info["target"]), origin=swhid.origin, visit=swhid.visit, anchor=swhid.anchor, path="/" + pi["path"] + "/", ) breadcrumbs.append( { "name": pi["name"], "object_id": dir_swhid.object_id.hex(), "path": dir_swhid.path.decode("utf-8") if 
dir_swhid.path else "", "url": reverse( "swhid-iframe", url_args={"swhid": str(dir_swhid)}, query_params={"focus_swhid": str(focus_swhid)}, ), } ) if filename: breadcrumbs.append( { "name": filename, "object_id": swhid.object_id.hex(), "path": path, "url": "", } ) return breadcrumbs, root_dir @xframe_options_exempt def swhid_iframe(request, swhid: str): """Django view that can be embedded in an iframe to display objects archived by Software Heritage (currently contents and directories) in a minimalist Web UI. """ focus_swhid = request.GET.get("focus_swhid", swhid) parsed_swhid = None view_data = {} breadcrumbs: List[Dict[str, Any]] = [] swh_objects = [] snapshot_context = None swhids_info_extra_context = {} archive_link = None try: parsed_swhid = get_swhid(swhid) parsed_focus_swhid = get_swhid(focus_swhid) path = parsed_swhid.path.decode("utf-8") if parsed_swhid.path else "" snapshot_context = None revision_id = None if ( parsed_swhid.anchor and parsed_swhid.anchor.object_type == ObjectType.REVISION ): revision_id = parsed_swhid.anchor.object_id.hex() if parsed_swhid.origin or parsed_swhid.visit: snapshot_context = get_snapshot_context( origin_url=parsed_swhid.origin, snapshot_id=parsed_swhid.visit.object_id.hex() if parsed_swhid.visit else None, revision_id=revision_id, ) error_info: Dict[str, Any] = {"status_code": 200, "description": ""} if parsed_swhid and parsed_swhid.object_type == ObjectType.CONTENT: view_data = _get_content_rendering_data(parsed_swhid, path) swh_objects.append( SWHObjectInfo( object_type=ObjectType.CONTENT, object_id=parsed_swhid.object_id.hex(), ) ) elif parsed_swhid and parsed_swhid.object_type == ObjectType.DIRECTORY: view_data = _get_directory_rendering_data( parsed_swhid, parsed_focus_swhid, path ) swh_objects.append( SWHObjectInfo( object_type=ObjectType.DIRECTORY, object_id=parsed_swhid.object_id.hex(), ) ) elif parsed_swhid: error_info = { "status_code": 400, "description": ( f"Objects of type {parsed_swhid.object_type} are not supported" ), } swhids_info_extra_context["path"] = path if parsed_swhid and view_data: breadcrumbs, root_dir = _get_breacrumbs_data( parsed_swhid, parsed_focus_swhid, path, snapshot_context ) if parsed_swhid.object_type == ObjectType.CONTENT and len(breadcrumbs) > 1: swh_objects.append( SWHObjectInfo( object_type=ObjectType.DIRECTORY, object_id=breadcrumbs[-2]["object_id"], ) ) swhids_info_extra_context["path"] = breadcrumbs[-2]["path"] swhids_info_extra_context["filename"] = breadcrumbs[-1]["name"] if snapshot_context: swh_objects.append( SWHObjectInfo( object_type=ObjectType.REVISION, object_id=snapshot_context["revision_id"] or "", ) ) swh_objects.append( SWHObjectInfo( object_type=ObjectType.SNAPSHOT, object_id=snapshot_context["snapshot_id"] or "", ) ) archive_link = reverse("browse-swhid", url_args={"swhid": swhid}) if ( parsed_swhid.origin is None and parsed_swhid.visit is None and parsed_swhid.anchor is None and root_dir is not None ): # qualifier values cannot be used to get root directory from them, # we need to add it as anchor in the SWHID argument of the archive link root_dir_swhid = CoreSWHID( object_type=ObjectType.DIRECTORY, object_id=hash_to_bytes(root_dir) ) archive_swhid = QualifiedSWHID( object_type=parsed_swhid.object_type, object_id=parsed_swhid.object_id, path=parsed_swhid.path, anchor=root_dir_swhid, ) archive_link = reverse( "browse-swhid", url_args={"swhid": f"{archive_swhid}"}, ) except BadInputExc as e: error_info = {"status_code": 400, "description": f"BadInputExc: {str(e)}"} except NotFoundExc as e: 
error_info = {"status_code": 404, "description": f"NotFoundExc: {str(e)}"} except Exception as e: error_info = {"status_code": 500, "description": str(e)} return render( request, "misc/iframe.html", { **view_data, "iframe_mode": True, "object_type": parsed_swhid.object_type.value if parsed_swhid else None, "lines": parsed_swhid.lines if parsed_swhid else None, "breadcrumbs": breadcrumbs, "swhid": swhid, "focus_swhid": focus_swhid, "archive_link": archive_link, "error_code": error_info["status_code"], "error_message": http_status_code_message.get(error_info["status_code"]), "error_description": error_info["description"], "snapshot_context": None, "swhids_info": get_swhids_info( swh_objects, snapshot_context, swhids_info_extra_context ), }, status=error_info["status_code"], ) urlpatterns = [ url( r"^embed/(?Pswh:[0-9]+:[a-z]+:[0-9a-f]+.*)/$", swhid_iframe, name="swhid-iframe", ), ] diff --git a/swh/web/misc/metrics.py b/swh/web/misc/metrics.py index 95764aad..29acc9dc 100644 --- a/swh/web/misc/metrics.py +++ b/swh/web/misc/metrics.py @@ -1,21 +1,21 @@ # Copyright (C) 2019 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from prometheus_client.exposition import CONTENT_TYPE_LATEST, generate_latest from django.http import HttpResponse -from swh.web.common.utils import SWH_WEB_METRICS_REGISTRY from swh.web.save_code_now.origin_save import compute_save_requests_metrics +from swh.web.utils import SWH_WEB_METRICS_REGISTRY def prometheus_metrics(request): compute_save_requests_metrics() return HttpResponse( content=generate_latest(registry=SWH_WEB_METRICS_REGISTRY), content_type=CONTENT_TYPE_LATEST, ) diff --git a/swh/web/misc/urls.py b/swh/web/misc/urls.py index 683258c2..87d30106 100644 --- a/swh/web/misc/urls.py +++ b/swh/web/misc/urls.py @@ -1,120 +1,120 @@ # Copyright (C) 2019-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import json import requests from django.conf.urls import include from django.contrib.staticfiles import finders from django.http import JsonResponse from django.shortcuts import render from django.urls import re_path as url from django.views.decorators.clickjacking import xframe_options_exempt -from swh.web.common import archive -from swh.web.common.exc import sentry_capture_exception from swh.web.config import get_config from swh.web.misc.metrics import prometheus_metrics +from swh.web.utils import archive +from swh.web.utils.exc import sentry_capture_exception def _jslicenses(request): jslicenses_file = finders.find("jssources/jslicenses.json") jslicenses_data = json.load(open(jslicenses_file)) jslicenses_data = sorted( jslicenses_data.items(), key=lambda item: item[0].split("/")[-1] ) return render(request, "misc/jslicenses.html", {"jslicenses_data": jslicenses_data}) def _stat_counters(request): stat_counters = archive.stat_counters() url = get_config()["history_counters_url"] stat_counters_history = {} try: response = requests.get(url, timeout=5) stat_counters_history = json.loads(response.text) except Exception as exc: sentry_capture_exception(exc) counters = { "stat_counters": stat_counters, "stat_counters_history": stat_counters_history, } return JsonResponse(counters) @xframe_options_exempt def 
hiring_banner(request): lang = request.GET.get("lang") return render( request, "misc/hiring-banner-iframe.html", { "lang": lang if lang else "en", }, ) urlpatterns = [ url(r"^", include("swh.web.misc.coverage")), url(r"^jslicenses/$", _jslicenses, name="jslicenses"), url(r"^stat_counters/$", _stat_counters, name="stat-counters"), url(r"^", include("swh.web.misc.badges")), url(r"^metrics/prometheus/$", prometheus_metrics, name="metrics-prometheus"), url(r"^", include("swh.web.misc.iframe")), url(r"^", include("swh.web.misc.fundraising")), url(r"^hiring/banner/$", hiring_banner, name="swh-hiring-banner"), ] # when running end to end tests through cypress, declare some extra # endpoints to provide input data for some of those tests if get_config()["e2e_tests_mode"]: from swh.web.tests.views import ( get_content_code_data_all_exts, get_content_code_data_all_filenames, get_content_code_data_by_ext, get_content_code_data_by_filename, get_content_other_data_by_ext, ) urlpatterns.append( url( r"^tests/data/content/code/extension/(?P.+)/$", get_content_code_data_by_ext, name="tests-content-code-extension", ) ) urlpatterns.append( url( r"^tests/data/content/other/extension/(?P.+)/$", get_content_other_data_by_ext, name="tests-content-other-extension", ) ) urlpatterns.append( url( r"^tests/data/content/code/extensions/$", get_content_code_data_all_exts, name="tests-content-code-extensions", ) ) urlpatterns.append( url( r"^tests/data/content/code/filename/(?P.+)/$", get_content_code_data_by_filename, name="tests-content-code-filename", ) ) urlpatterns.append( url( r"^tests/data/content/code/filenames/$", get_content_code_data_all_filenames, name="tests-content-code-filenames", ) ) diff --git a/swh/web/save_code_now/models.py b/swh/web/save_code_now/models.py index 52c0ea09..7381b00e 100644 --- a/swh/web/save_code_now/models.py +++ b/swh/web/save_code_now/models.py @@ -1,135 +1,135 @@ # Copyright (C) 2018-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from django.db import models -from swh.web.common.typing import SaveOriginRequestInfo +from swh.web.utils.typing import SaveOriginRequestInfo class SaveAuthorizedOrigin(models.Model): """ Model table holding origin urls authorized to be loaded into the archive. """ url = models.CharField(max_length=200, null=False) class Meta: app_label = "swh_web_save_code_now" db_table = "save_authorized_origin" indexes = [models.Index(fields=["url"])] def __str__(self): return self.url class SaveUnauthorizedOrigin(models.Model): """ Model table holding origin urls not authorized to be loaded into the archive. 
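# Hedged sketch (run from a Django shell with the swh-web settings loaded; the
# URL prefix is illustrative): whitelisting an origin prefix so that matching
# save requests get auto-accepted by can_save_origin() in
# swh.web.save_code_now.origin_save.

from swh.web.save_code_now.models import SaveAuthorizedOrigin

SaveAuthorizedOrigin.objects.get_or_create(url="https://git.example.org/")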
""" url = models.CharField(max_length=200, null=False) class Meta: app_label = "swh_web_save_code_now" db_table = "save_unauthorized_origin" indexes = [models.Index(fields=["url"])] def __str__(self): return self.url SAVE_REQUEST_ACCEPTED = "accepted" SAVE_REQUEST_REJECTED = "rejected" SAVE_REQUEST_PENDING = "pending" SAVE_REQUEST_STATUS = [ (SAVE_REQUEST_ACCEPTED, SAVE_REQUEST_ACCEPTED), (SAVE_REQUEST_REJECTED, SAVE_REQUEST_REJECTED), (SAVE_REQUEST_PENDING, SAVE_REQUEST_PENDING), ] SAVE_TASK_NOT_CREATED = "not created" SAVE_TASK_NOT_YET_SCHEDULED = "not yet scheduled" SAVE_TASK_SCHEDULED = "scheduled" SAVE_TASK_SUCCEEDED = "succeeded" SAVE_TASK_FAILED = "failed" SAVE_TASK_RUNNING = "running" SAVE_TASK_STATUS = [ (SAVE_TASK_NOT_CREATED, SAVE_TASK_NOT_CREATED), (SAVE_TASK_NOT_YET_SCHEDULED, SAVE_TASK_NOT_YET_SCHEDULED), (SAVE_TASK_SCHEDULED, SAVE_TASK_SCHEDULED), (SAVE_TASK_SUCCEEDED, SAVE_TASK_SUCCEEDED), (SAVE_TASK_FAILED, SAVE_TASK_FAILED), (SAVE_TASK_RUNNING, SAVE_TASK_RUNNING), ] VISIT_STATUS_CREATED = "created" VISIT_STATUS_ONGOING = "ongoing" VISIT_STATUS_FULL = "full" VISIT_STATUS_PARTIAL = "partial" VISIT_STATUS_NOT_FOUND = "not_found" VISIT_STATUS_FAILED = "failed" VISIT_STATUSES = [ (VISIT_STATUS_CREATED, VISIT_STATUS_CREATED), (VISIT_STATUS_ONGOING, VISIT_STATUS_ONGOING), (VISIT_STATUS_FULL, VISIT_STATUS_FULL), (VISIT_STATUS_PARTIAL, VISIT_STATUS_PARTIAL), (VISIT_STATUS_NOT_FOUND, VISIT_STATUS_NOT_FOUND), (VISIT_STATUS_FAILED, VISIT_STATUS_FAILED), ] class SaveOriginRequest(models.Model): """ Model table holding all the save origin requests issued by users. """ id = models.BigAutoField(primary_key=True) request_date = models.DateTimeField(auto_now_add=True) visit_type = models.CharField(max_length=200, null=False) visit_status = models.TextField(choices=VISIT_STATUSES, null=True) origin_url = models.CharField(max_length=200, null=False) status = models.TextField(choices=SAVE_REQUEST_STATUS, default=SAVE_REQUEST_PENDING) loading_task_id = models.IntegerField(default=-1) visit_date = models.DateTimeField(null=True) loading_task_status = models.TextField( choices=SAVE_TASK_STATUS, default=SAVE_TASK_NOT_CREATED ) # store ids of users that submitted the request as string list user_ids = models.TextField(null=True) note = models.TextField(null=True) class Meta: app_label = "swh_web_save_code_now" db_table = "save_origin_request" ordering = ["-id"] indexes = [models.Index(fields=["origin_url", "status"])] def to_dict(self) -> SaveOriginRequestInfo: """Map the request save model object to a json serializable dict. Returns: The corresponding SaveOriginRequetsInfo json serializable dict. 
""" visit_date = self.visit_date return SaveOriginRequestInfo( id=self.id, origin_url=self.origin_url, visit_type=self.visit_type, save_request_date=self.request_date.isoformat(), save_request_status=self.status, save_task_status=self.loading_task_status, visit_status=self.visit_status, visit_date=visit_date.isoformat() if visit_date else None, loading_task_id=self.loading_task_id, note=self.note, ) def __str__(self) -> str: return str(self.to_dict()) diff --git a/swh/web/save_code_now/origin_save.py b/swh/web/save_code_now/origin_save.py index 9fffd110..b2883f03 100644 --- a/swh/web/save_code_now/origin_save.py +++ b/swh/web/save_code_now/origin_save.py @@ -1,942 +1,941 @@ # Copyright (C) 2018-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from datetime import datetime, timedelta, timezone from functools import lru_cache from itertools import product import json import logging from typing import Any, Dict, List, Optional, Tuple from urllib.parse import urlparse from prometheus_client import Gauge import requests from django.core.exceptions import ObjectDoesNotExist, ValidationError from django.core.validators import URLValidator from django.db.models import Q, QuerySet from django.utils.html import escape from swh.scheduler.utils import create_oneshot_task_dict -from swh.web.common import archive -from swh.web.common.exc import ( - BadInputExc, - ForbiddenExc, - NotFoundExc, - sentry_capture_exception, -) -from swh.web.common.typing import OriginExistenceCheckInfo, SaveOriginRequestInfo -from swh.web.common.utils import SWH_WEB_METRICS_REGISTRY, parse_iso8601_date_to_utc from swh.web.config import get_config, scheduler from swh.web.save_code_now.models import ( SAVE_REQUEST_ACCEPTED, SAVE_REQUEST_PENDING, SAVE_REQUEST_REJECTED, SAVE_TASK_FAILED, SAVE_TASK_NOT_CREATED, SAVE_TASK_NOT_YET_SCHEDULED, SAVE_TASK_RUNNING, SAVE_TASK_SCHEDULED, SAVE_TASK_SUCCEEDED, VISIT_STATUS_CREATED, VISIT_STATUS_ONGOING, SaveAuthorizedOrigin, SaveOriginRequest, SaveUnauthorizedOrigin, ) +from swh.web.utils import SWH_WEB_METRICS_REGISTRY, archive, parse_iso8601_date_to_utc +from swh.web.utils.exc import ( + BadInputExc, + ForbiddenExc, + NotFoundExc, + sentry_capture_exception, +) +from swh.web.utils.typing import OriginExistenceCheckInfo, SaveOriginRequestInfo logger = logging.getLogger(__name__) # Number of days in the past to lookup for information MAX_THRESHOLD_DAYS = 30 # Non terminal visit statuses which needs updates NON_TERMINAL_STATUSES = [ VISIT_STATUS_CREATED, VISIT_STATUS_ONGOING, ] def get_origin_save_authorized_urls() -> List[str]: """ Get the list of origin url prefixes authorized to be immediately loaded into the archive (whitelist). Returns: list: The list of authorized origin url prefix """ return [origin.url for origin in SaveAuthorizedOrigin.objects.all()] def get_origin_save_unauthorized_urls() -> List[str]: """ Get the list of origin url prefixes forbidden to be loaded into the archive (blacklist). Returns: list: the list of unauthorized origin url prefix """ return [origin.url for origin in SaveUnauthorizedOrigin.objects.all()] def can_save_origin(origin_url: str, bypass_pending_review: bool = False) -> str: """ Check if a software origin can be saved into the archive. 
    Based on the origin url, the save request will be either:

      * immediately accepted if the url is whitelisted
      * rejected if the url is blacklisted
      * put in pending state for manual review otherwise

    Args:
        origin_url (str): the software origin url to check

    Returns:
        str: the origin save request status, either **accepted**,
        **rejected** or **pending**
    """
    # origin url may be blacklisted
    for url_prefix in get_origin_save_unauthorized_urls():
        if origin_url.startswith(url_prefix):
            return SAVE_REQUEST_REJECTED

    # if the origin url is in the white list, it can be immediately saved
    for url_prefix in get_origin_save_authorized_urls():
        if origin_url.startswith(url_prefix):
            return SAVE_REQUEST_ACCEPTED

    # otherwise, the origin url needs to be manually verified if the user
    # that submitted it does not have special permission
    if bypass_pending_review:
        # mark the origin URL as trusted in that case
        SaveAuthorizedOrigin.objects.get_or_create(url=origin_url)
        return SAVE_REQUEST_ACCEPTED
    else:
        return SAVE_REQUEST_PENDING


# map visit type to scheduler task
# TODO: do not hardcode the task name here (T1157)
_visit_type_task = {
    "git": "load-git",
    "hg": "load-hg",
    "svn": "load-svn",
    "cvs": "load-cvs",
    "bzr": "load-bzr",
}

_visit_type_task_privileged = {
    "archives": "load-archive-files",
}

# map scheduler task status to origin save status
_save_task_status = {
    "next_run_not_scheduled": SAVE_TASK_NOT_YET_SCHEDULED,
    "next_run_scheduled": SAVE_TASK_SCHEDULED,
    "completed": SAVE_TASK_SUCCEEDED,
    "disabled": SAVE_TASK_FAILED,
}

# map scheduler task_run status to origin save status
_save_task_run_status = {
    "scheduled": SAVE_TASK_SCHEDULED,
    "started": SAVE_TASK_RUNNING,
    "eventful": SAVE_TASK_SUCCEEDED,
    "uneventful": SAVE_TASK_SUCCEEDED,
    "failed": SAVE_TASK_FAILED,
    "permfailed": SAVE_TASK_FAILED,
    "lost": SAVE_TASK_FAILED,
}


@lru_cache()
def get_scheduler_load_task_types() -> List[str]:
    task_types = scheduler().get_task_types()
    return [t["type"] for t in task_types if t["type"].startswith("load")]


def get_savable_visit_types_dict(privileged_user: bool = False) -> Dict:
    """Return the supported task types the user has access to.

    Args:
        privileged_user: Flag to determine if all visit types should be returned
            or not. Defaults to False to only list unprivileged visit types.

    Returns:
        the dict of supported visit types for the user
    """
    if privileged_user:
        task_types = {**_visit_type_task, **_visit_type_task_privileged}
    else:
        task_types = _visit_type_task
    # filter visit types according to scheduler load task types if available
    try:
        load_task_types = get_scheduler_load_task_types()
        return {k: v for k, v in task_types.items() if v in load_task_types}
    except Exception:
        return task_types


def get_savable_visit_types(privileged_user: bool = False) -> List[str]:
    """Return the list of visit types the user can perform save requests on.

    Args:
        privileged_user: Flag to determine if all visit types should be returned
            or not. Defaults to False to only list unprivileged visit types.

    Returns:
        the list of saveable visit types
    """
    return sorted(list(get_savable_visit_types_dict(privileged_user).keys()))


def _check_visit_type_savable(visit_type: str, privileged_user: bool = False) -> None:
    visit_type_tasks = get_savable_visit_types(privileged_user)
    if visit_type not in visit_type_tasks:
        allowed_visit_types = ", ".join(visit_type_tasks)
        raise BadInputExc(
            f"Visit of type {visit_type} cannot be saved!
" f"Allowed types are the following: {allowed_visit_types}" ) _validate_url = URLValidator( schemes=["http", "https", "svn", "git", "rsync", "pserver", "ssh", "bzr"] ) def _check_origin_url_valid(origin_url: str) -> None: try: _validate_url(origin_url) except ValidationError: raise BadInputExc( f"The provided origin url ({escape(origin_url)}) is not valid!" ) parsed_url = urlparse(origin_url) if parsed_url.password not in (None, "", "anonymous"): raise BadInputExc( "The provided origin url contains a password and cannot be " "accepted for security reasons." ) def origin_exists(origin_url: str) -> OriginExistenceCheckInfo: """Check the origin url for existence. If it exists, extract some more useful information on the origin. """ resp = requests.head(origin_url, allow_redirects=True) exists = resp.ok content_length: Optional[int] = None last_modified: Optional[str] = None if exists: # Also process X-Archive-Orig-* headers in case the URL targets the # Internet Archive. size_ = resp.headers.get( "Content-Length", resp.headers.get("X-Archive-Orig-Content-Length") ) content_length = int(size_) if size_ else None try: date_str = resp.headers.get( "Last-Modified", resp.headers.get("X-Archive-Orig-Last-Modified", "") ) date = datetime.strptime(date_str, "%a, %d %b %Y %H:%M:%S %Z") last_modified = date.isoformat() except ValueError: # if not provided or not parsable as per the expected format, keep it None pass return OriginExistenceCheckInfo( origin_url=origin_url, exists=exists, last_modified=last_modified, content_length=content_length, ) def _check_origin_exists(url: str) -> OriginExistenceCheckInfo: """Ensure an URL exists, if not raise an explicit message.""" metadata = origin_exists(url) if not metadata["exists"]: raise BadInputExc(f"The provided url ({escape(url)}) does not exist!") return metadata def _get_visit_info_for_save_request( save_request: SaveOriginRequest, ) -> Tuple[Optional[datetime], Optional[str]]: """Retrieve visit information out of a save request Args: save_request: Input save origin request to retrieve information for. Returns: Tuple of (visit date, optional visit status) for such save request origin """ visit_date = None visit_status = None time_now = datetime.now(tz=timezone.utc) time_delta = time_now - save_request.request_date # stop trying to find a visit date one month after save request submission # as those requests to storage are expensive and associated loading task # surely ended up with errors if time_delta.days <= MAX_THRESHOLD_DAYS: origin = save_request.origin_url ovs = archive.origin_visit_find_by_date(origin, save_request.request_date) if ovs: visit_date = parse_iso8601_date_to_utc(ovs["date"]) visit_status = ovs["status"] return visit_date, visit_status def _check_visit_update_status( save_request: SaveOriginRequest, ) -> Tuple[Optional[datetime], Optional[str], Optional[str]]: """Given a save request, determine whether a save request was successful or failed. Args: save_request: Input save origin request to retrieve information for. 
    Returns:
        Tuple of (optional visit date, optional visit status, optional save
        task status) for such save request origin
    """
    visit_date, visit_status = _get_visit_info_for_save_request(save_request)
    loading_task_status = None
    if visit_date and visit_status in ("full", "partial"):
        # visit has been performed, mark the saving task as succeeded
        loading_task_status = SAVE_TASK_SUCCEEDED
    elif visit_status in ("created", "ongoing"):
        # visit is currently running
        loading_task_status = SAVE_TASK_RUNNING
    elif visit_status in ("not_found", "failed"):
        loading_task_status = SAVE_TASK_FAILED
    else:
        time_now = datetime.now(tz=timezone.utc)
        time_delta = time_now - save_request.request_date
        # consider the task as failed if it is still in scheduled state
        # 30 days after its submission
        if time_delta.days > MAX_THRESHOLD_DAYS:
            loading_task_status = SAVE_TASK_FAILED
    return visit_date, visit_status, loading_task_status


def _compute_task_loading_status(
    task: Optional[Dict[str, Any]] = None,
    task_run: Optional[Dict[str, Any]] = None,
) -> Optional[str]:
    loading_task_status: Optional[str] = None
    # First determine the loading task status out of task information
    if task:
        loading_task_status = _save_task_status[task["status"]]
        if task_run:
            loading_task_status = _save_task_run_status[task_run["status"]]
    return loading_task_status


def _update_save_request_info(
    save_request: SaveOriginRequest,
    task: Optional[Dict[str, Any]] = None,
    task_run: Optional[Dict[str, Any]] = None,
) -> SaveOriginRequestInfo:
    """Update save request information out of the visit status and fall back
    to the task and task_run information if the visit status is missing.

    Args:
        save_request: Save request
        task: Associated scheduler task information about the save request
        task_run: Most recent run occurrence of the associated task

    Returns:
        Summary of the save request information updated.
    """
    must_save = False

    # To determine the save code now request's final status, the visit date must be set
    # and the visit status must be a final one. Once that is the case, the save code
    # now request is definitely done.
    if (
        not save_request.visit_date
        or not save_request.visit_status
        or save_request.visit_status in NON_TERMINAL_STATUSES
    ):
        visit_date, visit_status, loading_task_status = _check_visit_update_status(
            save_request
        )
        if not loading_task_status:  # fallback when not provided
            loading_task_status = _compute_task_loading_status(task, task_run)

        if visit_date != save_request.visit_date:
            must_save = True
            save_request.visit_date = visit_date

        if visit_status != save_request.visit_status:
            must_save = True
            save_request.visit_status = visit_status

        if (
            loading_task_status is not None
            and loading_task_status != save_request.loading_task_status
        ):
            must_save = True
            save_request.loading_task_status = loading_task_status

        if must_save:
            save_request.save()

    return save_request.to_dict()


def create_save_origin_request(
    visit_type: str,
    origin_url: str,
    privileged_user: bool = False,
    user_id: Optional[int] = None,
    **kwargs,
) -> SaveOriginRequestInfo:
    """Create a loading task to save a software origin into the archive.

    This function aims to create a software origin loading task
    through the use of the swh-scheduler component.

    First, some checks are performed to see if the visit type and origin
    url are valid and if the save request can be accepted.
    For the 'archives' visit type, this also ensures the artifacts actually
    exist. If those checks pass, the loading task is then created.
    Otherwise, the save request is put in pending or rejected state.
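# Summary of the visit-status to save-task-status mapping implemented by
# _check_visit_update_status above:
#
#     full / partial                -> SAVE_TASK_SUCCEEDED
#     created / ongoing             -> SAVE_TASK_RUNNING
#     not_found / failed            -> SAVE_TASK_FAILED
#     still no visit after 30 days  -> SAVE_TASK_FAILED (MAX_THRESHOLD_DAYS)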
All the submitted save requests are logged into the swh-web database to keep track of them. Args: visit_type: the type of visit to perform (e.g. git, hg, svn, archives, ...) origin_url: the url of the origin to save privileged: Whether the user has some more privilege than other (bypass review, access to privileged other visit types) user_id: User identifier (provided when authenticated) kwargs: Optional parameters (e.g. artifact_url, artifact_filename, artifact_version) Raises: BadInputExc: the visit type or origin url is invalid or inexistent ForbiddenExc: the provided origin url is blacklisted Returns: dict: A dict describing the save request with the following keys: * **visit_type**: the type of visit to perform * **origin_url**: the url of the origin * **save_request_date**: the date the request was submitted * **save_request_status**: the request status, either **accepted**, **rejected** or **pending** * **save_task_status**: the origin loading task status, either **not created**, **not yet scheduled**, **scheduled**, **succeed** or **failed** """ visit_type_tasks = get_savable_visit_types_dict(privileged_user) _check_visit_type_savable(visit_type, privileged_user) _check_origin_url_valid(origin_url) # if all checks passed so far, we can try and save the origin save_request_status = can_save_origin(origin_url, privileged_user) task = None # if the origin save request is accepted, create a scheduler # task to load it into the archive if save_request_status == SAVE_REQUEST_ACCEPTED: # create a task with high priority task_kwargs: Dict[str, Any] = { "priority": "high", "url": origin_url, } if visit_type == "archives": # extra arguments for that type are required archives_data = kwargs.get("archives_data", []) if not archives_data: raise BadInputExc( "Artifacts data are missing for the archives visit type." 
                )
            artifacts = []
            for artifact in archives_data:
                artifact_url = artifact.get("artifact_url")
                artifact_version = artifact.get("artifact_version")
                if not artifact_url or not artifact_version:
                    raise BadInputExc("Missing url or version for an artifact to load.")
                metadata = _check_origin_exists(artifact_url)
                artifacts.append(
                    {
                        "url": artifact_url,
                        "version": artifact_version,
                        "time": metadata["last_modified"],
                        "length": metadata["content_length"],
                    }
                )
            task_kwargs = dict(**task_kwargs, artifacts=artifacts, snapshot_append=True)

        sor = None
        # get list of previously submitted save requests (most recent first)
        current_sors = list(
            SaveOriginRequest.objects.filter(
                visit_type=visit_type, origin_url=origin_url
            ).order_by("-request_date")
        )

        can_create_task = False
        # if no save requests previously submitted, create the scheduler task
        if not current_sors:
            can_create_task = True
        else:
            # get the latest submitted save request
            sor = current_sors[0]
            # if it was in pending state, we need to create the scheduler task
            # and update the save request info in the database
            if sor.status == SAVE_REQUEST_PENDING:
                can_create_task = True
            # a task has already been created to load the origin
            elif sor.loading_task_id != -1:
                # get the scheduler task and its status
                tasks = scheduler().get_tasks([sor.loading_task_id])
                task = tasks[0] if tasks else None
                task_runs = scheduler().get_task_runs([sor.loading_task_id])
                task_run = task_runs[0] if task_runs else None
                save_request_info = _update_save_request_info(sor, task, task_run)
                task_status = save_request_info["save_task_status"]
                # create a new scheduler task only if the previous one has been
                # already executed
                if (
                    task_status == SAVE_TASK_FAILED
                    or task_status == SAVE_TASK_SUCCEEDED
                ):
                    can_create_task = True
                    sor = None
                else:
                    can_create_task = False

        if can_create_task:
            # effectively create the scheduler task
            task_dict = create_oneshot_task_dict(
                visit_type_tasks[visit_type], **task_kwargs
            )
            task = scheduler().create_tasks([task_dict])[0]

            # pending save request has been accepted
            if sor:
                sor.status = SAVE_REQUEST_ACCEPTED
                sor.loading_task_id = task["id"]
                sor.save()
            else:
                sor = SaveOriginRequest.objects.create(
                    visit_type=visit_type,
                    origin_url=origin_url,
                    status=save_request_status,
                    loading_task_id=task["id"],
                    user_ids=f'"{user_id}"' if user_id else None,
                )
    # save request must be manually reviewed for acceptance
    elif save_request_status == SAVE_REQUEST_PENDING:
        # check if such a save request has already been submitted,
        # no need to add it to the database in that case
        try:
            sor = SaveOriginRequest.objects.get(
                visit_type=visit_type, origin_url=origin_url, status=save_request_status
            )
            user_ids = sor.user_ids if sor.user_ids is not None else ""
            if user_id is not None and f'"{user_id}"' not in user_ids:
                # update user ids list
                sor.user_ids = f'{sor.user_ids},"{user_id}"'
                sor.save()
        # if not, add it to the database
        except ObjectDoesNotExist:
            sor = SaveOriginRequest.objects.create(
                visit_type=visit_type,
                origin_url=origin_url,
                status=save_request_status,
                user_ids=f'"{user_id}"' if user_id else None,
            )
    # origin cannot be saved as its url is blacklisted,
    # log the request to the database anyway
    else:
        sor = SaveOriginRequest.objects.create(
            visit_type=visit_type,
            origin_url=origin_url,
            status=save_request_status,
            user_ids=f'"{user_id}"' if user_id else None,
        )

    if save_request_status == SAVE_REQUEST_REJECTED:
        raise ForbiddenExc(
            (
                'The "save code now" request has been rejected '
                "because the provided origin url is blacklisted."
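# Hedged end-to-end sketch for create_save_origin_request; the origin URL is
# illustrative and a configured scheduler backend is assumed. A rejected
# (blacklisted) origin raises ForbiddenExc instead of returning.

from swh.web.save_code_now.origin_save import create_save_origin_request

info = create_save_origin_request("git", "https://git.example.org/project")
assert info["save_request_status"] in ("accepted", "pending")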
) ) assert sor is not None return _update_save_request_info(sor, task) def update_save_origin_requests_from_queryset( requests_queryset: QuerySet, ) -> List[SaveOriginRequestInfo]: """Update all save requests from a SaveOriginRequest queryset, update their status in db and return the list of impacted save_requests. Args: requests_queryset: input SaveOriginRequest queryset Returns: list: A list of save origin request info dicts as described in :func:`swh.web.save_code_now.origin_save.create_save_origin_request` """ task_ids = [] for sor in requests_queryset: task_ids.append(sor.loading_task_id) save_requests = [] if task_ids: try: tasks = scheduler().get_tasks(task_ids) tasks = {task["id"]: task for task in tasks} task_runs = scheduler().get_task_runs(tasks) task_runs = {task_run["task"]: task_run for task_run in task_runs} except Exception: # allow to avoid mocking api GET responses for /origin/save endpoint when # running cypress tests as scheduler is not available tasks = {} task_runs = {} for sor in requests_queryset: sr_dict = _update_save_request_info( sor, tasks.get(sor.loading_task_id), task_runs.get(sor.loading_task_id), ) save_requests.append(sr_dict) return save_requests def refresh_save_origin_request_statuses() -> List[SaveOriginRequestInfo]: """Refresh non-terminal save origin requests (SOR) in the backend. Non-terminal SOR are requests whose status is **accepted** and their task status are either **created**, **not yet scheduled**, **scheduled** or **running**. This shall compute this list of SOR, checks their status in the scheduler and optionally elasticsearch for their current status. Then update those in db. Finally, this returns the refreshed information on those SOR. """ pivot_date = datetime.now(tz=timezone.utc) - timedelta(days=MAX_THRESHOLD_DAYS) save_requests = SaveOriginRequest.objects.filter( # Retrieve accepted request statuses (all statuses) Q(status=SAVE_REQUEST_ACCEPTED), # those without the required information we need to update Q(visit_date__isnull=True) | Q(visit_status__isnull=True) | Q(visit_status__in=NON_TERMINAL_STATUSES), # limit results to recent ones (that is roughly 30 days old at best) Q(request_date__gte=pivot_date), ) return ( update_save_origin_requests_from_queryset(save_requests) if save_requests.count() > 0 else [] ) def get_save_origin_requests( visit_type: str, origin_url: str ) -> List[SaveOriginRequestInfo]: """ Get all save requests for a given software origin. Args: visit_type: the type of visit origin_url: the url of the origin Raises: BadInputExc: the visit type or origin url is invalid - swh.web.common.exc.NotFoundExc: no save requests can be found for the + swh.web.utils.exc.NotFoundExc: no save requests can be found for the given origin Returns: list: A list of save origin requests dict as described in :func:`swh.web.save_code_now.origin_save.create_save_origin_request` """ _check_visit_type_savable(visit_type) _check_origin_url_valid(origin_url) sors = SaveOriginRequest.objects.filter( visit_type=visit_type, origin_url=origin_url ) if sors.count() == 0: raise NotFoundExc( f"No save requests found for visit of type {visit_type} " f"on origin with url {origin_url}." ) return update_save_origin_requests_from_queryset(sors) def get_save_origin_task_info( save_request_id: int, full_info: bool = True ) -> Dict[str, Any]: """ Get detailed information about an accepted save origin request and its associated loading task. If the associated loading task info is archived and removed from the scheduler database, returns an empty dictionary. 
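# Sketch: refresh_save_origin_request_statuses() above only reconsiders
# accepted requests that still lack terminal visit information and are at most
# MAX_THRESHOLD_DAYS old; a cron-style invocation could look like this.

from swh.web.save_code_now.origin_save import refresh_save_origin_request_statuses

for info in refresh_save_origin_request_statuses():
    print(info["origin_url"], info["save_task_status"], info["visit_status"])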
Args: save_request_id: identifier of a save origin request full_info: whether to return detailed info for staff users Returns: A dictionary with the following keys: - **type**: loading task type - **arguments**: loading task arguments - **id**: loading task database identifier - **backend_id**: loading task celery identifier - **scheduled**: loading task scheduling date - **ended**: loading task termination date - **status**: loading task execution status - **visit_status**: Actual visit status Depending on the availability of the task logs in the elasticsearch cluster of Software Heritage, the returned dictionary may also contain the following keys: - **name**: associated celery task name - **message**: relevant log message from task execution - **duration**: task execution time (only if it succeeded) - **worker**: name of the worker that executed the task """ try: save_request = SaveOriginRequest.objects.get(id=save_request_id) except ObjectDoesNotExist: return {} task_info: Dict[str, Any] = {} if save_request.note is not None: task_info["note"] = save_request.note try: task = scheduler().get_tasks([save_request.loading_task_id]) except Exception: # to avoid mocking GET responses of /save/task/info/ endpoint when running # cypress tests as scheduler is not available in that case task = None task = task[0] if task else None if task is None: return task_info task_run = scheduler().get_task_runs([task["id"]]) task_run = task_run[0] if task_run else None if task_run is None: return task_info task_info.update(task_run) task_info["type"] = task["type"] task_info["arguments"] = task["arguments"] task_info["id"] = task_run["task"] del task_info["task"] del task_info["metadata"] # Enrich the task info with the loading visit status task_info["visit_status"] = save_request.visit_status es_workers_index_url = get_config()["es_workers_index_url"] if not es_workers_index_url: return task_info es_workers_index_url += "/_search" if save_request.visit_date: min_ts = save_request.visit_date max_ts = min_ts + timedelta(days=7) else: min_ts = save_request.request_date max_ts = min_ts + timedelta(days=MAX_THRESHOLD_DAYS) min_ts_unix = int(min_ts.timestamp()) * 1000 max_ts_unix = int(max_ts.timestamp()) * 1000 save_task_status = _save_task_status[task["status"]] priority = "3" if save_task_status == SAVE_TASK_FAILED else "6" query = { "bool": { "must": [ {"match_phrase": {"syslog.priority": {"query": priority}}}, { "match_phrase": { "journald.custom.swh_task_id": {"query": task_run["backend_id"]} } }, { "range": { "@timestamp": { "gte": min_ts_unix, "lte": max_ts_unix, "format": "epoch_millis", } } }, ] } } try: response = requests.post( es_workers_index_url, json={"query": query, "sort": ["@timestamp"]}, timeout=30, ) results = json.loads(response.text) if results["hits"]["total"]["value"] >= 1: task_run_info = results["hits"]["hits"][-1]["_source"] journald_custom = task_run_info.get("journald", {}).get("custom", {}) task_info["duration"] = journald_custom.get( "swh_logging_args_runtime", "not available" ) task_info["message"] = task_run_info.get("message", "not available") task_info["name"] = journald_custom.get("swh_task_name", "not available") task_info["worker"] = task_run_info.get("host", {}).get("hostname") except Exception as exc: logger.warning("Request to Elasticsearch failed\n%s", exc) sentry_capture_exception(exc) if not full_info: for field in ("id", "backend_id", "worker"): # remove some staff only fields task_info.pop(field, None) if "message" in task_run and "Loading failure" in 
task_run["message"]: # hide traceback for non staff users, only display exception message_lines = task_info["message"].split("\n") message = "" for line in message_lines: if line.startswith("Traceback"): break message += f"{line}\n" message += message_lines[-1] task_info["message"] = message return task_info SUBMITTED_SAVE_REQUESTS_METRIC = "swh_web_submitted_save_requests" _submitted_save_requests_gauge = Gauge( name=SUBMITTED_SAVE_REQUESTS_METRIC, documentation="Number of submitted origin save requests", labelnames=["status", "visit_type"], registry=SWH_WEB_METRICS_REGISTRY, ) ACCEPTED_SAVE_REQUESTS_METRIC = "swh_web_accepted_save_requests" _accepted_save_requests_gauge = Gauge( name=ACCEPTED_SAVE_REQUESTS_METRIC, documentation="Number of accepted origin save requests", labelnames=["load_task_status", "visit_type"], registry=SWH_WEB_METRICS_REGISTRY, ) # Metric on the delay of save code now request per status and visit_type. This is the # time difference between the save code now is requested and the time it got ingested. ACCEPTED_SAVE_REQUESTS_DELAY_METRIC = "swh_web_save_requests_delay_seconds" _accepted_save_requests_delay_gauge = Gauge( name=ACCEPTED_SAVE_REQUESTS_DELAY_METRIC, documentation="Save Requests Duration", labelnames=["load_task_status", "visit_type"], registry=SWH_WEB_METRICS_REGISTRY, ) def compute_save_requests_metrics() -> None: """Compute Prometheus metrics related to origin save requests: - Number of submitted origin save requests - Number of accepted origin save requests - Save Code Now requests delay between request time and actual time of ingestion """ request_statuses = ( SAVE_REQUEST_ACCEPTED, SAVE_REQUEST_REJECTED, SAVE_REQUEST_PENDING, ) load_task_statuses = ( SAVE_TASK_NOT_CREATED, SAVE_TASK_NOT_YET_SCHEDULED, SAVE_TASK_SCHEDULED, SAVE_TASK_SUCCEEDED, SAVE_TASK_FAILED, SAVE_TASK_RUNNING, ) # for metrics, we want access to all visit types visit_types = get_savable_visit_types(privileged_user=True) labels_set = product(request_statuses, visit_types) for labels in labels_set: _submitted_save_requests_gauge.labels(*labels).set(0) labels_set = product(load_task_statuses, visit_types) for labels in labels_set: _accepted_save_requests_gauge.labels(*labels).set(0) duration_load_task_statuses = ( SAVE_TASK_FAILED, SAVE_TASK_SUCCEEDED, ) for labels in product(duration_load_task_statuses, visit_types): _accepted_save_requests_delay_gauge.labels(*labels).set(0) for sor in SaveOriginRequest.objects.all(): if sor.status == SAVE_REQUEST_ACCEPTED: _accepted_save_requests_gauge.labels( load_task_status=sor.loading_task_status, visit_type=sor.visit_type, ).inc() _submitted_save_requests_gauge.labels( status=sor.status, visit_type=sor.visit_type ).inc() if ( sor.loading_task_status in (SAVE_TASK_SUCCEEDED, SAVE_TASK_FAILED) and sor.visit_date is not None and sor.request_date is not None ): delay = sor.visit_date.timestamp() - sor.request_date.timestamp() _accepted_save_requests_delay_gauge.labels( load_task_status=sor.loading_task_status, visit_type=sor.visit_type, ).inc(delay) diff --git a/swh/web/settings/common.py b/swh/web/settings/common.py index 76118cee..04abc13c 100644 --- a/swh/web/settings/common.py +++ b/swh/web/settings/common.py @@ -1,349 +1,349 @@ # Copyright (C) 2017-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information """ Django common settings for swh-web. 
""" from importlib.util import find_spec import os import sys from typing import Any, Dict from django.utils import encoding -from swh.web.auth.utils import OIDC_SWH_WEB_CLIENT_ID from swh.web.config import get_config # Fix django-js-reverse 0.9.1 compatibility with django 4.x # TODO: Remove that hack once a new django-js-reverse release # is available on PyPI if not hasattr(encoding, "force_text"): setattr(encoding, "force_text", encoding.force_str) swh_web_config = get_config() # Build paths inside the project like this: os.path.join(BASE_DIR, ...) PROJECT_DIR = os.path.dirname(os.path.abspath(__file__)) # Quick-start development settings - unsuitable for production # See https://docs.djangoproject.com/en/1.11/howto/deployment/checklist/ # SECURITY WARNING: keep the secret key used in production secret! SECRET_KEY = swh_web_config["secret_key"] # SECURITY WARNING: don't run with debug turned on in production! DEBUG = swh_web_config["debug"] DEBUG_PROPAGATE_EXCEPTIONS = swh_web_config["debug"] ALLOWED_HOSTS = ["127.0.0.1", "localhost"] + swh_web_config["allowed_hosts"] # Application definition SWH_BASE_DJANGO_APPS = [ "swh.web.auth", "swh.web.browse", - "swh.web.common", + "swh.web.utils", "swh.web.api", ] SWH_EXTRA_DJANGO_APPS = [ app for app in swh_web_config["swh_extra_django_apps"] if app not in SWH_BASE_DJANGO_APPS ] # swh.web.api must be the last loaded application due to the way # its URLS are registered SWH_DJANGO_APPS = SWH_EXTRA_DJANGO_APPS + SWH_BASE_DJANGO_APPS INSTALLED_APPS = [ "django.contrib.admin", "django.contrib.auth", "django.contrib.contenttypes", "django.contrib.sessions", "django.contrib.messages", "django.contrib.staticfiles", "rest_framework", "webpack_loader", "django_js_reverse", "corsheaders", ] + SWH_DJANGO_APPS MIDDLEWARE = [ "django.middleware.security.SecurityMiddleware", "django.contrib.sessions.middleware.SessionMiddleware", "corsheaders.middleware.CorsMiddleware", "django.middleware.common.CommonMiddleware", "django.middleware.csrf.CsrfViewMiddleware", "django.contrib.auth.middleware.AuthenticationMiddleware", "swh.auth.django.middlewares.OIDCSessionExpiredMiddleware", "django.contrib.messages.middleware.MessageMiddleware", "django.middleware.clickjacking.XFrameOptionsMiddleware", - "swh.web.common.middlewares.ThrottlingHeadersMiddleware", - "swh.web.common.middlewares.ExceptionMiddleware", + "swh.web.utils.middlewares.ThrottlingHeadersMiddleware", + "swh.web.utils.middlewares.ExceptionMiddleware", ] # Compress all assets (static ones and dynamically generated html) # served by django in a local development environment context. # In a production environment, assets compression will be directly # handled by web servers like apache or nginx. if swh_web_config["serve_assets"]: MIDDLEWARE.insert(0, "django.middleware.gzip.GZipMiddleware") ROOT_URLCONF = "swh.web.urls" SWH_APP_TEMPLATES = [os.path.join(PROJECT_DIR, "../templates")] # Add templates directory from each SWH Django application for app in SWH_DJANGO_APPS: try: app_spec = find_spec(app) assert app_spec is not None, f"Django application {app} not found !" assert app_spec.origin is not None SWH_APP_TEMPLATES.append( os.path.join(os.path.dirname(app_spec.origin), "templates") ) except ModuleNotFoundError: assert False, f"Django application {app} not found !" 
TEMPLATES = [
    {
        "BACKEND": "django.template.backends.django.DjangoTemplates",
        "DIRS": SWH_APP_TEMPLATES,
        "APP_DIRS": True,
        "OPTIONS": {
            "context_processors": [
                "django.template.context_processors.debug",
                "django.template.context_processors.request",
                "django.contrib.auth.context_processors.auth",
                "django.contrib.messages.context_processors.messages",
-                "swh.web.common.utils.context_processor",
+                "swh.web.utils.context_processor",
            ],
            "libraries": {
-                "swh_templatetags": "swh.web.common.swh_templatetags",
+                "swh_templatetags": "swh.web.utils.swh_templatetags",
            },
        },
    },
]

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": swh_web_config.get("development_db", ""),
    }
}

# Password validation
# https://docs.djangoproject.com/en/1.11/ref/settings/#auth-password-validators

AUTH_PASSWORD_VALIDATORS = [
    {
        "NAME": "django.contrib.auth.password_validation.UserAttributeSimilarityValidator",  # noqa
    },
    {
        "NAME": "django.contrib.auth.password_validation.MinimumLengthValidator",
    },
    {
        "NAME": "django.contrib.auth.password_validation.CommonPasswordValidator",
    },
    {
        "NAME": "django.contrib.auth.password_validation.NumericPasswordValidator",
    },
]

# Internationalization
# https://docs.djangoproject.com/en/1.11/topics/i18n/

LANGUAGE_CODE = "en-us"

TIME_ZONE = "UTC"

USE_I18N = True

USE_L10N = True

USE_TZ = True

# Static files (CSS, JavaScript, Images)
# https://docs.djangoproject.com/en/1.11/howto/static-files/

STATIC_URL = "/static/"

# static folder location when swh-web has been installed with pip
STATIC_DIR = os.path.join(sys.prefix, "share/swh/web/static")

if not os.path.exists(STATIC_DIR):
    # static folder location when developing swh-web
    STATIC_DIR = os.path.join(PROJECT_DIR, "../../../static")

STATICFILES_DIRS = [STATIC_DIR]

INTERNAL_IPS = ["127.0.0.1"]

throttle_rates = {}

http_requests = ["GET", "HEAD", "POST", "PUT", "DELETE", "OPTIONS", "PATCH"]

throttling = swh_web_config["throttling"]
for limiter_scope, limiter_conf in throttling["scopes"].items():
    if "default" in limiter_conf["limiter_rate"]:
        throttle_rates[limiter_scope] = limiter_conf["limiter_rate"]["default"]
    # for backward compatibility
    else:
        throttle_rates[limiter_scope] = limiter_conf["limiter_rate"]
    # register sub scopes specific to HTTP request types
    for http_request in http_requests:
        if http_request in limiter_conf["limiter_rate"]:
            throttle_rates[limiter_scope + "_" + http_request.lower()] = limiter_conf[
                "limiter_rate"
            ][http_request]

REST_FRAMEWORK: Dict[str, Any] = {
    "DEFAULT_RENDERER_CLASSES": (
        "rest_framework.renderers.JSONRenderer",
        "swh.web.api.renderers.YAMLRenderer",
        "rest_framework.renderers.TemplateHTMLRenderer",
    ),
    "DEFAULT_THROTTLE_CLASSES": (
        "swh.web.api.throttling.SwhWebRateThrottle",
        "swh.web.api.throttling.SwhWebUserRateThrottle",
    ),
    "DEFAULT_THROTTLE_RATES": throttle_rates,
    "DEFAULT_AUTHENTICATION_CLASSES": [
        "rest_framework.authentication.SessionAuthentication",
        "swh.auth.django.backends.OIDCBearerTokenAuthentication",
    ],
    "EXCEPTION_HANDLER": "swh.web.api.apiresponse.error_response_handler",
}

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "filters": {
        "require_debug_false": {
            "()": "django.utils.log.RequireDebugFalse",
        },
        "require_debug_true": {
            "()": "django.utils.log.RequireDebugTrue",
        },
    },
    "formatters": {
        "request": {
            "format": "[%(asctime)s] [%(levelname)s] %(request)s %(status_code)s",
            "datefmt": "%d/%b/%Y %H:%M:%S",
        },
        "simple": {
            "format": "[%(asctime)s] [%(levelname)s] %(message)s",
            "datefmt": "%d/%b/%Y %H:%M:%S",
        },
        "verbose": {
            "format": (
                "[%(asctime)s] [%(levelname)s]
%(name)s.%(funcName)s:%(lineno)s " "- %(message)s" ), "datefmt": "%d/%b/%Y %H:%M:%S", }, }, "handlers": { "console": { "level": "DEBUG", "filters": ["require_debug_true"], "class": "logging.StreamHandler", "formatter": "simple", }, "file": { "level": "WARNING", "filters": ["require_debug_false"], "class": "logging.FileHandler", "filename": os.path.join(swh_web_config["log_dir"], "swh-web.log"), "formatter": "simple", }, "file_request": { "level": "WARNING", "filters": ["require_debug_false"], "class": "logging.FileHandler", "filename": os.path.join(swh_web_config["log_dir"], "swh-web.log"), "formatter": "request", }, "console_verbose": { "level": "DEBUG", "filters": ["require_debug_true"], "class": "logging.StreamHandler", "formatter": "verbose", }, "file_verbose": { "level": "WARNING", "filters": ["require_debug_false"], "class": "logging.FileHandler", "filename": os.path.join(swh_web_config["log_dir"], "swh-web.log"), "formatter": "verbose", }, "null": { "class": "logging.NullHandler", }, }, "loggers": { "": { "handlers": ["console_verbose", "file_verbose"], "level": "DEBUG" if DEBUG else "WARNING", }, "django": { "handlers": ["console"], "level": "DEBUG" if DEBUG else "WARNING", "propagate": False, }, "django.request": { "handlers": ["file_request"], "level": "DEBUG" if DEBUG else "WARNING", "propagate": False, }, "django.db.backends": {"handlers": ["null"], "propagate": False}, "django.utils.autoreload": { "level": "INFO", }, "swh.core.statsd": { "level": "INFO", }, "urllib3": { "level": "INFO", }, }, } WEBPACK_LOADER = { "DEFAULT": { "CACHE": False, "BUNDLE_DIR_NAME": "./", "STATS_FILE": os.path.join(STATIC_DIR, "webpack-stats.json"), "POLL_INTERVAL": 0.1, "TIMEOUT": None, "IGNORE": [".+\\.hot-update.js", ".+\\.map"], } } LOGIN_URL = "/login/" LOGIN_REDIRECT_URL = "swh-web-homepage" SESSION_ENGINE = "django.contrib.sessions.backends.cache" CACHES = { "default": {"BACKEND": "django.core.cache.backends.locmem.LocMemCache"}, } JS_REVERSE_JS_MINIFY = False CORS_ORIGIN_ALLOW_ALL = True CORS_URLS_REGEX = r"^/(badge|api)/.*$" AUTHENTICATION_BACKENDS = [ "django.contrib.auth.backends.ModelBackend", "swh.auth.django.backends.OIDCAuthorizationCodePKCEBackend", ] +OIDC_SWH_WEB_CLIENT_ID = "swh-web" SWH_AUTH_SERVER_URL = swh_web_config["keycloak"]["server_url"] SWH_AUTH_REALM_NAME = swh_web_config["keycloak"]["realm_name"] SWH_AUTH_CLIENT_ID = OIDC_SWH_WEB_CLIENT_ID SWH_AUTH_SESSION_EXPIRED_REDIRECT_VIEW = "logout" DEFAULT_AUTO_FIELD = "django.db.models.AutoField" diff --git a/swh/web/settings/development.py b/swh/web/settings/development.py index e4cb0877..f7fe881e 100644 --- a/swh/web/settings/development.py +++ b/swh/web/settings/development.py @@ -1,19 +1,19 @@ # Copyright (C) 2017-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information """ Django development settings for swh-web. 
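It extends the common settings with development-only helpers, notably an
HTML prettifying middleware, and disables any password validation.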
""" from django.core.cache import cache from .common import * # noqa from .common import MIDDLEWARE -MIDDLEWARE += ["swh.web.common.middlewares.HtmlPrettifyMiddleware"] +MIDDLEWARE += ["swh.web.utils.middlewares.HtmlPrettifyMiddleware"] AUTH_PASSWORD_VALIDATORS = [] # disable any pwd validation mechanism cache.clear() diff --git a/swh/web/settings/production.py b/swh/web/settings/production.py index 9c533379..f4053d19 100644 --- a/swh/web/settings/production.py +++ b/swh/web/settings/production.py @@ -1,71 +1,71 @@ # Copyright (C) 2017-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information """ Django production settings for swh-web. """ from .common import ( CACHES, DEBUG, MIDDLEWARE, REST_FRAMEWORK, WEBPACK_LOADER, swh_web_config, ) from .common import * # noqa MIDDLEWARE += [ - "swh.web.common.middlewares.HtmlMinifyMiddleware", + "swh.web.utils.middlewares.HtmlMinifyMiddleware", ] if swh_web_config.get("throttling", {}).get("cache_uri"): CACHES.update( { "default": { "BACKEND": "django.core.cache.backends.memcached.MemcachedCache", "LOCATION": swh_web_config["throttling"]["cache_uri"], } } ) # Setup support for proxy headers USE_X_FORWARDED_HOST = True SECURE_PROXY_SSL_HEADER = ("HTTP_X_FORWARDED_PROTO", "https") # We're going through seven (or, in that case, 2) proxies thanks to Varnish REST_FRAMEWORK["NUM_PROXIES"] = 2 db_conf = swh_web_config["production_db"] if db_conf.get("name", "").startswith("postgresql://"): # poor man's support for dsn connection string... import psycopg2 with psycopg2.connect(db_conf.get("name")) as cnx: dsn_dict = cnx.get_dsn_parameters() db_conf["name"] = dsn_dict.get("dbname") db_conf["host"] = dsn_dict.get("host") db_conf["port"] = dsn_dict.get("port") db_conf["user"] = dsn_dict.get("user") db_conf["password"] = dsn_dict.get("password") # https://docs.djangoproject.com/en/1.10/ref/settings/#databases DATABASES = { "default": { "ENGINE": "django.db.backends.postgresql", "NAME": db_conf.get("name"), "HOST": db_conf.get("host"), "PORT": db_conf.get("port"), "USER": db_conf.get("user"), "PASSWORD": db_conf.get("password"), } } WEBPACK_LOADER["DEFAULT"]["CACHE"] = not DEBUG LOGIN_URL = "/oidc/login/" LOGIN_REDIRECT_URL = "/oidc/profile/" diff --git a/swh/web/tests/add_forge_now/test_api_views.py b/swh/web/tests/add_forge_now/test_api_views.py index 5c0d1138..2b241653 100644 --- a/swh/web/tests/add_forge_now/test_api_views.py +++ b/swh/web/tests/add_forge_now/test_api_views.py @@ -1,651 +1,651 @@ # Copyright (C) 2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import copy import datetime import threading import time from typing import Dict from urllib.parse import urlencode, urlparse import iso8601 import pytest from swh.web.add_forge_now.models import Request, RequestHistory -from swh.web.common.utils import reverse from swh.web.config import get_config from swh.web.inbound_email.utils import get_address_for_pk -from swh.web.tests.utils import ( +from swh.web.tests.helpers import ( check_api_get_responses, check_api_post_response, check_http_get_response, check_http_post_response, ) +from swh.web.utils import reverse @pytest.mark.django_db def test_add_forge_request_create_anonymous_user(api_client): 
url = reverse("api-1-add-forge-request-create") check_api_post_response(api_client, url, status_code=403) @pytest.mark.django_db def test_add_forge_request_create_empty(api_client, regular_user): api_client.force_login(regular_user) url = reverse("api-1-add-forge-request-create") resp = check_api_post_response(api_client, url, status_code=400) assert '"forge_type"' in resp.data["reason"] ADD_FORGE_DATA_FORGE1: Dict = { "forge_type": "gitlab", "forge_url": "https://gitlab.example.org", "forge_contact_email": "admin@gitlab.example.org", "forge_contact_name": "gitlab.example.org admin", "forge_contact_comment": "user marked as owner in forge members", "submitter_forward_username": True, } ADD_FORGE_DATA_FORGE2: Dict = { "forge_type": "gitea", "forge_url": "https://gitea.example.org", "forge_contact_email": "admin@gitea.example.org", "forge_contact_name": "gitea.example.org admin", "forge_contact_comment": "user marked as owner in forge members", "submitter_forward_username": True, } ADD_FORGE_DATA_FORGE3: Dict = { "forge_type": "heptapod", "forge_url": "https://heptapod.host/", "forge_contact_email": "admin@example.org", "forge_contact_name": "heptapod admin", "forge_contact_comment": "", # authorized empty or null comment "submitter_forward_username": False, } ADD_FORGE_DATA_FORGE4: Dict = { **ADD_FORGE_DATA_FORGE3, "forge_url": "https://heptapod2.host/", "submitter_forward_username": "on", } ADD_FORGE_DATA_FORGE5: Dict = { **ADD_FORGE_DATA_FORGE3, "forge_url": "https://heptapod3.host/", "submitter_forward_username": "off", } def inbound_email_for_pk(pk: int) -> str: """Check that the inbound email matches the one expected for the given pk""" base_address = get_config()["add_forge_now"]["email_address"] return get_address_for_pk( salt="swh_web_add_forge_now", base_address=base_address, pk=pk ) @pytest.mark.django_db(transaction=True, reset_sequences=True) @pytest.mark.parametrize( "add_forge_data", [ ADD_FORGE_DATA_FORGE1, ADD_FORGE_DATA_FORGE2, ADD_FORGE_DATA_FORGE3, ADD_FORGE_DATA_FORGE4, ], ) def test_add_forge_request_create_success_post( api_client, regular_user, add_forge_data ): api_client.force_login(regular_user) url = reverse("api-1-add-forge-request-create") date_before = datetime.datetime.now(tz=datetime.timezone.utc) resp = check_api_post_response( api_client, url, data=add_forge_data, status_code=201, ) date_after = datetime.datetime.now(tz=datetime.timezone.utc) consent = add_forge_data["submitter_forward_username"] # map the expected result with what's expectedly read from the db to ease comparison expected_consent_bool = consent == "on" if isinstance(consent, str) else consent assert resp.data == { **add_forge_data, "id": resp.data["id"], "status": "PENDING", "submission_date": resp.data["submission_date"], "submitter_name": regular_user.username, "submitter_email": regular_user.email, "submitter_forward_username": expected_consent_bool, "last_moderator": resp.data["last_moderator"], "last_modified_date": resp.data["last_modified_date"], "inbound_email_address": inbound_email_for_pk(resp.data["id"]), "forge_domain": urlparse(add_forge_data["forge_url"]).netloc, } assert date_before < iso8601.parse_date(resp.data["submission_date"]) < date_after request = Request.objects.all().last() assert request.forge_url == add_forge_data["forge_url"] assert request.submitter_name == regular_user.username @pytest.mark.django_db(transaction=True, reset_sequences=True) def test_add_forge_request_create_success_form_encoded(client, regular_user): client.force_login(regular_user) url = 
reverse("api-1-add-forge-request-create") date_before = datetime.datetime.now(tz=datetime.timezone.utc) resp = check_http_post_response( client, url, request_content_type="application/x-www-form-urlencoded", data=urlencode(ADD_FORGE_DATA_FORGE1), status_code=201, ) date_after = datetime.datetime.now(tz=datetime.timezone.utc) assert resp.data == { **ADD_FORGE_DATA_FORGE1, "id": resp.data["id"], "status": "PENDING", "submission_date": resp.data["submission_date"], "submitter_name": regular_user.username, "submitter_email": regular_user.email, "last_moderator": resp.data["last_moderator"], "last_modified_date": resp.data["last_modified_date"], "inbound_email_address": inbound_email_for_pk(1), "forge_domain": urlparse(ADD_FORGE_DATA_FORGE1["forge_url"]).netloc, } assert date_before < iso8601.parse_date(resp.data["submission_date"]) < date_after request = Request.objects.all()[0] assert request.forge_url == ADD_FORGE_DATA_FORGE1["forge_url"] assert request.submitter_name == regular_user.username @pytest.mark.django_db(transaction=True, reset_sequences=True) def test_add_forge_request_create_duplicate(api_client, regular_user): api_client.force_login(regular_user) url = reverse("api-1-add-forge-request-create") check_api_post_response( api_client, url, data=ADD_FORGE_DATA_FORGE1, status_code=201, ) check_api_post_response( api_client, url, data=ADD_FORGE_DATA_FORGE1, status_code=409, ) requests = Request.objects.all() assert len(requests) == 1 @pytest.mark.django_db(transaction=True, reset_sequences=True) def test_add_forge_request_create_invalid_forge_url(api_client, regular_user): api_client.force_login(regular_user) url = reverse("api-1-add-forge-request-create") forge_data = copy.deepcopy(ADD_FORGE_DATA_FORGE1) forge_data["forge_url"] = "foo" resp = check_api_post_response( api_client, url, data=forge_data, status_code=400, ) assert resp.data == { "exception": "BadInputExc", "reason": '{"forge_url": ["Enter a valid URL."]}', } @pytest.mark.django_db(transaction=True, reset_sequences=True) def test_add_forge_request_update_anonymous_user(api_client): url = reverse("api-1-add-forge-request-update", url_args={"id": 1}) check_api_post_response(api_client, url, status_code=403) @pytest.mark.django_db(transaction=True, reset_sequences=True) def test_add_forge_request_update_regular_user(api_client, regular_user): api_client.force_login(regular_user) url = reverse("api-1-add-forge-request-update", url_args={"id": 1}) check_api_post_response(api_client, url, status_code=403) @pytest.mark.django_db(transaction=True, reset_sequences=True) def test_add_forge_request_update_non_existent(api_client, add_forge_moderator): api_client.force_login(add_forge_moderator) url = reverse("api-1-add-forge-request-update", url_args={"id": 1}) check_api_post_response(api_client, url, status_code=400) def create_add_forge_request(api_client, regular_user, data=ADD_FORGE_DATA_FORGE1): api_client.force_login(regular_user) url = reverse("api-1-add-forge-request-create") return check_api_post_response( api_client, url, data=data, status_code=201, ) @pytest.mark.django_db(transaction=True, reset_sequences=True) def test_add_forge_request_update_empty(api_client, regular_user, add_forge_moderator): create_add_forge_request(api_client, regular_user) api_client.force_login(add_forge_moderator) url = reverse("api-1-add-forge-request-update", url_args={"id": 1}) check_api_post_response(api_client, url, status_code=400) @pytest.mark.django_db(transaction=True, reset_sequences=True) def 
test_add_forge_request_update_missing_field(
    api_client, regular_user, add_forge_moderator
):
    create_add_forge_request(api_client, regular_user)

    api_client.force_login(add_forge_moderator)
    url = reverse("api-1-add-forge-request-update", url_args={"id": 1})
    check_api_post_response(api_client, url, data={}, status_code=400)
    check_api_post_response(
        api_client, url, data={"new_status": "REJECTED"}, status_code=400
    )


@pytest.mark.django_db(transaction=True, reset_sequences=True)
def test_add_forge_request_update(api_client, regular_user, add_forge_moderator):
    create_add_forge_request(api_client, regular_user)

    api_client.force_login(add_forge_moderator)
    url = reverse("api-1-add-forge-request-update", url_args={"id": 1})

    check_api_post_response(
        api_client, url, data={"text": "updating request"}, status_code=200
    )
    check_api_post_response(
        api_client,
        url,
        data={"new_status": "REJECTED", "text": "request rejected"},
        status_code=200,
    )


@pytest.mark.django_db(transaction=True, reset_sequences=True)
def test_add_forge_request_update_invalid_new_status(
    api_client, regular_user, add_forge_moderator
):
    create_add_forge_request(api_client, regular_user)

    api_client.force_login(add_forge_moderator)
    url = reverse("api-1-add-forge-request-update", url_args={"id": 1})
    check_api_post_response(
        api_client,
        url,
        data={"new_status": "ACCEPTED", "text": "request accepted"},
        status_code=400,
    )


@pytest.mark.django_db(transaction=True, reset_sequences=True)
def test_add_forge_request_update_status_concurrent(
    api_client, regular_user, add_forge_moderator, mocker
):
    _block_while_testing = mocker.patch(
        "swh.web.add_forge_now.api_views._block_while_testing"
    )
    _block_while_testing.side_effect = lambda: time.sleep(1)

    create_add_forge_request(api_client, regular_user)

    api_client.force_login(add_forge_moderator)
    url = reverse("api-1-add-forge-request-update", url_args={"id": 1})

    worker_ended = False

    def worker():
        nonlocal worker_ended
        check_api_post_response(
            api_client,
            url,
            data={"new_status": "WAITING_FOR_FEEDBACK", "text": "waiting for message"},
            status_code=200,
        )
        worker_ended = True

    # this thread will first modify the request status to WAITING_FOR_FEEDBACK
    thread = threading.Thread(target=worker)
    thread.start()

    # the other (slower) thread will attempt to modify the request status to
    # REJECTED but it will not be allowed, as the faster first thread already
    # modified it and the REJECTED state cannot be reached from the
    # WAITING_FOR_FEEDBACK one
    time.sleep(0.5)
    check_api_post_response(
        api_client,
        url,
        data={"new_status": "REJECTED", "text": "request accepted"},
        status_code=400,
    )
    thread.join()
    assert worker_ended


@pytest.mark.django_db(transaction=True, reset_sequences=True)
def test_add_forge_request_list_anonymous(api_client, regular_user):
    url = reverse("api-1-add-forge-request-list")

    resp = check_api_get_responses(api_client, url, status_code=200)
    assert resp.data == []

    create_add_forge_request(api_client, regular_user)

    resp = check_api_get_responses(api_client, url, status_code=200)
    add_forge_request = {
        "forge_url": ADD_FORGE_DATA_FORGE1["forge_url"],
        "forge_type": ADD_FORGE_DATA_FORGE1["forge_type"],
        "status": "PENDING",
        "submission_date": resp.data[0]["submission_date"],
        "id": resp.data[0]["id"],
    }
    assert resp.data == [add_forge_request]

    create_add_forge_request(api_client, regular_user, data=ADD_FORGE_DATA_FORGE2)

    resp = check_api_get_responses(api_client, url, status_code=200)
    other_forge_request = {
        "forge_url": ADD_FORGE_DATA_FORGE2["forge_url"],
        "forge_type": ADD_FORGE_DATA_FORGE2["forge_type"],
        "status":
"PENDING", "submission_date": resp.data[0]["submission_date"], "id": resp.data[0]["id"], } assert resp.data == [other_forge_request, add_forge_request] @pytest.mark.django_db(transaction=True, reset_sequences=True) def test_add_forge_request_list_moderator( api_client, regular_user, add_forge_moderator ): url = reverse("api-1-add-forge-request-list") create_add_forge_request(api_client, regular_user) create_add_forge_request(api_client, regular_user, data=ADD_FORGE_DATA_FORGE2) api_client.force_login(add_forge_moderator) resp = check_api_get_responses(api_client, url, status_code=200) add_forge_request = { **ADD_FORGE_DATA_FORGE1, "status": "PENDING", "submission_date": resp.data[1]["submission_date"], "submitter_name": regular_user.username, "submitter_email": regular_user.email, "last_moderator": resp.data[1]["last_moderator"], "last_modified_date": resp.data[1]["last_modified_date"], "id": resp.data[1]["id"], "inbound_email_address": inbound_email_for_pk(resp.data[1]["id"]), "forge_domain": urlparse(ADD_FORGE_DATA_FORGE1["forge_url"]).netloc, } other_forge_request = { **ADD_FORGE_DATA_FORGE2, "status": "PENDING", "submission_date": resp.data[0]["submission_date"], "submitter_name": regular_user.username, "submitter_email": regular_user.email, "last_moderator": resp.data[0]["last_moderator"], "last_modified_date": resp.data[0]["last_modified_date"], "id": resp.data[0]["id"], "inbound_email_address": inbound_email_for_pk(resp.data[0]["id"]), "forge_domain": urlparse(ADD_FORGE_DATA_FORGE2["forge_url"]).netloc, } assert resp.data == [other_forge_request, add_forge_request] @pytest.mark.django_db(transaction=True, reset_sequences=True) def test_add_forge_request_list_pagination( api_client, regular_user, api_request_factory ): create_add_forge_request(api_client, regular_user) create_add_forge_request(api_client, regular_user, data=ADD_FORGE_DATA_FORGE2) url = reverse("api-1-add-forge-request-list", query_params={"per_page": 1}) resp = check_api_get_responses(api_client, url, 200) assert len(resp.data) == 1 request = api_request_factory.get(url) next_url = reverse( "api-1-add-forge-request-list", query_params={"page": 2, "per_page": 1}, request=request, ) assert resp["Link"] == f'<{next_url}>; rel="next"' resp = check_api_get_responses(api_client, next_url, 200) assert len(resp.data) == 1 prev_url = reverse( "api-1-add-forge-request-list", query_params={"page": 1, "per_page": 1}, request=request, ) assert resp["Link"] == f'<{prev_url}>; rel="previous"' @pytest.mark.django_db(transaction=True, reset_sequences=True) def test_add_forge_request_list_submitter_filtering( api_client, regular_user, regular_user2 ): create_add_forge_request(api_client, regular_user) create_add_forge_request(api_client, regular_user2, data=ADD_FORGE_DATA_FORGE2) api_client.force_login(regular_user) url = reverse( "api-1-add-forge-request-list", query_params={"user_requests_only": 1} ) resp = check_api_get_responses(api_client, url, status_code=200) assert len(resp.data) == 1 @pytest.mark.django_db(transaction=True, reset_sequences=True) def test_add_forge_request_get(api_client, regular_user, add_forge_moderator): resp = create_add_forge_request(api_client, regular_user) submission_date = resp.data["submission_date"] url = reverse("api-1-add-forge-request-update", url_args={"id": 1}) api_client.force_login(add_forge_moderator) check_api_post_response( api_client, url, data={"new_status": "WAITING_FOR_FEEDBACK", "text": "waiting for message"}, status_code=200, ) api_client.logout() url = 
reverse("api-1-add-forge-request-get", url_args={"id": 1}) resp = check_api_get_responses(api_client, url, status_code=200) assert resp.data == { "request": { "forge_url": ADD_FORGE_DATA_FORGE1["forge_url"], "forge_type": ADD_FORGE_DATA_FORGE1["forge_type"], "id": 1, "status": "WAITING_FOR_FEEDBACK", "submission_date": submission_date, }, "history": [ { "id": 1, "actor_role": "SUBMITTER", "date": resp.data["history"][0]["date"], "new_status": "PENDING", }, { "id": 2, "actor_role": "MODERATOR", "date": resp.data["history"][1]["date"], "new_status": "WAITING_FOR_FEEDBACK", }, ], } @pytest.mark.django_db(transaction=True, reset_sequences=True) def test_add_forge_request_get_moderator(api_client, regular_user, add_forge_moderator): resp = create_add_forge_request(api_client, regular_user) submission_date = resp.data["submission_date"] url = reverse("api-1-add-forge-request-update", url_args={"id": 1}) api_client.force_login(add_forge_moderator) check_api_post_response( api_client, url, data={"new_status": "WAITING_FOR_FEEDBACK", "text": "waiting for message"}, status_code=200, ) url = reverse("api-1-add-forge-request-get", url_args={"id": 1}) resp = check_api_get_responses(api_client, url, status_code=200) resp.data["history"] = [dict(history_item) for history_item in resp.data["history"]] assert resp.data == { "request": { **ADD_FORGE_DATA_FORGE1, "id": 1, "status": "WAITING_FOR_FEEDBACK", "submission_date": submission_date, "submitter_name": regular_user.username, "submitter_email": regular_user.email, "last_moderator": add_forge_moderator.username, "last_modified_date": resp.data["history"][1]["date"], "inbound_email_address": inbound_email_for_pk(1), "forge_domain": urlparse(ADD_FORGE_DATA_FORGE1["forge_url"]).netloc, }, "history": [ { "id": 1, "text": "", "actor": regular_user.username, "actor_role": "SUBMITTER", "date": resp.data["history"][0]["date"], "new_status": "PENDING", "message_source_url": None, }, { "id": 2, "text": "waiting for message", "actor": add_forge_moderator.username, "actor_role": "MODERATOR", "date": resp.data["history"][1]["date"], "new_status": "WAITING_FOR_FEEDBACK", "message_source_url": None, }, ], } @pytest.mark.django_db(transaction=True, reset_sequences=True) def test_add_forge_request_get_moderator_message_source( api_client, regular_user, add_forge_moderator ): resp = create_add_forge_request(api_client, regular_user) rh = RequestHistory( request=Request.objects.get(pk=resp.data["id"]), new_status="WAITING_FOR_FEEDBACK", text="waiting for message", actor=add_forge_moderator.username, actor_role="MODERATOR", message_source=b"test with a message source", ) rh.save() api_client.force_login(add_forge_moderator) url = reverse("api-1-add-forge-request-get", url_args={"id": resp.data["id"]}) resp = check_api_get_responses(api_client, url, status_code=200) resp.data["history"] = [dict(history_item) for history_item in resp.data["history"]] # Check that the authentified moderator can't urlhack non-existent message sources assert resp.data["history"][0]["message_source_url"] is None empty_message_url = reverse( "forge-add-message-source", url_args={"id": resp.data["history"][0]["id"]} ) check_http_get_response(api_client, empty_message_url, status_code=404) # Check that the authentified moderator can't urlhack non-existent message sources non_existent_message_url = reverse( "forge-add-message-source", url_args={"id": 9001} ) check_http_get_response(api_client, non_existent_message_url, status_code=404) # Check that the authentified moderator can access the message 
source when the url is # given message_source_url = resp.data["history"][-1]["message_source_url"] assert message_source_url is not None message_source_resp = check_http_get_response( api_client, message_source_url, status_code=200, content_type="text/email" ) # Check that the message source shows up as an attachment assert message_source_resp.content == rh.message_source disposition = message_source_resp["Content-Disposition"] assert disposition.startswith("attachment; filename=") assert disposition.endswith('.eml"') # Check that a regular user can't access message sources api_client.force_login(regular_user) check_http_get_response(api_client, message_source_url, status_code=302) api_client.force_login(add_forge_moderator) @pytest.mark.django_db(transaction=True, reset_sequences=True) def test_add_forge_request_get_invalid(api_client): url = reverse("api-1-add-forge-request-get", url_args={"id": 3}) check_api_get_responses(api_client, url, status_code=400) diff --git a/swh/web/tests/add_forge_now/test_app.py b/swh/web/tests/add_forge_now/test_app.py index 47cc92d8..d460c349 100644 --- a/swh/web/tests/add_forge_now/test_app.py +++ b/swh/web/tests/add_forge_now/test_app.py @@ -1,33 +1,33 @@ # Copyright (C) 2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import pytest from django.urls import get_resolver from swh.web.add_forge_now.urls import urlpatterns -from swh.web.common.utils import reverse from swh.web.tests.django_asserts import assert_not_contains -from swh.web.tests.utils import check_html_get_response +from swh.web.tests.helpers import check_html_get_response +from swh.web.utils import reverse @pytest.mark.django_db def test_add_forge_now_deactivate(client, staff_user, django_settings): """Check Add forge now feature is deactivated when the swh.web.add_forge_now django application is not in installed apps.""" django_settings.SWH_DJANGO_APPS = [ app for app in django_settings.SWH_DJANGO_APPS if app != "swh.web.add_forge_now" ] url = reverse("swh-web-homepage") client.force_login(staff_user) resp = check_html_get_response(client, url, status_code=200) assert_not_contains(resp, "swh-add-forge-now-item") assert_not_contains(resp, "swh-add-forge-now-moderation-item") add_forge_now_view_names = set(urlpattern.name for urlpattern in urlpatterns) all_view_names = set(get_resolver().reverse_dict.keys()) assert add_forge_now_view_names & all_view_names == set() diff --git a/swh/web/tests/add_forge_now/test_views.py b/swh/web/tests/add_forge_now/test_views.py index a6a8eb23..b259b7b8 100644 --- a/swh/web/tests/add_forge_now/test_views.py +++ b/swh/web/tests/add_forge_now/test_views.py @@ -1,222 +1,222 @@ # Copyright (C) 2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import json import pytest -from swh.web.common.utils import reverse -from swh.web.tests.utils import check_http_get_response +from swh.web.tests.helpers import check_http_get_response +from swh.web.utils import reverse from .test_api_views import create_add_forge_request NB_FORGE_TYPE = 2 NB_FORGES_PER_TYPE = 20 def create_add_forge_requests(client, regular_user, regular_user2): requests = [] for i in range(NB_FORGES_PER_TYPE): request = { "forge_type": 
"gitlab", "forge_url": f"https://gitlab.example{i:02d}.org", "forge_contact_email": f"admin@gitlab.example{i:02d}.org", "forge_contact_name": f"gitlab.example{i:02d}.org admin", "forge_contact_comment": "user marked as owner in forge members", } requests.append( json.loads( create_add_forge_request( client, regular_user, data=request, ).content ) ) request = { "forge_type": "gitea", "forge_url": f"https://gitea.example{i:02d}.org", "forge_contact_email": f"admin@gitea.example{i:02d}.org", "forge_contact_name": f"gitea.example{i:02d}.org admin", "forge_contact_comment": "user marked as owner in forge members", } requests.append( json.loads( create_add_forge_request( client, regular_user2, data=request, ).content ) ) return requests @pytest.mark.django_db(transaction=True, reset_sequences=True) def test_add_forge_request_list_datatables_no_parameters( client, regular_user, regular_user2 ): create_add_forge_requests(client, regular_user, regular_user2) url = reverse("add-forge-request-list-datatables") resp = check_http_get_response(client, url, status_code=200) data = json.loads(resp.content) length = 10 assert data["draw"] == 0 assert data["recordsFiltered"] == NB_FORGE_TYPE * NB_FORGES_PER_TYPE assert data["recordsTotal"] == NB_FORGE_TYPE * NB_FORGES_PER_TYPE assert len(data["data"]) == length # default ordering is by descending id assert data["data"][0]["id"] == NB_FORGE_TYPE * NB_FORGES_PER_TYPE assert data["data"][-1]["id"] == NB_FORGE_TYPE * NB_FORGES_PER_TYPE - length + 1 assert "submitter_name" not in data["data"][0] @pytest.mark.django_db(transaction=True, reset_sequences=True) def test_add_forge_request_list_datatables( client, regular_user, regular_user2, add_forge_moderator ): create_add_forge_requests(client, regular_user, regular_user2) length = 10 url = reverse( "add-forge-request-list-datatables", query_params={"draw": 1, "length": length, "start": 0}, ) client.force_login(regular_user) resp = check_http_get_response(client, url, status_code=200) data = json.loads(resp.content) assert data["draw"] == 1 assert data["recordsFiltered"] == NB_FORGE_TYPE * NB_FORGES_PER_TYPE assert data["recordsTotal"] == NB_FORGE_TYPE * NB_FORGES_PER_TYPE assert len(data["data"]) == length # default ordering is by descending id assert data["data"][0]["id"] == NB_FORGE_TYPE * NB_FORGES_PER_TYPE assert data["data"][-1]["id"] == NB_FORGE_TYPE * NB_FORGES_PER_TYPE - length + 1 assert "submitter_name" not in data["data"][0] client.force_login(add_forge_moderator) resp = check_http_get_response(client, url, status_code=200) data = json.loads(resp.content) assert data["draw"] == 1 assert data["recordsFiltered"] == NB_FORGE_TYPE * NB_FORGES_PER_TYPE assert data["recordsTotal"] == NB_FORGE_TYPE * NB_FORGES_PER_TYPE assert len(data["data"]) == length # default ordering is by descending id assert data["data"][0]["id"] == NB_FORGE_TYPE * NB_FORGES_PER_TYPE assert data["data"][-1]["id"] == NB_FORGE_TYPE * NB_FORGES_PER_TYPE - length + 1 assert "submitter_name" in data["data"][0] assert "last_moderator" in data["data"][0] assert "last_modified_date" in data["data"][0] return data @pytest.mark.django_db(transaction=True, reset_sequences=True) @pytest.mark.parametrize("order_field", ["forge_url", "last_modified_date"]) def test_add_forge_request_list_datatables_ordering( client, add_forge_moderator, admin_user, order_field ): requests = create_add_forge_requests(client, add_forge_moderator, admin_user) requests_sorted = list(sorted(requests, key=lambda d: d[order_field])) forge_urls_asc = [request[order_field] 
for request in requests_sorted] forge_urls_desc = list(reversed(forge_urls_asc)) length = 10 client.force_login(admin_user) for direction in ("asc", "desc"): for i in range(4): url = reverse( "add-forge-request-list-datatables", query_params={ "draw": 1, "length": length, "start": i * length, "order[0][column]": 2, "order[0][dir]": direction, "columns[2][name]": order_field, }, ) resp = check_http_get_response(client, url, status_code=200) data = json.loads(resp.content) assert data["draw"] == 1 assert data["recordsFiltered"] == NB_FORGE_TYPE * NB_FORGES_PER_TYPE assert data["recordsTotal"] == NB_FORGE_TYPE * NB_FORGES_PER_TYPE assert len(data["data"]) == length page_forge_urls = [request[order_field] for request in data["data"]] if direction == "asc": expected_forge_urls = forge_urls_asc[i * length : (i + 1) * length] else: expected_forge_urls = forge_urls_desc[i * length : (i + 1) * length] assert page_forge_urls == expected_forge_urls @pytest.mark.django_db(transaction=True, reset_sequences=True) def test_add_forge_request_list_datatables_search(client, regular_user, regular_user2): create_add_forge_requests(client, regular_user, regular_user2) url = reverse( "add-forge-request-list-datatables", query_params={ "draw": 1, "length": NB_FORGES_PER_TYPE, "start": 0, "search[value]": "gitlab", }, ) client.force_login(regular_user) resp = check_http_get_response(client, url, status_code=200) data = json.loads(resp.content) assert data["draw"] == 1 assert data["recordsFiltered"] == NB_FORGES_PER_TYPE assert data["recordsTotal"] == NB_FORGE_TYPE * NB_FORGES_PER_TYPE assert len(data["data"]) == NB_FORGES_PER_TYPE page_forge_type = [request["forge_type"] for request in data["data"]] assert page_forge_type == ["gitlab"] * NB_FORGES_PER_TYPE @pytest.mark.django_db(transaction=True, reset_sequences=True) def test_add_forge_request_list_datatables_user_requests( client, regular_user, regular_user2 ): create_add_forge_requests(client, regular_user, regular_user2) url = reverse( "add-forge-request-list-datatables", query_params={ "draw": 1, "length": NB_FORGES_PER_TYPE * NB_FORGE_TYPE, "start": 0, "user_requests_only": 1, }, ) client.force_login(regular_user2) resp = check_http_get_response(client, url, status_code=200) data = json.loads(resp.content) assert data["draw"] == 1 assert data["recordsFiltered"] == NB_FORGES_PER_TYPE assert data["recordsTotal"] == NB_FORGE_TYPE * NB_FORGES_PER_TYPE assert len(data["data"]) == NB_FORGES_PER_TYPE page_forge_type = [request["forge_type"] for request in data["data"]] assert page_forge_type == ["gitea"] * NB_FORGES_PER_TYPE diff --git a/swh/web/tests/api/test_api_lookup.py b/swh/web/tests/api/test_api_lookup.py index 03b66850..4b1dc7ec 100644 --- a/swh/web/tests/api/test_api_lookup.py +++ b/swh/web/tests/api/test_api_lookup.py @@ -1,123 +1,123 @@ # Copyright (C) 2015-2019 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import pytest from swh.web.api.views import utils -from swh.web.common.exc import NotFoundExc +from swh.web.utils.exc import NotFoundExc def test_genericapi_lookup_nothing_is_found(): def test_generic_lookup_fn(sha1, another_unused_arg): assert another_unused_arg == "unused_arg" assert sha1 == "sha1" return None notfound_msg = "This will be raised because None is returned." 
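    # api_lookup is expected to turn the None result of the lookup function
    # into a NotFoundExc carrying the provided message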
with pytest.raises(NotFoundExc) as e: utils.api_lookup( test_generic_lookup_fn, "sha1", "unused_arg", notfound_msg=notfound_msg ) assert e.match(notfound_msg) def test_generic_api_map_are_enriched_and_transformed_to_list(): def test_generic_lookup_fn_1(criteria0, param0, param1): assert criteria0 == "something" return map(lambda x: x + 1, [1, 2, 3]) actual_result = utils.api_lookup( test_generic_lookup_fn_1, "something", "some param 0", "some param 1", notfound_msg=( "This is not the error message you are looking for. " "Move along." ), enrich_fn=lambda x, request: x * 2, ) assert actual_result == [4, 6, 8] def test_generic_api_list_are_enriched_too(): def test_generic_lookup_fn_2(crit): assert crit == "something" return ["a", "b", "c"] actual_result = utils.api_lookup( test_generic_lookup_fn_2, "something", notfound_msg=( "Not the error message you are looking for, it is. " "Along, you move!" ), enrich_fn=lambda x, request: "".join(["=", x, "="]), ) assert actual_result == ["=a=", "=b=", "=c="] def test_generic_api_generator_are_enriched_and_returned_as_list(): def test_generic_lookup_fn_3(crit): assert crit == "crit" return (i for i in [4, 5, 6]) actual_result = utils.api_lookup( test_generic_lookup_fn_3, "crit", notfound_msg="Move!", enrich_fn=lambda x, request: x - 1, ) assert actual_result == [3, 4, 5] def test_generic_api_simple_data_are_enriched_and_returned_too(): def test_generic_lookup_fn_4(crit): assert crit == "123" return {"a": 10} def test_enrich_data(x, request): x["a"] = x["a"] * 10 return x actual_result = utils.api_lookup( test_generic_lookup_fn_4, "123", notfound_msg="Nothing to do", enrich_fn=test_enrich_data, ) assert actual_result == {"a": 100} def test_api_lookup_not_found(): notfound_msg = "this is the error message raised as it is None" with pytest.raises(NotFoundExc) as e: utils.api_lookup(lambda x: None, "something", notfound_msg=notfound_msg) assert e.match(notfound_msg) def test_api_lookup_with_result(): actual_result = utils.api_lookup( lambda x: x + "!", "something", notfound_msg="this is the error which won't be used here", ) assert actual_result == "something!" def test_api_lookup_with_result_as_map(): actual_result = utils.api_lookup( lambda x: map(lambda y: y + 1, x), [1, 2, 3], notfound_msg="this is the error which won't be used here", ) assert actual_result == [2, 3, 4] diff --git a/swh/web/tests/api/test_apidoc.py b/swh/web/tests/api/test_apidoc.py index 67c17972..e5259773 100644 --- a/swh/web/tests/api/test_apidoc.py +++ b/swh/web/tests/api/test_apidoc.py @@ -1,481 +1,481 @@ # Copyright (C) 2015-2019 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import textwrap import pytest from rest_framework.response import Response from swh.storage.exc import StorageAPIError, StorageDBError from swh.web.api.apidoc import _parse_httpdomain_doc, api_doc from swh.web.api.apiurls import api_route -from swh.web.common.exc import BadInputExc, ForbiddenExc, NotFoundExc -from swh.web.common.utils import prettify_html, reverse -from swh.web.tests.utils import check_api_get_responses, check_html_get_response +from swh.web.tests.helpers import check_api_get_responses, check_html_get_response +from swh.web.utils import prettify_html, reverse +from swh.web.utils.exc import BadInputExc, ForbiddenExc, NotFoundExc _httpdomain_doc = """ .. 
http:get:: /api/1/revision/(sha1_git)/

    Get information about a revision in the archive.
    Revisions are identified by **sha1** checksums, compatible with Git commit
    identifiers.
    See :func:`swh.model.git_objects.revision_git_object` in our data model
    module for details about how they are computed.

    :param string sha1_git: hexadecimal representation of the revision
        **sha1_git** identifier

    :reqheader Accept: the requested response content type,
        either ``application/json`` (default) or ``application/yaml``
    :resheader Content-Type: this depends on :http:header:`Accept` header
        of request

    :>json object author: information about the author of the revision
    :>json object committer: information about the committer of the revision
    :>json string committer_date: RFC3339 representation of the commit date
    :>json string date: RFC3339 representation of the revision date
    :>json string directory: the unique identifier that revision points to
    :>json string directory_url: link to
        :http:get:`/api/1/directory/(sha1_git)/[(path)/]` to get information
        about the directory associated to the revision
    :>json string id: the revision unique identifier
    :>json boolean merge: whether or not the revision corresponds to a merge
        commit
    :>json string message: the message associated to the revision
    :>json array parents: the parents of the revision, i.e. the previous
        revisions that head directly to it, each entry of that array contains
        an unique parent revision identifier but also a link to
        :http:get:`/api/1/revision/(sha1_git)/` to get more information
        about it
    :>json string type: the type of the revision

    :statuscode 200: no error
    :statuscode 400: an invalid **sha1_git** value has been provided
    :statuscode 404: requested revision can not be found in the archive

    **Example:**

    .. parsed-literal::

        :swh_web_api:`revision/aafb16d69fd30ff58afdd69036a26047f3aebdc6/`
"""

_exception_http_code = {
    BadInputExc: 400,
    ForbiddenExc: 403,
    NotFoundExc: 404,
    Exception: 500,
    StorageAPIError: 503,
    StorageDBError: 503,
}


def test_apidoc_nodoc_failure():
    with pytest.raises(Exception):

        @api_doc("/my/nodoc/url/")
        def apidoc_nodoc_tester(request, arga=0, argb=0):
            return Response(arga + argb)


@api_route(r"/some/(?P<myarg>[0-9]+)/(?P<myotherarg>[0-9]+)/", "api-1-some-doc-route")
@api_doc("/some/doc/route/")
def apidoc_route(request, myarg, myotherarg, akw=0):
    """
    Sample doc
    """
    return {"result": int(myarg) + int(myotherarg) + akw}


def test_apidoc_route_doc(client):
    url = reverse("api-1-some-doc-route-doc")
    check_html_get_response(client, url, status_code=200, template_used="apidoc.html")


def test_apidoc_route_fn(api_client):
    url = reverse("api-1-some-doc-route", url_args={"myarg": 1, "myotherarg": 1})
    check_api_get_responses(api_client, url, status_code=200)


@api_route(r"/test/error/(?P<exc_name>.+)/", "api-1-test-error")
@api_doc("/test/error/")
def apidoc_test_error_route(request, exc_name):
    """
    Sample doc
    """
    for e in _exception_http_code.keys():
        if e.__name__ == exc_name:
            raise e("Error")


def test_apidoc_error(api_client):
    for exc, code in _exception_http_code.items():
        url = reverse("api-1-test-error", url_args={"exc_name": exc.__name__})
        check_api_get_responses(api_client, url, status_code=code)


@api_route(
    r"/some/full/(?P<myarg>[0-9]+)/(?P<myotherarg>[0-9]+)/",
    "api-1-some-complete-doc-route",
)
@api_doc("/some/complete/doc/route/")
def apidoc_full_stack(request, myarg, myotherarg, akw=0):
    """
    Sample doc
    """
    return {"result": int(myarg) + int(myotherarg) + akw}


def test_apidoc_full_stack_doc(client):
    url = reverse("api-1-some-complete-doc-route-doc")
    check_html_get_response(client, url,
status_code=200, template_used="apidoc.html") def test_apidoc_full_stack_fn(api_client): url = reverse( "api-1-some-complete-doc-route", url_args={"myarg": 1, "myotherarg": 1} ) check_api_get_responses(api_client, url, status_code=200) @api_route(r"/test/post/only/", "api-1-test-post-only", methods=["POST"]) @api_doc("/test/post/only/") def apidoc_test_post_only(request, exc_name): """ Sample doc """ return {"result": "some data"} def test_apidoc_post_only(client): # a dedicated view accepting GET requests should have # been created to display the HTML documentation url = reverse("api-1-test-post-only-doc") check_html_get_response(client, url, status_code=200, template_used="apidoc.html") def test_api_doc_parse_httpdomain(): doc_data = { "description": "", "urls": [], "args": [], "params": [], "resheaders": [], "reqheaders": [], "input_type": "", "inputs": [], "return_type": "", "returns": [], "status_codes": [], "examples": [], } _parse_httpdomain_doc(_httpdomain_doc, doc_data) expected_urls = [ { "rule": "/api/1/revision/ **\\(sha1_git\\)** /", "methods": ["GET", "HEAD", "OPTIONS"], } ] assert "urls" in doc_data assert doc_data["urls"] == expected_urls expected_description = ( "Get information about a revision in the archive. " "Revisions are identified by **sha1** checksums, " "compatible with Git commit identifiers. See " "**swh.model.git_objects.revision_git_object** in " "our data model module for details about how they " "are computed." ) assert "description" in doc_data assert doc_data["description"] == expected_description expected_args = [ { "name": "sha1_git", "type": "string", "doc": ( "hexadecimal representation of the revision " "**sha1_git** identifier" ), } ] assert "args" in doc_data assert doc_data["args"] == expected_args expected_params = [] assert "params" in doc_data assert doc_data["params"] == expected_params expected_reqheaders = [ { "doc": ( "the requested response content type, either " "``application/json`` (default) or ``application/yaml``" ), "name": "Accept", } ] assert "reqheaders" in doc_data assert doc_data["reqheaders"] == expected_reqheaders expected_resheaders = [ {"doc": "this depends on **Accept** header of request", "name": "Content-Type"} ] assert "resheaders" in doc_data assert doc_data["resheaders"] == expected_resheaders expected_statuscodes = [ {"code": "200", "doc": "no error"}, {"code": "400", "doc": "an invalid **sha1_git** value has been provided"}, {"code": "404", "doc": "requested revision can not be found in the archive"}, ] assert "status_codes" in doc_data assert doc_data["status_codes"] == expected_statuscodes expected_input_type = "object" assert "input_type" in doc_data assert doc_data["input_type"] == expected_input_type expected_inputs = [ {"name": "n", "type": "int", "doc": "sample input integer"}, {"name": "s", "type": "string", "doc": "sample input string"}, {"name": "a", "type": "array", "doc": "sample input array"}, ] assert "inputs" in doc_data assert doc_data["inputs"] == expected_inputs expected_return_type = "object" assert "return_type" in doc_data assert doc_data["return_type"] == expected_return_type expected_returns = [ { "name": "author", "type": "object", "doc": "information about the author of the revision", }, { "name": "committer", "type": "object", "doc": "information about the committer of the revision", }, { "name": "committer_date", "type": "string", "doc": "RFC3339 representation of the commit date", }, { "name": "date", "type": "string", "doc": "RFC3339 representation of the revision date", }, { "name": 
"directory", "type": "string", "doc": "the unique identifier that revision points to", }, { "name": "directory_url", "type": "string", "doc": ( "link to `/api/1/directory/ `_ " "to get information about the directory associated to " "the revision" ), }, {"name": "id", "type": "string", "doc": "the revision unique identifier"}, { "name": "merge", "type": "boolean", "doc": "whether or not the revision corresponds to a merge commit", }, { "name": "message", "type": "string", "doc": "the message associated to the revision", }, { "name": "parents", "type": "array", "doc": ( "the parents of the revision, i.e. the previous revisions " "that head directly to it, each entry of that array " "contains an unique parent revision identifier but also a " "link to `/api/1/revision/ `_ " "to get more information about it" ), }, {"name": "type", "type": "string", "doc": "the type of the revision"}, ] assert "returns" in doc_data assert doc_data["returns"] == expected_returns expected_examples = ["/api/1/revision/aafb16d69fd30ff58afdd69036a26047f3aebdc6/"] assert "examples" in doc_data assert doc_data["examples"] == expected_examples @api_route(r"/post/endpoint/", "api-1-post-endpoint", methods=["POST"]) @api_doc("/post/endpoint/") def apidoc_test_post_endpoint(request): """ .. http:post:: /api/1/post/endpoint/ Endpoint documentation :json object : an object whose keys are input SWHIDs and values objects with the following keys: * **known (bool)**: whether the object was found """ pass def test_apidoc_input_output_doc(client): url = reverse("api-1-post-endpoint-doc") rv = check_html_get_response( client, url, status_code=200, template_used="apidoc.html" ) input_html_doc = textwrap.indent( ( '
    input_html_doc = textwrap.indent(
        (
            '<dl class="row">\n'
            ' <dt class="col col-md-2 text-end">\n'
            "  array\n"
            " </dt>\n"
            ' <dd class="col col-md-9">\n'
            "  <p>\n"
            "   Input array of SWHIDs\n"
            "  </p>\n"
            " </dd>\n"
            "</dl>\n"
        ),
        " " * 7,
    )
    output_html_doc = textwrap.indent(
        (
            '<dl class="row">\n'
            ' <dt class="col col-md-2 text-end">\n'
            "  object\n"
            " </dt>\n"
            ' <dd class="col col-md-9">\n'
            "  <p>\n"
            "   an object containing the following keys:\n"
            "  </p>\n"
            '  <div class="swh-rst">\n'
            "   <blockquote>\n"
            "    <ul>\n"
            "     <li>\n"
            "      <p>\n"
            "       <strong>\n"
            "        &lt;swhid&gt; (object)\n"
            "       </strong>\n"
            "       : an object whose keys are input SWHIDs"
            " and values objects with the following keys:\n"
            "      </p>\n"
            "      <blockquote>\n"
            '       <ul class="simple">\n'
            "        <li>\n"
            "         <p>\n"
            "          <strong>\n"
            "           known (bool)\n"
            "          </strong>\n"
            "          : whether the object was found\n"
            "         </p>\n"
            "        </li>\n"
            "       </ul>\n"
            "      </blockquote>\n"
            "     </li>\n"
            "    </ul>\n"
            "   </blockquote>\n"
            "  </div>\n"
            " </dd>\n"
            "</dl>\n"
        ),
        " " * 7,
    )
html = prettify_html(rv.content) assert input_html_doc in html assert output_html_doc in html @api_route(r"/endpoint/links/in/doc/", "api-1-endpoint-links-in-doc") @api_doc("/endpoint/links/in/doc/") def apidoc_test_endpoint_with_links_in_doc(request): """ .. http:get:: /api/1/post/endpoint/ Endpoint documentation with links to :http:get:`/api/1/content/[(hash_type):](hash)/`, :http:get:`/api/1/directory/(sha1_git)/[(path)/]` and `archive <https://archive.softwareheritage.org>`_. """ pass def test_apidoc_with_links(client): url = reverse("api-1-endpoint-links-in-doc") rv = check_html_get_response( client, url, status_code=200, template_used="apidoc.html" ) html = prettify_html(rv.content) first_link = textwrap.indent( ( '<a class="reference external" href="/api/1/content/doc/">\n' " /api/1/content/\n" "</a>" ), " " * 9, ) second_link = textwrap.indent( ( '<a class="reference external" href="/api/1/directory/doc/">\n' " /api/1/directory/\n" "</a>" ), " " * 9, ) third_link = textwrap.indent( ( '<a class="reference external" href="https://archive.softwareheritage.org">\n' " archive\n" "</a>" ), " " * 9, ) assert first_link in html assert second_link in html assert third_link in html diff --git a/swh/web/tests/api/test_apiresponse.py b/swh/web/tests/api/test_apiresponse.py index 998a3535..d0b1e445 100644 --- a/swh/web/tests/api/test_apiresponse.py +++ b/swh/web/tests/api/test_apiresponse.py @@ -1,187 +1,187 @@ # Copyright (C) 2015-2021 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import json from corsheaders.middleware import ( ACCESS_CONTROL_ALLOW_HEADERS, ACCESS_CONTROL_ALLOW_METHODS, ACCESS_CONTROL_ALLOW_ORIGIN, ) from swh.model.swhids import ObjectType from swh.web.api.apiresponse import ( compute_link_header, filter_by_fields, make_api_response, transform, ) -from swh.web.common.identifiers import gen_swhid -from swh.web.common.utils import reverse from swh.web.tests.django_asserts import assert_contains -from swh.web.tests.utils import check_http_get_response, check_http_post_response +from swh.web.tests.helpers import check_http_get_response, check_http_post_response +from swh.web.utils import reverse +from swh.web.utils.identifiers import gen_swhid def test_compute_link_header(): next_link = "/api/endpoint/next" prev_link = "/api/endpoint/prev" rv = { "headers": {"link-next": next_link, "link-prev": prev_link}, "results": [1, 2, 3], } options = {} headers = compute_link_header(rv, options) assert headers == { "Link": (f'<{next_link}>; rel="next",' f'<{prev_link}>; rel="previous"') } def test_compute_link_header_nothing_changed(): rv = {} options = {} headers = compute_link_header(rv, options) assert headers == {} def test_compute_link_header_nothing_changed_2(): rv = {"headers": {}} options = {} headers = compute_link_header(rv, options) assert headers == {} def test_transform_only_return_results_1(): rv = {"results": {"some-key": "some-value"}} assert transform(rv) == {"some-key": "some-value"} def test_transform_only_return_results_2(): rv = {"headers": {"something": "do changes"}, "results": {"some-key": "some-value"}} assert transform(rv) == {"some-key": "some-value"} def test_transform_do_remove_headers(): rv = {"headers": {"something": "do changes"}, "some-key": "some-value"} assert transform(rv) == {"some-key": "some-value"} def test_transform_do_nothing(): rv = {"some-key": "some-value"} assert transform(rv) == {"some-key": "some-value"}
def test_swh_multi_response_mimetype(mocker, api_request_factory): mock_shorten_path = mocker.patch("swh.web.api.apiresponse.shorten_path") mock_filter = mocker.patch("swh.web.api.apiresponse.filter_by_fields") mock_json = mocker.patch("swh.web.api.apiresponse.json") data = {"data": [12, 34], "id": "adc83b19e793491b1c6ea0fd8b46cd9f32e592fc"} mock_filter.return_value = data mock_shorten_path.return_value = "my_short_path" mock_json.dumps.return_value = json.dumps(data) accepted_response_formats = { "html": "text/html", "yaml": "application/yaml", "json": "application/json", } for resp_format in accepted_response_formats: request = api_request_factory.get("/api/test/path/") content_type = accepted_response_formats[resp_format] setattr(request, "accepted_media_type", content_type) rv = make_api_response(request, data) mock_filter.assert_called_with(request, data) if resp_format != "html": assert rv.status_code == 200, rv.data assert rv.data == data else: assert rv.status_code == 200, rv.content assert_contains(rv, json.dumps(data)) def test_swh_filter_renderer_do_nothing(api_request_factory): input_data = {"a": "some-data"} request = api_request_factory.get("/api/test/path/", data={}) setattr(request, "query_params", request.GET) actual_data = filter_by_fields(request, input_data) assert actual_data == input_data def test_swh_filter_renderer_do_filter(mocker, api_request_factory): mock_ffk = mocker.patch("swh.web.api.apiresponse.utils.filter_field_keys") mock_ffk.return_value = {"a": "some-data"} request = api_request_factory.get("/api/test/path/", data={"fields": "a,c"}) setattr(request, "query_params", request.GET) input_data = {"a": "some-data", "b": "some-other-data"} actual_data = filter_by_fields(request, input_data) assert actual_data == {"a": "some-data"} mock_ffk.assert_called_once_with(input_data, {"a", "c"}) def test_error_response_handler(mocker, api_client): mock_archive = mocker.patch("swh.web.api.views.stat.archive") mock_archive.stat_counters.side_effect = Exception("Something went wrong") url = reverse("api-1-stat-counters") resp = api_client.get(url) assert resp.status_code == 500 assert "traceback" in resp.data assert "Traceback" in resp.data["traceback"] def test_api_endpoints_have_cors_headers(client, content, directory, revision): url = reverse("api-1-stat-counters") resp = check_http_get_response( client, url, status_code=200, http_origin="https://example.org" ) assert ACCESS_CONTROL_ALLOW_ORIGIN in resp swhids = [ gen_swhid(ObjectType.CONTENT, content["sha1_git"]), gen_swhid(ObjectType.DIRECTORY, directory), gen_swhid(ObjectType.REVISION, revision), ] url = reverse("api-1-known") ac_request_method = "POST" ac_request_headers = "Content-Type" resp = client.options( url, HTTP_ORIGIN="https://example.org", HTTP_ACCESS_CONTROL_REQUEST_METHOD=ac_request_method, HTTP_ACCESS_CONTROL_REQUEST_HEADERS=ac_request_headers, ) assert resp.status_code == 200 assert ACCESS_CONTROL_ALLOW_ORIGIN in resp assert ACCESS_CONTROL_ALLOW_METHODS in resp assert ac_request_method in resp[ACCESS_CONTROL_ALLOW_METHODS] assert ACCESS_CONTROL_ALLOW_HEADERS in resp assert ac_request_headers.lower() in resp[ACCESS_CONTROL_ALLOW_HEADERS] resp = check_http_post_response( client, url, data=swhids, status_code=200, http_origin="https://example.org" ) assert ACCESS_CONTROL_ALLOW_ORIGIN in resp
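# Added note (hedged): the OPTIONS round-trip above is a CORS preflight as
# handled by django-cors-headers: the browser first sends Origin,
# Access-Control-Request-Method and Access-Control-Request-Headers, and the
# middleware must answer with matching Access-Control-Allow-* headers before
# the actual cross-origin POST is issued.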
diff --git a/swh/web/tests/api/test_apiurls.py b/swh/web/tests/api/test_apiurls.py index bd8d011b..b2b9fa47 100644 --- a/swh/web/tests/api/test_apiurls.py +++ b/swh/web/tests/api/test_apiurls.py @@ -1,57 +1,57 @@ # Copyright (C) 2020 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from swh.web.api.apiurls import api_route -from swh.web.common.utils import reverse -from swh.web.tests.utils import check_api_get_responses +from swh.web.tests.helpers import check_api_get_responses +from swh.web.utils import reverse @api_route(r"/some/route/(?P<int_arg>[0-9]+)/", "api-1-some-route") def api_some_route(request, int_arg): return {"result": int(int_arg)} @api_route( r"/never/cache/route/(?P<int_arg>[0-9]+)/", "api-1-never-cache-route", never_cache=True, ) def api_never_cache_route(request, int_arg): return {"result": int(int_arg)} @api_route( r"/never/cache/route/error/", "api-1-never-cache-route-with-error", never_cache=True, ) def api_never_cache_route_with_error(request): raise Exception("error") def test_api_route_with_cache(api_client): url = reverse("api-1-some-route", url_args={"int_arg": 1}) resp = check_api_get_responses(api_client, url, status_code=200) assert resp.data == {"result": 1} assert "Cache-Control" not in resp _cache_control = "max-age=0, no-cache, no-store, must-revalidate" def test_api_route_never_cache(api_client): url = reverse("api-1-never-cache-route", url_args={"int_arg": 1}) resp = check_api_get_responses(api_client, url, status_code=200) assert resp.data == {"result": 1} assert "Cache-Control" in resp assert resp["Cache-Control"].startswith(_cache_control) def test_api_route_never_cache_with_error(api_client): url = reverse("api-1-never-cache-route-with-error") resp = check_api_get_responses(api_client, url, status_code=500) assert "Cache-Control" in resp assert resp["Cache-Control"].startswith(_cache_control) diff --git a/swh/web/tests/api/test_throttling.py b/swh/web/tests/api/test_throttling.py index c0fef1f0..748642ae 100644 --- a/swh/web/tests/api/test_throttling.py +++ b/swh/web/tests/api/test_throttling.py @@ -1,230 +1,230 @@ # Copyright (C) 2017-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import pytest from django.test.utils import override_settings from django.urls import re_path as url from rest_framework.decorators import api_view from rest_framework.response import Response from rest_framework.views import APIView from swh.web.api.throttling import ( API_THROTTLING_EXEMPTED_PERM, SwhWebRateThrottle, SwhWebUserRateThrottle, throttle_scope, ) from swh.web.settings.tests import ( scope1_limiter_rate, scope1_limiter_rate_post, scope2_limiter_rate, scope2_limiter_rate_post, scope3_limiter_rate, scope3_limiter_rate_post, ) -from swh.web.tests.utils import create_django_permission +from swh.web.tests.helpers import create_django_permission from swh.web.urls import urlpatterns class MockViewScope1(APIView): throttle_classes = (SwhWebRateThrottle,) throttle_scope = "scope1" def get(self, request): return Response("foo_get") def post(self, request): return Response("foo_post") @api_view(["GET", "POST"]) @throttle_scope("scope2") def mock_view_scope2(request): if request.method == "GET": return Response("bar_get") elif request.method == "POST": return Response("bar_post") class MockViewScope3(APIView): throttle_classes = (SwhWebRateThrottle,) throttle_scope = "scope3" def get(self, request): return Response("foo_get") def post(self, request): return Response("foo_post") @api_view(["GET", "POST"]) @throttle_scope("scope3") def mock_view_scope3(request): if request.method == "GET": return Response("bar_get") elif request.method == "POST": return Response("bar_post")
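# Added note (hedged): a view opts into rate limiting either through the
# throttle_scope class attribute (MockViewScope1/3 above) or the
# @throttle_scope("...") decorator, and each scope gets its request budget
# from the throttling section of the web application configuration,
# conceptually something like:
#
#     throttling:
#         scopes:
#             scope1:
#                 limiter_rate:
#                     default: 3/min    # illustrative rate, not a real value
#
# The scopeN_limiter_rate values imported from swh.web.settings.tests are the
# concrete budgets the assertions below count against.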
== "GET": return Response("bar_get") elif request.method == "POST": return Response("bar_post") urlpatterns += [ url(r"^scope1_class/$", MockViewScope1.as_view()), url(r"^scope2_func/$", mock_view_scope2), url(r"^scope3_class/$", MockViewScope3.as_view()), url(r"^scope3_func/$", mock_view_scope3), ] def check_response(response, status_code, limit=None, remaining=None): assert response.status_code == status_code if limit is not None: assert response["X-RateLimit-Limit"] == str(limit) else: assert "X-RateLimit-Limit" not in response if remaining is not None: assert response["X-RateLimit-Remaining"] == str(remaining) else: assert "X-RateLimit-Remaining" not in response @override_settings(ROOT_URLCONF=__name__) def test_scope1_requests_are_throttled(api_client): """ Ensure request rate is limited in scope1 """ for i in range(scope1_limiter_rate): response = api_client.get("/scope1_class/") check_response(response, 200, scope1_limiter_rate, scope1_limiter_rate - i - 1) response = api_client.get("/scope1_class/") check_response(response, 429, scope1_limiter_rate, 0) for i in range(scope1_limiter_rate_post): response = api_client.post("/scope1_class/") check_response( response, 200, scope1_limiter_rate_post, scope1_limiter_rate_post - i - 1 ) response = api_client.post("/scope1_class/") check_response(response, 429, scope1_limiter_rate_post, 0) @override_settings(ROOT_URLCONF=__name__) def test_scope2_requests_are_throttled(api_client): """ Ensure request rate is limited in scope2 """ for i in range(scope2_limiter_rate): response = api_client.get("/scope2_func/") check_response(response, 200, scope2_limiter_rate, scope2_limiter_rate - i - 1) response = api_client.get("/scope2_func/") check_response(response, 429, scope2_limiter_rate, 0) for i in range(scope2_limiter_rate_post): response = api_client.post("/scope2_func/") check_response( response, 200, scope2_limiter_rate_post, scope2_limiter_rate_post - i - 1 ) response = api_client.post("/scope2_func/") check_response(response, 429, scope2_limiter_rate_post, 0) @override_settings(ROOT_URLCONF=__name__) def test_scope3_requests_are_throttled_exempted(api_client): """ Ensure request rate is not limited in scope3 as requests coming from localhost are exempted from rate limit. 
""" for _ in range(scope3_limiter_rate + 1): response = api_client.get("/scope3_class/") check_response(response, 200) for _ in range(scope3_limiter_rate_post + 1): response = api_client.post("/scope3_class/") check_response(response, 200) for _ in range(scope3_limiter_rate + 1): response = api_client.get("/scope3_func/") check_response(response, 200) for _ in range(scope3_limiter_rate_post + 1): response = api_client.post("/scope3_func/") check_response(response, 200) @override_settings(ROOT_URLCONF=__name__) @pytest.mark.django_db def test_staff_users_are_not_rate_limited(api_client, staff_user): api_client.force_login(staff_user) for _ in range(scope2_limiter_rate + 1): response = api_client.get("/scope2_func/") check_response(response, 200) for _ in range(scope2_limiter_rate_post + 1): response = api_client.post("/scope2_func/") check_response(response, 200) @override_settings(ROOT_URLCONF=__name__) @pytest.mark.django_db def test_non_staff_users_are_rate_limited(api_client, regular_user): api_client.force_login(regular_user) scope2_limiter_rate_user = ( scope2_limiter_rate * SwhWebUserRateThrottle.NUM_REQUESTS_FACTOR ) for i in range(scope2_limiter_rate_user): response = api_client.get("/scope2_func/") check_response( response, 200, scope2_limiter_rate_user, scope2_limiter_rate_user - i - 1 ) response = api_client.get("/scope2_func/") check_response(response, 429, scope2_limiter_rate_user, 0) scope2_limiter_rate_post_user = ( scope2_limiter_rate_post * SwhWebUserRateThrottle.NUM_REQUESTS_FACTOR ) for i in range(scope2_limiter_rate_post_user): response = api_client.post("/scope2_func/") check_response( response, 200, scope2_limiter_rate_post_user, scope2_limiter_rate_post_user - i - 1, ) response = api_client.post("/scope2_func/") check_response(response, 429, scope2_limiter_rate_post_user, 0) @override_settings(ROOT_URLCONF=__name__) @pytest.mark.django_db def test_users_with_throttling_exempted_perm_are_not_rate_limited( api_client, regular_user ): regular_user.user_permissions.add( create_django_permission(API_THROTTLING_EXEMPTED_PERM) ) assert regular_user.has_perm(API_THROTTLING_EXEMPTED_PERM) api_client.force_login(regular_user) for _ in range(scope2_limiter_rate + 1): response = api_client.get("/scope2_func/") check_response(response, 200) for _ in range(scope2_limiter_rate_post + 1): response = api_client.post("/scope2_func/") check_response(response, 200) diff --git a/swh/web/tests/api/test_utils.py b/swh/web/tests/api/test_utils.py index e95b12a5..55ae2960 100644 --- a/swh/web/tests/api/test_utils.py +++ b/swh/web/tests/api/test_utils.py @@ -1,614 +1,614 @@ # Copyright (C) 2015-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import random from swh.model.hashutil import DEFAULT_ALGORITHMS from swh.model.model import Origin from swh.web.api import utils -from swh.web.common.origin_visits import get_origin_visits -from swh.web.common.utils import resolve_branch_alias, reverse +from swh.web.utils import resolve_branch_alias, reverse +from swh.web.utils.origin_visits import get_origin_visits url_map = [ { "rule": "/other/", "methods": set(["GET", "POST", "HEAD"]), "endpoint": "foo", }, { "rule": "/some/old/url/", "methods": set(["GET", "POST"]), "endpoint": "blablafn", }, { "rule": "/other/old/url/", "methods": set(["GET", "HEAD"]), "endpoint": "bar", }, {"rule": "/other", "methods": set([]), 
"endpoint": None}, {"rule": "/other2", "methods": set([]), "endpoint": None}, ] def test_filter_field_keys_dict_unknown_keys(): actual_res = utils.filter_field_keys( {"directory": 1, "file": 2, "link": 3}, {"directory1", "file2"} ) assert actual_res == {} def test_filter_field_keys_dict(): actual_res = utils.filter_field_keys( {"directory": 1, "file": 2, "link": 3}, {"directory", "link"} ) assert actual_res == {"directory": 1, "link": 3} def test_filter_field_keys_list_unknown_keys(): actual_res = utils.filter_field_keys( [{"directory": 1, "file": 2, "link": 3}, {"1": 1, "2": 2, "link": 3}], {"d"} ) assert actual_res == [{}, {}] def test_filter_field_keys_map(): actual_res = utils.filter_field_keys( map( lambda x: {"i": x["i"] + 1, "j": x["j"]}, [{"i": 1, "j": None}, {"i": 2, "j": None}, {"i": 3, "j": None}], ), {"i"}, ) assert list(actual_res) == [{"i": 2}, {"i": 3}, {"i": 4}] def test_filter_field_keys_list(): actual_res = utils.filter_field_keys( [{"directory": 1, "file": 2, "link": 3}, {"dir": 1, "fil": 2, "lin": 3}], {"directory", "dir"}, ) assert actual_res == [{"directory": 1}, {"dir": 1}] def test_filter_field_keys_other(): input_set = {1, 2} actual_res = utils.filter_field_keys(input_set, {"a", "1"}) assert actual_res == input_set def test_person_to_string(): assert ( utils.person_to_string({"name": "raboof", "email": "foo@bar"}) == "raboof " ) def test_enrich_release_empty(): actual_release = utils.enrich_release({}) assert actual_release == {} def test_enrich_release_content_target(api_request_factory, archive_data, release): release_data = archive_data.release_get(release) release_data["target_type"] = "content" url = reverse("api-1-release", url_args={"sha1_git": release}) request = api_request_factory.get(url) actual_release = utils.enrich_release(release_data, request) release_data["target_url"] = reverse( "api-1-content", url_args={"q": f'sha1_git:{release_data["target"]}'}, request=request, ) assert actual_release == release_data def test_enrich_release_directory_target(api_request_factory, archive_data, release): release_data = archive_data.release_get(release) release_data["target_type"] = "directory" url = reverse("api-1-release", url_args={"sha1_git": release}) request = api_request_factory.get(url) actual_release = utils.enrich_release(release_data, request) release_data["target_url"] = reverse( "api-1-directory", url_args={"sha1_git": release_data["target"]}, request=request, ) assert actual_release == release_data def test_enrich_release_revision_target(api_request_factory, archive_data, release): release_data = archive_data.release_get(release) release_data["target_type"] = "revision" url = reverse("api-1-release", url_args={"sha1_git": release}) request = api_request_factory.get(url) actual_release = utils.enrich_release(release_data, request) release_data["target_url"] = reverse( "api-1-revision", url_args={"sha1_git": release_data["target"]}, request=request ) assert actual_release == release_data def test_enrich_release_release_target(api_request_factory, archive_data, release): release_data = archive_data.release_get(release) release_data["target_type"] = "release" url = reverse("api-1-release", url_args={"sha1_git": release}) request = api_request_factory.get(url) actual_release = utils.enrich_release(release_data, request) release_data["target_url"] = reverse( "api-1-release", url_args={"sha1_git": release_data["target"]}, request=request ) assert actual_release == release_data def test_enrich_directory_entry_no_type(): assert 
utils.enrich_directory_entry({"id": "dir-id"}) == {"id": "dir-id"} def test_enrich_directory_entry_with_type(api_request_factory, archive_data, directory): dir_content = archive_data.directory_ls(directory) dir_entry = random.choice(dir_content) url = reverse("api-1-directory", url_args={"sha1_git": directory}) request = api_request_factory.get(url) actual_directory = utils.enrich_directory_entry(dir_entry, request) if dir_entry["type"] == "file": dir_entry["target_url"] = reverse( "api-1-content", url_args={"q": f'sha1_git:{dir_entry["target"]}'}, request=request, ) elif dir_entry["type"] == "dir": dir_entry["target_url"] = reverse( "api-1-directory", url_args={"sha1_git": dir_entry["target"]}, request=request, ) elif dir_entry["type"] == "rev": dir_entry["target_url"] = reverse( "api-1-revision", url_args={"sha1_git": dir_entry["target"]}, request=request, ) assert actual_directory == dir_entry def test_enrich_content_without_hashes(): assert utils.enrich_content({"id": "123"}) == {"id": "123"} def test_enrich_content_with_hashes(api_request_factory, content): for algo in DEFAULT_ALGORITHMS: content_data = dict(content) query_string = "%s:%s" % (algo, content_data[algo]) url = reverse("api-1-content", url_args={"q": query_string}) request = api_request_factory.get(url) enriched_content = utils.enrich_content( content_data, query_string=query_string, request=request ) content_data["data_url"] = reverse( "api-1-content-raw", url_args={"q": query_string}, request=request ) content_data["filetype_url"] = reverse( "api-1-content-filetype", url_args={"q": query_string}, request=request ) content_data["language_url"] = reverse( "api-1-content-language", url_args={"q": query_string}, request=request ) content_data["license_url"] = reverse( "api-1-content-license", url_args={"q": query_string}, request=request ) assert enriched_content == content_data def test_enrich_content_with_hashes_and_top_level_url(api_request_factory, content): for algo in DEFAULT_ALGORITHMS: content_data = dict(content) query_string = "%s:%s" % (algo, content_data[algo]) url = reverse("api-1-content", url_args={"q": query_string}) request = api_request_factory.get(url) enriched_content = utils.enrich_content( content_data, query_string=query_string, top_url=True, request=request ) content_data["content_url"] = reverse( "api-1-content", url_args={"q": query_string}, request=request ) content_data["data_url"] = reverse( "api-1-content-raw", url_args={"q": query_string}, request=request ) content_data["filetype_url"] = reverse( "api-1-content-filetype", url_args={"q": query_string}, request=request ) content_data["language_url"] = reverse( "api-1-content-language", url_args={"q": query_string}, request=request ) content_data["license_url"] = reverse( "api-1-content-license", url_args={"q": query_string}, request=request ) assert enriched_content == content_data def test_enrich_revision_without_children_or_parent( api_request_factory, archive_data, revision ): revision_data = archive_data.revision_get(revision) del revision_data["parents"] url = reverse("api-1-revision", url_args={"sha1_git": revision}) request = api_request_factory.get(url) actual_revision = utils.enrich_revision(revision_data, request) revision_data["url"] = reverse( "api-1-revision", url_args={"sha1_git": revision}, request=request ) revision_data["history_url"] = reverse( "api-1-revision-log", url_args={"sha1_git": revision}, request=request ) revision_data["directory_url"] = reverse( "api-1-directory", url_args={"sha1_git": revision_data["directory"]}, 
request=request, ) assert actual_revision == revision_data def test_enrich_revision_with_children_and_parent_no_dir( api_request_factory, archive_data, revisions_list ): revision, parent_revision, child_revision = revisions_list(size=3) revision_data = archive_data.revision_get(revision) del revision_data["directory"] revision_data["parents"] = revision_data["parents"] + (parent_revision,) revision_data["children"] = [child_revision] url = reverse("api-1-revision", url_args={"sha1_git": revision}) request = api_request_factory.get(url) actual_revision = utils.enrich_revision(revision_data, request) revision_data["url"] = reverse( "api-1-revision", url_args={"sha1_git": revision}, request=request ) revision_data["history_url"] = reverse( "api-1-revision-log", url_args={"sha1_git": revision}, request=request ) revision_data["parents"] = tuple( { "id": p["id"], "url": reverse( "api-1-revision", url_args={"sha1_git": p["id"]}, request=request ), } for p in revision_data["parents"] ) revision_data["children_urls"] = [ reverse( "api-1-revision", url_args={"sha1_git": child_revision}, request=request ) ] assert actual_revision == revision_data def test_enrich_revision_no_context(api_request_factory, revisions_list): revision, parent_revision, child_revision = revisions_list(size=3) revision_data = { "id": revision, "parents": [parent_revision], "children": [child_revision], } url = reverse("api-1-revision", url_args={"sha1_git": revision}) request = api_request_factory.get(url) actual_revision = utils.enrich_revision(revision_data, request) revision_data["url"] = reverse( "api-1-revision", url_args={"sha1_git": revision}, request=request ) revision_data["history_url"] = reverse( "api-1-revision-log", url_args={"sha1_git": revision}, request=request ) revision_data["parents"] = tuple( [ { "id": parent_revision, "url": reverse( "api-1-revision", url_args={"sha1_git": parent_revision}, request=request, ), } ] ) revision_data["children_urls"] = [ reverse( "api-1-revision", url_args={"sha1_git": child_revision}, request=request ) ] assert actual_revision == revision_data def test_enrich_revision_with_no_message( api_request_factory, archive_data, revisions_list ): revision, parent_revision, child_revision = revisions_list(size=3) revision_data = archive_data.revision_get(revision) revision_data["message"] = None revision_data["parents"] = revision_data["parents"] + (parent_revision,) revision_data["children"] = [child_revision] url = reverse("api-1-revision", url_args={"sha1_git": revision}) request = api_request_factory.get(url) actual_revision = utils.enrich_revision(revision_data, request) revision_data["url"] = reverse( "api-1-revision", url_args={"sha1_git": revision}, request=request ) revision_data["directory_url"] = reverse( "api-1-directory", url_args={"sha1_git": revision_data["directory"]}, request=request, ) revision_data["history_url"] = reverse( "api-1-revision-log", url_args={"sha1_git": revision}, request=request ) revision_data["parents"] = tuple( { "id": p["id"], "url": reverse( "api-1-revision", url_args={"sha1_git": p["id"]}, request=request ), } for p in revision_data["parents"] ) revision_data["children_urls"] = [ reverse( "api-1-revision", url_args={"sha1_git": child_revision}, request=request ) ] assert actual_revision == revision_data def test_enrich_revision_with_invalid_message( api_request_factory, archive_data, revisions_list ): revision, parent_revision, child_revision = revisions_list(size=3) revision_data = archive_data.revision_get(revision) revision_data["decoding_failures"] = ["message"]
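    # Added note (hedged): "decoding_failures" containing "message" marks a
    # revision whose commit message could not be decoded; enrich_revision is
    # then expected to expose a raw-message endpoint via "message_url"
    # instead of an inline "message" value, which is what the expectations
    # below encode.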
["message"] revision_data["parents"] = revision_data["parents"] + (parent_revision,) revision_data["children"] = child_revision url = reverse("api-1-revision", url_args={"sha1_git": revision}) request = api_request_factory.get(url) actual_revision = utils.enrich_revision(revision_data, request) revision_data["url"] = reverse( "api-1-revision", url_args={"sha1_git": revision}, request=request ) revision_data["message_url"] = reverse( "api-1-revision-raw-message", url_args={"sha1_git": revision}, request=request ) revision_data["directory_url"] = reverse( "api-1-directory", url_args={"sha1_git": revision_data["directory"]}, request=request, ) revision_data["history_url"] = reverse( "api-1-revision-log", url_args={"sha1_git": revision}, request=request ) revision_data["parents"] = tuple( { "id": p["id"], "url": reverse( "api-1-revision", url_args={"sha1_git": p["id"]}, request=request ), } for p in revision_data["parents"] ) revision_data["children_urls"] = [ reverse( "api-1-revision", url_args={"sha1_git": child_revision}, request=request ) ] assert actual_revision == revision_data def test_enrich_snapshot(api_request_factory, archive_data, snapshot): snapshot_data = archive_data.snapshot_get(snapshot) url = reverse("api-1-snapshot", url_args={"snapshot_id": snapshot}) request = api_request_factory.get(url) actual_snapshot = utils.enrich_snapshot(snapshot_data, request) for _, b in snapshot_data["branches"].items(): if b["target_type"] in ("directory", "revision", "release"): b["target_url"] = reverse( f'api-1-{b["target_type"]}', url_args={"sha1_git": b["target"]}, request=request, ) elif b["target_type"] == "content": b["target_url"] = reverse( "api-1-content", url_args={"q": f'sha1_git:{b["target"]}'}, request=request, ) for _, b in snapshot_data["branches"].items(): if b["target_type"] == "alias": target = resolve_branch_alias(snapshot_data, b) b["target_url"] = target["target_url"] assert actual_snapshot == snapshot_data def test_enrich_origin(api_request_factory, origin): url = reverse("api-1-origin", url_args={"origin_url": origin["url"]}) request = api_request_factory.get(url) origin_data = {"url": origin["url"]} actual_origin = utils.enrich_origin(origin_data, request) origin_data["origin_visits_url"] = reverse( "api-1-origin-visits", url_args={"origin_url": origin["url"]}, request=request ) origin_data["metadata_authorities_url"] = reverse( "api-1-raw-extrinsic-metadata-swhid-authorities", url_args={"target": Origin(url=origin["url"]).swhid()}, request=request, ) assert actual_origin == origin_data def test_enrich_origin_search_result(api_request_factory, origin): url = reverse("api-1-origin-search", url_args={"url_pattern": origin["url"]}) request = api_request_factory.get(url) origin_visits_url = reverse( "api-1-origin-visits", url_args={"origin_url": origin["url"]}, request=request ) metadata_authorities_url = reverse( "api-1-raw-extrinsic-metadata-swhid-authorities", url_args={"target": Origin(url=origin["url"]).swhid()}, request=request, ) origin_search_result_data = ( [{"url": origin["url"]}], None, ) enriched_origin_search_result = ( [ { "url": origin["url"], "origin_visits_url": origin_visits_url, "metadata_authorities_url": metadata_authorities_url, } ], None, ) assert ( utils.enrich_origin_search_result(origin_search_result_data, request=request) == enriched_origin_search_result ) def test_enrich_origin_visit(api_request_factory, origin): origin_visit = random.choice(get_origin_visits(origin)) url = reverse( "api-1-origin-visit", url_args={"origin_url": origin["url"], 
"visit_id": origin_visit["visit"]}, ) request = api_request_factory.get(url) actual_origin_visit = utils.enrich_origin_visit( origin_visit, with_origin_link=True, with_origin_visit_link=True, request=request, ) origin_visit["origin_url"] = reverse( "api-1-origin", url_args={"origin_url": origin["url"]}, request=request ) origin_visit["origin_visit_url"] = reverse( "api-1-origin-visit", url_args={"origin_url": origin["url"], "visit_id": origin_visit["visit"]}, request=request, ) origin_visit["snapshot_url"] = reverse( "api-1-snapshot", url_args={"snapshot_id": origin_visit["snapshot"]}, request=request, ) assert actual_origin_visit == origin_visit diff --git a/swh/web/tests/api/views/test_content.py b/swh/web/tests/api/views/test_content.py index 367ab870..63deb7da 100644 --- a/swh/web/tests/api/views/test_content.py +++ b/swh/web/tests/api/views/test_content.py @@ -1,251 +1,251 @@ # Copyright (C) 2015-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import pytest -from swh.web.common.utils import reverse from swh.web.tests.conftest import fossology_missing from swh.web.tests.data import random_content -from swh.web.tests.utils import ( +from swh.web.tests.helpers import ( check_api_get_responses, check_api_post_responses, check_http_get_response, ) +from swh.web.utils import reverse def test_api_content_filetype(api_client, indexer_data, content): indexer_data.content_add_mimetype(content["sha1"]) url = reverse( "api-1-content-filetype", url_args={"q": "sha1_git:%s" % content["sha1_git"]} ) rv = check_api_get_responses(api_client, url, status_code=200) content_url = reverse( "api-1-content", url_args={"q": "sha1:%s" % content["sha1"]}, request=rv.wsgi_request, ) expected_data = indexer_data.content_get_mimetype(content["sha1"]) expected_data["content_url"] = content_url assert rv.data == expected_data def test_api_content_filetype_sha_not_found(api_client): unknown_content_ = random_content() url = reverse( "api-1-content-filetype", url_args={"q": "sha1:%s" % unknown_content_["sha1"]} ) rv = check_api_get_responses(api_client, url, status_code=404) assert rv.data == { "exception": "NotFoundExc", "reason": "No filetype information found for content " "sha1:%s." % unknown_content_["sha1"], } def test_api_content_language_sha_not_found(api_client): unknown_content_ = random_content() url = reverse( "api-1-content-language", url_args={"q": "sha1:%s" % unknown_content_["sha1"]} ) rv = check_api_get_responses(api_client, url, status_code=404) assert rv.data == { "exception": "NotFoundExc", "reason": "No language information found for content " "sha1:%s." 
% unknown_content_["sha1"], } @pytest.mark.skipif(fossology_missing, reason="requires fossology-nomossa installed") def test_api_content_license(api_client, indexer_data, content): indexer_data.content_add_license(content["sha1"]) url = reverse( "api-1-content-license", url_args={"q": "sha1_git:%s" % content["sha1_git"]} ) rv = check_api_get_responses(api_client, url, status_code=200) content_url = reverse( "api-1-content", url_args={"q": "sha1:%s" % content["sha1"]}, request=rv.wsgi_request, ) expected_data = list(indexer_data.content_get_license(content["sha1"])) for license in expected_data: del license["id"] assert rv.data == { "content_url": content_url, "id": content["sha1"], "facts": expected_data, } def test_api_content_license_sha_not_found(api_client): unknown_content_ = random_content() url = reverse( "api-1-content-license", url_args={"q": "sha1:%s" % unknown_content_["sha1"]} ) rv = check_api_get_responses(api_client, url, status_code=404) assert rv.data == { "exception": "NotFoundExc", "reason": "No license information found for content " "sha1:%s." % unknown_content_["sha1"], } def test_api_content_metadata(api_client, archive_data, content): url = reverse("api-1-content", {"q": "sha1:%s" % content["sha1"]}) rv = check_api_get_responses(api_client, url, status_code=200) expected_data = archive_data.content_get(content["sha1"]) for key, view_name in ( ("data_url", "api-1-content-raw"), ("license_url", "api-1-content-license"), ("language_url", "api-1-content-language"), ("filetype_url", "api-1-content-filetype"), ): expected_data[key] = reverse( view_name, url_args={"q": "sha1:%s" % content["sha1"]}, request=rv.wsgi_request, ) assert rv.data == expected_data def test_api_content_not_found(api_client): unknown_content_ = random_content() url = reverse("api-1-content", url_args={"q": "sha1:%s" % unknown_content_["sha1"]}) rv = check_api_get_responses(api_client, url, status_code=404) assert rv.data == { "exception": "NotFoundExc", "reason": "Content with sha1 checksum equals to %s not found!" % unknown_content_["sha1"], } def test_api_content_raw_ko_not_found(api_client): unknown_content_ = random_content() url = reverse( "api-1-content-raw", url_args={"q": "sha1:%s" % unknown_content_["sha1"]} ) rv = check_api_get_responses(api_client, url, status_code=404) assert rv.data == { "exception": "NotFoundExc", "reason": "Content with sha1 checksum equals to %s not found!" 
% unknown_content_["sha1"], } def test_api_content_raw_text(api_client, archive_data, content): url = reverse("api-1-content-raw", url_args={"q": "sha1:%s" % content["sha1"]}) rv = check_http_get_response(api_client, url, status_code=200) assert rv["Content-Type"] == "application/octet-stream" assert ( rv["Content-disposition"] == 'attachment; filename="content_sha1_%s_raw"' % content["sha1"] ) expected_data = archive_data.content_get_data(content["sha1"]) assert b"".join(rv.streaming_content) == expected_data["data"] assert int(rv["Content-Length"]) == len(expected_data["data"]) def test_api_content_raw_text_with_filename(api_client, archive_data, content): url = reverse( "api-1-content-raw", url_args={"q": "sha1:%s" % content["sha1"]}, query_params={"filename": "filename.txt"}, ) rv = check_http_get_response(api_client, url, status_code=200) assert rv["Content-disposition"] == 'attachment; filename="filename.txt"' assert rv["Content-Type"] == "application/octet-stream" expected_data = archive_data.content_get_data(content["sha1"]) assert b"".join(rv.streaming_content) == expected_data["data"] assert int(rv["Content-Length"]) == len(expected_data["data"]) @pytest.mark.parametrize( "encoded,expected", [ # From https://datatracker.ietf.org/doc/html/rfc5987#section-3.2.2 ( "%c2%a3%20and%20%e2%82%ac%20rates.txt", "%C2%A3%20and%20%E2%82%AC%20rates.txt", ), ("%A3%20rates.txt", "%EF%BF%BD%20rates.txt"), # found in the wild ( "Th%C3%A9orie%20de%20sant%C3%A9-aide-justice.pdf", "Th%C3%A9orie%20de%20sant%C3%A9-aide-justice.pdf", ), ], ) def test_api_content_raw_text_with_nonascii_filename( api_client, archive_data, content, encoded, expected ): url = reverse( "api-1-content-raw", url_args={"q": "sha1:%s" % content["sha1"]}, ) rv = check_http_get_response( api_client, f"{url}?filename={encoded}", status_code=200 ) # technically, ISO8859-1 is allowed too assert rv["Content-disposition"].isascii(), rv["Content-disposition"] assert rv["Content-disposition"] == ( f"attachment; filename*=utf-8''{expected}" ), rv["Content-disposition"] assert rv["Content-Type"] == "application/octet-stream" expected_data = archive_data.content_get_data(content["sha1"]) assert b"".join(rv.streaming_content) == expected_data["data"] assert int(rv["Content-Length"]) == len(expected_data["data"]) def test_api_check_content_known(api_client, content): url = reverse("api-1-content-known", url_args={"q": content["sha1"]}) rv = check_api_get_responses(api_client, url, status_code=200) assert rv.data == { "search_res": [{"found": True, "sha1": content["sha1"]}], "search_stats": {"nbfiles": 1, "pct": 100.0}, } def test_api_check_content_known_post(api_client, content): url = reverse("api-1-content-known") rv = check_api_post_responses( api_client, url, data={"q": content["sha1"]}, status_code=200 ) assert rv.data == { "search_res": [{"found": True, "sha1": content["sha1"]}], "search_stats": {"nbfiles": 1, "pct": 100.0}, } def test_api_check_content_known_not_found(api_client): unknown_content_ = random_content() url = reverse("api-1-content-known", url_args={"q": unknown_content_["sha1"]}) rv = check_api_get_responses(api_client, url, status_code=200) assert rv.data == { "search_res": [{"found": False, "sha1": unknown_content_["sha1"]}], "search_stats": {"nbfiles": 1, "pct": 0.0}, } def test_api_content_uppercase(api_client, content): url = reverse( "api-1-content-uppercase-checksum", url_args={"q": content["sha1"].upper()} ) rv = check_http_get_response(api_client, url, status_code=302) redirect_url = reverse("api-1-content", 
url_args={"q": content["sha1"]}) assert rv["location"] == redirect_url diff --git a/swh/web/tests/api/views/test_directory.py b/swh/web/tests/api/views/test_directory.py index 9cdbfd5e..37fa96b7 100644 --- a/swh/web/tests/api/views/test_directory.py +++ b/swh/web/tests/api/views/test_directory.py @@ -1,73 +1,73 @@ # Copyright (C) 2015-2019 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import random from swh.web.api.utils import enrich_directory_entry -from swh.web.common.utils import reverse from swh.web.tests.data import random_sha1 -from swh.web.tests.utils import check_api_get_responses, check_http_get_response +from swh.web.tests.helpers import check_api_get_responses, check_http_get_response +from swh.web.utils import reverse def test_api_directory(api_client, archive_data, directory): url = reverse("api-1-directory", url_args={"sha1_git": directory}) rv = check_api_get_responses(api_client, url, status_code=200) dir_content = list(archive_data.directory_ls(directory)) expected_data = list( map(enrich_directory_entry, dir_content, [rv.wsgi_request] * len(dir_content)) ) assert rv.data == expected_data def test_api_directory_not_found(api_client): unknown_directory_ = random_sha1() url = reverse("api-1-directory", url_args={"sha1_git": unknown_directory_}) rv = check_api_get_responses(api_client, url, status_code=404) assert rv.data == { "exception": "NotFoundExc", "reason": "Directory with sha1_git %s not found" % unknown_directory_, } def test_api_directory_with_path_found(api_client, archive_data, directory): directory_content = archive_data.directory_ls(directory) dir_entry = random.choice(directory_content) url = reverse( "api-1-directory", url_args={"sha1_git": directory, "path": dir_entry["name"]} ) rv = check_api_get_responses(api_client, url, status_code=200) assert rv.data == enrich_directory_entry(dir_entry, rv.wsgi_request) def test_api_directory_with_path_not_found(api_client, directory): path = "some/path/to/nonexistent/dir/" url = reverse("api-1-directory", url_args={"sha1_git": directory, "path": path}) rv = check_api_get_responses(api_client, url, status_code=404) reason = ( f"Directory entry with path {path} from root directory {directory} not found" ) assert rv.data == { "exception": "NotFoundExc", "reason": reason, } def test_api_directory_uppercase(api_client, directory): url = reverse( "api-1-directory-uppercase-checksum", url_args={"sha1_git": directory.upper()} ) resp = check_http_get_response(api_client, url, status_code=302) redirect_url = reverse("api-1-directory", url_args={"sha1_git": directory}) assert resp["location"] == redirect_url diff --git a/swh/web/tests/api/views/test_graph.py b/swh/web/tests/api/views/test_graph.py index 579c8a06..19665fee 100644 --- a/swh/web/tests/api/views/test_graph.py +++ b/swh/web/tests/api/views/test_graph.py @@ -1,429 +1,429 @@ # Copyright (C) 2021-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import hashlib import re import textwrap from urllib.parse import unquote, urlparse import pytest from django.http.response import StreamingHttpResponse from swh.model.hashutil import hash_to_bytes from swh.model.swhids import ExtendedObjectType, ExtendedSWHID from 
swh.web.api.views.graph import API_GRAPH_PERM -from swh.web.common.utils import reverse from swh.web.config import SWH_WEB_INTERNAL_SERVER_NAME, get_config -from swh.web.tests.utils import check_http_get_response +from swh.web.tests.helpers import check_http_get_response +from swh.web.utils import reverse def test_graph_endpoint_no_authentication_for_vpn_users(api_client, requests_mock): graph_query = "stats" url = reverse("api-1-graph", url_args={"graph_query": graph_query}) requests_mock.get( get_config()["graph"]["server_url"] + graph_query, json={}, headers={"Content-Type": "application/json"}, ) check_http_get_response( api_client, url, status_code=200, server_name=SWH_WEB_INTERNAL_SERVER_NAME ) def test_graph_endpoint_needs_authentication(api_client): url = reverse("api-1-graph", url_args={"graph_query": "stats"}) check_http_get_response(api_client, url, status_code=401) def _authenticate_graph_user(api_client, keycloak_oidc, is_staff=False): keycloak_oidc.client_permissions = [API_GRAPH_PERM] if is_staff: keycloak_oidc.user_groups = ["/staff"] oidc_profile = keycloak_oidc.login() api_client.credentials(HTTP_AUTHORIZATION=f"Bearer {oidc_profile['refresh_token']}") def test_graph_endpoint_needs_permission(api_client, keycloak_oidc, requests_mock): graph_query = "stats" url = reverse("api-1-graph", url_args={"graph_query": graph_query}) oidc_profile = keycloak_oidc.login() api_client.credentials(HTTP_AUTHORIZATION=f"Bearer {oidc_profile['refresh_token']}") check_http_get_response(api_client, url, status_code=403) _authenticate_graph_user(api_client, keycloak_oidc) requests_mock.get( get_config()["graph"]["server_url"] + graph_query, json={}, headers={"Content-Type": "application/json"}, ) check_http_get_response(api_client, url, status_code=200) def test_graph_text_plain_response(api_client, keycloak_oidc, requests_mock): _authenticate_graph_user(api_client, keycloak_oidc) graph_query = "leaves/swh:1:dir:432d1b21c1256f7408a07c577b6974bbdbcc1323" response_text = textwrap.dedent( """\ swh:1:cnt:1d3dace0a825b0535c37c53ed669ef817e9c1b47 swh:1:cnt:6d5b280f4e33589ae967a7912a587dd5cb8dedaa swh:1:cnt:91bef238bf01356a550d416d14bb464c576ac6f4 swh:1:cnt:58a8b925a463b87d49639fda282b8f836546e396 swh:1:cnt:fd32ee0a87e16ccc853dfbeb7018674f9ce008c0 swh:1:cnt:ab7c39871872589a4fc9e249ebc927fb1042c90d swh:1:cnt:93073c02bf3869845977527de16af4d54765838d swh:1:cnt:4251f795b52c54c447a97c9fe904d8b1f993b1e0 swh:1:cnt:c6e7055424332006d07876ffeba684e7e284b383 swh:1:cnt:8459d8867dc3b15ef7ae9683e21cccc9ab2ec887 swh:1:cnt:5f9981d52202815aa947f85b9dfa191b66f51138 swh:1:cnt:00a685ec51bcdf398c15d588ecdedb611dbbab4b swh:1:cnt:e1cf1ea335106a0197a2f92f7804046425a7d3eb swh:1:cnt:07069b38087f88ec192d2c9aff75a502476fd17d swh:1:cnt:f045ee845c7f14d903a2c035b2691a7c400c01f0 """ ) requests_mock.get( get_config()["graph"]["server_url"] + graph_query, text=response_text, headers={"Content-Type": "text/plain", "Transfer-Encoding": "chunked"}, ) url = reverse("api-1-graph", url_args={"graph_query": graph_query}) resp = check_http_get_response( api_client, url, status_code=200, content_type="text/plain" ) assert isinstance(resp, StreamingHttpResponse) assert b"".join(resp.streaming_content) == response_text.encode() _response_json = { "counts": {"nodes": 17075708289, "edges": 196236587976}, "ratios": { "compression": 0.16, "bits_per_node": 58.828, "bits_per_edge": 5.119, "avg_locality": 2184278529.729, }, "indegree": {"min": 0, "max": 263180117, "avg": 11.4921492364925}, "outdegree": {"min": 0, "max": 1033207, "avg": 
11.4921492364925}, } def test_graph_json_response(api_client, keycloak_oidc, requests_mock): _authenticate_graph_user(api_client, keycloak_oidc) graph_query = "stats" requests_mock.get( get_config()["graph"]["server_url"] + graph_query, json=_response_json, headers={"Content-Type": "application/json"}, ) url = reverse("api-1-graph", url_args={"graph_query": graph_query}) resp = check_http_get_response(api_client, url, status_code=200) assert resp.content_type == "application/json" assert resp.data == _response_json def test_graph_ndjson_response(api_client, keycloak_oidc, requests_mock): _authenticate_graph_user(api_client, keycloak_oidc) graph_query = "visit/paths/swh:1:dir:644dd466d8ad527ea3a609bfd588a3244e6dafcb" response_ndjson = textwrap.dedent( """\ ["swh:1:dir:644dd466d8ad527ea3a609bfd588a3244e6dafcb",\ "swh:1:cnt:acfb7cabd63b368a03a9df87670ece1488c8bce0"] ["swh:1:dir:644dd466d8ad527ea3a609bfd588a3244e6dafcb",\ "swh:1:cnt:2a0837708151d76edf28fdbb90dc3eabc676cff3"] ["swh:1:dir:644dd466d8ad527ea3a609bfd588a3244e6dafcb",\ "swh:1:cnt:eaf025ad54b94b2fdda26af75594cfae3491ec75"] """ ) requests_mock.get( get_config()["graph"]["server_url"] + graph_query, text=response_ndjson, headers={ "Content-Type": "application/x-ndjson", "Transfer-Encoding": "chunked", }, ) url = reverse("api-1-graph", url_args={"graph_query": graph_query}) resp = check_http_get_response(api_client, url, status_code=200) assert isinstance(resp, StreamingHttpResponse) assert resp["Content-Type"] == "application/x-ndjson" assert b"".join(resp.streaming_content) == response_ndjson.encode() def test_graph_response_resolve_origins( archive_data, api_client, keycloak_oidc, requests_mock, origin ): hasher = hashlib.sha1() hasher.update(origin["url"].encode()) origin_sha1 = hasher.digest() origin_swhid = str( ExtendedSWHID(object_type=ExtendedObjectType.ORIGIN, object_id=origin_sha1) ) snapshot = archive_data.snapshot_get_latest(origin["url"])["id"] snapshot_swhid = str( ExtendedSWHID( object_type=ExtendedObjectType.SNAPSHOT, object_id=hash_to_bytes(snapshot) ) ) _authenticate_graph_user(api_client, keycloak_oidc) for graph_query, response_text, content_type in ( ( f"visit/nodes/{snapshot_swhid}", f"{snapshot_swhid}\n{origin_swhid}\n", "text/plain", ), ( f"visit/edges/{snapshot_swhid}", f"{snapshot_swhid} {origin_swhid}\n", "text/plain", ), ( f"visit/paths/{snapshot_swhid}", f'["{snapshot_swhid}", "{origin_swhid}"]\n', "application/x-ndjson", ), ): # set two lines response to check resolved origins cache response_text = response_text + response_text requests_mock.get( get_config()["graph"]["server_url"] + graph_query, text=response_text, headers={"Content-Type": content_type, "Transfer-Encoding": "chunked"}, ) url = reverse( "api-1-graph", url_args={"graph_query": graph_query}, query_params={"direction": "backward"}, ) resp = check_http_get_response(api_client, url, status_code=200) assert isinstance(resp, StreamingHttpResponse) assert resp["Content-Type"] == content_type assert b"".join(resp.streaming_content) == response_text.encode() url = reverse( "api-1-graph", url_args={"graph_query": graph_query}, query_params={"direction": "backward", "resolve_origins": "true"}, ) resp = check_http_get_response(api_client, url, status_code=200) assert isinstance(resp, StreamingHttpResponse) assert resp["Content-Type"] == content_type assert ( b"".join(resp.streaming_content) == response_text.replace(origin_swhid, origin["url"]).encode() ) def test_graph_response_resolve_origins_nothing_to_do( api_client, keycloak_oidc, requests_mock ): 
_authenticate_graph_user(api_client, keycloak_oidc) graph_query = "stats" requests_mock.get( get_config()["graph"]["server_url"] + graph_query, json=_response_json, headers={"Content-Type": "application/json"}, ) url = reverse( "api-1-graph", url_args={"graph_query": graph_query}, query_params={"resolve_origins": "true"}, ) resp = check_http_get_response(api_client, url, status_code=200) assert resp.content_type == "application/json" assert resp.data == _response_json def test_graph_response_invalid_accept_header(api_client): url = reverse( "api-1-graph", url_args={"graph_query": "stats"}, query_params={"resolve_origins": "true"}, ) resp = api_client.get(url, HTTP_ACCEPT="text/html") assert resp.status_code == 406 assert resp.content_type == "application/json" assert resp.data["exception"] == "NotAcceptable" assert resp.data["reason"] == "Could not satisfy the request Accept header." def test_graph_error_response(api_client, keycloak_oidc, requests_mock): _authenticate_graph_user(api_client, keycloak_oidc) graph_query = "foo" error_message = "Not found" content_type = "text/plain" requests_mock.get( get_config()["graph"]["server_url"] + graph_query, text=error_message, headers={"Content-Type": content_type}, status_code=404, ) url = reverse("api-1-graph", url_args={"graph_query": graph_query}) resp = check_http_get_response(api_client, url, status_code=404) assert resp.content_type == content_type assert resp.content == f'"{error_message}"'.encode() @pytest.mark.parametrize( "graph_query, query_params, expected_graph_query_params", [ ("stats", {}, ""), ("stats", {"resolve_origins": "true"}, "resolve_origins=true"), ("stats?a=1", {}, "a=1"), ("stats%3Fb=2", {}, "b=2"), ("stats?a=1", {"resolve_origins": "true"}, "a=1&resolve_origins=true"), ("stats%3Fb=2", {"resolve_origins": "true"}, "b=2&resolve_origins=true"), ("stats/?a=1", {"a": "2"}, "a=1&a=2"), ("stats/%3Fa=1", {"a": "2"}, "a=1&a=2"), ], ) def test_graph_query_params( api_client, keycloak_oidc, requests_mock, graph_query, query_params, expected_graph_query_params, ): _authenticate_graph_user(api_client, keycloak_oidc) requests_mock.get( re.compile(get_config()["graph"]["server_url"]), json=_response_json, headers={"Content-Type": "application/json"}, ) url = reverse( "api-1-graph", url_args={"graph_query": graph_query}, query_params=query_params, ) check_http_get_response(api_client, url, status_code=200) url = requests_mock.request_history[0].url parsed_url = urlparse(url) assert parsed_url.path == f"/graph/{unquote(graph_query).split('?')[0]}" assert expected_graph_query_params in parsed_url.query @pytest.mark.django_db # for authentication def test_graph_endpoint_max_edges_settings(api_client, keycloak_oidc, requests_mock): graph_config = get_config()["graph"] graph_query = "stats" url = reverse("api-1-graph", url_args={"graph_query": graph_query}) requests_mock.get( get_config()["graph"]["server_url"] + graph_query, json={}, headers={"Content-Type": "application/json"}, ) # currently unauthenticated user can only use the graph endpoint from # Software Heritage VPN check_http_get_response( api_client, url, status_code=200, server_name=SWH_WEB_INTERNAL_SERVER_NAME ) assert ( f"max_edges={graph_config['max_edges']['anonymous']}" in requests_mock.request_history[0].url ) # standard user _authenticate_graph_user(api_client, keycloak_oidc) check_http_get_response( api_client, url, status_code=200, ) assert ( f"max_edges={graph_config['max_edges']['user']}" in requests_mock.request_history[1].url ) # staff user 
_authenticate_graph_user(api_client, keycloak_oidc, is_staff=True) check_http_get_response( api_client, url, status_code=200, ) assert ( f"max_edges={graph_config['max_edges']['staff']}" in requests_mock.request_history[2].url ) def test_graph_endpoint_max_edges_query_parameter_value( api_client, keycloak_oidc, requests_mock ): graph_config = get_config()["graph"] graph_query = "stats" requests_mock.get( get_config()["graph"]["server_url"] + graph_query, json={}, headers={"Content-Type": "application/json"}, ) _authenticate_graph_user(api_client, keycloak_oidc) max_edges_max_value = graph_config["max_edges"]["user"] max_edges = max_edges_max_value // 2 url = reverse( "api-1-graph", url_args={"graph_query": graph_query}, query_params={"max_edges": max_edges}, ) check_http_get_response( api_client, url, status_code=200, ) assert f"max_edges={max_edges}" in requests_mock.request_history[0].url max_edges = max_edges_max_value * 2 url = reverse( "api-1-graph", url_args={"graph_query": graph_query}, query_params={"max_edges": max_edges}, ) check_http_get_response( api_client, url, status_code=200, ) assert f"max_edges={max_edges_max_value}" in requests_mock.request_history[1].url diff --git a/swh/web/tests/api/views/test_identifiers.py b/swh/web/tests/api/views/test_identifiers.py index 456c9faa..bc3aad67 100644 --- a/swh/web/tests/api/views/test_identifiers.py +++ b/swh/web/tests/api/views/test_identifiers.py @@ -1,184 +1,184 @@ # Copyright (C) 2018-2021 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from swh.model.swhids import ObjectType -from swh.web.common.identifiers import gen_swhid -from swh.web.common.utils import reverse from swh.web.tests.data import random_sha1 -from swh.web.tests.utils import check_api_get_responses, check_api_post_responses +from swh.web.tests.helpers import check_api_get_responses, check_api_post_responses +from swh.web.utils import reverse +from swh.web.utils.identifiers import gen_swhid def test_swhid_resolve_success( api_client, content, directory, origin, release, revision, snapshot ): for obj_type, obj_id in ( (ObjectType.CONTENT, content["sha1_git"]), (ObjectType.DIRECTORY, directory), (ObjectType.RELEASE, release), (ObjectType.REVISION, revision), (ObjectType.SNAPSHOT, snapshot), ): swhid = gen_swhid(obj_type, obj_id, metadata={"origin": origin["url"]}) url = reverse("api-1-resolve-swhid", url_args={"swhid": swhid}) resp = check_api_get_responses(api_client, url, status_code=200) if obj_type == ObjectType.CONTENT: url_args = {"query_string": "sha1_git:%s" % obj_id} elif obj_type == ObjectType.SNAPSHOT: url_args = {"snapshot_id": obj_id} else: url_args = {"sha1_git": obj_id} obj_type_str = obj_type.name.lower() browse_rev_url = reverse( f"browse-{obj_type_str}", url_args=url_args, query_params={"origin_url": origin["url"]}, request=resp.wsgi_request, ) expected_result = { "browse_url": browse_rev_url, "metadata": {"origin": origin["url"]}, "namespace": "swh", "object_id": obj_id, "object_type": obj_type_str, "scheme_version": 1, } assert resp.data == expected_result def test_swhid_resolve_invalid(api_client): rev_id_invalid = "96db9023b8_foo_50d6c108e9a3" swhid = "swh:1:rev:%s" % rev_id_invalid url = reverse("api-1-resolve-swhid", url_args={"swhid": swhid}) check_api_get_responses(api_client, url, status_code=400) def test_swhid_resolve_not_found( api_client, unknown_content, 
unknown_directory, unknown_release, unknown_revision, unknown_snapshot, ): for obj_type, obj_id in ( (ObjectType.CONTENT, unknown_content["sha1_git"]), (ObjectType.DIRECTORY, unknown_directory), (ObjectType.RELEASE, unknown_release), (ObjectType.REVISION, unknown_revision), (ObjectType.SNAPSHOT, unknown_snapshot), ): swhid = gen_swhid(obj_type, obj_id) url = reverse("api-1-resolve-swhid", url_args={"swhid": swhid}) check_api_get_responses(api_client, url, status_code=404) def test_swh_origin_id_not_resolvable(api_client): ori_swhid = "swh:1:ori:8068d0075010b590762c6cb5682ed53cb3c13deb" url = reverse("api-1-resolve-swhid", url_args={"swhid": ori_swhid}) check_api_get_responses(api_client, url, status_code=400) def test_api_known_swhid_all_present( api_client, content, directory, release, revision, snapshot ): input_swhids = [ gen_swhid(ObjectType.CONTENT, content["sha1_git"]), gen_swhid(ObjectType.DIRECTORY, directory), gen_swhid(ObjectType.REVISION, revision), gen_swhid(ObjectType.RELEASE, release), gen_swhid(ObjectType.SNAPSHOT, snapshot), ] url = reverse("api-1-known") resp = check_api_post_responses(api_client, url, data=input_swhids, status_code=200) assert resp.data == {swhid: {"known": True} for swhid in input_swhids} def test_api_known_swhid_some_present(api_client, content, directory): content_ = gen_swhid(ObjectType.CONTENT, content["sha1_git"]) directory_ = gen_swhid(ObjectType.DIRECTORY, directory) unknown_revision_ = gen_swhid(ObjectType.REVISION, random_sha1()) unknown_release_ = gen_swhid(ObjectType.RELEASE, random_sha1()) unknown_snapshot_ = gen_swhid(ObjectType.SNAPSHOT, random_sha1()) input_swhids = [ content_, directory_, unknown_revision_, unknown_release_, unknown_snapshot_, ] url = reverse("api-1-known") resp = check_api_post_responses(api_client, url, data=input_swhids, status_code=200) assert resp.data == { content_: {"known": True}, directory_: {"known": True}, unknown_revision_: {"known": False}, unknown_release_: {"known": False}, unknown_snapshot_: {"known": False}, } def test_api_known_swhid_same_hash(api_client, content): content_ = gen_swhid(ObjectType.CONTENT, content["sha1_git"]) # Reuse hash to make invalid directory SWHID directory_ = gen_swhid(ObjectType.DIRECTORY, content["sha1_git"]) input_swhids = [ content_, directory_, ] url = reverse("api-1-known") resp = check_api_post_responses(api_client, url, data=input_swhids, status_code=200) assert resp.data == { content_: {"known": True}, directory_: {"known": False}, } def test_api_known_invalid_swhid(api_client): invalid_swhid_sha1 = ["swh:1:cnt:8068d0075010b590762c6cb5682ed53cb3c13de;"] invalid_swhid_type = ["swh:1:cnn:8068d0075010b590762c6cb5682ed53cb3c13deb"] url = reverse("api-1-known") check_api_post_responses(api_client, url, data=invalid_swhid_sha1, status_code=400) check_api_post_responses(api_client, url, data=invalid_swhid_type, status_code=400) def test_api_known_raises_large_payload_error(api_client): random_swhid = "swh:1:cnt:8068d0075010b590762c6cb5682ed53cb3c13deb" limit = 10000 err_msg = "The maximum number of SWHIDs this endpoint can receive is 1000" swhids = [random_swhid for i in range(limit)] url = reverse("api-1-known") resp = check_api_post_responses(api_client, url, data=swhids, status_code=413) assert resp.data == {"exception": "LargePayloadExc", "reason": err_msg} diff --git a/swh/web/tests/api/views/test_metadata.py b/swh/web/tests/api/views/test_metadata.py index d2b63411..19a84181 100644 --- a/swh/web/tests/api/views/test_metadata.py +++
b/swh/web/tests/api/views/test_metadata.py @@ -1,255 +1,255 @@ # Copyright (C) 2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import attr from hypothesis import given, settings from hypothesis.strategies import sets import pytest from swh.model.hypothesis_strategies import raw_extrinsic_metadata from swh.model.model import Origin -from swh.web.common.utils import reverse from swh.web.tests.api.views.utils import scroll_results -from swh.web.tests.utils import check_api_get_responses, check_http_get_response +from swh.web.tests.helpers import check_api_get_responses, check_http_get_response +from swh.web.utils import reverse @given(raw_extrinsic_metadata()) def test_api_raw_extrinsic_metadata(api_client, subtest, metadata): # ensure archive_data fixture will be reset between each hypothesis # example test run @subtest def test_inner(archive_data): archive_data.metadata_authority_add([metadata.authority]) archive_data.metadata_fetcher_add([metadata.fetcher]) archive_data.raw_extrinsic_metadata_add([metadata]) authority = metadata.authority url = reverse( "api-1-raw-extrinsic-metadata-swhid", url_args={"target": str(metadata.target)}, query_params={"authority": f"{authority.type.value} {authority.url}"}, ) rv = check_api_get_responses(api_client, url, status_code=200) assert len(rv.data) == 1 expected_result = metadata.to_dict() del expected_result["id"] del expected_result["metadata"] metadata_url = rv.data[0]["metadata_url"] expected_result["metadata_url"] = metadata_url expected_result["discovery_date"] = expected_result[ "discovery_date" ].isoformat() if expected_result["target"].startswith(("swh:1:ori:", "swh:1:emd:")): # non-core SWHID are hidden from the API del expected_result["target"] assert rv.data == [expected_result] rv = check_http_get_response(api_client, metadata_url, status_code=200) assert rv["Content-Type"] == "application/octet-stream" assert ( rv["Content-Disposition"] == f'attachment; filename="{metadata.target}_metadata"' ) assert rv.content == metadata.metadata @settings(max_examples=1) @given(raw_extrinsic_metadata()) def test_api_raw_extrinsic_metadata_origin_filename(api_client, subtest, metadata): # ensure archive_data fixture will be reset between each hypothesis # example test run @subtest def test_inner(archive_data): nonlocal metadata origin = Origin(url="http://example.com/repo.git") metadata = attr.evolve(metadata, target=origin.swhid()) metadata = attr.evolve(metadata, id=metadata.compute_hash()) archive_data.origin_add([origin]) archive_data.metadata_authority_add([metadata.authority]) archive_data.metadata_fetcher_add([metadata.fetcher]) archive_data.raw_extrinsic_metadata_add([metadata]) authority = metadata.authority url = reverse( "api-1-raw-extrinsic-metadata-swhid", url_args={"target": str(metadata.target)}, query_params={"authority": f"{authority.type.value} {authority.url}"}, ) rv = check_api_get_responses(api_client, url, status_code=200) assert len(rv.data) == 1 metadata_url = rv.data[0]["metadata_url"] rv = check_http_get_response(api_client, metadata_url, status_code=200) assert rv["Content-Type"] == "application/octet-stream" assert ( rv["Content-Disposition"] == 'attachment; filename="http_example_com_repo_git_metadata"' ) assert rv.content == metadata.metadata @pytest.mark.parametrize("limit", [1, 2, 10, 100]) @given(sets(raw_extrinsic_metadata(), 
min_size=1)) def test_api_raw_extrinsic_metadata_scroll(api_client, subtest, limit, meta): # ensure archive_data fixture will be reset between each hypothesis # example test run @subtest def test_inner(archive_data): # Make all metadata objects use the same authority and target metadata0 = next(iter(meta)) metadata = { attr.evolve(m, authority=metadata0.authority, target=metadata0.target) for m in meta } # Metadata ids must also be updated as they depend on authority and target metadata = {attr.evolve(m, id=m.compute_hash()) for m in metadata} authority = metadata0.authority archive_data.metadata_authority_add([authority]) archive_data.metadata_fetcher_add(list({m.fetcher for m in metadata})) archive_data.raw_extrinsic_metadata_add(metadata) url = reverse( "api-1-raw-extrinsic-metadata-swhid", url_args={"target": str(metadata0.target)}, query_params={ "authority": f"{authority.type.value} {authority.url}", "limit": limit, }, ) results = scroll_results(api_client, url) expected_results = [m.to_dict() for m in metadata] for expected_result in expected_results: del expected_result["id"] del expected_result["metadata"] expected_result["discovery_date"] = expected_result[ "discovery_date" ].isoformat() if expected_result["target"].startswith(("swh:1:ori:", "swh:1:emd:")): # non-core SWHID are hidden from the API del expected_result["target"] assert len(results) == len(expected_results) for result in results: del result["metadata_url"] assert result in expected_results, str(expected_results) _swhid = "swh:1:dir:a2faa28028657859c16ff506924212b33f0e1307" @pytest.mark.parametrize( "status_code,url_args,query_params", [ pytest.param( 200, {"target": _swhid}, {"authority": "forge http://example.org"}, id="minimal working", ), pytest.param( 200, {"target": _swhid}, { "authority": "forge http://example.org", "after": "2021-06-18T09:31:09", "limit": 100, }, id="maximal working", ), pytest.param( 400, {"target": _swhid}, {"authority": "foo http://example.org"}, id="invalid authority type", ), pytest.param( 400, {"target": _swhid}, { "authority": "forge http://example.org", "after": "yesterday", }, id="invalid 'after' format", ), pytest.param( 400, {"target": _swhid}, { "authority": "forge http://example.org", "limit": "abc", }, id="invalid 'limit'", ), ], ) def test_api_raw_extrinsic_metadata_check_params( api_client, archive_data, status_code, url_args, query_params ): url = reverse( "api-1-raw-extrinsic-metadata-swhid", url_args=url_args, query_params=query_params, ) check_api_get_responses(api_client, url, status_code=status_code) @given(raw_extrinsic_metadata()) def test_api_raw_extrinsic_metadata_list_authorities(api_client, subtest, metadata): # ensure archive_data fixture will be reset between each hypothesis # example test run @subtest def test_inner(archive_data): archive_data.metadata_authority_add([metadata.authority]) archive_data.metadata_fetcher_add([metadata.fetcher]) archive_data.raw_extrinsic_metadata_add([metadata]) authority = metadata.authority url = reverse( "api-1-raw-extrinsic-metadata-swhid-authorities", url_args={"target": str(metadata.target)}, ) rv = check_api_get_responses(api_client, url, status_code=200) expected_results = [ { "type": authority.type.value, "url": authority.url, "metadata_list_url": "http://testserver" + reverse( "api-1-raw-extrinsic-metadata-swhid", url_args={"target": str(metadata.target)}, query_params={ "authority": f"{authority.type.value} {authority.url}" }, ), } ] assert rv.data == expected_results def 
test_api_raw_extrinsic_metadata_origin_redirect(api_client, archive_data): origin = Origin(url="http://example.com/repo.git") archive_data.origin_add([origin]) url = reverse( "api-1-raw-extrinsic-metadata-origin-authorities", url_args={"origin_url": origin.url}, ) rv = check_http_get_response(api_client, url, status_code=302) redirect_url = reverse( "api-1-raw-extrinsic-metadata-swhid-authorities", url_args={"target": str(origin.swhid())}, ) assert rv["location"] == redirect_url diff --git a/swh/web/tests/api/views/test_origin.py b/swh/web/tests/api/views/test_origin.py index 815216fe..5802b5dd 100644 --- a/swh/web/tests/api/views/test_origin.py +++ b/swh/web/tests/api/views/test_origin.py @@ -1,867 +1,867 @@ # Copyright (C) 2015-2021 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from datetime import timedelta import json from hypothesis import given import pytest from swh.indexer.storage.model import OriginIntrinsicMetadataRow from swh.model.hashutil import hash_to_bytes from swh.model.model import Origin, OriginVisit, OriginVisitStatus from swh.search.exc import SearchQuerySyntaxError from swh.search.interface import PagedResult from swh.storage.exc import StorageAPIError, StorageDBError from swh.storage.utils import now from swh.web.api.utils import enrich_origin, enrich_origin_visit -from swh.web.common.exc import BadInputExc -from swh.web.common.origin_visits import get_origin_visits -from swh.web.common.utils import reverse from swh.web.tests.api.views.utils import scroll_results from swh.web.tests.data import ( INDEXER_TOOL, ORIGIN_MASTER_DIRECTORY, ORIGIN_MASTER_REVISION, ORIGIN_METADATA_KEY, ORIGIN_METADATA_VALUE, ) +from swh.web.tests.helpers import check_api_get_responses from swh.web.tests.strategies import new_origin, new_snapshots, visit_dates -from swh.web.tests.utils import check_api_get_responses +from swh.web.utils import reverse +from swh.web.utils.exc import BadInputExc +from swh.web.utils.origin_visits import get_origin_visits def test_api_lookup_origin_visits_raise_error(api_client, origin, mocker): mock_get_origin_visits = mocker.patch("swh.web.api.views.origin.get_origin_visits") err_msg = "voluntary error to check the bad request middleware." mock_get_origin_visits.side_effect = BadInputExc(err_msg) url = reverse("api-1-origin-visits", url_args={"origin_url": origin["url"]}) rv = check_api_get_responses(api_client, url, status_code=400) assert rv.data == {"exception": "BadInputExc", "reason": err_msg} def test_api_lookup_origin_visits_raise_swh_storage_error_db( api_client, origin, mocker ): mock_get_origin_visits = mocker.patch("swh.web.api.views.origin.get_origin_visits") err_msg = "Storage exploded! Will be back online shortly!" mock_get_origin_visits.side_effect = StorageDBError(err_msg) url = reverse("api-1-origin-visits", url_args={"origin_url": origin["url"]}) rv = check_api_get_responses(api_client, url, status_code=503) assert rv.data == { "exception": "StorageDBError", "reason": "An unexpected error occurred in the backend: %s" % err_msg, } def test_api_lookup_origin_visits_raise_swh_storage_error_api( api_client, origin, mocker ): mock_get_origin_visits = mocker.patch("swh.web.api.views.origin.get_origin_visits") err_msg = "Storage API dropped dead! Will resurrect asap!" 
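# ---------------------------------------------------------------------------
# Illustrative aside (not part of the diff): the two tests around this point
# make a mocked get_origin_visits raise storage exceptions and expect the API
# error middleware to translate them into HTTP responses. A minimal sketch of
# that mapping, under a hypothetical helper name (the real handling lives in
# swh.web.api.apiresponse and may differ):
def _exception_to_status_code(exc: Exception) -> int:
    from swh.storage.exc import StorageAPIError, StorageDBError
    from swh.web.utils.exc import BadInputExc, NotFoundExc

    if isinstance(exc, BadInputExc):
        return 400  # malformed client input, as in the test just above
    if isinstance(exc, NotFoundExc):
        return 404  # object absent from the archive
    if isinstance(exc, (StorageDBError, StorageAPIError)):
        return 503  # storage backend temporarily unavailable
    return 500  # anything else surfaces as an internal error
# (end of sketch; the test body continues below)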
mock_get_origin_visits.side_effect = StorageAPIError(err_msg) url = reverse("api-1-origin-visits", url_args={"origin_url": origin["url"]}) rv = check_api_get_responses(api_client, url, status_code=503) assert rv.data == { "exception": "StorageAPIError", "reason": "An unexpected error occurred in the api backend: %s" % err_msg, } @given(new_origin(), visit_dates(3), new_snapshots(3)) def test_api_lookup_origin_visits( api_client, subtest, new_origin, visit_dates, new_snapshots ): # ensure archive_data fixture will be reset between each hypothesis # example test run @subtest def test_inner(archive_data): archive_data.origin_add([new_origin]) for i, visit_date in enumerate(visit_dates): origin_visit = archive_data.origin_visit_add( [ OriginVisit( origin=new_origin.url, date=visit_date, type="git", ) ] )[0] archive_data.snapshot_add([new_snapshots[i]]) visit_status = OriginVisitStatus( origin=new_origin.url, visit=origin_visit.visit, date=now(), status="full", snapshot=new_snapshots[i].id, ) archive_data.origin_visit_status_add([visit_status]) all_visits = list(reversed(get_origin_visits(new_origin.to_dict()))) for last_visit, expected_visits in ( (None, all_visits[:2]), (all_visits[1]["visit"], all_visits[2:]), ): url = reverse( "api-1-origin-visits", url_args={"origin_url": new_origin.url}, query_params={"per_page": 2, "last_visit": last_visit}, ) rv = check_api_get_responses(api_client, url, status_code=200) for i in range(len(expected_visits)): expected_visits[i] = enrich_origin_visit( expected_visits[i], with_origin_link=False, with_origin_visit_link=True, request=rv.wsgi_request, ) assert rv.data == expected_visits @given(new_origin(), visit_dates(3), new_snapshots(3)) def test_api_lookup_origin_visits_by_id( api_client, subtest, new_origin, visit_dates, new_snapshots ): # ensure archive_data fixture will be reset between each hypothesis # example test run @subtest def test_inner(archive_data): archive_data.origin_add([new_origin]) for i, visit_date in enumerate(visit_dates): origin_visit = archive_data.origin_visit_add( [ OriginVisit( origin=new_origin.url, date=visit_date, type="git", ) ] )[0] archive_data.snapshot_add([new_snapshots[i]]) visit_status = OriginVisitStatus( origin=new_origin.url, visit=origin_visit.visit, date=now(), status="full", snapshot=new_snapshots[i].id, ) archive_data.origin_visit_status_add([visit_status]) all_visits = list(reversed(get_origin_visits(new_origin.to_dict()))) for last_visit, expected_visits in ( (None, all_visits[:2]), (all_visits[1]["visit"], all_visits[2:4]), ): url = reverse( "api-1-origin-visits", url_args={"origin_url": new_origin.url}, query_params={"per_page": 2, "last_visit": last_visit}, ) rv = check_api_get_responses(api_client, url, status_code=200) for i in range(len(expected_visits)): expected_visits[i] = enrich_origin_visit( expected_visits[i], with_origin_link=False, with_origin_visit_link=True, request=rv.wsgi_request, ) assert rv.data == expected_visits @given(new_origin(), visit_dates(3), new_snapshots(3)) def test_api_lookup_origin_visit( api_client, subtest, new_origin, visit_dates, new_snapshots ): # ensure archive_data fixture will be reset between each hypothesis # example test run @subtest def test_inner(archive_data): archive_data.origin_add([new_origin]) for i, visit_date in enumerate(visit_dates): origin_visit = archive_data.origin_visit_add( [ OriginVisit( origin=new_origin.url, date=visit_date, type="git", ) ] )[0] visit_id = origin_visit.visit archive_data.snapshot_add([new_snapshots[i]]) visit_status = 
OriginVisitStatus( origin=new_origin.url, visit=origin_visit.visit, date=visit_date + timedelta(minutes=5), status="full", snapshot=new_snapshots[i].id, ) archive_data.origin_visit_status_add([visit_status]) url = reverse( "api-1-origin-visit", url_args={"origin_url": new_origin.url, "visit_id": visit_id}, ) rv = check_api_get_responses(api_client, url, status_code=200) expected_visit = archive_data.origin_visit_get_by(new_origin.url, visit_id) expected_visit = enrich_origin_visit( expected_visit, with_origin_link=True, with_origin_visit_link=False, request=rv.wsgi_request, ) assert rv.data == expected_visit @given(new_origin()) def test_api_lookup_origin_visit_latest_no_visit(api_client, archive_data, new_origin): archive_data.origin_add([new_origin]) url = reverse("api-1-origin-visit-latest", url_args={"origin_url": new_origin.url}) rv = check_api_get_responses(api_client, url, status_code=404) assert rv.data == { "exception": "NotFoundExc", "reason": "No visit for origin %s found" % new_origin.url, } @given(new_origin(), visit_dates(2), new_snapshots(1)) def test_api_lookup_origin_visit_latest( api_client, subtest, new_origin, visit_dates, new_snapshots ): # ensure archive_data fixture will be reset between each hypothesis # example test run @subtest def test_inner(archive_data): archive_data.origin_add([new_origin]) visit_dates.sort() visit_ids = [] for i, visit_date in enumerate(visit_dates): origin_visit = archive_data.origin_visit_add( [ OriginVisit( origin=new_origin.url, date=visit_date, type="git", ) ] )[0] visit_ids.append(origin_visit.visit) archive_data.snapshot_add([new_snapshots[0]]) visit_status = OriginVisitStatus( origin=new_origin.url, visit=visit_ids[0], date=now(), status="full", snapshot=new_snapshots[0].id, ) archive_data.origin_visit_status_add([visit_status]) url = reverse( "api-1-origin-visit-latest", url_args={"origin_url": new_origin.url} ) rv = check_api_get_responses(api_client, url, status_code=200) expected_visit = archive_data.origin_visit_status_get_latest( new_origin.url, type="git" ) expected_visit = enrich_origin_visit( expected_visit, with_origin_link=True, with_origin_visit_link=False, request=rv.wsgi_request, ) assert rv.data == expected_visit @given(new_origin(), visit_dates(2), new_snapshots(1)) def test_api_lookup_origin_visit_latest_with_snapshot( api_client, subtest, new_origin, visit_dates, new_snapshots ): # ensure archive_data fixture will be reset between each hypothesis # example test run @subtest def test_inner(archive_data): archive_data.origin_add([new_origin]) visit_dates.sort() visit_ids = [] for i, visit_date in enumerate(visit_dates): origin_visit = archive_data.origin_visit_add( [ OriginVisit( origin=new_origin.url, date=visit_date, type="git", ) ] )[0] visit_ids.append(origin_visit.visit) archive_data.snapshot_add([new_snapshots[0]]) # Add snapshot to the latest visit visit_id = visit_ids[-1] visit_status = OriginVisitStatus( origin=new_origin.url, visit=visit_id, date=now(), status="full", snapshot=new_snapshots[0].id, ) archive_data.origin_visit_status_add([visit_status]) url = reverse( "api-1-origin-visit-latest", url_args={"origin_url": new_origin.url}, query_params={"require_snapshot": True}, ) rv = check_api_get_responses(api_client, url, status_code=200) expected_visit = archive_data.origin_visit_status_get_latest( new_origin.url, type="git", require_snapshot=True ) expected_visit = enrich_origin_visit( expected_visit, with_origin_link=True, with_origin_visit_link=False, request=rv.wsgi_request, ) assert rv.data == 
expected_visit def test_api_lookup_origin_visit_not_found(api_client, origin): all_visits = list(reversed(get_origin_visits(origin))) max_visit_id = max([v["visit"] for v in all_visits]) url = reverse( "api-1-origin-visit", url_args={"origin_url": origin["url"], "visit_id": max_visit_id + 1}, ) rv = check_api_get_responses(api_client, url, status_code=404) assert rv.data == { "exception": "NotFoundExc", "reason": "Origin %s or its visit with id %s not found!" % (origin["url"], max_visit_id + 1), } def test_api_origins_wrong_input(api_client, archive_data): """Should fail with 400 if the input is deprecated.""" # fail if wrong input url = reverse("api-1-origins", query_params={"origin_from": 1}) rv = check_api_get_responses(api_client, url, status_code=400) assert rv.data == { "exception": "BadInputExc", "reason": "Please use the Link header to browse through result", } def test_api_origins(api_client, archive_data): page_result = archive_data.origin_list(limit=10000) origins = page_result.results origin_urls = {origin.url for origin in origins} # Get only one url = reverse("api-1-origins", query_params={"origin_count": 1}) rv = check_api_get_responses(api_client, url, status_code=200) assert len(rv.data) == 1 assert {origin["url"] for origin in rv.data} <= origin_urls # Get all url = reverse("api-1-origins", query_params={"origin_count": len(origins)}) rv = check_api_get_responses(api_client, url, status_code=200) assert len(rv.data) == len(origins) assert {origin["url"] for origin in rv.data} == origin_urls # Get "all + 10" url = reverse("api-1-origins", query_params={"origin_count": len(origins) + 10}) rv = check_api_get_responses(api_client, url, status_code=200) assert len(rv.data) == len(origins) assert {origin["url"] for origin in rv.data} == origin_urls @pytest.mark.parametrize("origin_count", [1, 2, 10, 100]) def test_api_origins_scroll(api_client, archive_data, origin_count): page_result = archive_data.origin_list(limit=10000) origins = page_result.results origin_urls = {origin.url for origin in origins} url = reverse("api-1-origins", query_params={"origin_count": origin_count}) results = scroll_results(api_client, url) assert len(results) == len(origins) assert {origin["url"] for origin in results} == origin_urls def test_api_origin_by_url(api_client, archive_data, origin): origin_url = origin["url"] url = reverse("api-1-origin", url_args={"origin_url": origin_url}) rv = check_api_get_responses(api_client, url, status_code=200) expected_origin = archive_data.origin_get([origin_url])[0] expected_origin = enrich_origin(expected_origin, rv.wsgi_request) assert rv.data == expected_origin @given(new_origin()) def test_api_origin_not_found(api_client, new_origin): url = reverse("api-1-origin", url_args={"origin_url": new_origin.url}) rv = check_api_get_responses(api_client, url, status_code=404) assert rv.data == { "exception": "NotFoundExc", "reason": "Origin with url %s not found!" 
% new_origin.url, } @pytest.mark.parametrize("backend", ["swh-search", "swh-storage"]) def test_api_origin_search(api_client, mocker, backend): if backend != "swh-search": # equivalent to not configuring search in the config - mocker.patch("swh.web.common.archive.search", None) + mocker.patch("swh.web.utils.archive.search", None) expected_origins = { "https://github.com/wcoder/highlightjs-line-numbers.js", "https://github.com/memononen/libtess2", } # Search for 'github.com', get only one url = reverse( "api-1-origin-search", url_args={"url_pattern": "github.com"}, query_params={"limit": 1}, ) rv = check_api_get_responses(api_client, url, status_code=200) assert len(rv.data) == 1 assert {origin["url"] for origin in rv.data} <= expected_origins assert rv.data == [ enrich_origin({"url": origin["url"]}, request=rv.wsgi_request) for origin in rv.data ] # Search for 'github.com', get all url = reverse( "api-1-origin-search", url_args={"url_pattern": "github.com"}, query_params={"limit": 2}, ) rv = check_api_get_responses(api_client, url, status_code=200) assert {origin["url"] for origin in rv.data} == expected_origins assert rv.data == [ enrich_origin({"url": origin["url"]}, request=rv.wsgi_request) for origin in rv.data ] # Search for 'github.com', get more than available url = reverse( "api-1-origin-search", url_args={"url_pattern": "github.com"}, query_params={"limit": 10}, ) rv = check_api_get_responses(api_client, url, status_code=200) assert {origin["url"] for origin in rv.data} == expected_origins assert rv.data == [ enrich_origin({"url": origin["url"]}, request=rv.wsgi_request) for origin in rv.data ] @pytest.mark.parametrize("backend", ["swh-search", "swh-storage"]) def test_api_origin_search_words(api_client, mocker, backend): if backend != "swh-search": # equivalent to not configuring search in the config - mocker.patch("swh.web.common.archive.search", None) + mocker.patch("swh.web.utils.archive.search", None) expected_origins = { "https://github.com/wcoder/highlightjs-line-numbers.js", "https://github.com/memononen/libtess2", } url = reverse( "api-1-origin-search", url_args={"url_pattern": "github com"}, query_params={"limit": 2}, ) rv = check_api_get_responses(api_client, url, status_code=200) assert {origin["url"] for origin in rv.data} == expected_origins url = reverse( "api-1-origin-search", url_args={"url_pattern": "com github"}, query_params={"limit": 2}, ) rv = check_api_get_responses(api_client, url, status_code=200) assert {origin["url"] for origin in rv.data} == expected_origins url = reverse( "api-1-origin-search", url_args={"url_pattern": "memononen libtess2"}, query_params={"limit": 2}, ) rv = check_api_get_responses(api_client, url, status_code=200) assert len(rv.data) == 1 assert {origin["url"] for origin in rv.data} == { "https://github.com/memononen/libtess2" } url = reverse( "api-1-origin-search", url_args={"url_pattern": "libtess2 memononen"}, query_params={"limit": 2}, ) rv = check_api_get_responses(api_client, url, status_code=200) assert len(rv.data) == 1 assert {origin["url"] for origin in rv.data} == { "https://github.com/memononen/libtess2" } @pytest.mark.parametrize("backend", ["swh-search", "swh-storage"]) def test_api_origin_search_visit_type(api_client, mocker, backend): if backend != "swh-search": # equivalent to not configuring search in the config - mocker.patch("swh.web.common.archive.search", None) + mocker.patch("swh.web.utils.archive.search", None) expected_origins = { "https://github.com/wcoder/highlightjs-line-numbers.js", 
"https://github.com/memononen/libtess2", } url = reverse( "api-1-origin-search", url_args={ "url_pattern": "github com", }, query_params={"visit_type": "git"}, ) rv = check_api_get_responses(api_client, url, status_code=200) assert {origin["url"] for origin in rv.data} == expected_origins url = reverse( "api-1-origin-search", url_args={ "url_pattern": "github com", }, query_params={"visit_type": "foo"}, ) rv = check_api_get_responses(api_client, url, status_code=200) assert rv.data == [] def test_api_origin_search_use_ql(api_client, mocker): expected_origins = { "https://github.com/wcoder/highlightjs-line-numbers.js", "https://github.com/memononen/libtess2", } ORIGINS = [{"url": origin} for origin in expected_origins] - mock_archive_search = mocker.patch("swh.web.common.archive.search") + mock_archive_search = mocker.patch("swh.web.utils.archive.search") mock_archive_search.origin_search.return_value = PagedResult( results=ORIGINS, next_page_token=None, ) query = "origin : 'github.com'" url = reverse( "api-1-origin-search", url_args={"url_pattern": query}, query_params={"visit_type": "git", "use_ql": "true"}, ) rv = check_api_get_responses(api_client, url, status_code=200) assert {origin["url"] for origin in rv.data} == expected_origins mock_archive_search.origin_search.assert_called_with( query=query, page_token=None, with_visit=False, visit_types=["git"], limit=70 ) def test_api_origin_search_ql_syntax_error(api_client, mocker): - mock_archive_search = mocker.patch("swh.web.common.archive.search") + mock_archive_search = mocker.patch("swh.web.utils.archive.search") mock_archive_search.origin_search.side_effect = SearchQuerySyntaxError( "Invalid syntax" ) query = "this is not a valid query" url = reverse( "api-1-origin-search", url_args={"url_pattern": query}, query_params={"visit_type": "git", "use_ql": "true"}, ) rv = check_api_get_responses(api_client, url, status_code=400) assert rv.data == { "exception": "BadInputExc", "reason": "Syntax error in search query: Invalid syntax", } mock_archive_search.origin_search.assert_called_with( query=query, page_token=None, with_visit=False, visit_types=["git"], limit=70 ) @pytest.mark.parametrize("backend", ["swh-search", "swh-storage"]) @pytest.mark.parametrize("limit", [1, 2, 3, 10]) def test_api_origin_search_scroll(api_client, archive_data, mocker, limit, backend): if backend != "swh-search": # equivalent to not configuring search in the config - mocker.patch("swh.web.common.archive.search", None) + mocker.patch("swh.web.utils.archive.search", None) expected_origins = { "https://github.com/wcoder/highlightjs-line-numbers.js", "https://github.com/memononen/libtess2", } url = reverse( "api-1-origin-search", url_args={"url_pattern": "github.com"}, query_params={"limit": limit}, ) results = scroll_results(api_client, url) assert {origin["url"] for origin in results} == expected_origins @pytest.mark.parametrize("backend", ["swh-search", "swh-storage"]) def test_api_origin_search_limit(api_client, archive_data, tests_data, mocker, backend): if backend == "swh-search": tests_data["search"].origin_update( [{"url": "http://foobar/{}".format(i)} for i in range(2000)] ) else: # equivalent to not configuring search in the config - mocker.patch("swh.web.common.archive.search", None) + mocker.patch("swh.web.utils.archive.search", None) archive_data.origin_add( [Origin(url="http://foobar/{}".format(i)) for i in range(2000)] ) url = reverse( "api-1-origin-search", url_args={"url_pattern": "foobar"}, query_params={"limit": 1050}, ) rv = 
check_api_get_responses(api_client, url, status_code=200) assert len(rv.data) == 1000 @pytest.mark.parametrize("backend", ["swh-search", "swh-indexer-storage"]) def test_api_origin_metadata_search(api_client, mocker, backend): - mock_config = mocker.patch("swh.web.common.archive.config") + mock_config = mocker.patch("swh.web.utils.archive.config") mock_config.get_config.return_value = { "search_config": {"metadata_backend": backend} } url = reverse( "api-1-origin-metadata-search", query_params={"fulltext": ORIGIN_METADATA_VALUE} ) rv = check_api_get_responses(api_client, url, status_code=200) rv.data = sorted(rv.data, key=lambda d: d["url"]) expected_data = sorted( [ { "url": origin_url, "metadata": { "from_directory": ORIGIN_MASTER_DIRECTORY[origin_url], "tool": { "name": INDEXER_TOOL["tool_name"], "version": INDEXER_TOOL["tool_version"], "configuration": INDEXER_TOOL["tool_configuration"], "id": INDEXER_TOOL["id"], }, "mappings": [], }, } for origin_url in sorted(ORIGIN_MASTER_REVISION.keys()) ], key=lambda d: d["url"], ) for i in range(len(expected_data)): expected = expected_data[i] response = rv.data[i] metadata = response["metadata"].pop("metadata") assert any( [ORIGIN_METADATA_VALUE in json.dumps(val) for val in metadata.values()] ) assert response == expected def test_api_origin_metadata_search_limit(api_client, mocker): - mock_idx_storage = mocker.patch("swh.web.common.archive.idx_storage") + mock_idx_storage = mocker.patch("swh.web.utils.archive.idx_storage") oimsft = mock_idx_storage.origin_intrinsic_metadata_search_fulltext oimsft.side_effect = lambda conjunction, limit: [ OriginIntrinsicMetadataRow( id=origin_url, from_directory=hash_to_bytes(directory), indexer_configuration_id=INDEXER_TOOL["id"], metadata={ORIGIN_METADATA_KEY: ORIGIN_METADATA_VALUE}, mappings=[], ) for origin_url, directory in ORIGIN_MASTER_DIRECTORY.items() ] url = reverse( "api-1-origin-metadata-search", query_params={"fulltext": ORIGIN_METADATA_VALUE} ) rv = check_api_get_responses(api_client, url, status_code=200) assert len(rv.data) == len(ORIGIN_MASTER_REVISION) oimsft.assert_called_with(conjunction=[ORIGIN_METADATA_VALUE], limit=70) url = reverse( "api-1-origin-metadata-search", query_params={"fulltext": ORIGIN_METADATA_VALUE, "limit": 10}, ) rv = check_api_get_responses(api_client, url, status_code=200) assert len(rv.data) == len(ORIGIN_MASTER_REVISION) oimsft.assert_called_with(conjunction=[ORIGIN_METADATA_VALUE], limit=10) url = reverse( "api-1-origin-metadata-search", query_params={"fulltext": ORIGIN_METADATA_VALUE, "limit": 987}, ) rv = check_api_get_responses(api_client, url, status_code=200) assert len(rv.data) == len(ORIGIN_MASTER_REVISION) oimsft.assert_called_with(conjunction=[ORIGIN_METADATA_VALUE], limit=100) def test_api_origin_intrinsic_metadata(api_client, origin): url = reverse( "api-origin-intrinsic-metadata", url_args={"origin_url": origin["url"]} ) rv = check_api_get_responses(api_client, url, status_code=200) assert ORIGIN_METADATA_KEY in rv.data assert rv.data[ORIGIN_METADATA_KEY] == ORIGIN_METADATA_VALUE def test_api_origin_metadata_search_invalid(api_client, mocker): - mock_idx_storage = mocker.patch("swh.web.common.archive.idx_storage") + mock_idx_storage = mocker.patch("swh.web.utils.archive.idx_storage") url = reverse("api-1-origin-metadata-search") check_api_get_responses(api_client, url, status_code=400) mock_idx_storage.assert_not_called() @pytest.mark.parametrize("backend", ["swh-counters", "swh-storage"]) def test_api_stat_counters(api_client, mocker, backend): - 
mock_config = mocker.patch("swh.web.common.archive.config") + mock_config = mocker.patch("swh.web.utils.archive.config") mock_config.get_config.return_value = {"counters_backend": backend} url = reverse("api-1-stat-counters") rv = check_api_get_responses(api_client, url, status_code=200) counts = json.loads(rv.content) for obj in ["content", "origin", "release", "directory", "revision"]: assert counts.get(obj, 0) > 0 @pytest.fixture def archived_origins(archive_data): page_result = archive_data.origin_list(page_token=None, limit=10000) origins = [origin.to_dict() for origin in page_result.results] for origin in origins: ovs = archive_data.origin_visit_get_with_statuses(origin["url"]).results del origin["id"] origin["type"] = ovs[0].visit.type return origins def test_api_origin_search_empty_pattern(api_client, archived_origins): url = reverse( "api-1-origin-search", url_args={"url_pattern": ""}, query_params={"limit": 10000}, ) rv = check_api_get_responses(api_client, url, status_code=200) assert {o["url"] for o in rv.data} == {o["url"] for o in archived_origins} def test_api_origin_search_empty_pattern_and_visit_type(api_client, archived_origins): visit_types = {o["type"] for o in archived_origins} for visit_type in visit_types: url = reverse( "api-1-origin-search", url_args={"url_pattern": ""}, query_params={"visit_type": visit_type, "limit": 10000}, ) rv = check_api_get_responses(api_client, url, status_code=200) assert {o["url"] for o in rv.data} == { o["url"] for o in archived_origins if o["type"] == visit_type } @pytest.mark.parametrize( "view_name, extra_args", [ ("api-1-origin", {}), ("api-1-origin-visits", {}), ("api-1-origin-visit", {"visit_id": 1}), ("api-1-origin-visit-latest", {}), ("api-origin-intrinsic-metadata", {}), ], ) def test_api_origin_by_url_with_extra_trailing_slash( api_client, origin, view_name, extra_args ): origin_url = origin["url"] assert not origin_url.endswith("/") origin_url = origin_url + "/" url = reverse(view_name, url_args={"origin_url": origin_url, **extra_args}) rv = check_api_get_responses(api_client, url, status_code=404) assert rv.data == { "exception": "NotFoundExc", "reason": f"Origin with url {origin_url} not found!", } diff --git a/swh/web/tests/api/views/test_ping.py b/swh/web/tests/api/views/test_ping.py index 2e2b64eb..8d14e0c0 100644 --- a/swh/web/tests/api/views/test_ping.py +++ b/swh/web/tests/api/views/test_ping.py @@ -1,13 +1,13 @@ # Copyright (C) 2020 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information -from swh.web.common.utils import reverse -from swh.web.tests.utils import check_api_get_responses +from swh.web.tests.helpers import check_api_get_responses +from swh.web.utils import reverse def test_api_1_ping(api_client): url = reverse("api-1-ping") rv = check_api_get_responses(api_client, url, status_code=200) assert rv.data == "pong" diff --git a/swh/web/tests/api/views/test_raw.py b/swh/web/tests/api/views/test_raw.py index 5e502b65..4140e9c8 100644 --- a/swh/web/tests/api/views/test_raw.py +++ b/swh/web/tests/api/views/test_raw.py @@ -1,115 +1,115 @@ # Copyright (C) 2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import hashlib import pytest from swh.model.hashutil 
import hash_to_bytes from swh.web.api.throttling import SwhWebUserRateThrottle from swh.web.auth.utils import API_RAW_OBJECT_PERMISSION -from swh.web.common.utils import reverse from swh.web.settings.tests import api_raw_object_rate -from swh.web.tests.utils import ( +from swh.web.tests.helpers import ( check_api_get_responses, check_http_get_response, create_django_permission, ) +from swh.web.utils import reverse @pytest.fixture def privileged_user(regular_user): regular_user.user_permissions.add( create_django_permission(API_RAW_OBJECT_PERMISSION) ) return regular_user @pytest.mark.django_db def test_api_raw_forbidden_for_anonymous_user(api_client, unknown_core_swhid): url = reverse("api-1-raw-object", url_args={"swhid": str(unknown_core_swhid)}) check_api_get_responses(api_client, url, status_code=403) @pytest.mark.django_db def test_api_raw_forbidden_for_user_without_permission( api_client, regular_user, unknown_core_swhid ): api_client.force_login(regular_user) url = reverse("api-1-raw-object", url_args={"swhid": str(unknown_core_swhid)}) check_api_get_responses(api_client, url, status_code=403) @pytest.mark.django_db def test_api_raw_not_found(api_client, unknown_core_swhid, staff_user): api_client.force_login(staff_user) url = reverse("api-1-raw-object", url_args={"swhid": str(unknown_core_swhid)}) rv = check_api_get_responses(api_client, url, status_code=404) assert rv.data == { "exception": "NotFoundExc", "reason": f"Object with id {unknown_core_swhid} not found.", } def _test_api_raw_hash(api_client, privileged_user, archive_data, object_id, object_ty): api_client.force_login(privileged_user) url = reverse( "api-1-raw-object", url_args={"swhid": f"swh:1:{object_ty}:{object_id}"}, ) rv = check_http_get_response(api_client, url, status_code=200) assert rv["Content-Type"] == "application/octet-stream" assert ( rv["Content-disposition"] == f"attachment; filename=swh_1_{object_ty}_{object_id}_raw" ) sha1_git = hashlib.new("sha1", rv.content).digest() assert sha1_git == hash_to_bytes(object_id) @pytest.mark.django_db def test_api_raw_content(api_client, archive_data, content, privileged_user): _test_api_raw_hash( api_client, privileged_user, archive_data, content["sha1_git"], "cnt" ) @pytest.mark.django_db def test_api_raw_directory(api_client, archive_data, directory, privileged_user): _test_api_raw_hash(api_client, privileged_user, archive_data, directory, "dir") @pytest.mark.django_db def test_api_raw_revision(api_client, archive_data, revision, privileged_user): _test_api_raw_hash(api_client, privileged_user, archive_data, revision, "rev") @pytest.mark.django_db def test_api_raw_release(api_client, archive_data, release, privileged_user): _test_api_raw_hash(api_client, privileged_user, archive_data, release, "rel") @pytest.mark.django_db def test_api_raw_snapshot(api_client, archive_data, snapshot, privileged_user): _test_api_raw_hash(api_client, privileged_user, archive_data, snapshot, "snp") @pytest.mark.django_db def test_api_raw_no_rate_limit_for_privileged_user( api_client, revision, privileged_user ): api_client.force_login(privileged_user) url = reverse( "api-1-raw-object", url_args={"swhid": f"swh:1:rev:{revision}"}, ) for _ in range(api_raw_object_rate * SwhWebUserRateThrottle.NUM_REQUESTS_FACTOR): check_http_get_response(api_client, url, status_code=200) check_http_get_response(api_client, url, status_code=200) diff --git a/swh/web/tests/api/views/test_release.py b/swh/web/tests/api/views/test_release.py index a847973f..caca9a15 100644 --- 
a/swh/web/tests/api/views/test_release.py +++ b/swh/web/tests/api/views/test_release.py @@ -1,104 +1,104 @@ # Copyright (C) 2015-2019 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from datetime import datetime, timezone from swh.model.hashutil import hash_to_bytes, hash_to_hex from swh.model.model import ObjectType, Person, Release, TimestampWithTimezone -from swh.web.common.utils import reverse from swh.web.tests.data import random_sha1 -from swh.web.tests.utils import check_api_get_responses, check_http_get_response +from swh.web.tests.helpers import check_api_get_responses, check_http_get_response +from swh.web.utils import reverse def test_api_release(api_client, archive_data, release): url = reverse("api-1-release", url_args={"sha1_git": release}) rv = check_api_get_responses(api_client, url, status_code=200) expected_release = archive_data.release_get(release) target_revision = expected_release["target"] target_url = reverse( "api-1-revision", url_args={"sha1_git": target_revision}, request=rv.wsgi_request, ) expected_release["target_url"] = target_url assert rv.data == expected_release def test_api_release_target_type_not_a_revision( api_client, archive_data, content, directory, release ): for target_type, target in ( (ObjectType.CONTENT, content), (ObjectType.DIRECTORY, directory), (ObjectType.RELEASE, release), ): if target_type == ObjectType.CONTENT: target = target["sha1_git"] sample_release = Release( author=Person( email=b"author@company.org", fullname=b"author <author@company.org>", name=b"author", ), date=TimestampWithTimezone.from_datetime(datetime.now(tz=timezone.utc)), message=b"sample release message", name=b"sample release", synthetic=False, target=hash_to_bytes(target), target_type=target_type, ) archive_data.release_add([sample_release]) new_release_id = hash_to_hex(sample_release.id) url = reverse("api-1-release", url_args={"sha1_git": new_release_id}) rv = check_api_get_responses(api_client, url, status_code=200) expected_release = archive_data.release_get(new_release_id) if target_type == ObjectType.CONTENT: url_args = {"q": "sha1_git:%s" % target} else: url_args = {"sha1_git": target} target_url = reverse( "api-1-%s" % target_type.value, url_args=url_args, request=rv.wsgi_request ) expected_release["target_url"] = target_url assert rv.data == expected_release def test_api_release_not_found(api_client): unknown_release_ = random_sha1() url = reverse("api-1-release", url_args={"sha1_git": unknown_release_}) rv = check_api_get_responses(api_client, url, status_code=404) assert rv.data == { "exception": "NotFoundExc", "reason": "Release with sha1_git %s not found."
% unknown_release_, } def test_api_release_uppercase(api_client, release): url = reverse( "api-1-release-uppercase-checksum", url_args={"sha1_git": release.upper()} ) resp = check_http_get_response(api_client, url, status_code=302) redirect_url = reverse( "api-1-release-uppercase-checksum", url_args={"sha1_git": release} ) assert resp["location"] == redirect_url diff --git a/swh/web/tests/api/views/test_revision.py b/swh/web/tests/api/views/test_revision.py index 47717683..b005ec7f 100644 --- a/swh/web/tests/api/views/test_revision.py +++ b/swh/web/tests/api/views/test_revision.py @@ -1,230 +1,230 @@ # Copyright (C) 2015-2021 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from hypothesis import given from swh.model.from_disk import DentryPerms from swh.model.hashutil import hash_to_bytes, hash_to_hex from swh.model.model import ( Directory, DirectoryEntry, Revision, RevisionType, TimestampWithTimezone, ) from swh.web.api.utils import enrich_content, enrich_directory_entry, enrich_revision -from swh.web.common.utils import reverse from swh.web.tests.data import random_sha1 +from swh.web.tests.helpers import check_api_get_responses, check_http_get_response from swh.web.tests.strategies import new_person, new_swh_date -from swh.web.tests.utils import check_api_get_responses, check_http_get_response +from swh.web.utils import reverse def test_api_revision(api_client, archive_data, revision): url = reverse("api-1-revision", url_args={"sha1_git": revision}) rv = check_api_get_responses(api_client, url, status_code=200) expected_revision = archive_data.revision_get(revision) enrich_revision(expected_revision, rv.wsgi_request) assert rv.data == expected_revision def test_api_revision_not_found(api_client): unknown_revision_ = random_sha1() url = reverse("api-1-revision", url_args={"sha1_git": unknown_revision_}) rv = check_api_get_responses(api_client, url, status_code=404) assert rv.data == { "exception": "NotFoundExc", "reason": "Revision with sha1_git %s not found." % unknown_revision_, } def test_api_revision_raw_ok(api_client, archive_data, revision): url = reverse("api-1-revision-raw-message", url_args={"sha1_git": revision}) expected_message = archive_data.revision_get(revision)["message"] rv = check_http_get_response(api_client, url, status_code=200) assert rv["Content-Type"] == "application/octet-stream" assert rv.content == expected_message.encode() def test_api_revision_raw_ko_no_rev(api_client): unknown_revision_ = random_sha1() url = reverse( "api-1-revision-raw-message", url_args={"sha1_git": unknown_revision_} ) rv = check_api_get_responses(api_client, url, status_code=404) assert rv.data == { "exception": "NotFoundExc", "reason": "Revision with sha1_git %s not found." 
% unknown_revision_, } def test_api_revision_log(api_client, archive_data, revision): limit = 10 url = reverse( "api-1-revision-log", url_args={"sha1_git": revision}, query_params={"limit": limit}, ) rv = check_api_get_responses(api_client, url, status_code=200) expected_log = archive_data.revision_log(revision, limit=limit) expected_log = list( map(enrich_revision, expected_log, [rv.wsgi_request] * len(expected_log)) ) assert rv.data == expected_log def test_api_revision_log_not_found(api_client): unknown_revision_ = random_sha1() url = reverse("api-1-revision-log", url_args={"sha1_git": unknown_revision_}) rv = check_api_get_responses(api_client, url, status_code=404) assert rv.data == { "exception": "NotFoundExc", "reason": "Revision with sha1_git %s not found." % unknown_revision_, } assert not rv.has_header("Link") def test_api_revision_directory_ko_not_found(api_client): sha1_git = random_sha1() url = reverse("api-1-revision-directory", {"sha1_git": sha1_git}) rv = check_api_get_responses(api_client, url, status_code=404) assert rv.data == { "exception": "NotFoundExc", "reason": f"Revision with sha1_git {sha1_git} not found.", } def test_api_revision_directory_ok_returns_dir_entries( api_client, archive_data, revision ): url = reverse("api-1-revision-directory", {"sha1_git": revision}) rv = check_api_get_responses(api_client, url, status_code=200) rev_data = archive_data.revision_get(revision) dir_content = archive_data.directory_ls(rev_data["directory"]) dir_content = [ enrich_directory_entry(dir_entry, request=rv.wsgi_request) for dir_entry in dir_content ] assert rv.data == { "content": dir_content, "path": ".", "type": "dir", "revision": revision, } @given(new_person(), new_swh_date()) def test_api_revision_directory_ok_returns_content( api_client, archive_data, content, person, date ): content_path = "foo" _dir = Directory( entries=( DirectoryEntry( name=content_path.encode(), type="file", target=hash_to_bytes(content["sha1_git"]), perms=DentryPerms.content, ), ) ) archive_data.directory_add([_dir]) revision = Revision( directory=_dir.id, author=person, committer=person, message=b"commit message", date=TimestampWithTimezone.from_datetime(date), committer_date=TimestampWithTimezone.from_datetime(date), synthetic=False, type=RevisionType.GIT, ) archive_data.revision_add([revision]) revision_id = hash_to_hex(revision.id) cnt_data = archive_data.content_get(content["sha1"]) url = reverse( "api-1-revision-directory", {"sha1_git": revision_id, "dir_path": content_path}, ) rv = check_api_get_responses(api_client, url, status_code=200) assert rv.data == { "content": enrich_content(cnt_data, request=rv.wsgi_request), "path": content_path, "type": "file", "revision": revision_id, } @given(new_person(), new_swh_date()) def test_api_revision_directory_ok_returns_revision( api_client, archive_data, revision, person, date ): rev_path = "foo" _dir = Directory( entries=( DirectoryEntry( name=rev_path.encode(), type="rev", target=hash_to_bytes(revision), perms=DentryPerms.revision, ), ) ) archive_data.directory_add([_dir]) rev = Revision( directory=_dir.id, author=person, committer=person, message=b"commit message", date=TimestampWithTimezone.from_datetime(date), committer_date=TimestampWithTimezone.from_datetime(date), synthetic=False, type=RevisionType.GIT, ) archive_data.revision_add([rev]) revision_id = hash_to_hex(rev.id) rev_data = archive_data.revision_get(revision) url = reverse( "api-1-revision-directory", {"sha1_git": revision_id, "dir_path": rev_path}, ) rv = 
check_api_get_responses(api_client, url, status_code=200) assert rv.data == { "content": enrich_revision(rev_data, request=rv.wsgi_request), "path": rev_path, "type": "rev", "revision": revision_id, } def test_api_revision_uppercase(api_client, revision): url = reverse( "api-1-revision-uppercase-checksum", url_args={"sha1_git": revision.upper()} ) resp = check_http_get_response(api_client, url, status_code=302) redirect_url = reverse("api-1-revision", url_args={"sha1_git": revision}) assert resp["location"] == redirect_url diff --git a/swh/web/tests/api/views/test_snapshot.py b/swh/web/tests/api/views/test_snapshot.py index 5a2c9148..d002b61d 100644 --- a/swh/web/tests/api/views/test_snapshot.py +++ b/swh/web/tests/api/views/test_snapshot.py @@ -1,159 +1,159 @@ # Copyright (C) 2018-2021 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import random from hypothesis import given from swh.model.hashutil import hash_to_hex from swh.model.model import Snapshot from swh.web.api.utils import enrich_snapshot -from swh.web.common.utils import reverse from swh.web.tests.data import random_sha1 +from swh.web.tests.helpers import check_api_get_responses, check_http_get_response from swh.web.tests.strategies import new_snapshot -from swh.web.tests.utils import check_api_get_responses, check_http_get_response +from swh.web.utils import reverse def test_api_snapshot(api_client, archive_data, snapshot): url = reverse("api-1-snapshot", url_args={"snapshot_id": snapshot}) rv = check_api_get_responses(api_client, url, status_code=200) expected_data = {**archive_data.snapshot_get(snapshot), "next_branch": None} expected_data = enrich_snapshot(expected_data, rv.wsgi_request) assert rv.data == expected_data def test_api_snapshot_paginated(api_client, archive_data, snapshot): branches_offset = 0 branches_count = 2 snapshot_branches = [] for k, v in sorted(archive_data.snapshot_get(snapshot)["branches"].items()): snapshot_branches.append( {"name": k, "target_type": v["target_type"], "target": v["target"]} ) whole_snapshot = {"id": snapshot, "branches": {}, "next_branch": None} while branches_offset < len(snapshot_branches): branches_from = snapshot_branches[branches_offset]["name"] url = reverse( "api-1-snapshot", url_args={"snapshot_id": snapshot}, query_params={ "branches_from": branches_from, "branches_count": branches_count, }, ) rv = check_api_get_responses(api_client, url, status_code=200) expected_data = archive_data.snapshot_get_branches( snapshot, branches_from, branches_count ) expected_data = enrich_snapshot(expected_data, rv.wsgi_request) branches_offset += branches_count if branches_offset < len(snapshot_branches): next_branch = snapshot_branches[branches_offset]["name"] expected_data["next_branch"] = next_branch else: expected_data["next_branch"] = None assert rv.data == expected_data whole_snapshot["branches"].update(expected_data["branches"]) if branches_offset < len(snapshot_branches): next_url = rv.wsgi_request.build_absolute_uri( reverse( "api-1-snapshot", url_args={"snapshot_id": snapshot}, query_params={ "branches_from": next_branch, "branches_count": branches_count, }, ) ) assert rv["Link"] == '<%s>; rel="next"' % next_url else: assert not rv.has_header("Link") url = reverse("api-1-snapshot", url_args={"snapshot_id": snapshot}) rv = check_api_get_responses(api_client, url, status_code=200) assert rv.data == 
whole_snapshot def test_api_snapshot_filtered(api_client, archive_data, snapshot): snapshot_branches = [] for k, v in sorted(archive_data.snapshot_get(snapshot)["branches"].items()): snapshot_branches.append( {"name": k, "target_type": v["target_type"], "target": v["target"]} ) target_type = random.choice(snapshot_branches)["target_type"] url = reverse( "api-1-snapshot", url_args={"snapshot_id": snapshot}, query_params={"target_types": target_type}, ) rv = check_api_get_responses(api_client, url, status_code=200) expected_data = archive_data.snapshot_get_branches( snapshot, target_types=target_type ) expected_data = enrich_snapshot(expected_data, rv.wsgi_request) assert rv.data == expected_data def test_api_snapshot_errors(api_client): unknown_snapshot_ = random_sha1() url = reverse("api-1-snapshot", url_args={"snapshot_id": "63ce369"}) check_api_get_responses(api_client, url, status_code=400) url = reverse("api-1-snapshot", url_args={"snapshot_id": unknown_snapshot_}) check_api_get_responses(api_client, url, status_code=404) def test_api_snapshot_uppercase(api_client, snapshot): url = reverse( "api-1-snapshot-uppercase-checksum", url_args={"snapshot_id": snapshot.upper()} ) resp = check_http_get_response(api_client, url, status_code=302) redirect_url = reverse( "api-1-snapshot-uppercase-checksum", url_args={"snapshot_id": snapshot} ) assert resp["location"] == redirect_url @given(new_snapshot(min_size=4)) def test_api_snapshot_null_branch(api_client, archive_data, new_snapshot): snp_dict = new_snapshot.to_dict() snp_id = hash_to_hex(snp_dict["id"]) for branch in snp_dict["branches"].keys(): snp_dict["branches"][branch] = None break archive_data.snapshot_add([Snapshot.from_dict(snp_dict)]) url = reverse("api-1-snapshot", url_args={"snapshot_id": snp_id}) check_api_get_responses(api_client, url, status_code=200) def test_api_snapshot_no_pull_request_branches_filtering( api_client, archive_data, origin_with_pull_request_branches ): """Pull request branches should not be filtered out when querying a snapshot with the Web API.""" snapshot = archive_data.snapshot_get_latest(origin_with_pull_request_branches.url) url = reverse("api-1-snapshot", url_args={"snapshot_id": snapshot["id"]}) resp = check_api_get_responses(api_client, url, status_code=200) assert any([b.startswith("refs/pull/") for b in resp.data["branches"]]) diff --git a/swh/web/tests/api/views/test_stat.py b/swh/web/tests/api/views/test_stat.py index 28ad3cee..8b504d5e 100644 --- a/swh/web/tests/api/views/test_stat.py +++ b/swh/web/tests/api/views/test_stat.py @@ -1,59 +1,59 @@ # Copyright (C) 2015-2019 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from swh.storage.exc import StorageAPIError, StorageDBError -from swh.web.common.exc import BadInputExc -from swh.web.common.utils import reverse -from swh.web.tests.utils import check_api_get_responses +from swh.web.tests.helpers import check_api_get_responses +from swh.web.utils import reverse +from swh.web.utils.exc import BadInputExc def test_api_1_stat_counters_raise_error(api_client, mocker): mock_archive = mocker.patch("swh.web.api.views.stat.archive") mock_archive.stat_counters.side_effect = BadInputExc( "voluntary error to check the bad request middleware." 
) url = reverse("api-1-stat-counters") rv = check_api_get_responses(api_client, url, status_code=400) assert rv.data == { "exception": "BadInputExc", "reason": "voluntary error to check the bad request middleware.", } def test_api_1_stat_counters_raise_from_db(api_client, mocker): mock_archive = mocker.patch("swh.web.api.views.stat.archive") mock_archive.stat_counters.side_effect = StorageDBError( "Storage exploded! Will be back online shortly!" ) url = reverse("api-1-stat-counters") rv = check_api_get_responses(api_client, url, status_code=503) assert rv.data == { "exception": "StorageDBError", "reason": "An unexpected error occurred in the backend: " "Storage exploded! Will be back online shortly!", } def test_api_1_stat_counters_raise_from_api(api_client, mocker): mock_archive = mocker.patch("swh.web.api.views.stat.archive") mock_archive.stat_counters.side_effect = StorageAPIError( "Storage API dropped dead! Will resurrect from its ashes asap!" ) url = reverse("api-1-stat-counters") rv = check_api_get_responses(api_client, url, status_code=503) assert rv.data == { "exception": "StorageAPIError", "reason": "An unexpected error occurred in the api backend: " "Storage API dropped dead! Will resurrect from its ashes asap!", } def test_api_1_stat_counters(api_client, archive_data): url = reverse("api-1-stat-counters") rv = check_api_get_responses(api_client, url, status_code=200) assert rv.data == archive_data.stat_counters() diff --git a/swh/web/tests/api/views/test_vault.py b/swh/web/tests/api/views/test_vault.py index fbf058a7..baf97839 100644 --- a/swh/web/tests/api/views/test_vault.py +++ b/swh/web/tests/api/views/test_vault.py @@ -1,330 +1,330 @@ # Copyright (C) 2017-2021 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import re import pytest from swh.model.swhids import CoreSWHID from swh.vault.exc import NotFoundExc -from swh.web.common.utils import reverse -from swh.web.tests.utils import ( +from swh.web.tests.helpers import ( check_api_get_responses, check_api_post_responses, check_http_get_response, check_http_post_response, ) +from swh.web.utils import reverse ##################### # Current API: def test_api_vault_cook(api_client, mocker, directory, revision): mock_archive = mocker.patch("swh.web.api.views.vault.archive") for bundle_type, swhid, content_type, in ( ("flat", f"swh:1:dir:{directory}", "application/gzip"), ("gitfast", f"swh:1:rev:{revision}", "application/gzip"), ("git_bare", f"swh:1:rev:{revision}", "application/x-tar"), ): swhid = CoreSWHID.from_string(swhid) fetch_url = reverse( f"api-1-vault-fetch-{bundle_type.replace('_', '-')}", url_args={"swhid": str(swhid)}, ) stub_cook = { "type": bundle_type, "progress_msg": None, "task_id": 1, "task_status": "done", "swhid": swhid, } stub_fetch = b"content" mock_archive.vault_cook.return_value = stub_cook mock_archive.vault_fetch.return_value = stub_fetch email = "test@test.mail" url = reverse( f"api-1-vault-cook-{bundle_type.replace('_', '-')}", url_args={"swhid": str(swhid)}, query_params={"email": email}, ) rv = check_api_post_responses(api_client, url, data=None, status_code=200) assert rv.data == { "fetch_url": rv.wsgi_request.build_absolute_uri(fetch_url), "progress_message": None, "id": 1, "status": "done", "swhid": str(swhid), } mock_archive.vault_cook.assert_called_with(bundle_type, swhid, email) rv = 
check_http_get_response(api_client, fetch_url, status_code=200) assert rv["Content-Type"] == content_type assert rv.content == stub_fetch mock_archive.vault_fetch.assert_called_with(bundle_type, swhid) def test_api_vault_cook_notfound( api_client, mocker, directory, revision, unknown_directory, unknown_revision ): - mock_vault = mocker.patch("swh.web.common.archive.vault") + mock_vault = mocker.patch("swh.web.utils.archive.vault") mock_vault.cook.side_effect = NotFoundExc("object not found") mock_vault.fetch.side_effect = NotFoundExc("cooked archive not found") mock_vault.progress.side_effect = NotFoundExc("cooking request not found") for bundle_type, swhid in ( ("flat", f"swh:1:dir:{directory}"), ("gitfast", f"swh:1:rev:{revision}"), ("git_bare", f"swh:1:rev:{revision}"), ): swhid = CoreSWHID.from_string(swhid) url = reverse( f"api-1-vault-cook-{bundle_type.replace('_', '-')}", url_args={"swhid": str(swhid)}, ) rv = check_api_get_responses(api_client, url, status_code=404) assert rv.data["exception"] == "NotFoundExc" assert rv.data["reason"] == f"Cooking of {swhid} was never requested." mock_vault.progress.assert_called_with(bundle_type, swhid) for bundle_type, swhid in ( ("flat", f"swh:1:dir:{unknown_directory}"), ("gitfast", f"swh:1:rev:{unknown_revision}"), ("git_bare", f"swh:1:rev:{unknown_revision}"), ): swhid = CoreSWHID.from_string(swhid) url = reverse( f"api-1-vault-cook-{bundle_type.replace('_', '-')}", url_args={"swhid": str(swhid)}, ) rv = check_api_post_responses(api_client, url, data=None, status_code=404) assert rv.data["exception"] == "NotFoundExc" assert rv.data["reason"] == f"{swhid} not found." mock_vault.cook.assert_called_with(bundle_type, swhid, email=None) fetch_url = reverse( f"api-1-vault-fetch-{bundle_type.replace('_', '-')}", url_args={"swhid": str(swhid)}, ) rv = check_api_get_responses(api_client, fetch_url, status_code=404) assert rv.data["exception"] == "NotFoundExc" assert rv.data["reason"] == f"Cooked archive for {swhid} not found." mock_vault.fetch.assert_called_with(bundle_type, swhid) @pytest.mark.parametrize("bundle_type", ["flat", "gitfast", "git_bare"]) def test_api_vault_cook_error_content(api_client, mocker, bundle_type): swhid = "swh:1:cnt:" + "0" * 40 email = "test@test.mail" url = reverse( f"api-1-vault-cook-{bundle_type.replace('_', '-')}", url_args={"swhid": swhid}, query_params={"email": email}, ) rv = check_api_post_responses(api_client, url, data=None, status_code=400) assert rv.data == { "exception": "BadInputExc", "reason": ( "Content objects do not need to be cooked, " "use `/api/1/content/raw/` instead." ), } @pytest.mark.parametrize( "bundle_type,swhid_type,hint", [ ("flat", "rev", True), ("flat", "rel", False), ("flat", "snp", False), ("gitfast", "dir", True), ("gitfast", "rel", False), ("gitfast", "snp", False), ("git_bare", "dir", True), ("git_bare", "rel", False), ("git_bare", "snp", False), ], ) def test_api_vault_cook_error(api_client, mocker, bundle_type, swhid_type, hint): swhid = f"swh:1:{swhid_type}:" + "0" * 40 email = "test@test.mail" url = reverse( f"api-1-vault-cook-{bundle_type.replace('_', '-')}", url_args={"swhid": swhid}, query_params={"email": email}, ) rv = check_api_post_responses(api_client, url, data=None, status_code=400) assert rv.data["exception"] == "BadInputExc" if hint: assert re.match( r"Only .* can be cooked as .* bundles\. 
Use .*", rv.data["reason"] ) else: assert re.match(r"Only .* can be cooked as .* bundles\.", rv.data["reason"]) ##################### # Legacy API: def test_api_vault_cook_legacy(api_client, mocker, directory, revision): mock_archive = mocker.patch("swh.web.api.views.vault.archive") for obj_type, bundle_type, response_obj_type, obj_id in ( ("directory", "flat", "directory", directory), ("revision_gitfast", "gitfast", "revision", revision), ): swhid = CoreSWHID.from_string(f"swh:1:{obj_type[:3]}:{obj_id}") fetch_url = reverse( f"api-1-vault-fetch-{bundle_type}", url_args={"swhid": str(swhid)}, ) stub_cook = { "type": obj_type, "progress_msg": None, "task_id": 1, "task_status": "done", "swhid": swhid, "obj_type": response_obj_type, "obj_id": obj_id, } stub_fetch = b"content" mock_archive.vault_cook.return_value = stub_cook mock_archive.vault_fetch.return_value = stub_fetch email = "test@test.mail" url = reverse( f"api-1-vault-cook-{obj_type}", url_args={f"{obj_type[:3]}_id": obj_id}, query_params={"email": email}, ) rv = check_api_post_responses(api_client, url, data=None, status_code=200) assert rv.data == { "fetch_url": rv.wsgi_request.build_absolute_uri(fetch_url), "progress_message": None, "id": 1, "status": "done", "swhid": str(swhid), "obj_type": response_obj_type, "obj_id": obj_id, } mock_archive.vault_cook.assert_called_with(bundle_type, swhid, email) rv = check_http_get_response(api_client, fetch_url, status_code=200) assert rv["Content-Type"] == "application/gzip" assert rv.content == stub_fetch mock_archive.vault_fetch.assert_called_with(bundle_type, swhid) def test_api_vault_cook_uppercase_hash_legacy(api_client, directory, revision): for obj_type, obj_id in ( ("directory", directory), ("revision_gitfast", revision), ): url = reverse( f"api-1-vault-cook-{obj_type}-uppercase-checksum", url_args={f"{obj_type[:3]}_id": obj_id.upper()}, ) rv = check_http_post_response( api_client, url, data={"email": "test@test.mail"}, status_code=302 ) redirect_url = reverse( f"api-1-vault-cook-{obj_type}", url_args={f"{obj_type[:3]}_id": obj_id} ) assert rv["location"] == redirect_url fetch_url = reverse( f"api-1-vault-fetch-{obj_type}-uppercase-checksum", url_args={f"{obj_type[:3]}_id": obj_id.upper()}, ) rv = check_http_get_response(api_client, fetch_url, status_code=302) redirect_url = reverse( f"api-1-vault-fetch-{obj_type}", url_args={f"{obj_type[:3]}_id": obj_id}, ) assert rv["location"] == redirect_url def test_api_vault_cook_notfound_legacy( api_client, mocker, directory, revision, unknown_directory, unknown_revision ): - mock_vault = mocker.patch("swh.web.common.archive.vault") + mock_vault = mocker.patch("swh.web.utils.archive.vault") mock_vault.cook.side_effect = NotFoundExc("object not found") mock_vault.fetch.side_effect = NotFoundExc("cooked archive not found") mock_vault.progress.side_effect = NotFoundExc("cooking request not found") for obj_type, bundle_type, obj_id in ( ("directory", "flat", directory), ("revision_gitfast", "gitfast", revision), ): url = reverse( f"api-1-vault-cook-{obj_type}", url_args={f"{obj_type[:3]}_id": obj_id}, ) swhid = CoreSWHID.from_string(f"swh:1:{obj_type[:3]}:{obj_id}") rv = check_api_get_responses(api_client, url, status_code=404) assert rv.data["exception"] == "NotFoundExc" assert rv.data["reason"] == f"Cooking of {swhid} was never requested." 
mock_vault.progress.assert_called_with(bundle_type, swhid) for obj_type, bundle_type, obj_id in ( ("directory", "flat", unknown_directory), ("revision_gitfast", "gitfast", unknown_revision), ): swhid = CoreSWHID.from_string(f"swh:1:{obj_type[:3]}:{obj_id}") url = reverse( f"api-1-vault-cook-{obj_type}", url_args={f"{obj_type[:3]}_id": obj_id} ) rv = check_api_post_responses(api_client, url, data=None, status_code=404) assert rv.data["exception"] == "NotFoundExc" assert rv.data["reason"] == f"{swhid} not found." mock_vault.cook.assert_called_with(bundle_type, swhid, email=None) fetch_url = reverse( f"api-1-vault-fetch-{obj_type}", url_args={f"{obj_type[:3]}_id": obj_id}, ) # Redirected to the current 'fetch' url rv = check_http_get_response(api_client, fetch_url, status_code=302) redirect_url = reverse( f"api-1-vault-fetch-{bundle_type}", url_args={"swhid": str(swhid)}, ) assert rv["location"] == redirect_url rv = check_api_get_responses(api_client, redirect_url, status_code=404) assert rv.data["exception"] == "NotFoundExc" assert rv.data["reason"] == f"Cooked archive for {swhid} not found." mock_vault.fetch.assert_called_with(bundle_type, swhid) diff --git a/swh/web/tests/api/views/utils.py b/swh/web/tests/api/views/utils.py index 9887dba3..90bf661e 100644 --- a/swh/web/tests/api/views/utils.py +++ b/swh/web/tests/api/views/utils.py @@ -1,33 +1,33 @@ # Copyright (C) 2015-2021 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from requests.utils import parse_header_links -from swh.web.tests.utils import check_api_get_responses +from swh.web.tests.helpers import check_api_get_responses def scroll_results(api_client, url): """Iterates through pages of results, and returns them all.""" results = [] while True: rv = check_api_get_responses(api_client, url, status_code=200) results.extend(rv.data) if "Link" in rv: for link in parse_header_links(rv["Link"]): if link["rel"] == "next": # Found link to next page of results url = link["url"] break else: # No link with 'rel=next' break else: # No Link header break return results diff --git a/swh/web/tests/auth/test_views.py b/swh/web/tests/auth/test_views.py index 2b194613..c3580443 100644 --- a/swh/web/tests/auth/test_views.py +++ b/swh/web/tests/auth/test_views.py @@ -1,312 +1,313 @@ # Copyright (C) 2020-2021 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import json from urllib.parse import urljoin, urlparse import uuid import pytest +from django.conf import settings from django.http import QueryDict from swh.auth.keycloak import KeycloakError from swh.web.auth.models import OIDCUserOfflineTokens -from swh.web.auth.utils import OIDC_SWH_WEB_CLIENT_ID, decrypt_data -from swh.web.common.utils import reverse +from swh.web.auth.utils import decrypt_data from swh.web.config import get_config from swh.web.tests.django_asserts import assert_contains -from swh.web.tests.utils import ( +from swh.web.tests.helpers import ( check_html_get_response, check_http_get_response, check_http_post_response, ) from swh.web.urls import _default_view as homepage_view +from swh.web.utils import reverse def _check_oidc_login_code_flow_data( request, response, keycloak_oidc, redirect_uri, scope="openid" ): 
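    # shared helper: validate the authorization request redirected to Keycloak
    # and the login data stored in the Django session, then return that data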
parsed_url = urlparse(response["location"]) authorization_url = keycloak_oidc.well_known()["authorization_endpoint"] query_dict = QueryDict(parsed_url.query) # check redirect url is valid assert urljoin(response["location"], parsed_url.path) == authorization_url assert "client_id" in query_dict - assert query_dict["client_id"] == OIDC_SWH_WEB_CLIENT_ID + assert query_dict["client_id"] == settings.OIDC_SWH_WEB_CLIENT_ID assert "response_type" in query_dict assert query_dict["response_type"] == "code" assert "redirect_uri" in query_dict assert query_dict["redirect_uri"] == redirect_uri assert "code_challenge_method" in query_dict assert query_dict["code_challenge_method"] == "S256" assert "scope" in query_dict assert query_dict["scope"] == scope assert "state" in query_dict assert "code_challenge" in query_dict # check a login_data has been registered in user session assert "login_data" in request.session login_data = request.session["login_data"] assert "code_verifier" in login_data assert "state" in login_data assert "redirect_uri" in login_data assert login_data["redirect_uri"] == query_dict["redirect_uri"] return login_data def test_view_rendering_when_user_not_set_in_request(request_factory): request = request_factory.get("/") # Django RequestFactory do not set any user by default assert not hasattr(request, "user") response = homepage_view(request) assert response.status_code == 200 def test_oidc_generate_bearer_token_anonymous_user(client): """ Anonymous user should be refused access with forbidden response. """ url = reverse("oidc-generate-bearer-token") check_http_get_response(client, url, status_code=403) def _generate_and_test_bearer_token(client, kc_oidc_mock): # user authenticates client.login( code="code", code_verifier="code-verifier", redirect_uri="redirect-uri" ) # user initiates bearer token generation flow url = reverse("oidc-generate-bearer-token") response = check_http_get_response(client, url, status_code=302) request = response.wsgi_request redirect_uri = reverse("oidc-generate-bearer-token-complete", request=request) # check login data and redirection to Keycloak is valid login_data = _check_oidc_login_code_flow_data( request, response, kc_oidc_mock, redirect_uri=redirect_uri, scope="openid offline_access", ) # once a user has identified himself in Keycloak, he is # redirected to the 'oidc-generate-bearer-token-complete' view # to get and save bearer token # generate authorization code / session state in the same # manner as Keycloak code = f"{str(uuid.uuid4())}.{str(uuid.uuid4())}.{str(uuid.uuid4())}" session_state = str(uuid.uuid4()) token_complete_url = reverse( "oidc-generate-bearer-token-complete", query_params={ "code": code, "state": login_data["state"], "session_state": session_state, }, ) nb_tokens = len(OIDCUserOfflineTokens.objects.all()) response = check_http_get_response(client, token_complete_url, status_code=302) request = response.wsgi_request # check token has been generated and saved encrypted to database assert len(OIDCUserOfflineTokens.objects.all()) == nb_tokens + 1 encrypted_token = OIDCUserOfflineTokens.objects.last().offline_token.tobytes() secret = get_config()["secret_key"].encode() salt = request.user.sub.encode() decrypted_token = decrypt_data(encrypted_token, secret, salt) oidc_profile = kc_oidc_mock.authorization_code(code=code, redirect_uri=redirect_uri) assert decrypted_token.decode("ascii") == oidc_profile["refresh_token"] # should redirect to tokens management Web UI assert response["location"] == reverse("oidc-profile") + "#tokens" 
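    # hand the decrypted token back so callers can exercise the display
    # and revocation flows against it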
return decrypted_token @pytest.mark.django_db(transaction=True, reset_sequences=True) def test_oidc_generate_bearer_token_authenticated_user_success(client, keycloak_oidc): """ Authenticated user should be able to generate a bearer token using OIDC Authorization Code Flow. """ _generate_and_test_bearer_token(client, keycloak_oidc) def test_oidc_list_bearer_tokens_anonymous_user(client): """ Anonymous user should be refused access with forbidden response. """ url = reverse( "oidc-list-bearer-tokens", query_params={"draw": 1, "start": 0, "length": 10} ) check_http_get_response(client, url, status_code=403) @pytest.mark.django_db(transaction=True, reset_sequences=True) def test_oidc_list_bearer_tokens(client, keycloak_oidc): """ User with correct credentials should be allowed to list his tokens. """ nb_tokens = 3 for _ in range(nb_tokens): _generate_and_test_bearer_token(client, keycloak_oidc) url = reverse( "oidc-list-bearer-tokens", query_params={"draw": 1, "start": 0, "length": 10} ) response = check_http_get_response(client, url, status_code=200) tokens_data = list(reversed(json.loads(response.content.decode("utf-8"))["data"])) for oidc_token in OIDCUserOfflineTokens.objects.all(): assert ( oidc_token.creation_date.isoformat() == tokens_data[oidc_token.id - 1]["creation_date"] ) def test_oidc_get_bearer_token_anonymous_user(client): """ Anonymous user should be refused access with forbidden response. """ url = reverse("oidc-get-bearer-token") check_http_post_response(client, url, status_code=403) @pytest.mark.django_db(transaction=True, reset_sequences=True) def test_oidc_get_bearer_token(client, keycloak_oidc): """ User with correct credentials should be allowed to display a token. """ nb_tokens = 3 for i in range(nb_tokens): token = _generate_and_test_bearer_token(client, keycloak_oidc) url = reverse("oidc-get-bearer-token") response = check_http_post_response( client, url, status_code=200, data={"token_id": i + 1}, content_type="text/plain", ) assert response.content == token @pytest.mark.django_db(transaction=True, reset_sequences=True) def test_oidc_get_bearer_token_expired_token(client, keycloak_oidc): """ User with correct credentials should be allowed to display a token. """ _generate_and_test_bearer_token(client, keycloak_oidc) for kc_err_msg in ("Offline session not active", "Offline user session not found"): kc_error_dict = { "error": "invalid_grant", "error_description": kc_err_msg, } keycloak_oidc.refresh_token.side_effect = KeycloakError( error_message=json.dumps(kc_error_dict).encode(), response_code=400 ) url = reverse("oidc-get-bearer-token") response = check_http_post_response( client, url, status_code=400, data={"token_id": 1}, content_type="text/plain", ) assert ( response.content == b"Bearer token has expired, please generate a new one." ) def test_oidc_revoke_bearer_tokens_anonymous_user(client): """ Anonymous user should be refused access with forbidden response. """ url = reverse("oidc-revoke-bearer-tokens") check_http_post_response(client, url, status_code=403) @pytest.mark.django_db(transaction=True, reset_sequences=True) def test_oidc_revoke_bearer_tokens(client, keycloak_oidc): """ User with correct credentials should be allowed to revoke tokens. 
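    Revoked tokens are deleted from the database.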
""" nb_tokens = 3 for _ in range(nb_tokens): _generate_and_test_bearer_token(client, keycloak_oidc) url = reverse("oidc-revoke-bearer-tokens") check_http_post_response( client, url, status_code=200, data={"token_ids": [1]}, ) assert len(OIDCUserOfflineTokens.objects.all()) == 2 check_http_post_response( client, url, status_code=200, data={"token_ids": [2, 3]}, ) assert len(OIDCUserOfflineTokens.objects.all()) == 0 def test_oidc_profile_view_anonymous_user(client): """ Non authenticated users should be redirected to login page when requesting profile view. """ url = reverse("oidc-profile") login_url = reverse("oidc-login", query_params={"next_path": url}) resp = check_http_get_response(client, url, status_code=302) assert resp["location"] == login_url @pytest.mark.django_db(transaction=True, reset_sequences=True) def test_oidc_profile_view(client, keycloak_oidc): """ Authenticated users should be able to request the profile page and link to Keycloak account UI should be present. """ url = reverse("oidc-profile") kc_config = get_config()["keycloak"] client_permissions = ["perm1", "perm2"] keycloak_oidc.client_permissions = client_permissions client.login(code="", code_verifier="", redirect_uri="") resp = check_html_get_response( client, url, status_code=200, template_used="profile.html" ) user = resp.wsgi_request.user kc_account_url = ( f"{kc_config['server_url']}realms/{kc_config['realm_name']}/account/" ) assert_contains(resp, kc_account_url) assert_contains(resp, user.username) assert_contains(resp, user.first_name) assert_contains(resp, user.last_name) assert_contains(resp, user.email) for perm in client_permissions: assert_contains(resp, perm) diff --git a/swh/web/tests/browse/test_snapshot_context.py b/swh/web/tests/browse/test_snapshot_context.py index 6d830ac6..1a17bc4c 100644 --- a/swh/web/tests/browse/test_snapshot_context.py +++ b/swh/web/tests/browse/test_snapshot_context.py @@ -1,524 +1,524 @@ # Copyright (C) 2020-2021 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import random from swh.model.hashutil import hash_to_bytes from swh.model.model import ObjectType as ModelObjectType from swh.model.model import Release, Snapshot, SnapshotBranch, TargetType from swh.model.swhids import ObjectType from swh.web.browse.snapshot_context import ( _get_release, get_origin_visit_snapshot, get_snapshot_content, get_snapshot_context, ) from swh.web.browse.utils import gen_revision_url -from swh.web.common.identifiers import gen_swhid -from swh.web.common.origin_visits import get_origin_visit, get_origin_visits -from swh.web.common.typing import ( +from swh.web.utils import format_utc_iso_date, reverse +from swh.web.utils.identifiers import gen_swhid +from swh.web.utils.origin_visits import get_origin_visit, get_origin_visits +from swh.web.utils.typing import ( SnapshotBranchInfo, SnapshotContext, SnapshotReleaseInfo, ) -from swh.web.common.utils import format_utc_iso_date, reverse def test_get_origin_visit_snapshot_simple(archive_data, origin_with_multiple_visits): visits = archive_data.origin_visit_get(origin_with_multiple_visits["url"]) for visit in visits: snapshot = archive_data.snapshot_get(visit["snapshot"]) branches = [] releases = [] def _process_branch_data(branch, branch_data, alias=False): if branch_data["target_type"] == "revision": rev_data = archive_data.revision_get(branch_data["target"]) 
branches.append( SnapshotBranchInfo( name=branch, alias=alias, revision=branch_data["target"], directory=rev_data["directory"], date=format_utc_iso_date(rev_data["date"]), message=rev_data["message"], url=None, ) ) elif branch_data["target_type"] == "release": rel_data = archive_data.release_get(branch_data["target"]) rev_data = archive_data.revision_get(rel_data["target"]) releases.append( SnapshotReleaseInfo( name=rel_data["name"], alias=alias, branch_name=branch, date=format_utc_iso_date(rel_data["date"]), id=rel_data["id"], message=rel_data["message"], target_type=rel_data["target_type"], target=rel_data["target"], directory=rev_data["directory"], url=None, ) ) aliases = {} for branch in sorted(snapshot["branches"].keys()): branch_data = snapshot["branches"][branch] if branch_data["target_type"] == "alias": target_data = snapshot["branches"][branch_data["target"]] aliases[branch] = target_data _process_branch_data(branch, target_data, alias=True) else: _process_branch_data(branch, branch_data) assert branches and releases, "Incomplete test data." origin_visit_branches = get_origin_visit_snapshot( origin_with_multiple_visits, visit_id=visit["visit"] ) assert origin_visit_branches == (branches, releases, aliases) def test_get_snapshot_context_no_origin(archive_data, snapshot): for browse_context, kwargs in ( ("content", {"snapshot_id": snapshot, "path": "/some/path"}), ("directory", {"snapshot_id": snapshot}), ("log", {"snapshot_id": snapshot}), ): url_args = {"snapshot_id": snapshot} query_params = dict(kwargs) query_params.pop("snapshot_id") snapshot_context = get_snapshot_context(**kwargs, browse_context=browse_context) branches, releases, _ = get_snapshot_content(snapshot) releases = list(reversed(releases)) revision_id = None root_directory = None for branch in branches: if branch["name"] == "HEAD": revision_id = branch["revision"] root_directory = branch["directory"] branch["url"] = reverse( f"browse-snapshot-{browse_context}", url_args=url_args, query_params={"branch": branch["name"], **query_params}, ) for release in releases: release["url"] = reverse( f"browse-snapshot-{browse_context}", url_args=url_args, query_params={"release": release["name"], **query_params}, ) branches_url = reverse("browse-snapshot-branches", url_args=url_args) releases_url = reverse("browse-snapshot-releases", url_args=url_args) directory_url = reverse("browse-snapshot-directory", url_args=url_args) is_empty = not branches and not releases snapshot_swhid = gen_swhid(ObjectType.SNAPSHOT, snapshot) snapshot_sizes = archive_data.snapshot_count_branches(snapshot) expected = SnapshotContext( branch="HEAD", branch_alias=True, branches=branches, branches_url=branches_url, is_empty=is_empty, origin_info=None, origin_visits_url=None, release=None, release_alias=False, release_id=None, query_params=query_params, releases=releases, releases_url=releases_url, revision_id=revision_id, revision_info=_get_revision_info(archive_data, revision_id), root_directory=root_directory, snapshot_id=snapshot, snapshot_sizes=snapshot_sizes, snapshot_swhid=snapshot_swhid, url_args=url_args, visit_info=None, directory_url=directory_url, ) if revision_id: expected["revision_info"]["revision_url"] = gen_revision_url( revision_id, snapshot_context ) assert snapshot_context == expected _check_branch_release_revision_parameters( archive_data, expected, browse_context, kwargs, branches, releases ) def test_get_snapshot_context_with_origin(archive_data, origin_with_multiple_visits): origin_visits = 
get_origin_visits(origin_with_multiple_visits) timestamp = format_utc_iso_date(origin_visits[0]["date"], "%Y-%m-%dT%H:%M:%SZ") visit_id = origin_visits[1]["visit"] origin_url = origin_with_multiple_visits["url"] for browse_context, kwargs in ( ("content", {"origin_url": origin_url, "path": "/some/path"}), ("directory", {"origin_url": origin_url}), ("log", {"origin_url": origin_url}), ( "directory", { "origin_url": origin_url, "timestamp": timestamp, }, ), ( "directory", { "origin_url": origin_url, "visit_id": visit_id, }, ), ): visit_id = kwargs["visit_id"] if "visit_id" in kwargs else None visit_ts = kwargs["timestamp"] if "timestamp" in kwargs else None visit_info = get_origin_visit( {"url": kwargs["origin_url"]}, visit_ts=visit_ts, visit_id=visit_id ) snapshot = visit_info["snapshot"] snapshot_context = get_snapshot_context(**kwargs, browse_context=browse_context) query_params = dict(kwargs) branches, releases, _ = get_snapshot_content(snapshot) releases = list(reversed(releases)) revision_id = None root_directory = None for branch in branches: if branch["name"] == "HEAD": revision_id = branch["revision"] root_directory = branch["directory"] branch["url"] = reverse( f"browse-origin-{browse_context}", query_params={"branch": branch["name"], **query_params}, ) for release in releases: release["url"] = reverse( f"browse-origin-{browse_context}", query_params={"release": release["name"], **query_params}, ) query_params.pop("path", None) branches_url = reverse("browse-origin-branches", query_params=query_params) releases_url = reverse("browse-origin-releases", query_params=query_params) origin_visits_url = reverse( "browse-origin-visits", query_params={"origin_url": kwargs["origin_url"]} ) is_empty = not branches and not releases snapshot_swhid = gen_swhid(ObjectType.SNAPSHOT, snapshot) snapshot_sizes = archive_data.snapshot_count_branches(snapshot) visit_info["url"] = directory_url = reverse( "browse-origin-directory", query_params=query_params ) visit_info["formatted_date"] = format_utc_iso_date(visit_info["date"]) if "path" in kwargs: query_params["path"] = kwargs["path"] expected = SnapshotContext( branch="HEAD", branch_alias=True, branches=branches, branches_url=branches_url, is_empty=is_empty, origin_info={"url": origin_url}, origin_visits_url=origin_visits_url, release=None, release_alias=False, release_id=None, query_params=query_params, releases=releases, releases_url=releases_url, revision_id=revision_id, revision_info=_get_revision_info(archive_data, revision_id), root_directory=root_directory, snapshot_id=snapshot, snapshot_sizes=snapshot_sizes, snapshot_swhid=snapshot_swhid, url_args={}, visit_info=visit_info, directory_url=directory_url, ) if revision_id: expected["revision_info"]["revision_url"] = gen_revision_url( revision_id, snapshot_context ) assert snapshot_context == expected _check_branch_release_revision_parameters( archive_data, expected, browse_context, kwargs, branches, releases ) def _check_branch_release_revision_parameters( archive_data, base_expected_context, browse_context, kwargs, branches, releases, ): branch = random.choice(branches) snapshot_context = get_snapshot_context( **kwargs, browse_context=browse_context, branch_name=branch["name"] ) url_args = dict(kwargs) url_args.pop("path", None) url_args.pop("timestamp", None) url_args.pop("visit_id", None) url_args.pop("origin_url", None) query_params = dict(kwargs) query_params.pop("snapshot_id", None) expected_branch = dict(base_expected_context) expected_branch["branch"] = branch["name"] 
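    # requesting an explicit branch overrides the default HEAD-based context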
expected_branch["branch_alias"] = branch["alias"] expected_branch["revision_id"] = branch["revision"] expected_branch["revision_info"] = _get_revision_info( archive_data, branch["revision"] ) expected_branch["root_directory"] = branch["directory"] expected_branch["query_params"] = {"branch": branch["name"], **query_params} expected_branch["revision_info"]["revision_url"] = gen_revision_url( branch["revision"], expected_branch ) assert snapshot_context == expected_branch if releases: release = random.choice(releases) snapshot_context = get_snapshot_context( **kwargs, browse_context=browse_context, release_name=release["name"] ) expected_release = dict(base_expected_context) expected_release["branch"] = None expected_release["branch_alias"] = False expected_release["release"] = release["name"] expected_release["release_id"] = release["id"] if release["target_type"] == "revision": expected_release["revision_id"] = release["target"] expected_release["revision_info"] = _get_revision_info( archive_data, release["target"] ) expected_release["root_directory"] = release["directory"] expected_release["query_params"] = {"release": release["name"], **query_params} expected_release["revision_info"]["revision_url"] = gen_revision_url( release["target"], expected_release ) assert snapshot_context == expected_release revision_log = archive_data.revision_log(branch["revision"]) revision = revision_log[-1] snapshot_context = get_snapshot_context( **kwargs, browse_context=browse_context, revision_id=revision["id"] ) if "origin_url" in kwargs: view_name = f"browse-origin-{browse_context}" else: view_name = f"browse-snapshot-{browse_context}" kwargs.pop("visit_id", None) revision_browse_url = reverse( view_name, url_args=url_args, query_params={"revision": revision["id"], **query_params}, ) branches.append( SnapshotBranchInfo( name=revision["id"], alias=False, revision=revision["id"], directory=revision["directory"], date=revision["date"], message=revision["message"], url=revision_browse_url, ) ) expected_revision = dict(base_expected_context) expected_revision["branch"] = None expected_revision["branch_alias"] = False expected_revision["branches"] = branches expected_revision["revision_id"] = revision["id"] expected_revision["revision_info"] = _get_revision_info( archive_data, revision["id"] ) expected_revision["root_directory"] = revision["directory"] expected_revision["query_params"] = {"revision": revision["id"], **query_params} expected_revision["revision_info"]["revision_url"] = gen_revision_url( revision["id"], expected_revision ) assert snapshot_context == expected_revision def test_get_release_large_snapshot(archive_data, origin_with_releases): snapshot = archive_data.snapshot_get_latest(origin_with_releases["url"]) release_id = random.choice( [ v["target"] for v in snapshot["branches"].values() if v["target_type"] == "release" ] ) release_data = archive_data.release_get(release_id) # simulate large snapshot processing by providing releases parameter # as an empty list release = _get_release( releases=[], release_name=release_data["name"], snapshot_id=snapshot["id"] ) assert release_data["name"] == release["name"] assert release_data["id"] == release["id"] def _get_revision_info(archive_data, revision_id): revision_info = None if revision_id: revision_info = archive_data.revision_get(revision_id) revision_info["message_header"] = revision_info["message"].split("\n")[0] revision_info["date"] = format_utc_iso_date(revision_info["date"]) revision_info["committer_date"] = format_utc_iso_date( 
revision_info["committer_date"] ) return revision_info def test_get_snapshot_context_revision_release(archive_data, revision): release_name = "v1.0.0" release = Release( name=release_name.encode(), message=f"release {release_name}".encode(), target=hash_to_bytes(revision), target_type=ModelObjectType.REVISION, synthetic=True, ) archive_data.release_add([release]) snapshot = Snapshot( branches={ b"HEAD": SnapshotBranch( target=release_name.encode(), target_type=TargetType.ALIAS ), release_name.encode(): SnapshotBranch( target=release.id, target_type=TargetType.RELEASE ), }, ) archive_data.snapshot_add([snapshot]) snapshot_no_head = Snapshot( branches={ release_name.encode(): SnapshotBranch( target=release.id, target_type=TargetType.RELEASE ), }, ) archive_data.snapshot_add([snapshot_no_head]) revision_data = archive_data.revision_get(revision) for params in ( {"snapshot_id": snapshot.id.hex()}, {"snapshot_id": snapshot.id.hex(), "release_name": release_name}, {"snapshot_id": snapshot_no_head.id.hex()}, ): snapshot_context = get_snapshot_context(**params) assert snapshot_context["branches"] == [] assert snapshot_context["releases"] != [] assert snapshot_context["release"] == release_name assert snapshot_context["release_id"] == release.id.hex() assert snapshot_context["revision_id"] == revision assert snapshot_context["root_directory"] == revision_data["directory"] def test_get_snapshot_context_directory_release(archive_data, directory): release_name = "v1.0.0" release = Release( name=release_name.encode(), message=f"release {release_name}".encode(), target=hash_to_bytes(directory), target_type=ModelObjectType.DIRECTORY, synthetic=True, ) archive_data.release_add([release]) snapshot = Snapshot( branches={ b"HEAD": SnapshotBranch( target=release_name.encode(), target_type=TargetType.ALIAS ), release_name.encode(): SnapshotBranch( target=release.id, target_type=TargetType.RELEASE ), }, ) archive_data.snapshot_add([snapshot]) snapshot_no_head = Snapshot( branches={ release_name.encode(): SnapshotBranch( target=release.id, target_type=TargetType.RELEASE ), }, ) archive_data.snapshot_add([snapshot_no_head]) for params in ( {"snapshot_id": snapshot.id.hex()}, {"snapshot_id": snapshot.id.hex(), "release_name": release_name}, {"snapshot_id": snapshot_no_head.id.hex()}, ): snapshot_context = get_snapshot_context(**params) assert snapshot_context["branches"] == [] assert snapshot_context["releases"] != [] assert snapshot_context["release"] == release_name assert snapshot_context["release_id"] == release.id.hex() assert snapshot_context["revision_id"] is None assert snapshot_context["root_directory"] == directory diff --git a/swh/web/tests/browse/test_utils.py b/swh/web/tests/browse/test_utils.py index fc899363..5234a1e4 100644 --- a/swh/web/tests/browse/test_utils.py +++ b/swh/web/tests/browse/test_utils.py @@ -1,147 +1,147 @@ # Copyright (C) 2017-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import re import pytest from swh.model.model import Content from swh.web.browse.utils import ( gen_link, gen_person_mail_link, gen_revision_link, get_mimetype_and_encoding_for_content, get_readme_to_display, prepare_content_for_display, re_encode_content, ) -from swh.web.common.utils import reverse from swh.web.tests.data import get_tests_data +from swh.web.utils import reverse def test_get_mimetype_and_encoding_for_content(): 
text = b"Hello world!" assert get_mimetype_and_encoding_for_content(text) == ( "text/plain", "us-ascii", ) def test_gen_link(): assert ( gen_link("https://www.softwareheritage.org/", "swh") == 'swh' ) def test_gen_revision_link(): revision_id = "28a0bc4120d38a394499382ba21d6965a67a3703" revision_url = reverse("browse-revision", url_args={"sha1_git": revision_id}) assert gen_revision_link( revision_id, link_text=None, link_attrs=None ) == '%s' % (revision_url, revision_id) assert gen_revision_link( revision_id, shorten_id=True, link_attrs=None ) == '%s' % (revision_url, revision_id[:7]) def test_gen_person_mail_link(): person_full = { "name": "John Doe", "email": "john.doe@swh.org", "fullname": "John Doe ", } assert gen_person_mail_link(person_full) == '%s' % ( person_full["email"], person_full["name"], ) link_text = "Mail" assert gen_person_mail_link( person_full, link_text=link_text ) == '%s' % (person_full["email"], link_text) person_partial_email = {"name": None, "email": None, "fullname": "john.doe@swh.org"} assert gen_person_mail_link( person_partial_email ) == '%s' % ( person_partial_email["fullname"], person_partial_email["fullname"], ) person_partial = { "name": None, "email": None, "fullname": "John Doe ", } assert gen_person_mail_link(person_partial) == person_partial["fullname"] person_none = {"name": None, "email": None, "fullname": None} assert gen_person_mail_link(person_none) == "None" @pytest.mark.parametrize( "path, expected_language", [("CMakeLists.txt", "cmake"), ("path/CMakeLists.txt", "cmake")], ) def test_prepare_content_display_language_for_filename(path, expected_language): content_display = prepare_content_for_display( content_data=b"", mime_type="", path=path ) assert content_display["language"] == expected_language def test_re_encode_content_for_shift_jis_encoding(): data = b"/* \x8a\xd6\x98A\x82\xcc\x95\xb6\x8e\x9a\x83R\x81[\x83h\x95\xcf\x8a\xb7 */" mime_type, encoding = get_mimetype_and_encoding_for_content(data) _, encoding, re_encoded_data = re_encode_content(mime_type, encoding, data) assert encoding == "SHIFT_JIS" assert data.decode(encoding) == re_encoded_data.decode("utf-8") assert re_encoded_data.decode("utf-8") == "/* 関連の文字コード変換 */" @pytest.mark.parametrize( "input_,expected_output", [ (b"foo bar", "

foo bar

"), (b"foo *bar* baz", "

foo bar baz

"), ( b".. raw:: html\n\n ", "<script>foo</script>", ), ], ) def test_rst_readme(input_, expected_output): content = Content.from_data(input_) storage = get_tests_data()["storage"] storage.content_add([content]) assert re.search( expected_output, get_readme_to_display({"readme.rst": content.sha1.hex()})[2] ) def test_rst_readme_no_leak(): input_ = b".. include:: /etc/passwd" content = Content.from_data(input_) storage = get_tests_data()["storage"] storage.content_add([content]) assert "root:" not in get_readme_to_display({"readme.rst": content.sha1.hex()})[2] def test_rst_readme_no_xss(): input_ = b".. raw:: html\n\n " content = Content.from_data(input_) storage = get_tests_data()["storage"] storage.content_add([content]) assert ( "