diff --git a/docs/developers-info.rst b/docs/developers-info.rst index 0e825497..cf2f662d 100644 --- a/docs/developers-info.rst +++ b/docs/developers-info.rst @@ -1,135 +1,135 @@ Developers Information ====================== Sample configuration -------------------- The configuration will be taken from the default configuration file: *~/.config/swh/webapp/webapp.yml*. The following introduces a default configuration file: .. sourcecode:: yaml storage: cls: remote args: url: http://localhost:5002 debug: false throttling: cache_uri: None scopes: swh_api: limiter_rate: default: 120/h exempted_networks: - 127.0.0.0/8 Run server ---------- Either use the django manage script directly (useful in development mode as it offers various commands): .. sourcecode:: shell $ python3 -m swh.web.manage runserver or use the following shortcut: .. sourcecode:: shell $ make run Modules description ------------------- Common to all web applications ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Configuration and settings """""""""""""""""""""""""" * :mod:`swh.web.config`: holds the SWH configuration for the web applications. * :mod:`swh.web.doc_config`: utility module used to extend the sphinx configuration when building the documentation. - * :mod:`swh.web.manage`: Django management module for developpers. + * :mod:`swh.web.manage`: Django management module for developers. * :mod:`swh.web.urls`: module that holds the whole URI scheme of all the web applications. * :mod:`swh.web.wsgi`: WSGI module to use when deploying the web applications in production. * :mod:`swh.web.settings.common`: Common Django settings * :mod:`swh.web.settings.development`: Django settings for development * :mod:`swh.web.settings.production`: Django settings for production * :mod:`swh.web.settings.tests`: Django settings for tests Common utilities """""""""""""""" * :mod:`swh.web.common.converters`: conversion module used to transform SWH raw data to serializable ones. It is used by :mod:`swh.web.common.service`: to convert data before transmitting then to Django views. * :mod:`swh.web.common.exc`: module defining exceptions used in the web applications. * :mod:`swh.web.common.highlightjs`: utility module to ease the use of the highlightjs_ library in produced Django views. * :mod:`swh.web.common.query`: Utilities to parse data from HTTP endpoints. It is used by :mod:`swh.web.common.service`. * :mod:`swh.web.common.service`: Orchestration layer used by views module in charge of communication with :mod:`swh.storage` to retrieve information and perform conversion for the upper layer. * :mod:`swh.web.common.swh_templatetags`: Custom Django template tags library for swh. * :mod:`swh.web.common.throttling`: Custom request rate limiter to use with the `Django REST Framework `_ * :mod:`swh.web.common.urlsindex`: Utilities to help the registering of endpoints for the web applications * :mod:`swh.web.common.utils`: Utility functions used in the web applications implementation SWH Web API application ^^^^^^^^^^^^^^^^^^^^^^^ * :mod:`swh.web.api.apidoc`: Utilities to document the web api for its html browsable rendering. * :mod:`swh.web.api.apiresponse`: Utility module to ease the generation of web api responses. * :mod:`swh.web.api.apiurls`: Utilities to facilitate the registration of SWH web api endpoints. * :mod:`swh.web.api.urls`: Module that defines the whole URI scheme for the api endpoints * :mod:`swh.web.api.utils`: Utility functions used in the SWH web api implementation. * :mod:`swh.web.api.views.content`: Implementation of API endpoints for getting information about SWH contents. * :mod:`swh.web.api.views.directory`: Implementation of API endpoints for getting information about SWH directories. * :mod:`swh.web.api.views.entity`: Implementation of API endpoints for getting information about SWH entities. * :mod:`swh.web.api.views.origin`: Implementation of API endpoints for getting information about SWH origins. * :mod:`swh.web.api.views.person`: Implementation of API endpoints for getting information about SWH persons. * :mod:`swh.web.api.views.release`: Implementation of API endpoints for getting information about SWH releases. * :mod:`swh.web.api.views.revision`: Implementation of API endpoints for getting information about SWH revisions. * :mod:`swh.web.api.views.snapshot`: Implementation of API endpoints for getting information about SWH snapshots. * :mod:`swh.web.api.views.stat`: Implementation of API endpoints for getting information about SWH archive statistics. * :mod:`swh.web.api.views.utils`: Utilities used in the web api endpoints implementation. SWH Web browse application ^^^^^^^^^^^^^^^^^^^^^^^^^^ * :mod:`swh.web.browse.browseurls`: Utilities to facilitate the registration of SWH web browse endpoints. * :mod:`swh.web.browse.urls`: Module that defines the whole URI scheme for the SWH web browse endpoints. * :mod:`swh.web.browse.utils`: Utilities functions used troughout the SWH web browse endpoints implementation. * :mod:`swh.web.browse.views.content`: Implementation of endpoints for browsing SWH contents. * :mod:`swh.web.browse.views.directory`: Implementation of endpoints for browsing SWH directories. * :mod:`swh.web.browse.views.identifiers`: Implementation of endpoints for browsing SWH objects through persistent identifiers. * :mod:`swh.web.browse.views.origin`: Implementation of endpoints for browsing SWH origins. * :mod:`swh.web.browse.views.person`: Implementation of endpoints for browsing SWH persons. * :mod:`swh.web.browse.views.release`: Implementation of endpoints for browsing SWH releases. * :mod:`swh.web.browse.views.revision`: Implementation of endpoints for browsing SWH revisions. * :mod:`swh.web.browse.views.snapshot`: Implementation of endpoints for browsing SWH snapshots. .. _highlightjs: https://highlightjs.org/ diff --git a/docs/uri-scheme-browse-content.rst b/docs/uri-scheme-browse-content.rst index d81fbc7d..d977fac1 100644 --- a/docs/uri-scheme-browse-content.rst +++ b/docs/uri-scheme-browse-content.rst @@ -1,92 +1,92 @@ Content ^^^^^^^ .. http:get:: /browse/content/[(algo_hash):](hash)/ HTML view that displays a SWH content identified by its hash value. If the content to display is textual, it will be highlighted client-side if possible using highlightjs_. In order for that operation to be performed, a programming language must first be associated to the content. The following procedure is used in order to find the language: 1) First try to find a language from the content filename (provided as query parameter when navigating from a directory view). 2) If no language has been found from the filename, try to find one from the content mime type. The mime type is retrieved from the content metadata stored in the SWH archive or is computed server-side using Python magic module. It is also possible to highlight specific lines of a textual content (not in terms of syntax highlighting but to emphasize some relevant content part) by either: * clicking on line numbers (holding shift to highlight a lines range) * using an url fragment in the form '#Ln' or '#Lm-Ln' When that view is called in the context of a navigation coming from a directory view, a breadcrumb will be displayed on top of the rendered content in order to easily navigate up to the associated root directory. In that case, the path query parameter will be used and filled with the path of the file relative to the root directory. :param string algo_hash: optional parameter to indicate the algorithm used to compute the content checksum (can be either *sha1*, *sha1_git*, *sha256* or *blake2s256*, default to *sha1*) :param string hash: hexadecimal representation for the checksum from which to retrieve the associated content in the SWH archive :query string path: describe the path of the content relative to a root directory (used to add context aware navigation links when navigating from a directory view) :statuscode 200: no error :statuscode 400: an invalid query string has been provided :statuscode 404: requested content can not be found in the SWH archive **Examples:** .. parsed-literal:: :swh_web_browse:`content/sha1_git:f5d0b39a0cdddb91a31a537052b7d8d31a4aa79f/` :swh_web_browse:`content/sha1_git:f5d0b39a0cdddb91a31a537052b7d8d31a4aa79f/#L23-L41` :swh_web_browse:`content/blake2s256:1cc1e3124957c9be8a454c58e92eb925cf4aa9823984bd01451c5b7e0fee99d1/` :swh_web_browse:`content/sha1:1cb1447c1c7ddc1b03eac88398e40bd914d46b62/` :swh_web_browse:`content/sha256:8ceb4b9ee5adedde47b31e975c1d90c73ad27b6b165a1dcd80c7c545eb65b903/` .. http:get:: /browse/content/[(algo_hash):](hash)/raw/ HTML view that produces a raw display of a SWH content identified by its hash value. The behaviour of that view depends on the mime type of the requested content. If the mime type is from the text family, the view will return a response whose content type is 'text/plain' that will be rendered by the browser. Otherwise, the view will return a response whose content type is 'application/octet-stream' and the browser will then offer to download the file. In the context of a navigation coming from a directory view, the filename query parameter will be used in order to provide the real name of the file when one wants to save it locally. - :param string algo_hash: optionnal parameter to indicate the algorithm used + :param string algo_hash: optional parameter to indicate the algorithm used to compute the content checksum (can be either *sha1*, *sha1_git*, *sha256* or *blake2s256*, default to *sha1*) :param string hash: hexadecimal representation for the checksum from which to retrieve the associated content in the SWH archive :query string filename: indicate the name of the file holding the requested content (used when one wants to save the content to a local file) :statuscode 200: no error :statuscode 400: an invalid query string has been provided :statuscode 404: requested content can not be found in the SWH archive **Examples:** .. parsed-literal:: :swh_web_browse:`content/sha1_git:f5d0b39a0cdddb91a31a537052b7d8d31a4aa79f/raw/?filename=LICENSE` :swh_web_browse:`content/blake2s256:1cc1e3124957c9be8a454c58e92eb925cf4aa9823984bd01451c5b7e0fee99d1/raw/?filename=MAINTAINERS` :swh_web_browse:`content/sha1:1cb1447c1c7ddc1b03eac88398e40bd914d46b62/raw/` :swh_web_browse:`content/sha256:8ceb4b9ee5adedde47b31e975c1d90c73ad27b6b165a1dcd80c7c545eb65b903/raw/?filename=COPYING` .. _highlightjs: https://highlightjs.org/ \ No newline at end of file diff --git a/docs/uri-scheme-browse-release.rst b/docs/uri-scheme-browse-release.rst index 608c1b50..ba895ff9 100644 --- a/docs/uri-scheme-browse-release.rst +++ b/docs/uri-scheme-browse-release.rst @@ -1,25 +1,25 @@ Release ^^^^^^^ .. http:get:: /browse/release/(sha1_git)/ HTML view that displays metadata associated to a SWH release: * the author * the release date * the release name * the associated message - * the type of target the release points to (revision, directory, content or relase) + * the type of target the release points to (revision, directory, content or release) * the link to browse the release target :param string sha1_git: hexadecimal representation for the *sha1_git* identifier of a SWH release :statuscode 200: no error :statuscode 404: requested release can not be found in the SWH archive **Examples:** .. parsed-literal:: :swh_web_browse:`release/208f61cc7a5dbc9879ae6e5c2f95891e270f09ef/` :swh_web_browse:`release/f883596e997fe5bcbc5e89bee01b869721326109/` diff --git a/swh/web/api/apiresponse.py b/swh/web/api/apiresponse.py index 709f900e..f2e9af46 100644 --- a/swh/web/api/apiresponse.py +++ b/swh/web/api/apiresponse.py @@ -1,180 +1,180 @@ # Copyright (C) 2017-2018 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import json from rest_framework.response import Response from swh.storage.exc import StorageDBError, StorageAPIError from swh.web.common.exc import NotFoundExc, ForbiddenExc from swh.web.common.utils import shorten_path, gen_path_info from swh.web.api import utils def compute_link_header(rv, options): """Add Link header in returned value results. Args: rv (dict): dictionary with keys: - headers: potential headers with 'link-next' and 'link-prev' keys - results: containing the result to return options (dict): the initial dict to update with result if any Returns: dict: dictionary with optional keys 'link-next' and 'link-prev' """ link_headers = [] if 'headers' not in rv: return {} rv_headers = rv['headers'] if 'link-next' in rv_headers: link_headers.append('<%s>; rel="next"' % ( rv_headers['link-next'])) if 'link-prev' in rv_headers: link_headers.append('<%s>; rel="previous"' % ( rv_headers['link-prev'])) if link_headers: link_header_str = ','.join(link_headers) headers = options.get('headers', {}) headers.update({ 'Link': link_header_str }) return headers return {} def filter_by_fields(request, data): """Extract a request parameter 'fields' if it exists to permit the filtering on he data dict's keys. If such field is not provided, returns the data as is. """ fields = request.query_params.get('fields') if fields: fields = set(fields.split(',')) data = utils.filter_field_keys(data, fields) return data def transform(rv): """Transform an eventual returned value with multiple layer of information with only what's necessary. If the returned value rv contains the 'results' key, this is the associated value which is returned. Otherwise, return the initial dict without the potential 'headers' key. """ if 'results' in rv: return rv['results'] if 'headers' in rv: rv.pop('headers') return rv def make_api_response(request, data, doc_data={}, options={}): """Generates an API response based on the requested mimetype. Args: request: a DRF Request object data: raw data to return in the API response doc_data: documentation data for HTML response - options: optionnal data that can be used to generate the response + options: optional data that can be used to generate the response Returns: a DRF Response a object """ if data: options['headers'] = compute_link_header(data, options) data = transform(data) data = filter_by_fields(request, data) doc_env = doc_data headers = {} if 'headers' in options: doc_env['headers_data'] = options['headers'] headers = options['headers'] # get request status code doc_env['status_code'] = options.get('status', 200) response_args = {'status': doc_env['status_code'], 'headers': headers, 'content_type': request.accepted_media_type} # when requesting HTML, typically when browsing the API through its # documented views, we need to enrich the input data with documentation # related ones and inform DRF that we request HTML template rendering if request.accepted_media_type == 'text/html': if data: data = json.dumps(data, sort_keys=True, indent=4, separators=(',', ': ')) doc_env['response_data'] = data doc_env['request'] = { 'path': request.path, 'method': request.method, 'absolute_uri': request.build_absolute_uri(), } doc_env['heading'] = shorten_path(str(request.path)) if 'route' in doc_env: doc_env['endpoint_path'] = gen_path_info(doc_env['route']) response_args['data'] = doc_env response_args['template_name'] = 'api/apidoc.html' # otherwise simply return the raw data and let DRF picks # the correct renderer (JSON or YAML) else: response_args['data'] = data return Response(**response_args) def error_response(request, error, doc_data): """Private function to create a custom error response. Args: request: a DRF Request object error: the exception that caused the error doc_data: documentation data for HTML response """ error_code = 400 if isinstance(error, NotFoundExc): error_code = 404 elif isinstance(error, ForbiddenExc): error_code = 403 elif isinstance(error, StorageDBError): error_code = 503 elif isinstance(error, StorageAPIError): error_code = 503 error_opts = {'status': error_code} error_data = { 'exception': error.__class__.__name__, 'reason': str(error), } return make_api_response(request, error_data, doc_data, options=error_opts) diff --git a/swh/web/api/views/revision.py b/swh/web/api/views/revision.py index d178661b..6b88589f 100644 --- a/swh/web/api/views/revision.py +++ b/swh/web/api/views/revision.py @@ -1,504 +1,504 @@ # Copyright (C) 2015-2018 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from django.http import HttpResponse from swh.web.common import service from swh.web.common.utils import reverse from swh.web.common.utils import parse_timestamp from swh.web.api import utils from swh.web.api.apidoc import api_doc from swh.web.api.apiurls import api_route from swh.web.api.views.utils import api_lookup def _revision_directory_by(revision, path, request_path, limit=100, with_data=False): """ Compute the revision matching criterion's directory or content data. Args: revision: dictionary of criterions representing a revision to lookup path: directory's path to lookup request_path: request path which holds the original context to limit: optional query parameter to limit the revisions log (default to 100). For now, note that this limit could impede the transitivity conclusion about sha1_git not being an ancestor of with_data: indicate to retrieve the content's raw data if path resolves to a content. """ def enrich_directory_local(dir, context_url=request_path): return utils.enrich_directory(dir, context_url) rev_id, result = service.lookup_directory_through_revision( revision, path, limit=limit, with_data=with_data) content = result['content'] if result['type'] == 'dir': # dir_entries result['content'] = list(map(enrich_directory_local, content)) else: # content result['content'] = utils.enrich_content(content) return result @api_route(r'/revision/origin/(?P[0-9]+)' r'/branch/(?P.+)/log/', 'revision-origin-log') @api_route(r'/revision/origin/(?P[0-9]+)/log/', 'revision-origin-log') @api_route(r'/revision/origin/(?P[0-9]+)' r'/ts/(?P.+)/log/', 'revision-origin-log') @api_route(r'/revision/origin/(?P[0-9]+)' r'/branch/(?P.+)' r'/ts/(?P.+)/log/', 'revision-origin-log') @api_doc('/revision/origin/log/') def api_revision_log_by(request, origin_id, branch_name='refs/heads/master', ts=None): """ .. http:get:: /api/1/revision/origin/(origin_id)[/branch/(branch_name)][/ts/(timestamp)]/log Show the commit log for a revision, searching for it based on software origin, branch name, and/or visit timestamp. This endpoint behaves like :http:get:`/api/1/revision/(sha1_git)[/prev/(prev_sha1s)]/log/`, but operates on the revision that has been found at a given software origin, close to a given point in time, pointed by a given branch. :param int origin_id: a SWH origin identifier :param string branch_name: optional parameter specifying a fully-qualified branch name associated to the software origin, e.g., "refs/heads/master". Defaults to the master branch. :param string timestamp: optional parameter specifying a timestamp close to which the revision pointed by the given branch should be looked up. The timestamp can be expressed either as an ISO date or as a Unix one (in UTC). Defaults to now. :reqheader Accept: the requested response content type, either *application/json* (default) or *application/yaml* :resheader Content-Type: this depends on :http:header:`Accept` header of request :>jsonarr object author: information about the author of the revision :>jsonarr string author_url: link to :http:get:`/api/1/person/(person_id)/` to get information about the author of the revision :>jsonarr object committer: information about the committer of the revision :>jsonarr string committer_url: link to :http:get:`/api/1/person/(person_id)/` to get information about the committer of the revision :>jsonarr string committer_date: ISO representation of the commit date (in UTC) :>jsonarr string date: ISO representation of the revision date (in UTC) :>jsonarr string directory: the unique identifier that revision points to :>jsonarr string directory_url: link to :http:get:`/api/1/directory/(sha1_git)/[(path)/]` to get information about the directory associated to the revision :>jsonarr string id: the revision unique identifier :>jsonarr boolean merge: whether or not the revision corresponds to a merge commit :>jsonarr string message: the message associated to the revision :>jsonarr array parents: the parents of the revision, i.e. the previous revisions that head directly to it, each entry of that array contains an unique parent revision identifier but also a link to :http:get:`/api/1/revision/(sha1_git)/` - to get more informations about it + to get more information about it :>jsonarr string type: the type of the revision **Allowed HTTP Methods:** :http:method:`get`, :http:method:`head`, :http:method:`options` :statuscode 200: no error :statuscode 404: no revision matching the given criteria could be found in the SWH archive **Example:** .. parsed-literal:: :swh_web_api:`revision/origin/723566/ts/2016-01-17T00:00:00+00:00/log/` """ # noqa result = {} per_page = int(request.query_params.get('per_page', '10')) if ts: ts = parse_timestamp(ts) def lookup_revision_log_by_with_limit(o_id, br, ts, limit=per_page+1): return service.lookup_revision_log_by(o_id, br, ts, limit) error_msg = 'No revision matching origin %s ' % origin_id error_msg += ', branch name %s' % branch_name error_msg += (' and time stamp %s.' % ts) if ts else '.' rev_get = api_lookup( lookup_revision_log_by_with_limit, origin_id, branch_name, ts, notfound_msg=error_msg, enrich_fn=utils.enrich_revision) nb_rev = len(rev_get) if nb_rev == per_page+1: revisions = rev_get[:-1] last_sha1_git = rev_get[-1]['id'] params = {k: v for k, v in {'origin_id': origin_id, 'branch_name': branch_name, 'ts': ts, }.items() if v is not None} query_params = {} query_params['sha1_git'] = last_sha1_git if request.query_params.get('per_page'): query_params['per_page'] = per_page result['headers'] = { 'link-next': reverse('revision-origin-log', kwargs=params, query_params=query_params) } else: revisions = rev_get result.update({'results': revisions}) return result @api_route(r'/revision/origin/(?P[0-9]+)/directory/', 'revision-directory') @api_route(r'/revision/origin/(?P[0-9]+)/directory/(?P.+)/', 'revision-directory') @api_route(r'/revision/origin/(?P[0-9]+)' r'/branch/(?P.+)/directory/', 'revision-directory') @api_route(r'/revision/origin/(?P[0-9]+)' r'/branch/(?P.+)/ts/(?P.+)/directory/', 'revision-directory') @api_route(r'/revision/origin/(?P[0-9]+)' r'/branch/(?P.+)/directory/(?P.+)/', 'revision-directory') @api_route(r'/revision/origin/(?P[0-9]+)' r'/branch/(?P.+)/ts/(?P.+)' r'/directory/(?P.+)/', 'revision-directory') @api_doc('/revision/origin/directory/', tags=['hidden']) def api_directory_through_revision_origin(request, origin_id, branch_name="refs/heads/master", ts=None, path=None, with_data=False): """ Display directory or content information through a revision identified by origin/branch/timestamp. """ if ts: ts = parse_timestamp(ts) return _revision_directory_by({'origin_id': origin_id, 'branch_name': branch_name, 'ts': ts }, path, request.path, with_data=with_data) @api_route(r'/revision/origin/(?P[0-9]+)/', 'revision-origin') @api_route(r'/revision/origin/(?P[0-9]+)' r'/branch/(?P.+)/', 'revision-origin') @api_route(r'/revision/origin/(?P[0-9]+)' r'/branch/(?P.+)/ts/(?P.+)/', 'revision-origin') @api_route(r'/revision/origin/(?P[0-9]+)/ts/(?P.+)/', 'revision-origin') @api_doc('/revision/origin/') def api_revision_with_origin(request, origin_id, branch_name="refs/heads/master", ts=None): """ .. http:get:: /api/1/revision/origin/(origin_id)/[branch/(branch_name)/][ts/(timestamp)/] Get information about a revision, searching for it based on software origin, branch name, and/or visit timestamp. This endpoint behaves like :http:get:`/api/1/revision/(sha1_git)/`, but operates on the revision that has been found at a given software origin, close to a given point in time, pointed by a given branch. :param int origin_id: a SWH origin identifier :param string branch_name: optional parameter specifying a fully-qualified branch name associated to the software origin, e.g., "refs/heads/master". Defaults to the master branch. :param string timestamp: optional parameter specifying a timestamp close to which the revision pointed by the given branch should be looked up. The timestamp can be expressed either as an ISO date or as a Unix one (in UTC). Defaults to now. :reqheader Accept: the requested response content type, either *application/json* (default) or *application/yaml* :resheader Content-Type: this depends on :http:header:`Accept` header of request :>json object author: information about the author of the revision :>json string author_url: link to :http:get:`/api/1/person/(person_id)/` to get information about the author of the revision :>json object committer: information about the committer of the revision :>json string committer_url: link to :http:get:`/api/1/person/(person_id)/` to get information about the committer of the revision :>json string committer_date: ISO representation of the commit date (in UTC) :>json string date: ISO representation of the revision date (in UTC) :>json string directory: the unique identifier that revision points to :>json string directory_url: link to :http:get:`/api/1/directory/(sha1_git)/[(path)/]` to get information about the directory associated to the revision :>json string id: the revision unique identifier :>json boolean merge: whether or not the revision corresponds to a merge commit :>json string message: the message associated to the revision :>json array parents: the parents of the revision, i.e. the previous revisions that head directly to it, each entry of that array contains an unique parent revision identifier but also a link to :http:get:`/api/1/revision/(sha1_git)/` - to get more informations about it + to get more information about it :>json string type: the type of the revision **Allowed HTTP Methods:** :http:method:`get`, :http:method:`head`, :http:method:`options` :statuscode 200: no error :statuscode 404: no revision matching the given criteria could be found in the SWH archive **Example:** .. parsed-literal:: :swh_web_api:`revision/origin/13706355/branch/refs/heads/2.7/` """ # noqa ts = parse_timestamp(ts) return api_lookup( service.lookup_revision_by, origin_id, branch_name, ts, notfound_msg=('Revision with (origin_id: {}, branch_name: {}' ', ts: {}) not found.'.format(origin_id, branch_name, ts)), enrich_fn=utils.enrich_revision) @api_route(r'/revision/(?P[0-9a-f]+)/prev/(?P[0-9a-f/]+)/', 'revision-context') @api_doc('/revision/prev/', tags=['hidden']) def api_revision_with_context(request, sha1_git, context): """ Return information about revision with id sha1_git. """ def _enrich_revision(revision, context=context): return utils.enrich_revision(revision, context) return api_lookup( service.lookup_revision, sha1_git, notfound_msg='Revision with sha1_git %s not found.' % sha1_git, enrich_fn=_enrich_revision) @api_route(r'/revision/(?P[0-9a-f]+)/', 'revision') @api_doc('/revision/') def api_revision(request, sha1_git): """ .. http:get:: /api/1/revision/(sha1_git)/ Get information about a revision in the SWH archive. Revisions are identified by *sha1* checksums, compatible with Git commit identifiers. See :func:`swh.model.identifiers.revision_identifier` in our data model module for details about how they are computed. :param string sha1_git: hexadecimal representation of the revision *sha1_git* identifier :reqheader Accept: the requested response content type, either *application/json* (default) or *application/yaml* :resheader Content-Type: this depends on :http:header:`Accept` header of request :>json object author: information about the author of the revision :>json string author_url: link to :http:get:`/api/1/person/(person_id)/` to get information about the author of the revision :>json object committer: information about the committer of the revision :>json string committer_url: link to :http:get:`/api/1/person/(person_id)/` to get information about the committer of the revision :>json string committer_date: ISO representation of the commit date (in UTC) :>json string date: ISO representation of the revision date (in UTC) :>json string directory: the unique identifier that revision points to :>json string directory_url: link to :http:get:`/api/1/directory/(sha1_git)/[(path)/]` to get information about the directory associated to the revision :>json string id: the revision unique identifier :>json boolean merge: whether or not the revision corresponds to a merge commit :>json string message: the message associated to the revision :>json array parents: the parents of the revision, i.e. the previous revisions that head directly to it, each entry of that array contains an unique parent revision identifier but also a link to :http:get:`/api/1/revision/(sha1_git)/` - to get more informations about it + to get more information about it :>json string type: the type of the revision **Allowed HTTP Methods:** :http:method:`get`, :http:method:`head`, :http:method:`options` :statuscode 200: no error :statuscode 400: an invalid *sha1_git* value has been provided :statuscode 404: requested revision can not be found in the SWH archive **Example:** .. parsed-literal:: :swh_web_api:`revision/aafb16d69fd30ff58afdd69036a26047f3aebdc6/` """ # noqa return api_lookup( service.lookup_revision, sha1_git, notfound_msg='Revision with sha1_git {} not found.'.format(sha1_git), enrich_fn=utils.enrich_revision) @api_route(r'/revision/(?P[0-9a-f]+)/raw/', 'revision-raw-message') @api_doc('/revision/raw/', tags=['hidden'], handle_response=True) def api_revision_raw_message(request, sha1_git): """Return the raw data of the message of revision identified by sha1_git """ raw = service.lookup_revision_message(sha1_git) response = HttpResponse(raw['message'], content_type='application/octet-stream') response['Content-disposition'] = \ 'attachment;filename=rev_%s_raw' % sha1_git return response @api_route(r'/revision/(?P[0-9a-f]+)/directory/', 'revision-directory') @api_route(r'/revision/(?P[0-9a-f]+)/directory/(?P.+)/', 'revision-directory') @api_doc('/revision/directory/') def api_revision_directory(request, sha1_git, dir_path=None, with_data=False): """ .. http:get:: /api/1/revision/(sha1_git)/directory/[(path)/] Get information about directory (entry) objects associated to revisions. Each revision is associated to a single "root" directory. This endpoint behaves like :http:get:`/api/1/directory/(sha1_git)/[(path)/]`, but operates on the root directory associated to a given revision. :param string sha1_git: hexadecimal representation of the revision *sha1_git* identifier :param string path: optional parameter to get information about the directory entry pointed by that relative path :reqheader Accept: the requested response content type, either *application/json* (default) or *application/yaml* :resheader Content-Type: this depends on :http:header:`Accept` header of request :>json array content: directory entries as returned by :http:get:`/api/1/directory/(sha1_git)/[(path)/]` :>json string path: path of directory from the revision root one :>json string revision: the unique revision identifier :>json string type: the type of the directory **Allowed HTTP Methods:** :http:method:`get`, :http:method:`head`, :http:method:`options` :statuscode 200: no error :statuscode 400: an invalid *sha1_git* value has been provided :statuscode 404: requested revision can not be found in the SWH archive **Example:** .. parsed-literal:: :swh_web_api:`revision/f1b94134a4b879bc55c3dacdb496690c8ebdc03f/directory/` """ # noqa return _revision_directory_by({'sha1_git': sha1_git}, dir_path, request.path, with_data=with_data) @api_route(r'/revision/(?P[0-9a-f]+)/log/', 'revision-log') @api_route(r'/revision/(?P[0-9a-f]+)' r'/prev/(?P[0-9a-f/]+)/log/', 'revision-log') @api_doc('/revision/log/') def api_revision_log(request, sha1_git, prev_sha1s=None): """ .. http:get:: /api/1/revision/(sha1_git)[/prev/(prev_sha1s)]/log/ Get a list of all SWH revisions heading to a given one, in other words show the commit log. :param string sha1_git: hexadecimal representation of the revision *sha1_git* identifier :param string prev_sha1s: optional parameter representing the navigation breadcrumbs (descendant revisions previously visited). If multiple values, use / as delimiter. If provided, revisions information will be added at the beginning of the returned list. :query int per_page: number of elements in the returned list, for pagination purpose :reqheader Accept: the requested response content type, either *application/json* (default) or *application/yaml* :resheader Content-Type: this depends on :http:header:`Accept` header of request :resheader Link: indicates that a subsequent result page is available and contains the url pointing to it :>jsonarr object author: information about the author of the revision :>jsonarr string author_url: link to :http:get:`/api/1/person/(person_id)/` to get information about the author of the revision :>jsonarr object committer: information about the committer of the revision :>jsonarr string committer_url: link to :http:get:`/api/1/person/(person_id)/` to get information about the committer of the revision :>jsonarr string committer_date: ISO representation of the commit date (in UTC) :>jsonarr string date: ISO representation of the revision date (in UTC) :>jsonarr string directory: the unique identifier that revision points to :>jsonarr string directory_url: link to :http:get:`/api/1/directory/(sha1_git)/[(path)/]` to get information about the directory associated to the revision :>jsonarr string id: the revision unique identifier :>jsonarr boolean merge: whether or not the revision corresponds to a merge commit :>jsonarr string message: the message associated to the revision :>jsonarr array parents: the parents of the revision, i.e. the previous revisions that head directly to it, each entry of that array contains an unique parent revision identifier but also a link to :http:get:`/api/1/revision/(sha1_git)/` - to get more informations about it + to get more information about it :>jsonarr string type: the type of the revision **Allowed HTTP Methods:** :http:method:`get`, :http:method:`head`, :http:method:`options` :statuscode 200: no error :statuscode 400: an invalid *sha1_git* value has been provided :statuscode 404: requested revision can not be found in the SWH archive **Example:** .. parsed-literal:: :swh_web_api:`revision/e1a315fa3fa734e2a6154ed7b5b9ae0eb8987aad/log/` """ # noqa result = {} per_page = int(request.query_params.get('per_page', '10')) def lookup_revision_log_with_limit(s, limit=per_page+1): return service.lookup_revision_log(s, limit) error_msg = 'Revision with sha1_git %s not found.' % sha1_git rev_get = api_lookup(lookup_revision_log_with_limit, sha1_git, notfound_msg=error_msg, enrich_fn=utils.enrich_revision) nb_rev = len(rev_get) if nb_rev == per_page+1: rev_backward = rev_get[:-1] new_last_sha1 = rev_get[-1]['id'] query_params = {} if request.query_params.get('per_page'): query_params['per_page'] = per_page result['headers'] = { 'link-next': reverse('revision-log', kwargs={'sha1_git': new_last_sha1}, query_params=query_params) } else: rev_backward = rev_get if not prev_sha1s: # no nav breadcrumbs, so we're done revisions = rev_backward else: rev_forward_ids = prev_sha1s.split('/') rev_forward = api_lookup( service.lookup_revision_multiple, rev_forward_ids, notfound_msg=error_msg, enrich_fn=utils.enrich_revision) revisions = rev_forward + rev_backward result.update({ 'results': revisions }) return result diff --git a/swh/web/api/views/stat.py b/swh/web/api/views/stat.py index 0c6de479..3c3884e0 100644 --- a/swh/web/api/views/stat.py +++ b/swh/web/api/views/stat.py @@ -1,55 +1,55 @@ # Copyright (C) 2015-2018 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from swh.web.common import service from swh.web.api.apidoc import api_doc from swh.web.api.apiurls import api_route @api_route(r'/stat/counters/', 'stat-counters') @api_doc('/stat/counters/', noargs=True) def api_stats(request): """ .. http:get:: /api/1/stat/counters/ Get statistics about the content of the archive. :>json number content: current number of content objects (aka files) in the SWH archive :>json number directory: current number of directory objects in the SWH archive :>json number directory_entry_dir: current number of SWH directory entries pointing to others SWH directories in the SWH archive :>json number directory_entry_file: current number of SWH directory entries pointing to SWH content objects in the SWH archive :>json number directory_entry_rev: current number of SWH directory entries pointing to SWH revision objects (e.g. git submodules) in the SWH archive :>json number entity: current number of SWH entities (a SWH entity is either a *group_of_entities*, a *group_of_persons*, a *project*, a *person*, an *organization*, or a *hosting* service) in the SWH archive :>json number occurrence: current number of SWH occurrences (an occurrence may be assimilated to a branch found during a SWH crawl of a repository) in the SWH archive :>json number origin: current number of SWH origins (an origin is a "place" where code source can be found, e.g. a git repository, a tarball, ...) in the SWH archive - :>json number person: current number of SWH persons (code source authors or commiters) + :>json number person: current number of SWH persons (code source authors or committers) in the SWH archive :>json number release: current number of SWH releases objects in the SWH archive :>json number revision: current number of SWH revision objects (aka commits) in the SWH archive :>json number skipped_content: current number of content objects (aka files) which where not inserted in the SWH archive :reqheader Accept: the requested response content type, either *application/json* (default) or *application/yaml* :resheader Content-Type: this depends on :http:header:`Accept` header of request **Allowed HTTP Methods:** :http:method:`get`, :http:method:`head`, :http:method:`options` :statuscode 200: no error **Example:** .. parsed-literal:: :swh_web_api:`stat/counters/` """ # noqa return service.stat_counters() diff --git a/swh/web/browse/utils.py b/swh/web/browse/utils.py index 70d658d9..ce2074da 100644 --- a/swh/web/browse/utils.py +++ b/swh/web/browse/utils.py @@ -1,1093 +1,1093 @@ # Copyright (C) 2017-2018 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information import base64 import magic import math import pypandoc import stat from django.core.cache import cache from django.utils.safestring import mark_safe from importlib import reload from swh.web.common import highlightjs, service from swh.web.common.exc import NotFoundExc, http_status_code_message from swh.web.common.utils import ( reverse, format_utc_iso_date, parse_timestamp, get_origin_visits, get_swh_persistent_id ) from swh.web.config import get_config def get_directory_entries(sha1_git): """Function that retrieves the content of a SWH directory from the SWH archive. The directories entries are first sorted in lexicographical order. Sub-directories and regular files are then extracted. Args: sha1_git: sha1_git identifier of the directory Returns: A tuple whose first member corresponds to the sub-directories list and second member the regular files list Raises: NotFoundExc if the directory is not found """ cache_entry_id = 'directory_entries_%s' % sha1_git cache_entry = cache.get(cache_entry_id) if cache_entry: return cache_entry entries = list(service.lookup_directory(sha1_git)) entries = sorted(entries, key=lambda e: e['name']) for entry in entries: entry['perms'] = stat.filemode(entry['perms']) dirs = [e for e in entries if e['type'] == 'dir'] files = [e for e in entries if e['type'] == 'file'] cache.set(cache_entry_id, (dirs, files)) return dirs, files def get_mimetype_and_encoding_for_content(content): """Function that returns the mime type and the encoding associated to a content buffer using the magic module under the hood. Args: content (bytes): a content buffer Returns: A tuple (mimetype, encoding), for instance ('text/plain', 'us-ascii'), associated to the provided content. """ while True: try: magic_result = magic.detect_from_content(content) mime_type = magic_result.mime_type encoding = magic_result.encoding break except Exception as exc: # workaround an issue with the magic module who can fail # if detect_from_content is called multiple times in # a short amount of time reload(magic) return mime_type, encoding # maximum authorized content size in bytes for HTML display # with code highlighting content_display_max_size = get_config()['content_display_max_size'] def request_content(query_string, max_size=content_display_max_size, raise_if_unavailable=True, reencode=True): """Function that retrieves a SWH content from the SWH archive. Raw bytes content is first retrieved, then the content mime type. If the mime type is not stored in the archive, it will be computed using Python magic module. Args: query_string: a string of the form "[ALGO_HASH:]HASH" where optional ALGO_HASH can be either *sha1*, *sha1_git*, *sha256*, or *blake2s256* (default to *sha1*) and HASH the hexadecimal representation of the hash value max_size: the maximum size for a content to retrieve (default to 1MB, no size limit if None) Returns: A tuple whose first member corresponds to the content raw bytes and second member the content mime type Raises: NotFoundExc if the content is not found """ content_data = service.lookup_content(query_string) filetype = None language = None license = None # requests to the indexer db may fail so properly handle # those cases in order to avoid content display errors try: filetype = service.lookup_content_filetype(query_string) language = service.lookup_content_language(query_string) license = service.lookup_content_license(query_string) except Exception as e: pass mimetype = 'unknown' encoding = 'unknown' if filetype: mimetype = filetype['mimetype'] encoding = filetype['encoding'] content_data['error_code'] = 200 content_data['error_message'] = '' content_data['error_description'] = '' if not max_size or content_data['length'] < max_size: try: content_raw = service.lookup_content_raw(query_string) except Exception as e: if raise_if_unavailable: raise e else: content_data['raw_data'] = None content_data['error_code'] = 404 content_data['error_description'] = \ 'The bytes of the content are currently not available in the archive.' # noqa content_data['error_message'] = \ http_status_code_message[content_data['error_code']] else: content_data['raw_data'] = content_raw['data'] if not filetype: mimetype, encoding = \ get_mimetype_and_encoding_for_content(content_data['raw_data']) # noqa # encode textual content to utf-8 if needed if reencode and mimetype.startswith('text/'): - # probably a malformed UTF-8 content, reencode it + # probably a malformed UTF-8 content, re-encode it # by replacing invalid chars with a substitution one if encoding == 'unknown-8bit': content_data['raw_data'] = \ content_data['raw_data'].decode('utf-8', 'replace')\ .encode('utf-8') elif 'ascii' not in encoding and encoding not in ['utf-8', 'binary']: # noqa content_data['raw_data'] = \ content_data['raw_data'].decode(encoding, 'replace')\ .encode('utf-8') elif reencode and mimetype.startswith('application/octet-stream'): # file may detect an iso-8859-* encoded content as binary # so try to decode it for display encodings = ['iso-8859-%s' % i for i in range(1, 17)] for encoding in encodings: try: content_data['raw_data'] = \ content_data['raw_data'].decode(encoding)\ .encode('utf-8') except Exception: pass else: break else: content_data['raw_data'] = None content_data['mimetype'] = mimetype content_data['encoding'] = encoding if language: content_data['language'] = language['lang'] else: content_data['language'] = 'not detected' if license: content_data['licenses'] = ', '.join(license['facts'][0]['licenses']) else: content_data['licenses'] = 'not detected' return content_data _browsers_supported_image_mimes = set(['image/gif', 'image/png', 'image/jpeg', 'image/bmp', 'image/webp', 'image/svg', 'image/svg+xml']) def prepare_content_for_display(content_data, mime_type, path): """Function that prepares a content for HTML display. The function tries to associate a programming language to a content in order to perform syntax highlighting client-side using highlightjs. The language is determined using either the content filename or its mime type. If the mime type corresponds to an image format supported by web browsers, the content will be encoded in base64 for displaying the image. Args: content_data (bytes): raw bytes of the content mime_type (string): mime type of the content path (string): path of the content including filename Returns: A dict containing the content bytes (possibly different from the one provided as parameter if it is an image) under the key 'content_data and the corresponding highlightjs language class under the key 'language'. """ language = highlightjs.get_hljs_language_from_filename(path) if not language: language = highlightjs.get_hljs_language_from_mime_type(mime_type) if not language: language = 'nohighlight-swh' elif mime_type.startswith('application/'): mime_type = mime_type.replace('application/', 'text/') if mime_type.startswith('image/'): if mime_type in _browsers_supported_image_mimes: content_data = base64.b64encode(content_data) else: content_data = None if mime_type.startswith('image/svg'): mime_type = 'image/svg+xml' return {'content_data': content_data, 'language': language, 'mimetype': mime_type} def get_origin_visit(origin_info, visit_ts=None, visit_id=None, snapshot_id=None): """Function that returns information about a SWH visit for a given origin. The visit is retrieved from a provided timestamp. The closest visit from that timestamp is selected. Args: origin_info (dict): a dict filled with origin information (id, url, type) visit_ts (int or str): an ISO date string or Unix timestamp to parse Returns: A dict containing the visit info as described below:: {'origin': 2, 'date': '2017-10-08T11:54:25.582463+00:00', 'metadata': {}, 'visit': 25, 'status': 'full'} """ visits = get_origin_visits(origin_info) if not visits: raise NotFoundExc('No SWH visit associated to origin with' ' type %s and url %s!' % (origin_info['type'], origin_info['url'])) if snapshot_id: visit = [v for v in visits if v['snapshot'] == snapshot_id] if len(visit) == 0: raise NotFoundExc( 'Visit for snapshot with id %s for origin with type %s' ' and url %s not found!' % (snapshot_id, origin_info['type'], origin_info['url'])) return visit[0] if visit_id: visit = [v for v in visits if v['visit'] == int(visit_id)] if len(visit) == 0: raise NotFoundExc( 'Visit with id %s for origin with type %s' ' and url %s not found!' % (visit_id, origin_info['type'], origin_info['url'])) return visit[0] if not visit_ts: # returns the latest full visit when no timestamp is provided for v in reversed(visits): if v['status'] == 'full': return v return visits[-1] parsed_visit_ts = math.floor(parse_timestamp(visit_ts).timestamp()) visit_idx = None for i, visit in enumerate(visits): ts = math.floor(parse_timestamp(visit['date']).timestamp()) if i == 0 and parsed_visit_ts <= ts: return visit elif i == len(visits) - 1: if parsed_visit_ts >= ts: return visit else: next_ts = math.floor( parse_timestamp(visits[i+1]['date']).timestamp()) if parsed_visit_ts >= ts and parsed_visit_ts < next_ts: if (parsed_visit_ts - ts) < (next_ts - parsed_visit_ts): visit_idx = i break else: visit_idx = i+1 break if visit_idx: visit = visits[visit_idx] while visit_idx < len(visits) - 1 and \ visit['date'] == visits[visit_idx+1]['date']: visit_idx = visit_idx + 1 visit = visits[visit_idx] return visit else: raise NotFoundExc( 'Visit with timestamp %s for origin with type %s and url %s not found!' % # noqa (visit_ts, origin_info['type'], origin_info['url'])) def get_snapshot_content(snapshot_id): """Returns the lists of branches and releases associated to a swh snapshot. That list is put in cache in order to speedup the navigation in the swh-web/browse ui. Args: snapshot_id (str): hexadecimal representation of the snapshot identifier Returns: A tuple with two members. The first one is a list of dict describing the snapshot branches. The second one is a list of dict describing the snapshot releases. Raises: NotFoundExc if the snapshot does not exist """ cache_entry_id = 'swh_snapshot_%s' % snapshot_id cache_entry = cache.get(cache_entry_id) if cache_entry: return cache_entry['branches'], cache_entry['releases'] branches = [] releases = [] if snapshot_id: revision_ids = [] releases_ids = [] snapshot = service.lookup_snapshot(snapshot_id) snapshot_branches = snapshot['branches'] for key in sorted(snapshot_branches.keys()): if not snapshot_branches[key]: continue if snapshot_branches[key]['target_type'] == 'revision': branches.append({'name': key, 'revision': snapshot_branches[key]['target']}) revision_ids.append(snapshot_branches[key]['target']) elif snapshot_branches[key]['target_type'] == 'release': releases_ids.append(snapshot_branches[key]['target']) releases_info = service.lookup_release_multiple(releases_ids) for release in releases_info: releases.append({'name': release['name'], 'date': format_utc_iso_date(release['date']), 'id': release['id'], 'message': release['message'], 'target_type': release['target_type'], 'target': release['target']}) revision_ids.append(release['target']) revisions = service.lookup_revision_multiple(revision_ids) branches_to_remove = [] for idx, revision in enumerate(revisions): if idx < len(branches): if revision: branches[idx]['directory'] = revision['directory'] branches[idx]['date'] = format_utc_iso_date(revision['date']) # noqa branches[idx]['message'] = revision['message'] else: branches_to_remove.append(branches[idx]) else: rel_idx = idx - len(branches) if revision: releases[rel_idx]['directory'] = revision['directory'] for b in branches_to_remove: branches.remove(b) cache.set(cache_entry_id, {'branches': branches, 'releases': releases}) return branches, releases def get_origin_visit_snapshot(origin_info, visit_ts=None, visit_id=None, snapshot_id=None): """Returns the lists of branches and releases associated to a swh origin for a given visit. The visit is expressed by a timestamp. In the latter case, the closest visit from the provided timestamp will be used. If no visit parameter is provided, it returns the list of branches found for the latest visit. That list is put in cache in order to speedup the navigation in the swh-web/browse ui. Args: origin_info (dict): a dict filled with origin information (id, url, type) visit_ts (int or str): an ISO date string or Unix timestamp to parse visit_id (int): optional visit id for desambiguation in case several visits have the same timestamp Returns: A tuple with two members. The first one is a list of dict describing the origin branches for the given visit. The second one is a list of dict describing the origin releases for the given visit. Raises: NotFoundExc if the origin or its visit are not found """ visit_info = get_origin_visit(origin_info, visit_ts, visit_id, snapshot_id) return get_snapshot_content(visit_info['snapshot']) def gen_link(url, link_text=None, link_attrs={}): """ Utility function for generating an HTML link to insert in Django templates. Args: url (str): an url link_text (str): optional text for the produced link, if not provided the url will be used link_attrs (dict): optional attributes (e.g. class) to add to the link Returns: An HTML link in the form 'link_text' """ attrs = ' ' for k, v in link_attrs.items(): attrs += '%s="%s" ' % (k, v) if not link_text: link_text = url link = '%s' % (attrs, url, link_text) return mark_safe(link) def gen_person_link(person_id, person_name, snapshot_context=None, link_attrs={}): """ Utility function for generating a link to a SWH person HTML view to insert in Django templates. Args: person_id (int): a SWH person id person_name (str): the associated person name link_attrs (dict): optional attributes (e.g. class) to add to the link Returns: An HTML link in the form 'person_name' """ query_params = None if snapshot_context and snapshot_context['origin_info']: origin_info = snapshot_context['origin_info'] query_params = {'origin_type': origin_info['type'], 'origin': origin_info['url']} if 'timestamp' in snapshot_context['url_args']: query_params['timestamp'] = \ snapshot_context['url_args']['timestamp'] if 'visit_id' in snapshot_context['query_params']: query_params['visit_id'] = \ snapshot_context['query_params']['visit_id'] elif snapshot_context: query_params = {'snapshot_id': snapshot_context['snapshot_id']} person_url = reverse('browse-person', kwargs={'person_id': person_id}, query_params=query_params) return gen_link(person_url, person_name, link_attrs) def gen_revision_link(revision_id, shorten_id=False, snapshot_context=None, link_text=None, link_attrs={}): """ Utility function for generating a link to a SWH revision HTML view to insert in Django templates. Args: revision_id (str): a SWH revision id - shorten_id (boolean): wheter to shorten the revision id to 7 + shorten_id (boolean): whether to shorten the revision id to 7 characters for the link text snapshot_context (dict): if provided, generate snapshot-dependent browsing link link_attrs (dict): optional attributes (e.g. class) to add to the link Returns: An HTML link in the form 'revision_id' """ query_params = None if snapshot_context and snapshot_context['origin_info']: origin_info = snapshot_context['origin_info'] origin_type = snapshot_context['origin_type'] query_params = {'origin_type': origin_type, 'origin': origin_info['url']} if 'timestamp' in snapshot_context['url_args']: query_params['timestamp'] = \ snapshot_context['url_args']['timestamp'] if 'visit_id' in snapshot_context['query_params']: query_params['visit_id'] = \ snapshot_context['query_params']['visit_id'] elif snapshot_context: query_params = {'snapshot_id': snapshot_context['snapshot_id']} revision_url = reverse('browse-revision', kwargs={'sha1_git': revision_id}, query_params=query_params) if shorten_id: return gen_link(revision_url, revision_id[:7], link_attrs) else: if not link_text: link_text = revision_id return gen_link(revision_url, link_text, link_attrs) def gen_origin_link(origin_info, link_attrs={}): """ Utility function for generating a link to a SWH origin HTML view to insert in Django templates. Args: origin_info (dict): a dicted filled with origin information (id, type, url) link_attrs (dict): optional attributes (e.g. class) to add to the link Returns: An HTML link in the form 'Origin: origin_url' """ # noqa origin_browse_url = reverse('browse-origin', kwargs={'origin_type': origin_info['type'], 'origin_url': origin_info['url']}) return gen_link(origin_browse_url, 'Origin: ' + origin_info['url'], link_attrs) def gen_directory_link(sha1_git, link_text=None, link_attrs={}): """ Utility function for generating a link to a SWH directory HTML view to insert in Django templates. Args: sha1_git (str): directory identifier link_text (str): optional text for the generated link (the generated url will be used by default) link_attrs (dict): optional attributes (e.g. class) to add to the link Returns: An HTML link in the form 'link_text' """ directory_url = reverse('browse-directory', kwargs={'sha1_git': sha1_git}) if not link_text: link_text = directory_url return gen_link(directory_url, link_text, link_attrs) def gen_snapshot_link(snapshot_id, link_text=None, link_attrs={}): """ Utility function for generating a link to a SWH snapshot HTML view to insert in Django templates. Args: snapshot_id (str): snapshot identifier link_text (str): optional text for the generated link (the generated url will be used by default) link_attrs (dict): optional attributes (e.g. class) to add to the link Returns: An HTML link in the form 'link_text' """ snapshot_url = reverse('browse-snapshot', kwargs={'snapshot_id': snapshot_id}) if not link_text: link_text = snapshot_url return gen_link(snapshot_url, link_text, link_attrs) def gen_snapshot_directory_link(snapshot_context, revision_id=None, link_text=None, link_attrs={}): """ Utility function for generating a link to a SWH directory HTML view in the context of a snapshot to insert in Django templates. Args: snapshot_context (dict): the snapshot information revision_id (str): optional revision identifier in order to use the associated directory link_text (str): optional text to use for the generated link link_attrs (dict): optional attributes (e.g. class) to add to the link Returns: An HTML link in the form 'origin_directory_view_url' """ query_params = {'revision': revision_id} if snapshot_context['origin_info']: origin_info = snapshot_context['origin_info'] url_args = {'origin_type': origin_info['type'], 'origin_url': origin_info['url']} if 'timestamp' in snapshot_context['url_args']: url_args['timestamp'] = \ snapshot_context['url_args']['timestamp'] if 'visit_id' in snapshot_context['query_params']: query_params['visit_id'] = \ snapshot_context['query_params']['visit_id'] directory_url = reverse('browse-origin-directory', kwargs=url_args, query_params=query_params) else: url_args = {'snapshot_id': snapshot_context['snapshot_id']} directory_url = reverse('browse-snapshot-directory', kwargs=url_args, query_params=query_params) if not link_text: link_text = directory_url return gen_link(directory_url, link_text, link_attrs) def gen_content_link(sha1_git, link_text=None, link_attrs={}): """ Utility function for generating a link to a SWH content HTML view to insert in Django templates. Args: sha1_git (str): content identifier link_text (str): optional text for the generated link (the generated url will be used by default) link_attrs (dict): optional attributes (e.g. class) to add to the link Returns: An HTML link in the form 'link_text' """ content_url = reverse('browse-content', kwargs={'query_string': 'sha1_git:' + sha1_git}) if not link_text: link_text = content_url return gen_link(content_url, link_text, link_attrs) def get_revision_log_url(revision_id, snapshot_context=None): """ Utility function for getting the URL for a SWH revision log HTML view (possibly in the context of an origin). Args: revision_id (str): revision identifier the history heads to snapshot_context (dict): if provided, generate snapshot-dependent browsing link Returns: The SWH revision log view URL """ query_params = {'revision': revision_id} if snapshot_context and snapshot_context['origin_info']: origin_info = snapshot_context['origin_info'] url_args = {'origin_type': origin_info['type'], 'origin_url': origin_info['url']} if 'timestamp' in snapshot_context['url_args']: url_args['timestamp'] = \ snapshot_context['url_args']['timestamp'] if 'visit_id' in snapshot_context['query_params']: query_params['visit_id'] = \ snapshot_context['query_params']['visit_id'] revision_log_url = reverse('browse-origin-log', kwargs=url_args, query_params=query_params) elif snapshot_context: url_args = {'snapshot_id': snapshot_context['snapshot_id']} revision_log_url = reverse('browse-snapshot-log', kwargs=url_args, query_params=query_params) else: revision_log_url = reverse('browse-revision-log', kwargs={'sha1_git': revision_id}) return revision_log_url def gen_revision_log_link(revision_id, snapshot_context=None, link_text=None, link_attrs={}): """ Utility function for generating a link to a SWH revision log HTML view (possibly in the context of an origin) to insert in Django templates. Args: revision_id (str): revision identifier the history heads to snapshot_context (dict): if provided, generate snapshot-dependent browsing link link_text (str): optional text to use for the generated link link_attrs (dict): optional attributes (e.g. class) to add to the link Returns: An HTML link in the form 'link_text' """ revision_log_url = get_revision_log_url(revision_id, snapshot_context) if not link_text: link_text = revision_log_url return gen_link(revision_log_url, link_text, link_attrs) def _format_log_entries(revision_log, per_page, snapshot_context=None): revision_log_data = [] for i, log in enumerate(revision_log): if i == per_page: break author_name = 'None' author_link = 'None' if log['author']: author_name = log['author']['name'] or log['author']['fullname'] author_link = gen_person_link(log['author']['id'], author_name, snapshot_context) revision_log_data.append( {'author': author_link, 'revision': gen_revision_link(log['id'], True, snapshot_context), 'message': log['message'], 'date': format_utc_iso_date(log['date']), 'directory': log['directory']}) return revision_log_data def prepare_revision_log_for_display(revision_log, per_page, revs_breadcrumb, snapshot_context=None): """ Utility functions that process raw revision log data for HTML display. Its purpose is to: * add links to relevant SWH browse views * format date in human readable format * truncate the message log It also computes the data needed to generate the links for navigating back and forth in the history log. Args: revision_log (list): raw revision log as returned by the SWH web api per_page (int): number of log entries per page revs_breadcrumb (str): breadcrumbs of revisions navigated so far, in the form 'rev1[/rev2/../revN]'. Each revision corresponds to the first one displayed in the HTML view for history log. snapshot_context (dict): if provided, generate snapshot-dependent browsing link """ current_rev = revision_log[0]['id'] next_rev = None prev_rev = None next_revs_breadcrumb = None prev_revs_breadcrumb = None if len(revision_log) == per_page + 1: prev_rev = revision_log[-1]['id'] prev_rev_bc = current_rev if snapshot_context: prev_rev_bc = prev_rev if revs_breadcrumb: revs = revs_breadcrumb.split('/') next_rev = revs[-1] if len(revs) > 1: next_revs_breadcrumb = '/'.join(revs[:-1]) if len(revision_log) == per_page + 1: prev_revs_breadcrumb = revs_breadcrumb + '/' + prev_rev_bc else: prev_revs_breadcrumb = prev_rev_bc return {'revision_log_data': _format_log_entries(revision_log, per_page, snapshot_context), 'prev_rev': prev_rev, 'prev_revs_breadcrumb': prev_revs_breadcrumb, 'next_rev': next_rev, 'next_revs_breadcrumb': next_revs_breadcrumb} # list of origin types that can be found in the swh archive # TODO: retrieve it dynamically in an efficient way instead # of hardcoding it _swh_origin_types = ['git', 'svn', 'deb', 'hg', 'ftp', 'deposit'] def get_origin_info(origin_url, origin_type=None): """ Get info about a SWH origin. Its main purpose is to automatically find an origin type when it is not provided as parameter. Args: origin_url (str): complete url of a SWH origin - origin_type (str): optionnal origin type + origin_type (str): optional origin type Returns: A dict with the following entries: * type: the origin type * url: the origin url * id: the SWH internal id of the origin """ if origin_type: return service.lookup_origin({'type': origin_type, 'url': origin_url}) else: for origin_type in _swh_origin_types: try: origin_info = service.lookup_origin({'type': origin_type, 'url': origin_url}) return origin_info except Exception: pass raise NotFoundExc('Origin with url %s not found!' % origin_url) def get_snapshot_context(snapshot_id=None, origin_type=None, origin_url=None, timestamp=None, visit_id=None): """ Utility function to compute relevant information when navigating the SWH archive in a snapshot context. The snapshot is either referenced by its id or it will be retrieved from an origin visit. Args: snapshot_id (str): hexadecimal representation of a snapshot identifier, all other parameters will be ignored if it is provided origin_type (str): the origin type (git, svn, deposit, ...) origin_url (str): the origin_url (e.g. https://github.com/(user)/(repo)/) timestamp (str): a datetime string for retrieving the closest SWH visit of the origin visit_id (int): optional visit id for disambiguation in case of several visits with the same timestamp Returns: A dict with the following entries: * origin_info: dict containing origin information * visit_info: dict containing SWH visit information * branches: the list of branches for the origin found during the visit * releases: the list of releases for the origin found during the visit * origin_browse_url: the url to browse the origin * origin_branches_url: the url to browse the origin branches * origin_releases_url': the url to browse the origin releases * origin_visit_url: the url to browse the snapshot of the origin found during the visit - * url_args: dict containg url arguments to use when browsing in + * url_args: dict containing url arguments to use when browsing in the context of the origin and its visit """ # noqa origin_info = None visit_info = None url_args = None query_params = {} branches = [] releases = [] browse_url = None visit_url = None branches_url = None releases_url = None swh_type = 'snapshot' if origin_url: swh_type = 'origin' origin_info = get_origin_info(origin_url, origin_type) visit_info = get_origin_visit(origin_info, timestamp, visit_id, snapshot_id) visit_info['fmt_date'] = format_utc_iso_date(visit_info['date']) snapshot_id = visit_info['snapshot'] # provided timestamp is not necessarily equals to the one # of the retrieved visit, so get the exact one in order # use it in the urls generated below if timestamp: timestamp = visit_info['date'] branches, releases = \ get_origin_visit_snapshot(origin_info, timestamp, visit_id, snapshot_id) url_args = {'origin_type': origin_type, 'origin_url': origin_info['url']} query_params = {'visit_id': visit_id} browse_url = reverse('browse-origin-visits', kwargs=url_args) if timestamp: url_args['timestamp'] = format_utc_iso_date(timestamp, '%Y-%m-%dT%H:%M:%S') visit_url = reverse('browse-origin-directory', kwargs=url_args, query_params=query_params) visit_info['url'] = visit_url branches_url = reverse('browse-origin-branches', kwargs=url_args, query_params=query_params) releases_url = reverse('browse-origin-releases', kwargs=url_args, query_params=query_params) elif snapshot_id: branches, releases = get_snapshot_content(snapshot_id) url_args = {'snapshot_id': snapshot_id} browse_url = reverse('browse-snapshot', kwargs=url_args) branches_url = reverse('browse-snapshot-branches', kwargs=url_args) releases_url = reverse('browse-snapshot-releases', kwargs=url_args) releases = list(reversed(releases)) return { 'swh_type': swh_type, 'snapshot_id': snapshot_id, 'origin_info': origin_info, # keep track if the origin type was provided as url argument 'origin_type': origin_type, 'visit_info': visit_info, 'branches': branches, 'releases': releases, 'branch': None, 'release': None, 'browse_url': browse_url, 'branches_url': branches_url, 'releases_url': releases_url, 'url_args': url_args, 'query_params': query_params } # list of common readme names ordered by preference # (lower indices have higher priority) _common_readme_names = [ "readme.markdown", "readme.md", "readme.rst", "readme.txt", "readme" ] def get_readme_to_display(readmes): """ Process a list of readme files found in a directory in order to find the adequate one to display. Args: readmes: a list of dict where keys are readme file names and values are readme sha1s Returns: A tuple (readme_name, readme_sha1) """ readme_name = None readme_url = None readme_sha1 = None readme_html = None lc_readmes = {k.lower(): {'orig_name': k, 'sha1': v} for k, v in readmes.items()} # look for readme names according to the preference order # defined by the _common_readme_names list for common_readme_name in _common_readme_names: if common_readme_name in lc_readmes: readme_name = lc_readmes[common_readme_name]['orig_name'] readme_sha1 = lc_readmes[common_readme_name]['sha1'] readme_url = reverse('browse-content-raw', kwargs={'query_string': readme_sha1}) break # otherwise pick the first readme like file if any if not readme_name and len(readmes.items()) > 0: readme_name = next(iter(readmes)) readme_sha1 = readmes[readme_name] readme_url = reverse('browse-content-raw', kwargs={'query_string': readme_sha1}) # convert rst README to html server side as there is # no viable solution to perform that task client side if readme_name and readme_name.endswith('.rst'): cache_entry_id = 'readme_%s' % readme_sha1 cache_entry = cache.get(cache_entry_id) if cache_entry: readme_html = cache_entry else: try: rst_doc = request_content(readme_sha1) readme_html = pypandoc.convert_text(rst_doc['raw_data'], 'html', format='rst') cache.set(cache_entry_id, readme_html) except Exception: readme_html = 'Readme bytes are not available' return readme_name, readme_url, readme_html def get_swh_persistent_ids(swh_objects, snapshot_context=None): """ Returns a list of dict containing info related to persistent identifiers of swh objects. Args: swh_objects (list): a list of dict with the following keys: * type: swh object type (content/directory/release/revision/snapshot) * id: swh object id snapshot_context (dict): optional parameter describing the snapshot in which the object has been found Returns: list: a list of dict with the following keys: * object_type: the swh object type (content/directory/release/revision/snapshot) * object_icon: the swh object icon to use in HTML views * swh_id: the computed swh object persistent identifier * swh_id_url: the url resolving the persistent identifier * show_options: boolean indicating if the persistent id options must be displayed in persistent ids HTML view """ # noqa swh_ids = [] for swh_object in swh_objects: swh_id = get_swh_persistent_id(swh_object['type'], swh_object['id']) show_options = swh_object['type'] == 'content' or \ (snapshot_context and snapshot_context['origin_info'] is not None) object_icon = mark_safe('') if swh_object['type'] == 'directory': object_icon = mark_safe('') elif swh_object['type'] == 'release': object_icon = mark_safe('') elif swh_object['type'] == 'revision': object_icon = mark_safe('') # noqa elif swh_object['type'] == 'snapshot': object_icon = mark_safe('') swh_ids.append({ 'object_type': swh_object['type'], 'object_icon': object_icon, 'swh_id': swh_id, 'swh_id_url': reverse('browse-swh-id', kwargs={'swh_id': swh_id}), 'show_options': show_options }) return swh_ids diff --git a/swh/web/common/highlightjs.py b/swh/web/common/highlightjs.py index bfc8b489..feb6941c 100644 --- a/swh/web/common/highlightjs.py +++ b/swh/web/common/highlightjs.py @@ -1,301 +1,301 @@ # Copyright (C) 2017-2018 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from pygments.lexers import ( get_all_lexers, get_lexer_for_filename ) # set of languages ids that can be highlighted # by highlight.js library _hljs_languages = set([ '1c', 'abnf', 'accesslog', 'actionscript', 'ada', 'apache', 'applescript', 'arduino', 'armasm', 'asciidoc', 'aspectj', 'autohotkey', 'autoit', 'avrasm', 'awk', 'axapta', 'bash', 'basic', 'bnf', 'brainfuck', 'cal', 'capnproto', 'ceylon', 'clean', 'clojure', 'clojure-repl', 'cmake', 'coffeescript', 'coq', 'cos', 'cpp', 'crmsh', 'crystal', 'cs', 'csp', 'css', 'dart', 'delphi', 'diff', 'django', 'd', 'dns', 'dockerfile', 'dos', 'dsconfig', 'dts', 'dust', 'ebnf', 'elixir', 'elm', 'erb', 'erlang', 'erlang-repl', 'excel', 'fix', 'flix', 'fortran', 'fsharp', 'gams', 'gauss', 'gcode', 'gherkin', 'glsl', 'go', 'golo', 'gradle', 'groovy', 'haml', 'handlebars', 'haskell', 'haxe', 'hsp', 'htmlbars', 'http', 'hy', 'inform7', 'ini', 'irpf90', 'java', 'javascript', 'jboss-cli', 'json', 'julia', 'julia-repl', 'kotlin', 'lasso', 'ldif', 'leaf', 'less', 'lisp', 'livecodeserver', 'livescript', 'llvm', 'lsl', 'lua', 'makefile', 'markdown', 'mathematica', 'matlab', 'maxima', 'mel', 'mercury', 'mipsasm', 'mizar', 'mojolicious', 'monkey', 'moonscript', 'n1ql', 'nginx', 'nimrod', 'nix', 'nsis', 'objectivec', 'ocaml', 'openscad', 'oxygene', 'parser3', 'perl', 'pf', 'php', 'pony', 'powershell', 'processing', 'profile', 'prolog', 'protobuf', 'puppet', 'purebasic', 'python', 'q', 'qml', 'rib', 'r', 'roboconf', 'routeros', 'rsl', 'ruby', 'ruleslanguage', 'rust', 'scala', 'scheme', 'scilab', 'scss', 'shell', 'smali', 'smalltalk', 'sml', 'sqf', 'sql', 'stan', 'stata', 'step21', 'stylus', 'subunit', 'swift', 'taggerscript', 'tap', 'tcl', 'tex', 'thrift', 'tp', 'twig', 'typescript', 'vala', 'vbnet', 'vbscript-html', 'vbscript', 'verilog', 'vhdl', 'vim', 'x86asm', 'xl', 'xml', 'xquery', 'yaml', 'zephir', ]) # languages aliases defined in highlight.js _hljs_languages_aliases = { 'ado': 'stata', 'adoc': 'asciidoc', 'ahk': 'autohotkey', 'apacheconf': 'apache', 'arm': 'armasm', 'as': 'actionscript', 'atom': 'xml', 'bat': 'dos', 'bf': 'brainfuck', 'bind': 'dns', 'c': 'cpp', 'c++': 'cpp', 'capnp': 'capnproto', 'cc': 'cpp', 'clean': 'clean', 'clj': 'clojure', 'cls': 'cos', 'cmake.in': 'cmake', 'cmd': 'dos', 'coffee': 'coffeescript', 'console': 'shell', 'cos': 'cos', 'cr': 'crystal', 'craftcms': 'twig', 'crm': 'crmsh', 'csharp': 'cs', 'cson': 'coffeescript', 'dcl': 'clean', 'desktop': 'ini', 'dfm': 'delphi', 'do': 'stata', 'docker': 'dockerfile', 'dpr': 'delphi', 'dst': 'dust', 'el': 'lisp', 'erl': 'erlang', 'f90': 'fortran', 'f95': 'fortran', 'feature': 'gherkin', 'freepascal': 'delphi', 'fs': 'fsharp', 'gemspec': 'ruby', 'gms': 'gams', 'golang': 'go', 'graph': 'roboconf', 'gss': 'gauss', 'gyp': 'python', 'h': 'cpp', 'h++': 'cpp', 'hbs': 'handlebars', 'hpp': 'cpp', 'hs': 'haskell', 'html': 'xml', 'html.handlebars': 'handlebars', 'html.hbs': 'handlebars', 'https': 'http', 'hx': 'haxe', 'hylang': 'hy', 'i7': 'inform7', 'iced': 'coffeescript', 'icl': 'clean', 'instances': 'roboconf', 'ipynb': 'json', 'irb': 'ruby', 'jinja': 'django', 'js': 'javascript', 'jsp': 'java', 'jsx': 'javascript', 'k': 'q', 'kdb': 'q', 'lassoscript': 'lasso', 'lazarus': 'delphi', 'lfm': 'delphi', 'lpr': 'delphi', 'ls': 'livescript', 'm': 'objectivec', 'mak': 'makefile', 'md': 'markdown', 'mikrotik': 'routeros', 'mips': 'mipsasm', 'mk': 'makefile', 'mkd': 'markdown', 'mkdown': 'markdown', 'markdown': 'markdown', 'ml': 'ocaml', 'mm': 'objectivec', 'mma': 'mathematica', 'moo': 'mercury', 'moon': 'moonscript', 'nc': 'gcode', 'nginxconf': 'nginx', 'nim': 'nimrod', 'nixos': 'nix', 'obj-c': 'objectivec', 'objc': 'objectivec', 'osascript': 'applescript', 'p21': 'step21', 'pas': 'delphi', 'pascal': 'delphi', 'patch': 'diff', 'pb': 'purebasic', 'pbi': 'purebasic', 'pcmk': 'crmsh', 'pf.conf': 'pf', 'php3': 'php', 'php4': 'php', 'php5': 'php', 'php6': 'php', 'pl': 'perl', 'plist': 'xml', 'pm': 'perl', 'podspec': 'ruby', 'pp': 'puppet', 'ps': 'powershell', 'py': 'python', 'qrc': 'xml', 'qs': 'javascript', 'qt': 'qml', 'rb': 'ruby', 'routeros': 'routeros', 'rs': 'rust', 'rst': 'nohighlight-swh', 'rss': 'xml', 'ru': 'ruby', 'scad': 'openscad', 'sci': 'scilab', 'scpt': 'applescript', 'sh': 'bash', 'smali': 'smali', 'sqf': 'sqf', 'st': 'smalltalk', 'step': 'step21', 'stp': 'step21', 'styl': 'stylus', 'sv': 'verilog', 'svh': 'verilog', 'tao': 'xl', 'thor': 'ruby', 'tk': 'tcl', 'toml': 'ini', 'ui': 'xml', 'v': 'verilog', 'vb': 'vbnet', 'vbs': 'vbscript', 'wildfly-cli': 'jboss-cli', 'xhtml': 'xml', 'xjb': 'xml', 'xls': 'excel', 'xlsx': 'excel', 'xpath': 'xquery', 'xq': 'xquery', 'xsd': 'xml', 'xsl': 'xml', 'yaml': 'yaml', 'yml': 'yaml', 'zep': 'zephir', 'zone': 'dns', 'zsh': 'bash' } # dictionary mapping pygment lexers to hljs languages _pygments_lexer_to_hljs_language = {} # dictionary mapping mime types to hljs languages _mime_type_to_hljs_language = { 'text/x-c': 'cpp', 'text/x-c++': 'cpp', 'text/x-msdos-batch': 'dos', 'text/x-lisp': 'lisp', 'text/x-shellscript': 'bash', } # function to fill the above dictionnaries def _init_pygments_to_hljs_map(): if len(_pygments_lexer_to_hljs_language) == 0: for lexer in get_all_lexers(): lexer_name = lexer[0] lang_aliases = lexer[1] lang_mime_types = lexer[3] lang = None for lang_alias in lang_aliases: if lang_alias in _hljs_languages: lang = lang_alias _pygments_lexer_to_hljs_language[lexer_name] = lang_alias break if lang: for lang_mime_type in lang_mime_types: _mime_type_to_hljs_language[lang_mime_type] = lang def get_hljs_language_from_filename(filename): """Function that tries to associate a language supported by highlight.js from a filename. Args: filename: input filename Returns: - highlight.js language id or None if no correspondance has been found + highlight.js language id or None if no correspondence has been found """ _init_pygments_to_hljs_map() if filename: exts = filename.lower().split('.') # check if file extension matches an hljs language # also handle .ext.in cases for ext in reversed(exts[-2:]): if ext in _hljs_languages: return ext if ext in _hljs_languages_aliases: return _hljs_languages_aliases[ext] # otherwise use Pygments language database lexer = None # try to find a Pygment lexer try: lexer = get_lexer_for_filename(filename) except Exception: pass - # if there is a correspondance between the lexer and an hljs + # if there is a correspondence between the lexer and an hljs # language, return it if lexer and lexer.name in _pygments_lexer_to_hljs_language: return _pygments_lexer_to_hljs_language[lexer.name] # otherwise, try to find a match between the file extensions # associated to the lexer and the hljs language aliases if lexer: exts = [ext.replace('*.', '') for ext in lexer.filenames] for ext in exts: if ext in _hljs_languages_aliases: return _hljs_languages_aliases[ext] return None def get_hljs_language_from_mime_type(mime_type): """Function that tries to associate a language supported by highlight.js from a mime type. Args: mime_type: input mime type Returns: - highlight.js language id or None if no correspondance has been found + highlight.js language id or None if no correspondence has been found """ _init_pygments_to_hljs_map() if mime_type and mime_type in _mime_type_to_hljs_language: return _mime_type_to_hljs_language[mime_type] return None diff --git a/swh/web/settings/common.py b/swh/web/settings/common.py index ef78f66f..d0fbf47f 100644 --- a/swh/web/settings/common.py +++ b/swh/web/settings/common.py @@ -1,225 +1,225 @@ # Copyright (C) 2017-2018 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information """ Django common settings for swh-web. """ import os from swh.web.config import get_config swh_web_config = get_config() # Build paths inside the project like this: os.path.join(BASE_DIR, ...) PROJECT_DIR = os.path.dirname(os.path.abspath(__file__)) # Quick-start development settings - unsuitable for production # See https://docs.djangoproject.com/en/1.11/howto/deployment/checklist/ # SECURITY WARNING: keep the secret key used in production secret! SECRET_KEY = swh_web_config['secret_key'] # SECURITY WARNING: don't run with debug turned on in production! DEBUG = swh_web_config['debug'] DEBUG_PROPAGATE_EXCEPTIONS = swh_web_config['debug'] ALLOWED_HOSTS = ['127.0.0.1', 'localhost'] + swh_web_config['allowed_hosts'] # Application definition INSTALLED_APPS = [ 'django.contrib.admin', 'django.contrib.auth', 'django.contrib.contenttypes', 'django.contrib.sessions', 'django.contrib.messages', 'django.contrib.staticfiles', 'rest_framework', 'swh.web.api', 'swh.web.browse', 'webpack_loader', 'django_js_reverse' ] MIDDLEWARE = [ 'django.middleware.security.SecurityMiddleware', 'django.contrib.sessions.middleware.SessionMiddleware', 'django.middleware.common.CommonMiddleware', 'django.middleware.csrf.CsrfViewMiddleware', 'django.contrib.auth.middleware.AuthenticationMiddleware', 'django.contrib.messages.middleware.MessageMiddleware', 'django.middleware.clickjacking.XFrameOptionsMiddleware', 'swh.web.common.middlewares.ThrottlingHeadersMiddleware' ] # Compress all assets (static ones and dynamically generated html) -# served by django in a local development environement context. +# served by django in a local development environment context. # In a production environment, assets compression will be directly # handled by web servers like apache or nginx. if swh_web_config['debug']: MIDDLEWARE.insert(0, 'django.middleware.gzip.GZipMiddleware') ROOT_URLCONF = 'swh.web.urls' TEMPLATES = [ { 'BACKEND': 'django.template.backends.django.DjangoTemplates', 'DIRS': [os.path.join(PROJECT_DIR, "../templates")], 'APP_DIRS': True, 'OPTIONS': { 'context_processors': [ 'django.template.context_processors.debug', 'django.template.context_processors.request', 'django.contrib.auth.context_processors.auth', 'django.contrib.messages.context_processors.messages', ], 'libraries': { 'swh_templatetags': 'swh.web.common.swh_templatetags', }, }, }, ] WSGI_APPLICATION = 'swh.web.wsgi.application' # Database # https://docs.djangoproject.com/en/1.11/ref/settings/#databases DATABASES = { 'default': { 'ENGINE': 'django.db.backends.sqlite3', 'NAME': os.path.join(PROJECT_DIR, 'db.sqlite3'), } } # Password validation # https://docs.djangoproject.com/en/1.11/ref/settings/#auth-password-validators AUTH_PASSWORD_VALIDATORS = [ { 'NAME': 'django.contrib.auth.password_validation.UserAttributeSimilarityValidator', # noqa }, { 'NAME': 'django.contrib.auth.password_validation.MinimumLengthValidator', # noqa }, { 'NAME': 'django.contrib.auth.password_validation.CommonPasswordValidator', # noqa }, { 'NAME': 'django.contrib.auth.password_validation.NumericPasswordValidator', # noqa }, ] # Internationalization # https://docs.djangoproject.com/en/1.11/topics/i18n/ LANGUAGE_CODE = 'en-us' TIME_ZONE = 'UTC' USE_I18N = True USE_L10N = True USE_TZ = True # Static files (CSS, JavaScript, Images) # https://docs.djangoproject.com/en/1.11/howto/static-files/ STATIC_URL = '/static/' STATICFILES_DIRS = [ os.path.join(PROJECT_DIR, "../static") ] INTERNAL_IPS = ['127.0.0.1'] throttle_rates = {} http_requests = ['GET', 'HEAD', 'POST', 'PUT', 'DELETE', 'OPTIONS', 'PATCH'] throttling = swh_web_config['throttling'] for limiter_scope, limiter_conf in throttling['scopes'].items(): if 'default' in limiter_conf['limiter_rate']: throttle_rates[limiter_scope] = limiter_conf['limiter_rate']['default'] # for backward compatibility else: throttle_rates[limiter_scope] = limiter_conf['limiter_rate'] # register sub scopes specific for HTTP request types for http_request in http_requests: if http_request in limiter_conf['limiter_rate']: throttle_rates[limiter_scope + '_' + http_request.lower()] = \ limiter_conf['limiter_rate'][http_request] REST_FRAMEWORK = { 'DEFAULT_RENDERER_CLASSES': ( 'rest_framework.renderers.JSONRenderer', 'swh.web.api.renderers.YAMLRenderer', 'rest_framework.renderers.TemplateHTMLRenderer' ), 'DEFAULT_THROTTLE_CLASSES': ( 'swh.web.common.throttling.SwhWebRateThrottle', ), 'DEFAULT_THROTTLE_RATES': throttle_rates } LOGGING = { 'version': 1, 'disable_existing_loggers': False, 'filters': { 'require_debug_false': { '()': 'django.utils.log.RequireDebugFalse', }, 'require_debug_true': { '()': 'django.utils.log.RequireDebugTrue', }, }, 'formatters': { 'verbose': { 'format': '[%(asctime)s] [%(levelname)s] %(request)s %(status_code)s', # noqa 'datefmt': "%d/%b/%Y %H:%M:%S" }, }, 'handlers': { 'console': { 'level': 'DEBUG', 'filters': ['require_debug_true'], 'class': 'logging.StreamHandler', }, 'file': { 'level': 'INFO', 'filters': ['require_debug_false'], 'class': 'logging.FileHandler', 'filename': os.path.join(swh_web_config['log_dir'], 'swh-web.log'), 'formatter': 'verbose' }, }, 'loggers': { 'django': { 'handlers': ['console', 'file'], 'level': 'DEBUG' if DEBUG else 'INFO', 'propagate': True, }, 'django.request': { 'handlers': ['file'], 'level': 'DEBUG' if DEBUG else 'INFO', 'propagate': False, } }, } WEBPACK_LOADER = { 'DEFAULT': { 'CACHE': not DEBUG, 'BUNDLE_DIR_NAME': './', 'STATS_FILE': os.path.join(PROJECT_DIR, '../static/webpack-stats.json'), # noqa 'POLL_INTERVAL': 0.1, 'TIMEOUT': None, 'IGNORE': ['.+\.hot-update.js', '.+\.map'] } } diff --git a/swh/web/tests/api/test_apidoc.py b/swh/web/tests/api/test_apidoc.py index 1c6f571d..21ea26f7 100644 --- a/swh/web/tests/api/test_apidoc.py +++ b/swh/web/tests/api/test_apidoc.py @@ -1,288 +1,288 @@ # Copyright (C) 2015-2018 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from nose.tools import istest, nottest from rest_framework.test import APITestCase from rest_framework.response import Response from swh.web.api.apidoc import api_doc, _parse_httpdomain_doc from swh.web.api.apiurls import api_route from swh.web.tests.testcase import SWHWebTestCase # flake8: noqa httpdomain_doc = """ .. http:get:: /api/1/revision/(sha1_git)/ Get information about a revision in the SWH archive. Revisions are identified by *sha1* checksums, compatible with Git commit identifiers. See :func:`swh.model.identifiers.revision_identifier` in our data model module for details about how they are computed. :param string sha1_git: hexadecimal representation of the revision *sha1_git* identifier :reqheader Accept: the requested response content type, either *application/json* (default) or *application/yaml* :resheader Content-Type: this depends on :http:header:`Accept` header of request :>json object author: information about the author of the revision :>json string author_url: link to :http:get:`/api/1/person/(person_id)/` to get information about the author of the revision :>json object committer: information about the committer of the revision :>json string committer_url: link to :http:get:`/api/1/person/(person_id)/` to get information about the committer of the revision :>json string committer_date: ISO representation of the commit date (in UTC) :>json string date: ISO representation of the revision date (in UTC) :>json string directory: the unique identifier that revision points to :>json string directory_url: link to :http:get:`/api/1/directory/(sha1_git)/[(path)/]` to get information about the directory associated to the revision :>json string id: the revision unique identifier :>json boolean merge: whether or not the revision corresponds to a merge commit :>json string message: the message associated to the revision :>json array parents: the parents of the revision, i.e. the previous revisions that head directly to it, each entry of that array contains an unique parent revision identifier but also a link to :http:get:`/api/1/revision/(sha1_git)/` - to get more informations about it + to get more information about it :>json string type: the type of the revision **Allowed HTTP Methods:** :http:method:`get`, :http:method:`head`, :http:method:`options` :statuscode 200: no error :statuscode 400: an invalid *sha1_git* value has been provided :statuscode 404: requested revision can not be found in the SWH archive **Request:** .. parsed-literal:: $ curl -i :swh_web_api:`revision/aafb16d69fd30ff58afdd69036a26047f3aebdc6/` """ class APIDocTestCase(SWHWebTestCase, APITestCase): @istest def apidoc_nodoc_failure(self): with self.assertRaises(Exception): @api_doc('/my/nodoc/url/') def apidoc_nodoc_tester(request, arga=0, argb=0): return Response(arga + argb) @staticmethod @api_route(r'/some/(?P[0-9]+)/(?P[0-9]+)/', 'some-doc-route') @api_doc('/some/doc/route/') @nottest def apidoc_route_tester(request, myarg, myotherarg, akw=0): """ Sample doc """ return {'result': int(myarg) + int(myotherarg) + akw} @istest def apidoc_route_doc(self): # when rv = self.client.get('/api/1/some/doc/route/') # then self.assertEqual(rv.status_code, 200) self.assertTemplateUsed('api/apidoc.html') @istest def apidoc_route_fn(self): # when rv = self.client.get('/api/1/some/1/1/') # then self.assertEqual(rv.status_code, 200) @staticmethod @api_route(r'/some/full/(?P[0-9]+)/(?P[0-9]+)/', 'some-complete-doc-route') @api_doc('/some/complete/doc/route/') @nottest def apidoc_full_stack_tester(request, myarg, myotherarg, akw=0): """ Sample doc """ return {'result': int(myarg) + int(myotherarg) + akw} @istest def apidoc_full_stack_doc(self): # when rv = self.client.get('/api/1/some/complete/doc/route/') # then self.assertEqual(rv.status_code, 200) self.assertTemplateUsed('api/apidoc.html') @istest def apidoc_full_stack_fn(self): # when rv = self.client.get('/api/1/some/full/1/1/') # then self.assertEqual(rv.status_code, 200) @istest def test_api_doc_parse_httpdomain(self): doc_data = { 'description': '', 'urls': [], 'args': [], 'params': [], 'resheaders': [], 'reqheaders': [], 'return_type': '', 'returns': [], 'status_codes': [], 'examples': [] } _parse_httpdomain_doc(httpdomain_doc, doc_data) expected_urls = [{ 'rule': '/api/1/revision/ **\\(sha1_git\\)** /', 'methods': ['GET', 'HEAD', 'OPTIONS'] }] self.assertIn('urls', doc_data) self.assertEqual(doc_data['urls'], expected_urls) expected_description = 'Get information about a revision in the SWH archive. \ Revisions are identified by *sha1* checksums, compatible with Git commit \ identifiers. See **swh.model.identifiers.revision_identifier** in our data \ model module for details about how they are computed.' self.assertIn('description', doc_data) self.assertEqual(doc_data['description'], expected_description) expected_args = [{ 'name': 'sha1_git', 'type': 'string', 'doc': 'hexadecimal representation of the revision *sha1_git* identifier' }] self.assertIn('args', doc_data) self.assertEqual(doc_data['args'], expected_args) expected_params = [] self.assertIn('params', doc_data) self.assertEqual(doc_data['params'], expected_params) expected_reqheaders = [{ 'doc': 'the requested response content type, either *application/json* or *application/yaml*', 'name': 'Accept' }] self.assertIn('reqheaders', doc_data) self.assertEqual(doc_data['reqheaders'], expected_reqheaders) expected_resheaders = [{ 'doc': 'this depends on **Accept** header of request', 'name': 'Content-Type' }] self.assertIn('resheaders', doc_data) self.assertEqual(doc_data['resheaders'], expected_resheaders) expected_statuscodes = [ { 'code': '200', 'doc': 'no error' }, { 'code': '400', 'doc': 'an invalid *sha1_git* value has been provided' }, { 'code': '404', 'doc': 'requested revision can not be found in the SWH archive' } ] self.assertIn('status_codes', doc_data) self.assertEqual(doc_data['status_codes'], expected_statuscodes) expected_return_type = 'object' self.assertIn('return_type', doc_data) self.assertEqual(doc_data['return_type'], expected_return_type) expected_returns = [ { 'name': 'author', 'type': 'object', 'doc': 'information about the author of the revision' }, { 'name': 'author_url', 'type': 'string', 'doc': 'link to ``_ to get information about the author of the revision' }, { 'name': 'committer', 'type': 'object', 'doc': 'information about the committer of the revision' }, { 'name': 'committer_url', 'type': 'string', 'doc': 'link to ``_ to get information about the committer of the revision' }, { 'name': 'committer_date', 'type': 'string', 'doc': 'ISO representation of the commit date (in UTC)' }, { 'name': 'date', 'type': 'string', 'doc': 'ISO representation of the revision date (in UTC)' }, { 'name': 'directory', 'type': 'string', 'doc': 'the unique identifier that revision points to' }, { 'name': 'directory_url', 'type': 'string', 'doc': 'link to ``_ to get information about the directory associated to the revision' }, { 'name': 'id', 'type': 'string', 'doc': 'the revision unique identifier' }, { 'name': 'merge', 'type': 'boolean', 'doc': 'whether or not the revision corresponds to a merge commit' }, { 'name': 'message', 'type': 'string', 'doc': 'the message associated to the revision' }, { 'name': 'parents', 'type': 'array', - 'doc': 'the parents of the revision, i.e. the previous revisions that head directly to it, each entry of that array contains an unique parent revision identifier but also a link to ``_ to get more informations about it' + 'doc': 'the parents of the revision, i.e. the previous revisions that head directly to it, each entry of that array contains an unique parent revision identifier but also a link to ``_ to get more information about it' }, { 'name': 'type', 'type': 'string', 'doc': 'the type of the revision' } ] self.assertIn('returns', doc_data) self.assertEqual(doc_data['returns'], expected_returns) expected_examples = ['/api/1/revision/aafb16d69fd30ff58afdd69036a26047f3aebdc6/'] self.assertIn('examples', doc_data) self.assertEqual(doc_data['examples'], expected_examples) diff --git a/swh/web/tests/testcase.py b/swh/web/tests/testcase.py index e3169c3a..e28333ad 100644 --- a/swh/web/tests/testcase.py +++ b/swh/web/tests/testcase.py @@ -1,70 +1,70 @@ # Copyright (C) 2015-2018 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information # Functions defined here are NOT DESIGNED FOR PRODUCTION from django.core.cache import cache from django.test import TestCase from swh.storage.api.client import RemoteStorage as Storage from swh.web import config # Because the Storage's __init__ function does side effect at startup... class RemoteStorageAdapter(Storage): def __init__(self, base_url): self.base_url = base_url def _init_mock_storage(base_url='https://somewhere.org:4321'): - """Instanciate a remote storage whose goal is to be mocked in a test + """Instantiate a remote storage whose goal is to be mocked in a test context. NOT FOR PRODUCTION Returns: An instance of swh.storage.api.client.RemoteStorage destined to be mocked (it does not do any rest call) """ return RemoteStorageAdapter(base_url) # destined to be used as mock def create_config(base_url='https://somewhere.org:4321'): """Function to initiate swh-web config with storage designed to be mocked. Returns: dict containing swh-web config for tests NOT FOR PRODUCTION """ storage = _init_mock_storage(base_url) swh_config = config.get_config() # inject the mock data swh_config.update({'storage': storage}) return swh_config class SWHWebTestCase(TestCase): """Testing API class. """ @classmethod def setUpClass(cls): super(SWHWebTestCase, cls).setUpClass() cls.test_config = create_config() cls.maxDiff = None @classmethod def storage(cls): return cls.test_config['storage'] def setUp(self): cache.clear()