diff --git a/docs/developers-info.rst b/docs/developers-info.rst
index 0e825497..cf2f662d 100644
--- a/docs/developers-info.rst
+++ b/docs/developers-info.rst
@@ -1,135 +1,135 @@
Developers Information
======================
Sample configuration
--------------------
The configuration will be taken from the default configuration file: *~/.config/swh/webapp/webapp.yml*.
The following introduces a default configuration file:
.. sourcecode:: yaml
storage:
cls: remote
args:
url: http://localhost:5002
debug: false
throttling:
cache_uri: None
scopes:
swh_api:
limiter_rate:
default: 120/h
exempted_networks:
- 127.0.0.0/8
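To make the throttling settings above more concrete, here is a minimal Python sketch. It assumes that a rate such as ``120/h`` means 120 requests per hour and that ``exempted_networks`` lists CIDR blocks that bypass the limit. The helper names ``parse_rate`` and ``is_exempted`` are hypothetical and are not part of ``swh.web``.

```python
import ipaddress

# Hypothetical mapping from the period suffix to seconds (an assumption,
# modelled on the "120/h" notation used in the sample configuration).
_PERIODS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_rate(rate):
    """Split a rate such as '120/h' into (requests, period_in_seconds)."""
    count, period = rate.split("/")
    return int(count), _PERIODS[period[0]]


def is_exempted(ip, networks):
    """Return True if ip belongs to one of the given CIDR networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(net) for net in networks)
```

With the sample values, ``parse_rate("120/h")`` gives 120 requests per 3600 seconds, and a request coming from ``127.0.0.1`` would fall in the exempted ``127.0.0.0/8`` network.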
Run server
----------
Either use the django manage script directly (useful in development mode as it offers various commands):
.. sourcecode:: shell
$ python3 -m swh.web.manage runserver
or use the following shortcut:
.. sourcecode:: shell
$ make run
Modules description
-------------------
Common to all web applications
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Configuration and settings
""""""""""""""""""""""""""
* :mod:`swh.web.config`: holds the SWH configuration for the web applications.
* :mod:`swh.web.doc_config`: utility module used to extend the sphinx configuration
when building the documentation.
- * :mod:`swh.web.manage`: Django management module for developpers.
+ * :mod:`swh.web.manage`: Django management module for developers.
* :mod:`swh.web.urls`: module that holds the whole URI scheme of all
the web applications.
* :mod:`swh.web.wsgi`: WSGI module to use when deploying the web applications
in production.
* :mod:`swh.web.settings.common`: Common Django settings
* :mod:`swh.web.settings.development`: Django settings for development
* :mod:`swh.web.settings.production`: Django settings for production
* :mod:`swh.web.settings.tests`: Django settings for tests
Common utilities
""""""""""""""""
* :mod:`swh.web.common.converters`: conversion module used to transform SWH raw data
to serializable ones. It is used by :mod:`swh.web.common.service` to convert data
before transmitting them to Django views.
* :mod:`swh.web.common.exc`: module defining exceptions used in the web applications.
* :mod:`swh.web.common.highlightjs`: utility module to ease the use of the highlightjs_
library in produced Django views.
* :mod:`swh.web.common.query`: Utilities to parse data from HTTP endpoints. It is used
by :mod:`swh.web.common.service`.
* :mod:`swh.web.common.service`: Orchestration layer used by views module
in charge of communication with :mod:`swh.storage` to retrieve information and
perform conversion for the upper layer.
* :mod:`swh.web.common.swh_templatetags`: Custom Django template tags library for swh.
* :mod:`swh.web.common.throttling`: Custom request rate limiter to use with the `Django REST Framework
<https://www.django-rest-framework.org/>`_
* :mod:`swh.web.common.urlsindex`: Utilities to help the registering of endpoints
for the web applications
* :mod:`swh.web.common.utils`: Utility functions used in the web applications implementation
SWH Web API application
^^^^^^^^^^^^^^^^^^^^^^^
* :mod:`swh.web.api.apidoc`: Utilities to document the web api for its html
browsable rendering.
* :mod:`swh.web.api.apiresponse`: Utility module to ease the generation of
web api responses.
* :mod:`swh.web.api.apiurls`: Utilities to facilitate the registration of SWH
web api endpoints.
* :mod:`swh.web.api.urls`: Module that defines the whole URI scheme for the api endpoints
* :mod:`swh.web.api.utils`: Utility functions used in the SWH web api implementation.
* :mod:`swh.web.api.views.content`: Implementation of API endpoints for getting information
about SWH contents.
* :mod:`swh.web.api.views.directory`: Implementation of API endpoints for getting information
about SWH directories.
* :mod:`swh.web.api.views.entity`: Implementation of API endpoints for getting information
about SWH entities.
* :mod:`swh.web.api.views.origin`: Implementation of API endpoints for getting information
about SWH origins.
* :mod:`swh.web.api.views.person`: Implementation of API endpoints for getting information
about SWH persons.
* :mod:`swh.web.api.views.release`: Implementation of API endpoints for getting information
about SWH releases.
* :mod:`swh.web.api.views.revision`: Implementation of API endpoints for getting information
about SWH revisions.
* :mod:`swh.web.api.views.snapshot`: Implementation of API endpoints for getting information
about SWH snapshots.
* :mod:`swh.web.api.views.stat`: Implementation of API endpoints for getting information
about SWH archive statistics.
* :mod:`swh.web.api.views.utils`: Utilities used in the web api endpoints implementation.
SWH Web browse application
^^^^^^^^^^^^^^^^^^^^^^^^^^
* :mod:`swh.web.browse.browseurls`: Utilities to facilitate the registration of SWH web
browse endpoints.
* :mod:`swh.web.browse.urls`: Module that defines the whole URI scheme for the SWH web
browse endpoints.
* :mod:`swh.web.browse.utils`: Utility functions used throughout the SWH web browse
endpoints implementation.
* :mod:`swh.web.browse.views.content`: Implementation of endpoints for browsing SWH contents.
* :mod:`swh.web.browse.views.directory`: Implementation of endpoints for browsing SWH directories.
* :mod:`swh.web.browse.views.identifiers`: Implementation of endpoints for browsing SWH objects
through persistent identifiers.
* :mod:`swh.web.browse.views.origin`: Implementation of endpoints for browsing SWH origins.
* :mod:`swh.web.browse.views.person`: Implementation of endpoints for browsing SWH persons.
* :mod:`swh.web.browse.views.release`: Implementation of endpoints for browsing SWH releases.
* :mod:`swh.web.browse.views.revision`: Implementation of endpoints for browsing SWH revisions.
* :mod:`swh.web.browse.views.snapshot`: Implementation of endpoints for browsing SWH snapshots.
.. _highlightjs: https://highlightjs.org/
diff --git a/docs/uri-scheme-browse-content.rst b/docs/uri-scheme-browse-content.rst
index d81fbc7d..d977fac1 100644
--- a/docs/uri-scheme-browse-content.rst
+++ b/docs/uri-scheme-browse-content.rst
@@ -1,92 +1,92 @@
Content
^^^^^^^
.. http:get:: /browse/content/[(algo_hash):](hash)/
HTML view that displays a SWH content identified by its hash value.
If the content to display is textual, it will be highlighted client-side
if possible using highlightjs_. In order for that operation to be
performed, a programming language must first be associated to the content.
The following procedure is used in order to find the language:
1) First try to find a language from the content filename
(provided as query parameter when navigating from a directory view).
2) If no language has been found from the filename,
try to find one from the content mime type.
The mime type is retrieved from the content metadata stored
in the SWH archive or is computed server-side using the Python
magic module.
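The two-step lookup above can be sketched in Python as follows. This is only an illustration: the lookup tables and the ``guess_language`` helper are hypothetical, and the real mapping from filenames and mime types to highlightjs languages is not shown in this document.

```python
import os

# Hypothetical lookup tables; a real mapping would cover many more languages.
_EXT_TO_LANG = {".py": "python", ".c": "c", ".rs": "rust"}
_MIME_TO_LANG = {"text/x-python": "python", "text/x-c": "c",
                 "application/json": "json"}


def guess_language(filename=None, mime_type=None):
    """Return a language name, or None when nothing matches."""
    # Step 1: try the filename (its extension, in this sketch).
    if filename:
        ext = os.path.splitext(filename)[1].lower()
        if ext in _EXT_TO_LANG:
            return _EXT_TO_LANG[ext]
    # Step 2: fall back to the content mime type.
    if mime_type:
        return _MIME_TO_LANG.get(mime_type)
    return None
```

The filename takes precedence, so the mime type is only consulted when the first step finds nothing.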
It is also possible to highlight specific lines of a textual
content (not in terms of syntax highlighting but to emphasize
some relevant content part) by either:
* clicking on line numbers (holding shift to highlight a range of lines)
* using a URL fragment in the form '#Ln' or '#Lm-Ln'
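As an illustration of the fragment format, the sketch below parses such a fragment into a line range. The highlighting itself happens client-side; this Python helper (``parse_line_fragment``, a hypothetical name) only shows how the two forms could be interpreted. It also accepts the '#Lm-Ln' spelling with a second 'L', as in '#L23-L41'.

```python
import re

# Matches '#L23', '#L23-41' and '#L23-L41'.
_FRAGMENT_RE = re.compile(r"^#L(\d+)(?:-L?(\d+))?$")


def parse_line_fragment(fragment):
    """Return (first, last) line numbers, or None if the fragment does not match."""
    match = _FRAGMENT_RE.match(fragment)
    if match is None:
        return None
    first = int(match.group(1))
    # A single-line fragment highlights just that line.
    last = int(match.group(2)) if match.group(2) else first
    return (min(first, last), max(first, last))
```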
When that view is called in the context of a navigation coming from
a directory view, a breadcrumb will be displayed on top of the rendered
content in order to easily navigate up to the associated root directory.
In that case, the path query parameter will be used and filled with the path
of the file relative to the root directory.
:param string algo_hash: optional parameter to indicate the algorithm used
to compute the content checksum (can be either *sha1*,
*sha1_git*, *sha256* or *blake2s256*, default to *sha1*)
:param string hash: hexadecimal representation for the checksum from which
to retrieve the associated content in the SWH archive
:query string path: describe the path of the content relative to a root
directory (used to add context aware navigation links when navigating
from a directory view)
:statuscode 200: no error
:statuscode 400: an invalid query string has been provided
:statuscode 404: requested content can not be found in the SWH archive
**Examples:**
.. parsed-literal::
:swh_web_browse:`content/sha1_git:f5d0b39a0cdddb91a31a537052b7d8d31a4aa79f/`
:swh_web_browse:`content/sha1_git:f5d0b39a0cdddb91a31a537052b7d8d31a4aa79f/#L23-L41`
:swh_web_browse:`content/blake2s256:1cc1e3124957c9be8a454c58e92eb925cf4aa9823984bd01451c5b7e0fee99d1/`
:swh_web_browse:`content/sha1:1cb1447c1c7ddc1b03eac88398e40bd914d46b62/`
:swh_web_browse:`content/sha256:8ceb4b9ee5adedde47b31e975c1d90c73ad27b6b165a1dcd80c7c545eb65b903/`
.. http:get:: /browse/content/[(algo_hash):](hash)/raw/
HTML view that produces a raw display of a SWH content identified by its hash value.
The behaviour of that view depends on the mime type of the requested content.
If the mime type is from the text family, the view will return a response whose
content type is 'text/plain' that will be rendered by the browser. Otherwise,
the view will return a response whose content type is 'application/octet-stream'
and the browser will then offer to download the file.
In the context of a navigation coming from a directory view, the filename query
parameter will be used in order to provide the real name of the file when
one wants to save it locally.
- :param string algo_hash: optionnal parameter to indicate the algorithm used
+ :param string algo_hash: optional parameter to indicate the algorithm used
to compute the content checksum (can be either *sha1*,
*sha1_git*, *sha256* or *blake2s256*, default to *sha1*)
:param string hash: hexadecimal representation for the checksum from which
to retrieve the associated content in the SWH archive
:query string filename: indicate the name of the file holding the requested
content (used when one wants to save the content to a local file)
:statuscode 200: no error
:statuscode 400: an invalid query string has been provided
:statuscode 404: requested content can not be found in the SWH archive
**Examples:**
.. parsed-literal::
:swh_web_browse:`content/sha1_git:f5d0b39a0cdddb91a31a537052b7d8d31a4aa79f/raw/?filename=LICENSE`
:swh_web_browse:`content/blake2s256:1cc1e3124957c9be8a454c58e92eb925cf4aa9823984bd01451c5b7e0fee99d1/raw/?filename=MAINTAINERS`
:swh_web_browse:`content/sha1:1cb1447c1c7ddc1b03eac88398e40bd914d46b62/raw/`
:swh_web_browse:`content/sha256:8ceb4b9ee5adedde47b31e975c1d90c73ad27b6b165a1dcd80c7c545eb65b903/raw/?filename=COPYING`
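The content type selection described for the raw view can be sketched as below. This is a minimal illustration, assuming that "text family" means a mime type starting with ``text/``; the function name ``raw_content_type`` is hypothetical and is not taken from the ``swh.web`` code base.

```python
def raw_content_type(mime_type):
    """Pick the response content type for a raw content view."""
    # Textual contents are rendered directly by the browser.
    if mime_type.startswith("text/"):
        return "text/plain"
    # Anything else is offered as a download.
    return "application/octet-stream"
```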
.. _highlightjs: https://highlightjs.org/
\ No newline at end of file
diff --git a/docs/uri-scheme-browse-release.rst b/docs/uri-scheme-browse-release.rst
index 608c1b50..ba895ff9 100644
--- a/docs/uri-scheme-browse-release.rst
+++ b/docs/uri-scheme-browse-release.rst
@@ -1,25 +1,25 @@
Release
^^^^^^^
.. http:get:: /browse/release/(sha1_git)/
HTML view that displays metadata associated to a SWH release:
* the author
* the release date
* the release name
* the associated message
- * the type of target the release points to (revision, directory, content or relase)
+ * the type of target the release points to (revision, directory, content or release)
* the link to browse the release target
:param string sha1_git: hexadecimal representation for the *sha1_git*
identifier of a SWH release
:statuscode 200: no error
:statuscode 404: requested release can not be found in the SWH archive
**Examples:**
.. parsed-literal::
:swh_web_browse:`release/208f61cc7a5dbc9879ae6e5c2f95891e270f09ef/`
:swh_web_browse:`release/f883596e997fe5bcbc5e89bee01b869721326109/`
diff --git a/swh/web/api/apiresponse.py b/swh/web/api/apiresponse.py
index 709f900e..f2e9af46 100644
--- a/swh/web/api/apiresponse.py
+++ b/swh/web/api/apiresponse.py
@@ -1,180 +1,180 @@
# Copyright (C) 2017-2018 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU Affero General Public License version 3, or any later version
# See top-level LICENSE file for more information
import json
from rest_framework.response import Response
from swh.storage.exc import StorageDBError, StorageAPIError
from swh.web.common.exc import NotFoundExc, ForbiddenExc
from swh.web.common.utils import shorten_path, gen_path_info
from swh.web.api import utils
def compute_link_header(rv, options):
"""Add Link header in returned value results.
Args:
rv (dict): dictionary with keys:
- headers: potential headers with 'link-next' and 'link-prev'
keys
- results: containing the result to return
options (dict): the initial dict to update with result if any
Returns:
dict: dictionary with optional keys 'link-next' and 'link-prev'
"""
link_headers = []
if 'headers' not in rv:
return {}
rv_headers = rv['headers']
if 'link-next' in rv_headers:
link_headers.append('<%s>; rel="next"' % (
rv_headers['link-next']))
if 'link-prev' in rv_headers:
link_headers.append('<%s>; rel="previous"' % (
rv_headers['link-prev']))
if link_headers:
link_header_str = ','.join(link_headers)
headers = options.get('headers', {})
headers.update({
'Link': link_header_str
})
return headers
return {}
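# For illustration only: a minimal sketch of how a client could read the
# Link header value built by compute_link_header. The helper name
# parse_link_header is hypothetical and not part of swh.web.

```python
import re

# One '<url>; rel="name"' entry, as produced by compute_link_header.
_LINK_RE = re.compile(r'<([^>]*)>;\s*rel="([^"]*)"')


def parse_link_header(value):
    """Map each rel value ('next', 'previous') to its URL."""
    return {rel: url for url, rel in _LINK_RE.findall(value)}
```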
def filter_by_fields(request, data):
"""Extract a request parameter 'fields' if it exists to permit the filtering on
the data dict's keys.
If such field is not provided, returns the data as is.
"""
fields = request.query_params.get('fields')
if fields:
fields = set(fields.split(','))
data = utils.filter_field_keys(data, fields)
return data
def transform(rv):
"""Transform an eventual returned value with multiple layer of
information with only what's necessary.
If the returned value rv contains the 'results' key, this is the
associated value which is returned.
Otherwise, return the initial dict without the potential 'headers'
key.
"""
if 'results' in rv:
return rv['results']
if 'headers' in rv:
rv.pop('headers')
return rv
def make_api_response(request, data, doc_data={}, options={}):
"""Generates an API response based on the requested mimetype.
Args:
request: a DRF Request object
data: raw data to return in the API response
doc_data: documentation data for HTML response
- options: optionnal data that can be used to generate the response
+ options: optional data that can be used to generate the response
Returns:
a DRF Response object
"""
if data:
options['headers'] = compute_link_header(data, options)
data = transform(data)
data = filter_by_fields(request, data)
doc_env = doc_data
headers = {}
if 'headers' in options:
doc_env['headers_data'] = options['headers']
headers = options['headers']
# get request status code
doc_env['status_code'] = options.get('status', 200)
response_args = {'status': doc_env['status_code'],
'headers': headers,
'content_type': request.accepted_media_type}
# when requesting HTML, typically when browsing the API through its
# documented views, we need to enrich the input data with documentation
# related ones and inform DRF that we request HTML template rendering
if request.accepted_media_type == 'text/html':
if data:
data = json.dumps(data, sort_keys=True,
indent=4,
separators=(',', ': '))
doc_env['response_data'] = data
doc_env['request'] = {
'path': request.path,
'method': request.method,
'absolute_uri': request.build_absolute_uri(),
}
doc_env['heading'] = shorten_path(str(request.path))
if 'route' in doc_env:
doc_env['endpoint_path'] = gen_path_info(doc_env['route'])
response_args['data'] = doc_env
response_args['template_name'] = 'api/apidoc.html'
# otherwise simply return the raw data and let DRF pick
# the correct renderer (JSON or YAML)
else:
response_args['data'] = data
return Response(**response_args)
def error_response(request, error, doc_data):
"""Private function to create a custom error response.
Args:
request: a DRF Request object
error: the exception that caused the error
doc_data: documentation data for HTML response
"""
error_code = 400
if isinstance(error, NotFoundExc):
error_code = 404
elif isinstance(error, ForbiddenExc):
error_code = 403
elif isinstance(error, StorageDBError):
error_code = 503
elif isinstance(error, StorageAPIError):
error_code = 503
error_opts = {'status': error_code}
error_data = {
'exception': error.__class__.__name__,
'reason': str(error),
}
return make_api_response(request, error_data, doc_data,
options=error_opts)
diff --git a/swh/web/api/views/revision.py b/swh/web/api/views/revision.py
index d178661b..6b88589f 100644
--- a/swh/web/api/views/revision.py
+++ b/swh/web/api/views/revision.py
@@ -1,504 +1,504 @@
# Copyright (C) 2015-2018 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU Affero General Public License version 3, or any later version
# See top-level LICENSE file for more information
from django.http import HttpResponse
from swh.web.common import service
from swh.web.common.utils import reverse
from swh.web.common.utils import parse_timestamp
from swh.web.api import utils
from swh.web.api.apidoc import api_doc
from swh.web.api.apiurls import api_route
from swh.web.api.views.utils import api_lookup
def _revision_directory_by(revision, path, request_path,
limit=100, with_data=False):
"""
Compute the revision matching criterion's directory or content data.
Args:
revision: dictionary of criterions representing a revision to lookup
path: directory's path to lookup
request_path: request path which holds the original context to
limit: optional query parameter to limit the revisions log
(default to 100). For now, note that this limit could impede the
transitivity conclusion about sha1_git not being an ancestor of
with_data: indicate to retrieve the content's raw data if path resolves
to a content.
"""
def enrich_directory_local(dir, context_url=request_path):
return utils.enrich_directory(dir, context_url)
rev_id, result = service.lookup_directory_through_revision(
revision, path, limit=limit, with_data=with_data)
content = result['content']
if result['type'] == 'dir': # dir_entries
result['content'] = list(map(enrich_directory_local, content))
else: # content
result['content'] = utils.enrich_content(content)
return result
@api_route(r'/revision/origin/(?P<origin_id>[0-9]+)'
r'/branch/(?P<branch_name>.+)/log/',
'revision-origin-log')
@api_route(r'/revision/origin/(?P<origin_id>[0-9]+)/log/',
'revision-origin-log')
@api_route(r'/revision/origin/(?P<origin_id>[0-9]+)'
r'/ts/(?P<ts>.+)/log/',
'revision-origin-log')
@api_route(r'/revision/origin/(?P<origin_id>[0-9]+)'
r'/branch/(?P<branch_name>.+)'
r'/ts/(?P<ts>.+)/log/',
'revision-origin-log')
@api_doc('/revision/origin/log/')
def api_revision_log_by(request, origin_id,
branch_name='refs/heads/master',
ts=None):
"""
.. http:get:: /api/1/revision/origin/(origin_id)[/branch/(branch_name)][/ts/(timestamp)]/log
Show the commit log for a revision, searching for it based on software origin,
branch name, and/or visit timestamp.
This endpoint behaves like :http:get:`/api/1/revision/(sha1_git)[/prev/(prev_sha1s)]/log/`,
but operates on the revision that has been found at a given software origin,
close to a given point in time, pointed by a given branch.
:param int origin_id: a SWH origin identifier
:param string branch_name: optional parameter specifying a fully-qualified branch name
associated to the software origin, e.g., "refs/heads/master". Defaults to the master branch.
:param string timestamp: optional parameter specifying a timestamp close to which the revision
pointed by the given branch should be looked up. The timestamp can be expressed either
as an ISO date or as a Unix one (in UTC). Defaults to now.
:reqheader Accept: the requested response content type,
either *application/json* (default) or *application/yaml*
:resheader Content-Type: this depends on :http:header:`Accept` header of request
:>jsonarr object author: information about the author of the revision
:>jsonarr string author_url: link to :http:get:`/api/1/person/(person_id)/` to get
information about the author of the revision
:>jsonarr object committer: information about the committer of the revision
:>jsonarr string committer_url: link to :http:get:`/api/1/person/(person_id)/` to get
information about the committer of the revision
:>jsonarr string committer_date: ISO representation of the commit date (in UTC)
:>jsonarr string date: ISO representation of the revision date (in UTC)
:>jsonarr string directory: the unique identifier that revision points to
:>jsonarr string directory_url: link to :http:get:`/api/1/directory/(sha1_git)/[(path)/]`
to get information about the directory associated to the revision
:>jsonarr string id: the revision unique identifier
:>jsonarr boolean merge: whether or not the revision corresponds to a merge commit
:>jsonarr string message: the message associated to the revision
:>jsonarr array parents: the parents of the revision, i.e. the previous revisions
that head directly to it, each entry of that array contains a unique parent
revision identifier but also a link to :http:get:`/api/1/revision/(sha1_git)/`
- to get more informations about it
+ to get more information about it
:>jsonarr string type: the type of the revision
**Allowed HTTP Methods:** :http:method:`get`, :http:method:`head`, :http:method:`options`
:statuscode 200: no error
:statuscode 404: no revision matching the given criteria could be found in the SWH archive
**Example:**
.. parsed-literal::
:swh_web_api:`revision/origin/723566/ts/2016-01-17T00:00:00+00:00/log/`
""" # noqa
result = {}
per_page = int(request.query_params.get('per_page', '10'))
if ts:
ts = parse_timestamp(ts)
def lookup_revision_log_by_with_limit(o_id, br, ts, limit=per_page+1):
return service.lookup_revision_log_by(o_id, br, ts, limit)
error_msg = 'No revision matching origin %s ' % origin_id
error_msg += ', branch name %s' % branch_name
error_msg += (' and time stamp %s.' % ts) if ts else '.'
rev_get = api_lookup(
lookup_revision_log_by_with_limit, origin_id, branch_name, ts,
notfound_msg=error_msg,
enrich_fn=utils.enrich_revision)
nb_rev = len(rev_get)
if nb_rev == per_page+1:
revisions = rev_get[:-1]
last_sha1_git = rev_get[-1]['id']
params = {k: v for k, v in {'origin_id': origin_id,
'branch_name': branch_name,
'ts': ts,
}.items() if v is not None}
query_params = {}
query_params['sha1_git'] = last_sha1_git
if request.query_params.get('per_page'):
query_params['per_page'] = per_page
result['headers'] = {
'link-next': reverse('revision-origin-log', kwargs=params,
query_params=query_params)
}
else:
revisions = rev_get
result.update({'results': revisions})
return result
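# For illustration only: the pagination above fetches per_page + 1 revisions
# and uses the extra one to seed the next page. A minimal, self-contained
# sketch of that idea (split_page is a hypothetical helper, not part of
# swh.web):

```python
def split_page(items, per_page):
    """Split items fetched with limit=per_page + 1 into (page, next_id).

    next_id is None when there is no further page.
    """
    if len(items) == per_page + 1:
        # The extra entry is not returned; its id identifies the next page.
        return items[:-1], items[-1]["id"]
    return items, None
```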
@api_route(r'/revision/origin/(?P<origin_id>[0-9]+)/directory/',
'revision-directory')
@api_route(r'/revision/origin/(?P<origin_id>[0-9]+)/directory/(?P<path>.+)/',
'revision-directory')
@api_route(r'/revision/origin/(?P<origin_id>[0-9]+)'
r'/branch/(?P<branch_name>.+)/directory/',
'revision-directory')
@api_route(r'/revision/origin/(?P<origin_id>[0-9]+)'
r'/branch/(?P<branch_name>.+)/ts/(?P<ts>.+)/directory/',
'revision-directory')
@api_route(r'/revision/origin/(?P<origin_id>[0-9]+)'
r'/branch/(?P<branch_name>.+)/directory/(?P<path>.+)/',
'revision-directory')
@api_route(r'/revision/origin/(?P<origin_id>[0-9]+)'
r'/branch/(?P<branch_name>.+)/ts/(?P<ts>.+)'
r'/directory/(?P<path>.+)/',
'revision-directory')
@api_doc('/revision/origin/directory/', tags=['hidden'])
def api_directory_through_revision_origin(request, origin_id,
branch_name="refs/heads/master",
ts=None,
path=None,
with_data=False):
"""
Display directory or content information through a revision identified
by origin/branch/timestamp.
"""
if ts:
ts = parse_timestamp(ts)
return _revision_directory_by({'origin_id': origin_id,
'branch_name': branch_name,
'ts': ts
},
path, request.path,
with_data=with_data)
@api_route(r'/revision/origin/(?P<origin_id>[0-9]+)/',
'revision-origin')
@api_route(r'/revision/origin/(?P<origin_id>[0-9]+)'
r'/branch/(?P<branch_name>.+)/',
'revision-origin')
@api_route(r'/revision/origin/(?P<origin_id>[0-9]+)'
r'/branch/(?P<branch_name>.+)/ts/(?P<ts>.+)/',
'revision-origin')
@api_route(r'/revision/origin/(?P<origin_id>[0-9]+)/ts/(?P<ts>.+)/',
'revision-origin')
@api_doc('/revision/origin/')
def api_revision_with_origin(request, origin_id,
branch_name="refs/heads/master",
ts=None):
"""
.. http:get:: /api/1/revision/origin/(origin_id)/[branch/(branch_name)/][ts/(timestamp)/]
Get information about a revision, searching for it based on software origin,
branch name, and/or visit timestamp.
This endpoint behaves like :http:get:`/api/1/revision/(sha1_git)/`,
but operates on the revision that has been found at a given software origin,
close to a given point in time, pointed by a given branch.
:param int origin_id: a SWH origin identifier
:param string branch_name: optional parameter specifying a fully-qualified branch name
associated to the software origin, e.g., "refs/heads/master". Defaults to the master branch.
:param string timestamp: optional parameter specifying a timestamp close to which the revision
pointed by the given branch should be looked up. The timestamp can be expressed either
as an ISO date or as a Unix one (in UTC). Defaults to now.
:reqheader Accept: the requested response content type,
either *application/json* (default) or *application/yaml*
:resheader Content-Type: this depends on :http:header:`Accept` header of request
:>json object author: information about the author of the revision
:>json string author_url: link to :http:get:`/api/1/person/(person_id)/` to get
information about the author of the revision
:>json object committer: information about the committer of the revision
:>json string committer_url: link to :http:get:`/api/1/person/(person_id)/` to get
information about the committer of the revision
:>json string committer_date: ISO representation of the commit date (in UTC)
:>json string date: ISO representation of the revision date (in UTC)
:>json string directory: the unique identifier that revision points to
:>json string directory_url: link to :http:get:`/api/1/directory/(sha1_git)/[(path)/]`
to get information about the directory associated to the revision
:>json string id: the revision unique identifier
:>json boolean merge: whether or not the revision corresponds to a merge commit
:>json string message: the message associated to the revision
:>json array parents: the parents of the revision, i.e. the previous revisions
that head directly to it, each entry of that array contains a unique parent
revision identifier but also a link to :http:get:`/api/1/revision/(sha1_git)/`
- to get more informations about it
+ to get more information about it
:>json string type: the type of the revision
**Allowed HTTP Methods:** :http:method:`get`, :http:method:`head`, :http:method:`options`
:statuscode 200: no error
:statuscode 404: no revision matching the given criteria could be found in the SWH archive
**Example:**
.. parsed-literal::
:swh_web_api:`revision/origin/13706355/branch/refs/heads/2.7/`
""" # noqa
ts = parse_timestamp(ts)
return api_lookup(
service.lookup_revision_by, origin_id, branch_name, ts,
notfound_msg=('Revision with (origin_id: {}, branch_name: {}'
', ts: {}) not found.'.format(origin_id,
branch_name, ts)),
enrich_fn=utils.enrich_revision)
@api_route(r'/revision/(?P<sha1_git>[0-9a-f]+)/prev/(?P<context>[0-9a-f/]+)/',
'revision-context')
@api_doc('/revision/prev/', tags=['hidden'])
def api_revision_with_context(request, sha1_git, context):
"""
Return information about revision with id sha1_git.
"""
def _enrich_revision(revision, context=context):
return utils.enrich_revision(revision, context)
return api_lookup(
service.lookup_revision, sha1_git,
notfound_msg='Revision with sha1_git %s not found.' % sha1_git,
enrich_fn=_enrich_revision)
@api_route(r'/revision/(?P<sha1_git>[0-9a-f]+)/', 'revision')
@api_doc('/revision/')
def api_revision(request, sha1_git):
"""
.. http:get:: /api/1/revision/(sha1_git)/
Get information about a revision in the SWH archive.
Revisions are identified by *sha1* checksums, compatible with Git commit identifiers.
See :func:`swh.model.identifiers.revision_identifier` in our data model module for details
about how they are computed.
:param string sha1_git: hexadecimal representation of the revision *sha1_git* identifier
:reqheader Accept: the requested response content type,
either *application/json* (default) or *application/yaml*
:resheader Content-Type: this depends on :http:header:`Accept` header of request
:>json object author: information about the author of the revision
:>json string author_url: link to :http:get:`/api/1/person/(person_id)/` to get
information about the author of the revision
:>json object committer: information about the committer of the revision
:>json string committer_url: link to :http:get:`/api/1/person/(person_id)/` to get
information about the committer of the revision
:>json string committer_date: ISO representation of the commit date (in UTC)
:>json string date: ISO representation of the revision date (in UTC)
:>json string directory: the unique identifier that revision points to
:>json string directory_url: link to :http:get:`/api/1/directory/(sha1_git)/[(path)/]`
to get information about the directory associated to the revision
:>json string id: the revision unique identifier
:>json boolean merge: whether or not the revision corresponds to a merge commit
:>json string message: the message associated to the revision
:>json array parents: the parents of the revision, i.e. the previous revisions
that head directly to it, each entry of that array contains a unique parent
revision identifier but also a link to :http:get:`/api/1/revision/(sha1_git)/`
- to get more informations about it
+ to get more information about it
:>json string type: the type of the revision
**Allowed HTTP Methods:** :http:method:`get`, :http:method:`head`, :http:method:`options`
:statuscode 200: no error
:statuscode 400: an invalid *sha1_git* value has been provided
:statuscode 404: requested revision can not be found in the SWH archive
**Example:**
.. parsed-literal::
:swh_web_api:`revision/aafb16d69fd30ff58afdd69036a26047f3aebdc6/`
""" # noqa
return api_lookup(
service.lookup_revision, sha1_git,
notfound_msg='Revision with sha1_git {} not found.'.format(sha1_git),
enrich_fn=utils.enrich_revision)
@api_route(r'/revision/(?P<sha1_git>[0-9a-f]+)/raw/', 'revision-raw-message')
@api_doc('/revision/raw/', tags=['hidden'], handle_response=True)
def api_revision_raw_message(request, sha1_git):
"""Return the raw data of the message of revision identified by sha1_git
"""
raw = service.lookup_revision_message(sha1_git)
response = HttpResponse(raw['message'],
content_type='application/octet-stream')
response['Content-disposition'] = \
'attachment;filename=rev_%s_raw' % sha1_git
return response
@api_route(r'/revision/(?P<sha1_git>[0-9a-f]+)/directory/',
'revision-directory')
@api_route(r'/revision/(?P<sha1_git>[0-9a-f]+)/directory/(?P<dir_path>.+)/',
'revision-directory')
@api_doc('/revision/directory/')
def api_revision_directory(request, sha1_git,
dir_path=None,
with_data=False):
"""
.. http:get:: /api/1/revision/(sha1_git)/directory/[(path)/]
Get information about directory (entry) objects associated to revisions.
Each revision is associated to a single "root" directory.
This endpoint behaves like :http:get:`/api/1/directory/(sha1_git)/[(path)/]`,
but operates on the root directory associated to a given revision.
:param string sha1_git: hexadecimal representation of the revision *sha1_git* identifier
:param string path: optional parameter to get information about the directory entry
pointed by that relative path
:reqheader Accept: the requested response content type,
either *application/json* (default) or *application/yaml*
:resheader Content-Type: this depends on :http:header:`Accept` header of request
:>json array content: directory entries as returned by :http:get:`/api/1/directory/(sha1_git)/[(path)/]`
:>json string path: path of directory from the revision root one
:>json string revision: the unique revision identifier
:>json string type: the type of the directory
**Allowed HTTP Methods:** :http:method:`get`, :http:method:`head`, :http:method:`options`
:statuscode 200: no error
:statuscode 400: an invalid *sha1_git* value has been provided
:statuscode 404: requested revision can not be found in the SWH archive
**Example:**
.. parsed-literal::
:swh_web_api:`revision/f1b94134a4b879bc55c3dacdb496690c8ebdc03f/directory/`
""" # noqa
return _revision_directory_by({'sha1_git': sha1_git},
dir_path, request.path,
with_data=with_data)
@api_route(r'/revision/(?P<sha1_git>[0-9a-f]+)/log/', 'revision-log')
@api_route(r'/revision/(?P<sha1_git>[0-9a-f]+)'
r'/prev/(?P<prev_sha1s>[0-9a-f/]+)/log/',
'revision-log')
@api_doc('/revision/log/')
def api_revision_log(request, sha1_git, prev_sha1s=None):
"""
.. http:get:: /api/1/revision/(sha1_git)[/prev/(prev_sha1s)]/log/
Get a list of all SWH revisions heading to a given one, in other words show the commit log.
:param string sha1_git: hexadecimal representation of the revision *sha1_git* identifier
:param string prev_sha1s: optional parameter representing the navigation breadcrumbs
(descendant revisions previously visited). If multiple values, use / as delimiter.
If provided, revisions information will be added at the beginning of the returned list.
:query int per_page: number of elements in the returned list, for pagination purpose
:reqheader Accept: the requested response content type,
either *application/json* (default) or *application/yaml*
:resheader Content-Type: this depends on :http:header:`Accept` header of request
:resheader Link: indicates that a subsequent result page is available and contains
the url pointing to it
:>jsonarr object author: information about the author of the revision
:>jsonarr string author_url: link to :http:get:`/api/1/person/(person_id)/` to get
information about the author of the revision
:>jsonarr object committer: information about the committer of the revision
:>jsonarr string committer_url: link to :http:get:`/api/1/person/(person_id)/` to get
information about the committer of the revision
:>jsonarr string committer_date: ISO representation of the commit date (in UTC)
:>jsonarr string date: ISO representation of the revision date (in UTC)
:>jsonarr string directory: the unique identifier that revision points to
:>jsonarr string directory_url: link to :http:get:`/api/1/directory/(sha1_git)/[(path)/]`
to get information about the directory associated to the revision
:>jsonarr string id: the revision unique identifier
:>jsonarr boolean merge: whether or not the revision corresponds to a merge commit
:>jsonarr string message: the message associated to the revision
:>jsonarr array parents: the parents of the revision, i.e. the previous revisions
that head directly to it, each entry of that array contains a unique parent
revision identifier but also a link to :http:get:`/api/1/revision/(sha1_git)/`
- to get more informations about it
+ to get more information about it
:>jsonarr string type: the type of the revision
**Allowed HTTP Methods:** :http:method:`get`, :http:method:`head`, :http:method:`options`
:statuscode 200: no error
:statuscode 400: an invalid *sha1_git* value has been provided
:statuscode 404: requested revision can not be found in the SWH archive
**Example:**
.. parsed-literal::
:swh_web_api:`revision/e1a315fa3fa734e2a6154ed7b5b9ae0eb8987aad/log/`
""" # noqa
result = {}
per_page = int(request.query_params.get('per_page', '10'))
def lookup_revision_log_with_limit(s, limit=per_page+1):
return service.lookup_revision_log(s, limit)
error_msg = 'Revision with sha1_git %s not found.' % sha1_git
rev_get = api_lookup(lookup_revision_log_with_limit, sha1_git,
notfound_msg=error_msg,
enrich_fn=utils.enrich_revision)
nb_rev = len(rev_get)
if nb_rev == per_page+1:
rev_backward = rev_get[:-1]
new_last_sha1 = rev_get[-1]['id']
query_params = {}
if request.query_params.get('per_page'):
query_params['per_page'] = per_page
result['headers'] = {
'link-next': reverse('revision-log',
kwargs={'sha1_git': new_last_sha1},
query_params=query_params)
}
else:
rev_backward = rev_get
if not prev_sha1s: # no nav breadcrumbs, so we're done
revisions = rev_backward
else:
rev_forward_ids = prev_sha1s.split('/')
rev_forward = api_lookup(
service.lookup_revision_multiple, rev_forward_ids,
notfound_msg=error_msg,
enrich_fn=utils.enrich_revision)
revisions = rev_forward + rev_backward
result.update({
'results': revisions
})
return result
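The view above requests one more revision than the page size and uses the presence of that extra element to decide whether a next page exists. A minimal standalone sketch of this look-ahead technique follows. The names `paginate_with_lookahead`, `fetch` and `fake_fetch` are hypothetical and not part of swh-web; `fetch` only stands in for a lookup such as `service.lookup_revision_log`.

```python
from typing import Callable, List, Optional, Tuple


def paginate_with_lookahead(
    fetch: Callable[[str, int], List[dict]],
    start_id: str,
    per_page: int = 10,
) -> Tuple[List[dict], Optional[str]]:
    """Return one page of items plus the id that starts the next page.

    `fetch(start_id, limit)` is a hypothetical lookup returning at most
    `limit` dicts, each carrying an 'id' key.
    """
    items = fetch(start_id, per_page + 1)  # ask for one extra element
    if len(items) == per_page + 1:
        # the extra element is not displayed, it only seeds the next page
        return items[:-1], items[-1]['id']
    return items, None


def fake_fetch(start_id: str, limit: int) -> List[dict]:
    # toy history of 25 revisions with ids '0' .. '24'
    start = int(start_id)
    return [{'id': str(i)} for i in range(start, min(start + limit, 25))]


page, next_id = paginate_with_lookahead(fake_fetch, '0', per_page=10)
last_page, last_next = paginate_with_lookahead(fake_fetch, '20', per_page=10)
```

Fetching `per_page + 1` elements avoids a separate count query: a short result means the history is exhausted.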
diff --git a/swh/web/api/views/stat.py b/swh/web/api/views/stat.py
index 0c6de479..3c3884e0 100644
--- a/swh/web/api/views/stat.py
+++ b/swh/web/api/views/stat.py
@@ -1,55 +1,55 @@
# Copyright (C) 2015-2018 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU Affero General Public License version 3, or any later version
# See top-level LICENSE file for more information
from swh.web.common import service
from swh.web.api.apidoc import api_doc
from swh.web.api.apiurls import api_route
@api_route(r'/stat/counters/', 'stat-counters')
@api_doc('/stat/counters/', noargs=True)
def api_stats(request):
"""
.. http:get:: /api/1/stat/counters/
Get statistics about the content of the archive.
:>json number content: current number of content objects (aka files) in the SWH archive
:>json number directory: current number of directory objects in the SWH archive
:>json number directory_entry_dir: current number of SWH directory entries
pointing to other SWH directories in the SWH archive
:>json number directory_entry_file: current number of SWH directory entries
pointing to SWH content objects in the SWH archive
:>json number directory_entry_rev: current number of SWH directory entries
pointing to SWH revision objects (e.g. git submodules) in the SWH archive
:>json number entity: current number of SWH entities (a SWH entity is either
a *group_of_entities*, a *group_of_persons*, a *project*, a *person*, an *organization*,
or a *hosting* service) in the SWH archive
:>json number occurrence: current number of SWH occurrences (an occurrence may be assimilated
to a branch found during a SWH crawl of a repository) in the SWH archive
:>json number origin: current number of SWH origins (an origin is a "place" where code
source can be found, e.g. a git repository, a tarball, ...) in the SWH archive
- :>json number person: current number of SWH persons (code source authors or commiters)
+ :>json number person: current number of SWH persons (code source authors or committers)
in the SWH archive
:>json number release: current number of SWH release objects in the SWH archive
:>json number revision: current number of SWH revision objects (aka commits) in the SWH archive
:>json number skipped_content: current number of content objects (aka files) which were
not inserted in the SWH archive
:reqheader Accept: the requested response content type,
either *application/json* (default) or *application/yaml*
:resheader Content-Type: this depends on :http:header:`Accept` header of request
**Allowed HTTP Methods:** :http:method:`get`, :http:method:`head`, :http:method:`options`
:statuscode 200: no error
**Example:**
.. parsed-literal::
:swh_web_api:`stat/counters/`
""" # noqa
return service.stat_counters()
diff --git a/swh/web/browse/utils.py b/swh/web/browse/utils.py
index 70d658d9..ce2074da 100644
--- a/swh/web/browse/utils.py
+++ b/swh/web/browse/utils.py
@@ -1,1093 +1,1093 @@
# Copyright (C) 2017-2018 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU Affero General Public License version 3, or any later version
# See top-level LICENSE file for more information
import base64
import magic
import math
import pypandoc
import stat
from django.core.cache import cache
from django.utils.safestring import mark_safe
from importlib import reload
from swh.web.common import highlightjs, service
from swh.web.common.exc import NotFoundExc, http_status_code_message
from swh.web.common.utils import (
reverse, format_utc_iso_date, parse_timestamp,
get_origin_visits, get_swh_persistent_id
)
from swh.web.config import get_config
def get_directory_entries(sha1_git):
"""Function that retrieves the content of a SWH directory
from the SWH archive.
The directory entries are first sorted in lexicographical order.
Sub-directories and regular files are then extracted.
Args:
sha1_git: sha1_git identifier of the directory
Returns:
A tuple whose first member corresponds to the sub-directories list
and second member the regular files list
Raises:
NotFoundExc if the directory is not found
"""
cache_entry_id = 'directory_entries_%s' % sha1_git
cache_entry = cache.get(cache_entry_id)
if cache_entry:
return cache_entry
entries = list(service.lookup_directory(sha1_git))
entries = sorted(entries, key=lambda e: e['name'])
for entry in entries:
entry['perms'] = stat.filemode(entry['perms'])
dirs = [e for e in entries if e['type'] == 'dir']
files = [e for e in entries if e['type'] == 'file']
cache.set(cache_entry_id, (dirs, files))
return dirs, files
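The sort-then-split step of `get_directory_entries` can be tried without the archive or the Django cache. This is a minimal sketch under that assumption; `split_directory_entries` and the sample entries are hypothetical, while `stat.filemode` is the standard library call used above.

```python
import stat


def split_directory_entries(entries):
    """Sort entries by name, render permissions, split dirs from files."""
    entries = sorted(entries, key=lambda e: e['name'])
    for entry in entries:
        # stat.filemode turns a numeric mode into an 'ls -l' style string
        entry['perms'] = stat.filemode(entry['perms'])
    dirs = [e for e in entries if e['type'] == 'dir']
    files = [e for e in entries if e['type'] == 'file']
    return dirs, files


# hypothetical sample data: one sub-directory and one regular file
sample = [
    {'name': 'src', 'type': 'dir', 'perms': 0o040755},
    {'name': 'README', 'type': 'file', 'perms': 0o100644},
]
dirs, files = split_directory_entries(sample)
```

Note that the permission strings are written back into the entry dicts, so the input entries are modified in place, as in the function above.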
def get_mimetype_and_encoding_for_content(content):
"""Function that returns the mime type and the encoding associated to
a content buffer using the magic module under the hood.
Args:
content (bytes): a content buffer
Returns:
A tuple (mimetype, encoding), for instance ('text/plain', 'us-ascii'),
associated to the provided content.
"""
while True:
try:
magic_result = magic.detect_from_content(content)
mime_type = magic_result.mime_type
encoding = magic_result.encoding
break
except Exception:
# work around an issue with the magic module, which can fail
# if detect_from_content is called multiple times in
# a short amount of time
reload(magic)
return mime_type, encoding
# maximum authorized content size in bytes for HTML display
# with code highlighting
content_display_max_size = get_config()['content_display_max_size']
def request_content(query_string, max_size=content_display_max_size,
raise_if_unavailable=True, reencode=True):
"""Function that retrieves a SWH content from the SWH archive.
Raw bytes content is first retrieved, then the content mime type.
If the mime type is not stored in the archive, it will be computed
using Python magic module.
Args:
query_string: a string of the form "[ALGO_HASH:]HASH" where
optional ALGO_HASH can be either *sha1*, *sha1_git*, *sha256*,
or *blake2s256* (default to *sha1*) and HASH the hexadecimal
representation of the hash value
max_size: the maximum size for a content to retrieve (default to 1MB,
no size limit if None)
Returns:
A tuple whose first member corresponds to the content raw bytes
and second member the content mime type
Raises:
NotFoundExc if the content is not found
"""
content_data = service.lookup_content(query_string)
filetype = None
language = None
license = None
# requests to the indexer db may fail so properly handle
# those cases in order to avoid content display errors
try:
filetype = service.lookup_content_filetype(query_string)
language = service.lookup_content_language(query_string)
license = service.lookup_content_license(query_string)
except Exception as e:
pass
mimetype = 'unknown'
encoding = 'unknown'
if filetype:
mimetype = filetype['mimetype']
encoding = filetype['encoding']
content_data['error_code'] = 200
content_data['error_message'] = ''
content_data['error_description'] = ''
if not max_size or content_data['length'] < max_size:
try:
content_raw = service.lookup_content_raw(query_string)
except Exception as e:
if raise_if_unavailable:
raise e
else:
content_data['raw_data'] = None
content_data['error_code'] = 404
content_data['error_description'] = \
'The bytes of the content are currently not available in the archive.' # noqa
content_data['error_message'] = \
http_status_code_message[content_data['error_code']]
else:
content_data['raw_data'] = content_raw['data']
if not filetype:
mimetype, encoding = \
get_mimetype_and_encoding_for_content(content_data['raw_data']) # noqa
# encode textual content to utf-8 if needed
if reencode and mimetype.startswith('text/'):
- # probably a malformed UTF-8 content, reencode it
+ # probably a malformed UTF-8 content, re-encode it
# by replacing invalid chars with a substitution one
if encoding == 'unknown-8bit':
content_data['raw_data'] = \
content_data['raw_data'].decode('utf-8', 'replace')\
.encode('utf-8')
elif 'ascii' not in encoding and encoding not in ['utf-8', 'binary']: # noqa
content_data['raw_data'] = \
content_data['raw_data'].decode(encoding, 'replace')\
.encode('utf-8')
elif reencode and mimetype.startswith('application/octet-stream'):
# file may detect an iso-8859-* encoded content as binary
# so try to decode it for display
encodings = ['iso-8859-%s' % i for i in range(1, 17)]
for encoding in encodings:
try:
content_data['raw_data'] = \
content_data['raw_data'].decode(encoding)\
.encode('utf-8')
except Exception:
pass
else:
break
else:
content_data['raw_data'] = None
content_data['mimetype'] = mimetype
content_data['encoding'] = encoding
if language:
content_data['language'] = language['lang']
else:
content_data['language'] = 'not detected'
if license:
content_data['licenses'] = ', '.join(license['facts'][0]['licenses'])
else:
content_data['licenses'] = 'not detected'
return content_data
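The re-encoding branch for textual content can be isolated as a small pure function. This is a sketch only: `reencode_to_utf8` is a hypothetical helper, and the encoding labels are the ones tested in the code above.

```python
def reencode_to_utf8(raw: bytes, encoding: str) -> bytes:
    """Return `raw` re-encoded as UTF-8, replacing undecodable bytes."""
    if encoding == 'unknown-8bit':
        # probably malformed UTF-8: keep what decodes, substitute the rest
        return raw.decode('utf-8', 'replace').encode('utf-8')
    if 'ascii' not in encoding and encoding not in ('utf-8', 'binary'):
        # any other detected text encoding is transcoded to UTF-8
        return raw.decode(encoding, 'replace').encode('utf-8')
    # ASCII, UTF-8 and binary content is returned untouched
    return raw


latin1_bytes = 'café'.encode('iso-8859-1')
utf8_bytes = reencode_to_utf8(latin1_bytes, 'iso-8859-1')
```

Using the `'replace'` error handler trades fidelity for robustness: invalid bytes become U+FFFD instead of raising `UnicodeDecodeError`.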
_browsers_supported_image_mimes = set(['image/gif', 'image/png',
'image/jpeg', 'image/bmp',
'image/webp', 'image/svg',
'image/svg+xml'])
def prepare_content_for_display(content_data, mime_type, path):
"""Function that prepares a content for HTML display.
The function tries to associate a programming language to a
content in order to perform syntax highlighting client-side
using highlightjs. The language is determined using either
the content filename or its mime type.
If the mime type corresponds to an image format supported
by web browsers, the content will be encoded in base64
for displaying the image.
Args:
content_data (bytes): raw bytes of the content
mime_type (string): mime type of the content
path (string): path of the content including filename
Returns:
A dict containing the content bytes (possibly different from the one
provided as parameter if it is an image) under the key 'content_data'
and the corresponding highlightjs language class under the
key 'language'.
"""
language = highlightjs.get_hljs_language_from_filename(path)
if not language:
language = highlightjs.get_hljs_language_from_mime_type(mime_type)
if not language:
language = 'nohighlight-swh'
elif mime_type.startswith('application/'):
mime_type = mime_type.replace('application/', 'text/')
if mime_type.startswith('image/'):
if mime_type in _browsers_supported_image_mimes:
content_data = base64.b64encode(content_data)
else:
content_data = None
if mime_type.startswith('image/svg'):
mime_type = 'image/svg+xml'
return {'content_data': content_data,
'language': language,
'mimetype': mime_type}
def get_origin_visit(origin_info, visit_ts=None, visit_id=None,
snapshot_id=None):
"""Function that returns information about a SWH visit for
a given origin.
The visit is retrieved from a provided timestamp.
The closest visit from that timestamp is selected.
Args:
origin_info (dict): a dict filled with origin information
(id, url, type)
visit_ts (int or str): an ISO date string or Unix timestamp to parse
Returns:
A dict containing the visit info as described below::
{'origin': 2,
'date': '2017-10-08T11:54:25.582463+00:00',
'metadata': {},
'visit': 25,
'status': 'full'}
"""
visits = get_origin_visits(origin_info)
if not visits:
raise NotFoundExc('No SWH visit associated to origin with'
' type %s and url %s!' % (origin_info['type'],
origin_info['url']))
if snapshot_id:
visit = [v for v in visits if v['snapshot'] == snapshot_id]
if len(visit) == 0:
raise NotFoundExc(
'Visit for snapshot with id %s for origin with type %s'
' and url %s not found!' % (snapshot_id, origin_info['type'],
origin_info['url']))
return visit[0]
if visit_id:
visit = [v for v in visits if v['visit'] == int(visit_id)]
if len(visit) == 0:
raise NotFoundExc(
'Visit with id %s for origin with type %s'
' and url %s not found!' % (visit_id, origin_info['type'],
origin_info['url']))
return visit[0]
if not visit_ts:
# returns the latest full visit when no timestamp is provided
for v in reversed(visits):
if v['status'] == 'full':
return v
return visits[-1]
parsed_visit_ts = math.floor(parse_timestamp(visit_ts).timestamp())
visit_idx = None
for i, visit in enumerate(visits):
ts = math.floor(parse_timestamp(visit['date']).timestamp())
if i == 0 and parsed_visit_ts <= ts:
return visit
elif i == len(visits) - 1:
if parsed_visit_ts >= ts:
return visit
else:
next_ts = math.floor(
parse_timestamp(visits[i+1]['date']).timestamp())
if parsed_visit_ts >= ts and parsed_visit_ts < next_ts:
if (parsed_visit_ts - ts) < (next_ts - parsed_visit_ts):
visit_idx = i
break
else:
visit_idx = i+1
break
if visit_idx is not None:
visit = visits[visit_idx]
while visit_idx < len(visits) - 1 and \
visit['date'] == visits[visit_idx+1]['date']:
visit_idx = visit_idx + 1
visit = visits[visit_idx]
return visit
else:
raise NotFoundExc(
'Visit with timestamp %s for origin with type %s and url %s not found!' % # noqa
(visit_ts, origin_info['type'], origin_info['url']))
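Selecting the visit closest to a timestamp can also be expressed with `min` and a distance key. The sketch below is not the implementation above: `closest_visit` is a hypothetical helper, it parses dates with `datetime.fromisoformat` instead of `parse_timestamp`, and on an exact tie it keeps the earlier visit.

```python
from datetime import datetime, timezone


def closest_visit(visits, target_ts):
    """Return the visit whose date is nearest to target_ts (Unix seconds)."""
    def visit_ts(visit):
        return datetime.fromisoformat(visit['date']).timestamp()
    # min() keeps the first visit on ties, i.e. the earlier one
    return min(visits, key=lambda v: abs(visit_ts(v) - target_ts))


# hypothetical visits, sorted by date as the code above expects
visits = [
    {'visit': 1, 'date': '2017-01-01T00:00:00+00:00'},
    {'visit': 2, 'date': '2017-06-01T00:00:00+00:00'},
    {'visit': 3, 'date': '2018-01-01T00:00:00+00:00'},
]
target = datetime(2017, 5, 1, tzinfo=timezone.utc).timestamp()
found = closest_visit(visits, target)
```

Timestamps before the first visit or after the last one fall back to the first or last visit, which matches the boundary handling in `get_origin_visit`.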
def get_snapshot_content(snapshot_id):
"""Returns the lists of branches and releases
associated to a swh snapshot.
That list is put in cache in order to speed up the navigation
in the swh-web/browse ui.
Args:
snapshot_id (str): hexadecimal representation of the snapshot
identifier
Returns:
A tuple with two members. The first one is a list of dict describing
the snapshot branches. The second one is a list of dict describing the
snapshot releases.
Raises:
NotFoundExc if the snapshot does not exist
"""
cache_entry_id = 'swh_snapshot_%s' % snapshot_id
cache_entry = cache.get(cache_entry_id)
if cache_entry:
return cache_entry['branches'], cache_entry['releases']
branches = []
releases = []
if snapshot_id:
revision_ids = []
releases_ids = []
snapshot = service.lookup_snapshot(snapshot_id)
snapshot_branches = snapshot['branches']
for key in sorted(snapshot_branches.keys()):
if not snapshot_branches[key]:
continue
if snapshot_branches[key]['target_type'] == 'revision':
branches.append({'name': key,
'revision': snapshot_branches[key]['target']})
revision_ids.append(snapshot_branches[key]['target'])
elif snapshot_branches[key]['target_type'] == 'release':
releases_ids.append(snapshot_branches[key]['target'])
releases_info = service.lookup_release_multiple(releases_ids)
for release in releases_info:
releases.append({'name': release['name'],
'date': format_utc_iso_date(release['date']),
'id': release['id'],
'message': release['message'],
'target_type': release['target_type'],
'target': release['target']})
revision_ids.append(release['target'])
revisions = service.lookup_revision_multiple(revision_ids)
branches_to_remove = []
for idx, revision in enumerate(revisions):
if idx < len(branches):
if revision:
branches[idx]['directory'] = revision['directory']
branches[idx]['date'] = format_utc_iso_date(revision['date']) # noqa
branches[idx]['message'] = revision['message']
else:
branches_to_remove.append(branches[idx])
else:
rel_idx = idx - len(branches)
if revision:
releases[rel_idx]['directory'] = revision['directory']
for b in branches_to_remove:
branches.remove(b)
cache.set(cache_entry_id, {'branches': branches, 'releases': releases})
return branches, releases
def get_origin_visit_snapshot(origin_info, visit_ts=None, visit_id=None,
snapshot_id=None):
"""Returns the lists of branches and releases
associated to a swh origin for a given visit.
The visit can be identified by an id, a snapshot id or a timestamp.
In the timestamp case, the closest visit from the provided timestamp will be used.
If no visit parameter is provided, it returns the list of branches
found for the latest visit.
That list is put in cache in order to speed up the navigation
in the swh-web/browse ui.
Args:
origin_info (dict): a dict filled with origin information
(id, url, type)
visit_ts (int or str): an ISO date string or Unix timestamp to parse
visit_id (int): optional visit id for disambiguation in case
several visits have the same timestamp
Returns:
A tuple with two members. The first one is a list of dict describing
the origin branches for the given visit.
The second one is a list of dict describing the origin releases
for the given visit.
Raises:
NotFoundExc if the origin or its visit are not found
"""
visit_info = get_origin_visit(origin_info, visit_ts, visit_id, snapshot_id)
return get_snapshot_content(visit_info['snapshot'])
def gen_link(url, link_text=None, link_attrs={}):
"""
Utility function for generating an HTML link to insert
in Django templates.
Args:
url (str): an url
link_text (str): optional text for the produced link,
if not provided the url will be used
link_attrs (dict): optional attributes (e.g. class)
to add to the link
Returns:
An HTML link in the form '<a href="url">link_text</a>'
"""
attrs = ' '
for k, v in link_attrs.items():
attrs += '%s="%s" ' % (k, v)
if not link_text:
link_text = url
link = '<a%shref="%s">%s</a>' % (attrs, url, link_text)
return mark_safe(link)
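For comparison, here is a Django-free sketch of building such an anchor element. It is not the function above: `build_link` is a hypothetical name, and it escapes every interpolated value with `html.escape`, whereas `gen_link` marks its output as safe and so relies on its callers passing trusted strings.

```python
from html import escape


def build_link(url, link_text=None, link_attrs=None):
    """Return an '<a ...>' element with all interpolated values escaped."""
    attrs = ''.join(
        ' %s="%s"' % (escape(k), escape(str(v)))
        for k, v in (link_attrs or {}).items()
    )
    text = escape(link_text or url)  # fall back to the url as link text
    return '<a%s href="%s">%s</a>' % (attrs, escape(url), text)


nav_link = build_link('/browse/', 'home', {'class': 'nav'})
raw_link = build_link('/x?a=1&b=2')
```

Using `None` instead of `{}` as the default for `link_attrs` avoids sharing one mutable default between calls; this is a design choice of the sketch, not a change to the code above.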
def gen_person_link(person_id, person_name, snapshot_context=None,
link_attrs={}):
"""
Utility function for generating a link to a SWH person HTML view
to insert in Django templates.
Args:
person_id (int): a SWH person id
person_name (str): the associated person name
link_attrs (dict): optional attributes (e.g. class)
to add to the link
Returns:
An HTML link in the form '<a href="person_url">person_name</a>'
"""
query_params = None
if snapshot_context and snapshot_context['origin_info']:
origin_info = snapshot_context['origin_info']
query_params = {'origin_type': origin_info['type'],
'origin': origin_info['url']}
if 'timestamp' in snapshot_context['url_args']:
query_params['timestamp'] = \
snapshot_context['url_args']['timestamp']
if 'visit_id' in snapshot_context['query_params']:
query_params['visit_id'] = \
snapshot_context['query_params']['visit_id']
elif snapshot_context:
query_params = {'snapshot_id': snapshot_context['snapshot_id']}
person_url = reverse('browse-person', kwargs={'person_id': person_id},
query_params=query_params)
return gen_link(person_url, person_name, link_attrs)
def gen_revision_link(revision_id, shorten_id=False, snapshot_context=None,
link_text=None, link_attrs={}):
"""
Utility function for generating a link to a SWH revision HTML view
to insert in Django templates.
Args:
revision_id (str): a SWH revision id
- shorten_id (boolean): wheter to shorten the revision id to 7
+ shorten_id (boolean): whether to shorten the revision id to 7
characters for the link text
snapshot_context (dict): if provided, generate snapshot-dependent
browsing link
link_attrs (dict): optional attributes (e.g. class)
to add to the link
Returns:
An HTML link in the form '<a href="revision_url">revision_id</a>'
"""
query_params = None
if snapshot_context and snapshot_context['origin_info']:
origin_info = snapshot_context['origin_info']
origin_type = snapshot_context['origin_type']
query_params = {'origin_type': origin_type,
'origin': origin_info['url']}
if 'timestamp' in snapshot_context['url_args']:
query_params['timestamp'] = \
snapshot_context['url_args']['timestamp']
if 'visit_id' in snapshot_context['query_params']:
query_params['visit_id'] = \
snapshot_context['query_params']['visit_id']
elif snapshot_context:
query_params = {'snapshot_id': snapshot_context['snapshot_id']}
revision_url = reverse('browse-revision',
kwargs={'sha1_git': revision_id},
query_params=query_params)
if shorten_id:
return gen_link(revision_url, revision_id[:7], link_attrs)
else:
if not link_text:
link_text = revision_id
return gen_link(revision_url, link_text, link_attrs)
def gen_origin_link(origin_info, link_attrs={}):
"""
Utility function for generating a link to a SWH origin HTML view
to insert in Django templates.
Args:
origin_info (dict): a dict filled with origin information
(id, type, url)
link_attrs (dict): optional attributes (e.g. class)
to add to the link
Returns:
An HTML link in the form '<a href="origin_browse_url">Origin: origin_url</a>'
""" # noqa
origin_browse_url = reverse('browse-origin',
kwargs={'origin_type': origin_info['type'],
'origin_url': origin_info['url']})
return gen_link(origin_browse_url,
'Origin: ' + origin_info['url'], link_attrs)
def gen_directory_link(sha1_git, link_text=None, link_attrs={}):
"""
Utility function for generating a link to a SWH directory HTML view
to insert in Django templates.
Args:
sha1_git (str): directory identifier
link_text (str): optional text for the generated link
(the generated url will be used by default)
link_attrs (dict): optional attributes (e.g. class)
to add to the link
Returns:
An HTML link in the form '<a href="directory_url">link_text</a>'
"""
directory_url = reverse('browse-directory',
kwargs={'sha1_git': sha1_git})
if not link_text:
link_text = directory_url
return gen_link(directory_url, link_text, link_attrs)
def gen_snapshot_link(snapshot_id, link_text=None, link_attrs={}):
"""
Utility function for generating a link to a SWH snapshot HTML view
to insert in Django templates.
Args:
snapshot_id (str): snapshot identifier
link_text (str): optional text for the generated link
(the generated url will be used by default)
link_attrs (dict): optional attributes (e.g. class)
to add to the link
Returns:
An HTML link in the form '<a href="snapshot_url">link_text</a>'
"""
snapshot_url = reverse('browse-snapshot',
kwargs={'snapshot_id': snapshot_id})
if not link_text:
link_text = snapshot_url
return gen_link(snapshot_url, link_text, link_attrs)
def gen_snapshot_directory_link(snapshot_context, revision_id=None,
link_text=None, link_attrs={}):
"""
Utility function for generating a link to a SWH directory HTML view
in the context of a snapshot to insert in Django templates.
Args:
snapshot_context (dict): the snapshot information
revision_id (str): optional revision identifier in order
to use the associated directory
link_text (str): optional text to use for the generated link
link_attrs (dict): optional attributes (e.g. class)
to add to the link
Returns:
An HTML link in the form
'<a href="origin_directory_view_url">origin_directory_view_url</a>'
"""
query_params = {'revision': revision_id}
if snapshot_context['origin_info']:
origin_info = snapshot_context['origin_info']
url_args = {'origin_type': origin_info['type'],
'origin_url': origin_info['url']}
if 'timestamp' in snapshot_context['url_args']:
url_args['timestamp'] = \
snapshot_context['url_args']['timestamp']
if 'visit_id' in snapshot_context['query_params']:
query_params['visit_id'] = \
snapshot_context['query_params']['visit_id']
directory_url = reverse('browse-origin-directory',
kwargs=url_args,
query_params=query_params)
else:
url_args = {'snapshot_id': snapshot_context['snapshot_id']}
directory_url = reverse('browse-snapshot-directory',
kwargs=url_args,
query_params=query_params)
if not link_text:
link_text = directory_url
return gen_link(directory_url, link_text, link_attrs)
def gen_content_link(sha1_git, link_text=None, link_attrs={}):
"""
Utility function for generating a link to a SWH content HTML view
to insert in Django templates.
Args:
sha1_git (str): content identifier
link_text (str): optional text for the generated link
(the generated url will be used by default)
link_attrs (dict): optional attributes (e.g. class)
to add to the link
Returns:
An HTML link in the form '<a href="content_url">link_text</a>'
"""
content_url = reverse('browse-content',
kwargs={'query_string': 'sha1_git:' + sha1_git})
if not link_text:
link_text = content_url
return gen_link(content_url, link_text, link_attrs)
def get_revision_log_url(revision_id, snapshot_context=None):
"""
Utility function for getting the URL for a SWH revision log HTML view
(possibly in the context of an origin).
Args:
revision_id (str): revision identifier the history heads to
snapshot_context (dict): if provided, generate snapshot-dependent
browsing link
Returns:
The SWH revision log view URL
"""
query_params = {'revision': revision_id}
if snapshot_context and snapshot_context['origin_info']:
origin_info = snapshot_context['origin_info']
url_args = {'origin_type': origin_info['type'],
'origin_url': origin_info['url']}
if 'timestamp' in snapshot_context['url_args']:
url_args['timestamp'] = \
snapshot_context['url_args']['timestamp']
if 'visit_id' in snapshot_context['query_params']:
query_params['visit_id'] = \
snapshot_context['query_params']['visit_id']
revision_log_url = reverse('browse-origin-log',
kwargs=url_args,
query_params=query_params)
elif snapshot_context:
url_args = {'snapshot_id': snapshot_context['snapshot_id']}
revision_log_url = reverse('browse-snapshot-log',
kwargs=url_args,
query_params=query_params)
else:
revision_log_url = reverse('browse-revision-log',
kwargs={'sha1_git': revision_id})
return revision_log_url
def gen_revision_log_link(revision_id, snapshot_context=None, link_text=None,
link_attrs={}):
"""
Utility function for generating a link to a SWH revision log HTML view
(possibly in the context of an origin) to insert in Django templates.
Args:
revision_id (str): revision identifier the history heads to
snapshot_context (dict): if provided, generate snapshot-dependent
browsing link
link_text (str): optional text to use for the generated link
link_attrs (dict): optional attributes (e.g. class)
to add to the link
Returns:
An HTML link in the form
'<a href="revision_log_url">link_text</a>'
"""
revision_log_url = get_revision_log_url(revision_id, snapshot_context)
if not link_text:
link_text = revision_log_url
return gen_link(revision_log_url, link_text, link_attrs)
def _format_log_entries(revision_log, per_page, snapshot_context=None):
revision_log_data = []
for i, log in enumerate(revision_log):
if i == per_page:
break
author_name = 'None'
author_link = 'None'
if log['author']:
author_name = log['author']['name'] or log['author']['fullname']
author_link = gen_person_link(log['author']['id'], author_name,
snapshot_context)
revision_log_data.append(
{'author': author_link,
'revision': gen_revision_link(log['id'], True, snapshot_context),
'message': log['message'],
'date': format_utc_iso_date(log['date']),
'directory': log['directory']})
return revision_log_data
def prepare_revision_log_for_display(revision_log, per_page, revs_breadcrumb,
snapshot_context=None):
"""
Utility function that processes raw revision log data for HTML display.
Its purpose is to:
* add links to relevant SWH browse views
* format date in human readable format
* truncate the message log
It also computes the data needed to generate the links for navigating back
and forth in the history log.
Args:
revision_log (list): raw revision log as returned by the SWH web api
per_page (int): number of log entries per page
revs_breadcrumb (str): breadcrumbs of revisions navigated so far,
in the form 'rev1[/rev2/../revN]'. Each revision corresponds to
the first one displayed in the HTML view for history log.
snapshot_context (dict): if provided, generate snapshot-dependent
browsing link
"""
current_rev = revision_log[0]['id']
next_rev = None
prev_rev = None
next_revs_breadcrumb = None
prev_revs_breadcrumb = None
if len(revision_log) == per_page + 1:
prev_rev = revision_log[-1]['id']
prev_rev_bc = current_rev
if snapshot_context:
prev_rev_bc = prev_rev
if revs_breadcrumb:
revs = revs_breadcrumb.split('/')
next_rev = revs[-1]
if len(revs) > 1:
next_revs_breadcrumb = '/'.join(revs[:-1])
if len(revision_log) == per_page + 1:
prev_revs_breadcrumb = revs_breadcrumb + '/' + prev_rev_bc
else:
prev_revs_breadcrumb = prev_rev_bc
return {'revision_log_data': _format_log_entries(revision_log, per_page,
snapshot_context),
'prev_rev': prev_rev,
'prev_revs_breadcrumb': prev_revs_breadcrumb,
'next_rev': next_rev,
'next_revs_breadcrumb': next_revs_breadcrumb}
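The breadcrumb string handled above behaves like a stack of revision ids joined by `/`. A minimal sketch of the two operations involved follows; `push_breadcrumb` and `pop_breadcrumb` are hypothetical helpers, not swh-web functions.

```python
def push_breadcrumb(breadcrumb, rev_id):
    """Append rev_id when moving one page deeper into the history."""
    return breadcrumb + '/' + rev_id if breadcrumb else rev_id


def pop_breadcrumb(breadcrumb):
    """Return (revision to go back to, remaining breadcrumb or None)."""
    revs = breadcrumb.split('/')
    return revs[-1], '/'.join(revs[:-1]) or None


trail = push_breadcrumb(None, 'aaa')    # first page visited
trail = push_breadcrumb(trail, 'bbb')   # second page visited
back_to, remaining = pop_breadcrumb(trail)
```

Keeping the trail in the URL means the view stays stateless: the "newer" link can be rebuilt from the request alone.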
# list of origin types that can be found in the swh archive
# TODO: retrieve it dynamically in an efficient way instead
# of hardcoding it
_swh_origin_types = ['git', 'svn', 'deb', 'hg', 'ftp', 'deposit']
def get_origin_info(origin_url, origin_type=None):
"""
Get info about a SWH origin.
Its main purpose is to automatically find an origin type
when it is not provided as parameter.
Args:
origin_url (str): complete url of a SWH origin
- origin_type (str): optionnal origin type
+ origin_type (str): optional origin type
Returns:
A dict with the following entries:
* type: the origin type
* url: the origin url
* id: the SWH internal id of the origin
"""
if origin_type:
return service.lookup_origin({'type': origin_type,
'url': origin_url})
else:
for origin_type in _swh_origin_types:
try:
origin_info = service.lookup_origin({'type': origin_type,
'url': origin_url})
return origin_info
except Exception:
pass
raise NotFoundExc('Origin with url %s not found!' % origin_url)
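The type-guessing loop of `get_origin_info` can be exercised with a fake lookup. In this sketch `NotFound`, `find_origin`, `fake_lookup` and `KNOWN` are hypothetical stand-ins; unlike the code above, it only swallows the not-found error rather than every `Exception`, so unrelated failures still surface.

```python
class NotFound(Exception):
    """Hypothetical stand-in for swh.web.common.exc.NotFoundExc."""


ORIGIN_TYPES = ['git', 'svn', 'deb', 'hg', 'ftp', 'deposit']


def find_origin(lookup, origin_url, origin_type=None):
    """Look up an origin, trying each known type when none is given."""
    if origin_type:
        return lookup({'type': origin_type, 'url': origin_url})
    for candidate in ORIGIN_TYPES:
        try:
            return lookup({'type': candidate, 'url': origin_url})
        except NotFound:
            continue  # try the next origin type
    raise NotFound('Origin with url %s not found!' % origin_url)


# hypothetical archive content: a single svn origin
KNOWN = {('svn', 'https://example.org/repo'): 42}


def fake_lookup(origin):
    key = (origin['type'], origin['url'])
    if key not in KNOWN:
        raise NotFound(key)
    return {'id': KNOWN[key], 'type': origin['type'], 'url': origin['url']}


info = find_origin(fake_lookup, 'https://example.org/repo')
```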
def get_snapshot_context(snapshot_id=None, origin_type=None, origin_url=None,
timestamp=None, visit_id=None):
"""
Utility function to compute relevant information when navigating
the SWH archive in a snapshot context. The snapshot is either
referenced by its id or it will be retrieved from an origin visit.
Args:
snapshot_id (str): hexadecimal representation of a snapshot identifier,
all other parameters will be ignored if it is provided
origin_type (str): the origin type (git, svn, deposit, ...)
origin_url (str): the origin_url (e.g. https://github.com/(user)/(repo)/)
timestamp (str): a datetime string for retrieving the closest
SWH visit of the origin
visit_id (int): optional visit id for disambiguation in case
of several visits with the same timestamp
Returns:
A dict with the following entries:
* origin_info: dict containing origin information
* visit_info: dict containing SWH visit information
* branches: the list of branches for the origin found
during the visit
* releases: the list of releases for the origin found
during the visit
* origin_browse_url: the url to browse the origin
* origin_branches_url: the url to browse the origin branches
* origin_releases_url: the url to browse the origin releases
* origin_visit_url: the url to browse the snapshot of the origin
found during the visit
- * url_args: dict containg url arguments to use when browsing in
+ * url_args: dict containing url arguments to use when browsing in
the context of the origin and its visit
""" # noqa
origin_info = None
visit_info = None
url_args = None
query_params = {}
branches = []
releases = []
browse_url = None
visit_url = None
branches_url = None
releases_url = None
swh_type = 'snapshot'
if origin_url:
swh_type = 'origin'
origin_info = get_origin_info(origin_url, origin_type)
visit_info = get_origin_visit(origin_info, timestamp, visit_id,
snapshot_id)
visit_info['fmt_date'] = format_utc_iso_date(visit_info['date'])
snapshot_id = visit_info['snapshot']
# the provided timestamp is not necessarily equal to the one
# of the retrieved visit, so get the exact one in order
# to use it in the urls generated below
if timestamp:
timestamp = visit_info['date']
branches, releases = \
get_origin_visit_snapshot(origin_info, timestamp, visit_id,
snapshot_id)
url_args = {'origin_type': origin_type,
'origin_url': origin_info['url']}
query_params = {'visit_id': visit_id}
browse_url = reverse('browse-origin-visits',
kwargs=url_args)
if timestamp:
url_args['timestamp'] = format_utc_iso_date(timestamp,
'%Y-%m-%dT%H:%M:%S')
visit_url = reverse('browse-origin-directory',
kwargs=url_args,
query_params=query_params)
visit_info['url'] = visit_url
branches_url = reverse('browse-origin-branches',
kwargs=url_args,
query_params=query_params)
releases_url = reverse('browse-origin-releases',
kwargs=url_args,
query_params=query_params)
elif snapshot_id:
branches, releases = get_snapshot_content(snapshot_id)
url_args = {'snapshot_id': snapshot_id}
browse_url = reverse('browse-snapshot',
kwargs=url_args)
branches_url = reverse('browse-snapshot-branches',
kwargs=url_args)
releases_url = reverse('browse-snapshot-releases',
kwargs=url_args)
releases = list(reversed(releases))
return {
'swh_type': swh_type,
'snapshot_id': snapshot_id,
'origin_info': origin_info,
# keep track if the origin type was provided as url argument
'origin_type': origin_type,
'visit_info': visit_info,
'branches': branches,
'releases': releases,
'branch': None,
'release': None,
'browse_url': browse_url,
'branches_url': branches_url,
'releases_url': releases_url,
'url_args': url_args,
'query_params': query_params
}
# list of common readme names ordered by preference
# (lower indices have higher priority)
_common_readme_names = [
"readme.markdown",
"readme.md",
"readme.rst",
"readme.txt",
"readme"
]
def get_readme_to_display(readmes):
"""
Process a list of readme files found in a directory
in order to find the adequate one to display.
Args:
readmes: a dict where keys are readme file names and values
are readme sha1s
Returns:
A tuple (readme_name, readme_url, readme_html)
"""
readme_name = None
readme_url = None
readme_sha1 = None
readme_html = None
lc_readmes = {k.lower(): {'orig_name': k, 'sha1': v}
for k, v in readmes.items()}
# look for readme names according to the preference order
# defined by the _common_readme_names list
for common_readme_name in _common_readme_names:
if common_readme_name in lc_readmes:
readme_name = lc_readmes[common_readme_name]['orig_name']
readme_sha1 = lc_readmes[common_readme_name]['sha1']
readme_url = reverse('browse-content-raw',
kwargs={'query_string': readme_sha1})
break
# otherwise pick the first readme like file if any
if not readme_name and len(readmes.items()) > 0:
readme_name = next(iter(readmes))
readme_sha1 = readmes[readme_name]
readme_url = reverse('browse-content-raw',
kwargs={'query_string': readme_sha1})
# convert rst README to html server side as there is
# no viable solution to perform that task client side
if readme_name and readme_name.endswith('.rst'):
cache_entry_id = 'readme_%s' % readme_sha1
cache_entry = cache.get(cache_entry_id)
if cache_entry:
readme_html = cache_entry
else:
try:
rst_doc = request_content(readme_sha1)
readme_html = pypandoc.convert_text(rst_doc['raw_data'],
'html', format='rst')
cache.set(cache_entry_id, readme_html)
except Exception:
readme_html = 'Readme bytes are not available'
return readme_name, readme_url, readme_html
def get_swh_persistent_ids(swh_objects, snapshot_context=None):
"""
Returns a list of dict containing info related to persistent
identifiers of swh objects.
Args:
swh_objects (list): a list of dict with the following keys:
* type: swh object type (content/directory/release/revision/snapshot)
* id: swh object id
snapshot_context (dict): optional parameter describing the snapshot in which
the object has been found
Returns:
list: a list of dict with the following keys:
* object_type: the swh object type (content/directory/release/revision/snapshot)
* object_icon: the swh object icon to use in HTML views
* swh_id: the computed swh object persistent identifier
* swh_id_url: the url resolving the persistent identifier
* show_options: boolean indicating if the persistent id options must
be displayed in persistent ids HTML view
""" # noqa
swh_ids = []
for swh_object in swh_objects:
swh_id = get_swh_persistent_id(swh_object['type'], swh_object['id'])
show_options = swh_object['type'] == 'content' or \
(snapshot_context and snapshot_context['origin_info'] is not None)
object_icon = mark_safe('')
if swh_object['type'] == 'directory':
object_icon = mark_safe('')
elif swh_object['type'] == 'release':
object_icon = mark_safe('')
elif swh_object['type'] == 'revision':
object_icon = mark_safe('') # noqa
elif swh_object['type'] == 'snapshot':
object_icon = mark_safe('')
swh_ids.append({
'object_type': swh_object['type'],
'object_icon': object_icon,
'swh_id': swh_id,
'swh_id_url': reverse('browse-swh-id',
kwargs={'swh_id': swh_id}),
'show_options': show_options
})
return swh_ids
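Review note on `get_readme_to_display` above: the selection step can be read in isolation from the URL reversing and the rst conversion. The following is a minimal sketch of that step only, assuming `readmes` is a dict mapping file names to sha1s, as the function body suggests. The names `COMMON_README_NAMES` and `pick_readme` are hypothetical and are not part of swh-web.

```python
# Hypothetical standalone sketch of the readme preference logic;
# it mirrors the loop over _common_readme_names but is not swh-web code.
COMMON_README_NAMES = [
    "readme.markdown",
    "readme.md",
    "readme.rst",
    "readme.txt",
    "readme",
]


def pick_readme(readmes):
    """Return (name, sha1) of the preferred readme, or (None, None)."""
    # index the candidates by lower-cased name, keeping the original name
    lowered = {name.lower(): (name, sha1) for name, sha1 in readmes.items()}
    # lower indices in the preference list win
    for candidate in COMMON_README_NAMES:
        if candidate in lowered:
            return lowered[candidate]
    # otherwise fall back to the first readme-like file, if any
    if readmes:
        name = next(iter(readmes))
        return name, readmes[name]
    return None, None


print(pick_readme({"README.txt": "a", "Readme.md": "b"}))
```

The case-insensitive index is built once so that the preference loop stays a plain dictionary lookup.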
diff --git a/swh/web/common/highlightjs.py b/swh/web/common/highlightjs.py
index bfc8b489..feb6941c 100644
--- a/swh/web/common/highlightjs.py
+++ b/swh/web/common/highlightjs.py
@@ -1,301 +1,301 @@
# Copyright (C) 2017-2018 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU Affero General Public License version 3, or any later version
# See top-level LICENSE file for more information
from pygments.lexers import (
get_all_lexers,
get_lexer_for_filename
)
# set of languages ids that can be highlighted
# by highlight.js library
_hljs_languages = set([
'1c', 'abnf', 'accesslog', 'actionscript',
'ada', 'apache', 'applescript', 'arduino',
'armasm', 'asciidoc', 'aspectj', 'autohotkey',
'autoit', 'avrasm', 'awk', 'axapta', 'bash',
'basic', 'bnf', 'brainfuck', 'cal', 'capnproto',
'ceylon', 'clean', 'clojure', 'clojure-repl',
'cmake', 'coffeescript', 'coq', 'cos', 'cpp',
'crmsh', 'crystal', 'cs', 'csp', 'css', 'dart',
'delphi', 'diff', 'django', 'd', 'dns', 'dockerfile',
'dos', 'dsconfig', 'dts', 'dust', 'ebnf', 'elixir',
'elm', 'erb', 'erlang', 'erlang-repl', 'excel',
'fix', 'flix', 'fortran', 'fsharp', 'gams', 'gauss',
'gcode', 'gherkin', 'glsl', 'go', 'golo', 'gradle',
'groovy', 'haml', 'handlebars', 'haskell', 'haxe',
'hsp', 'htmlbars', 'http', 'hy', 'inform7',
'ini', 'irpf90', 'java', 'javascript', 'jboss-cli',
'json', 'julia', 'julia-repl', 'kotlin', 'lasso',
'ldif', 'leaf', 'less', 'lisp', 'livecodeserver',
'livescript', 'llvm', 'lsl', 'lua', 'makefile',
'markdown', 'mathematica', 'matlab', 'maxima',
'mel', 'mercury', 'mipsasm', 'mizar', 'mojolicious',
'monkey', 'moonscript', 'n1ql', 'nginx', 'nimrod',
'nix', 'nsis', 'objectivec', 'ocaml', 'openscad',
'oxygene', 'parser3', 'perl', 'pf', 'php', 'pony',
'powershell', 'processing', 'profile', 'prolog',
'protobuf', 'puppet', 'purebasic', 'python', 'q',
'qml', 'rib', 'r', 'roboconf', 'routeros', 'rsl',
'ruby', 'ruleslanguage', 'rust', 'scala', 'scheme',
'scilab', 'scss', 'shell', 'smali', 'smalltalk',
'sml', 'sqf', 'sql', 'stan', 'stata', 'step21',
'stylus', 'subunit', 'swift', 'taggerscript',
'tap', 'tcl', 'tex', 'thrift', 'tp', 'twig',
'typescript', 'vala', 'vbnet', 'vbscript-html',
'vbscript', 'verilog', 'vhdl', 'vim', 'x86asm',
'xl', 'xml', 'xquery', 'yaml', 'zephir',
])
# languages aliases defined in highlight.js
_hljs_languages_aliases = {
'ado': 'stata',
'adoc': 'asciidoc',
'ahk': 'autohotkey',
'apacheconf': 'apache',
'arm': 'armasm',
'as': 'actionscript',
'atom': 'xml',
'bat': 'dos',
'bf': 'brainfuck',
'bind': 'dns',
'c': 'cpp',
'c++': 'cpp',
'capnp': 'capnproto',
'cc': 'cpp',
'clean': 'clean',
'clj': 'clojure',
'cls': 'cos',
'cmake.in': 'cmake',
'cmd': 'dos',
'coffee': 'coffeescript',
'console': 'shell',
'cos': 'cos',
'cr': 'crystal',
'craftcms': 'twig',
'crm': 'crmsh',
'csharp': 'cs',
'cson': 'coffeescript',
'dcl': 'clean',
'desktop': 'ini',
'dfm': 'delphi',
'do': 'stata',
'docker': 'dockerfile',
'dpr': 'delphi',
'dst': 'dust',
'el': 'lisp',
'erl': 'erlang',
'f90': 'fortran',
'f95': 'fortran',
'feature': 'gherkin',
'freepascal': 'delphi',
'fs': 'fsharp',
'gemspec': 'ruby',
'gms': 'gams',
'golang': 'go',
'graph': 'roboconf',
'gss': 'gauss',
'gyp': 'python',
'h': 'cpp',
'h++': 'cpp',
'hbs': 'handlebars',
'hpp': 'cpp',
'hs': 'haskell',
'html': 'xml',
'html.handlebars': 'handlebars',
'html.hbs': 'handlebars',
'https': 'http',
'hx': 'haxe',
'hylang': 'hy',
'i7': 'inform7',
'iced': 'coffeescript',
'icl': 'clean',
'instances': 'roboconf',
'ipynb': 'json',
'irb': 'ruby',
'jinja': 'django',
'js': 'javascript',
'jsp': 'java',
'jsx': 'javascript',
'k': 'q',
'kdb': 'q',
'lassoscript': 'lasso',
'lazarus': 'delphi',
'lfm': 'delphi',
'lpr': 'delphi',
'ls': 'livescript',
'm': 'objectivec',
'mak': 'makefile',
'md': 'markdown',
'mikrotik': 'routeros',
'mips': 'mipsasm',
'mk': 'makefile',
'mkd': 'markdown',
'mkdown': 'markdown',
'markdown': 'markdown',
'ml': 'ocaml',
'mm': 'objectivec',
'mma': 'mathematica',
'moo': 'mercury',
'moon': 'moonscript',
'nc': 'gcode',
'nginxconf': 'nginx',
'nim': 'nimrod',
'nixos': 'nix',
'obj-c': 'objectivec',
'objc': 'objectivec',
'osascript': 'applescript',
'p21': 'step21',
'pas': 'delphi',
'pascal': 'delphi',
'patch': 'diff',
'pb': 'purebasic',
'pbi': 'purebasic',
'pcmk': 'crmsh',
'pf.conf': 'pf',
'php3': 'php',
'php4': 'php',
'php5': 'php',
'php6': 'php',
'pl': 'perl',
'plist': 'xml',
'pm': 'perl',
'podspec': 'ruby',
'pp': 'puppet',
'ps': 'powershell',
'py': 'python',
'qrc': 'xml',
'qs': 'javascript',
'qt': 'qml',
'rb': 'ruby',
'routeros': 'routeros',
'rs': 'rust',
'rst': 'nohighlight-swh',
'rss': 'xml',
'ru': 'ruby',
'scad': 'openscad',
'sci': 'scilab',
'scpt': 'applescript',
'sh': 'bash',
'smali': 'smali',
'sqf': 'sqf',
'st': 'smalltalk',
'step': 'step21',
'stp': 'step21',
'styl': 'stylus',
'sv': 'verilog',
'svh': 'verilog',
'tao': 'xl',
'thor': 'ruby',
'tk': 'tcl',
'toml': 'ini',
'ui': 'xml',
'v': 'verilog',
'vb': 'vbnet',
'vbs': 'vbscript',
'wildfly-cli': 'jboss-cli',
'xhtml': 'xml',
'xjb': 'xml',
'xls': 'excel',
'xlsx': 'excel',
'xpath': 'xquery',
'xq': 'xquery',
'xsd': 'xml',
'xsl': 'xml',
'yaml': 'yaml',
'yml': 'yaml',
'zep': 'zephir',
'zone': 'dns',
'zsh': 'bash'
}
# dictionary mapping pygment lexers to hljs languages
_pygments_lexer_to_hljs_language = {}
# dictionary mapping mime types to hljs languages
_mime_type_to_hljs_language = {
'text/x-c': 'cpp',
'text/x-c++': 'cpp',
'text/x-msdos-batch': 'dos',
'text/x-lisp': 'lisp',
'text/x-shellscript': 'bash',
}
# function to fill the above dictionaries
def _init_pygments_to_hljs_map():
if len(_pygments_lexer_to_hljs_language) == 0:
for lexer in get_all_lexers():
lexer_name = lexer[0]
lang_aliases = lexer[1]
lang_mime_types = lexer[3]
lang = None
for lang_alias in lang_aliases:
if lang_alias in _hljs_languages:
lang = lang_alias
_pygments_lexer_to_hljs_language[lexer_name] = lang_alias
break
if lang:
for lang_mime_type in lang_mime_types:
_mime_type_to_hljs_language[lang_mime_type] = lang
def get_hljs_language_from_filename(filename):
"""Function that tries to associate a language supported by highlight.js
from a filename.
Args:
filename: input filename
Returns:
- highlight.js language id or None if no correspondance has been found
+ highlight.js language id or None if no correspondence has been found
"""
_init_pygments_to_hljs_map()
if filename:
exts = filename.lower().split('.')
# check if file extension matches an hljs language
# also handle .ext.in cases
for ext in reversed(exts[-2:]):
if ext in _hljs_languages:
return ext
if ext in _hljs_languages_aliases:
return _hljs_languages_aliases[ext]
# otherwise use Pygments language database
lexer = None
# try to find a Pygment lexer
try:
lexer = get_lexer_for_filename(filename)
except Exception:
pass
- # if there is a correspondance between the lexer and an hljs
+ # if there is a correspondence between the lexer and an hljs
# language, return it
if lexer and lexer.name in _pygments_lexer_to_hljs_language:
return _pygments_lexer_to_hljs_language[lexer.name]
# otherwise, try to find a match between the file extensions
# associated to the lexer and the hljs language aliases
if lexer:
exts = [ext.replace('*.', '') for ext in lexer.filenames]
for ext in exts:
if ext in _hljs_languages_aliases:
return _hljs_languages_aliases[ext]
return None
def get_hljs_language_from_mime_type(mime_type):
"""Function that tries to associate a language supported by highlight.js
from a mime type.
Args:
mime_type: input mime type
Returns:
- highlight.js language id or None if no correspondance has been found
+ highlight.js language id or None if no correspondence has been found
"""
_init_pygments_to_hljs_map()
if mime_type and mime_type in _mime_type_to_hljs_language:
return _mime_type_to_hljs_language[mime_type]
return None
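Review note on `get_hljs_language_from_filename` above: the first stage, before any Pygments lookup, only inspects the last two dot-separated parts of the file name, which is how the `.ext.in` case is covered. A minimal sketch of that stage follows, assuming tiny stand-in tables. `HLJS_LANGUAGES`, `HLJS_ALIASES` and `language_from_extension` are hypothetical names used for illustration.

```python
# Hypothetical standalone sketch of the extension-matching stage;
# the tables below are small stand-ins for the real sets in the module.
HLJS_LANGUAGES = {"cmake", "python", "yaml"}
HLJS_ALIASES = {"py": "python", "yml": "yaml", "h": "cpp"}


def language_from_extension(filename):
    """Return an hljs language id guessed from the file name, or None."""
    if not filename:
        return None
    parts = filename.lower().split(".")
    # look at the last part first, then the one before it (".ext.in" case)
    for ext in reversed(parts[-2:]):
        if ext in HLJS_LANGUAGES:
            return ext
        if ext in HLJS_ALIASES:
            return HLJS_ALIASES[ext]
    return None


print(language_from_extension("setup.py"))
print(language_from_extension("config.cmake.in"))
```

Because the split also yields the base name for single-dot files, a name such as `yaml.txt` would match on `yaml`; the sketch keeps that behaviour rather than correcting it.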
diff --git a/swh/web/settings/common.py b/swh/web/settings/common.py
index ef78f66f..d0fbf47f 100644
--- a/swh/web/settings/common.py
+++ b/swh/web/settings/common.py
@@ -1,225 +1,225 @@
# Copyright (C) 2017-2018 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU Affero General Public License version 3, or any later version
# See top-level LICENSE file for more information
"""
Django common settings for swh-web.
"""
import os
from swh.web.config import get_config
swh_web_config = get_config()
# Build paths inside the project like this: os.path.join(BASE_DIR, ...)
PROJECT_DIR = os.path.dirname(os.path.abspath(__file__))
# Quick-start development settings - unsuitable for production
# See https://docs.djangoproject.com/en/1.11/howto/deployment/checklist/
# SECURITY WARNING: keep the secret key used in production secret!
SECRET_KEY = swh_web_config['secret_key']
# SECURITY WARNING: don't run with debug turned on in production!
DEBUG = swh_web_config['debug']
DEBUG_PROPAGATE_EXCEPTIONS = swh_web_config['debug']
ALLOWED_HOSTS = ['127.0.0.1', 'localhost'] + swh_web_config['allowed_hosts']
# Application definition
INSTALLED_APPS = [
'django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',
'rest_framework',
'swh.web.api',
'swh.web.browse',
'webpack_loader',
'django_js_reverse'
]
MIDDLEWARE = [
'django.middleware.security.SecurityMiddleware',
'django.contrib.sessions.middleware.SessionMiddleware',
'django.middleware.common.CommonMiddleware',
'django.middleware.csrf.CsrfViewMiddleware',
'django.contrib.auth.middleware.AuthenticationMiddleware',
'django.contrib.messages.middleware.MessageMiddleware',
'django.middleware.clickjacking.XFrameOptionsMiddleware',
'swh.web.common.middlewares.ThrottlingHeadersMiddleware'
]
# Compress all assets (static ones and dynamically generated html)
-# served by django in a local development environement context.
+# served by django in a local development environment context.
# In a production environment, assets compression will be directly
# handled by web servers like apache or nginx.
if swh_web_config['debug']:
MIDDLEWARE.insert(0, 'django.middleware.gzip.GZipMiddleware')
ROOT_URLCONF = 'swh.web.urls'
TEMPLATES = [
{
'BACKEND': 'django.template.backends.django.DjangoTemplates',
'DIRS': [os.path.join(PROJECT_DIR, "../templates")],
'APP_DIRS': True,
'OPTIONS': {
'context_processors': [
'django.template.context_processors.debug',
'django.template.context_processors.request',
'django.contrib.auth.context_processors.auth',
'django.contrib.messages.context_processors.messages',
],
'libraries': {
'swh_templatetags': 'swh.web.common.swh_templatetags',
},
},
},
]
WSGI_APPLICATION = 'swh.web.wsgi.application'
# Database
# https://docs.djangoproject.com/en/1.11/ref/settings/#databases
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.sqlite3',
'NAME': os.path.join(PROJECT_DIR, 'db.sqlite3'),
}
}
# Password validation
# https://docs.djangoproject.com/en/1.11/ref/settings/#auth-password-validators
AUTH_PASSWORD_VALIDATORS = [
{
'NAME': 'django.contrib.auth.password_validation.UserAttributeSimilarityValidator', # noqa
},
{
'NAME': 'django.contrib.auth.password_validation.MinimumLengthValidator', # noqa
},
{
'NAME': 'django.contrib.auth.password_validation.CommonPasswordValidator', # noqa
},
{
'NAME': 'django.contrib.auth.password_validation.NumericPasswordValidator', # noqa
},
]
# Internationalization
# https://docs.djangoproject.com/en/1.11/topics/i18n/
LANGUAGE_CODE = 'en-us'
TIME_ZONE = 'UTC'
USE_I18N = True
USE_L10N = True
USE_TZ = True
# Static files (CSS, JavaScript, Images)
# https://docs.djangoproject.com/en/1.11/howto/static-files/
STATIC_URL = '/static/'
STATICFILES_DIRS = [
os.path.join(PROJECT_DIR, "../static")
]
INTERNAL_IPS = ['127.0.0.1']
throttle_rates = {}
http_requests = ['GET', 'HEAD', 'POST', 'PUT', 'DELETE', 'OPTIONS', 'PATCH']
throttling = swh_web_config['throttling']
for limiter_scope, limiter_conf in throttling['scopes'].items():
if 'default' in limiter_conf['limiter_rate']:
throttle_rates[limiter_scope] = limiter_conf['limiter_rate']['default']
# for backward compatibility
else:
throttle_rates[limiter_scope] = limiter_conf['limiter_rate']
# register sub scopes specific for HTTP request types
for http_request in http_requests:
if http_request in limiter_conf['limiter_rate']:
throttle_rates[limiter_scope + '_' + http_request.lower()] = \
limiter_conf['limiter_rate'][http_request]
REST_FRAMEWORK = {
'DEFAULT_RENDERER_CLASSES': (
'rest_framework.renderers.JSONRenderer',
'swh.web.api.renderers.YAMLRenderer',
'rest_framework.renderers.TemplateHTMLRenderer'
),
'DEFAULT_THROTTLE_CLASSES': (
'swh.web.common.throttling.SwhWebRateThrottle',
),
'DEFAULT_THROTTLE_RATES': throttle_rates
}
LOGGING = {
'version': 1,
'disable_existing_loggers': False,
'filters': {
'require_debug_false': {
'()': 'django.utils.log.RequireDebugFalse',
},
'require_debug_true': {
'()': 'django.utils.log.RequireDebugTrue',
},
},
'formatters': {
'verbose': {
'format': '[%(asctime)s] [%(levelname)s] %(request)s %(status_code)s', # noqa
'datefmt': "%d/%b/%Y %H:%M:%S"
},
},
'handlers': {
'console': {
'level': 'DEBUG',
'filters': ['require_debug_true'],
'class': 'logging.StreamHandler',
},
'file': {
'level': 'INFO',
'filters': ['require_debug_false'],
'class': 'logging.FileHandler',
'filename': os.path.join(swh_web_config['log_dir'], 'swh-web.log'),
'formatter': 'verbose'
},
},
'loggers': {
'django': {
'handlers': ['console', 'file'],
'level': 'DEBUG' if DEBUG else 'INFO',
'propagate': True,
},
'django.request': {
'handlers': ['file'],
'level': 'DEBUG' if DEBUG else 'INFO',
'propagate': False,
}
},
}
WEBPACK_LOADER = {
'DEFAULT': {
'CACHE': not DEBUG,
'BUNDLE_DIR_NAME': './',
'STATS_FILE': os.path.join(PROJECT_DIR, '../static/webpack-stats.json'), # noqa
'POLL_INTERVAL': 0.1,
'TIMEOUT': None,
'IGNORE': [r'.+\.hot-update.js', r'.+\.map']
}
}
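Review note on the throttling loop in `swh/web/settings/common.py` above: each scope contributes one default rate plus an optional sub scope per HTTP method. The sketch below restates that loop as a pure function so the resulting keys are easy to see. `build_throttle_rates` is a hypothetical name, and the explicit `isinstance` check is a small deviation from the original, which relies on `in` for both dicts and legacy string rates.

```python
# Hypothetical standalone sketch of the throttle-rate flattening;
# it is not the settings module itself.
HTTP_METHODS = ["GET", "HEAD", "POST", "PUT", "DELETE", "OPTIONS", "PATCH"]


def build_throttle_rates(scopes):
    """Flatten per-scope limiter configuration into a rates dict."""
    rates = {}
    for scope, conf in scopes.items():
        limiter_rate = conf["limiter_rate"]
        if not isinstance(limiter_rate, dict):
            # backward compatibility: the rate is given directly as a string
            rates[scope] = limiter_rate
            continue
        if "default" in limiter_rate:
            rates[scope] = limiter_rate["default"]
        # register sub scopes specific to HTTP request types
        for method in HTTP_METHODS:
            if method in limiter_rate:
                rates[scope + "_" + method.lower()] = limiter_rate[method]
    return rates


example_scopes = {
    "swh_api": {"limiter_rate": {"default": "120/h", "POST": "10/h"}},
    "legacy": {"limiter_rate": "60/m"},
}
print(build_throttle_rates(example_scopes))
```

The returned dict has the shape expected by the `DEFAULT_THROTTLE_RATES` entry of `REST_FRAMEWORK`, with keys such as `swh_api` and `swh_api_post`.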
diff --git a/swh/web/tests/api/test_apidoc.py b/swh/web/tests/api/test_apidoc.py
index 1c6f571d..21ea26f7 100644
--- a/swh/web/tests/api/test_apidoc.py
+++ b/swh/web/tests/api/test_apidoc.py
@@ -1,288 +1,288 @@
# Copyright (C) 2015-2018 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU Affero General Public License version 3, or any later version
# See top-level LICENSE file for more information
from nose.tools import istest, nottest
from rest_framework.test import APITestCase
from rest_framework.response import Response
from swh.web.api.apidoc import api_doc, _parse_httpdomain_doc
from swh.web.api.apiurls import api_route
from swh.web.tests.testcase import SWHWebTestCase
# flake8: noqa
httpdomain_doc = """
.. http:get:: /api/1/revision/(sha1_git)/
Get information about a revision in the SWH archive.
Revisions are identified by *sha1* checksums, compatible with Git commit identifiers.
See :func:`swh.model.identifiers.revision_identifier` in our data model module for details
about how they are computed.
:param string sha1_git: hexadecimal representation of the revision *sha1_git* identifier
:reqheader Accept: the requested response content type,
either *application/json* (default) or *application/yaml*
:resheader Content-Type: this depends on :http:header:`Accept` header of request
:>json object author: information about the author of the revision
:>json string author_url: link to :http:get:`/api/1/person/(person_id)/` to get
information about the author of the revision
:>json object committer: information about the committer of the revision
:>json string committer_url: link to :http:get:`/api/1/person/(person_id)/` to get
information about the committer of the revision
:>json string committer_date: ISO representation of the commit date (in UTC)
:>json string date: ISO representation of the revision date (in UTC)
:>json string directory: the unique identifier that revision points to
:>json string directory_url: link to :http:get:`/api/1/directory/(sha1_git)/[(path)/]`
to get information about the directory associated to the revision
:>json string id: the revision unique identifier
:>json boolean merge: whether or not the revision corresponds to a merge commit
:>json string message: the message associated to the revision
:>json array parents: the parents of the revision, i.e. the previous revisions
that head directly to it, each entry of that array contains an unique parent
revision identifier but also a link to :http:get:`/api/1/revision/(sha1_git)/`
- to get more informations about it
+ to get more information about it
:>json string type: the type of the revision
**Allowed HTTP Methods:** :http:method:`get`, :http:method:`head`, :http:method:`options`
:statuscode 200: no error
:statuscode 400: an invalid *sha1_git* value has been provided
:statuscode 404: requested revision can not be found in the SWH archive
**Request:**
.. parsed-literal::
$ curl -i :swh_web_api:`revision/aafb16d69fd30ff58afdd69036a26047f3aebdc6/`
"""
class APIDocTestCase(SWHWebTestCase, APITestCase):
@istest
def apidoc_nodoc_failure(self):
with self.assertRaises(Exception):
@api_doc('/my/nodoc/url/')
def apidoc_nodoc_tester(request, arga=0, argb=0):
return Response(arga + argb)
@staticmethod
@api_route(r'/some/(?P<myarg>[0-9]+)/(?P<myotherarg>[0-9]+)/',
'some-doc-route')
@api_doc('/some/doc/route/')
@nottest
def apidoc_route_tester(request, myarg, myotherarg, akw=0):
"""
Sample doc
"""
return {'result': int(myarg) + int(myotherarg) + akw}
@istest
def apidoc_route_doc(self):
# when
rv = self.client.get('/api/1/some/doc/route/')
# then
self.assertEqual(rv.status_code, 200)
self.assertTemplateUsed('api/apidoc.html')
@istest
def apidoc_route_fn(self):
# when
rv = self.client.get('/api/1/some/1/1/')
# then
self.assertEqual(rv.status_code, 200)
@staticmethod
@api_route(r'/some/full/(?P<myarg>[0-9]+)/(?P<myotherarg>[0-9]+)/',
'some-complete-doc-route')
@api_doc('/some/complete/doc/route/')
@nottest
def apidoc_full_stack_tester(request, myarg, myotherarg, akw=0):
"""
Sample doc
"""
return {'result': int(myarg) + int(myotherarg) + akw}
@istest
def apidoc_full_stack_doc(self):
# when
rv = self.client.get('/api/1/some/complete/doc/route/')
# then
self.assertEqual(rv.status_code, 200)
self.assertTemplateUsed('api/apidoc.html')
@istest
def apidoc_full_stack_fn(self):
# when
rv = self.client.get('/api/1/some/full/1/1/')
# then
self.assertEqual(rv.status_code, 200)
@istest
def test_api_doc_parse_httpdomain(self):
doc_data = {
'description': '',
'urls': [],
'args': [],
'params': [],
'resheaders': [],
'reqheaders': [],
'return_type': '',
'returns': [],
'status_codes': [],
'examples': []
}
_parse_httpdomain_doc(httpdomain_doc, doc_data)
expected_urls = [{
'rule': '/api/1/revision/ **\\(sha1_git\\)** /',
'methods': ['GET', 'HEAD', 'OPTIONS']
}]
self.assertIn('urls', doc_data)
self.assertEqual(doc_data['urls'], expected_urls)
expected_description = 'Get information about a revision in the SWH archive. \
Revisions are identified by *sha1* checksums, compatible with Git commit \
identifiers. See **swh.model.identifiers.revision_identifier** in our data \
model module for details about how they are computed.'
self.assertIn('description', doc_data)
self.assertEqual(doc_data['description'], expected_description)
expected_args = [{
'name': 'sha1_git',
'type': 'string',
'doc': 'hexadecimal representation of the revision *sha1_git* identifier'
}]
self.assertIn('args', doc_data)
self.assertEqual(doc_data['args'], expected_args)
expected_params = []
self.assertIn('params', doc_data)
self.assertEqual(doc_data['params'], expected_params)
expected_reqheaders = [{
'doc': 'the requested response content type, either *application/json* or *application/yaml*',
'name': 'Accept'
}]
self.assertIn('reqheaders', doc_data)
self.assertEqual(doc_data['reqheaders'], expected_reqheaders)
expected_resheaders = [{
'doc': 'this depends on **Accept** header of request',
'name': 'Content-Type'
}]
self.assertIn('resheaders', doc_data)
self.assertEqual(doc_data['resheaders'], expected_resheaders)
expected_statuscodes = [
{
'code': '200',
'doc': 'no error'
},
{
'code': '400',
'doc': 'an invalid *sha1_git* value has been provided'
},
{
'code': '404',
'doc': 'requested revision can not be found in the SWH archive'
}
]
self.assertIn('status_codes', doc_data)
self.assertEqual(doc_data['status_codes'], expected_statuscodes)
expected_return_type = 'object'
self.assertIn('return_type', doc_data)
self.assertEqual(doc_data['return_type'], expected_return_type)
expected_returns = [
{
'name': 'author',
'type': 'object',
'doc': 'information about the author of the revision'
},
{
'name': 'author_url',
'type': 'string',
'doc': 'link to ``_ to get information about the author of the revision'
},
{
'name': 'committer',
'type': 'object',
'doc': 'information about the committer of the revision'
},
{
'name': 'committer_url',
'type': 'string',
'doc': 'link to ``_ to get information about the committer of the revision'
},
{
'name': 'committer_date',
'type': 'string',
'doc': 'ISO representation of the commit date (in UTC)'
},
{
'name': 'date',
'type': 'string',
'doc': 'ISO representation of the revision date (in UTC)'
},
{
'name': 'directory',
'type': 'string',
'doc': 'the unique identifier that revision points to'
},
{
'name': 'directory_url',
'type': 'string',
'doc': 'link to ``_ to get information about the directory associated to the revision'
},
{
'name': 'id',
'type': 'string',
'doc': 'the revision unique identifier'
},
{
'name': 'merge',
'type': 'boolean',
'doc': 'whether or not the revision corresponds to a merge commit'
},
{
'name': 'message',
'type': 'string',
'doc': 'the message associated to the revision'
},
{
'name': 'parents',
'type': 'array',
- 'doc': 'the parents of the revision, i.e. the previous revisions that head directly to it, each entry of that array contains an unique parent revision identifier but also a link to ``_ to get more informations about it'
+ 'doc': 'the parents of the revision, i.e. the previous revisions that head directly to it, each entry of that array contains an unique parent revision identifier but also a link to ``_ to get more information about it'
},
{
'name': 'type',
'type': 'string',
'doc': 'the type of the revision'
}
]
self.assertIn('returns', doc_data)
self.assertEqual(doc_data['returns'], expected_returns)
expected_examples = ['/api/1/revision/aafb16d69fd30ff58afdd69036a26047f3aebdc6/']
self.assertIn('examples', doc_data)
self.assertEqual(doc_data['examples'], expected_examples)
diff --git a/swh/web/tests/testcase.py b/swh/web/tests/testcase.py
index e3169c3a..e28333ad 100644
--- a/swh/web/tests/testcase.py
+++ b/swh/web/tests/testcase.py
@@ -1,70 +1,70 @@
# Copyright (C) 2015-2018 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU Affero General Public License version 3, or any later version
# See top-level LICENSE file for more information
# Functions defined here are NOT DESIGNED FOR PRODUCTION
from django.core.cache import cache
from django.test import TestCase
from swh.storage.api.client import RemoteStorage as Storage
from swh.web import config
# Because the Storage's __init__ function has side effects at startup...
class RemoteStorageAdapter(Storage):
def __init__(self, base_url):
self.base_url = base_url
def _init_mock_storage(base_url='https://somewhere.org:4321'):
- """Instanciate a remote storage whose goal is to be mocked in a test
+ """Instantiate a remote storage whose goal is to be mocked in a test
context.
NOT FOR PRODUCTION
Returns:
An instance of swh.storage.api.client.RemoteStorage destined to be
mocked (it does not do any rest call)
"""
return RemoteStorageAdapter(base_url) # destined to be used as mock
def create_config(base_url='https://somewhere.org:4321'):
"""Function to initiate swh-web config with storage designed to be mocked.
Returns:
dict containing swh-web config for tests
NOT FOR PRODUCTION
"""
storage = _init_mock_storage(base_url)
swh_config = config.get_config()
# inject the mock data
swh_config.update({'storage': storage})
return swh_config
class SWHWebTestCase(TestCase):
"""Testing API class.
"""
@classmethod
def setUpClass(cls):
super(SWHWebTestCase, cls).setUpClass()
cls.test_config = create_config()
cls.maxDiff = None
@classmethod
def storage(cls):
return cls.test_config['storage']
def setUp(self):
cache.clear()