diff --git a/docs/uri-scheme-browse.rst b/docs/uri-scheme-browse.rst index ed2fc260..2f5cfaea 100644 --- a/docs/uri-scheme-browse.rst +++ b/docs/uri-scheme-browse.rst @@ -1,105 +1,98 @@ URI scheme for SWH Web Browse application ========================================= This web application aims to provide HTML views to easily navigate in the SWH archive, thus it needs to be reached from a web browser. If you intend to query the SWH archive programmatically through any HTTP client, please refer to the :ref:`swh-web-api-urls` section instead. Context-independent browsing ---------------------------- Context-independent URLs provide information about SWH objects (e.g., revisions, directories, contents, person, ...), independently of the contexts where they have been found (e.g., specific repositories, branches, commits, ...). The following endpoints are the same of the API case (see below), and just render the corresponding information for user consumption. Where hyperlinks are created, they always point to other context-independent user URLs: - * :http:get:`/browse/content/[(algo_hash):](hash)/`: Display a content - * :http:get:`/browse/content/[(algo_hash):](hash)/raw/`: Get / Download content data - * :http:get:`/browse/directory/(sha1_git)/[(path)/]`: Browse the content of a directory - * :http:get:`/browse/origin/(origin_id)/`: Information on origin - * :http:get:`/browse/person/(person_id)/`: Information on person - * :http:get:`/browse/revision/(sha1_git)/`: Browse revision - * :http:get:`/browse/revision/(sha1_git)/log/`: Browse history log heading to revision + * :http:get:`/browse/content/[(algo_hash):](hash)/`: Display a SWH content + * :http:get:`/browse/content/[(algo_hash):](hash)/raw/`: Get / Download SWH content raw data + * :http:get:`/browse/directory/(sha1_git)/[(path)/]`: Browse the content of a SWH directory + * :http:get:`/browse/origin/(origin_id)/`: Information on a SWH origin + * :http:get:`/browse/person/(person_id)/`: Information on a SWH 
person + * :http:get:`/browse/revision/(sha1_git)/`: Browse a SWH revision + * :http:get:`/browse/revision/(sha1_git)/log/`: Browse history log heading to a SWH revision Context-dependent browsing -------------------------- Context-dependent URLs provide information about SWH objects, limited to -specific contexts where the objects have been found. +specific contexts where the objects have been found. For instance, instead of having to specify a (root) revision by *sha1_git*, users might want to specify a place and a time. In SWH a "place" is an origin, with an optional branch name; a "time" is a timestamp at which some place has been observed by SWH crawlers. Wherever a revision context is expected in a path (i.e., a **/browse/revision/(sha1_git)/** path fragment) we can put in its stead a path fragment -of the form /origin/[/branch/][/ts//]. Such a -fragment is resolved, internally by the SWH archive, to a SHA1_GIT as follows: - -- [if is absent] look for the most recent crawl of origin -- [if is given] look for the most recent crawl of origin - whose timestamp is <= -- [if is given] look for the branch -- [if is absent] look for branch "master" -- return the pointed by the chosen branch +of the form **/origin/(origin_id)[/visit/(visit_id)/][?branch=(branch)]** or +**/origin/(origin_id)[/ts/(timestamp)/][?branch=(branch)]**.
Such a +fragment is resolved, internally by the SWH archive, to a *sha1_git* as follows: + +- if *timestamp* or *visit_id* is absent: look for the most recent crawl of origin *origin_id* +- if *visit_id* is given: look for crawl with id *visit_id* of origin *origin_id* +- if *timestamp* is given: look for the closest crawl of origin *origin_id* + from timestamp *timestamp* +- if *branch* is given as a query parameter: look for the branch *branch* +- if *branch* is absent: look for branch "HEAD" or "master" +- return the *sha1_git* pointed by the chosen branch The already mentioned URLs for revision contexts can therefore be alternatively specified by users as: -* /revision/origin/[/branch/][/ts/]/ -* /revision/origin/[/branch/][/ts/]/history// -* /revision/origin/[/branch/][/ts/]/directory/[] -* /revision/origin/[/branch/][/ts/]/history//directory/[] +* :http:get:`/browse/origin/(origin_id)/directory/[(path)/]` +* :http:get:`/browse/origin/(origin_id)/visit/(visit_id)/directory/[(path)/]` +* :http:get:`/browse/origin/(origin_id)/ts/(timestamp)/directory/[(path)/]` +* :http:get:`/browse/origin/(origin_id)/content/(path)/` +* :http:get:`/browse/origin/(origin_id)/visit/(visit_id)/content/(path)/` +* :http:get:`/browse/origin/(origin_id)/ts/(timestamp)/content/(path)/` +* :http:get:`/browse/origin/(origin_id)/log/` +* :http:get:`/browse/origin/(origin_id)/visit/(visit_id)/log/` +* :http:get:`/browse/origin/(origin_id)/ts/(timestamp)/log/` Typing: -- s are given as integer identifiers, pointing into the origin table. +- *origin_id* is given as an integer identifier, pointing into the origin table. There will be separate mechanisms for finding origins by other means (e.g., - URLs, metadata, etc). Once an origin is found, it can be used by ID into the - above URL schemes + URLs, metadata, etc). 
-- names are given as per the corresponding VCS (e.g., Git) and might - therefore contains characters that are either invalid in URLs, or that might - make the above URL schemes ambiguous (e.g., '/'). All those characters will - need to be URL-escaped. (e.g., '/' will become '%2F') +- *branch* name is given as per the corresponding VCS (e.g., Git) as + a query parameter to the requested URL. -- s are given in a format as liberal as possible, to uphold the - principle of least surprise. At the very minimum it should be possible to +- *timestamp* is given in a format as liberal as possible, to uphold the + principle of least surprise. At the very minimum it is possible to enter timestamps as: + - Unix epoch timestamp (see for instance the output of `date +%s`) - ISO 8601 timestamps (see for instance the output of `date -I`, `date -Is`) - YYYY[MM[DD[HH[MM[SS]]]]] ad-hoc format - - Implementation proposal: use Python dateutil's parser and be done with it - https://dateutil.readthedocs.org/en/latest/parser.html . Note: that dateutil - does *not* allow to use classical UNIX timestamps expressed as seconds since - the epoch (i.e., `date +%s` output). We will need to single case them. - - The same escaping considerations given for apply. - -Notes: - -- Differently from , s are still specified as SHA1 and - cannot be specified a origin/branch/ts triples. This is to preserve some URL - sanity. + - YYYY[-MM[-DD[ HH:[MM:[SS:]]]]] ad-hoc format SWH Browse Urls --------------- .. include:: uri-scheme-browse-content.rst .. include:: uri-scheme-browse-directory.rst .. include:: uri-scheme-browse-origin.rst .. include:: uri-scheme-browse-person.rst ..
include:: uri-scheme-browse-revision.rst diff --git a/swh/web/api/views/utils.py b/swh/web/api/views/utils.py index ef422a20..0f761fef 100644 --- a/swh/web/api/views/utils.py +++ b/swh/web/api/views/utils.py @@ -1,90 +1,91 @@ # Copyright (C) 2015-2017 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from rest_framework.response import Response from rest_framework.decorators import api_view from types import GeneratorType from swh.web.common.exc import NotFoundExc from swh.web.api.apiurls import APIUrls, api_route # canned doc string snippets that are used in several doc strings doc_arg_content_id = """A "[hash_type:]hash" content identifier, where hash_type is one of "sha1" (the default), "sha1_git", "sha256", and hash is a checksum obtained with the hash_type hashing algorithm.""" doc_arg_last_elt = 'element to start listing from, for pagination purposes' doc_arg_per_page = 'number of elements to list, for pagination purposes' doc_exc_bad_id = 'syntax error in the given identifier(s)' doc_exc_id_not_found = 'no object matching the given criteria could be found' doc_ret_revision_meta = 'metadata of the revision identified by sha1_git' doc_ret_revision_log = """list of dictionaries representing the metadata of each revision found in the commit log heading to revision sha1_git. 
For each commit at least the following information are returned: author/committer, authoring/commit timestamps, revision id, commit message, parent (i.e., immediately preceding) commits, "root" directory id.""" doc_header_link = """indicates that a subsequent result page is available, pointing to it""" def api_lookup(lookup_fn, *args, notfound_msg='Object not found', enrich_fn=lambda x: x): - """Capture a redundant behavior of: - - looking up the backend with a criteria (be it an identifier or checksum) - passed to the function lookup_fn - - if nothing is found, raise an NotFoundExc exception with error - message notfound_msg. - - Otherwise if something is returned: - - either as list, map or generator, map the enrich_fn function to it - and return the resulting data structure as list. - - either as dict and pass to enrich_fn and return the dict enriched. + """ + Capture a redundant behavior of: + - looking up the backend with a criteria (be it an identifier or + checksum) passed to the function lookup_fn + - if nothing is found, raise an NotFoundExc exception with error + message notfound_msg. + - Otherwise if something is returned: + - either as list, map or generator, map the enrich_fn function to + it and return the resulting data structure as list. + - either as dict and pass to enrich_fn and return the dict + enriched. Args: - - criteria: discriminating criteria to lookup - lookup_fn: function expects one criteria and optional supplementary - *args. + \*args. - notfound_msg: if nothing matching the criteria is found, - raise NotFoundExc with this error message. + raise NotFoundExc with this error message. - enrich_fn: Function to use to enrich the result returned by - lookup_fn. Default to the identity function if not provided. - - *args: supplementary arguments to pass to lookup_fn. + lookup_fn. Default to the identity function if not provided. + - \*args: supplementary arguments to pass to lookup_fn. Raises: NotFoundExc or whatever `lookup_fn` raises.
""" res = lookup_fn(*args) if not res: raise NotFoundExc(notfound_msg) if isinstance(res, (map, list, GeneratorType)): return [enrich_fn(x) for x in res] return enrich_fn(res) @api_view(['GET', 'HEAD']) def api_home(request): return Response({}, template_name='api.html') APIUrls.add_url_pattern(r'^$', api_home, view_name='api-homepage') @api_route(r'/', 'endpoints') def api_endpoints(request): """Display the list of opened api endpoints. """ routes = APIUrls.get_app_endpoints().copy() for route, doc in routes.items(): doc['doc_intro'] = doc['docstring'].split('\n\n')[0] # Return a list of routes with consistent ordering env = { 'doc_routes': sorted(routes.items()) } return Response(env, template_name="api-endpoints.html")
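The following is a minimal, self-contained sketch of how `api_lookup` behaves for the result shapes described in its docstring. The body of `api_lookup` is copied from the diff above. Everything else is an assumption made for illustration: `NotFoundExc` is redefined locally as a stand-in for `swh.web.common.exc.NotFoundExc`, and the in-memory backend `_REVISIONS`, together with `lookup_revision`, `lookup_revision_log` and `add_url`, are hypothetical names that do not come from the SWH code base.

```python
from types import GeneratorType


class NotFoundExc(Exception):
    """Local stand-in for swh.web.common.exc.NotFoundExc (assumption)."""


def api_lookup(lookup_fn, *args, notfound_msg='Object not found',
               enrich_fn=lambda x: x):
    # Body copied from the diff above.
    res = lookup_fn(*args)
    if not res:
        raise NotFoundExc(notfound_msg)
    if isinstance(res, (map, list, GeneratorType)):
        return [enrich_fn(x) for x in res]
    return enrich_fn(res)


# Hypothetical in-memory backend, used only for this illustration.
_REVISIONS = {'abc123': {'id': 'abc123', 'message': 'initial commit'}}


def lookup_revision(sha1_git):
    # Returns a dict, or None when the revision is unknown.
    return _REVISIONS.get(sha1_git)


def lookup_revision_log(sha1_git):
    # Returns a generator, possibly empty.
    return (rev for key, rev in _REVISIONS.items() if key == sha1_git)


def add_url(rev):
    # Hypothetical enrich function: attach a browse URL to the revision.
    return dict(rev, url='/browse/revision/%s/' % rev['id'])


# A dict result is passed through enrich_fn and returned as a dict.
single = api_lookup(lookup_revision, 'abc123', enrich_fn=add_url)

# A generator result is enriched element by element and returned as a list.
listing = api_lookup(lookup_revision_log, 'abc123', enrich_fn=add_url)

# A falsy result (here None) raises NotFoundExc with the given message.
try:
    api_lookup(lookup_revision, 'missing', notfound_msg='Revision not found')
except NotFoundExc as exc:
    error = str(exc)

# A generator object is always truthy, so an empty one yields an empty list.
empty = api_lookup(lookup_revision_log, 'missing')
```

One consequence worth noting: because the emptiness check is `if not res`, a lookup function that returns an empty generator or `map` object does not trigger `NotFoundExc`, since those objects are truthy even when they produce no items. Callers that need the not-found behavior for such lookups would have to return a list or `None` instead.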