diff --git a/docs/uri-scheme-browse-revision.rst b/docs/uri-scheme-browse-revision.rst index e4cd4851..823a5947 100644 --- a/docs/uri-scheme-browse-revision.rst +++ b/docs/uri-scheme-browse-revision.rst @@ -1,79 +1,81 @@ Revision ^^^^^^^^ .. http:get:: /browse/revision/(sha1_git)/ HTML view to browse a SWH revision. It notably shows the revision date and message but also offers links to get more details on: * its author * its parent revisions * the history log reachable from it The view also enables to navigate in the source tree associated to the revision and browse its content. Last but not least, the view displays the list of file changes introduced in the revision but also the diffs of each changed files. :param string sha1_git: hexadecimal representation for the *sha1_git* identifier of a SWH revision :query string origin_type: used internally to associate a SWH origin type (*git*, *svn*, *deb* ...) to the revision :query string origin_url: used internally to associate an origin url (e.g. https://github.com/user/repo) to the revision :query string timestamp: used internally to associate an origin visit to the revision, must be a date string (any format parsable by `dateutil.parser.parse`_) or Unix timestamp to parse in order to find the closest SWH visit. :query int visit_id: used internally to specify a SWH visit id instead of using the provided timestamp :query string path: used internally when navigating in the source tree associated to the revision :statuscode 200: no error :statuscode 404: requested revision can not be found in the SWH archive **Examples:** .. parsed-literal:: :swh_web_browse:`revision/f1b94134a4b879bc55c3dacdb496690c8ebdc03f/` :swh_web_browse:`revision/d1aa2b3f607b35dc5dbf613b2334b6d243ec2bda/` + .. _dateutil.parser.parse: http://dateutil.readthedocs.io/en/stable/parser.html + .. http:get:: /browse/revision/(sha1_git)/log/ HTML view that displays the list of revisions heading to a given one. In other words, it shows a commit log. 
The following data are displayed for each log entry: * author of the revision * link to browse the revision * message associated to the revision * date of the revision * link to browse the associated source tree N log entries are displayed per page (default is 20). In order to navigate in a large history, two buttons are present at the bottom of the view: * *Newer*: fetch and display if available the N more recent log entries than the ones currently displayed * *Older*: fetch and display if available the N older log entries than the ones currently displayed :param string sha1_git: hexadecimal representation for the *sha1_git* identifier of a SWH revision :query string revs_breadcrumb: used internally to store the navigation breadcrumbs (i.e. the list of descendant revisions visited so far). It must be a string in the form "<rev_1>[/<rev_2>/.../<rev_n>]" where rev_i corresponds to a revision sha1_git. :query int per_page: the number of log entries to display per page (default is 20, max is 50) :statuscode 200: no error :statuscode 404: requested revision can not be found in the SWH archive **Examples:** .. parsed-literal:: :swh_web_browse:`revision/f1b94134a4b879bc55c3dacdb496690c8ebdc03f/log/` - :swh_web_browse:`revision/d1aa2b3f607b35dc5dbf613b2334b6d243ec2bda/log/` \ No newline at end of file + :swh_web_browse:`revision/d1aa2b3f607b35dc5dbf613b2334b6d243ec2bda/log/` diff --git a/docs/uri-scheme-browse.rst b/docs/uri-scheme-browse.rst index a950afe9..ad43b2c4 100644 --- a/docs/uri-scheme-browse.rst +++ b/docs/uri-scheme-browse.rst @@ -1,96 +1,99 @@ URI scheme for SWH Web Browse application ========================================= This web application aims to provide HTML views to easily navigate in the SWH archive, thus it needs to be reached from a web browser. If you intend to query the SWH archive programmatically through any HTTP client, please refer to the :ref:`swh-web-api-urls` section instead. 
Context-independent browsing ---------------------------- Context-independent URLs provide information about SWH objects (e.g., revisions, directories, contents, person, ...), independently of the contexts where they have been found (e.g., specific repositories, branches, commits, ...). The following endpoints are the same of the API case (see below), and just render the corresponding information for user consumption. Where hyperlinks are created, they always point to other context-independent user URLs: * :http:get:`/browse/content/[(algo_hash):](hash)/`: Display a SWH content * :http:get:`/browse/content/[(algo_hash):](hash)/raw/`: Get / Download SWH content raw data * :http:get:`/browse/directory/(sha1_git)/[(path)/]`: Browse the content of a SWH directory * :http:get:`/browse/person/(person_id)/`: Information on a SWH person * :http:get:`/browse/revision/(sha1_git)/`: Browse a SWH revision * :http:get:`/browse/revision/(sha1_git)/log/`: Browse history log heading to a SWH revision Context-dependent browsing -------------------------- Context-dependent URLs provide information about SWH objects, limited to specific contexts where the objects have been found. For instance, instead of having to specify a (root) revision by *sha1_git*, users might want to specify a place and a time. In SWH a "place" is an origin, with an optional branch name; a "time" is a timestamp at which some place has been observed by SWH crawlers. Wherever a revision context is expected in a path (i.e., a **/browse/revision/(sha1_git)/** path fragment) we can put in its stead a path fragment of the form **/origin/(origin_type)/url/(origin_url)/[/visit/(timestamp)/][?branch=(branch)]**. 
-Such a fragment is resolved, internally by the SWH archive, to a *sha1_git* as follows: +Such a fragment is resolved, internally by the SWH archive, to a revision *sha1_git* as follows: -- if *timestamp* is absent: look for the most recent crawl of origin *origin_id* -- if *timestamp* is given: look for the closest crawl of origin *origin_id* - from timestamp *timestamp* +- if *timestamp* is absent: look for the most recent crawl of origin + identified by *origin_type* and *origin_url* +- if *timestamp* is given: look for the closest crawl of origin identified + by *origin_type* and *origin_url* from timestamp *timestamp* - if *branch* is given as a query parameter: look for the branch *branch* - if *branch* is absent: look for branch "HEAD" or "master" -- return the *sha1_git* pointed by the chosen branch +- return the revision *sha1_git* pointed by the chosen branch The already mentioned URLs for revision contexts can therefore be alternatively specified by users as: * :http:get:`/browse/origin/(origin_type)/url/(origin_url)/directory/[(path)/]` * :http:get:`/browse/origin/(origin_type)/url/(origin_url)/visit/(timestamp)/directory/[(path)/]` * :http:get:`/browse/origin/(origin_type)/url/(origin_url)/content/(path)/` * :http:get:`/browse/origin/(origin_type)/url/(origin_url)/visit/(timestamp)/content/(path)/` * :http:get:`/browse/origin/(origin_type)/url/(origin_url)/log/` * :http:get:`/browse/origin/(origin_type)/url/(origin_url)/visit/(timestamp)/log/` Typing: -- *origin_id* is given as an integer identifier, pointing into the origin table. - There will be separate mechanisms for finding origins by other means (e.g., - URLs, metadata, etc). +- *origin_type* corresponds to the type of the archived origin: + *git*, *svn*, *hg*, *deposit*, *deb*, ... 
+ +- *origin_url* corresponds to the URL the origin was crawled from, + for instance https://github.com/(user)/(repo)/ - *branch* name is given as per the corresponding VCS (e.g., Git) as a query parameter to the requested URL. - *timestamp* is given in a format as liberal as possible, to uphold the principle of least surprise. At the very minimum it is possible to enter timestamps as: - Unix epoch timestamp (see for instance the output of `date +%s`) - ISO 8601 timestamps (see for instance the output of `date -I`, `date -Is`) - YYYY[MM[DD[HH[MM[SS]]]]] ad-hoc format - YYYY[-MM[-DD[ HH:[MM:[SS:]]]]] ad-hoc format SWH Browse Urls --------------- .. include:: uri-scheme-browse-content.rst .. include:: uri-scheme-browse-directory.rst .. include:: uri-scheme-browse-identifiers.rst .. include:: uri-scheme-browse-origin.rst .. include:: uri-scheme-browse-person.rst .. include:: uri-scheme-browse-release.rst .. include:: uri-scheme-browse-revision.rst diff --git a/swh/web/api/apiurls.py b/swh/web/api/apiurls.py index 9676a2b4..4e3b8fe7 100644 --- a/swh/web/api/apiurls.py +++ b/swh/web/api/apiurls.py @@ -1,126 +1,126 @@ # Copyright (C) 2017-2018 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information import re from rest_framework.decorators import api_view from swh.web.common.urlsindex import UrlsIndex from swh.web.common.throttling import throttle_scope class APIUrls(UrlsIndex): """ Class to manage API documentation URLs. - Indexes all routes documented using apidoc's decorators. 
- Tracks endpoint/request processing method relationships for use in generating related urls in API documentation """ - apidoc_routes = {} - method_endpoints = {} + _apidoc_routes = {} + _method_endpoints = {} scope = 'api' @classmethod def get_app_endpoints(cls): - return cls.apidoc_routes + return cls._apidoc_routes @classmethod def get_method_endpoints(cls, f): - if f.__name__ not in cls.method_endpoints: - cls.method_endpoints[f.__name__] = cls.group_routes_by_method(f) - return cls.method_endpoints[f.__name__] + if f.__name__ not in cls._method_endpoints: + cls._method_endpoints[f.__name__] = cls.group_routes_by_method(f) + return cls._method_endpoints[f.__name__] @classmethod def group_routes_by_method(cls, f): """ Group URL endpoints according to their processing method. Returns: A dict where keys are the processing method names, and values are the routes that are bound to the key method. """ rules = [] for urlp in cls.get_url_patterns(): endpoint = urlp.callback.__name__ if endpoint != f.__name__: continue method_names = urlp.callback.http_method_names url_rule = urlp.regex.pattern.replace('^', '/').replace('$', '') url_rule_params = re.findall('\([^)]+\)', url_rule) for param in url_rule_params: param_name = re.findall('<(.*)>', param) param_name = param_name[0] if len(param_name) > 0 else None if param_name and hasattr(f, 'doc_data') and f.doc_data['args']: # noqa param_index = \ next(i for (i, d) in enumerate(f.doc_data['args']) if d['name'] == param_name) if param_index is not None: url_rule = url_rule.replace( param, '<' + f.doc_data['args'][param_index]['name'] + ': ' + f.doc_data['args'][param_index]['type'] + '>').replace('.*', '') rule_dict = {'rule': '/api' + url_rule, 'name': urlp.name, 'methods': {method.upper() for method in method_names} } rules.append(rule_dict) return rules @classmethod def add_route(cls, route, docstring, **kwargs): """ Add a route to the self-documenting API reference """ route_view_name = route[1:-1].replace('/', '-') - 
if route not in cls.apidoc_routes: + if route not in cls._apidoc_routes: d = {'docstring': docstring, 'route_view_name': route_view_name} for k, v in kwargs.items(): d[k] = v - cls.apidoc_routes[route] = d + cls._apidoc_routes[route] = d class api_route(object): # noqa: N801 """ Decorator to ease the registration of an API endpoint using the Django REST Framework. Args: url_pattern: the url pattern used by DRF to identify the API route view_name: the name of the API view associated to the route used to reverse the url methods: array of HTTP methods supported by the API route """ def __init__(self, url_pattern=None, view_name=None, methods=['GET', 'HEAD', 'OPTIONS'], throttle_scope='swh_api', api_version='1'): super().__init__() self.url_pattern = '^' + api_version + url_pattern + '$' self.view_name = view_name self.methods = methods self.throttle_scope = throttle_scope def __call__(self, f): # create a DRF view from the wrapped function @api_view(self.methods) @throttle_scope(self.throttle_scope) def api_view_f(*args, **kwargs): return f(*args, **kwargs) # small hacks for correctly generating API endpoints index doc api_view_f.__name__ = f.__name__ api_view_f.http_method_names = self.methods # register the route and its view in the endpoints index APIUrls.add_url_pattern(self.url_pattern, api_view_f, self.view_name) return f diff --git a/swh/web/browse/views/origin.py b/swh/web/browse/views/origin.py index df6032ee..c3e60abb 100644 --- a/swh/web/browse/views/origin.py +++ b/swh/web/browse/views/origin.py @@ -1,896 +1,896 @@ # Copyright (C) 2017-2018 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information import json from distutils.util import strtobool from django.http import HttpResponse from django.shortcuts import render from django.utils.safestring import mark_safe from 
django.template.defaultfilters import filesizeformat from swh.web.common import service from swh.web.common.utils import ( gen_path_info, reverse, format_utc_iso_date, parse_timestamp ) from swh.web.common.exc import NotFoundExc, handle_view_exception from swh.web.browse.utils import ( get_origin_visits, get_directory_entries, request_content, prepare_content_for_display, prepare_revision_log_for_display, get_origin_context, gen_directory_link, gen_revision_link, gen_revision_log_link, gen_content_link, gen_origin_directory_link, content_display_max_size ) from swh.web.browse.browseurls import browse_route def _branch_not_found(origin_info, timestamp, branch_type, branch, branches, visit_id=None): """ Utility function to raise an exception when a specified branch/release can not be found. """ if branch_type: occ_type = 'Branch' occ_type_plural = 'branches' else: occ_type = 'Release' occ_type_plural = 'releases' if visit_id: if len(branches) == 0: raise NotFoundExc('Origin with type %s and url %s' ' for visit with id %s has an empty list' ' of %s!' % (origin_info['type'], origin_info['url'], visit_id, occ_type_plural)) else: raise NotFoundExc('%s %s associated to visit with' ' id %s for origin with type %s and url %s' ' not found!' % (occ_type, branch, visit_id, origin_info['type'], origin_info['url'])) else: if len(branches) == 0: raise NotFoundExc('Origin with type %s and url %s' ' for visit with timestamp %s has an empty list' ' of %s!' % (origin_info['type'], origin_info['url'], timestamp, occ_type_plural)) else: raise NotFoundExc('%s %s associated to visit with' ' timestamp %s for origin with type %s' ' and url %s not found!' % (occ_type, branch, timestamp, origin_info['type'], origin_info['url'])) def _get_branch(branches, branch_name): """ Utility function to get a specific branch from an origin branches list. Its purpose is to get the default HEAD branch as some SWH origin (e.g those with svn type) does not have it. 
In that latter case, check if there is a master branch instead and returns it. """ filtered_branches = \ [b for b in branches if b['name'].endswith(branch_name)] if len(filtered_branches) > 0: return filtered_branches[0] elif branch_name == 'HEAD': filtered_branches = \ [b for b in branches if b['name'].endswith('master')] if len(filtered_branches) > 0: return filtered_branches[0] elif len(branches) > 0: return branches[0] return None def _get_release(releases, release_name): filtered_releases = \ [r for r in releases if r['name'] == release_name] if len(filtered_releases) > 0: return filtered_releases[0] else: return None def _process_origin_request(request, origin_type, origin_url, timestamp, path, browse_view_name): """ Utility function to perform common input request processing for origin context views. """ visit_id = request.GET.get('visit_id', None) origin_context = get_origin_context(origin_type, origin_url, timestamp, visit_id) for b in origin_context['branches']: branch_url_args = dict(origin_context['url_args']) if path: b['path'] = path branch_url_args['path'] = path b['url'] = reverse(browse_view_name, kwargs=branch_url_args, query_params={'branch': b['name'], 'visit_id': visit_id}) for r in origin_context['releases']: release_url_args = dict(origin_context['url_args']) if path: r['path'] = path release_url_args['path'] = path r['url'] = reverse(browse_view_name, kwargs=release_url_args, query_params={'release': r['name'], 'visit_id': visit_id}) root_sha1_git = None query_params = origin_context['query_params'] revision_id = request.GET.get('revision', None) release_name = request.GET.get('release', None) branch_name = None if revision_id: revision = service.lookup_revision(revision_id) root_sha1_git = revision['directory'] origin_context['branches'].append({'name': revision_id, 'revision': revision_id, 'directory': root_sha1_git, 'url': None}) branch_name = revision_id query_params['revision'] = revision_id elif release_name: release = 
_get_release(origin_context['releases'], release_name) if release: root_sha1_git = release['directory'] query_params['release'] = release_name revision_id = release['target'] else: _branch_not_found(origin_context['origin_info'], timestamp, False, release_name, origin_context['releases'], visit_id) else: branch_name = request.GET.get('branch', None) if branch_name: query_params['branch'] = branch_name branch = _get_branch(origin_context['branches'], branch_name or 'HEAD') if branch: branch_name = branch['name'] root_sha1_git = branch['directory'] revision_id = branch['revision'] else: _branch_not_found(origin_context['origin_info'], timestamp, True, branch_name, origin_context['branches'], visit_id) origin_context['root_sha1_git'] = root_sha1_git origin_context['revision_id'] = revision_id origin_context['branch'] = branch_name origin_context['release'] = release_name return origin_context @browse_route(r'origin/(?P<origin_type>[a-z]+)/url/(?P<origin_url>.+)/visit/(?P<timestamp>.+)/directory/', # noqa r'origin/(?P<origin_type>[a-z]+)/url/(?P<origin_url>.+)/visit/(?P<timestamp>.+)/directory/(?P<path>.+)/', # noqa r'origin/(?P<origin_type>[a-z]+)/url/(?P<origin_url>.+)/directory/', # noqa r'origin/(?P<origin_type>[a-z]+)/url/(?P<origin_url>.+)/directory/(?P<path>.+)/', # noqa view_name='browse-origin-directory') def origin_directory_browse(request, origin_type, origin_url, timestamp=None, path=None): """Django view for browsing the content of a SWH directory associated to an origin for a given visit. The url scheme that points to it is the following: * :http:get:`/browse/origin/(origin_type)/url/(origin_url)/directory/[(path)/]` - * :http:get:`/browse/origin/(origin_type)/url/(origin_type)/visit/(timestamp)/directory/[(path)/]` + * :http:get:`/browse/origin/(origin_type)/url/(origin_url)/visit/(timestamp)/directory/[(path)/]` Args: request: input django http request origin_type: the type of swh origin (git, svn, hg, ...) 
origin_url: the url of the swh origin timestamp: optional swh visit timestamp parameter (the last one will be used by default) path: optional path parameter used to navigate in directories reachable from the origin root one branch: optional query parameter that specifies the origin branch from which to retrieve the directory release: optional query parameter that specifies the origin release from which to retrieve the directory revision: optional query parameter to specify the origin revision from which to retrieve the directory Returns: The HTML rendering for the content of the directory associated to the provided origin and visit. """ # noqa try: origin_context = _process_origin_request( request, origin_type, origin_url, timestamp, path, 'browse-origin-directory') root_sha1_git = origin_context['root_sha1_git'] sha1_git = root_sha1_git if path: dir_info = service.lookup_directory_with_path(root_sha1_git, path) sha1_git = dir_info['target'] dirs, files = get_directory_entries(sha1_git) except Exception as exc: return handle_view_exception(request, exc) origin_info = origin_context['origin_info'] visit_info = origin_context['visit_info'] url_args = origin_context['url_args'] query_params = origin_context['query_params'] revision_id = origin_context['revision_id'] path_info = gen_path_info(path) breadcrumbs = [] breadcrumbs.append({'name': root_sha1_git[:7], 'url': reverse('browse-origin-directory', kwargs=url_args, query_params=query_params)}) for pi in path_info: bc_url_args = dict(url_args) bc_url_args['path'] = pi['path'] breadcrumbs.append({'name': pi['name'], 'url': reverse('browse-origin-directory', kwargs=bc_url_args, query_params=query_params)}) path = '' if path is None else (path + '/') for d in dirs: bc_url_args = dict(url_args) bc_url_args['path'] = path + d['name'] d['url'] = reverse('browse-origin-directory', kwargs=bc_url_args, query_params=query_params) sum_file_sizes = 0 readme_name = None readme_url = None for f in files: bc_url_args = 
dict(url_args) bc_url_args['path'] = path + f['name'] f['url'] = reverse('browse-origin-content', kwargs=bc_url_args, query_params=query_params) sum_file_sizes += f['length'] f['length'] = filesizeformat(f['length']) if f['name'].lower().startswith('readme'): readme_name = f['name'] readme_sha1 = f['checksums']['sha1'] readme_url = reverse('browse-content-raw', kwargs={'query_string': readme_sha1}) history_url = reverse('browse-origin-log', kwargs=url_args, query_params=query_params) sum_file_sizes = filesizeformat(sum_file_sizes) browse_dir_link = \ gen_directory_link(sha1_git, link_text='Browse', link_attrs={'class': 'btn btn-md btn-swh', 'role': 'button'}) browse_rev_link = \ gen_revision_link(revision_id, origin_context=origin_context, link_text='Browse', link_attrs={'class': 'btn btn-md btn-swh', 'role': 'button'}) dir_metadata = {'id': sha1_git, 'context-independent directory': browse_dir_link, 'number of regular files': len(files), 'number of subdirectories': len(dirs), 'sum of regular file sizes': sum_file_sizes, 'origin id': origin_info['id'], 'origin type': origin_info['type'], 'origin url': origin_info['url'], 'origin visit date': format_utc_iso_date(visit_info['date']), # noqa 'origin visit id': visit_info['visit'], 'path': '/' + path, 'revision id': revision_id, 'revision': browse_rev_link} vault_cooking = { 'directory_context': True, 'directory_id': sha1_git, 'revision_context': True, 'revision_id': revision_id } return render(request, 'directory.html', {'empty_browse': False, 'heading': 'Directory information', 'top_panel_visible': True, 'top_panel_collapsible': True, 'top_panel_text': 'SWH object: Directory', 'swh_object_metadata': dir_metadata, 'main_panel_visible': True, 'dirs': dirs, 'files': files, 'breadcrumbs': breadcrumbs, 'top_right_link': history_url, 'top_right_link_text': mark_safe( '' 'History' ), 'readme_name': readme_name, 'readme_url': readme_url, 'origin_context': origin_context, 'vault_cooking': vault_cooking, 'show_actions_menu': 
True}) @browse_route(r'origin/(?P<origin_type>[a-z]+)/url/(?P<origin_url>.+)/visit/(?P<timestamp>.+)/content/(?P<path>.+)/', # noqa r'origin/(?P<origin_type>[a-z]+)/url/(?P<origin_url>.+)/content/(?P<path>.+)/', # noqa view_name='browse-origin-content') def origin_content_display(request, origin_type, origin_url, path, timestamp=None): """Django view that produces an HTML display of a SWH content associated to an origin for a given visit. The url scheme that points to it is the following: * :http:get:`/browse/origin/(origin_type)/url/(origin_url)/content/(path)/` * :http:get:`/browse/origin/(origin_type)/url/(origin_url)/visit/(timestamp)/content/(path)/` Args: request: input django http request origin_type: the type of swh origin (git, svn, hg, ...) origin_url: the url of the swh origin path: path of the content relative to the origin root directory timestamp: optional swh visit timestamp parameter (the last one will be used by default) branch: optional query parameter that specifies the origin branch from which to retrieve the content release: optional query parameter that specifies the origin release from which to retrieve the content revision: optional query parameter to specify the origin revision from which to retrieve the content Returns: The HTML rendering of the requested content associated to the provided origin and visit. 
""" # noqa try: origin_context = _process_origin_request( request, origin_type, origin_url, timestamp, path, 'browse-origin-content') root_sha1_git = origin_context['root_sha1_git'] content_info = service.lookup_directory_with_path(root_sha1_git, path) sha1_git = content_info['target'] query_string = 'sha1_git:' + sha1_git content_data = request_content(query_string) except Exception as exc: return handle_view_exception(request, exc) url_args = origin_context['url_args'] query_params = origin_context['query_params'] revision_id = origin_context['revision_id'] origin_info = origin_context['origin_info'] visit_info = origin_context['visit_info'] content = None language = None if content_data['raw_data'] is not None: content_display_data = prepare_content_for_display( content_data['raw_data'], content_data['mimetype'], path) content = content_display_data['content_data'] language = content_display_data['language'] filename = None path_info = None breadcrumbs = [] split_path = path.split('/') filename = split_path[-1] path = path[:-len(filename)] path_info = gen_path_info(path) breadcrumbs.append({'name': root_sha1_git[:7], 'url': reverse('browse-origin-directory', kwargs=url_args, query_params=query_params)}) for pi in path_info: bc_url_args = dict(url_args) bc_url_args['path'] = pi['path'] breadcrumbs.append({'name': pi['name'], 'url': reverse('browse-origin-directory', kwargs=bc_url_args, query_params=query_params)}) breadcrumbs.append({'name': filename, 'url': None}) browse_content_link = \ gen_content_link(sha1_git, link_text='Browse', link_attrs={'class': 'btn btn-md btn-swh', 'role': 'button'}) content_raw_url = reverse('browse-content-raw', kwargs={'query_string': query_string}, query_params={'filename': filename}) browse_rev_link = \ gen_revision_link(revision_id, origin_context=origin_context, link_text='Browse', link_attrs={'class': 'btn btn-md btn-swh', 'role': 'button'}) content_metadata = { 'context-independent content': browse_content_link, 'sha1 
checksum': content_data['checksums']['sha1'], 'sha1_git checksum': content_data['checksums']['sha1_git'], 'sha256 checksum': content_data['checksums']['sha256'], 'blake2s256 checksum': content_data['checksums']['blake2s256'], 'mime type': content_data['mimetype'], 'encoding': content_data['encoding'], 'size': filesizeformat(content_data['length']), 'language': content_data['language'], 'licenses': content_data['licenses'], 'origin id': origin_info['id'], 'origin type': origin_info['type'], 'origin url': origin_info['url'], 'origin visit date': format_utc_iso_date(visit_info['date']), 'origin visit id': visit_info['visit'], 'path': '/' + path, 'filename': filename, 'revision id': revision_id, 'revision': browse_rev_link } return render(request, 'content.html', {'empty_browse': False, 'heading': 'Content information', 'top_panel_visible': True, 'top_panel_collapsible': True, 'top_panel_text': 'SWH object: Content', 'swh_object_metadata': content_metadata, 'main_panel_visible': True, 'content': content, 'content_size': content_data['length'], 'max_content_size': content_display_max_size, 'mimetype': content_data['mimetype'], 'language': language, 'breadcrumbs': breadcrumbs, 'top_right_link': content_raw_url, 'top_right_link_text': mark_safe( 'Raw File'), 'origin_context': origin_context, 'vault_cooking': None, 'show_actions_menu': False}) PER_PAGE = 20 @browse_route(r'origin/(?P<origin_type>[a-z]+)/url/(?P<origin_url>.+)/visit/(?P<timestamp>.+)/log/', # noqa r'origin/(?P<origin_type>[a-z]+)/url/(?P<origin_url>.+)/log/', view_name='browse-origin-log') def origin_log_browse(request, origin_type, origin_url, timestamp=None): """Django view that produces an HTML display of revisions history (aka the commit log) associated to a SWH origin. The url scheme that points to it is the following: * :http:get:`/browse/origin/(origin_type)/url/(origin_url)/log/` * :http:get:`/browse/origin/(origin_type)/url/(origin_url)/visit/(timestamp)/log/` Args: request: input django http request origin_type: the type of swh origin (git, svn, hg, ...) 
origin_url: the url of the swh origin timestamp: optional visit timestamp parameter (the last one will be used by default) revs_breadcrumb: query parameter used internally to store the navigation breadcrumbs (i.e. the list of descendant revisions visited so far). per_page: optional query parameter used to specify the number of log entries per page branch: optional query parameter that specifies the origin branch from which to retrieve the commit log release: optional query parameter that specifies the origin release from which to retrieve the commit log revision: optional query parameter to specify the origin revision from which to retrieve the commit log Returns: The HTML rendering of revisions history for a given SWH visit. """ # noqa try: origin_context = _process_origin_request( request, origin_type, origin_url, timestamp, None, 'browse-origin-log') revision_id = origin_context['revision_id'] current_rev = revision_id per_page = int(request.GET.get('per_page', PER_PAGE)) revs_breadcrumb = request.GET.get('revs_breadcrumb', None) if revs_breadcrumb: current_rev = revs_breadcrumb.split('/')[-1] revision_log = service.lookup_revision_log(current_rev, limit=per_page+1) revision_log = list(revision_log) except Exception as exc: return handle_view_exception(request, exc) origin_info = origin_context['origin_info'] visit_info = origin_context['visit_info'] url_args = origin_context['url_args'] query_params = origin_context['query_params'] query_params['per_page'] = per_page revision_log_display_data = prepare_revision_log_for_display( revision_log, per_page, revs_breadcrumb, origin_context) prev_rev = revision_log_display_data['prev_rev'] prev_revs_breadcrumb = revision_log_display_data['prev_revs_breadcrumb'] prev_log_url = None query_params['revs_breadcrumb'] = prev_revs_breadcrumb if prev_rev: prev_log_url = \ reverse('browse-origin-log', kwargs=url_args, query_params=query_params) next_rev = revision_log_display_data['next_rev'] next_revs_breadcrumb = 
revision_log_display_data['next_revs_breadcrumb'] next_log_url = None query_params['revs_breadcrumb'] = next_revs_breadcrumb if next_rev: next_log_url = \ reverse('browse-origin-log', kwargs=url_args, query_params=query_params) revision_log_data = revision_log_display_data['revision_log_data'] for i, log in enumerate(revision_log_data): params = { 'revision': revision_log[i]['id'], } if 'visit_id' in query_params: params['visit_id'] = query_params['visit_id'] log['directory'] = gen_origin_directory_link( origin_context, revision_log[i]['id'], link_text='Browse files', link_attrs={'class': 'btn btn-md btn-swh', 'role': 'button'}) browse_log_link = \ gen_revision_log_link(revision_id, link_text='Browse', link_attrs={'class': 'btn btn-md btn-swh', 'role': 'button'}) revision_metadata = { 'context-independent revision history': browse_log_link, 'origin id': origin_info['id'], 'origin type': origin_info['type'], 'origin url': origin_info['url'], 'origin visit date': format_utc_iso_date(visit_info['date']), 'origin visit id': visit_info['visit'] } return render(request, 'revision-log.html', {'empty_browse': False, 'heading': 'Revision history information', 'top_panel_visible': True, 'top_panel_collapsible': True, 'top_panel_text': 'SWH object: Revision history', 'swh_object_metadata': revision_metadata, 'main_panel_visible': True, 'revision_log': revision_log_data, 'next_log_url': next_log_url, 'prev_log_url': prev_log_url, 'breadcrumbs': None, 'top_right_link': None, 'top_right_link_text': None, 'origin_context': origin_context, 'vault_cooking': None, 'show_actions_menu': False}) @browse_route(r'origin/(?P<origin_type>[a-z]+)/url/(?P<origin_url>.+)/visit/(?P<timestamp>.+)/branches/', # noqa r'origin/(?P<origin_type>[a-z]+)/url/(?P<origin_url>.+)/branches/', # noqa view_name='browse-origin-branches') def origin_branches_browse(request, origin_type, origin_url, timestamp=None): """Django view that produces an HTML display of the list of branches associated to an origin for a given visit. 
The url scheme that points to it is the following: * :http:get:`/browse/origin/(origin_type)/url/(origin_url)/branches/` * :http:get:`/browse/origin/(origin_type)/url/(origin_url)/visit/(timestamp)/branches/` """ # noqa try: origin_context = _process_origin_request( request, origin_type, origin_url, timestamp, None, 'browse-origin-directory') except Exception as exc: return handle_view_exception(request, exc) branches_offset = int(request.GET.get('branches_offset', 0)) origin_info = origin_context['origin_info'] url_args = origin_context['url_args'] query_params = origin_context['query_params'] branches = origin_context['branches'] displayed_branches = \ branches[branches_offset:branches_offset+PER_PAGE] for branch in displayed_branches: revision_url = reverse( 'browse-revision', kwargs={'sha1_git': branch['revision']}, query_params={'origin_type': origin_info['type'], 'origin_url': origin_info['url']}) query_params['branch'] = branch['name'] directory_url = reverse('browse-origin-directory', kwargs=url_args, query_params=query_params) del query_params['branch'] branch['revision_url'] = revision_url branch['directory_url'] = directory_url prev_branches_url = None next_branches_url = None next_offset = branches_offset + PER_PAGE prev_offset = branches_offset - PER_PAGE if next_offset < len(branches): query_params['branches_offset'] = next_offset next_branches_url = reverse('browse-origin-branches', kwargs=url_args, query_params=query_params) query_params['branches_offset'] = None if prev_offset >= 0: if prev_offset != 0: query_params['branches_offset'] = prev_offset prev_branches_url = reverse('browse-origin-branches', kwargs=url_args, query_params=query_params) return render(request, 'branches.html', {'empty_browse': False, 'heading': 'Origin branches list', 'top_panel_visible': False, 'top_panel_collapsible': False, 'top_panel_text': 'SWH object: Origin branches list', 'swh_object_metadata': {}, 'main_panel_visible': True, 'top_right_link': None, 
'top_right_link_text': None, 'displayed_branches': displayed_branches, 'prev_branches_url': prev_branches_url, 'next_branches_url': next_branches_url, 'origin_context': origin_context}) @browse_route(r'origin/(?P[a-z]+)/url/(?P.+)/visit/(?P.+)/releases/', # noqa r'origin/(?P[a-z]+)/url/(?P.+)/releases/', # noqa view_name='browse-origin-releases') def origin_releases_browse(request, origin_type, origin_url, timestamp=None): """Django view that produces an HTML display of the list of releases associated to an origin for a given visit. The url scheme that points to it is the following: * :http:get:`/browse/origin/(origin_type)/url/(origin_url)/releases/` * :http:get:`/browse/origin/(origin_type)/url/(origin_url)/visit/(timestamp)/releases/` """ # noqa try: origin_context = _process_origin_request( request, origin_type, origin_url, timestamp, None, 'browse-origin-directory') except Exception as exc: return handle_view_exception(request, exc) releases_offset = int(request.GET.get('releases_offset', 0)) origin_info = origin_context['origin_info'] url_args = origin_context['url_args'] query_params = origin_context['query_params'] releases = origin_context['releases'] displayed_releases = \ releases[releases_offset:releases_offset+PER_PAGE] for release in displayed_releases: release_url = reverse('browse-release', kwargs={'sha1_git': release['id']}, query_params={'origin_type': origin_info['type'], 'origin_url': origin_info['url']}) query_params['release'] = release['name'] del query_params['release'] release['release_url'] = release_url prev_releases_url = None next_releases_url = None next_offset = releases_offset + PER_PAGE prev_offset = releases_offset - PER_PAGE if next_offset < len(releases): query_params['releases_offset'] = next_offset next_releases_url = reverse('browse-origin-releases', kwargs=url_args, query_params=query_params) query_params['releases_offset'] = None if prev_offset >= 0: if prev_offset != 0: query_params['releases_offset'] = prev_offset 
prev_releases_url = reverse('browse-origin-releases', kwargs=url_args, query_params=query_params) return render(request, 'releases.html', {'empty_browse': False, 'heading': 'Origin releases list', 'top_panel_visible': False, 'top_panel_collapsible': False, 'top_panel_text': 'SWH object: Origin releases list', 'swh_object_metadata': {}, 'main_panel_visible': True, 'top_right_link': None, 'top_right_link_text': None, 'displayed_releases': displayed_releases, 'prev_releases_url': prev_releases_url, 'next_releases_url': next_releases_url, 'origin_context': origin_context, 'vault_cooking': None, 'show_actions_menu': False}) @browse_route(r'origin/(?P[a-z]+)/url/(?P.+)/', view_name='browse-origin') def origin_browse(request, origin_type=None, origin_url=None): """Django view that produces an HTML display of a swh origin identified by its id or its url. The url scheme that points to it is :http:get:`/browse/origin/(origin_type)/url/(origin_url)/`. Args: request: input django http request origin_type: type of origin (git, svn, ...) origin_url: url of the origin (e.g. https://github.com//) Returns: The HMTL rendering for the metadata of the provided origin. 
""" # noqa try: origin_info = service.lookup_origin({ 'type': origin_type, 'url': origin_url }) origin_visits = get_origin_visits(origin_info) origin_visits.reverse() except Exception as exc: return handle_view_exception(request, exc) origin_info['last swh visit browse url'] = \ reverse('browse-origin-directory', kwargs={'origin_type': origin_type, 'origin_url': origin_url}) origin_visits_data = [] visits_splitted = [] visits_by_year = {} for i, visit in enumerate(origin_visits): visit_date = parse_timestamp(visit['date']) visit_year = str(visit_date.year) url_date = format_utc_iso_date(visit['date'], '%Y-%m-%dT%H:%M:%SZ') visit['fmt_date'] = format_utc_iso_date(visit['date']) query_params = {} if i < len(origin_visits) - 1: if visit['date'] == origin_visits[i+1]['date']: query_params = {'visit_id': visit['visit']} if i > 0: if visit['date'] == origin_visits[i-1]['date']: query_params = {'visit_id': visit['visit']} visit['browse_url'] = reverse('browse-origin-directory', kwargs={'origin_type': origin_type, 'origin_url': origin_url, 'timestamp': url_date}, query_params=query_params) origin_visits_data.insert(0, {'date': visit_date.timestamp()}) if visit_year not in visits_by_year: # display 3 years by row in visits list view if len(visits_by_year) == 3: visits_splitted.append(visits_by_year) visits_by_year = {} visits_by_year[visit_year] = [] visits_by_year[visit_year].append(visit) if len(visits_by_year) > 0: visits_splitted.append(visits_by_year) return render(request, 'origin.html', {'empty_browse': False, 'heading': 'Origin information', 'top_panel_visible': False, 'top_panel_collapsible': False, 'top_panel_text': 'SWH object: Visits history', 'swh_object_metadata': origin_info, 'main_panel_visible': True, 'origin_visits_data': origin_visits_data, 'visits_splitted': visits_splitted, 'origin_info': origin_info, 'browse_url_base': '/browse/origin/%s/url/%s/' % (origin_type, origin_url), 'vault_cooking': None, 'show_actions_menu': False}) 
@browse_route(r'origin/search/(?P.+)/', view_name='browse-origin-search') def origin_search(request, url_pattern): """Search for origins whose urls contain a provided string pattern or match a provided regular expression. The search is performed in a case insensitive way. """ offset = int(request.GET.get('offset', '0')) limit = int(request.GET.get('limit', '50')) regexp = request.GET.get('regexp', 'false') results = service.search_origin(url_pattern, offset, limit, bool(strtobool(regexp))) results = json.dumps(list(results), sort_keys=True, indent=4, separators=(',', ': ')) return HttpResponse(results, content_type='application/json') @browse_route(r'origin/(?P[0-9]+)/latest_snapshot/', view_name='browse-origin-latest-snapshot') def _origin_latest_snapshot(request, origin_id): """ Internal browse endpoint used to check if an origin has already been visited by Software Heritage and has at least one full visit. """ result = service.lookup_latest_origin_snapshot(origin_id, allowed_statuses=['full']) result = json.dumps(result, sort_keys=True, indent=4, separators=(',', ': ')) return HttpResponse(result, content_type='application/json') diff --git a/swh/web/common/throttling.py b/swh/web/common/throttling.py index c27ee40a..1141745e 100644 --- a/swh/web/common/throttling.py +++ b/swh/web/common/throttling.py @@ -1,129 +1,130 @@ # Copyright (C) 2017-2018 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information import ipaddress from rest_framework.throttling import ScopedRateThrottle from swh.web.config import get_config class SwhWebRateThrottle(ScopedRateThrottle): """Custom request rate limiter for DRF enabling to exempt specific networks specified in swh-web configuration. Requests are grouped into scopes. 
It enables to apply different requests rate limiting based on the scope name but also the input HTTP request types. To associate a scope to requests, one must add a 'throttle_scope' attribute when using a class based view, or call the 'throttle_scope' decorator when using a function based view. By default, requests do not have an associated scope and are not rate limited. Rate limiting can also be configured according to the type of the input HTTP requests for fine grained tuning. For instance, the following YAML configuration section sets a rate of: - 1 per minute for POST requests - 60 per minute for other request types + for the 'swh_api' scope while exempting those comming from the 127.0.0.0/8 ip network. .. code-block:: yaml throttling: scopes: swh_api: limiter_rate: default: 60/m POST: 1/m exempted_networks: - 127.0.0.0/8 """ scope = None def __init__(self): super().__init__() self.exempted_networks = None def get_exempted_networks(self, scope_name): if not self.exempted_networks: scopes = get_config()['throttling']['scopes'] scope = scopes.get(scope_name) if scope: networks = scope.get('exempted_networks') if networks: self.exempted_networks = [ipaddress.ip_network(network) for network in networks] return self.exempted_networks def allow_request(self, request, view): # class based view case if not self.scope: default_scope = getattr(view, self.scope_attr, None) # check if there is a specific rate limiting associated # to the request type try: request_scope = default_scope + '_' + request.method.lower() setattr(view, self.scope_attr, request_scope) request_allowed = \ super(SwhWebRateThrottle, self).allow_request(request, view) # noqa setattr(view, self.scope_attr, default_scope) # use default rate limiting otherwise except: setattr(view, self.scope_attr, default_scope) request_allowed = \ super(SwhWebRateThrottle, self).allow_request(request, view) # noqa # function based view case else: default_scope = self.scope # check if there is a specific rate limiting 
associated # to the request type try: self.scope = default_scope + '_' + request.method.lower() self.rate = self.get_rate() # use default rate limiting otherwise except: self.scope = default_scope self.rate = self.get_rate() self.num_requests, self.duration = self.parse_rate(self.rate) request_allowed = \ super(ScopedRateThrottle, self).allow_request(request, view) self.scope = default_scope exempted_networks = self.get_exempted_networks(default_scope) if exempted_networks: remote_address = ipaddress.ip_address(self.get_ident(request)) return any(remote_address in network for network in exempted_networks) or \ request_allowed return request_allowed def throttle_scope(scope): """Decorator that allows the throttle scope of a DRF function based view to be set:: @api_view(['GET', ]) @throttle_scope('scope') def view(request): ... """ def decorator(func): SwhScopeRateThrottle = type( 'CustomScopeRateThrottle', (SwhWebRateThrottle,), {'scope': scope} ) func.throttle_classes = (SwhScopeRateThrottle, ) return func return decorator diff --git a/swh/web/common/urlsindex.py b/swh/web/common/urlsindex.py index 776c75b3..dc0bde36 100644 --- a/swh/web/common/urlsindex.py +++ b/swh/web/common/urlsindex.py @@ -1,48 +1,48 @@ # Copyright (C) 2017-2018 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information from django.conf.urls import url class UrlsIndex(object): """ Simple helper class for centralizing url patterns of a Django web application. Derived classes should override the 'scope' class attribute otherwise all declared patterns will be grouped under the default one. """ - urlpatterns = {} + _urlpatterns = {} scope = 'default' @classmethod def add_url_pattern(cls, url_pattern, view, view_name): """ Class method that adds an url pattern to the current scope. 
Args: url_pattern: regex describing a Django url view: function implementing the Django view view_name: name of the view used to reverse the url """ - if cls.scope not in cls.urlpatterns: - cls.urlpatterns[cls.scope] = [] + if cls.scope not in cls._urlpatterns: + cls._urlpatterns[cls.scope] = [] if view_name: - cls.urlpatterns[cls.scope].append(url(url_pattern, view, - name=view_name)) + cls._urlpatterns[cls.scope].append(url(url_pattern, view, + name=view_name)) else: - cls.urlpatterns[cls.scope].append(url(url_pattern, view)) + cls._urlpatterns[cls.scope].append(url(url_pattern, view)) @classmethod def get_url_patterns(cls): """ Class method that returns the list of url pattern associated to the current scope. Returns: The list of url patterns associated to the current scope """ - return cls.urlpatterns[cls.scope] + return cls._urlpatterns[cls.scope] diff --git a/swh/web/settings/common.py b/swh/web/settings/common.py index c41d28af..3f25d4a8 100644 --- a/swh/web/settings/common.py +++ b/swh/web/settings/common.py @@ -1,201 +1,193 @@ # Copyright (C) 2017-2018 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information """ -Django settings for swhweb project. - -Generated by 'django-admin startproject' using Django 1.11.3. - -For more information on this file, see -https://docs.djangoproject.com/en/1.11/topics/settings/ - -For the full list of settings and their values, see -https://docs.djangoproject.com/en/1.11/ref/settings/ +Django common settings for swh-web. """ import os from swh.web.config import get_config swh_web_config = get_config() # Build paths inside the project like this: os.path.join(BASE_DIR, ...) 
PROJECT_DIR = os.path.dirname(os.path.abspath(__file__)) # Quick-start development settings - unsuitable for production # See https://docs.djangoproject.com/en/1.11/howto/deployment/checklist/ # SECURITY WARNING: keep the secret key used in production secret! SECRET_KEY = swh_web_config['secret_key'] # SECURITY WARNING: don't run with debug turned on in production! DEBUG = swh_web_config['debug'] DEBUG_PROPAGATE_EXCEPTIONS = swh_web_config['debug'] ALLOWED_HOSTS = ['127.0.0.1', 'localhost'] + swh_web_config['allowed_hosts'] # Application definition INSTALLED_APPS = [ 'django.contrib.admin', 'django.contrib.auth', 'django.contrib.contenttypes', 'django.contrib.sessions', 'django.contrib.messages', 'django.contrib.staticfiles', 'rest_framework', 'swh.web.api', 'swh.web.browse' ] MIDDLEWARE = [ 'django.middleware.security.SecurityMiddleware', 'django.contrib.sessions.middleware.SessionMiddleware', 'django.middleware.common.CommonMiddleware', 'django.middleware.csrf.CsrfViewMiddleware', 'django.contrib.auth.middleware.AuthenticationMiddleware', 'django.contrib.messages.middleware.MessageMiddleware', 'django.middleware.clickjacking.XFrameOptionsMiddleware' ] ROOT_URLCONF = 'swh.web.urls' TEMPLATES = [ { 'BACKEND': 'django.template.backends.django.DjangoTemplates', 'DIRS': [os.path.join(PROJECT_DIR, "../templates")], 'APP_DIRS': True, 'OPTIONS': { 'context_processors': [ 'django.template.context_processors.debug', 'django.template.context_processors.request', 'django.contrib.auth.context_processors.auth', 'django.contrib.messages.context_processors.messages', ], 'libraries': { 'swh_templatetags': 'swh.web.common.swh_templatetags', }, }, }, ] WSGI_APPLICATION = 'swh.web.wsgi.application' # Database # https://docs.djangoproject.com/en/1.11/ref/settings/#databases DATABASES = { 'default': { 'ENGINE': 'django.db.backends.sqlite3', 'NAME': os.path.join(PROJECT_DIR, 'db.sqlite3'), } } # Password validation # 
https://docs.djangoproject.com/en/1.11/ref/settings/#auth-password-validators AUTH_PASSWORD_VALIDATORS = [ { 'NAME': 'django.contrib.auth.password_validation.UserAttributeSimilarityValidator', # noqa }, { 'NAME': 'django.contrib.auth.password_validation.MinimumLengthValidator', # noqa }, { 'NAME': 'django.contrib.auth.password_validation.CommonPasswordValidator', # noqa }, { 'NAME': 'django.contrib.auth.password_validation.NumericPasswordValidator', # noqa }, ] # Internationalization # https://docs.djangoproject.com/en/1.11/topics/i18n/ LANGUAGE_CODE = 'en-us' TIME_ZONE = 'UTC' USE_I18N = True USE_L10N = True USE_TZ = True # Static files (CSS, JavaScript, Images) # https://docs.djangoproject.com/en/1.11/howto/static-files/ STATIC_URL = '/static/' STATICFILES_DIRS = [ os.path.join(PROJECT_DIR, "../static") ] INTERNAL_IPS = ['127.0.0.1'] throttle_rates = {} http_requests = ['GET', 'HEAD', 'POST', 'PUT', 'DELETE', 'OPTIONS', 'PATCH'] throttling = swh_web_config['throttling'] for limiter_scope, limiter_conf in throttling['scopes'].items(): if 'default' in limiter_conf['limiter_rate']: throttle_rates[limiter_scope] = limiter_conf['limiter_rate']['default'] # for backward compatibility else: throttle_rates[limiter_scope] = limiter_conf['limiter_rate'] # register sub scopes specific for HTTP request types for http_request in http_requests: if http_request in limiter_conf['limiter_rate']: throttle_rates[limiter_scope + '_' + http_request.lower()] = \ limiter_conf['limiter_rate'][http_request] REST_FRAMEWORK = { 'DEFAULT_RENDERER_CLASSES': ( 'rest_framework.renderers.JSONRenderer', 'swh.web.api.renderers.YAMLRenderer', 'rest_framework.renderers.TemplateHTMLRenderer' ), 'DEFAULT_THROTTLE_CLASSES': ( 'swh.web.common.throttling.SwhWebRateThrottle', ), 'DEFAULT_THROTTLE_RATES': throttle_rates } LOGGING = { 'version': 1, 'disable_existing_loggers': False, 'filters': { 'require_debug_false': { '()': 'django.utils.log.RequireDebugFalse', }, 'require_debug_true': { '()': 
'django.utils.log.RequireDebugTrue', }, }, 'handlers': { 'console': { 'level': 'DEBUG', 'filters': ['require_debug_true'], 'class': 'logging.StreamHandler', }, 'file': { 'level': 'INFO', 'filters': ['require_debug_false'], 'class': 'logging.FileHandler', 'filename': os.path.join(swh_web_config['log_dir'], 'swh-web.log'), }, }, 'loggers': { 'django': { 'handlers': ['console', 'file'], 'level': 'DEBUG' if DEBUG else 'INFO', 'propagate': True, } }, } diff --git a/swh/web/settings/development.py b/swh/web/settings/development.py index 4347d4ff..0fbcdc26 100644 --- a/swh/web/settings/development.py +++ b/swh/web/settings/development.py @@ -1,10 +1,13 @@ # Copyright (C) 2017-2018 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information +""" +Django development settings for swh-web. +""" from .common import * # noqa from django.core.cache import cache cache.clear() diff --git a/swh/web/settings/production.py b/swh/web/settings/production.py index 619f817b..91857719 100644 --- a/swh/web/settings/production.py +++ b/swh/web/settings/production.py @@ -1,29 +1,33 @@ # Copyright (C) 2017-2018 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information +""" +Django production settings for swh-web. 
+""" + from .common import * # noqa from .common import swh_web_config from .common import REST_FRAMEWORK # activate per-site caching MIDDLEWARE += ['django.middleware.cache.UpdateCacheMiddleware', # noqa 'django.middleware.common.CommonMiddleware', 'django.middleware.cache.FetchFromCacheMiddleware'] CACHES = { 'default': { 'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache', 'LOCATION': swh_web_config['throttling']['cache_uri'], } } # Setup support for proxy headers USE_X_FORWARDED_HOST = True SECURE_PROXY_SSL_HEADER = ('HTTP_X_FORWARDED_PROTO', 'https') ALLOWED_HOSTS += ['archive.softwareheritage.org'] # noqa # We're going through seven (or, in that case, 2) proxies thanks to Varnish REST_FRAMEWORK['NUM_PROXIES'] = 2 diff --git a/swh/web/settings/tests.py b/swh/web/settings/tests.py index 8bab5262..7c9f8025 100644 --- a/swh/web/settings/tests.py +++ b/swh/web/settings/tests.py @@ -1,63 +1,67 @@ # Copyright (C) 2017-2018 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information # flake8: noqa +""" +Django tests settings for swh-web. 
+""" + from swh.web.config import get_config scope1_limiter_rate = 3 scope1_limiter_rate_post = 1 scope2_limiter_rate = 5 scope2_limiter_rate_post = 2 scope3_limiter_rate = 1 scope3_limiter_rate_post = 1 swh_web_config = get_config() swh_web_config.update({ 'debug': True, 'secret_key': 'test', 'throttling': { 'cache_uri': None, 'scopes': { 'swh_api': { 'limiter_rate': { 'default': '60/min' }, 'exempted_networks': ['127.0.0.0/8'] }, 'swh_vault_cooking': { 'limiter_rate': { 'default': '120/h', 'GET': '60/m' }, 'exempted_networks': ['127.0.0.0/8'] }, 'scope1': { 'limiter_rate': { 'default': '%s/min' % scope1_limiter_rate, 'POST': '%s/min' % scope1_limiter_rate_post, } }, 'scope2': { 'limiter_rate': { 'default': '%s/min' % scope2_limiter_rate, 'POST': '%s/min' % scope2_limiter_rate_post } }, 'scope3': { 'limiter_rate': { 'default': '%s/min' % scope3_limiter_rate, 'POST': '%s/min' % scope3_limiter_rate_post }, 'exempted_networks': ['127.0.0.0/8'] } } } }) from .common import * ALLOWED_HOSTS += ['testserver'] # noqa \ No newline at end of file diff --git a/swh/web/templates/includes/browse-help.html b/swh/web/templates/includes/browse-help.html index f1b5c04e..5599dfcd 100644 --- a/swh/web/templates/includes/browse-help.html +++ b/swh/web/templates/includes/browse-help.html @@ -1,150 +1,146 @@

How to browse the Software Heritage archive?

Overview

This web application aims to provide HTML views to easily navigate in the Software Heritage archive. Development is ongoing, and new features and improvements will be added progressively over time.

URI scheme

The current URI scheme of this web application is described below; it depends on the type of Software Heritage object to browse. Its exhaustive documentation is available in the official Software Heritage development documentation.

Context-independent browsing

Context-independent URLs provide information about SWH objects (e.g., revisions, directories, contents, persons, …), independently of the contexts where they have been found (e.g., specific software origins, branches, commits, …).

Below are some examples of endpoints used to just render the corresponding information for user consumption:

Hyperlinks created when browsing these kinds of endpoints always point to other context-independent browsing URLs.
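As a rough illustration, a context-independent revision URL could be assembled as in the sketch below. It relies on the `/browse/revision/(sha1_git)/` and `/browse/revision/(sha1_git)/log/` endpoints and the `per_page` query parameter from the URI scheme documentation; the `BROWSE_ROOT` constant and the `revision_url` helper are hypothetical names introduced only for this example, and the host is an assumption to adjust for your deployment.

```python
from urllib.parse import urlencode

# Assumption: browse root of the public instance; change it for another deployment.
BROWSE_ROOT = "https://archive.softwareheritage.org/browse/"


def revision_url(sha1_git, log=False, per_page=None):
    """Build a context-independent URL for a revision or for its history log.

    Hypothetical helper, not part of swh-web.
    """
    url = f"{BROWSE_ROOT}revision/{sha1_git}/"
    if log:
        url += "log/"
        # per_page only applies to the log view (documented default 20, max 50)
        if per_page is not None:
            url += "?" + urlencode({"per_page": per_page})
    return url


if __name__ == "__main__":
    sha1_git = "f1b94134a4b879bc55c3dacdb496690c8ebdc03f"
    print(revision_url(sha1_git))
    print(revision_url(sha1_git, log=True, per_page=50))
```

Because the URL carries no origin, branch, or visit information, the same link stays valid wherever the revision was found.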

Context-dependent browsing

Context-dependent URLs provide information about SWH objects, limited to specific contexts where the objects have been found.

Currently, Software Heritage objects can be browsed in the context of an origin. Below are some examples of such endpoints:
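For instance, the origin branches and releases views follow the patterns `/browse/origin/(origin_type)/url/(origin_url)/branches/` and `/browse/origin/(origin_type)/url/(origin_url)/visit/(timestamp)/branches/` (likewise for `releases`). The sketch below shows one way such paths might be built; `origin_view_path` is a hypothetical helper written for this example, and the timestamp format mirrors the `%Y-%m-%dT%H:%M:%SZ` format the origin view uses when generating visit links.

```python
from datetime import datetime, timezone

# Only the two origin views whose routes are shown in this change set.
SUPPORTED_VIEWS = ("branches", "releases")


def origin_view_path(origin_type, origin_url, view, visit_date=None):
    """Build a context-dependent browse path for an origin view.

    Hypothetical helper, not part of swh-web. `visit_date` should be a
    timezone-aware datetime; it is converted to UTC before formatting.
    """
    if view not in SUPPORTED_VIEWS:
        raise ValueError(f"unsupported view: {view!r}")
    path = f"/browse/origin/{origin_type}/url/{origin_url}/"
    if visit_date is not None:
        stamp = visit_date.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        path += f"visit/{stamp}/"
    return path + f"{view}/"


if __name__ == "__main__":
    print(origin_view_path("git", "https://github.com/user/repo", "branches"))
    visit = datetime(2018, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
    print(origin_view_path("git", "https://github.com/user/repo", "releases", visit))
```

Omitting the visit date targets the view without pinning a specific visit, as in the first route of each pair.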

Search software origins to browse

To make the archive easier to browse and to generate relevant entry points into it, a search interface is available. Currently, it supports searching for software origins by the URLs they were retrieved from. More search criteria will be added in the future.
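The search interface is backed by the `browse-origin-search` view, which reads the `offset`, `limit` and `regexp` query parameters (defaults `0`, `50` and `false`) and returns JSON. A minimal sketch of how a request path might be put together follows; `origin_search_path` is a hypothetical helper, and the `/browse/` prefix is assumed from the other browse URLs.

```python
from urllib.parse import quote, urlencode


def origin_search_path(url_pattern, offset=0, limit=50, regexp=False):
    """Build the request path for the origin search endpoint.

    Hypothetical helper, not part of swh-web. The defaults mirror the
    ones read from the query string by the origin_search view.
    """
    query = urlencode({
        "offset": offset,
        "limit": limit,
        # the view parses this value with strtobool, so "true"/"false" work
        "regexp": "true" if regexp else "false",
    })
    # quote() leaves "/" untouched by default, which the route's ".+" group accepts
    return f"/browse/origin/search/{quote(url_pattern)}/?{query}"


if __name__ == "__main__":
    print(origin_search_path("github.com/user"))
    print(origin_search_path("^https://github.com/.*", limit=10, regexp=True))
```

Paging through a large result set then amounts to repeating the request with an increasing `offset`.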