diff --git a/docs/index.rst b/docs/index.rst index 3eb0916d..2e7d327e 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -1,22 +1,23 @@ .. _swh-web: Software Heritage Web Applications - Development Documentation =============================================================== .. toctree:: :maxdepth: 3 :caption: Contents: developers-info uri-scheme-api uri-scheme-browse + uri-scheme-identifiers Indices and tables ================== * :ref:`genindex` * :ref:`modindex` * `URLs index `_ * :ref:`search` diff --git a/docs/uri-scheme-browse.rst b/docs/uri-scheme-browse.rst index 1021f5ae..b18c6b56 100644 --- a/docs/uri-scheme-browse.rst +++ b/docs/uri-scheme-browse.rst @@ -1,101 +1,99 @@ URI scheme for SWH Web Browse application ========================================= This web application aims to provide HTML views to easily navigate in the SWH archive, thus it needs to be reached from a web browser. If you intend to query the SWH archive programmatically through any HTTP client, please refer to the :ref:`swh-web-api-urls` section instead. Context-independent browsing ---------------------------- Context-independent URLs provide information about SWH objects (e.g., revisions, directories, contents, person, ...), independently of the contexts where they have been found (e.g., specific repositories, branches, commits, ...). The following endpoints are the same of the API case (see below), and just render the corresponding information for user consumption. 
Where hyperlinks are created, they always point to other context-independent user URLs: * :http:get:`/browse/content/[(algo_hash):](hash)/`: Display a SWH content * :http:get:`/browse/content/[(algo_hash):](hash)/raw/`: Get / Download SWH content raw data * :http:get:`/browse/directory/(sha1_git)/[(path)/]`: Browse the content of a SWH directory * :http:get:`/browse/person/(person_id)/`: Information on a SWH person * :http:get:`/browse/revision/(sha1_git)/`: Browse a SWH revision * :http:get:`/browse/revision/(sha1_git)/log/`: Browse history log heading to a SWH revision Context-dependent browsing -------------------------- Context-dependent URLs provide information about SWH objects, limited to specific contexts where the objects have been found. For instance, instead of having to specify a (root) revision by *sha1_git*, users might want to specify a place and a time. In SWH a "place" is an origin, with an optional branch name; a "time" is a timestamp at which some place has been observed by SWH crawlers. Wherever a revision context is expected in a path (i.e., a **/browse/revision/(sha1_git)/** path fragment) we can put in its stead a path fragment of the form **/origin/(origin_type)/url/(origin_url)/[/visit/(timestamp)/][?branch=(branch)]**. 
Such a fragment is resolved, internally by the SWH archive, to a revision *sha1_git* as follows: - if *timestamp* is absent: look for the most recent crawl of the origin identified by *origin_type* and *origin_url* - if *timestamp* is given: look for the crawl of the origin identified by *origin_type* and *origin_url* that is closest to *timestamp* - if *branch* is given as a query parameter: look for the branch *branch* - if *branch* is absent: look for branch "HEAD" or "master" - return the revision *sha1_git* pointed to by the chosen branch The already mentioned URLs for revision contexts can therefore be alternatively specified by users as: * :http:get:`/browse/origin/[(origin_type)/url/](origin_url)/directory/[(path)/]` * :http:get:`/browse/origin/[(origin_type)/url/](origin_url)/visit/(timestamp)/directory/[(path)/]` * :http:get:`/browse/origin/[(origin_type)/url/](origin_url)/content/(path)/` * :http:get:`/browse/origin/[(origin_type)/url/](origin_url)/visit/(timestamp)/content/(path)/` * :http:get:`/browse/origin/[(origin_type)/url/](origin_url)/log/` * :http:get:`/browse/origin/[(origin_type)/url/](origin_url)/visit/(timestamp)/log/` Typing: - *origin_type* corresponds to the type of the archived origin: *git*, *svn*, *hg*, *deposit*, *deb*, ... - *origin_url* corresponds to the URL the origin was crawled from, for instance https://github.com/(user)/(repo)/ - *branch* name is given as per the corresponding VCS (e.g., Git) as a query parameter to the requested URL. - *timestamp* is given in a format as liberal as possible, to uphold the principle of least surprise. At the very minimum it is possible to enter timestamps as: - Unix epoch timestamp (see for instance the output of ``date +%s``) - ISO 8601 timestamps (see for instance the output of ``date -I``, ``date -Is``) - YYYY[MM[DD[HH[MM[SS]]]]] ad-hoc format - YYYY[-MM[-DD[ HH[:MM[:SS]]]]] ad-hoc format SWH Browse URLs --------------- .. include:: uri-scheme-browse-content.rst .. 
include:: uri-scheme-browse-directory.rst -.. include:: uri-scheme-browse-identifiers.rst - .. include:: uri-scheme-browse-origin.rst .. include:: uri-scheme-browse-person.rst .. include:: uri-scheme-browse-release.rst .. include:: uri-scheme-browse-revision.rst .. include:: uri-scheme-browse-snapshot.rst diff --git a/docs/uri-scheme-browse-identifiers.rst b/docs/uri-scheme-identifiers.rst similarity index 63% rename from docs/uri-scheme-browse-identifiers.rst rename to docs/uri-scheme-identifiers.rst index 511d495b..871d5c4f 100644 --- a/docs/uri-scheme-browse-identifiers.rst +++ b/docs/uri-scheme-identifiers.rst @@ -1,30 +1,30 @@ -SWH identifiers -^^^^^^^^^^^^^^^ +URI scheme for SWH identifiers +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ A subset of Software Heritage objects (contents, directories, releases, revisions and snapshots) can be browsed using :ref:`persistent-identifiers`. Those identifiers are guaranteed to remain stable (persistent) over time. -.. http:get:: /browse/(swh_id)/ +.. http:get:: /(swh_id)/ Endpoint to browse SWH objects using their persistent identifiers. A redirection to the appropriate HTML view will be performed when reaching it. :param string swh_id: a persistent identifier for a SWH object (see the :ref:`persistent-identifiers` section to learn more about its syntax) :resheader Location: the redirection URL for browsing the SWH object associated with the provided identifier :statuscode 302: no error :statuscode 400: the provided identifier is malformed **Examples:** .. 
parsed-literal:: - :swh_web_browse:`swh:1:cnt:0ffd12d85cdec70c88e852fc3f5ea9fd342213cd` - :swh_web_browse:`swh:1:dir:db990da9af15427455ce7836ce2b8a34b9bf67f5` - :swh_web_browse:`swh:1:rel:a9b7e3f1eada90250a6b2ab2ef3e0a846cb16831` - :swh_web_browse:`swh:1:rev:f1b94134a4b879bc55c3dacdb496690c8ebdc03f` - :swh_web_browse:`swh:1:snp:673156c31a876c5b99b2fe3e89615529de9a3c44` + :swh_web:`swh:1:cnt:0ffd12d85cdec70c88e852fc3f5ea9fd342213cd` + :swh_web:`swh:1:dir:db990da9af15427455ce7836ce2b8a34b9bf67f5` + :swh_web:`swh:1:rel:a9b7e3f1eada90250a6b2ab2ef3e0a846cb16831` + :swh_web:`swh:1:rev:f1b94134a4b879bc55c3dacdb496690c8ebdc03f` + :swh_web:`swh:1:snp:673156c31a876c5b99b2fe3e89615529de9a3c44` diff --git a/swh/web/browse/views/identifiers.py b/swh/web/browse/identifiers.py similarity index 69% rename from swh/web/browse/views/identifiers.py rename to swh/web/browse/identifiers.py index 82543566..45709bba 100644 --- a/swh/web/browse/views/identifiers.py +++ b/swh/web/browse/identifiers.py @@ -1,56 +1,62 @@ # Copyright (C) 2017-2018 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from django.shortcuts import redirect from swh.model.identifiers import parse_persistent_identifier -from swh.web.browse.browseurls import browse_route from swh.web.common.utils import reverse from swh.web.common.exc import BadInputExc, handle_view_exception -@browse_route(r'(?P<swh_id>swh:[0-9]+:[a-z]+:[0-9a-f]+)/', - view_name='browse-swh-id') def swh_id_browse(request, swh_id): """ Django view enabling users to browse the SWH archive using :ref:`persistent-identifiers`. - The url that points to it is :http:get:`/browse/(swh_id)/`. + The url that points to it is :http:get:`/(swh_id)/`. 
""" try: swh_id_parsed = parse_persistent_identifier(swh_id) object_type = swh_id_parsed['object_type'] object_id = swh_id_parsed['object_id'] view_url = None + query_params = request.GET.copy() + if 'origin' in swh_id_parsed['metadata']: + query_params['origin'] = swh_id_parsed['metadata']['origin'] if object_type == 'cnt': query_string = 'sha1_git:' + object_id + fragment = '' + if 'lines' in swh_id_parsed['metadata']: + lines = swh_id_parsed['metadata']['lines'].split('-') + fragment += '#L' + lines[0] + if len(lines) > 1: + fragment += '-L' + lines[1] view_url = reverse('browse-content', kwargs={'query_string': query_string}, - query_params=request.GET) + query_params=query_params) + fragment elif object_type == 'dir': view_url = reverse('browse-directory', kwargs={'sha1_git': object_id}, - query_params=request.GET) + query_params=query_params) elif object_type == 'rel': view_url = reverse('browse-release', kwargs={'sha1_git': object_id}, - query_params=request.GET) + query_params=query_params) elif object_type == 'rev': view_url = reverse('browse-revision', kwargs={'sha1_git': object_id}, - query_params=request.GET) + query_params=query_params) elif object_type == 'snp': view_url = reverse('browse-snapshot', kwargs={'snapshot_id': object_id}, - query_params=request.GET) + query_params=query_params) else: msg = '\'%s\' is not a valid SWH persistent identifier!' 
% swh_id raise BadInputExc(msg) except Exception as exc: return handle_view_exception(request, exc) return redirect(view_url) diff --git a/swh/web/browse/urls.py b/swh/web/browse/urls.py index 99f65719..175da2ee 100644 --- a/swh/web/browse/urls.py +++ b/swh/web/browse/urls.py @@ -1,49 +1,48 @@ # Copyright (C) 2017-2018 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from django.conf.urls import url from django.shortcuts import render import swh.web.browse.views.directory # noqa import swh.web.browse.views.content # noqa -import swh.web.browse.views.identifiers # noqa import swh.web.browse.views.origin # noqa import swh.web.browse.views.person # noqa import swh.web.browse.views.release # noqa import swh.web.browse.views.revision # noqa import swh.web.browse.views.snapshot # noqa from swh.web.browse.browseurls import BrowseUrls def _default_browse_view(request): return render(request, 'browse.html', {'heading': 'Browse', 'empty_browse': True}) def _browse_help_view(request): return render(request, 'browse-help.html', {'heading': 'Help'}) def _browse_search_view(request): return render(request, 'browse-search.html', {'heading': 'Search'}) def _browse_vault_view(request): return render(request, 'browse-vault-ui.html', {'heading': 'Vault'}) urlpatterns = [ url(r'^$', _default_browse_view, name='browse-mainpage'), url(r'^help/$', _browse_help_view, name='browse-help'), url(r'^search/$', _browse_search_view, name='browse-search'), url(r'^vault/$', _browse_vault_view, name='browse-vault') ] urlpatterns += BrowseUrls.get_url_patterns() diff --git a/swh/web/doc_config.py b/swh/web/doc_config.py index 7ccd576d..b6416bd6 100644 --- a/swh/web/doc_config.py +++ b/swh/web/doc_config.py @@ -1,34 +1,35 @@ # Copyright (C) 2017-2018 The Software Heritage developers # See the AUTHORS file at the top-level 
directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from sphinxcontrib import httpdomain _swh_web_base_url = 'https://archive.softwareheritage.org' _swh_web_api_endpoint = 'api' _swh_web_api_version = 1 _swh_web_api_url = '%s/%s/%s/' % (_swh_web_base_url, _swh_web_api_endpoint, _swh_web_api_version) _swh_web_browse_endpoint = 'browse' _swh_web_browse_url = '%s/%s/' % (_swh_web_base_url, _swh_web_browse_endpoint) def customize_sphinx_conf(sphinx_conf): """ Utility function used to customize the sphinx doc build for swh-web globally (when building doc from swh-docs) or locally (when building doc from swh-web). Args: sphinx_conf (module): a reference to the sphinx conf.py module used to build the doc. """ # fix for sphinxcontrib.httpdomain 1.3 if 'Link' not in httpdomain.HEADER_REFS: httpdomain.HEADER_REFS['Link'] = httpdomain.IETFRef(5988, '5') + sphinx_conf.extlinks['swh_web'] = (_swh_web_base_url + '/%s', None) sphinx_conf.extlinks['swh_web_api'] = (_swh_web_api_url + '%s', None) sphinx_conf.extlinks['swh_web_browse'] = (_swh_web_browse_url + '%s', None) diff --git a/swh/web/tests/browse/views/test_identifiers.py b/swh/web/tests/browse/views/test_identifiers.py index 1389b84c..c39df00e 100644 --- a/swh/web/tests/browse/views/test_identifiers.py +++ b/swh/web/tests/browse/views/test_identifiers.py @@ -1,125 +1,144 @@ # Copyright (C) 2018 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information # flake8: noqa from unittest.mock import patch from nose.tools import istest from django.test import TestCase from swh.web.common.exc import BadInputExc from swh.web.common.utils import reverse from swh.web.tests.testbase import SWHWebTestBase from .data.content_test_data import stub_content_text_data 
from .data.directory_test_data import stub_root_directory_sha1 from .data.revision_test_data import revision_id_test from .data.release_test_data import stub_release swh_id_prefix = 'swh:1:' class SwhBrowseIdTest(SWHWebTestBase, TestCase): @istest def content_id_browse(self): cnt_sha1_git = stub_content_text_data['checksums']['sha1_git'] swh_id = swh_id_prefix + 'cnt:' + cnt_sha1_git url = reverse('browse-swh-id', kwargs={'swh_id': swh_id}) query_string = 'sha1_git:' + cnt_sha1_git content_browse_url = reverse('browse-content', kwargs={'query_string': query_string}) resp = self.client.get(url) self.assertEquals(resp.status_code, 302) self.assertEqual(resp['location'], content_browse_url) @istest def directory_id_browse(self): swh_id = swh_id_prefix + 'dir:' + stub_root_directory_sha1 url = reverse('browse-swh-id', kwargs={'swh_id': swh_id}) directory_browse_url = reverse('browse-directory', kwargs={'sha1_git': stub_root_directory_sha1}) resp = self.client.get(url) self.assertEquals(resp.status_code, 302) self.assertEqual(resp['location'], directory_browse_url) @istest def revision_id_browse(self): swh_id = swh_id_prefix + 'rev:' + revision_id_test url = reverse('browse-swh-id', kwargs={'swh_id': swh_id}) revision_browse_url = reverse('browse-revision', kwargs={'sha1_git': revision_id_test}) resp = self.client.get(url) self.assertEquals(resp.status_code, 302) self.assertEqual(resp['location'], revision_browse_url) query_params = {'origin_type': 'git', 'origin_url': 'https://github.com/webpack/webpack'} url = reverse('browse-swh-id', kwargs={'swh_id': swh_id}, query_params=query_params) revision_browse_url = reverse('browse-revision', kwargs={'sha1_git': revision_id_test}, query_params=query_params) resp = self.client.get(url) self.assertEquals(resp.status_code, 302) self.assertEqual(resp['location'], revision_browse_url) @istest def release_id_browse(self): swh_id = swh_id_prefix + 'rel:' + stub_release['id'] url = reverse('browse-swh-id', kwargs={'swh_id': swh_id}) 
release_browse_url = reverse('browse-release', kwargs={'sha1_git': stub_release['id']}) resp = self.client.get(url) self.assertEquals(resp.status_code, 302) self.assertEqual(resp['location'], release_browse_url) query_params = {'origin_type': 'git', 'origin_url': 'https://github.com/python/cpython'} url = reverse('browse-swh-id', kwargs={'swh_id': swh_id}, query_params=query_params) release_browse_url = reverse('browse-release', kwargs={'sha1_git': stub_release['id']}, query_params=query_params) resp = self.client.get(url) self.assertEquals(resp.status_code, 302) self.assertEqual(resp['location'], release_browse_url) @istest def bad_id_browse(self): swh_id = swh_id_prefix + 'foo:' + stub_release['id'] url = reverse('browse-swh-id', kwargs={'swh_id': swh_id}) resp = self.client.get(url) self.assertEquals(resp.status_code, 400) + @istest + def content_id_optional_parts_browse(self): + cnt_sha1_git = stub_content_text_data['checksums']['sha1_git'] + optional_parts = ';lines=4-20;origin=https://github.com/user/repo' + swh_id = swh_id_prefix + 'cnt:' + cnt_sha1_git + optional_parts + url = reverse('browse-swh-id', + kwargs={'swh_id': swh_id}) + + query_string = 'sha1_git:' + cnt_sha1_git + content_browse_url = reverse('browse-content', + kwargs={'query_string': query_string}, + query_params={'origin' : 'https://github.com/user/repo'}) + content_browse_url += '#L4-L20' + + resp = self.client.get(url) + + self.assertEquals(resp.status_code, 302) + self.assertEqual(resp['location'], content_browse_url) + diff --git a/swh/web/urls.py b/swh/web/urls.py index 7976eb81..6f7423ad 100644 --- a/swh/web/urls.py +++ b/swh/web/urls.py @@ -1,47 +1,51 @@ # Copyright (C) 2017-2018 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU Affero General Public License version 3, or any later version # See top-level LICENSE file for more information from django.conf import settings from django.conf.urls import ( url, include, 
handler400, handler403, handler404, handler500 ) from django.contrib.staticfiles.urls import staticfiles_urlpatterns from django.contrib.staticfiles.views import serve from django.shortcuts import render from django.views.generic.base import RedirectView from django_js_reverse.views import urls_js from swh.web.common.exc import ( swh_handle400, swh_handle403, swh_handle404, swh_handle500 ) +from swh.web.browse.identifiers import swh_id_browse + favicon_view = RedirectView.as_view(url='/static/img/icons/swh-logo-32x32.png', permanent=True) def default_view(request): return render(request, "homepage.html") urlpatterns = [ url(r'^favicon\.ico$', favicon_view), url(r'^api/', include('swh.web.api.urls')), url(r'^browse/', include('swh.web.browse.urls')), url(r'^$', default_view, name='swh-web-homepage'), - url(r'^jsreverse/$', urls_js, name='js_reverse') + url(r'^jsreverse/$', urls_js, name='js_reverse'), + url(r'^(?P<swh_id>swh:[0-9]+:[a-z]+:[0-9a-f]+.*)/$', + swh_id_browse, name='browse-swh-id') ] # enable to serve compressed assets through django development server if settings.DEBUG: static_pattern = r'^%s(?P<path>.*)$' % settings.STATIC_URL[1:] urlpatterns.append(url(static_pattern, serve)) else: urlpatterns += staticfiles_urlpatterns() handler400 = swh_handle400 # noqa handler403 = swh_handle403 # noqa handler404 = swh_handle404 # noqa handler500 = swh_handle500 # noqa
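To make the new redirect behaviour of `swh_id_browse` easier to review, here is a minimal standalone sketch of the mapping it performs: an `origin` qualifier becomes an `origin` query parameter, and for contents a `lines=N[-M]` qualifier becomes an `#LN[-LM]` anchor. It deliberately avoids Django and `swh.model`; `parse_swh_id`, `redirect_url` and the `BROWSE_PATHS` templates are hypothetical, with paths assumed from the documented `/browse/...` endpoints rather than taken from the named routes, and `urlencode` may escape the origin URL differently from `swh.web.common.utils.reverse`.

```python
import re
from urllib.parse import urlencode

# Same shape as the pattern registered in swh/web/urls.py (minus the
# trailing slash): a core identifier, optionally followed by qualifiers.
SWH_ID_RE = re.compile(r'^swh:[0-9]+:[a-z]+:[0-9a-f]+.*$')

# Hypothetical browse paths modelled on the documented /browse/ endpoints;
# the real view builds them with reverse() on named Django routes.
BROWSE_PATHS = {
    'cnt': '/browse/content/sha1_git:{}/',
    'dir': '/browse/directory/{}/',
    'rel': '/browse/release/{}/',
    'rev': '/browse/revision/{}/',
    'snp': '/browse/snapshot/{}/',
}


def parse_swh_id(swh_id):
    """Hypothetical minimal parser (not swh.model's): core id + qualifiers."""
    core, *qualifiers = swh_id.split(';')
    _namespace, _version, object_type, object_id = core.split(':')
    metadata = dict(q.split('=', 1) for q in qualifiers if q)
    return {'object_type': object_type, 'object_id': object_id,
            'metadata': metadata}


def redirect_url(swh_id, query_params=None):
    """Return the URL that a request to /(swh_id)/ would redirect to."""
    if not SWH_ID_RE.match(swh_id):
        raise ValueError('malformed identifier: %s' % swh_id)
    parsed = parse_swh_id(swh_id)
    template = BROWSE_PATHS.get(parsed['object_type'])
    if template is None:
        raise ValueError(
            "'%s' is not a valid SWH persistent identifier!" % swh_id)
    # Incoming query parameters are kept; an origin qualifier adds (or
    # overrides) the "origin" parameter, as in the patched view.
    params = dict(query_params or {})
    if 'origin' in parsed['metadata']:
        params['origin'] = parsed['metadata']['origin']
    url = template.format(parsed['object_id'])
    if params:
        url += '?' + urlencode(params)
    # For contents only, lines=N[-M] becomes an #LN[-LM] anchor.
    if parsed['object_type'] == 'cnt' and 'lines' in parsed['metadata']:
        lines = parsed['metadata']['lines'].split('-')
        url += '#L' + lines[0]
        if len(lines) > 1:
            url += '-L' + lines[1]
    return url
```

Calling `redirect_url('swh:1:cnt:<sha1_git>;lines=4-20;origin=https://github.com/user/repo')` yields the content URL with an `origin` query parameter and a `#L4-L20` anchor, which is the shape `content_id_optional_parts_browse` expects from the real view.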
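The origin-context resolution described in `docs/uri-scheme-browse.rst` (most recent crawl when no timestamp is given, otherwise the closest crawl, then the requested branch or a "HEAD"/"master" fallback) can be sketched as follows. This is only an illustration under stated assumptions: `resolve_revision` is a hypothetical helper, visits are modelled as `(visit_date, branches)` pairs with `branches` mapping names to revision ids, and "closest" is taken to mean the smallest absolute time difference, which the documentation does not spell out.

```python
from datetime import datetime, timezone


def resolve_revision(visits, timestamp=None, branch=None):
    """Hypothetical sketch of origin-context resolution to a revision id.

    visits: list of (visit_date, branches) pairs, where branches maps a
    branch name to a revision sha1_git (an assumed data layout).
    """
    if not visits:
        raise LookupError('origin has never been visited')
    if timestamp is None:
        # No timestamp: use the most recent crawl of the origin.
        _, branches = max(visits, key=lambda v: v[0])
    else:
        # Timestamp given: use the crawl closest to it (absolute distance).
        _, branches = min(visits, key=lambda v: abs(v[0] - timestamp))
    # An explicit branch wins; otherwise fall back to "HEAD", then "master".
    candidates = [branch] if branch is not None else ['HEAD', 'master']
    for name in candidates:
        if name in branches:
            return branches[name]
    raise LookupError('no matching branch among %s' % candidates)


visits = [
    (datetime(2017, 1, 1, tzinfo=timezone.utc), {'master': 'rev-a'}),
    (datetime(2018, 1, 1, tzinfo=timezone.utc),
     {'HEAD': 'rev-b', 'dev': 'rev-c'}),
]
print(resolve_revision(visits))  # rev-b
print(resolve_revision(
    visits, timestamp=datetime(2017, 3, 1, tzinfo=timezone.utc)))  # rev-a
```

When two visits are equally close to the timestamp, `min` keeps the first one in list order; the archive may break such ties differently.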