Page MenuHomeSoftware Heritage

Deprecate most of the /browse/origin/.* URLs
Open, NormalPublic

Description

swh-web historically declares and uses origin centric URLs but most of them are now totally redundant with other ones.

For instance, /browse/origin/directory/?origin_url=https://github.com/python/cpython&path=Include/internal is equivalent to
/browse/directory/ded25a0c00d4b1be1af5cb8a585de51bf5fad3be/?origin_url=https://github.com/python/cpython&path=Include/internal.

We should rather use URLs targeting archive first class citizens (content, directory, release, revision, snapshot) instead of those origin centric ones,
thus we should deprecate them and replace their implementation by HTTP redirections to equivalent ones.

The following origin centric URLs can be deprecated in favor of those declared in swh/web/browse/views/<object_type>.py:

  • /browse/origin/directory/
  • /browse/origin/(?P<origin_url>.+)/visit/(?P<timestamp>.+)/directory/
  • /browse/origin/(?P<origin_url>.+)/visit/(?P<timestamp>.+)/directory/(?P<path>.+)/
  • /browse/origin/(?P<origin_url>.+)/directory/(?P<path>.+)/
  • /browse/origin/(?P<origin_url>.+)/directory/
  • /browse/origin/content/
  • /browse/origin/(?P<origin_url>.+)/visit/(?P<timestamp>.+)/content/(?P<path>.+)/
  • /browse/origin/(?P<origin_url>.+)/content/(?P<path>.+)/
  • /browse/origin/(?P<origin_url>.+)/content/
  • /browse/origin/log/
  • /browse/origin/(?P<origin_url>.+)/visit/(?P<timestamp>.+)/log/
  • /browse/origin/(?P<origin_url>.+)/log/
  • /browse/origin/branches/", view_name="browse-origin-branches
  • /browse/origin/(?P<origin_url>.+)/visit/(?P<timestamp>.+)/branches/
  • /browse/origin/(?P<origin_url>.+)/branches/
  • /browse/origin/releases/
  • /browse/origin/(?P<origin_url>.+)/visit/(?P<timestamp>.+)/releases/
  • /browse/origin/(?P<origin_url>.+)/releases/

When deprecating those URLs, we must ensure that all arguments can still be passed as query parameters
to the redirection URLs.

Event Timeline

anlambert triaged this task as Normal priority.Sep 23 2021, 4:34 PM
anlambert created this task.

@anlambert This ticket might have some performance implications.
for eg: in the first case, to redirect /browse/origin/directory/?origin_url=<> to the root directory, we have to query the archive first. The obvious way would be to call the get_snapshot_context function.
https://forge.softwareheritage.org/source/swh-web/browse/master/swh/web/browse/snapshot_context.py$395

The issue is, the redirected url (/browse/directory/<sha1>?origin_url=<>) will call the same function (get_snapshot_context) again. This is true for all the urls.
Do you see some other efficient alternative to fix this ticket?

The issue is, the redirected url (/browse/directory/<sha1>?origin_url=<>) will call the same function (get_snapshot_context) again. This is true for all the urls.
Do you see some other efficient alternative to fix this ticket?

@jayeshv, all costly storage requests result (getting origin visits, snapshot content and size) performed by the get_snapshot_context
function are cached at the django level so no need to worry about performance here as it has already been handled.

The issue is, the redirected url (/browse/directory/<sha1>?origin_url=<>) will call the same function (get_snapshot_context) again. This is true for all the urls.
Do you see some other efficient alternative to fix this ticket?

@jayeshv, all costly storage requests result (getting origin visits, snapshot content and size) performed by the get_snapshot_context
function are cached at the django level so no need to worry about performance here as it has already been handled.

Ok. Thanks

@anlambert do you think we can deprecate following routes as well? I think they can be redirected to the corresponding swh/web/browse/views/<object_type>.py routes.

  • /snapshot/(?P<snapshot_id>[0-9a-f]+)/content/
  • /snapshot/(?P<snapshot_id>[0-9a-f]+)/directory/(?P<path>.+)
  • / snapshot/(?P<snapshot_id>[0-9a-f]+)/directory/
  • `/snapshot/(?P<snapshot_id>[0-9a-f]+)/content/(?P<path>.+)/'

we can delete a lot of code from snapshot_context.py module then

Just to be clear, you're looking to keep these URL working, but turn them into redirects over to swhid-centric URLs with context parameters (and drop the original view code from these URLs), correct?

I'm asking this because using predictable origin-centric URLs is generally much more user friendly than having to use multiple APIs to look up the SWHID of a given object before being able to construct the URL, and one would have to always to dynamic API calls to generate the URL for browsing the "latest archival" of a given origin.

For instance, the "archived origin" SWH badge https://www.softwareheritage.org/2020/01/13/the-swh-badges-are-here/ uses an origin-centric URL.

In T3608#71802, @olasd wrote:

Just to be clear, you're looking to keep these URL working, but turn them into redirects over to swhid-centric URLs with context parameters (and drop the original view code from these URLs), correct?

@olasd Correct, We are just doing a permanent redirect. In fact, we will run the same code twice because of this redirect (most notably get_snapshot_context function). @anlambert told it is fine as we are caching that.

I'm trying to reduce complexity here by deleting a lot of code that I hope is not used after this redirection.

In T3608#71803, @olasd wrote:

I'm asking this because using predictable origin-centric URLs is generally much more user friendly than having to use multiple APIs to look up the SWHID of a given object before being able to construct the URL, and one would have to always to dynamic API calls to generate the URL for browsing the "latest archival" of a given origin.

For instance, the "archived origin" SWH badge https://www.softwareheritage.org/2020/01/13/the-swh-badges-are-here/ uses an origin-centric URL.

@olasd If you want a sneak peek, here is a wip diff https://forge.softwareheritage.org/D6430
Ideally, no existing route will fail because of this fix.

Awesome, thanks for confirming this!

Just to be clear, you're looking to keep these URL working, but turn them into redirects over to swhid-centric URLs with context parameters (and drop the original view code from these URLs), correct?

Yes the idea is to reorganize browse URLs as most of them are now redundant and remove kind of duplicated code in views implementation.

I'm asking this because using predictable origin-centric URLs is generally much more user friendly than having to use multiple APIs to look up the SWHID of a given object before being able to construct the URL, and one would have to always to dynamic API calls to generate the URL for browsing the "latest archival" of a given origin.

It will still be possible to use that kind of URLs, for instance /browse/directory?origin_url=<origin_url> will redirect to
/browse/directory/<root_sha1_git>?origin_url=<origin_url>&snapshot=<last_snapshot>
and will be equivalent to /browse/origin/directory?origin_url=<origin_url>

For instance, the "archived origin" SWH badge https://www.softwareheritage.org/2020/01/13/the-swh-badges-are-here/ uses an origin-centric URL.

Yes because it makes sense in that case.

@anlambert do you think we can deprecate following routes as well? I think they can be redirected to the corresponding swh/web/browse/views/<object_type>.py routes.

  • /snapshot/(?P<snapshot_id>[0-9a-f]+)/content/
  • /snapshot/(?P<snapshot_id>[0-9a-f]+)/directory/(?P<path>.+)
  • / snapshot/(?P<snapshot_id>[0-9a-f]+)/directory/
  • `/snapshot/(?P<snapshot_id>[0-9a-f]+)/content/(?P<path>.+)/'

we can delete a lot of code from snapshot_context.py module then

Yes we can, the snapshot should be provided as query parameter in other views in that case.

Yes we can, the snapshot should be provided as query parameter in other views in that case.

Thanks. Then we can delete most of the code in https://forge.softwareheritage.org/source/swh-web/browse/master/swh/web/browse/snapshot_context.py

Testing will be a bit tricky, I will try to add some generic cypress test.

Yes we can, the snapshot should be provided as query parameter in other views in that case.

Thanks. Then we can delete most of the code in https://forge.softwareheritage.org/source/swh-web/browse/master/swh/web/browse/snapshot_context.py

Testing will be a bit tricky, I will try to add some generic cypress test.

Yes, that's the idea, removing duplicated code.

For the tests, you should simply test redirections work as expected and replace the origin centric URLs by their SWHID counterparts in the tests code I think