Review browse endpoints for origin URLs ending with a slash
Closed, Migrated

Description

The origins we have imported from Google Code end with a slash.

For instance: https://archive.softwareheritage.org/api/1/origin/hg/url/https://atlasmapper.googlecode.com/hg/

Some of the browse URLs succeed, but only with a double slash at the end, e.g. https://archive.softwareheritage.org/browse/origin/hg/url/https://atlasmapper.googlecode.com/hg// (note the two trailing slashes). Navigating any further from that page fails with a None-related error.

We should review all navigation paths containing origin URLs to make sure they work with URLs ending with a slash.
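One plausible reading of the double-slash behaviour, sketched under an assumption: if a route captures the origin URL greedily and then requires a final slash, the last slash of the request path is consumed by the route itself. The pattern below is hypothetical and is not taken from swh-web; it only illustrates how a single trailing slash would yield an origin URL without its own slash.

```python
import re

# Hypothetical route pattern (not swh-web's actual one): the origin URL is
# captured greedily and the route itself ends with a mandatory slash.
BROWSE_ORIGIN = re.compile(
    r"^browse/origin/(?P<origin_type>[a-z]+)/url/(?P<origin_url>.+)/$"
)


def captured_origin_url(path):
    """Return the origin URL the pattern would capture, or None if no match."""
    match = BROWSE_ORIGIN.match(path)
    return match.group("origin_url") if match else None


# One trailing slash: the route eats it, so the captured URL loses its slash.
single = captured_origin_url(
    "browse/origin/hg/url/https://atlasmapper.googlecode.com/hg/"
)

# Two trailing slashes: one for the route, one left on the origin URL.
double = captured_origin_url(
    "browse/origin/hg/url/https://atlasmapper.googlecode.com/hg//"
)
```

Under this assumption, only the double-slash form captures the origin URL exactly as it was imported, which would match the behaviour described above.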

Event Timeline

olasd triaged this task as High priority. Mar 23 2018, 5:28 PM
olasd created this task.

At one point, I considered suggesting that we URL-encode all the slashes that are part of "URL parameters", but WSGI is expected to decode those before passing them on to the application. We'd need to check whether Django actually does this (ref: https://github.com/pallets/flask/issues/900).
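A minimal sketch of the concern, using only the Python standard library. It assumes the server percent-decodes the request path before the application sees it, which is the behaviour the comment above expects from WSGI; the browse path prefix is reused from this task purely for illustration.

```python
from urllib.parse import quote, unquote

origin_url = "https://atlasmapper.googlecode.com/hg/"

# Percent-encode every reserved character, including "/" and ":".
encoded = quote(origin_url, safe="")

# Request path as a client would send it, with the origin URL encoded.
request_path = "/browse/origin/hg/url/" + encoded + "/"

# Assumption: the server decodes the path before handing it to the
# application (as PATH_INFO), so the encoded slashes become plain slashes.
path_info = unquote(request_path)
```

If the decoding happens as assumed, the application receives the same path it would have received without any encoding, so encoding the slashes would not help the router tell them apart.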

The issues encountered are now fixed, and the updated swh-web is deployed to production. It turns out that the origin_url parameter was correctly parsed by Django, but null dates coming from swh-storage were not handled correctly. For instance, no dates are associated with the releases of https://atlasmapper.googlecode.com/hg/ (see https://archive.softwareheritage.org/browse/origin/hg/url/https://atlasmapper.googlecode.com/hg//visit/2016-05-03T15:16:32/releases/). An exception was raised when trying to convert those dates, so the requested browse views were not displayed.
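The kind of guard this fix implies can be sketched as follows. The helper name and the output format are hypothetical and do not reflect the actual swh-web code; the sketch only shows a date conversion that tolerates a missing value instead of raising.

```python
from datetime import datetime
from typing import Optional


def format_release_date(value: Optional[str]) -> Optional[str]:
    """Hypothetical helper: convert an ISO 8601 date string for display.

    Returns None when the storage layer supplied no date, instead of
    letting the conversion raise.
    """
    if value is None:
        return None
    parsed = datetime.fromisoformat(value)
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


# A release without a date no longer breaks the view.
missing = format_release_date(None)

# A release with a date is converted as usual.
present = format_release_date("2016-05-03T15:16:32")
```

Returning None leaves it to the view to decide how to render an undated release, for example with an empty cell.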