Page MenuHomeSoftware Heritage

api: Return absolute URIs in JSON responses
ClosedPublic

Authored by anlambert on Dec 20 2019, 4:27 PM.

Details

Summary

Ensure that all URIs returned in JSON responses of the Web API are absolute (prefixed
with protocol and domains).

That commit consists in:

  • adding a request parameter to our custom django reverse function in order to build an abolute URI before returning it
  • providing the input HTTP requests as parameter to reverse in API endpoints implementation
  • centralizing all enrich functions adding URIs in API responses in the swh.web.api.utils module
  • modifying tests implementation to check all returned URIs are absolute
  • adding some new and missing tests related to these improvements
  • modifying enrich function tests to remove hardcoded data in favor of hypothesis

Regarding the HTML responses of the API (when querying it through a browser), the current
behavior remains unchanged: URIs are not displayed as absolute.

Related to T2154

Warning: That diff is quite large but it mainly consists of passing a new request parameter to
the reverse function. I did not want to handle one endpoint at a time and prefer to process
all of them in one pass.

Diff Detail

Repository
rDWAPPS Web applications
Branch
api-return-full-uris
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 9941
Build 14692: Cypress tests for swh-web diffsJenkins
Build 14691: tox-on-jenkinsJenkins
Build 14690: arc lint + arc unit

Event Timeline

I'm not a big fan of using absolute URLs. It makes the code more verbose, tests more complex, and prevents clients from using proxies. Why do we want/need that?

Assuming we do want it:

Some of the enrich_* functions .copy() their arguments before modifying it, but some don't. This is inconsistent (but I don't know which way is better)

I don't think the request should be explicitly passed from templates to filters; templates should only be about presentation. What about using this instead? https://stackoverflow.com/a/2160298

I'm not a big fan of using absolute URLs. It makes the code more verbose, tests more complex, and prevents clients from using proxies. Why do we want/need that?

The rationale is explained in T2147 and T2154. In short: (1) ease of client implementation, (2) uniformity with the rest of the world.

Some of the enrich_* functions .copy() their arguments before modifying it, but some don't. This is inconsistent (but I don't know which way is better)

Indeed, the copy operation is not needed so I will remove it.

I don't think the request should be explicitly passed from templates to filters; templates should only be about presentation. What about using this instead? https://stackoverflow.com/a/2160298

A template filter can take arguments so I do not see why we could not provide the input request to it. By providing the request, it enables to reliably strip
the protocol and domain from the urls to display (as there is no interest to display absolute urls in HTML views of the API responses).

Update: Rebase and remove not needed copy operations in enrich functions.

What I meant is that the request object appears in template files (eg. {{ header_value | urlize_header_links:request | safe }}), while it should be abstracted away.

Also, when re-reading that part of the code, I realized that urlize_header_links turns absolute links to relative links, which I don't think it should do, because it means the apidoc will show something different than what would be returned by the API

What I meant is that the request object appears in template files (eg. {{ header_value | urlize_header_links:request | safe }}), while it should be abstracted away.

It appears in template because the django.template.context_processors.request context processor is used (this is a default in django settings).
For instance it enables to display the request type and path in each HTML view of the API responses.

Also, when re-reading that part of the code, I realized that urlize_header_links turns absolute links to relative links, which I don't think it should do, because it means the apidoc will show something different than what would be returned by the API

Indeed, it makes sense. Let's display them as absolute then.

Update: Display absolute URIS also in HTML views of the API responses.

This revision is now accepted and ready to land.Jan 7 2020, 3:22 PM
This revision was landed with ongoing or failed builds.Jan 7 2020, 3:23 PM
This revision was automatically updated to reflect the committed changes.