
Public API v2
Closed, Migrated

Description

Motivation: We want to stop encoding arguments in the request path, and use query parameters instead. This makes more sense for HTTP (and the various proxy layers we have), and is the only way to properly pass an origin URL as an argument to the API (%2F and / are indistinguishable in Django's request routing). This will require changing the whole structure of the API.
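
As a rough illustration of the difference, here is a Python sketch of a path-encoded lookup versus a query-parameter lookup (both URLs are illustrative of the style only; the v2 route and parameter name are hypothetical, not a proposed design):

```python
import requests
from urllib.parse import quote

ORIGIN = "https://github.com/python/cpython"

# Path-encoded style: the origin URL is embedded in the request path, so every
# "/" must be percent-encoded as %2F -- which Django's URL routing cannot tell
# apart from a literal "/".
v1_style = f"https://archive.softwareheritage.org/api/1/origin/{quote(ORIGIN, safe='')}/get/"

# Query-parameter style (hypothetical v2 route): the origin URL travels as a
# query parameter, so the client library handles the encoding unambiguously.
v2_response = requests.get(
    "https://archive.softwareheritage.org/api/2/origin/",
    params={"origin_url": ORIGIN},
)
```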

This meta task lists all other breaking changes we want to include in version 2 of the public API.

  1. Use query parameters instead of encoding arguments in the request path
  2. No leak of the origin id (use only origin URLs)
  3. Use SWHIDs everywhere (core SWHIDs, without qualifiers)
  4. Compatibility with at least one well-known API specification format (OpenAPI, SPARQL, ...)
  5. Consistent pagination used across all endpoints
  6. Authentication
  7. Standardize "batch invocation" of endpoints on multiple objects
  8. Consistent results for the same object accessed via different endpoints (e.g. /revision/<rev>/directory and /directory/<dir_id> do not return the same type of result; one is a superset of the other).
  9. Future-proofing, w.r.t. changes of hash algorithms (currently sha1_git)
  10. Consider dropping /revision/log/ (?) (see T2450)

Event Timeline

vlorentz triaged this task as Normal priority. Jun 14 2019, 12:06 PM
vlorentz created this task.
douardda updated the task description.
vlorentz renamed this task from Public API v2 (meta task) to Public API v2. Jan 22 2020, 4:23 PM
vlorentz added a project: meta-task.
zack updated the task description.

Rereading this task, I have a few comments/questions.

Item 3 Use SWHIDs everywhere - does the new API handle SWHID qualifiers too? Always? Sometimes? Or are qualifiers actually "the way" to specify arguments? Are there needed endpoints that would not fit (besides query-execution arguments like pagination support, etc.)?
Using SWHIDs will work well to navigate the archive (since we only manipulate existing archive objects, so everything has a SWHID), but not for "discovery" usage (looking for objects within a range, matching a pattern, etc.).

Maybe those 2 use cases should be considered separately.
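
For instance, one way the two could coexist (purely illustrative; the endpoint and parameter names below are hypothetical) is to keep the core SWHID as the main argument and carry the qualifiers as query parameters:

```python
from urllib.parse import urlencode

QUALIFIED = (
    "swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2"
    ";origin=https://github.com/torvalds/linux"
    ";lines=9-15"
)

# Split the core SWHID from its qualifiers (qualifiers are ;-separated
# key=value pairs appended to the core identifier).
core, *qualifier_parts = QUALIFIED.split(";")
qualifiers = dict(part.split("=", 1) for part in qualifier_parts)

# Hypothetical v2 resolution endpoint taking the core SWHID plus qualifiers
# as plain query parameters.
print("/api/2/resolve/?" + urlencode({"swhid": core, **qualifiers}))
```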

Item 4 Use a well-known API tool - I'm pretty sure SPARQL is out of reach for us, so I'd go for OpenAPI. A decent first step would be to begin writing an OpenAPI definition for this new API.
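
To make that concrete, here is what a minimal skeleton of such a definition could look like, written as a Python dict and dumped to YAML (requires PyYAML; the path and parameter names are placeholders, not a proposed design):

```python
import yaml  # PyYAML

spec = {
    "openapi": "3.0.3",
    "info": {"title": "Software Heritage API", "version": "2.0.0"},
    "paths": {
        "/origin/": {
            "get": {
                "summary": "Look up an origin by URL",
                "parameters": [
                    {
                        "name": "origin_url",
                        "in": "query",
                        "required": True,
                        "schema": {"type": "string", "format": "uri"},
                    }
                ],
                "responses": {"200": {"description": "The origin, if archived"}},
            }
        }
    },
}

print(yaml.safe_dump(spec, sort_keys=False))
```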

Items 5, 6, 7 aka pagination, auth and batches - I believe these come naturally with item 4 (specification wise)

Item 8 Consistent results w.r.t. access path - with the rise of "SWHIDs everywhere" and the new existence of "SWHIDs with context", how does this point play out? Don't we want dedicated views/results when using a contextualised SWHID?

Item 9 Future-proofing w.r.t. hash algos - unless I'm wrong, from the API point of view, and according to item 3, this is a matter of SWHID specification, not an API one.

Overall, I'm not sure how far to go with this "use SWHIDs everywhere". In fact, using SWHIDs, we don't even need most entry points (/content, /snapshot, etc.); keeping them is a bit redundant. But can we imagine an API based on SWHIDs only?

Then, during this API v2 design session, the following should be considered:

  • which parts of API v1 do we need? Can we get rid of some? (e.g. /revision/log, as pointed out in item 10)
  • are there missing endpoints?
  • who are the current users of this API? Who are the expected new users?
  • what do they need?

And before jumping into writing code, a few client scenarios should be implemented using this new API, to check how convenient it is.
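
One such scenario could look roughly like this (a sketch only, against hypothetical v2 routes and response shapes, none of which exist yet): find an origin, follow its latest snapshot, and list the root directory.

```python
import requests

BASE = "https://archive.softwareheritage.org/api/2"  # hypothetical base URL


def browse_latest_root(origin_url: str) -> dict:
    """Walk from an origin URL to the root directory of its latest snapshot."""
    origin = requests.get(f"{BASE}/origin/", params={"origin_url": origin_url}).json()
    snapshot = requests.get(
        f"{BASE}/object/", params={"swhid": origin["latest_snapshot"]}
    ).json()
    root = snapshot["branches"]["HEAD"]["root_directory"]
    return requests.get(f"{BASE}/object/", params={"swhid": root}).json()
```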

I suspect that when this task was initially submitted we didn't yet have SWHIDs with qualifiers :)
From the point of view of APIv2, given v1 was using only hashes, for feature parity we should indeed only need SWHIDs without qualifiers, i.e., "core" SWHIDs. (I'm gonna edit the task description to reflect that.) Thanks for noticing this!

Good point about /content v. /directory etc. I agree that with SWHIDs everywhere they seem redundant. That's similar to something I experienced when writing the Python Web client. There I went for (1) a generic get() method that takes a SWHID and returns any kind of object, together with (2) type-specific methods (revision(), directory(), etc.). The advantage of having both is that, on the one hand, you get type checking that helps avoid passing the wrong type of SWHID if your application, say, only ever wants to deal with revisions. On the other hand, you get a natural namespace for additional type-specific methods. For instance, if we get rid of /revision, where do we put /revision/log? (Maybe not the greatest of examples, as we are considering dropping that method, but other "rich" methods can easily show up for any kind of object, I think.)
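
A rough sketch of that pattern (not the actual swh.web.client code, just an illustration of the idea):

```python
class WebAPIClient:
    """Sketch of a client combining a generic accessor with typed wrappers."""

    def get(self, swhid: str) -> dict:
        """Fetch any archive object by its core SWHID."""
        ...  # dispatch on the object type encoded in the SWHID

    def revision(self, swhid: str) -> dict:
        """Fetch a revision; reject SWHIDs of any other type."""
        if not swhid.startswith("swh:1:rev:"):
            raise ValueError(f"not a revision SWHID: {swhid}")
        return self.get(swhid)

    def directory(self, swhid: str) -> dict:
        """Fetch a directory; a natural home for extra directory-specific methods."""
        if not swhid.startswith("swh:1:dir:"):
            raise ValueError(f"not a directory SWHID: {swhid}")
        return self.get(swhid)
```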

Items 5, 6, 7 aka pagination, auth and batches - I believe these come naturally with item 4 (specification wise)

They don't. OpenAPI is a specification to describe APIs, and it contains absolutely nothing about pagination or batches.

But they have to be taken into account when writing the OpenAPI description of the API, if that's what you meant.
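
For instance (illustrative only), pagination could be expressed as reusable parameter components that every paginated endpoint references, so the convention lives in one place in the OpenAPI document:

```python
# Fragment of an OpenAPI document, as a Python dict; all names are placeholders.
pagination_components = {
    "components": {
        "parameters": {
            "PerPage": {
                "name": "per_page",
                "in": "query",
                "schema": {"type": "integer", "maximum": 1000, "default": 100},
            },
            "PageToken": {
                "name": "page_token",
                "in": "query",
                "description": "Opaque cursor returned by the previous page",
                "schema": {"type": "string"},
            },
        }
    }
}

# Paginated endpoints would then reference them with
# {"$ref": "#/components/parameters/PerPage"} and so on.
```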

Items 5, 6, 7 aka pagination, auth and batches - I believe these come naturally with item 4 (specification wise)

They don't. OpenAPI is a specification to describe APIs, and it contains absolutely nothing about pagination or batches.

note: an old comment I did not submit and forgot about from back then

sure, but

it's true these do not come "for free", but I still have the impression there is an "OpenAPI way" of handling them, and we should stick to it.

One point I'm not sure how/if we want to pay attention to: the query-parameter approach limits the capabilities of the batch invocation mechanism (for batch input, due to limited and poorly standardized URL size limits).

So do we want to also have endpoints which support "parameters" given in the request's payload (typically to support big lists of SWHIDs)?

it's true these do not come "for free", but I still have the impression there is an "OpenAPI way" of handling them, and we should stick to it.

Thanks for these comments. I agree that we should look into these kinds of best practices to implement our requirements on top of something like OpenAPI, given it seems to be the current state of the art.

One point I'm not sure how/if we want to pay attention to: the query-parameter approach limits the capabilities of the batch invocation mechanism (for batch input, due to limited and poorly standardized URL size limits).

So do we want to also have endpoints which support "parameters" given in the request's payload (typically to support big lists of SWHIDs)?

I think so, yes. In fact, that is already how we do things with the /known endpoint: given it needs to take in "a lot" of SWHIDs, they are passed as a POST payload. I think it would make sense to generalize this approach to all methods that need to be able to handle batches (maybe all of them, maybe not, but that's a separate question).
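
A sketch of how that generalization could look from a client's perspective (the v2 route and request body shape are hypothetical; only the /known call reflects the existing v1 convention):

```python
import requests

swhids = [
    "swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2",
    "swh:1:dir:d198bc9d7a6bcf6db04f476d29314f157507d505",
]

# v1 /known: the batch of SWHIDs is sent as a JSON body, so URL length limits
# never come into play.
known = requests.post(
    "https://archive.softwareheritage.org/api/1/known/", json=swhids
).json()

# A hypothetical v2 batch endpoint following the same convention.
batch = requests.post(
    "https://archive.softwareheritage.org/api/2/objects/", json={"swhids": swhids}
)
```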