
Web API: do not leak internal, non-intrinsic origin identifiers
Closed, Resolved, Public

Description

As a by-product of the discussion in T1731, we determined that leaking integer identifiers for origins was not a good idea in the first place. We should stop doing that: the response for [[ https://archive.softwareheritage.org/api/1/origin/ | /origin ]] should only return the information that an origin is known, and clients should keep providing the full origin URL to subsequent endpoints to specify an origin (e.g., [[ https://archive.softwareheritage.org/api/1/origin/visit/ | /origin/visit ]]).

It is not yet decided whether we want to do that by breaking backward compatibility for API version 1 (it's beta anyway…) or whether this should be our "chance" to start version 2 of the API.

Event Timeline

zack triaged this task as Low priority. Jun 7 2019, 3:38 PM
zack created this task.
zack added a subscriber: anlambert. Jan 29 2020, 2:20 PM

@anlambert is this done and, if so, can you close it?

Related comments:

type (string): the type of software origin (deprecated value; types are now associated to visits instead of origins)
id (number): the origin unique identifier (deprecated value; you should only refer to origins based on their URL)

So I think (a) this task should be closed, (b) those two fields should be removed from the doc.
(But I could totally be missing context here!)

@zack, I will take care of cleaning up the Web API documentation, and then we can indeed close this task as resolved.

Quick follow-up on this: we are still leaking origin ids in the [[ https://archive.softwareheritage.org/api/1/origins/ | /origins/ ]] endpoint. There is also still an origin_from parameter used to handle pagination.

I know that OpenAIRE is using that endpoint; they already contacted me when we removed the origin type attribute, as it broke their code.

I propose to remove those id leaks from that endpoint and contact OpenAIRE to tell them to use the Link header from now on to paginate the results.
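
For reference, here is a minimal sketch of what paginating through a Link header could look like on the client side, so that no origin id is needed to ask for the next page. It assumes the endpoint sends a standard `Link: <url>; rel="next"` header while more results are available. The names `next_link`, `iter_pages` and `fetch` are made up for this illustration and are not part of the Web API or of any client library.

```python
import re
from typing import Any, Callable, Iterator, Mapping, Optional, Tuple

# Matches one link-value: "<url>" followed by its ";"-separated parameters.
_LINK_RE = re.compile(r'<([^>]*)>\s*((?:;\s*[^;,]+)*)')


def next_link(link_header: Optional[str]) -> Optional[str]:
    """Return the URL marked rel="next" in a Link header, or None."""
    if not link_header:
        return None
    for url, params in _LINK_RE.findall(link_header):
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.lower() == "rel" and "next" in value.strip('"').split():
                return url
    return None


# Hypothetical fetch callable: takes a URL, returns (parsed body, headers).
Fetch = Callable[[str], Tuple[Any, Mapping[str, str]]]


def iter_pages(first_url: str, fetch: Fetch) -> Iterator[Any]:
    """Yield each page body, following rel="next" links until none is left."""
    url: Optional[str] = first_url
    while url is not None:
        body, headers = fetch(url)
        yield body
        # The server decides what the next page is; the client never
        # builds it from an origin id.
        url = next_link(headers.get("Link"))
```

A client would plug its own HTTP call in as `fetch` and start from the /origins/ URL; the loop stops on the first response that has no rel="next" link.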

zack added a comment. Jan 29 2020, 10:48 PM

> I propose to remove those id leaks from that endpoint and contact OpenAIRE to tell them to use the Link header from now on to paginate the results.

Sounds good.

As a minor variation, I propose to take this chance to streamline & generalize the process of informing people about API changes. Rather than mailing the OpenAIRE folks only, let's mail swh-devel with an appropriate message, say, 1 week before the change. (Given it's the first time, we can also inform the OpenAIRE folks directly about this, while making clear that future announcements will be sent via the list.)

anlambert closed this task as Resolved. Feb 10 2020, 1:33 PM
anlambert claimed this task.

Closing this as resolved, as internal origin identifiers are no longer leaked in the Web API responses. I also informed OpenAIRE by mail about the change.