
Add a public API endpoint to get the metadata of an origin
Closed, Migrated

Description

The indexer storage API provides an endpoint to get the intrinsic metadata of an origin (origin_intrinsic_metadata_get).

It would be useful to have this endpoint public, similarly to origin_intrinsic_metadata_search_fulltext.

This only requires changes in swh-web. The code would be very similar to what is already done for the fulltext search, except that it takes origin ids as arguments instead of text.
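To illustrate the difference between the two lookups, here is a minimal sketch in Python. It does not use the real swh-indexer storage: FakeIndexerStorage, its sample rows, and the argument and return shapes (a list of search terms, a list of origin ids, rows as dicts with origin_id and metadata keys) are assumptions made for illustration only.

```python
from typing import Any, Dict, Iterable, List


class FakeIndexerStorage:
    """In-memory stand-in for the indexer storage (hypothetical, illustration only)."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self._rows = rows

    def origin_intrinsic_metadata_search_fulltext(
        self, conjunction: List[str], limit: int = 100
    ) -> List[Dict[str, Any]]:
        # Text-based lookup: keep rows whose metadata mentions every term.
        def matches(row: Dict[str, Any]) -> bool:
            text = str(row["metadata"]).lower()
            return all(term.lower() in text for term in conjunction)

        return [row for row in self._rows if matches(row)][:limit]

    def origin_intrinsic_metadata_get(self, ids: Iterable[int]) -> List[Dict[str, Any]]:
        # Id-based lookup: keep rows whose origin id was requested.
        wanted = set(ids)
        return [row for row in self._rows if row["origin_id"] in wanted]


# Sample data, invented for this sketch.
storage = FakeIndexerStorage([
    {"origin_id": 1, "metadata": {"name": "swh-web", "author": "alice"}},
    {"origin_id": 2, "metadata": {"name": "swh-indexer", "author": "bob"}},
])

by_text = storage.origin_intrinsic_metadata_search_fulltext(["indexer"])
by_id = storage.origin_intrinsic_metadata_get([1])
```

Both calls return the same kind of rows; only the argument changes, which is why the new endpoint should need little code beyond what the fulltext search already has.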

Event Timeline

vlorentz triaged this task as Normal priority. Mar 28 2019, 2:40 PM
vlorentz created this task.

If this issue is open, I can work on this. I believe I have to add the endpoint in /swh-web/swh/web/api/views/origin.py. I will probably have to populate the db with an origin and test the api by making http requests, or is there a better way?

This will also require adding a function in swh/web/common/service.py that makes the query to swh-storage. The response will then have to be converted to a JSON response. I would like to take this up and add this API. Please let me know if there is anything wrong with the changes I am suggesting.

In T1613#33881, @twitu wrote:

If this issue is open, I can work on this. I believe I have to add the endpoint in /swh-web/swh/web/api/views/origin.py.

Correct

I will probably have to populate the db with an origin and test the api by making http requests, or is there a better way?

It's already populated for the unit tests, to test origin_intrinsic_metadata_search_fulltext.

In T1613#33907, @twitu wrote:

This will also require adding a function in swh/web/common/service.py that makes the query to swh-storage.

No, to make the query to swh-indexer's storage (like origin_intrinsic_metadata_search_fulltext does).
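A rough sketch of what such a service function could look like, assuming it delegates to the indexer storage rather than to swh-storage. Everything here is hypothetical: StubIndexerStorage, lookup_origin_intrinsic_metadata, to_json_response, the sample data, and the response shape are invented for illustration and are not taken from swh-web.

```python
import json
from typing import Any, Dict, Iterable, List, Optional, Tuple


class StubIndexerStorage:
    """Hypothetical stand-in for the indexer storage client."""

    def origin_intrinsic_metadata_get(self, ids: Iterable[int]) -> List[Dict[str, Any]]:
        # Sample data, invented for this sketch.
        known = {42: {"name": "example-project", "license": "GPL-3.0"}}
        return [{"origin_id": i, "metadata": known[i]} for i in ids if i in known]


def lookup_origin_intrinsic_metadata(
    idx_storage: Any, origin_id: int
) -> Optional[Dict[str, Any]]:
    """Return the intrinsic metadata of one origin, or None if there is none."""
    results = list(idx_storage.origin_intrinsic_metadata_get([origin_id]))
    if not results:
        return None
    return results[0]["metadata"]


def to_json_response(
    origin_id: int, metadata: Optional[Dict[str, Any]]
) -> Tuple[int, str]:
    """Convert a lookup result to an (HTTP status, JSON body) pair (assumed shape)."""
    if metadata is None:
        reason = f"Origin with id {origin_id} has no intrinsic metadata"
        return 404, json.dumps({"reason": reason})
    return 200, json.dumps(metadata, sort_keys=True)


idx_storage = StubIndexerStorage()
status, body = to_json_response(42, lookup_origin_intrinsic_metadata(idx_storage, 42))
missing_status, missing_body = to_json_response(
    7, lookup_origin_intrinsic_metadata(idx_storage, 7)
)
```

Keeping the lookup separate from the JSON conversion means the view only has to handle the not-found case and serialization.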

Ok I'll try to implement and test this by tomorrow.

I want to clarify the return value for this API. In swh-storage, the table origin_metadata contains the metadata of an origin for a visit, listing, etc. The primary key is an auto-incremented value, which means that there can be multiple entries for a single origin_id. What is the expected JSON response for this API? I can include origin_id and a list of metadata values; is there anything else to return?

Don't use the origin_metadata of swh-storage, it contains a different kind of metadata that we are not interested in for now.

This task is completed; it can be closed.

vlorentz assigned this task to twitu.