Page MenuHomeSoftware Heritage

Web API endpoint /known/
ClosedPublic

Authored by DanSeraf on Fri, Jan 24, 3:36 PM.

Details

Summary

New endpoint to check if a list of PIDs exist in the archive (T2176)

Get information about the Software Heritage persistent identifiers:

  • input: a JSON list of SWH PIDs
  • output: a dictionary with: pid as key and a boolean value['found'] to check if the

corrisponding pid exists or not

Diff Detail

Repository
rDWAPPS Web applications
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

DanSeraf created this revision.Fri, Jan 24, 3:36 PM
DanSeraf edited the summary of this revision. (Show Details)Fri, Jan 24, 3:38 PM
vlorentz requested changes to this revision.Fri, Jan 24, 4:08 PM
vlorentz added a subscriber: vlorentz.
vlorentz added inline comments.
swh/web/api/views/known.py
15 ↗(On Diff #9215)

This method should probably be in swh/web/api/views/identifiers.py.

39 ↗(On Diff #9215)

You should use the parser from swh.model instead of manually splitting. (and pass the parsed object to group_swh_persistent_identifiers, so it doesn't have to parse it again)

42 ↗(On Diff #9215)

response is unassigned if the method is not POST.

swh/web/common/utils.py
423–425 ↗(On Diff #9215)

You should add a small wrapper around parse_persistent_identifier to handle the error, that can be shared with resolve_swh_persistent_id.

swh/web/tests/common/test_service.py
881–899

You should also add a test for when they are not (all) missing.

swh/web/tests/common/test_utils.py
117–126 ↗(On Diff #9215)

this should be in swh/web/tests/api/test_identifiers.py

This revision now requires changes to proceed.Fri, Jan 24, 4:08 PM
DanSeraf added inline comments.Fri, Jan 24, 5:28 PM
swh/web/api/views/known.py
42 ↗(On Diff #9215)

There is no other method available, should i assigned it anyways?

DanSeraf updated this revision to Diff 9227.Fri, Jan 24, 6:49 PM
  • known moved to identifiers view, parse pids before grouping them
  • test present pids
DanSeraf updated this revision to Diff 9228.Mon, Jan 27, 11:51 AM
  • /known/ endpoint tests
  • test_service: testing missing hashes lookup with present pids
vlorentz requested changes to this revision.Mon, Jan 27, 12:27 PM

You shouldn't remove the handling of ValidationError exceptions; because it raises an error 500 Internal Server Error, which should only be returned if there is a bug in the server code.

Instead you should raise a BadInputExc, like your code initially did.

swh/web/api/views/identifiers.py
88–92 ↗(On Diff #9228)

shorter version:

persistent_ids = [parse_persistent_identifier(pid) for pid in request.data]
94 ↗(On Diff #9228)

You can use a dict comprehension:

response = {str(pid): {'found': False} for pid in persistent_ids}
102–106 ↗(On Diff #9228)

You can assign the dicts with content directly:

response was already initialized with found set to False for all pids, so you only need to do this:

for pid in persistent_ids:
    if pid.object_id not in missing_hashes:
        response[str(pid)]['found'] = True
swh/web/api/views/known.py
42 ↗(On Diff #9215)

You should return an error 405 Method Not Allowed.

swh/web/common/service.py
1176

We're trying to use type annotations in new functions, so it should be:

def lookup_missing_hashes(grouped_pids: Dict[str, List[bytes]]) -> Set[str]:
This revision now requires changes to proceed.Mon, Jan 27, 12:27 PM
DanSeraf updated this revision to Diff 9236.Mon, Jan 27, 2:32 PM
  • returning 400 instead of 500 for invalid pid, type annotation on lookup_missing_hashes
  • rebase
zack requested changes to this revision.Mon, Jan 27, 2:52 PM
zack added a subscriber: zack.
zack added inline comments.
swh/web/api/views/identifiers.py
82 ↗(On Diff #9236)

the indentation of the dictionary keys do not look right here

83 ↗(On Diff #9236)

I don't like "found" as a key here, as the name of the action is neither "search" nor "find" here.

I think it should be:

  • either "known: true/false",
  • or "status: known/unknown"

The latter would allow to convey more information in the future, e.g., an object that is present but hidden.

AFAICT this would be the first time we convey a non-boolean status of objects in a public API, so we probably do not want to go there; we can reconsider for API 2.0. Hence, my proposal is "known: true/false" for now.

(Naming is the most difficult problem in CS (cit.) *g*)

swh/web/common/service.py
1190–1200

these should be either a series of if/elif/elif/elif, or a dictionary with a single lookup

swh/web/common/utils.py
403–404 ↗(On Diff #9236)

the indentation of the dictionary keys do not look right here

This revision now requires changes to proceed.Mon, Jan 27, 2:52 PM
DanSeraf updated this revision to Diff 9237.Mon, Jan 27, 3:07 PM
minor changes, rebase
DanSeraf updated this revision to Diff 9240.Mon, Jan 27, 3:15 PM
fix /known/ test expected values
vlorentz requested changes to this revision.Mon, Jan 27, 3:44 PM
  • returning 400 instead of 500 for invalid pid, type annotation on lookup_missing_hashes

This part is ok now, but cf my earlier comment: "You should add a small wrapper around parse_persistent_identifier to handle the error, that can be shared with resolve_swh_persistent_id."

swh/web/api/views/identifiers.py
83 ↗(On Diff #9236)

The latter would allow to convey more information in the future, e.g., an object that is present but hidden.

It would imply breaking the API compatibility, so no.

This revision now requires changes to proceed.Mon, Jan 27, 3:44 PM
zack added inline comments.Mon, Jan 27, 3:59 PM
swh/web/api/views/identifiers.py
83 ↗(On Diff #9236)

It would imply breaking the API compatibility, so no.

This is not the place to have this conversation, but nack. Because (1) with API version 1 we make no API compatibility promises yet (that will happen starting version 2); and (2) adding fields to an existing dictionary should be possible (and documented) as a way to evolve an API that clients should expect.

  • returning 400 instead of 500 for invalid pid, type annotation on lookup_missing_hashes

This part is ok now, but cf my earlier comment: "You should add a small wrapper around parse_persistent_identifier to handle the error, that can be shared with resolve_swh_persistent_id."

Should i handle the error in parse_persistent_identifier while creating the pid?

Should i handle the error in parse_persistent_identifier while creating the pid?

No, parse_persistent_identifier is in swh.model, which shouldn't deal with webserver stuff (and BadInputExc is an HTTP error). You should add a function in swh-web that calls parse_persistent_identifier, catches ValidationError, and re-raises a BadInputExc.

DanSeraf updated this revision to Diff 9242.Mon, Jan 27, 4:58 PM
utils.py: function to get a parsed persistent identifier
zack accepted this revision as: zack.Mon, Jan 27, 7:47 PM

The only method accepted by the endpoint is POST so it returns 405 for other methods

Then you don't need if request.method == 'POST':, right?

douardda added inline comments.
swh/web/api/views/known.py
42 ↗(On Diff #9215)

I agree this is the job of the framework (since the method has already been declared as POST only) to return the proper 405, but also that the if request.method == 'POST': should not be there at all.

vlorentz accepted this revision.Tue, Jan 28, 1:37 PM
This revision is now accepted and ready to land.Tue, Jan 28, 1:37 PM