
Add an endpoint to list and access raw extrinsic metadata.
ClosedPublic

Authored by vlorentz on Jun 15 2021, 4:50 PM.

Details

Test Plan

Tests are failing because of a dependency on swh-storage.

Diff Detail

Repository
rDWAPPS Web applications
Branch
remd-api
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 22038
Build 34277: Phabricator diff pipeline on jenkins
Build 34276: arc lint + arc unit

Event Timeline

Build has FAILED

Patch application report for D5875 (id=21045)

Rebasing onto 01ffb31ff7...

First, rewinding head to replay your work on top of it...
Applying: Add an endpoint to list and access raw extrinsic metadata.
Changes applied before test
commit 4d286468c1296869e361a96c36e2313fb671e54d
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jun 15 16:49:53 2021 +0200

    Add an endpoint to list and access raw extrinsic metadata.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/878/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/878/console

Harbormaster returned this revision to the author for changes because remote builds failed. (Jun 15 2021, 4:52 PM)
Harbormaster failed remote builds in B22038: Diff 21045!
vlorentz edited the test plan for this revision.

Looks rather good to me; a couple of suggestions, remarks, and questions inline.

swh/web/api/views/metadata.py
24

Do you need to mention the SWHID in the URL?

I don't really know if it's compatible with the other suggestion on the url below.

That would match what we do with the content api (the raw endpoint).

60

Maybe "trim" the authority_str, because otherwise a malformed value such as " authority-str" (with stray whitespace) or some other form could pass?
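A minimal sketch of the trimming suggested here, assuming the authority parameter has the form "<type> <url>" (as in the `authority=forge%20https://pypi.org/` query string that appears later in this review). `parse_authority` is a hypothetical helper for illustration, not the function from the diff.

```python
from typing import Tuple


def parse_authority(authority_str: str) -> Tuple[str, str]:
    """Split "<type> <url>" into its two parts, tolerating stray whitespace.

    Hypothetical helper; the name and error type are assumptions.
    """
    # strip() drops leading/trailing whitespace; split(None, 1) then splits
    # on the first run of whitespace only, so the URL is kept in one piece.
    parts = authority_str.strip().split(None, 1)
    if len(parts) != 2:
        raise ValueError(
            f"authority must be '<type> <url>', got {authority_str!r}"
        )
    authority_type, authority_url = parts
    return authority_type, authority_url
```

With this, `" forge https://pypi.org/"` and `"forge https://pypi.org/"` are handled identically, while a value with no URL part is rejected.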

132

to match what we do with content...

swh/web/common/converters.py
300–308

there, compacted ;)

swh/web/api/views/metadata.py
24

I want to add origins next, so I wanted to differentiate the two URLs.

That would match what we do with the content api (the raw endpoint).

Which is exactly what I didn't want, actually.

I didn't want /raw/ at the end of "/metadata/extrinsic/(?P<target>{SWHID_RE})/raw/" because it implies there is another endpoint "/metadata/extrinsic/(?P<target>{SWHID_RE})/"; but there isn't, and there probably never will be.

anlambert added inline comments.
swh/web/api/views/metadata.py
1

Copyright (C) 2021

24

How about using /metadata/extrinsic/(?P<target>{SWHID_RE})/ for that endpoint and /metadata/extrinsic/(?P<target>{SWHID_RE})/raw for the one getting the raw metadata bytes? From my point of view, "raw" implies a non-JSON response format.

42

You should add a description of the JSON response here.

70

Why the `from None` here?

144–147

return HttpResponse(...)

swh/web/api/views/metadata.py
24

but this endpoint is to get raw_extrinsic_metadata objects...

70

No reason in particular; I just don't see why we should attach the original traceback.
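For readers unfamiliar with the construct under discussion, here is a small standalone sketch of what `raise ... from None` does. `BadInputError` and `parse_limit` are hypothetical names used only for illustration; they are not taken from the diff.

```python
class BadInputError(Exception):
    """Hypothetical stand-in for an API 'bad request' exception."""


def parse_limit(value: str) -> int:
    """Parse a query parameter, hiding the low-level parsing error."""
    try:
        return int(value)
    except ValueError:
        # "from None" sets __suppress_context__ on the new exception, so the
        # original ValueError is not displayed in the traceback.
        raise BadInputError(f"invalid limit: {value!r}") from None
```

The original exception is still recorded in `__context__`; it is only hidden when the traceback is printed.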

swh/web/api/views/metadata.py
27

Use the swh.web.api.apidoc.api_doc decorator here to register endpoint documentation.

@api_doc("/metadata/extrinsic/raw/swhid/")
30

.. http:get:: /api/1/metadata/extrinsic/raw/swhid/(target)/

134

Use the swh.web.api.apidoc.api_doc decorator here to register endpoint documentation.

@api_doc("/metadata/extrinsic/raw/get/")
vlorentz added inline comments.
swh/web/api/views/metadata.py
134

No, it's undocumented on purpose, because users shouldn't build this URL themselves.

  • apply some comments
  • change URL scheme

Build has FAILED

Patch application report for D5875 (id=21082)

Rebasing onto 1bd55d031e...

First, rewinding head to replay your work on top of it...
Applying: Add an endpoint to list and access raw extrinsic metadata.
Changes applied before test
commit b34132f12f162d448cd7e1b2d6845c767d90c4c4
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jun 15 16:49:53 2021 +0200

    Add an endpoint to list and access raw extrinsic metadata.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/880/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/880/console

Build has FAILED

Patch application report for D5875 (id=21083)

Rebasing onto 1bd55d031e...

Current branch diff-target is up to date.
Changes applied before test
commit 6cbe8079a4ed8a9f2742291b4c84f18e9f6e2eab
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jun 15 16:49:53 2021 +0200

    Add an endpoint to list and access raw extrinsic metadata.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/881/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/881/console

Build has FAILED

Patch application report for D5875 (id=21117)

Rebasing onto 1bd55d031e...

Current branch diff-target is up to date.
Changes applied before test
commit 7c722d889d28d8c6fb2b4ed676fffa10452ad42c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jun 15 16:49:53 2021 +0200

    Add an endpoint to list and access raw extrinsic metadata.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/882/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/882/console

add tests for parameter checks

Build has FAILED

Patch application report for D5875 (id=21122)

Rebasing onto 1bd55d031e...

Current branch diff-target is up to date.
Changes applied before test
commit 4c22e06e79387cdc43a237c16a2cfd01777250ba
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jun 15 16:49:53 2021 +0200

    Add an endpoint to list and access raw extrinsic metadata.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/883/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/883/console

swh/web/api/views/metadata.py
74

You need to remove the leading slash here, otherwise the following URL is generated, which leads to a 404:

`/api/1//raw-extrinsic-metadata/swhid/swh:1:dir:a2faa28028657859c16ff506924212b33f0e1307/?authority=forge%20https://pypi.org/`
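To illustrate where the double slash comes from, here is a sketch that assumes routes are appended to a root prefix ending in a slash (such as `/api/1/`). It is not how swh-web's URL registration is actually implemented; `naive_join` and `join_api_url` are hypothetical helpers.

```python
def naive_join(root: str, route: str) -> str:
    """Plain concatenation: keeps both slashes if root ends and route starts with one."""
    return root + route


def join_api_url(root: str, route: str) -> str:
    """Join root and route with exactly one slash between them."""
    return root.rstrip("/") + "/" + route.lstrip("/")


# A route declared with a leading slash produces the broken URL...
broken = naive_join("/api/1/", "/raw-extrinsic-metadata/swhid/")
# ...while dropping the leading slash (or normalizing the join) avoids it.
fixed = join_api_url("/api/1/", "/raw-extrinsic-metadata/swhid/")
```

Here `broken` contains `/api/1//raw-extrinsic-metadata/`, matching the 404 reported in this comment, and `fixed` contains a single slash.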
swh/web/api/views/metadata.py
138

Pass the input request object as a keyword parameter here; it will then return a full URL.

Current JSON output is:

[
    {
        "authority": {
            "type": "forge",
            "url": "https://pypi.org/"
        },
        "discovery_date": "2021-05-07T04:31:14+00:00",
        "fetcher": {
            "name": "swh.loader.package.pypi.loader.PyPILoader",
            "version": "0.22.0"
        },
        "format": "pypi-project-json",
        "metadata_url": "/api/1/raw-extrinsic-metadata/get/107328dd41f23e8d4addf35a7091e435153a676e/",
        "origin": "https://pypi.org/project/swh.core/",
        "revision": "swh:1:rev:48371365859f4724c609895caaab362168623507",
        "target": "swh:1:dir:a2faa28028657859c16ff506924212b33f0e1307"
    }
]
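As an illustration of why a full URL is more convenient for API clients, the sketch below shows what a client has to do with the relative `metadata_url` in the output above. The base URL is an assumption made for the example, and `absolute_metadata_urls` is a hypothetical client-side helper.

```python
import json
from typing import List
from urllib.parse import urljoin

# Assumption: the host the client originally sent its request to.
BASE_URL = "https://archive.softwareheritage.org/"


def absolute_metadata_urls(response_body: str, base_url: str = BASE_URL) -> List[str]:
    """Resolve each entry's metadata_url against the base URL."""
    entries = json.loads(response_body)
    # urljoin leaves already-absolute URLs untouched, so this works both
    # before and after the server starts returning full URLs.
    return [urljoin(base_url, entry["metadata_url"]) for entry in entries]
```

If the server returns full URLs, the client can skip this step and follow `metadata_url` directly.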
swh/web/api/views/metadata.py
174

How about setting the default filename of the downloaded blob to the metadata id by adding this parameter?

headers={"Content-disposition": f"attachment; filename={id}"}
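A standalone sketch of the header suggested above, assuming the metadata id is a plain hexadecimal string such as the one in the JSON output earlier in this review. `attachment_headers` is a hypothetical helper; the resulting dict would be passed to the response as in the snippet above.

```python
from typing import Dict


def attachment_headers(metadata_id: str) -> Dict[str, str]:
    """Build headers asking the browser to save the body under the given name.

    Assumes metadata_id contains no double quotes or control characters.
    """
    # Quoting the filename keeps the header well-formed even if the id
    # later contains characters such as spaces.
    return {"Content-Disposition": f'attachment; filename="{metadata_id}"'}
```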
vlorentz marked 2 inline comments as done.

fix URL roots in documentation and returned objects, as requested

swh/web/api/views/metadata.py
174

I have a better idea; do you mind if I do it in a future diff?

Build has FAILED

Patch application report for D5875 (id=21256)

Rebasing onto 1bd55d031e...

Current branch diff-target is up to date.
Changes applied before test
commit 00f08cea038d6d5e1ed6998db4ba9c61b18ec399
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jun 15 16:49:53 2021 +0200

    Add an endpoint to list and access raw extrinsic metadata.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/892/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/892/console

Looks good to me! Time to tag swh-storage so that the build can succeed.

This revision is now accepted and ready to land. (Jun 25 2021, 11:11 AM)

Build has FAILED

Patch application report for D5875 (id=21261)

Rebasing onto 1bd55d031e...

Current branch diff-target is up to date.
Changes applied before test
commit 785719a16a25bed9967c5086e781b77dc09e89f5
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jun 15 16:49:53 2021 +0200

    Add an endpoint to list and access raw extrinsic metadata.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/901/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/901/console

Build is green

Patch application report for D5875 (id=21287)

Rebasing onto f9f1e969e9...

First, rewinding head to replay your work on top of it...
Applying: Add an endpoint to list and access raw extrinsic metadata.
Changes applied before test
commit 18632aae5f60e056a35d7932ec6ffb19f83f8252
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Jun 15 16:49:53 2021 +0200

    Add an endpoint to list and access raw extrinsic metadata.

See https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/905/ for more details.

aea8ce1614421670f589dfe5a28b2750610a5a14