Page MenuHomeSoftware Heritage

Make load-deposit and check-deposit URL argument absolute
ClosedPublic

Authored by douardda on Dec 11 2019, 10:56 AM.

Details

Summary

ensuring these generalted URLs are correct and resolvable (eg. when the WSGI
app is mounted on a different path than /).

This needed to move the post_deposit_save code from a signal directly
within the view, so we have a request from which we can forge absolute URLs.

Note: this is needed in order to be able to have a fully functional deposit
in the docker dev environment.

Test Plan

tests should pass ok however no test exists for the interaction with the
scheduler, so...

Diff Detail

Repository
rDDEP Push deposit
Branch
master
Lint
Lint Skipped
Unit
Unit Tests Skipped
Build Status
Buildable 9873
Build 14577: tox-on-jenkinsJenkins
Build 14576: arc lint + arc unit

Event Timeline

tests should pass ok however no test exists for the interaction with the
scheduler, so...

By adding the swh_scheduler fixture in one test (which adds a deposit), you could either:

  • check the check_task_id, load_task_id are set
  • introspect the swh_scheduler.search_task('check-deposit') tasks to ensure there is something [1]

[1] https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/cgit/tests/test_lister.py$53-55

tests should pass ok however no test exists for the interaction with the
scheduler, so...

By adding the swh_scheduler fixture in one test (which adds a deposit), you could either:

  • check the check_task_id, load_task_id are set
  • introspect the swh_scheduler.search_task('check-deposit') tasks to ensure there is something [1]

[1] https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/cgit/tests/test_lister.py$53-55

Yep that's my plan

ardumont added inline comments.
swh/deposit/api/common.py
180

I just realized something about absolute and relative urls.

Do you have a sample of the url before and after?

Currently, relative url are computed and passed to worker as:.
'/1/private/<collection>/<deposit-id>/check'

The worker is then by its configuration completing that part when doing its job.

Thus, if by absolute, you mean something like:

'https://<server>/1/private/...

That will currently work in jenkins.

But that won't work in real life for the workers though (dev, staging, prod).

That's some caveat we spoke about some time ago.

This revision now requires changes to proceed.Dec 12 2019, 11:38 AM

don't remember what I did change in here...

Let phab tell me :-)

swh/deposit/api/common.py
180

Ok so absolute URLs are indeed URLs starting with http(s):// scheme.

I don't really see why it should break worker job (dev, staging, prod).

FTR I have this working in docker dev on my machine.
I may require some configuration adjustments when deployed.

In the following diffs I also have code to be able to test the interaction with the scheduler, which should also improve a bit the reliability of the stack.

swh/deposit/api/common.py
180

I don't really see why it should break worker job (dev, staging, prod).

well, we already agreed some time ago it should not.
(I'm just telling you the current state, i never fixed that nor see any diffs going that way yet)

In the current code of the deposit checker, it will break.

(This is no longer true on the deposit loader so \m/, checked the scheduler db, see below)

The deposit client (used by the checker) expects from its settings the absolute urls of the deposit server it's supposed to connect to. It then expects a relative url to fill in the blank to answer correctly.

See some swh-scheduler' tasks sample (prod):

softwareheritage-scheduler=> select arguments from task where type='check-deposit' limit 10;
                                  arguments
------------------------------------------------------------------------------
 {"args": [], "kwargs": {"deposit_check_url": "/1/private/other-client/352/check/"}}
 {"args": [], "kwargs": {"deposit_check_url": "/1/private/hal/351/check/"}}

In the following diffs I also have code to be able to test the interaction with the scheduler, which should also improve a bit the reliability of the stack.

yes, server side, eveything was tested but that part indeed iirc.
so cool.

swh/deposit/api/common.py
180

Ok so absolute URLs are indeed URLs starting with http(s):// scheme.

well, we already agreed some time ago it should not.

Clarifying...
I mean we agreed on no longer using relative urls.
And that this is what the current diff does.

We should use absolute urls.
But that diff won't work if we deploy it today (and that's how i see a diff by default, something that can land and be deployed).

That's all i'm trying to convey.

This needs to include the necessary fix on the deposit client used by the checker.
As of today, there is only one user of the deposit client (private deposit client), it's the checker.
(the loader used to use it but no longer, it defines its own [1])

[1] and in the end, there will be a slight impact there as well, still related to relative url... (it does the same stuff to speak with the private api part to update the deposit's status, i can take care of this i think, just tell me if you want me to ;).

swh/deposit/api/common.py
180

[1] and in the end, there will be a slight impact there as well, still related to relative url... (it does the same stuff to speak with the private api part to update the deposit's status, i can take care of this i think, just tell me if you want me to ;).

never mind that part, i think it's fine as is.

concerns lifted by irc discussion and D2472?id=8779#inline-16469's code which does the right thing.

In [2]: urljoin('https://deposit.softwareheritage.org/1/', 'https://deposit.softwareheritage.org/1/something')
Out[2]: 'https://deposit.softwareheritage.org/1/something'
This revision is now accepted and ready to land.Dec 20 2019, 3:15 PM