
common/origin_save: Prevent erroneous save code now status overriding
Closed · Public

Authored by anlambert on Mar 29 2021, 6:22 PM.

Details

Summary

The loading status of a save code now request is first retrieved from the
scheduler database.

Once a visit date has been found for a save code now request, the visit
status is checked and may be used to update the final status of the
request to failed (for instance, when the origin is not found).
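As a minimal sketch of the visit status check described above, assuming that only the `not_found` and `failed` origin visit statuses turn a request into a failed one: the helper `status_after_visit_check`, the `SAVE_TASK_FAILED` constant and all string values are hypothetical; only the name `SAVE_TASK_SUCCEEDED` appears in this review.

```python
from typing import Optional

# Hypothetical constants: SAVE_TASK_SUCCEEDED is named in the review,
# SAVE_TASK_FAILED and both string values are assumptions.
SAVE_TASK_SUCCEEDED = "succeeded"
SAVE_TASK_FAILED = "failed"

# Assumption: origin visit statuses meaning the save request has failed.
FAILED_VISIT_STATUSES = frozenset({"not_found", "failed"})


def status_after_visit_check(current_status: str, visit_status: Optional[str]) -> str:
    """Return the request status once the visit status has been checked.

    A failing visit status turns the request into a failed one; any other
    visit status (or none at all) leaves the current status untouched.
    """
    if visit_status in FAILED_VISIT_STATUSES:
        return SAVE_TASK_FAILED
    return current_status
```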

However, to avoid sending too many costly queries to the storage, the
visit date and status are only checked once and then stored in the
swh-web database.
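The check-once behaviour can be sketched as follows, assuming a hypothetical `SaveRequest` record standing in for the swh-web database row and a hypothetical `lookup` callable standing in for the costly storage query; neither is the actual swh-web API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List, Optional, Tuple


@dataclass
class SaveRequest:
    """Hypothetical stand-in for a save code now request stored by swh-web."""

    origin_url: str
    visit_date: Optional[datetime] = None
    visit_status: Optional[str] = None


# Hypothetical type of the costly storage lookup: returns the visit date and
# visit status for an origin, or None when no visit has been found yet.
VisitLookup = Callable[[str], Optional[Tuple[datetime, str]]]


def ensure_visit_info(request: SaveRequest, lookup: VisitLookup) -> SaveRequest:
    """Query the storage only while no visit date has been stored yet."""
    if request.visit_date is not None:
        # Already checked once: reuse the stored values, no storage query.
        return request
    result = lookup(request.origin_url)
    if result is not None:
        request.visit_date, request.visit_status = result
    return request


# Usage: a fake lookup that records how often it is called.
calls: List[str] = []


def fake_lookup(origin_url: str) -> Optional[Tuple[datetime, str]]:
    calls.append(origin_url)
    return datetime(2021, 3, 29, tzinfo=timezone.utc), "not_found"


req = SaveRequest("https://example.org/repo.git")
ensure_visit_info(req, fake_lookup)
ensure_visit_info(req, fake_lookup)  # served from the stored values
```

The second call returns without touching the fake lookup, which mirrors the intent of storing the visit date and status instead of querying the storage on every status refresh.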

This diff therefore ensures that a final loading task status (succeeded
or failed) is never overridden by the status read from the scheduler
database, which would otherwise lead to invalid save code now request
reporting.
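A minimal sketch of the guard, assuming hypothetical status constants and a hypothetical `refresh_loading_status` helper that decides which status to store after the scheduler database has been read; it is not the code of this diff.

```python
# Hypothetical status constants; only SAVE_TASK_SUCCEEDED is named in the
# review, and all string values are assumptions.
SAVE_TASK_SUCCEEDED = "succeeded"
SAVE_TASK_FAILED = "failed"
SAVE_TASK_SCHEDULED = "scheduled"
SAVE_TASK_RUNNING = "running"

# Statuses that must never be replaced by the scheduler status.
FINAL_STATUSES = frozenset({SAVE_TASK_SUCCEEDED, SAVE_TASK_FAILED})


def refresh_loading_status(stored_status: str, scheduler_status: str) -> str:
    """Return the status to store after reading the scheduler database.

    A final status (succeeded or failed) may have been set by the one-off
    visit check, so it is kept instead of being replaced by whatever the
    scheduler currently reports.
    """
    if stored_status in FINAL_STATUSES:
        return stored_status
    return scheduler_status
```

With such a guard, a failed status set by the one-off visit check survives later refreshes from the scheduler.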

Diff Detail

Repository
rDWAPPS Web applications
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D5378 (id=19257)

Rebasing onto ce7dceb4d9...

Current branch diff-target is up to date.
Changes applied before test
commit 046c3aed556df19af3c0e51a230eaa3746f7a671
Author: Antoine Lambert <antoine.lambert@inria.fr>
Date:   Mon Mar 29 18:14:21 2021 +0200

    common/origin_save: Prevent erroneous save code now status overriding

See https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/639/ for more details.

ardumont added inline comments.
swh/web/tests/common/test_origin_save.py, line 303

What was the value here prior to the fix?

swh/web/tests/common/test_origin_save.py, line 303

SAVE_TASK_SUCCEEDED ...

The issue appeared once visit statuses (not_found, for instance) started being exploited.
Because getting the visit date and status associated with a request is costly, that operation
is performed only once and the status is stored in the Django database.

But before a visit date is retrieved, the overall save code now request status is fetched
from the scheduler database (scheduled, running). As a result, the final request status
could be overridden.

ardumont added inline comments.
swh/web/tests/common/test_origin_save.py, line 303

ok, thx

This revision is now accepted and ready to land. Mar 30 2021, 11:06 AM