common/origin_save: Update missing information when available
ClosedPublic

Authored by ardumont on May 27 2021, 11:35 AM.

Details

Summary

Due to some internal details [1], save code now requests are sometimes updated with
incomplete information. By slightly broadening the filter that selects the save code now
requests to update, we can retry updating that information more often (even when the
main information is already up to date). This way, the save code now requests will
eventually be fully updated once the lag subsides.

[1] Replication lag

Related to T3346

Test Plan

tox

Diff Detail

Repository
rDWAPPS Web applications
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 21654
Build 33654: Phabricator diff pipeline on jenkins
Build 33653: arc lint + arc unit

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed. May 27 2021, 11:36 AM
Harbormaster failed remote builds in B21651: Diff 20689!

Might as well compare the date since we have it nearby

Build has FAILED

Patch application report for D5788 (id=20690)

Rebasing onto 21616f5635...

Current branch diff-target is up to date.
Changes applied before test
commit c4318b034ddcc37daac58b2400189ac6e819d1be
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu May 27 11:25:06 2021 +0200

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/848/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/848/console

Harbormaster returned this revision to the author for changes because remote builds failed. May 27 2021, 12:16 PM
Harbormaster failed remote builds in B21654: Diff 20692!

Build was aborted

Patch application report for D5788 (id=20692)

Rebasing onto 21616f5635...

Current branch diff-target is up to date.
Changes applied before test
commit ae4b69078661057149782965ff60a863b219d8f4
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu May 27 11:25:06 2021 +0200

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/850/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/850/console

Build is green

Patch application report for D5788 (id=20696)

Rebasing onto 21616f5635...

Current branch diff-target is up to date.
Changes applied before test
commit 15168bc3cca7aaca24240917681cab937c25235d
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu May 27 11:25:06 2021 +0200

See https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/851/ for more details.

This revision is now accepted and ready to land. May 27 2021, 2:45 PM
swh/web/common/origin_save.py
615

Those two conditions are wrong: this should be an OR between the two...

F(visit_date__isnull=True)|F(visit_status__isnull=True)
swh/web/common/origin_save.py
615

Ah right, the visit date can be None while the visit status is set, according to that code.
Anyway, this is an edge case, as save code now reporting looks correct in production.
Django Q objects should be used to build the query then.
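
To make the AND versus OR point concrete, here is a minimal, framework-free sketch. It assumes the original filter required both fields to be null at once, which is what chaining keyword lookups in a single Django filter() call would do. The SaveRequest class, the helper names and the sample URLs are hypothetical and are not taken from swh/web/common/origin_save.py; only the visit_date and visit_status field names come from the comment above. In Django itself, the OR variant would presumably be spelled Q(visit_date__isnull=True) | Q(visit_status__isnull=True).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List, Optional


@dataclass
class SaveRequest:
    """Hypothetical stand-in for one save code now request row."""

    origin_url: str
    visit_date: Optional[datetime] = None
    visit_status: Optional[str] = None


def both_missing(req: SaveRequest) -> bool:
    # AND semantics: the request is selected only when both fields are unset.
    return req.visit_date is None and req.visit_status is None


def any_missing(req: SaveRequest) -> bool:
    # OR semantics: the request is selected as soon as one field is unset.
    return req.visit_date is None or req.visit_status is None


def select_for_update(
    requests: List[SaveRequest], predicate: Callable[[SaveRequest], bool]
) -> List[str]:
    # Return the origin URLs of the requests that still need an update.
    return [req.origin_url for req in requests if predicate(req)]


# Illustrative data only: one complete request and three incomplete ones.
REQUESTS = [
    SaveRequest("https://example.org/complete", datetime(2021, 5, 27), "full"),
    SaveRequest("https://example.org/no-date", None, "full"),
    SaveRequest("https://example.org/no-status", datetime(2021, 5, 27), None),
    SaveRequest("https://example.org/nothing-yet"),
]

print(select_for_update(REQUESTS, both_missing))
print(select_for_update(REQUESTS, any_missing))
```

With the AND predicate, the request that has a visit status but no visit date is never picked up again, which matches the edge case described above. The OR predicate keeps retrying until both fields are filled in.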

ardumont added inline comments.
swh/web/common/origin_save.py
615

Thanks ;)

@vsellier noticed some malformed save code now requests; that should be the reason ^ (heads up ;)

swh/web/common/origin_save.py
615

I messed up the code link in my previous comment, here is the correct one.

swh/web/common/origin_save.py
615

@vsellier took care of it in D5810 \o/