Page MenuHomeSoftware Heritage

replay: Replay origin-visit and origin-visit-status
ClosedPublic

Authored by ardumont on Jun 12 2020, 12:02 PM.

Details

Summary

This now uses the respective origin-visit-add and origin-visit-status-add
endpoints.

I'll clean up storage from origin-visit-upsert [2] (and origin-visit-update [1]) in
later diffs.

Related to T2310

[1] D3276

[2] D3281

Test Plan

tox

Diff Detail

Repository
rDSTO Storage manager
Branch
origin-visit-upsert
Lint
Lint Skipped
Unit
Unit Tests Skipped
Build Status
Buildable 12805
Build 19486: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 19485: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D3273 (id=11600)

Rebasing onto 37c45300bc...

Current branch diff-target is up to date.
Changes applied before test
commit e519f6b3094320d1bb3aedbfc96af4819fae7652
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Jun 11 17:16:29 2020 +0200

    replay: Migrate to use origin visit add instead of upsert
    
    Related to T2310

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/261/ for more details.

So how does the replayer create origin-visit-statuses?

So how does the replayer create origin-visit-statuses?

heh, thanks, i realize i'm missing an elif here!

Note that origin-visit-add currently adds the origin-visit-status for creation though.

Nonetheless, we should consume the object_type origin-visit-status indeed.

Fix missing origin-visit-status dealings

Use the wrong --head flag interval as i switched branch ¯\_(ツ)_/¯

  • Rework commit message
  • Remove added origin-visit-status object-type (forgot to remove it)
ardumont retitled this revision from replay: Migrate to use origin visit add instead of upsert to replay: Replay origin-visit and origin-visit-status.Jun 12 2020, 1:38 PM
ardumont edited the summary of this revision. (Show Details)
ardumont added inline comments.
swh/storage/replay.py
125

I used the dict because otherwise mypy would be a tad annoying me ;)

Build is green

Patch application report for D3273 (id=11608)

Rebasing onto 37c45300bc...

First, rewinding head to replay your work on top of it...
Fast-forwarded diff-target to base-revision-262-D3273.
Changes applied before test

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/262/ for more details.

Build was aborted

Patch application report for D3273 (id=11609)

Rebasing onto 37c45300bc...

Current branch diff-target is up to date.
Changes applied before test
commit 4541ae40ee940ac5f193aa372020ec74ff80034b
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Jun 11 17:16:29 2020 +0200

    replay: Migrate to use origin visit add instead of upsert
    
    Related to T2310

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/263/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/263/console

Build is green

Patch application report for D3273 (id=11610)

Rebasing onto 37c45300bc...

Current branch diff-target is up to date.
Changes applied before test
commit 5e096bb4cf1b0236eb6d2cd30848a35a99b793ad
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Jun 11 17:16:29 2020 +0200

    replay: Replay origin-visit and origin-visit-status
    
    This now uses the respective origin-visit-add and origin-visit-status-add
    endpoints.
    
    Related to T2310

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/264/ for more details.

could you add a test for statuses?

could you add a test for statuses?

my current understanding is that it's tested already but sure, might as well make it explicit.

s/obj_models/model_objs/?

sure ;)

my current understanding is that it's tested already but sure, might as well make it explicit.

If it was, you would have caught the missing handling of statuses

my current understanding is that it's tested already but sure, might as well make it explicit.

If it was, you would have caught the missing handling of statuses

ugh, right! That's... not false! ;D
It's writing visit-statuses but it's not asserting anything about them at the end of tests.

Currently fighting the new assertions which for now demonstrates more many origin-visit-statuses are written than expected.
Semi-expectedly since origin-visit-add writes visit-status as well...

I think the rest is in-memory implementation specific which is annoying.
(~> I've no idea yet if i messed up the in-memory implementation or not)

ardumont edited the summary of this revision. (Show Details)
  • Adapt according to review
  • Start checking the visit-status writes (which fails for now, the error will be shared that way ;)

Build has FAILED

Patch application report for D3273 (id=11614)

Rebasing onto 37c45300bc...

Current branch diff-target is up to date.
Changes applied before test
commit a9ea6235af9cdf8a08f4650f97ac20feac9f8480
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Jun 11 17:16:29 2020 +0200

    replay: Replay origin-visit and origin-visit-status
    
    This now uses the respective origin-visit-add and origin-visit-status-add
    endpoints.
    
    Related to T2310

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/265/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/265/console

  • Rebase on top of diff that should make the build ok
  • test_storage_play_anonymized: Skip origin-visit-status as well (aligning with origin-visit)
  • Drop the origin-visit-update part which is no longer relevant there

Depends on D3278

swh/storage/tests/test_kafka_writer.py
63 ↗(On Diff #11621)

maybe i should drop that when we remove origin-visit-update [1]?

[1] D3276

Remove the part touching the origin-visit-update call, i'll do that in the
dedicated diff.

Build is green

Patch application report for D3273 (id=11621)

Could not rebase; Attempt merge onto 33efdb0dd4...

Updating 33efdb0..0d970bc
Fast-forward
 swh/storage/in_memory.py               |  6 +++--
 swh/storage/replay.py                  | 15 ++++++-----
 swh/storage/tests/test_kafka_writer.py |  8 +-----
 swh/storage/tests/test_replay.py       |  7 +++---
 swh/storage/tests/test_storage.py      | 46 +++++++++++++++++++++++++---------
 swh/storage/validate.py                | 32 +++++++++++++++--------
 6 files changed, 74 insertions(+), 40 deletions(-)
Changes applied before test
commit 0d970bcde0ab3b24d4fe03fe05940923fac74888
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Jun 11 17:16:29 2020 +0200

    replay: Replay origin-visit and origin-visit-status
    
    This now uses the respective origin-visit-add and origin-visit-status-add
    endpoints.
    
    Related to T2310

commit 1d8a54ba8312d15c9fd843efe3754691576398af
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Jun 13 08:40:29 2020 +0200

    in_memory: Make origin-visit-status-add do on conflict ignore
    
    Aligning its behavior with the other backend.
    
    Related to T2310

commit 874da2dee1957ac5e9f5b17a81e11d9d9e8690f0
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Jun 13 08:37:57 2020 +0200

    Start migrating the validate proxy toward using BaseModel objects
    
    This will allow to progress incrementally towards removing it.
    
    When it allows to use BaseModel objects everywhere (and tests in test_storage
    are adapted to use this property), it will be time to remove it entirely (as
    it's only used in test).
    
    It's preparatory work for future diffs.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/270/ for more details.

I think the rest is in-memory implementation specific which is annoying.
(~> I've no idea yet if i messed up the in-memory implementation or not)

in-memory implementation was not respecting the on-conflict ignore insertion
the other backends do. So here goes D3278 to fix that.

Build has FAILED

Patch application report for D3273 (id=11622)

Could not rebase; Attempt merge onto 33efdb0dd4...

Updating 33efdb0..645f104
Fast-forward
 swh/storage/in_memory.py          |  6 +++--
 swh/storage/replay.py             | 15 ++++++++-----
 swh/storage/tests/test_replay.py  |  7 +++---
 swh/storage/tests/test_storage.py | 46 +++++++++++++++++++++++++++++----------
 swh/storage/validate.py           | 32 ++++++++++++++++++---------
 5 files changed, 73 insertions(+), 33 deletions(-)
Changes applied before test
commit 645f104bd401dc0399bbdc61349210ad81300f88
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Jun 11 17:16:29 2020 +0200

    replay: Replay origin-visit and origin-visit-status
    
    This now uses the respective origin-visit-add and origin-visit-status-add
    endpoints.
    
    Related to T2310

commit 1d8a54ba8312d15c9fd843efe3754691576398af
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Jun 13 08:40:29 2020 +0200

    in_memory: Make origin-visit-status-add do on conflict ignore
    
    Aligning its behavior with the other backend.
    
    Related to T2310

commit 874da2dee1957ac5e9f5b17a81e11d9d9e8690f0
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Jun 13 08:37:57 2020 +0200

    Start migrating the validate proxy toward using BaseModel objects
    
    This will allow to progress incrementally towards removing it.
    
    When it allows to use BaseModel objects everywhere (and tests in test_storage
    are adapted to use this property), it will be time to remove it entirely (as
    it's only used in test).
    
    It's preparatory work for future diffs.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/271/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/271/console

Build is green

Patch application report for D3273 (id=11622)

Could not rebase; Attempt merge onto 33efdb0dd4...

Updating 33efdb0..645f104
Fast-forward
 swh/storage/in_memory.py          |  6 +++--
 swh/storage/replay.py             | 15 ++++++++-----
 swh/storage/tests/test_replay.py  |  7 +++---
 swh/storage/tests/test_storage.py | 46 +++++++++++++++++++++++++++++----------
 swh/storage/validate.py           | 32 ++++++++++++++++++---------
 5 files changed, 73 insertions(+), 33 deletions(-)
Changes applied before test
commit 645f104bd401dc0399bbdc61349210ad81300f88
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Jun 11 17:16:29 2020 +0200

    replay: Replay origin-visit and origin-visit-status
    
    This now uses the respective origin-visit-add and origin-visit-status-add
    endpoints.
    
    Related to T2310

commit 1d8a54ba8312d15c9fd843efe3754691576398af
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Jun 13 08:40:29 2020 +0200

    in_memory: Make origin-visit-status-add do on conflict ignore
    
    Aligning its behavior with the other backend.
    
    Related to T2310

commit 874da2dee1957ac5e9f5b17a81e11d9d9e8690f0
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Sat Jun 13 08:37:57 2020 +0200

    Start migrating the validate proxy toward using BaseModel objects
    
    This will allow to progress incrementally towards removing it.
    
    When it allows to use BaseModel objects everywhere (and tests in test_storage
    are adapted to use this property), it will be time to remove it entirely (as
    it's only used in test).
    
    It's preparatory work for future diffs.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/273/ for more details.

This revision is now accepted and ready to land.Jun 15 2020, 11:02 AM

Build is green

Patch application report for D3273 (id=11637)

Rebasing onto 0183fec91d...

Current branch diff-target is up to date.
Changes applied before test
commit 2bcbc82b708550415ce23a8a91830ad2b75f848c
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Thu Jun 11 17:16:29 2020 +0200

    replay: Replay origin-visit and origin-visit-status
    
    This now uses the respective origin-visit-add and origin-visit-status-add
    endpoints.
    
    Related to T2310

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/279/ for more details.