Page MenuHomeSoftware Heritage

Do not always auto-create an OriginVisitStatus object in origin_visit_add()
ClosedPublic

Authored by douardda on Jul 1 2022, 5:53 PM.

Details

Summary

when the OriginVisit object given as argument to be inserted already
have its visit id set (which is usually the case in a replayer-like
session), it makes no sense to auto-add the first OriginVisitStatus
objects related to this visit; this behavior is expected only when the
origin_visit_add() is called from a loading session.

Depends in D8067.

Diff Detail

Unit TestsFailed

TimeTest
3 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.algos.test_snapshot::test_snapshot_get_latest
swh_storage = <swh.storage.in_memory.InMemoryStorage object at 0x7f56f2bcd2b0> sample_data = <swh.storage.tests.storage_data.StorageData object at 0x7f56f2bcd668>
2 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.algos.test_snapshot::test_snapshot_get_latest_none
swh_storage = <swh.storage.in_memory.InMemoryStorage object at 0x7f570bb61470> sample_data = <swh.storage.tests.storage_data.StorageData object at 0x7f570bb7bf60>
111,652 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.test_kafka_writer::test_storage_direct_writer
kafka_prefix = 'ghbabszquc', kafka_server = '127.0.0.1:45525' consumer = <cimpl.Consumer object at 0x7f56f1890ea8>
14,029 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.test_replay::test_storage_replay_with_collision
replayer_storage_and_client = (<swh.storage.in_memory.InMemoryStorage object at 0x7f56f7bb6be0>, <swh.journal.client.JournalClient object at 0x7f56f7b5e198>) caplog = <_pytest.logging.LogCaptureFixture object at 0x7f594d63c9e8>
14,052 msJenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.test_replay::test_storage_replayer
replayer_storage_and_client = (<swh.storage.in_memory.InMemoryStorage object at 0x7f56f323b278>, <swh.journal.client.JournalClient object at 0x7f56f19c93c8>) caplog = <_pytest.logging.LogCaptureFixture object at 0x7f56f325da20>
View Full Test Results (7 Failed · 1,595 Passed · 44 Skipped)

Event Timeline

Build has FAILED

Patch application report for D8068 (id=29109)

Could not rebase; Attempt merge onto c19f53f194...

Updating c19f53f1..07889e9c
Fast-forward
 requirements-swh.txt                               |  2 +-
 swh/storage/postgresql/storage.py                  | 36 ++++++++++----
 swh/storage/pytest_plugin.py                       |  2 +-
 swh/storage/tests/storage_tests.py                 | 57 +++++++++++++++-------
 swh/storage/tests/test_postgresql.py               |  5 ++
 swh/storage/tests/test_postgresql_flavor_mirror.py | 37 ++++++++++++++
 .../tests/test_postgresql_flavor_readreplica.py    | 37 ++++++++++++++
 7 files changed, 147 insertions(+), 29 deletions(-)
 create mode 100644 swh/storage/tests/test_postgresql_flavor_mirror.py
 create mode 100644 swh/storage/tests/test_postgresql_flavor_readreplica.py
Changes applied before test
commit 07889e9c1100a5a4eaeb9f6deb38ea59e13050db
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jul 1 17:46:48 2022 +0200

    postgresql: Do not allways auto-create an OriginVisitStatus object in origin_visit_add()
    
    when the OriginVisit object given as argument to be inserted already
    have its visit id set (which is usually the case in a replayer-like
    session), it makes no sense to auto-add the first OriginVisitStatus
    objects related to this visit; this behavior is expected only when the
    origin_visit_add() is called from a loading session.

commit 47caf04ea5e9d47c976200fff7d61164985a4390
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jul 1 14:56:47 2022 +0200

    Add a Storage.flavor property to the postgresql backend
    
    and add tests for 'mirror' and 'read_replica' flavors.

commit a00650ea0c34a98653d24693ce07555f55d816f4
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jul 1 14:50:28 2022 +0200

    Update pytest_plugin for swh.core 2.10

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1624/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1624/console

Harbormaster returned this revision to the author for changes because remote builds failed.Jul 1 2022, 6:04 PM
Harbormaster failed remote builds in B30201: Diff 29109!

Build is green

Patch application report for D8068 (id=29109)

Could not rebase; Attempt merge onto c19f53f194...

Updating c19f53f1..07889e9c
Fast-forward
 requirements-swh.txt                               |  2 +-
 swh/storage/postgresql/storage.py                  | 36 ++++++++++----
 swh/storage/pytest_plugin.py                       |  2 +-
 swh/storage/tests/storage_tests.py                 | 57 +++++++++++++++-------
 swh/storage/tests/test_postgresql.py               |  5 ++
 swh/storage/tests/test_postgresql_flavor_mirror.py | 37 ++++++++++++++
 .../tests/test_postgresql_flavor_readreplica.py    | 37 ++++++++++++++
 7 files changed, 147 insertions(+), 29 deletions(-)
 create mode 100644 swh/storage/tests/test_postgresql_flavor_mirror.py
 create mode 100644 swh/storage/tests/test_postgresql_flavor_readreplica.py
Changes applied before test
commit 07889e9c1100a5a4eaeb9f6deb38ea59e13050db
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jul 1 17:46:48 2022 +0200

    postgresql: Do not allways auto-create an OriginVisitStatus object in origin_visit_add()
    
    when the OriginVisit object given as argument to be inserted already
    have its visit id set (which is usually the case in a replayer-like
    session), it makes no sense to auto-add the first OriginVisitStatus
    objects related to this visit; this behavior is expected only when the
    origin_visit_add() is called from a loading session.

commit 47caf04ea5e9d47c976200fff7d61164985a4390
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jul 1 14:56:47 2022 +0200

    Add a Storage.flavor property to the postgresql backend
    
    and add tests for 'mirror' and 'read_replica' flavors.

commit a00650ea0c34a98653d24693ce07555f55d816f4
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jul 1 14:50:28 2022 +0200

    Update pytest_plugin for swh.core 2.10

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1626/ for more details.

Build is green

Patch application report for D8068 (id=29176)

Rebasing onto 47caf04ea5...

Current branch diff-target is up to date.
Changes applied before test
commit c1923721a7cae2c2b0260ef4e95d5bc7bb5ae005
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jul 1 17:46:48 2022 +0200

    postgresql: Do not always auto-create an OriginVisitStatus object in origin_visit_add()
    
    when the OriginVisit object given as argument to be inserted already
    have its visit id set (which is usually the case in a replayer-like
    session), it makes no sense to auto-add the first OriginVisitStatus
    objects related to this visit; this behavior is expected only when the
    origin_visit_add() is called from a loading session.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1627/ for more details.

This is only tested indirectly by origin_visit_get_latest's tests. Could you add a new test, based on origin_visit_status_get()?

swh/storage/postgresql/storage.py
1051–1067

Journal writes should happen as soon as we have the visit id, in case of crashes.

douardda marked an inline comment as done.

fixes suggested by vlorentz

douardda retitled this revision from postgresql: Do not allways auto-create an OriginVisitStatus object in origin_visit_add() to postgresql: Do not always auto-create an OriginVisitStatus object in origin_visit_add().Jul 6 2022, 3:28 PM
This revision is now accepted and ready to land.Jul 6 2022, 3:36 PM

Build has FAILED

Patch application report for D8068 (id=29184)

Rebasing onto 47caf04ea5...

Current branch diff-target is up to date.
Changes applied before test
commit fc68a10657af95ae91386f44b3b3ab46c603058e
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jul 1 17:46:48 2022 +0200

    postgresql: Do not always auto-create an OriginVisitStatus object in origin_visit_add()
    
    when the OriginVisit object given as argument to be inserted already
    have its visit id set (which is usually the case in a replayer-like
    session), it makes no sense to auto-add the first OriginVisitStatus
    objects related to this visit; this behavior is expected only when the
    origin_visit_add() is called from a loading session.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1629/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1629/console

fix cassandra's origin_visit_add() to match pg

Build has FAILED

Patch application report for D8068 (id=29190)

Rebasing onto 47caf04ea5...

Current branch diff-target is up to date.
Changes applied before test
commit e99ae18bb5554f8ffc21fd95876b3236c98b109e
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jul 1 17:46:48 2022 +0200

    postgresql: Do not always auto-create an OriginVisitStatus object in origin_visit_add()
    
    when the OriginVisit object given as argument to be inserted already
    have its visit id set (which is usually the case in a replayer-like
    session), it makes no sense to auto-add the first OriginVisitStatus
    objects related to this visit; this behavior is expected only when the
    origin_visit_add() is called from a loading session.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1630/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1630/console

implement this behavior also in cassandra/in-memory backend

and update tests accordingly.

Build is green

Patch application report for D8068 (id=29194)

Rebasing onto 47caf04ea5...

Current branch diff-target is up to date.
Changes applied before test
commit e0825acb7e178ec7dfa96993fe6041d302361ca0
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jul 1 17:46:48 2022 +0200

    do not always auto-create an OriginVisitStatus object in origin_visit_add()
    
    when the OriginVisit object given as argument to be inserted already
    have its visit id set (which is usually the case in a replayer-like
    session), it makes no sense to auto-add the first OriginVisitStatus
    objects related to this visit; this behavior is expected only when the
    origin_visit_add() is called from a loading session.
    
    Adapt tests accordingly -- several tests did depend on the auto-add
    behavior of the origin_visit_add method for OriginVisit objects which
    visit_id is given in the test dataset.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1631/ for more details.

vlorentz retitled this revision from postgresql: Do not always auto-create an OriginVisitStatus object in origin_visit_add() to Do not always auto-create an OriginVisitStatus object in origin_visit_add().
olasd added a subscriber: olasd.

LGTM, thanks!