Page MenuHomeSoftware Heritage

Do not always auto-create an OriginVisitStatus object in origin_visit_add()
ClosedPublic

Authored by douardda on Jul 1 2022, 5:53 PM.

Details

Summary

when the OriginVisit object given as argument to be inserted already
have its visit id set (which is usually the case in a replayer-like
session), it makes no sense to auto-add the first OriginVisitStatus
objects related to this visit; this behavior is expected only when the
origin_visit_add() is called from a loading session.

Depends in D8067.

Diff Detail

Repository
rDSTO Storage manager
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D8068 (id=29109)

Could not rebase; Attempt merge onto c19f53f194...

Updating c19f53f1..07889e9c
Fast-forward
 requirements-swh.txt                               |  2 +-
 swh/storage/postgresql/storage.py                  | 36 ++++++++++----
 swh/storage/pytest_plugin.py                       |  2 +-
 swh/storage/tests/storage_tests.py                 | 57 +++++++++++++++-------
 swh/storage/tests/test_postgresql.py               |  5 ++
 swh/storage/tests/test_postgresql_flavor_mirror.py | 37 ++++++++++++++
 .../tests/test_postgresql_flavor_readreplica.py    | 37 ++++++++++++++
 7 files changed, 147 insertions(+), 29 deletions(-)
 create mode 100644 swh/storage/tests/test_postgresql_flavor_mirror.py
 create mode 100644 swh/storage/tests/test_postgresql_flavor_readreplica.py
Changes applied before test
commit 07889e9c1100a5a4eaeb9f6deb38ea59e13050db
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jul 1 17:46:48 2022 +0200

    postgresql: Do not allways auto-create an OriginVisitStatus object in origin_visit_add()
    
    when the OriginVisit object given as argument to be inserted already
    have its visit id set (which is usually the case in a replayer-like
    session), it makes no sense to auto-add the first OriginVisitStatus
    objects related to this visit; this behavior is expected only when the
    origin_visit_add() is called from a loading session.

commit 47caf04ea5e9d47c976200fff7d61164985a4390
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jul 1 14:56:47 2022 +0200

    Add a Storage.flavor property to the postgresql backend
    
    and add tests for 'mirror' and 'read_replica' flavors.

commit a00650ea0c34a98653d24693ce07555f55d816f4
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jul 1 14:50:28 2022 +0200

    Update pytest_plugin for swh.core 2.10

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1624/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1624/console

Harbormaster returned this revision to the author for changes because remote builds failed.Jul 1 2022, 6:04 PM
Harbormaster failed remote builds in B30201: Diff 29109!

Build is green

Patch application report for D8068 (id=29109)

Could not rebase; Attempt merge onto c19f53f194...

Updating c19f53f1..07889e9c
Fast-forward
 requirements-swh.txt                               |  2 +-
 swh/storage/postgresql/storage.py                  | 36 ++++++++++----
 swh/storage/pytest_plugin.py                       |  2 +-
 swh/storage/tests/storage_tests.py                 | 57 +++++++++++++++-------
 swh/storage/tests/test_postgresql.py               |  5 ++
 swh/storage/tests/test_postgresql_flavor_mirror.py | 37 ++++++++++++++
 .../tests/test_postgresql_flavor_readreplica.py    | 37 ++++++++++++++
 7 files changed, 147 insertions(+), 29 deletions(-)
 create mode 100644 swh/storage/tests/test_postgresql_flavor_mirror.py
 create mode 100644 swh/storage/tests/test_postgresql_flavor_readreplica.py
Changes applied before test
commit 07889e9c1100a5a4eaeb9f6deb38ea59e13050db
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jul 1 17:46:48 2022 +0200

    postgresql: Do not allways auto-create an OriginVisitStatus object in origin_visit_add()
    
    when the OriginVisit object given as argument to be inserted already
    have its visit id set (which is usually the case in a replayer-like
    session), it makes no sense to auto-add the first OriginVisitStatus
    objects related to this visit; this behavior is expected only when the
    origin_visit_add() is called from a loading session.

commit 47caf04ea5e9d47c976200fff7d61164985a4390
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jul 1 14:56:47 2022 +0200

    Add a Storage.flavor property to the postgresql backend
    
    and add tests for 'mirror' and 'read_replica' flavors.

commit a00650ea0c34a98653d24693ce07555f55d816f4
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jul 1 14:50:28 2022 +0200

    Update pytest_plugin for swh.core 2.10

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1626/ for more details.

Build is green

Patch application report for D8068 (id=29176)

Rebasing onto 47caf04ea5...

Current branch diff-target is up to date.
Changes applied before test
commit c1923721a7cae2c2b0260ef4e95d5bc7bb5ae005
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jul 1 17:46:48 2022 +0200

    postgresql: Do not always auto-create an OriginVisitStatus object in origin_visit_add()
    
    when the OriginVisit object given as argument to be inserted already
    have its visit id set (which is usually the case in a replayer-like
    session), it makes no sense to auto-add the first OriginVisitStatus
    objects related to this visit; this behavior is expected only when the
    origin_visit_add() is called from a loading session.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1627/ for more details.

This is only tested indirectly by origin_visit_get_latest's tests. Could you add a new test, based on origin_visit_status_get()?

swh/storage/postgresql/storage.py
1051–1067

Journal writes should happen as soon as we have the visit id, in case of crashes.

douardda marked an inline comment as done.

fixes suggested by vlorentz

douardda retitled this revision from postgresql: Do not allways auto-create an OriginVisitStatus object in origin_visit_add() to postgresql: Do not always auto-create an OriginVisitStatus object in origin_visit_add().Jul 6 2022, 3:28 PM
This revision is now accepted and ready to land.Jul 6 2022, 3:36 PM

Build has FAILED

Patch application report for D8068 (id=29184)

Rebasing onto 47caf04ea5...

Current branch diff-target is up to date.
Changes applied before test
commit fc68a10657af95ae91386f44b3b3ab46c603058e
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jul 1 17:46:48 2022 +0200

    postgresql: Do not always auto-create an OriginVisitStatus object in origin_visit_add()
    
    when the OriginVisit object given as argument to be inserted already
    have its visit id set (which is usually the case in a replayer-like
    session), it makes no sense to auto-add the first OriginVisitStatus
    objects related to this visit; this behavior is expected only when the
    origin_visit_add() is called from a loading session.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1629/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1629/console

fix cassandra's origin_visit_add() to match pg

Build has FAILED

Patch application report for D8068 (id=29190)

Rebasing onto 47caf04ea5...

Current branch diff-target is up to date.
Changes applied before test
commit e99ae18bb5554f8ffc21fd95876b3236c98b109e
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jul 1 17:46:48 2022 +0200

    postgresql: Do not always auto-create an OriginVisitStatus object in origin_visit_add()
    
    when the OriginVisit object given as argument to be inserted already
    have its visit id set (which is usually the case in a replayer-like
    session), it makes no sense to auto-add the first OriginVisitStatus
    objects related to this visit; this behavior is expected only when the
    origin_visit_add() is called from a loading session.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1630/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1630/console

implement this behavior also in cassandra/in-memory backend

and update tests accordingly.

Build is green

Patch application report for D8068 (id=29194)

Rebasing onto 47caf04ea5...

Current branch diff-target is up to date.
Changes applied before test
commit e0825acb7e178ec7dfa96993fe6041d302361ca0
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jul 1 17:46:48 2022 +0200

    do not always auto-create an OriginVisitStatus object in origin_visit_add()
    
    when the OriginVisit object given as argument to be inserted already
    have its visit id set (which is usually the case in a replayer-like
    session), it makes no sense to auto-add the first OriginVisitStatus
    objects related to this visit; this behavior is expected only when the
    origin_visit_add() is called from a loading session.
    
    Adapt tests accordingly -- several tests did depend on the auto-add
    behavior of the origin_visit_add method for OriginVisit objects which
    visit_id is given in the test dataset.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1631/ for more details.

vlorentz retitled this revision from postgresql: Do not always auto-create an OriginVisitStatus object in origin_visit_add() to Do not always auto-create an OriginVisitStatus object in origin_visit_add().
olasd added a subscriber: olasd.

LGTM, thanks!