Depends on D4866
Details
- Reviewers
vlorentz olasd - Group Reviewers
Reviewers - Commits
- rDSCHa5fb291703d2: backend: Make origin_visit_stats_upsert a batch api
tox
Diff Detail
- Repository
- rDSCH Scheduling utilities
- Branch
- master
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 18396 Build 28428: Phabricator diff pipeline on jenkins Jenkins console · Jenkins Build 28427: arc lint + arc unit
Event Timeline
Build has FAILED
Patch application report for D4868 (id=17242)
Could not rebase; Attempt merge onto ca45d40f2a...
Updating ca45d40..254b5af Fast-forward sql/updates/19.sql | 4 + swh/scheduler/backend.py | 74 +++--- swh/scheduler/interface.py | 4 +- swh/scheduler/journal_client.py | 89 ++++++++ swh/scheduler/model.py | 10 + swh/scheduler/sql/30-schema.sql | 6 +- swh/scheduler/tests/test_journal_client.py | 349 +++++++++++++++++++++++++++++ swh/scheduler/tests/test_scheduler.py | 93 +++++++- 8 files changed, 593 insertions(+), 36 deletions(-) create mode 100644 swh/scheduler/journal_client.py create mode 100644 swh/scheduler/tests/test_journal_client.py
Changes applied before test
commit 254b5af2a44e96ac65eb939dfd5dece9f5d54f20 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Thu Jan 14 18:38:05 2021 +0100 backend: Make origin_visit_stats_upsert a batch api commit 7f4d22a42213378b1e5e85e921c61bfa62be2868 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Jan 13 13:03:28 2021 +0100 Populate origin_visit_stats table out of the origin_visit_status topic The snapshot is used to determine the "eventful/uneventful" nature of the origin visit status. When no snapshot is provided, the visit is considered as failed so the last_failed column is updated. As there is no time guarantee when reading message from the topic, the code tries to keep the data in the most timely ordered as possible. Only most recent information is kept. Related to T2967
Link to build: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/126/
See console output for more information: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/126/console
Build is green
Patch application report for D4868 (id=17246)
Could not rebase; Attempt merge onto ca45d40f2a...
Updating ca45d40..737432a Fast-forward sql/updates/19.sql | 4 + swh/scheduler/backend.py | 74 +++--- swh/scheduler/interface.py | 4 +- swh/scheduler/journal_client.py | 95 ++++++++ swh/scheduler/model.py | 10 + swh/scheduler/sql/30-schema.sql | 6 +- swh/scheduler/tests/test_journal_client.py | 349 +++++++++++++++++++++++++++++ swh/scheduler/tests/test_scheduler.py | 93 +++++++- 8 files changed, 599 insertions(+), 36 deletions(-) create mode 100644 swh/scheduler/journal_client.py create mode 100644 swh/scheduler/tests/test_journal_client.py
Changes applied before test
commit 737432a58b3a09b01c2039beb5b8140e6dbc4618 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Thu Jan 14 18:38:05 2021 +0100 backend: Make origin_visit_stats_upsert a batch api Related to T2967 commit 7f4d22a42213378b1e5e85e921c61bfa62be2868 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Jan 13 13:03:28 2021 +0100 Populate origin_visit_stats table out of the origin_visit_status topic The snapshot is used to determine the "eventful/uneventful" nature of the origin visit status. When no snapshot is provided, the visit is considered as failed so the last_failed column is updated. As there is no time guarantee when reading message from the topic, the code tries to keep the data in the most timely ordered as possible. Only most recent information is kept. Related to T2967
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/127/ for more details.
Thanks for handling this! I hope this will make the simulator happier!
I don't see any tests directly calling the upsert function with a list of visit_stats. I guess that's handled through the journal client tests? The straight port of the tests that used the "unary" form of the function is definitely useful, but it might make sense to have some small tests calling that function with multiple arguments directly, ideally with data that would trigger the cardinality violation.
Thanks for handling this! I hope this will make the simulator happier!
Yes, that's what we are aiming at ;)
I don't see any tests directly calling the upsert function with a list of visit_stats. I guess that's handled through the journal client tests? The straight port of the tests that used the "unary" form of the function is definitely useful, but it might make sense to have some small tests calling that function with multiple arguments directly, ideally with data that would trigger the cardinality violation.
Indeed, i just upgraded the existing ones.
Thanks for pointing it out.
I foresee 2 new tests now.
One passing to deal with multiple inserts in one call.
Another failing with the cardinality thing.
Thanks again.
swh/scheduler/journal_client.py | ||
---|---|---|
63–97 | Those dicts are intermediary objects. It's not clear whether adding creation of intermediary objects (attr.evolve) is beneficial here instead of simply mutating dicts. |
failing test on the cardinality exception is a PITA, i'm currently stuck on and i'm forced to do ugly stuff [1]
E swh.core.api.RemoteException: <RemoteException 500 CardinalityViolation: ['ON CONFLICT DO UPDATE command cannot affect row a second time\nHINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.\n']>
swh/scheduler/api/client.py | ||
---|---|---|
40 | ugly stuff. |
Build is green
Patch application report for D4868 (id=17249)
Could not rebase; Attempt merge onto ca45d40f2a...
Updating ca45d40..affad0c Fast-forward sql/updates/19.sql | 4 + swh/scheduler/api/client.py | 26 ++- swh/scheduler/backend.py | 74 +++--- swh/scheduler/interface.py | 4 +- swh/scheduler/journal_client.py | 95 ++++++++ swh/scheduler/model.py | 10 + swh/scheduler/sql/30-schema.sql | 6 +- swh/scheduler/tests/test_journal_client.py | 349 +++++++++++++++++++++++++++++ swh/scheduler/tests/test_scheduler.py | 157 ++++++++++++- 9 files changed, 685 insertions(+), 40 deletions(-) create mode 100644 swh/scheduler/journal_client.py create mode 100644 swh/scheduler/tests/test_journal_client.py
Changes applied before test
commit affad0c73e7bba77aaee65687cda235b9d9dc911 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Thu Jan 14 18:38:05 2021 +0100 backend: Make origin_visit_stats_upsert a batch api Related to T2967 commit 7f4d22a42213378b1e5e85e921c61bfa62be2868 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Jan 13 13:03:28 2021 +0100 Populate origin_visit_stats table out of the origin_visit_status topic The snapshot is used to determine the "eventful/uneventful" nature of the origin visit status. When no snapshot is provided, the visit is considered as failed so the last_failed column is updated. As there is no time guarantee when reading message from the topic, the code tries to keep the data in the most timely ordered as possible. Only most recent information is kept. Related to T2967
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/128/ for more details.
swh/scheduler/api/client.py | ||
---|---|---|
36 |
Build is green
Patch application report for D4868 (id=17255)
Could not rebase; Attempt merge onto ca45d40f2a...
Updating ca45d40..2c2b0d1 Fast-forward sql/updates/19.sql | 4 + swh/scheduler/api/client.py | 26 ++- swh/scheduler/backend.py | 73 +++--- swh/scheduler/interface.py | 4 +- swh/scheduler/journal_client.py | 98 ++++++++ swh/scheduler/model.py | 10 + swh/scheduler/sql/30-schema.sql | 6 +- swh/scheduler/tests/test_journal_client.py | 356 +++++++++++++++++++++++++++++ swh/scheduler/tests/test_scheduler.py | 157 ++++++++++++- 9 files changed, 694 insertions(+), 40 deletions(-) create mode 100644 swh/scheduler/journal_client.py create mode 100644 swh/scheduler/tests/test_journal_client.py
Changes applied before test
commit 2c2b0d1478d2d9a1b32da86b80af093e43081aad Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Thu Jan 14 18:38:05 2021 +0100 backend: Make origin_visit_stats_upsert a batch api Related to T2967 commit a916b4a8d3fbca565a9ec5abead5bdeb0e346845 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Jan 13 13:03:28 2021 +0100 Populate origin_visit_stats table out of the origin_visit_status topic The snapshot is used to determine the "eventful/uneventful" nature of the origin visit status. When no snapshot is provided, the visit is considered as failed so the last_failed column is updated. As there is no time guarantee when reading message from the topic, the code tries to keep the data in the most timely ordered as possible. Only most recent information is kept. Related to T2967
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/130/ for more details.
Build is green
Patch application report for D4868 (id=17256)
Could not rebase; Attempt merge onto ca45d40f2a...
Updating ca45d40..ae5d743 Fast-forward sql/updates/19.sql | 4 + swh/scheduler/api/client.py | 24 +- swh/scheduler/backend.py | 73 +++--- swh/scheduler/interface.py | 4 +- swh/scheduler/journal_client.py | 98 ++++++++ swh/scheduler/model.py | 10 + swh/scheduler/sql/30-schema.sql | 6 +- swh/scheduler/tests/test_journal_client.py | 356 +++++++++++++++++++++++++++++ swh/scheduler/tests/test_scheduler.py | 157 ++++++++++++- 9 files changed, 692 insertions(+), 40 deletions(-) create mode 100644 swh/scheduler/journal_client.py create mode 100644 swh/scheduler/tests/test_journal_client.py
Changes applied before test
commit ae5d7434eee201df689353c3bf6fe1dd59e68ee7 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Thu Jan 14 18:38:05 2021 +0100 backend: Make origin_visit_stats_upsert a batch api Related to T2967 commit a916b4a8d3fbca565a9ec5abead5bdeb0e346845 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Jan 13 13:03:28 2021 +0100 Populate origin_visit_stats table out of the origin_visit_status topic The snapshot is used to determine the "eventful/uneventful" nature of the origin visit status. When no snapshot is provided, the visit is considered as failed so the last_failed column is updated. As there is no time guarantee when reading message from the topic, the code tries to keep the data in the most timely ordered as possible. Only most recent information is kept. Related to T2967
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/131/ for more details.
Build is green
Patch application report for D4868 (id=17257)
Could not rebase; Attempt merge onto ca45d40f2a...
Updating ca45d40..2ab2aa9 Fast-forward sql/updates/19.sql | 4 + swh/scheduler/api/client.py | 24 +- swh/scheduler/backend.py | 73 +++--- swh/scheduler/interface.py | 4 +- swh/scheduler/journal_client.py | 98 ++++++++ swh/scheduler/model.py | 10 + swh/scheduler/sql/30-schema.sql | 6 +- swh/scheduler/tests/test_journal_client.py | 356 +++++++++++++++++++++++++++++ swh/scheduler/tests/test_scheduler.py | 155 ++++++++++++- 9 files changed, 691 insertions(+), 39 deletions(-) create mode 100644 swh/scheduler/journal_client.py create mode 100644 swh/scheduler/tests/test_journal_client.py
Changes applied before test
commit 2ab2aa95674d624791f42fc6e12b8c30ce52282b Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Thu Jan 14 18:38:05 2021 +0100 backend: Make origin_visit_stats_upsert a batch api Related to T2967 commit a916b4a8d3fbca565a9ec5abead5bdeb0e346845 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Jan 13 13:03:28 2021 +0100 Populate origin_visit_stats table out of the origin_visit_status topic The snapshot is used to determine the "eventful/uneventful" nature of the origin visit status. When no snapshot is provided, the visit is considered as failed so the last_failed column is updated. As there is no time guarantee when reading message from the topic, the code tries to keep the data in the most timely ordered as possible. Only most recent information is kept. Related to T2967
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/132/ for more details.
swh/scheduler/api/client.py | ||
---|---|---|
23–42 | It's not enough because this code is missing from swh/scheduler/api/server.py: @app.errorhandler(CardinalityViolation) def argument_error_handler(exception): return error_handler(exception, encode_data, status_code=400) However, CardinalityViolation is an SQL-specific error, so it should not leak through the API. You should make the backend raise a subclass of SchedulerException instead. |
swh/scheduler/api/client.py | ||
---|---|---|
23–42 | to be more specific, reraise_exceptions is only read if it's a 4xx HTTP status; but without the above code, the RPC server returns a 5xx error because it's an unexpected exception from its point of view. |
swh/scheduler/api/client.py | ||
---|---|---|
23–42 | Thanks for the heads up for the first part i was missing. Your last suggestion sounds better to my ears indeed. I'll adapt. Thanks |
adapt according to review
Raise SchedulerException instead of leaking a postgres exception.
This voids the needs to tamper with api.client code.
Build is green
Patch application report for D4868 (id=17258)
Could not rebase; Attempt merge onto ca45d40f2a...
Updating ca45d40..83b3bca Fast-forward sql/updates/19.sql | 4 + swh/scheduler/backend.py | 85 ++++--- swh/scheduler/interface.py | 4 +- swh/scheduler/journal_client.py | 98 ++++++++ swh/scheduler/model.py | 10 + swh/scheduler/sql/30-schema.sql | 6 +- swh/scheduler/tests/test_journal_client.py | 356 +++++++++++++++++++++++++++++ swh/scheduler/tests/test_scheduler.py | 156 ++++++++++++- 8 files changed, 679 insertions(+), 40 deletions(-) create mode 100644 swh/scheduler/journal_client.py create mode 100644 swh/scheduler/tests/test_journal_client.py
Changes applied before test
commit 83b3bcacfa65204117fbc9b877e7d7d52cd86644 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Thu Jan 14 18:38:05 2021 +0100 backend: Make origin_visit_stats_upsert a batch api Related to T2967 commit a916b4a8d3fbca565a9ec5abead5bdeb0e346845 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Jan 13 13:03:28 2021 +0100 Populate origin_visit_stats table out of the origin_visit_status topic The snapshot is used to determine the "eventful/uneventful" nature of the origin visit status. When no snapshot is provided, the visit is considered as failed so the last_failed column is updated. As there is no time guarantee when reading message from the topic, the code tries to keep the data in the most timely ordered as possible. Only most recent information is kept. Related to T2967
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/133/ for more details.
swh/scheduler/backend.py | ||
---|---|---|
824 | ;) |
Use simpler repr call
swh/scheduler/backend.py | ||
---|---|---|
824 | yes, i thought of it. I guess the thought process must be, if a standard known already exists, then go with it ;) Thanks. |
Build is green
Patch application report for D4868 (id=17262)
Could not rebase; Attempt merge onto ca45d40f2a...
Updating ca45d40..5a1600e Fast-forward sql/updates/19.sql | 4 + swh/scheduler/backend.py | 85 ++++--- swh/scheduler/interface.py | 4 +- swh/scheduler/journal_client.py | 98 ++++++++ swh/scheduler/model.py | 10 + swh/scheduler/sql/30-schema.sql | 6 +- swh/scheduler/tests/test_journal_client.py | 356 +++++++++++++++++++++++++++++ swh/scheduler/tests/test_scheduler.py | 156 ++++++++++++- 8 files changed, 679 insertions(+), 40 deletions(-) create mode 100644 swh/scheduler/journal_client.py create mode 100644 swh/scheduler/tests/test_journal_client.py
Changes applied before test
commit 5a1600e22032082765b9624c647c96a268f5c14e Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Thu Jan 14 18:38:05 2021 +0100 backend: Make origin_visit_stats_upsert a batch api Related to T2967 commit a916b4a8d3fbca565a9ec5abead5bdeb0e346845 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Jan 13 13:03:28 2021 +0100 Populate origin_visit_stats table out of the origin_visit_status topic The snapshot is used to determine the "eventful/uneventful" nature of the origin visit status. When no snapshot is provided, the visit is considered as failed so the last_failed column is updated. As there is no time guarantee when reading message from the topic, the code tries to keep the data in the most timely ordered as possible. Only most recent information is kept. Related to T2967
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/134/ for more details.
Build is green
Patch application report for D4868 (id=17270)
Could not rebase; Attempt merge onto ca45d40f2a...
Updating ca45d40..5a1600e Fast-forward sql/updates/19.sql | 4 + swh/scheduler/backend.py | 85 ++++--- swh/scheduler/interface.py | 4 +- swh/scheduler/journal_client.py | 98 ++++++++ swh/scheduler/model.py | 10 + swh/scheduler/sql/30-schema.sql | 6 +- swh/scheduler/tests/test_journal_client.py | 356 +++++++++++++++++++++++++++++ swh/scheduler/tests/test_scheduler.py | 156 ++++++++++++- 8 files changed, 679 insertions(+), 40 deletions(-) create mode 100644 swh/scheduler/journal_client.py create mode 100644 swh/scheduler/tests/test_journal_client.py
Changes applied before test
commit 5a1600e22032082765b9624c647c96a268f5c14e Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Thu Jan 14 18:38:05 2021 +0100 backend: Make origin_visit_stats_upsert a batch api Related to T2967 commit a916b4a8d3fbca565a9ec5abead5bdeb0e346845 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Jan 13 13:03:28 2021 +0100 Populate origin_visit_stats table out of the origin_visit_status topic The snapshot is used to determine the "eventful/uneventful" nature of the origin visit status. When no snapshot is provided, the visit is considered as failed so the last_failed column is updated. As there is no time guarantee when reading message from the topic, the code tries to keep the data in the most timely ordered as possible. Only most recent information is kept. Related to T2967
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/136/ for more details.
Build is green
Patch application report for D4868 (id=17271)
Could not rebase; Attempt merge onto ca45d40f2a...
Updating ca45d40..a5fb291 Fast-forward sql/updates/19.sql | 4 + swh/scheduler/backend.py | 85 ++++--- swh/scheduler/interface.py | 4 +- swh/scheduler/journal_client.py | 98 ++++++++ swh/scheduler/model.py | 10 + swh/scheduler/sql/30-schema.sql | 6 +- swh/scheduler/tests/test_journal_client.py | 356 +++++++++++++++++++++++++++++ swh/scheduler/tests/test_scheduler.py | 155 ++++++++++++- 8 files changed, 678 insertions(+), 40 deletions(-) create mode 100644 swh/scheduler/journal_client.py create mode 100644 swh/scheduler/tests/test_journal_client.py
Changes applied before test
commit a5fb291703d2f5de52450c326eebe1df07639714 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Thu Jan 14 18:38:05 2021 +0100 backend: Make origin_visit_stats_upsert a batch api Related to T2967 commit 608aa2075619bbc9f5c7dab56d1c37a560f13e64 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Wed Jan 13 13:03:28 2021 +0100 Populate origin_visit_stats table out of the origin_visit_status topic The snapshot is used to determine the "eventful/uneventful" nature of the origin visit status. When no snapshot is provided, the visit is considered as failed so the last_failed column is updated. As there is no time guarantee when reading message from the topic, the code tries to keep the data in the most timely ordered as possible. Only most recent information is kept. Related to T2967
See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/137/ for more details.