Depends on D6165.
Details
- Reviewers: vlorentz
- Group Reviewers: Reviewers
- Commits: rDPROVe649205e2589: Add support for remote backend on existing storage tests
Diff Detail
- Repository: rDPROV Provenance database
- Lint: Automatic diff as part of commit; lint not applicable.
- Unit: Automatic diff as part of commit; unit tests not applicable.
Event Timeline
Build is green
Patch application report for D6339 (id=23214)
Could not rebase; Attempt merge onto 4c087ea0ec...
Updating 4c087ea..801e8b8
Fast-forward
.gitignore | 4 +-
mypy.ini | 3 +
pytest.ini | 5 +
requirements-test.txt | 2 +-
requirements.txt | 2 +
swh/provenance/__init__.py | 28 +-
swh/provenance/api/client.py | 541 +++++++++++++++++++-
swh/provenance/api/server.py | 848 +++++++++++++++++++++++++++++--
swh/provenance/cli.py | 39 +-
swh/provenance/graph.py | 9 +
swh/provenance/interface.py | 20 +
swh/provenance/mongo/backend.py | 41 +-
swh/provenance/origin.py | 17 +-
swh/provenance/postgresql/archive.py | 15 +-
swh/provenance/postgresql/provenance.py | 40 +-
swh/provenance/provenance.py | 149 +++---
swh/provenance/revision.py | 31 +-
swh/provenance/storage/archive.py | 15 +-
swh/provenance/tests/conftest.py | 84 ++--
tox.ini | 3 +-
20 files changed, 1624 insertions(+), 272 deletions(-)
Changes applied before test
commit 801e8b8e7fc909d6a167fb4987f10fb9a521f1c5 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Sep 21 16:13:53 2021 +0200 Add support for remote backend on existing storage tests commit 5992c08ff5c8f4eb2cdd8c5148375a050a4da738 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Wed Sep 15 13:39:59 2021 +0200 Improve server/client shoutdown logic and error handling Add StatsD support to client to be compliant with the other provenance storage implementations commit 7513e6eff6979f8014dc23ebe0ea5c9e937de53c Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Aug 31 13:36:34 2021 +0200 Rework `ProvenanceStorageRabbitMQWorker` to handle connection loss Use `pika.SelectConnection` and make an explicit handle of its life-cycle. Improve connection error handling on both client and server side. Change the RabbitMQ scheme to use 5 exchanges (one per entity + location). Each exchange handles all entity related insertions, dispatching to different queues depending on the requested `ProvenanceStorageInterface` methods (16 queues per methods). For instance, the `content` exchange handles all requests for `content_add` and `relation_add` for both relations `CNT_EARLY_IN_REV` and `CNT_IN_DIR` (ie. relations with content as source). In each case, requests are forwarded to 1 of 16 possible workers, depending on the sha1 id of the content. commit af1fe50f5a6548c6f4c31adbcf8d5124796d691b Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Aug 20 12:21:27 2021 +0200 Add new RabbitMQ-based client/server API Get methods in the `ProvenanceStorageInterface` are called through a server that guarantees conflict-free writings to the underlying database. Set methods are called directly from the client to avoid RCP overhead for reads. The server spawns multiple processes to handle independent requests concurrently. 
commit 61b427c0956bd35596213df7c0f4655966227449 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Sep 27 15:59:38 2021 +0200 Make old StatsD metrics style compliant with the rest of the module commit d9a00102c66284f358c6ced5e3fdf1057a9ba62d Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Sep 27 14:08:10 2021 +0200 Add StatsD support to graph submodule Time stats of graphs creation and counter of amount of invalidated isochrone frontiers commit 0cf3d9185f3eb8528c9cf2031ea8f94d83977ca2 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Sep 27 13:53:57 2021 +0200 Add StatsD support to provenance storage implementations commit 0160d4f7c3cfc3f0193b729f0c04bd2ff7ad7129 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Sep 27 15:21:42 2021 +0200 Add StatsD support to provenance backend commit 4f6bf0a4670e69730e47f519ac8bca6673be29f6 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Sep 27 15:17:34 2021 +0200 Split `Provenance::flush` method in two (one per layer) commit 8d401db34539f5df2ce2bd37080ec8ae1557417b Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Wed Sep 1 11:27:02 2021 +0200 Remove old client/server storage based on `swh.core.api.RPCClient` This implementation was a first attempt for conflict resolution that didn't worked as expected. commit 846b20e0e9995a13591a1641bf92036ff3764be5 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Sep 24 11:08:08 2021 +0200 Add `open`/`close` methods to both `ProvenanceInterface` and `ProvenanceStorageInterface` The idea is to have a mechanism to explicitly allocate/release resources when needed. commit 6c3071493b5d3f187113493275d402a27866da95 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Wed Sep 15 16:14:10 2021 +0200 Rename remote storage backend classes Make names consistent with the naming convention used for other components.
See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/414/ for more details.
IMHO this diff should be squashed into D6165 (it's really part of the work adding the RabbitMQ-based backend).
Also, I don't see the reason for the .gitignore part. Do these tests generate new files (to be ignored) in the git working directory?
Build was aborted
Patch application report for D6339 (id=23276)
Could not rebase; Attempt merge onto 4c087ea0ec...
Updating 4c087ea..43ae89e
Fast-forward
.gitignore | 4 +-
mypy.ini | 3 +
pytest.ini | 2 +
requirements-test.txt | 2 +-
requirements.txt | 3 +-
swh/provenance/__init__.py | 28 +-
swh/provenance/api/client.py | 557 ++++++++++++++++++++-
swh/provenance/api/server.py | 846 +++++++++++++++++++++++++++++---
swh/provenance/cli.py | 97 ++--
swh/provenance/graph.py | 9 +
swh/provenance/interface.py | 47 +-
swh/provenance/mongo/backend.py | 61 ++-
swh/provenance/origin.py | 17 +-
swh/provenance/postgresql/archive.py | 15 +-
swh/provenance/postgresql/provenance.py | 58 ++-
swh/provenance/provenance.py | 165 ++++---
swh/provenance/revision.py | 31 +-
swh/provenance/storage/archive.py | 15 +-
swh/provenance/tests/conftest.py | 81 +--
tox.ini | 3 +-
20 files changed, 1729 insertions(+), 315 deletions(-)
Changes applied before test
commit 43ae89e5a7c85d95557deca669fdfae875333268 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Sep 21 16:13:53 2021 +0200 Add support for remote backend on existing storage tests commit cd4056be39e8152276cc5d65f39d512021714d84 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Wed Sep 15 13:39:59 2021 +0200 Improve server/client shoutdown logic and error handling Add StatsD support to client to be compliant with the other provenance storage implementations commit 2eaf7200e8a97e42b291313f04cf24ae6c6ce9f2 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Aug 31 13:36:34 2021 +0200 Rework `ProvenanceStorageRabbitMQWorker` to handle connection loss Use `pika.SelectConnection` and make an explicit handle of its life-cycle. Improve connection error handling on both client and server side. Change the RabbitMQ scheme to use 5 exchanges (one per entity + location). Each exchange handles all entity related insertions, dispatching to different queues depending on the requested `ProvenanceStorageInterface` methods (16 queues per methods). For instance, the `content` exchange handles all requests for `content_add` and `relation_add` for both relations `CNT_EARLY_IN_REV` and `CNT_IN_DIR` (ie. relations with content as source). In each case, requests are forwarded to 1 of 16 possible workers, depending on the sha1 id of the content. commit d1913496c81be2fae1eff1cf93a17c8439991706 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Aug 20 12:21:27 2021 +0200 Add new RabbitMQ-based client/server API Get methods in the `ProvenanceStorageInterface` are called through a server that guarantees conflict-free writings to the underlying database. Set methods are called directly from the client to avoid RCP overhead for reads. The server spawns multiple processes to handle independent requests concurrently. 
commit 04ff73ea98f5f239cee6a126c75767f4617e330c Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Sep 27 15:59:38 2021 +0200 Make old StatsD metrics style compliant with the rest of the module commit 1bd6b22aae6a356e18f65005fc7e1c162e6f38c6 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Sep 27 14:08:10 2021 +0200 Add StatsD support to graph submodule Time stats of graphs creation and counter of amount of invalidated isochrone frontiers commit 1ad78362fb415ea1d88a1d416da9991896e68d43 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Sep 27 13:53:57 2021 +0200 Add StatsD support to provenance storage implementations commit e2a1843d5ebe01a9cdfe46b6b74dde1e293b8c01 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Sep 27 15:21:42 2021 +0200 Add StatsD support to provenance backend commit 246e55f9b7e3475ea4509e08370827a3190db916 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Sep 27 15:17:34 2021 +0200 Split `Provenance::flush` method in two (one per layer) commit f0210c3753c3a4122ee3c54f7fac97d170a142fa Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Sep 24 11:08:08 2021 +0200 Add `open`/`close` methods to both `ProvenanceInterface` and `ProvenanceStorageInterface` This allows to have an explicit mechanism to allocate/release resources when needed. The necessary methods for the classes implementing these interfaces to be turned in contexts managers are added as well (ie. `__enter__`/`__exit__`). commit 172e327c25883bee768a9c16b850ce6aab7e2eb2 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Wed Sep 15 16:14:10 2021 +0200 Remove remote provenance storage based on `swh.core.api.RPCClient` This implementation was a first attempt for conflict resolution that didn't worked as expected.
Link to build: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/426/
See console output for more information: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/426/console
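The `open`/`close` commit listed in the report above says that implementing classes also gain `__enter__`/`__exit__` so they can be used as context managers. A minimal sketch of that pattern follows; `DemoStorage` and its `is_open` flag are hypothetical stand-ins for illustration, not the actual classes in `swh.provenance`.

```python
from types import TracebackType
from typing import Optional, Type


class DemoStorage:
    """Hypothetical stand-in for a class implementing the storage interface."""

    def __init__(self) -> None:
        self.is_open = False

    def open(self) -> None:
        # Allocate resources here (connections, worker processes, ...).
        self.is_open = True

    def close(self) -> None:
        # Release whatever open() allocated.
        self.is_open = False

    def __enter__(self) -> "DemoStorage":
        self.open()
        return self

    def __exit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc_val: Optional[BaseException],
        exc_tb: Optional[TracebackType],
    ) -> None:
        # Returning None lets any exception raised in the block propagate
        # after the resources have been released.
        self.close()


with DemoStorage() as storage:
    assert storage.is_open
```

Under this assumption, `close()` runs even when the body of the `with` block raises, which is the usual reason for exposing explicit resource management this way.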
Build was aborted
Patch application report for D6339 (id=23278)
Could not rebase; Attempt merge onto 4c087ea0ec...
Updating 4c087ea..43ae89e
Fast-forward
.gitignore | 4 +-
mypy.ini | 3 +
pytest.ini | 2 +
requirements-test.txt | 2 +-
requirements.txt | 3 +-
swh/provenance/__init__.py | 28 +-
swh/provenance/api/client.py | 557 ++++++++++++++++++++-
swh/provenance/api/server.py | 846 +++++++++++++++++++++++++++++---
swh/provenance/cli.py | 97 ++--
swh/provenance/graph.py | 9 +
swh/provenance/interface.py | 47 +-
swh/provenance/mongo/backend.py | 61 ++-
swh/provenance/origin.py | 17 +-
swh/provenance/postgresql/archive.py | 15 +-
swh/provenance/postgresql/provenance.py | 58 ++-
swh/provenance/provenance.py | 165 ++++---
swh/provenance/revision.py | 31 +-
swh/provenance/storage/archive.py | 15 +-
swh/provenance/tests/conftest.py | 81 +--
tox.ini | 3 +-
20 files changed, 1729 insertions(+), 315 deletions(-)
Changes applied before test
commit 43ae89e5a7c85d95557deca669fdfae875333268 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Sep 21 16:13:53 2021 +0200 Add support for remote backend on existing storage tests commit cd4056be39e8152276cc5d65f39d512021714d84 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Wed Sep 15 13:39:59 2021 +0200 Improve server/client shoutdown logic and error handling Add StatsD support to client to be compliant with the other provenance storage implementations commit 2eaf7200e8a97e42b291313f04cf24ae6c6ce9f2 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Aug 31 13:36:34 2021 +0200 Rework `ProvenanceStorageRabbitMQWorker` to handle connection loss Use `pika.SelectConnection` and make an explicit handle of its life-cycle. Improve connection error handling on both client and server side. Change the RabbitMQ scheme to use 5 exchanges (one per entity + location). Each exchange handles all entity related insertions, dispatching to different queues depending on the requested `ProvenanceStorageInterface` methods (16 queues per methods). For instance, the `content` exchange handles all requests for `content_add` and `relation_add` for both relations `CNT_EARLY_IN_REV` and `CNT_IN_DIR` (ie. relations with content as source). In each case, requests are forwarded to 1 of 16 possible workers, depending on the sha1 id of the content. commit d1913496c81be2fae1eff1cf93a17c8439991706 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Aug 20 12:21:27 2021 +0200 Add new RabbitMQ-based client/server API Get methods in the `ProvenanceStorageInterface` are called through a server that guarantees conflict-free writings to the underlying database. Set methods are called directly from the client to avoid RCP overhead for reads. The server spawns multiple processes to handle independent requests concurrently. 
commit 04ff73ea98f5f239cee6a126c75767f4617e330c Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Sep 27 15:59:38 2021 +0200 Make old StatsD metrics style compliant with the rest of the module commit 1bd6b22aae6a356e18f65005fc7e1c162e6f38c6 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Sep 27 14:08:10 2021 +0200 Add StatsD support to graph submodule Time stats of graphs creation and counter of amount of invalidated isochrone frontiers commit 1ad78362fb415ea1d88a1d416da9991896e68d43 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Sep 27 13:53:57 2021 +0200 Add StatsD support to provenance storage implementations commit e2a1843d5ebe01a9cdfe46b6b74dde1e293b8c01 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Sep 27 15:21:42 2021 +0200 Add StatsD support to provenance backend commit 246e55f9b7e3475ea4509e08370827a3190db916 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Sep 27 15:17:34 2021 +0200 Split `Provenance::flush` method in two (one per layer) commit f0210c3753c3a4122ee3c54f7fac97d170a142fa Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Sep 24 11:08:08 2021 +0200 Add `open`/`close` methods to both `ProvenanceInterface` and `ProvenanceStorageInterface` This allows to have an explicit mechanism to allocate/release resources when needed. The necessary methods for the classes implementing these interfaces to be turned in contexts managers are added as well (ie. `__enter__`/`__exit__`). commit 172e327c25883bee768a9c16b850ce6aab7e2eb2 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Wed Sep 15 16:14:10 2021 +0200 Remove remote provenance storage based on `swh.core.api.RPCClient` This implementation was a first attempt for conflict resolution that didn't worked as expected.
Link to build: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/427/
See console output for more information: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/427/console
Build was aborted
Patch application report for D6339 (id=23278)
Could not rebase; Attempt merge onto 4c087ea0ec...
Updating 4c087ea..43ae89e
Fast-forward
.gitignore | 4 +-
mypy.ini | 3 +
pytest.ini | 2 +
requirements-test.txt | 2 +-
requirements.txt | 3 +-
swh/provenance/__init__.py | 28 +-
swh/provenance/api/client.py | 557 ++++++++++++++++++++-
swh/provenance/api/server.py | 846 +++++++++++++++++++++++++++++---
swh/provenance/cli.py | 97 ++--
swh/provenance/graph.py | 9 +
swh/provenance/interface.py | 47 +-
swh/provenance/mongo/backend.py | 61 ++-
swh/provenance/origin.py | 17 +-
swh/provenance/postgresql/archive.py | 15 +-
swh/provenance/postgresql/provenance.py | 58 ++-
swh/provenance/provenance.py | 165 ++++---
swh/provenance/revision.py | 31 +-
swh/provenance/storage/archive.py | 15 +-
swh/provenance/tests/conftest.py | 81 +--
tox.ini | 3 +-
20 files changed, 1729 insertions(+), 315 deletions(-)
Changes applied before test
commit 43ae89e5a7c85d95557deca669fdfae875333268 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Sep 21 16:13:53 2021 +0200 Add support for remote backend on existing storage tests commit cd4056be39e8152276cc5d65f39d512021714d84 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Wed Sep 15 13:39:59 2021 +0200 Improve server/client shoutdown logic and error handling Add StatsD support to client to be compliant with the other provenance storage implementations commit 2eaf7200e8a97e42b291313f04cf24ae6c6ce9f2 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Aug 31 13:36:34 2021 +0200 Rework `ProvenanceStorageRabbitMQWorker` to handle connection loss Use `pika.SelectConnection` and make an explicit handle of its life-cycle. Improve connection error handling on both client and server side. Change the RabbitMQ scheme to use 5 exchanges (one per entity + location). Each exchange handles all entity related insertions, dispatching to different queues depending on the requested `ProvenanceStorageInterface` methods (16 queues per methods). For instance, the `content` exchange handles all requests for `content_add` and `relation_add` for both relations `CNT_EARLY_IN_REV` and `CNT_IN_DIR` (ie. relations with content as source). In each case, requests are forwarded to 1 of 16 possible workers, depending on the sha1 id of the content. commit d1913496c81be2fae1eff1cf93a17c8439991706 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Aug 20 12:21:27 2021 +0200 Add new RabbitMQ-based client/server API Get methods in the `ProvenanceStorageInterface` are called through a server that guarantees conflict-free writings to the underlying database. Set methods are called directly from the client to avoid RCP overhead for reads. The server spawns multiple processes to handle independent requests concurrently. 
commit 04ff73ea98f5f239cee6a126c75767f4617e330c Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Sep 27 15:59:38 2021 +0200 Make old StatsD metrics style compliant with the rest of the module commit 1bd6b22aae6a356e18f65005fc7e1c162e6f38c6 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Sep 27 14:08:10 2021 +0200 Add StatsD support to graph submodule Time stats of graphs creation and counter of amount of invalidated isochrone frontiers commit 1ad78362fb415ea1d88a1d416da9991896e68d43 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Sep 27 13:53:57 2021 +0200 Add StatsD support to provenance storage implementations commit e2a1843d5ebe01a9cdfe46b6b74dde1e293b8c01 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Sep 27 15:21:42 2021 +0200 Add StatsD support to provenance backend commit 246e55f9b7e3475ea4509e08370827a3190db916 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Sep 27 15:17:34 2021 +0200 Split `Provenance::flush` method in two (one per layer) commit f0210c3753c3a4122ee3c54f7fac97d170a142fa Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Sep 24 11:08:08 2021 +0200 Add `open`/`close` methods to both `ProvenanceInterface` and `ProvenanceStorageInterface` This allows to have an explicit mechanism to allocate/release resources when needed. The necessary methods for the classes implementing these interfaces to be turned in contexts managers are added as well (ie. `__enter__`/`__exit__`). commit 172e327c25883bee768a9c16b850ce6aab7e2eb2 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Wed Sep 15 16:14:10 2021 +0200 Remove remote provenance storage based on `swh.core.api.RPCClient` This implementation was a first attempt for conflict resolution that didn't worked as expected.
Link to build: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/428/
See console output for more information: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/428/console
This should be squashed with the previous diff, and my earlier question about .gitignore still stands.
swh/provenance/tests/conftest.py, line 71: What's the %2f doing here?
IMHO this diff should be squashed into D6165 (it's really part of the work adding the RabbitMQ-based backend).
We actually decided to split them into separate diffs so that the implementation and the test configuration can be reviewed separately.
Also, I don't see the reason for the .gitignore part. Do these tests generate new files (to be ignored) in the git working directory?
Yes, a bunch of temp files appear and might not be deleted if a test fails. There were also some ignores missing from before, like .mypy_cache.
swh/provenance/tests/conftest.py, line 71: The %2f is the virtual host to connect to (it is the URL encoding of /). See https://pika.readthedocs.io/en/stable/examples/using_urlparameters.html
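To illustrate the `%2f` point, here is a small sketch that uses only the standard library (not pika itself) to show how the path of an AMQP URL decodes to the default virtual host `/`. The URL, credentials, and the `virtual_host` helper are hypothetical and chosen for illustration; the actual value on line 71 of `conftest.py` is not reproduced here.

```python
from urllib.parse import unquote, urlsplit


def virtual_host(amqp_url: str) -> str:
    """Return the decoded virtual host named in the path of an AMQP URL."""
    path = urlsplit(amqp_url).path
    # Drop the leading "/" that separates the authority from the vhost,
    # then percent-decode what is left.
    return unquote(path[1:])


# Hypothetical broker URL: "%2f" percent-decodes to "/", the default vhost.
print(virtual_host("amqp://guest:guest@localhost:5672/%2f"))  # prints "/"
```

Without the encoding, a bare `/` after the port would read as an empty path component rather than as a virtual host named `/`, which is why the encoded form appears in the URL.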
Build was aborted
Patch application report for D6339 (id=23300)
Could not rebase; Attempt merge onto 4c087ea0ec...
Updating 4c087ea..377e0ea
Fast-forward
.gitignore | 4 +-
mypy.ini | 3 +
pytest.ini | 4 +
requirements-test.txt | 2 +-
requirements.txt | 3 +-
swh/provenance/__init__.py | 28 +-
swh/provenance/api/client.py | 557 ++++++++++++++++++++-
swh/provenance/api/server.py | 846 +++++++++++++++++++++++++++++---
swh/provenance/cli.py | 97 ++--
swh/provenance/graph.py | 9 +
swh/provenance/interface.py | 47 +-
swh/provenance/mongo/backend.py | 61 ++-
swh/provenance/origin.py | 17 +-
swh/provenance/postgresql/archive.py | 15 +-
swh/provenance/postgresql/provenance.py | 58 ++-
swh/provenance/provenance.py | 165 ++++---
swh/provenance/revision.py | 31 +-
swh/provenance/storage/archive.py | 15 +-
swh/provenance/tests/conftest.py | 81 +--
tox.ini | 3 +-
20 files changed, 1731 insertions(+), 315 deletions(-)
Changes applied before test
commit 377e0ea2e63cb15e0054728c7401d2f6136dea6e Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Sep 21 16:13:53 2021 +0200 Add support for remote backend on existing storage tests commit cd4056be39e8152276cc5d65f39d512021714d84 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Wed Sep 15 13:39:59 2021 +0200 Improve server/client shoutdown logic and error handling Add StatsD support to client to be compliant with the other provenance storage implementations commit 2eaf7200e8a97e42b291313f04cf24ae6c6ce9f2 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Aug 31 13:36:34 2021 +0200 Rework `ProvenanceStorageRabbitMQWorker` to handle connection loss Use `pika.SelectConnection` and make an explicit handle of its life-cycle. Improve connection error handling on both client and server side. Change the RabbitMQ scheme to use 5 exchanges (one per entity + location). Each exchange handles all entity related insertions, dispatching to different queues depending on the requested `ProvenanceStorageInterface` methods (16 queues per methods). For instance, the `content` exchange handles all requests for `content_add` and `relation_add` for both relations `CNT_EARLY_IN_REV` and `CNT_IN_DIR` (ie. relations with content as source). In each case, requests are forwarded to 1 of 16 possible workers, depending on the sha1 id of the content. commit d1913496c81be2fae1eff1cf93a17c8439991706 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Aug 20 12:21:27 2021 +0200 Add new RabbitMQ-based client/server API Get methods in the `ProvenanceStorageInterface` are called through a server that guarantees conflict-free writings to the underlying database. Set methods are called directly from the client to avoid RCP overhead for reads. The server spawns multiple processes to handle independent requests concurrently. 
commit 04ff73ea98f5f239cee6a126c75767f4617e330c Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Sep 27 15:59:38 2021 +0200 Make old StatsD metrics style compliant with the rest of the module commit 1bd6b22aae6a356e18f65005fc7e1c162e6f38c6 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Sep 27 14:08:10 2021 +0200 Add StatsD support to graph submodule Time stats of graphs creation and counter of amount of invalidated isochrone frontiers commit 1ad78362fb415ea1d88a1d416da9991896e68d43 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Sep 27 13:53:57 2021 +0200 Add StatsD support to provenance storage implementations commit e2a1843d5ebe01a9cdfe46b6b74dde1e293b8c01 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Sep 27 15:21:42 2021 +0200 Add StatsD support to provenance backend commit 246e55f9b7e3475ea4509e08370827a3190db916 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Sep 27 15:17:34 2021 +0200 Split `Provenance::flush` method in two (one per layer) commit f0210c3753c3a4122ee3c54f7fac97d170a142fa Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Sep 24 11:08:08 2021 +0200 Add `open`/`close` methods to both `ProvenanceInterface` and `ProvenanceStorageInterface` This allows to have an explicit mechanism to allocate/release resources when needed. The necessary methods for the classes implementing these interfaces to be turned in contexts managers are added as well (ie. `__enter__`/`__exit__`). commit 172e327c25883bee768a9c16b850ce6aab7e2eb2 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Wed Sep 15 16:14:10 2021 +0200 Remove remote provenance storage based on `swh.core.api.RPCClient` This implementation was a first attempt for conflict resolution that didn't worked as expected.
Link to build: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/429/
See console output for more information: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/429/console
Build is green
Patch application report for D6339 (id=23313)
Could not rebase; Attempt merge onto 04ff73ea98...
Updating 04ff73e..f1842b9
Fast-forward
.gitignore | 4 +-
mypy.ini | 3 +
pytest.ini | 2 +
requirements-test.txt | 1 +
requirements.txt | 1 +
swh/provenance/__init__.py | 8 +
swh/provenance/api/client.py | 564 ++++++++++++++++++++++++++++
swh/provenance/api/server.py | 786 ++++++++++++++++++++++++++++++++++++++-
swh/provenance/cli.py | 26 +-
swh/provenance/tests/conftest.py | 24 +-
tox.ini | 3 +-
11 files changed, 1411 insertions(+), 11 deletions(-)
Changes applied before test
commit f1842b95799b6816e760f44086ab166e17c98321 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Sep 21 16:13:53 2021 +0200 Add support for remote backend on existing storage tests commit cd4056be39e8152276cc5d65f39d512021714d84 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Wed Sep 15 13:39:59 2021 +0200 Improve server/client shoutdown logic and error handling Add StatsD support to client to be compliant with the other provenance storage implementations commit 2eaf7200e8a97e42b291313f04cf24ae6c6ce9f2 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Aug 31 13:36:34 2021 +0200 Rework `ProvenanceStorageRabbitMQWorker` to handle connection loss Use `pika.SelectConnection` and make an explicit handle of its life-cycle. Improve connection error handling on both client and server side. Change the RabbitMQ scheme to use 5 exchanges (one per entity + location). Each exchange handles all entity related insertions, dispatching to different queues depending on the requested `ProvenanceStorageInterface` methods (16 queues per methods). For instance, the `content` exchange handles all requests for `content_add` and `relation_add` for both relations `CNT_EARLY_IN_REV` and `CNT_IN_DIR` (ie. relations with content as source). In each case, requests are forwarded to 1 of 16 possible workers, depending on the sha1 id of the content. commit d1913496c81be2fae1eff1cf93a17c8439991706 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Aug 20 12:21:27 2021 +0200 Add new RabbitMQ-based client/server API Get methods in the `ProvenanceStorageInterface` are called through a server that guarantees conflict-free writings to the underlying database. Set methods are called directly from the client to avoid RCP overhead for reads. The server spawns multiple processes to handle independent requests concurrently.
See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/431/ for more details.
Build is green
Patch application report for D6339 (id=23454)
Could not rebase; Attempt merge onto 04ff73ea98...
Updating 04ff73e..e1f8d82
Fast-forward
.gitignore | 4 +-
mypy.ini | 3 +
pytest.ini | 2 +
requirements-test.txt | 1 +
requirements.txt | 1 +
swh/provenance/__init__.py | 8 +
swh/provenance/api/client.py | 582 +++++++++++++++++
swh/provenance/api/server.py | 794 +++++++++++++++++++++++-
swh/provenance/cli.py | 26 +-
swh/provenance/model.py | 6 +-
swh/provenance/provenance.py | 11 +-
swh/provenance/tests/conftest.py | 24 +-
swh/provenance/tests/test_provenance_storage.py | 21 +-
swh/provenance/util.py | 15 +
tox.ini | 3 +-
15 files changed, 1463 insertions(+), 38 deletions(-)
create mode 100644 swh/provenance/util.py
Changes applied before test
commit e1f8d828fadefe4ea28d29972ae8f79300d01d54 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Sep 21 16:13:53 2021 +0200 Add support for remote backend on existing storage tests commit 3f56270a0f912e909312e52f864081bf6720cfce Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Oct 11 16:06:03 2021 +0200 Send several items per message in the remote provenance storage commit 03dc27f6f2eb1d99084fbc3a3f9ecdaa7c9edb27 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Oct 8 14:49:44 2021 +0200 Fix config file parsing for server initilization commit 2759b4977f933be21951ea90fd70f7c16c69aea1 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Oct 8 14:41:42 2021 +0200 Improve routing key computation for paths commit f146fac61a1ac44f489739caba3fe2b2f21de8d3 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Wed Sep 15 13:39:59 2021 +0200 Improve server/client shoutdown logic and error handling Add StatsD support to client to be compliant with the other provenance storage implementations commit 327be11571eef3e44c38705990f9c931661a7591 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Aug 31 13:36:34 2021 +0200 Rework `ProvenanceStorageRabbitMQWorker` to handle connection loss Use `pika.SelectConnection` and make an explicit handle of its life-cycle. Improve connection error handling on both client and server side. Change the RabbitMQ scheme to use 5 exchanges (one per entity + location). Each exchange handles all entity related insertions, dispatching to different queues depending on the requested `ProvenanceStorageInterface` methods (16 queues per methods). For instance, the `content` exchange handles all requests for `content_add` and `relation_add` for both relations `CNT_EARLY_IN_REV` and `CNT_IN_DIR` (ie. relations with content as source). In each case, requests are forwarded to 1 of 16 possible workers, depending on the sha1 id of the content. 
commit bccbf59fcb16d8727d347c3d7b9a623704a80467 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Aug 20 12:21:27 2021 +0200 Add new RabbitMQ-based client/server API Get methods in the `ProvenanceStorageInterface` are called through a server that guarantees conflict-free writings to the underlying database. Set methods are called directly from the client to avoid RCP overhead for reads. The server spawns multiple processes to handle independent requests concurrently. commit 3e87301a2868a7a9aa42403e150a60489f22708e Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Oct 8 14:40:39 2021 +0200 Move path normalization function to `util` submodule commit 2c9ef5673b369f2baa83b11ec9256c6aafc3a855 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Oct 5 12:01:25 2021 +0200 Remove direct dependencies on deprecated `swh.model.identifiers` module
See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/437/ for more details.
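The "1 of 16 possible workers, depending on the sha1 id" dispatch described in the commit message above can be sketched as follows. This is only an illustration, not the code under review: the function name `routing_key`, the key layout, and the choice of the high nibble of the first sha1 byte as the shard are all assumptions.

```python
from typing import Optional

# Assumption: 16 workers per method, as stated in the commit message.
WORKERS_PER_METHOD = 16


def routing_key(method: str, sha1: bytes, relation: Optional[str] = None) -> str:
    """Build a hypothetical routing key such as 'content_add.a'.

    The key layout is an assumption made for illustration only.
    """
    if len(sha1) != 20:
        raise ValueError("expected a 20-byte sha1 id")
    # The high nibble of the first byte takes exactly 16 values,
    # which maps each id to one of the 16 workers.
    shard = sha1[0] >> 4
    name = method if relation is None else f"{method}_{relation}"
    return f"{name}.{shard:x}"
```

Sharding on the id means every request touching a given content lands on the same worker, which is one way to obtain the conflict-free writes the commit message mentions.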
Build is green
Patch application report for D6339 (id=23518)
Could not rebase; Attempt merge onto 3e87301a28...
Updating 3e87301..bdb9b3c Fast-forward .gitignore | 4 +- mypy.ini | 3 + pytest.ini | 2 + requirements-test.txt | 1 + requirements.txt | 1 + swh/provenance/__init__.py | 8 + swh/provenance/api/client.py | 582 ++++++++++++++++++++++++++++ swh/provenance/api/server.py | 794 ++++++++++++++++++++++++++++++++++++++- swh/provenance/cli.py | 26 +- swh/provenance/sql/30-schema.sql | 20 +- swh/provenance/sql/40-funcs.sql | 50 +-- swh/provenance/tests/conftest.py | 24 +- swh/provenance/util.py | 5 + tox.ini | 3 +- 14 files changed, 1477 insertions(+), 46 deletions(-)
Changes applied before test
commit bdb9b3cece0a69c7ab7e3acc0bcffdef884c54cb Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Sep 21 16:13:53 2021 +0200 Add support for remote backend on existing storage tests commit b5a1d8414f313af6a59211afa66816228c14172c Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Oct 11 16:06:03 2021 +0200 Send several items per message in the remote provenance storage commit 5eb1fc59d3d34890f66f43314217412310fd919a Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Oct 8 14:49:44 2021 +0200 Fix config file parsing for server initialization commit ce649ddb6db25dc3e68f128a7ae6174b8b31e8a0 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Oct 8 14:41:42 2021 +0200 Improve routing key computation for paths commit 87108b1ea2ceb9909d4bbec6012004acdf96c08a Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Wed Sep 15 13:39:59 2021 +0200 Improve server/client shutdown logic and error handling Add StatsD support to client to be compliant with the other provenance storage implementations commit e6588cb24368b5edbc8bfdb82b5a4ba9b690d444 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Aug 31 13:36:34 2021 +0200 Rework `ProvenanceStorageRabbitMQWorker` to handle connection loss Use `pika.SelectConnection` and handle its life-cycle explicitly. Improve connection error handling on both client and server side. Change the RabbitMQ scheme to use 5 exchanges (one per entity + location). Each exchange handles all entity-related insertions, dispatching to different queues depending on the requested `ProvenanceStorageInterface` methods (16 queues per method). For instance, the `content` exchange handles all requests for `content_add` and `relation_add` for both relations `CNT_EARLY_IN_REV` and `CNT_IN_DIR` (i.e. relations with content as source). In each case, requests are forwarded to 1 of 16 possible workers, depending on the sha1 id of the content.
commit 5a85b6c17f172315e784724158b7e08b6bdf9c61 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Aug 20 12:21:27 2021 +0200 Add new RabbitMQ-based client/server API Write methods in the `ProvenanceStorageInterface` are called through a server that guarantees conflict-free writes to the underlying database. Read methods are called directly from the client to avoid RPC overhead for reads. The server spawns multiple processes to handle independent requests concurrently. commit 37da3774d8dc34365b7b1cbed469d970c51ecc58 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Thu Oct 14 12:03:47 2021 +0200 Improve PostgreSQL storage scheme for the `with-path-denormalized` flavor Previous version was storing arrays of strings representing tuples for the denormalized relations (`dst` and `loc` of the relation resp.). While that simplified the check for duplicates, it turned out to be very inefficient in terms of disk usage. The new version has two distinct lists of `bigint` (i.e. internal ids) for `dst` and `loc` resp. To check for duplicates the lists should be zipped, and repeated tuples filtered.
See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/445/ for more details.
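The duplicate check described for the `with-path-denormalized` flavor (zip the `dst` and `loc` lists, then drop repeated tuples) can be illustrated in Python. The actual change lives in the SQL files of the diff; the helper below, `dedup_pairs`, is a hypothetical name and only mirrors the idea on in-memory lists.

```python
from typing import List, Set, Tuple


def dedup_pairs(dst: List[int], loc: List[int]) -> Tuple[List[int], List[int]]:
    """Zip two parallel id lists and keep the first occurrence of each pair."""
    if len(dst) != len(loc):
        raise ValueError("dst and loc must have the same length")
    seen: Set[Tuple[int, int]] = set()
    out_dst: List[int] = []
    out_loc: List[int] = []
    for pair in zip(dst, loc):
        if pair not in seen:
            seen.add(pair)
            out_dst.append(pair[0])
            out_loc.append(pair[1])
    return out_dst, out_loc
```

Note that a pair is a duplicate only when both ids match, so the same `dst` may legitimately appear with several `loc` values.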
Build is green
Patch application report for D6339 (id=23636)
Could not rebase; Attempt merge onto 3e87301a28...
Updating 3e87301..ccd546c Fast-forward .gitignore | 4 +- mypy.ini | 3 + pytest.ini | 2 + requirements-test.txt | 1 + requirements.txt | 1 + swh/provenance/__init__.py | 8 + swh/provenance/api/client.py | 588 +++++++++++++++++++++++ swh/provenance/api/server.py | 808 +++++++++++++++++++++++++++++++- swh/provenance/cli.py | 31 +- swh/provenance/graph.py | 2 +- swh/provenance/postgresql/provenance.py | 29 +- swh/provenance/provenance.py | 63 +++ swh/provenance/sql/30-schema.sql | 20 +- swh/provenance/sql/40-funcs.sql | 50 +- swh/provenance/tests/conftest.py | 24 +- swh/provenance/util.py | 5 + tox.ini | 3 +- 17 files changed, 1588 insertions(+), 54 deletions(-)
Changes applied before test
commit ccd546c04e85ea9c9952b350623da1db3297be58 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Sep 21 16:13:53 2021 +0200 Add support for remote backend on existing storage tests commit effb5b099a9e6928da42cdb491db532d6a75e988 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Oct 18 11:52:04 2021 +0200 Export batch size and prefetch count as parameters for remote storage commit 6e11a8e528850ae05375243469ed74ab9e0956ee Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Oct 11 16:06:03 2021 +0200 Send several items per message in the remote provenance storage commit d18e8bd4e9ca7814c9cdbfa7a21c155a3d7d3a08 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Oct 8 14:49:44 2021 +0200 Fix config file parsing for server initialization commit 018ab5106e5ac9be36830c8242f1d94c229b878a Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Oct 8 14:41:42 2021 +0200 Improve routing key computation for paths commit 647d0ae75b85043b0c2ef0f528be9f6891c91ce9 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Wed Sep 15 13:39:59 2021 +0200 Improve server/client shutdown logic and error handling Add StatsD support to client to be compliant with the other provenance storage implementations commit cdddb2d573763ab005c7d3c754c5d85a263220e9 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Aug 31 13:36:34 2021 +0200 Rework `ProvenanceStorageRabbitMQWorker` to handle connection loss Use `pika.SelectConnection` and handle its life-cycle explicitly. Improve connection error handling on both client and server side. Change the RabbitMQ scheme to use 5 exchanges (one per entity + location). Each exchange handles all entity-related insertions, dispatching to different queues depending on the requested `ProvenanceStorageInterface` methods (16 queues per method).
For instance, the `content` exchange handles all requests for `content_add` and `relation_add` for both relations `CNT_EARLY_IN_REV` and `CNT_IN_DIR` (i.e. relations with content as source). In each case, requests are forwarded to 1 of 16 possible workers, depending on the sha1 id of the content. commit 35f03480581d52d1b7b705d0b974151fa49ba546 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Aug 20 12:21:27 2021 +0200 Add new RabbitMQ-based client/server API Write methods in the `ProvenanceStorageInterface` are called through a server that guarantees conflict-free writes to the underlying database. Read methods are called directly from the client to avoid RPC overhead for reads. The server spawns multiple processes to handle independent requests concurrently. commit 8168ab4fc3f0fc3556623dd3de854f222ffe5d7e Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Thu Oct 14 12:03:47 2021 +0200 Improve PostgreSQL storage scheme for the `with-path-denormalized` flavor Previous version was storing arrays of strings representing tuples for the denormalized relations (`dst` and `loc` of the relation resp.). While that simplified the check for duplicates, it turned out to be very inefficient in terms of disk usage. The new version has two distinct lists of `bigint` (i.e. internal ids) for `dst` and `loc` resp. To check for duplicates the lists should be zipped, and repeated tuples filtered. commit c7ae90e08b39919da9d67ad3436a71d47a6ad5e7 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Oct 18 12:10:10 2021 +0200 Add metrics on retries when flushing cache on the provenance backend commit bfea53a97c588aa85ddd2ea93fa3dcf17b34a6a4 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Oct 19 16:12:23 2021 +0200 Export page size as a parameter for postgresql storage
See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/460/ for more details.
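The commits "Send several items per message" and "Export batch size and prefetch count as parameters" suggest that items are grouped before being published. A minimal sketch of such grouping, assuming a plain fixed-size chunking (the helper name `batched` and its behaviour are illustrative, not taken from the diff):

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Each yielded list would then be serialized as the body of one message, so a larger batch size trades fewer messages for larger payloads.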
Build is green
Patch application report for D6339 (id=23661)
Could not rebase; Attempt merge onto ef49e3100c...
Updating ef49e31..9004970 Fast-forward .gitignore | 4 +- mypy.ini | 3 + pytest.ini | 2 + requirements-test.txt | 1 + requirements.txt | 1 + swh/provenance/__init__.py | 8 + swh/provenance/api/client.py | 588 ++++++++++++++++++++++++++++ swh/provenance/api/server.py | 808 ++++++++++++++++++++++++++++++++++++++- swh/provenance/cli.py | 31 +- swh/provenance/sql/30-schema.sql | 20 +- swh/provenance/sql/40-funcs.sql | 50 ++- swh/provenance/tests/conftest.py | 24 +- swh/provenance/util.py | 5 + tox.ini | 3 +- 14 files changed, 1501 insertions(+), 47 deletions(-)
Changes applied before test
commit 9004970f0c78901871f87d4091508387115ff4d3 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Sep 21 16:13:53 2021 +0200 Add support for remote backend on existing storage tests commit b2803436de6bc67c6d0dc01b5624ebba18689ca2 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Oct 18 11:52:04 2021 +0200 Export batch size and prefetch count as parameters for remote storage commit 3fee8fbfbdaac3ed1adb5003adb32c52c99c8d37 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Oct 11 16:06:03 2021 +0200 Send several items per message in the remote provenance storage commit d8c0aae0093f700f5e362d60fe8cf6b51f374fa2 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Oct 8 14:49:44 2021 +0200 Fix config file parsing for server initialization commit 8e0d98fb67ace71342867b68c0703b551e47e7f0 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Oct 8 14:41:42 2021 +0200 Improve routing key computation for paths commit 1e66d6940749e33ff4442dfe8c2495567b6c50a5 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Wed Sep 15 13:39:59 2021 +0200 Improve server/client shutdown logic and error handling Add StatsD support to client to be compliant with the other provenance storage implementations commit 5e8c8a54cdac94a2a41662468f33893df97e7c6b Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Aug 31 13:36:34 2021 +0200 Rework `ProvenanceStorageRabbitMQWorker` to handle connection loss Use `pika.SelectConnection` and handle its life-cycle explicitly. Improve connection error handling on both client and server side. Change the RabbitMQ scheme to use 5 exchanges (one per entity + location). Each exchange handles all entity-related insertions, dispatching to different queues depending on the requested `ProvenanceStorageInterface` methods (16 queues per method).
For instance, the `content` exchange handles all requests for `content_add` and `relation_add` for both relations `CNT_EARLY_IN_REV` and `CNT_IN_DIR` (i.e. relations with content as source). In each case, requests are forwarded to 1 of 16 possible workers, depending on the sha1 id of the content. commit 2a5fe87fbd76aa30e44ea7703a85d5a4b70e574c Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Aug 20 12:21:27 2021 +0200 Add new RabbitMQ-based client/server API Write methods in the `ProvenanceStorageInterface` are called through a server that guarantees conflict-free writes to the underlying database. Read methods are called directly from the client to avoid RPC overhead for reads. The server spawns multiple processes to handle independent requests concurrently. commit 62884e23dd1164274fd89a09acedae8977a8e0f3 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Thu Oct 14 12:03:47 2021 +0200 Improve PostgreSQL storage scheme for the `with-path-denormalized` flavor Previous version was storing arrays of strings representing tuples for the denormalized relations (`dst` and `loc` of the relation resp.). While that simplified the check for duplicates, it turned out to be very inefficient in terms of disk usage. The new version has two distinct lists of `bigint` (i.e. internal ids) for `dst` and `loc` resp. To check for duplicates the lists should be zipped, and repeated tuples filtered.
See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/465/ for more details.
Build is green
Patch application report for D6339 (id=23907)
Could not rebase; Attempt merge onto ef49e3100c...
Updating ef49e31..9a1a616 Fast-forward .gitignore | 4 +- mypy.ini | 3 + pytest.ini | 2 + requirements-test.txt | 1 + requirements.txt | 1 + swh/provenance/__init__.py | 8 + swh/provenance/api/client.py | 597 +++++++++++++++++++++ swh/provenance/api/server.py | 808 ++++++++++++++++++++++++++++- swh/provenance/cli.py | 31 +- swh/provenance/sql/30-schema.sql | 20 +- swh/provenance/sql/40-funcs.sql | 50 +- swh/provenance/tests/conftest.py | 24 +- swh/provenance/tests/data/generate_repo.py | 2 +- swh/provenance/util.py | 5 + tox.ini | 3 +- 15 files changed, 1511 insertions(+), 48 deletions(-)
Changes applied before test
commit 9a1a6169375b3591b81162dc0bce7f9c3d735e6c Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Sep 21 16:13:53 2021 +0200 Add support for remote backend on existing storage tests commit 9358df82cc7255340caadaa13ae3b53fbe5e1cc7 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Thu Oct 28 13:59:00 2021 +0200 Improve timeout logic on remote storage client side commit aa8dc0ea8f67748e53076f2143ba2f6dad150498 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Oct 18 11:52:04 2021 +0200 Export batch size and prefetch count as parameters for remote storage commit a9bc8845740f18bcf4befe9c521c2b1b8c4fd769 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Oct 11 16:06:03 2021 +0200 Send several items per message in the remote provenance storage commit fa5c6b763913bef84a128d152cb25f081edf399d Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Oct 8 14:49:44 2021 +0200 Fix config file parsing for server initialization commit eaf8ad8026de592629d8c9286cf19db2690acfa0 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Oct 8 14:41:42 2021 +0200 Improve routing key computation for paths commit 4243290997d281ece591c711e6748de341599e2d Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Wed Sep 15 13:39:59 2021 +0200 Improve server/client shutdown logic and error handling Add StatsD support to client to be compliant with the other provenance storage implementations commit df083f60f1eeeb9257992a639c9c1a9937ce62f4 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Aug 31 13:36:34 2021 +0200 Rework `ProvenanceStorageRabbitMQWorker` to handle connection loss Use `pika.SelectConnection` and handle its life-cycle explicitly. Improve connection error handling on both client and server side. Change the RabbitMQ scheme to use 5 exchanges (one per entity + location).
Each exchange handles all entity-related insertions, dispatching to different queues depending on the requested `ProvenanceStorageInterface` methods (16 queues per method). For instance, the `content` exchange handles all requests for `content_add` and `relation_add` for both relations `CNT_EARLY_IN_REV` and `CNT_IN_DIR` (i.e. relations with content as source). In each case, requests are forwarded to 1 of 16 possible workers, depending on the sha1 id of the content. commit 69596d600a120c13d0cd2ed0d4e48584e8b9dc7c Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Aug 20 12:21:27 2021 +0200 Add new RabbitMQ-based client/server API Write methods in the `ProvenanceStorageInterface` are called through a server that guarantees conflict-free writes to the underlying database. Read methods are called directly from the client to avoid RPC overhead for reads. The server spawns multiple processes to handle independent requests concurrently. commit 743b5954068fcc98203d9d254c53c076856e3426 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Thu Oct 14 12:03:47 2021 +0200 Improve PostgreSQL storage scheme for the `with-path-denormalized` flavor Previous version was storing arrays of strings representing tuples for the denormalized relations (`dst` and `loc` of the relation resp.). While that simplified the check for duplicates, it turned out to be very inefficient in terms of disk usage. The new version has two distinct lists of `bigint` (i.e. internal ids) for `dst` and `loc` resp. To check for duplicates the lists should be zipped, and repeated tuples filtered. commit 30d8899bcfd60019b84064eba6916af0b2b5173e Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Thu Oct 28 13:58:32 2021 +0200 Fix `yaml.load` deprecation warning
See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/475/ for more details.
Build is green
Patch application report for D6339 (id=24268)
Could not rebase; Attempt merge onto 94baaab052...
Updating 94baaab..584845d Fast-forward swh/provenance/archive.py | 2 +- swh/provenance/cli.py | 4 +- swh/provenance/graph.py | 3 +- swh/provenance/model.py | 4 +- swh/provenance/postgresql/archive.py | 15 +++---- swh/provenance/provenance.py | 77 +++++++++++++----------------------- swh/provenance/revision.py | 12 ++++-- swh/provenance/storage/archive.py | 16 ++++---- swh/provenance/tests/conftest.py | 34 +++++++++------- 9 files changed, 81 insertions(+), 86 deletions(-)
Changes applied before test
commit 584845d3715ea6c536e7cf5f697cac628032416f Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Thu Oct 28 14:21:52 2021 +0200 Add support to filter files by a minimum size The idea is to be able to filter files that are not meaningful from the provenance point of view. For instance, the empty file. This modification allows defining a minimum size for files to be considered for the provenance index. commit 966fe3e8d506ce8b4fddf6e9ad29db4dae9943ab Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Nov 23 16:11:09 2021 +0100 Reorder flushing operations to avoid unnecessary updates in the storage commit 62a31f6f986bb38ced99331ab66eb0717600ea5b Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Wed Nov 24 11:10:40 2021 +0100 Rework conftest and improve type annotations
See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/486/ for more details.
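The minimum-size filter described in the commit message above might look like the following sketch. The `FileEntry` type, the `filter_by_min_size` helper and the inclusive comparison (`size >= min_size`) are assumptions for illustration; the diff itself may model files and the threshold differently.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical stand-in for a file seen while walking a directory."""

    name: bytes
    size: int


def filter_by_min_size(
    files: Iterable[FileEntry], min_size: int = 0
) -> Iterator[FileEntry]:
    """Keep only files whose size is at least min_size bytes."""
    return (entry for entry in files if entry.size >= min_size)
```

With `min_size=1` the empty file mentioned in the commit message is skipped, while the default of 0 keeps every file.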
Build is green
Patch application report for D6339 (id=24272)
Could not rebase; Attempt merge onto 94baaab052...
Updating 94baaab..35a7b75 Fast-forward .gitignore | 4 +- mypy.ini | 3 + pytest.ini | 2 + requirements-test.txt | 1 + requirements.txt | 1 + swh/provenance/__init__.py | 8 + swh/provenance/api/client.py | 597 ++++++++++++++++++++++++++ swh/provenance/api/server.py | 808 ++++++++++++++++++++++++++++++++++- swh/provenance/archive.py | 2 +- swh/provenance/cli.py | 35 +- swh/provenance/graph.py | 3 +- swh/provenance/model.py | 4 +- swh/provenance/postgresql/archive.py | 15 +- swh/provenance/provenance.py | 77 ++-- swh/provenance/revision.py | 12 +- swh/provenance/sql/30-schema.sql | 20 +- swh/provenance/sql/40-funcs.sql | 50 ++- swh/provenance/storage/archive.py | 16 +- swh/provenance/tests/conftest.py | 58 ++- swh/provenance/util.py | 5 + tox.ini | 3 +- 21 files changed, 1591 insertions(+), 133 deletions(-)
Changes applied before test
commit 35a7b7595642453cfd9cf50e7d45e1f8b3959380 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Sep 21 16:13:53 2021 +0200 Add support for remote backend on existing storage tests commit 81ccb6a310249e96e6393fd183dd36af66421083 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Thu Oct 28 13:59:00 2021 +0200 Improve timeout logic on remote storage client side commit bdfc2c23c18a3db5b32edc53172797b510442c7c Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Oct 18 11:52:04 2021 +0200 Export batch size and prefetch count as parameters for remote storage commit 785c156f7148bf20c0b1606736a4d9b99f701d7e Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Oct 11 16:06:03 2021 +0200 Send several items per message in the remote provenance storage commit ea93e1f94cff82372d1236158e8c4c36ff2747cd Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Oct 8 14:49:44 2021 +0200 Fix config file parsing for server initialization commit aa9f923259352d59a4825b17196c1f9df9ae4c9d Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Oct 8 14:41:42 2021 +0200 Improve routing key computation for paths commit f5a548512b4d5f602bd5dbea5d66705325ef3da1 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Wed Sep 15 13:39:59 2021 +0200 Improve server/client shutdown logic and error handling Add StatsD support to client to be compliant with the other provenance storage implementations commit 9f765cf93dd47d92dfd170fc14aacc69aa102a8a Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Aug 31 13:36:34 2021 +0200 Rework `ProvenanceStorageRabbitMQWorker` to handle connection loss Use `pika.SelectConnection` and handle its life-cycle explicitly. Improve connection error handling on both client and server side. Change the RabbitMQ scheme to use 5 exchanges (one per entity + location).
Each exchange handles all entity-related insertions, dispatching to different queues depending on the requested `ProvenanceStorageInterface` methods (16 queues per method). For instance, the `content` exchange handles all requests for `content_add` and `relation_add` for both relations `CNT_EARLY_IN_REV` and `CNT_IN_DIR` (i.e. relations with content as source). In each case, requests are forwarded to 1 of 16 possible workers, depending on the sha1 id of the content. commit 0eacff7a66a5a117b6d81d97d1a554d6cab4920c Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Aug 20 12:21:27 2021 +0200 Add new RabbitMQ-based client/server API Write methods in the `ProvenanceStorageInterface` are called through a server that guarantees conflict-free writes to the underlying database. Read methods are called directly from the client to avoid RPC overhead for reads. The server spawns multiple processes to handle independent requests concurrently. commit 579c3bd35e5668ad9ef5fea58c20d5c66e5699f2 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Thu Oct 14 12:03:47 2021 +0200 Improve PostgreSQL storage scheme for the `with-path-denormalized` flavor Previous version was storing arrays of strings representing tuples for the denormalized relations (`dst` and `loc` of the relation resp.). While that simplified the check for duplicates, it turned out to be very inefficient in terms of disk usage. The new version has two distinct lists of `bigint` (i.e. internal ids) for `dst` and `loc` resp. To check for duplicates the lists should be zipped, and repeated tuples filtered. commit 584845d3715ea6c536e7cf5f697cac628032416f Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Thu Oct 28 14:21:52 2021 +0200 Add support to filter files by a minimum size The idea is to be able to filter files that are not meaningful from the provenance point of view. For instance, the empty file.
This modification allows defining a minimum size for files to be considered for the provenance index. commit 966fe3e8d506ce8b4fddf6e9ad29db4dae9943ab Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Nov 23 16:11:09 2021 +0100 Reorder flushing operations to avoid unnecessary updates in the storage commit 62a31f6f986bb38ced99331ab66eb0717600ea5b Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Wed Nov 24 11:10:40 2021 +0100 Rework conftest and improve type annotations
See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/490/ for more details.
Build is green
Patch application report for D6339 (id=24324)
Could not rebase; Attempt merge onto 579c3bd35e...
Updating 579c3bd..67e76c8 Fast-forward .gitignore | 4 +- docs/storage/remote.rst | 340 +++++++++++++++++++++ mypy.ini | 3 + pytest.ini | 2 + requirements-test.txt | 1 + requirements.txt | 1 + swh/provenance/__init__.py | 8 + swh/provenance/api/client.py | 463 ++++++++++++++++++++++++++++ swh/provenance/api/server.py | 646 ++++++++++++++++++++++++++++++++++++++- swh/provenance/cli.py | 31 +- swh/provenance/tests/conftest.py | 24 +- swh/provenance/util.py | 5 + tox.ini | 3 +- 13 files changed, 1517 insertions(+), 14 deletions(-) create mode 100644 docs/storage/remote.rst
Changes applied before test
commit 67e76c86b7174e3167a888284a4c933c10f4f3d9 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Sep 21 16:13:53 2021 +0200 Add support for remote backend on existing storage tests commit b6db561c102db1536fc956a4e5be5efbbd372e08 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Nov 26 16:21:19 2021 +0100 Add documentation for the remote storage backend Clean up code commit 81ccb6a310249e96e6393fd183dd36af66421083 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Thu Oct 28 13:59:00 2021 +0200 Improve timeout logic on remote storage client side commit bdfc2c23c18a3db5b32edc53172797b510442c7c Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Oct 18 11:52:04 2021 +0200 Export batch size and prefetch count as parameters for remote storage commit 785c156f7148bf20c0b1606736a4d9b99f701d7e Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Mon Oct 11 16:06:03 2021 +0200 Send several items per message in the remote provenance storage commit ea93e1f94cff82372d1236158e8c4c36ff2747cd Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Oct 8 14:49:44 2021 +0200 Fix config file parsing for server initialization commit aa9f923259352d59a4825b17196c1f9df9ae4c9d Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Oct 8 14:41:42 2021 +0200 Improve routing key computation for paths commit f5a548512b4d5f602bd5dbea5d66705325ef3da1 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Wed Sep 15 13:39:59 2021 +0200 Improve server/client shutdown logic and error handling Add StatsD support to client to be compliant with the other provenance storage implementations commit 9f765cf93dd47d92dfd170fc14aacc69aa102a8a Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Aug 31 13:36:34 2021 +0200 Rework `ProvenanceStorageRabbitMQWorker` to handle connection loss Use `pika.SelectConnection` and handle its life-cycle explicitly.
Improve connection error handling on both client and server side. Change the RabbitMQ scheme to use 5 exchanges (one per entity + location). Each exchange handles all entity-related insertions, dispatching to different queues depending on the requested `ProvenanceStorageInterface` methods (16 queues per method). For instance, the `content` exchange handles all requests for `content_add` and `relation_add` for both relations `CNT_EARLY_IN_REV` and `CNT_IN_DIR` (i.e. relations with content as source). In each case, requests are forwarded to 1 of 16 possible workers, depending on the sha1 id of the content. commit 0eacff7a66a5a117b6d81d97d1a554d6cab4920c Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Aug 20 12:21:27 2021 +0200 Add new RabbitMQ-based client/server API Write methods in the `ProvenanceStorageInterface` are called through a server that guarantees conflict-free writes to the underlying database. Read methods are called directly from the client to avoid RPC overhead for reads. The server spawns multiple processes to handle independent requests concurrently.
See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/493/ for more details.
Build is green
Patch application report for D6339 (id=24327)
Could not rebase; Attempt merge onto 579c3bd35e...
Updating 579c3bd..da3e59b Fast-forward .gitignore | 4 +- docs/storage/remote.rst | 340 +++++++++++++++++++++ mypy.ini | 3 + pytest.ini | 2 + requirements-test.txt | 1 + requirements.txt | 1 + swh/provenance/__init__.py | 8 + swh/provenance/api/client.py | 463 ++++++++++++++++++++++++++++ swh/provenance/api/server.py | 646 ++++++++++++++++++++++++++++++++++++++- swh/provenance/cli.py | 31 +- swh/provenance/tests/conftest.py | 24 +- swh/provenance/util.py | 5 + tox.ini | 3 +- 13 files changed, 1517 insertions(+), 14 deletions(-) create mode 100644 docs/storage/remote.rst
Changes applied before test
commit da3e59b92c7964dde4d4afe047a42d466caa9893 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Sep 21 16:13:53 2021 +0200 Add support for remote backend on existing storage tests commit ecaaeb2e57762f109ddb6f3a6e8529cd48c86085 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Aug 20 12:21:27 2021 +0200 Add new RabbitMQ-based client/server API Write methods in the `ProvenanceStorageInterface` are called through a server that guarantees conflict-free writes to the underlying database. Read methods are called directly from the client to avoid RPC overhead for reads. The server spawns multiple sub-processes to handle independent requests concurrently.
See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/495/ for more details.
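The read/write split stated in the commit message (writes go through the server, reads go straight to the database) can be pictured with a toy client. Everything here is hypothetical: the class `SplitClient`, the dict standing in for the database, and the `publish` callback standing in for the RabbitMQ publisher are illustrative only.

```python
from typing import Callable, Dict, Iterable, List, Tuple


class SplitClient:
    """Toy client: reads hit the backend directly, writes are handed to a publisher."""

    def __init__(
        self,
        backend: Dict[bytes, int],
        publish: Callable[[str, List[Tuple[bytes, int]]], None],
    ) -> None:
        self._backend = backend  # stands in for a direct database connection
        self._publish = publish  # stands in for sending a message to the server

    def content_get(self, ids: Iterable[bytes]) -> Dict[bytes, int]:
        # Read path: no message round trip, so no RPC overhead.
        return {i: self._backend[i] for i in ids if i in self._backend}

    def content_add(self, items: Iterable[Tuple[bytes, int]]) -> None:
        # Write path: delegated to the server, which serializes conflicting writes.
        self._publish("content_add", list(items))
```

In this sketch a write is not visible to `content_get` until the server side has applied it, which is the usual trade-off of routing writes through a queue.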
Build is green
Patch application report for D6339 (id=24330)
Could not rebase; Attempt merge onto 579c3bd35e...
Updating 579c3bd..5f1d1b1 Fast-forward .gitignore | 4 +- docs/storage/remote.rst | 720 +++++++++++++++++++++++++++++++++++++++ mypy.ini | 3 + pytest.ini | 2 + requirements-test.txt | 1 + requirements.txt | 1 + swh/provenance/__init__.py | 8 + swh/provenance/api/client.py | 463 +++++++++++++++++++++++++ swh/provenance/api/server.py | 646 ++++++++++++++++++++++++++++++++++- swh/provenance/cli.py | 31 +- swh/provenance/tests/conftest.py | 24 +- swh/provenance/util.py | 5 + tox.ini | 3 +- 13 files changed, 1897 insertions(+), 14 deletions(-) create mode 100644 docs/storage/remote.rst
Changes applied before test
commit 5f1d1b13504ee534f3726a8002d7d3a4609ac1c3 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Sep 21 16:13:53 2021 +0200 Add support for remote backend on existing storage tests commit 84a62793d5de0b51aa7706b04a773366f5411391 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Aug 20 12:21:27 2021 +0200 Add new RabbitMQ-based client/server API Write methods in the `ProvenanceStorageInterface` are called through a server that guarantees conflict-free writes to the underlying database. Read methods are called directly from the client to avoid RPC overhead for reads. The server spawns multiple sub-processes to handle independent requests concurrently.
See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/497/ for more details.
Build is green
Patch application report for D6339 (id=24359)
Could not rebase; Attempt merge onto 579c3bd35e...
Updating 579c3bd..4eecec5 Fast-forward .gitignore | 4 +- docs/storage/remote.rst | 720 +++++++++++++++++++++++++++++++++++++++ mypy.ini | 3 + pytest.ini | 2 + requirements-test.txt | 1 + requirements.txt | 1 + swh/provenance/__init__.py | 8 + swh/provenance/api/client.py | 466 +++++++++++++++++++++++++ swh/provenance/api/server.py | 666 +++++++++++++++++++++++++++++++++++- swh/provenance/cli.py | 31 +- swh/provenance/tests/conftest.py | 24 +- swh/provenance/util.py | 5 + tox.ini | 3 +- 13 files changed, 1920 insertions(+), 14 deletions(-) create mode 100644 docs/storage/remote.rst
Changes applied before test
commit 4eecec5edb9139e945912ff39c7d0f412d3c9977 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Sep 21 16:13:53 2021 +0200 Add support for remote backend on existing storage tests commit f8411a522e937b961d7fc8270c653d83039268fd Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Aug 20 12:21:27 2021 +0200 Add new RabbitMQ-based client/server API Write methods in the `ProvenanceStorageInterface` are called through a server that guarantees conflict-free writes to the underlying database. Read methods are called directly from the client to avoid RPC overhead for reads. The server spawns multiple sub-processes to handle independent requests concurrently.
See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/500/ for more details.
Build is green
Patch application report for D6339 (id=24367)
Could not rebase; Attempt merge onto 579c3bd35e...
Updating 579c3bd..e649205 Fast-forward .gitignore | 4 +- docs/storage/remote.rst | 724 +++++++++++++++++++++++++++++++++++++++ mypy.ini | 3 + pytest.ini | 2 + requirements-test.txt | 1 + requirements.txt | 1 + swh/provenance/__init__.py | 8 + swh/provenance/api/client.py | 466 +++++++++++++++++++++++++ swh/provenance/api/server.py | 666 ++++++++++++++++++++++++++++++++++- swh/provenance/cli.py | 31 +- swh/provenance/tests/conftest.py | 24 +- swh/provenance/util.py | 5 + tox.ini | 38 +- 13 files changed, 1959 insertions(+), 14 deletions(-) create mode 100644 docs/storage/remote.rst
Changes applied before test
commit e649205e258980cafbeb1db7d18f9c6efb1a8e76 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Tue Sep 21 16:13:53 2021 +0200 Add support for remote backend on existing storage tests commit a6cc3e4daf228ce9c124712b93c4749b16e65ce1 Author: Andres Ezequiel Viso <aeviso@softwareheritage.org> Date: Fri Aug 20 12:21:27 2021 +0200 Add new RabbitMQ-based client/server API Write methods in the `ProvenanceStorageInterface` are called through a server that guarantees conflict-free writes to the underlying database. Read methods are called directly from the client to avoid RPC overhead for reads. The server spawns multiple sub-processes to handle independent requests concurrently.
See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/505/ for more details.