This is the accumulation of changes that have been tested in production
on mmca and met over the past weeks. A lot of this has been pair-programmed, and
the tests still pass, so we're probably in good shape.
git log origin/master.. says:
```
commit 8f476d494b4aeab6e0cd6a7adb5f2bce095e8c60
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 13:37:12 2022 +0200

    swhgraph: handle empty responses

    When the visit_edges response is empty, swh.graph.client generates an
    empty tuple, which can't be unpacked. Work around the issue.

 swh/provenance/swhgraph/archive.py | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

commit edf00f88894fb9cf407017944dc5cd751b012357
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 13:36:39 2022 +0200

    Use proper signatures in journal_client

    We're always passing the provenance-internal object types, not those of
    swh.storage.

 swh/provenance/journal_client.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

commit 08de80b680bdf008f9a1f45805f2d54a7a397549
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 11:26:01 2022 +0200

    origin layer: retrieve multiple levels of revision history at once

    Replace `revision_get_parents` with `revision_get_some_outbound_edges`,
    which can optionally retrieve more levels of history than just a single
    one. This allows us to do way fewer queries on the swh.graph or
    swh.storage backend if the revision exists there.

    The swh.storage backend does limited recursion, so we still process the
    origin in multiple steps to fetch the whole history.

 swh/provenance/archive.py                      | 15 +++++----
 swh/provenance/graph.py                        | 43 +++++++++++++-------------
 swh/provenance/interface.py                    | 10 +++---
 swh/provenance/journal_client.py               |  1 -
 swh/provenance/model.py                        | 20 +-----------
 swh/provenance/multiplexer/archive.py          | 28 ++++++++++-------
 swh/provenance/origin.py                       | 26 ++++++----------
 swh/provenance/postgresql/archive.py           | 27 +++++++++-------
 swh/provenance/provenance.py                   | 28 ++++++++---------
 swh/provenance/storage/archive.py              | 12 ++++---
 swh/provenance/swhgraph/archive.py             | 23 ++++++++------
 swh/provenance/tests/test_archive_interface.py | 29 ++++++++++-------
 12 files changed, 130 insertions(+), 132 deletions(-)

commit 68e1907e7f37863d732edcb6211be893df94b9c7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 11:03:12 2022 +0200

    Appease pyright by ensuring target_type is bound

 swh/provenance/tests/test_archive_interface.py | 2 ++
 1 file changed, 2 insertions(+)

commit d935abf431df5105fec8422e87eb5ee47d3c177a
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 11:01:37 2022 +0200

    Rename origin.proceed_origin to origin.process_origin

 swh/provenance/origin.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

commit 2ac46f58346f7c3763f1263109885fea6797e155
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Wed Aug 3 18:21:32 2022 +0200

    multiplexer: add endpoint counts per backend

 swh/provenance/__init__.py                     |  6 ++-
 swh/provenance/multiplexer/archive.py          | 61 +++++++++++++++++++-------
 swh/provenance/tests/test_archive_interface.py |  4 +-
 swh/provenance/tests/test_init.py              |  6 ++-
 4 files changed, 57 insertions(+), 20 deletions(-)

commit 8d323c322df2bf9a429a1329de6c87636927df19
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 17:46:20 2022 +0200

    journal client: only use the provenance context manager once

    The context manager for the provenance storage rabbitmq client doesn't
    like being used multiple times over the lifetime of a process.

    Only use it once in the cli of the journal client.

 swh/provenance/cli.py            | 6 ++++--
 swh/provenance/journal_client.py | 6 ++----
 2 files changed, 6 insertions(+), 6 deletions(-)

commit f5f8555f8e3d8c72a5d51f4a10d0b761e74c97fe
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 17:45:27 2022 +0200

    provenance: lower the cache thresholds

    Instead of flushing if any entry is over the threshold, flush when the
    cumulative count goes over.

 swh/provenance/provenance.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

commit 4b3de6177b4f2c5b45dede931004c719fdfb0f7d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 17:44:47 2022 +0200

    revision: only trigger partial flushes when necessary

 swh/provenance/revision.py | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

commit 9c936c39779cdb42b0f8f1a40df23d2de3032dfb
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 17:43:13 2022 +0200

    revision: sort batches by date, improve logging, add incremental flushing

 swh/provenance/revision.py | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

commit 5b66b98e62c50c5958936adcc3b0ab651fb2d279
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 17:40:59 2022 +0200

    revision: capture datetime exceptions with sentry

 swh/provenance/journal_client.py | 3 +++
 1 file changed, 3 insertions(+)

commit af09058f0a80aac79a4e477fb2f7bd9800e3603f
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 17:40:26 2022 +0200

    revision: don't process revisions before the epoch

 swh/provenance/journal_client.py | 7 +++++++
 1 file changed, 7 insertions(+)

commit 3473d4af62d85255845aafc1def6c591090062e7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 17:39:30 2022 +0200

    revision: don't process revisions with unknown dates

 swh/provenance/journal_client.py | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

commit d7d0c3d876059abe6a1d60a6c38ed4245e1b58c9
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 17:34:21 2022 +0200

    postgresql archive: add support for partially copied databases

    The incremental copy of the archive to mmca is not atomic: the directory
    table needs to be copied first, then the directory_entry_* tables need
    to be updated. This means that the client can view inconsistent
    entries, where the directory has been synced but not all the entry rows.

    We return an empty list when one of these bogus entries is detected.
    This allows smooth fallback to the main database through the
    multiplexer.

 swh/provenance/postgresql/archive.py | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

commit 95eb9622a00ce99d089bb9accdaed0bdbf1bdc37
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 17:33:28 2022 +0200

    postgresql archive: don't use custom types

    The partial copy of the archive on mmca doesn't have them anyway.

 swh/provenance/postgresql/archive.py | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

commit 34a9a1ac220bfabdda26b243c79742bdab090d76
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 17:32:09 2022 +0200

    Remove sneaky caches in the postgresql archive implementation

 mypy.ini                             | 3 ---
 requirements.txt                     | 1 -
 swh/provenance/postgresql/archive.py | 3 ---
 3 files changed, 7 deletions(-)

commit bae8f4afda455ca28e64e54f1c9c37c6af2214b6
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 17:29:45 2022 +0200

    rabbitmq: Extend timeouts for reception of acks

    The retry logic is not very refined, extending the timeouts makes more
    sense.

 swh/provenance/api/client.py | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

commit 1efc40c7917feaedfa1204b6e4e395d41530d14c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 17:28:31 2022 +0200

    rabbitmq: close the consumer only after all acks are received

    This is not quite working but it seems to reduce issues on worker
    termination a bit.

 swh/provenance/api/client.py | 63 ++++++++++++++++++++++++++++----------------
 1 file changed, 41 insertions(+), 22 deletions(-)

commit ef7cd991712e47a14d7877f726f427a9de22e545
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 17:14:58 2022 +0200

    Improve logging in the API client and the revision layer

 swh/provenance/api/client.py | 39 +++++++++++++++++++++++----------------
 swh/provenance/provenance.py |  2 +-
 swh/provenance/revision.py   | 12 ++++++++++++
 3 files changed, 36 insertions(+), 17 deletions(-)

commit 3edf3690258b9e61de5452967c6ee178120276e7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 16:53:11 2022 +0200

    Add systemd notification support

 mypy.ini                         |  3 +++
 swh/provenance/cli.py            | 15 +++++++++++++++
 swh/provenance/journal_client.py |  9 +++++++++
 3 files changed, 27 insertions(+)

commit 5cadb13de9eb27b309d2ada3df54dc86452785b3
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 16:54:27 2022 +0200

    Try to avoid some circular imports

 swh/provenance/__init__.py   | 2 +-
 swh/provenance/api/server.py | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

commit 98254d2e930f639c7b1fdb3c27f5eb2a668b857d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Fri Aug 12 17:17:11 2022 +0200

    blacken swhgraph/archive.py

 swh/provenance/swhgraph/archive.py | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)
```
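For the `swhgraph: handle empty responses` commit, here is a minimal sketch of the kind of guard involved. It assumes, without having checked the commit itself, that edges reach the caller as `(src, dst)` tuples and that an empty response surfaces as a bare `()`; `iter_edges` is a hypothetical helper, and the sketch does not call swh.graph at all.

```python
from typing import Iterable, Iterator, Tuple

Edge = Tuple[str, str]


def iter_edges(raw_edges: Iterable[Tuple[str, ...]]) -> Iterator[Edge]:
    """Yield (src, dst) pairs, skipping empty tuples (hypothetical helper).

    Unpacking `src, dst = ()` raises ValueError, so empty tuples are
    dropped instead of being unpacked.
    """
    for edge in raw_edges:
        if not edge:
            continue
        src, dst = edge
        yield src, dst


# Simulated client output (placeholder ids): one edge, then an empty tuple.
raw = [("rev-2", "rev-1"), ()]
print(list(iter_edges(raw)))  # [('rev-2', 'rev-1')]
```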
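The `origin layer` commit swaps single-level parent lookups for `revision_get_some_outbound_edges`, which may return several levels of history per call, and still loops because the swh.storage backend only recurses so far. The loop below sketches that idea under explicit assumptions. The backend is any callable returning `(revision, parent)` edges. Every revision that appears as a source in a batch is assumed to arrive with all of its parents. `fetch_history` and `make_backend` are hypothetical names, not the functions in `swh/provenance/origin.py`.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (revision, parent)


def fetch_history(
    head: str, get_some_outbound_edges: Callable[[str], Iterable[Edge]]
) -> Dict[str, List[str]]:
    """Build the parent map of `head`, fetching one batch of edges per call.

    Assumes that every revision appearing as a source in a batch comes with
    all of its parents; anything beyond the batch is fetched in a later step.
    """
    parents: Dict[str, List[str]] = {}
    expanded: Set[str] = set()
    todo: Deque[str] = deque([head])
    while todo:
        rev = todo.popleft()
        if rev in expanded:
            continue
        batch = list(get_some_outbound_edges(rev))
        new_sources = {src for src, _ in batch} - expanded
        for src, dst in batch:
            if src in new_sources:
                parents.setdefault(src, []).append(dst)
        expanded |= new_sources
        expanded.add(rev)
        parents.setdefault(rev, [])  # a root revision has no parents
        # Parents that are not expanded yet need another backend call.
        for src in new_sources:
            todo.extend(dst for dst in parents[src] if dst not in expanded)
    return parents


def make_backend(
    graph: Dict[str, List[str]], max_depth: int
) -> Tuple[Callable[[str], List[Edge]], List[str]]:
    """Toy backend returning at most `max_depth` levels of edges per call."""
    calls: List[str] = []

    def get_some_outbound_edges(rev: str) -> List[Edge]:
        calls.append(rev)
        edges: List[Edge] = []
        frontier, seen = [rev], {rev}
        for _ in range(max_depth):
            next_frontier: List[str] = []
            for node in frontier:
                for parent in graph[node]:
                    edges.append((node, parent))
                    if parent not in seen:
                        seen.add(parent)
                        next_frontier.append(parent)
            frontier = next_frontier
        return edges

    return get_some_outbound_edges, calls


GRAPH = {"r5": ["r4"], "r4": ["r3"], "r3": ["r2", "r1"], "r2": ["r1"], "r1": []}
for depth in (1, 2):
    backend, calls = make_backend(GRAPH, depth)
    history = fetch_history("r5", backend)
    print(f"max_depth={depth}: {len(calls)} calls, complete={history == GRAPH}")
```

On this five-revision toy history, allowing two levels per call brings the number of backend calls down from five to three while still recovering the full parent map.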
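The `provenance: lower the cache thresholds` commit replaces a per-entry flush check with a cumulative one. The sketch below contrasts the two policies. It assumes a cache that keeps one list of pending rows per table, and it assumes "over" means strictly greater than the threshold. `WriteCache` and its methods are hypothetical, not the code in `swh/provenance/provenance.py`.

```python
from typing import Dict, List


class WriteCache:
    """Hypothetical write cache holding pending rows per table."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.pending: Dict[str, List[object]] = {}

    def add(self, table: str, row: object) -> None:
        self.pending.setdefault(table, []).append(row)

    def should_flush_per_entry(self) -> bool:
        # Old policy: flush once any single table goes over the threshold.
        return any(len(rows) > self.threshold for rows in self.pending.values())

    def should_flush_cumulative(self) -> bool:
        # New policy: flush once the total across all tables goes over.
        return sum(len(rows) for rows in self.pending.values()) > self.threshold


cache = WriteCache(threshold=5)
for table in ("table_a", "table_b", "table_c"):
    for row in range(3):
        cache.add(table, row)

print(cache.should_flush_per_entry())   # False: no table holds more than 5 rows
print(cache.should_flush_cumulative())  # True: 9 rows in total
```

The total is always at least as large as any single table's count, so the cumulative check fires no later than the per-entry one. It bounds what the cache holds overall rather than per table.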
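Three of the `revision:` commits deal with dates: they skip unknown dates, skip dates before the epoch, and capture datetime exceptions with Sentry. Together they act as a filter applied before a revision is processed. The sketch below shows such a filter under stated assumptions. The date is taken to arrive as an optional POSIX timestamp in seconds. The epoch itself is kept, and only strictly earlier dates are skipped. `revision_date` is a hypothetical helper that logs conversion errors where the commit reports them to Sentry.

```python
import logging
from datetime import datetime, timezone
from typing import Optional

logger = logging.getLogger(__name__)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def revision_date(timestamp: Optional[float]) -> Optional[datetime]:
    """Return a usable revision date, or None if the revision is skipped."""
    if timestamp is None:
        return None  # unknown date
    try:
        date = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    except (OverflowError, OSError, ValueError):
        # Timestamp out of range for datetime; the commit sends these to
        # Sentry, this sketch only logs them.
        logger.exception("Unusable revision timestamp %r", timestamp)
        return None
    if date < EPOCH:
        return None  # before the epoch
    return date


print(revision_date(None))  # None
print(revision_date(0))     # 1970-01-01 00:00:00+00:00
```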
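The `postgresql archive: add support for partially copied databases` commit relies on the multiplexer treating an empty list as "not here, ask the next backend". The `multiplexer: add endpoint counts per backend` commit adds per-backend counters. The sketch below combines both ideas. It assumes backends are tried in a fixed order and that the counter records which backend answered each call. `Multiplexer`, `partial_directory_ls` and `main_directory_ls` are hypothetical stand-ins, not the classes in `swh/provenance/multiplexer/archive.py`.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical shape: a backend maps endpoint names to callables.
Backend = Dict[str, Callable[[str], List[str]]]


class Multiplexer:
    """Query backends in order; an empty result falls through to the next."""

    def __init__(self, backends: Sequence[Tuple[str, Backend]]) -> None:
        self.backends = list(backends)
        # (backend name, endpoint) -> number of calls that backend answered
        self.counts: Counter = Counter()

    def call(self, endpoint: str, arg: str) -> List[str]:
        for name, backend in self.backends:
            result = backend[endpoint](arg)
            if result:
                self.counts[(name, endpoint)] += 1
                return result
        return []


def partial_directory_ls(dir_id: str) -> List[str]:
    # Stands in for the partial copy: "d2" was synced without its entry
    # rows, so it is reported as empty.
    return {"d1": ["README"]}.get(dir_id, [])


def main_directory_ls(dir_id: str) -> List[str]:
    return {"d1": ["README"], "d2": ["setup.py", "src"]}.get(dir_id, [])


mux = Multiplexer(
    [
        ("partial", {"directory_ls": partial_directory_ls}),
        ("main", {"directory_ls": main_directory_ls}),
    ]
)
print(mux.call("directory_ls", "d1"))  # ['README']
print(mux.call("directory_ls", "d2"))  # ['setup.py', 'src']
print(dict(mux.counts))  # {('partial', 'directory_ls'): 1, ('main', 'directory_ls'): 1}
```

One consequence of this design is that a directory that is genuinely empty also falls through to the main database. That costs an extra query, but the answer is still correct.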