IMHO this diff should be squashed into D6165 (it's really part of the work adding the RabbitMQ-based backend).
Oct 5 2021
rebase
In D6334#166259, @douardda wrote: looks ok to me. Just one question: why do you need __future__.annotations?
Oct 4 2021
rebase
rebase
rebase
rebase
rebase
rebase
rebase
turn backend classes into context managers
rebase
rebase
squash with D6272
squash with D6273
Sep 30 2021
rebase
rebase
Sep 29 2021
rebase
rebase
rebase
move later in the commit history
rebase
rebase
rebase
rebase
move earlier in the commit history
rebase
rebase
rebase
rebase
rebase
rebase
rebase
Add open method and refactor storage classes' initialization
Sep 28 2021
rebase
rebase
rebase
rebase
rebase
Sep 27 2021
rebase
fix commit message
fix commit message
rebase
split
Sep 24 2021
In D6165#164547, @olasd wrote: Thanks for this massive implementation work!
I still want to do a deeper dive in this code (and give others the chance to do so), but I think that before that, and now that bugs and wrinkles have been ironed out and this code seems to be working, we need a large pass of updating the docstrings to describe the actual behavior of the code.
I expect a lot of this is present inside the hedgedoc document, so you should try to land it as documentation at the same time as this code.
When reading this diff, I would like to find the following:
- a description of all threads and subprocesses (on the client and server side), as well as their associated workflows (who does what)
- a description of how RabbitMQ queues and exchanges are handled (the request queues, the response queues, the way the acknowledgements are managed)
- a description of how objects are serialised to be passed on to the queues
- a description of what queues feed to what server processes, and how the messages are "bundled" before being sent to the database
- a list of "tunables" (number of queues, batch sizes, timeouts, etc.) to watch out for
I would suggest documenting the "lifecycle" of the client and server threads/processes, for instance by writing a summarised list of all the methods that are called in sequence, on initialization of the classes, with how the callbacks mesh together.
When this lifecycle doc is available (centrally), I think most of the "boilerplate" documentation that's been pulled from the pika example code can go away (with a shorter reference to the full lifecycle documentation).
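To illustrate the kind of lifecycle summary requested above, here is a minimal sketch of the callback chain of an asynchronous pika consumer. The class, queue, and exchange names are hypothetical; this is not the actual ProvenanceStorageRabbitMQServer code, only the shape such documentation could describe.

```python
import pika

# Hypothetical sketch of an asynchronous pika consumer's lifecycle;
# names are made up and the real server code may differ.


class LifecycleSketch:
    def __init__(self, amqp_url: str) -> None:
        self._url = amqp_url
        self._connection = None
        self._channel = None

    def run(self) -> None:
        # 1. Open the connection; pika calls on_connection_open when ready.
        self._connection = pika.SelectConnection(
            pika.URLParameters(self._url),
            on_open_callback=self.on_connection_open,
        )
        self._connection.ioloop.start()

    def on_connection_open(self, connection) -> None:
        # 2. Open a channel on the established connection.
        connection.channel(on_open_callback=self.on_channel_open)

    def on_channel_open(self, channel) -> None:
        # 3. Declare the exchange, then the queue, then bind and consume.
        self._channel = channel
        channel.exchange_declare(
            exchange="requests",
            exchange_type="direct",
            callback=self.on_exchange_declared,
        )

    def on_exchange_declared(self, frame) -> None:
        self._channel.queue_declare(
            queue="request_queue", callback=self.on_queue_declared
        )

    def on_queue_declared(self, frame) -> None:
        self._channel.queue_bind(
            queue="request_queue",
            exchange="requests",
            routing_key="request_queue",
            callback=self.on_queue_bound,
        )

    def on_queue_bound(self, frame) -> None:
        # 4. Start consuming; on_request runs for every delivered message.
        self._channel.basic_consume(
            queue="request_queue", on_message_callback=self.on_request
        )

    def on_request(self, channel, method, properties, body) -> None:
        # 5. Process the message, then acknowledge it.
        channel.basic_ack(delivery_tag=method.delivery_tag)
```

With a summary of this kind in docs/, each callback's docstring can shrink to a short pointer to the central lifecycle documentation, as suggested above.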
rebase
In D6273#164555, @olasd wrote: I would suggest squashing D6272 and this together to land them at the same time.
I think you can remove types-werkzeug from requirements-test.txt. I'm not sure you can drop the http extra from swh.core dependencies in requirements-swh.txt, as the serialization/deserialization scaffolding is still in use in the rabbitmq backend.
In D6334#164535, @olasd wrote: Thanks!
I still think that the postgres and mongodb close methods on ProvenanceStorage instances should be shutting down their respective database connections.
I remember that you didn't want to do that because currently the database connection is passed to the class already open, which is at least consistent.
However, would it make sense to instead have the storage classes take connection parameters and handle connecting to the database themselves (and therefore having their close methods close the database connections)?
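One possible shape for that suggestion, sketched here with hypothetical names rather than the actual ProvenanceStoragePostgreSql API: the storage class receives connection parameters, opens the connection itself, and closes it in close() / on context-manager exit.

```python
from typing import Any, Dict, Optional

import psycopg2

# Hypothetical sketch of a storage backend owning its database connection;
# not the actual swh.provenance storage classes.


class PostgresStorageSketch:
    def __init__(self, **connection_params: Any) -> None:
        # Only remember the parameters; connect lazily in open().
        self._params: Dict[str, Any] = connection_params
        self._conn: Optional[psycopg2.extensions.connection] = None

    def open(self) -> None:
        self._conn = psycopg2.connect(**self._params)

    def close(self) -> None:
        # The class created the connection, so it can also shut it down.
        if self._conn is not None:
            self._conn.close()
            self._conn = None

    def __enter__(self) -> "PostgresStorageSketch":
        self.open()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        self.close()


# Usage: the context manager guarantees the connection is released, e.g.
# with PostgresStorageSketch(dbname="provenance", host="localhost") as storage:
#     ...
```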
rebase
rebase
rebase
- Add new RabbitMQ-based client/server API
- Rework ProvenanceStorageRabbitMQWorker to handle connection loss (see the reconnection sketch below)
- Improve server/client shutdown logic and error handling
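For the connection-loss handling mentioned in the summary above, a common pattern with pika's asynchronous connection is to register open-error/close callbacks and restart the I/O loop after a short delay. The sketch below is only an assumed illustration, not the actual ProvenanceStorageRabbitMQWorker code.

```python
import time

import pika

# Hypothetical reconnection sketch; the real worker may structure this
# differently (e.g. separate threads, configurable back-off).


class ReconnectingWorkerSketch:
    def __init__(self, amqp_url: str) -> None:
        self._url = amqp_url
        self._reconnect = False

    def run(self) -> None:
        while True:
            self._reconnect = False
            connection = pika.SelectConnection(
                pika.URLParameters(self._url),
                on_open_callback=self._on_open,
                on_open_error_callback=self._on_open_error,
                on_close_callback=self._on_closed,
            )
            connection.ioloop.start()
            if not self._reconnect:
                break  # clean shutdown was requested
            time.sleep(1)  # back off briefly before reconnecting

    def _on_open(self, connection) -> None:
        connection.channel(on_open_callback=self._on_channel_open)

    def _on_channel_open(self, channel) -> None:
        pass  # declare queues and start consuming here

    def _on_open_error(self, connection, error) -> None:
        # Failing to connect triggers another attempt.
        self._reconnect = True
        connection.ioloop.stop()

    def _on_closed(self, connection, reason) -> None:
        # An unexpected close also triggers a reconnection attempt.
        self._reconnect = True
        connection.ioloop.stop()
```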
rebase
Sep 23 2021
rebase
rebase
rebase
- Add support for the remote backend in the existing storage tests
In D6165#164092, @vlorentz wrote: It could be something like these:
- https://docs.softwareheritage.org/devel/architecture/metadata.html
- https://docs.softwareheritage.org/devel/architecture/overview.html
- https://docs.softwareheritage.org/devel/swh-model/data-model.html
You can start from your Hedgedoc document, remove the description of the current state, and keep the description of the new design and the rationale.
rebase
rebase
rebase
In D6165#164089, @vlorentz wrote: In D6165#163810, @aeviso wrote: It's actually explained in the document:
Oh, sorry, I missed the hedgedoc link. I only looked in the repo and the diff's content.
Could you document the new design in this diff too, in the docs/ folder?
Sep 22 2021
In D6165#163784, @vlorentz wrote: What do you mean by assigning requests randomly?
The way gunicorn assigns requests to worker processes in the existing RPC server
How do you guarantee conflict resolution that way?
What conflict resolution? Sorry, I didn't really follow swh-provenance's development, and I can't find this in the documentation.
In D6165#163629, @vlorentz wrote: What is the reason for this change? Is it more efficient to assign requests to workers based on ID rather than randomly?
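As a hedged illustration of the idea being discussed (sending all requests for a given object to the same worker, instead of letting gunicorn pick a worker at random), a stable routing key can be derived from a hash of the object id. The function name and partition count below are assumptions, not the actual implementation.

```python
import hashlib

# Hypothetical sketch: derive a stable routing key from an object id so
# that all requests touching the same object land on the same worker queue.

NUM_PARTITIONS = 16  # assumed number of per-worker queues


def routing_key_for(object_id: bytes, prefix: str = "provenance") -> str:
    digest = hashlib.sha1(object_id).digest()
    partition = digest[0] % NUM_PARTITIONS
    return f"{prefix}.{partition:02d}"


# Both calls return the same key, so conflicting writes for this object are
# serialized in a single worker rather than resolved across randomly chosen
# gunicorn workers.
assert routing_key_for(b"\x12\x34" * 10) == routing_key_for(b"\x12\x34" * 10)
```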
In D6272#163627, @vlorentz wrote: Why the renaming? And it's an RPC API, not REST.
In D6273#163625, @vlorentz wrote: Why?
Sep 20 2021
I'm not really convinced about adding this test: it essentially recreates situations that are already covered by the test_provenance_storage test (and there they are exercised for all backends at once).
Also, performing direct queries against the mongodb object breaks the abstraction layer, which means any future refactoring will require reimplementing these tests.
I'd rather design tests to be independent of the actual implementation, so that they check that the class behaves as expected from a semantic point of view.
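A minimal sketch of what such an implementation-independent test could look like. The stub backend and the content_add/content_get signatures are assumptions made to keep the example self-contained; in the real suite the fixture would be parametrized over the concrete backends, as in test_provenance_storage.

```python
from datetime import datetime, timezone

import pytest

# Hypothetical sketch of a backend-agnostic test: only the public storage
# interface is exercised, so backend internals can be refactored freely.


class InMemoryStorageStub:
    """Stand-in used only to keep this sketch runnable; the real fixture
    would return the concrete backends (postgres, mongodb, rabbitmq)."""

    def __init__(self) -> None:
        self._contents = {}

    def content_add(self, cnts) -> bool:
        self._contents.update(cnts)
        return True

    def content_get(self, ids):
        return {sha1: self._contents[sha1] for sha1 in ids if sha1 in self._contents}


@pytest.fixture(params=[InMemoryStorageStub])  # real params: one per backend
def provenance_storage(request):
    return request.param()


def test_content_roundtrip(provenance_storage):
    # No direct mongodb/postgres queries: refactoring the backend internals
    # cannot break this test as long as the semantics are preserved.
    data = {b"\x00" * 20: datetime(2021, 9, 20, tzinfo=timezone.utc)}
    assert provenance_storage.content_add(data)
    assert provenance_storage.content_get(set(data)) == data
```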