In D6165#163629, @vlorentz wrote: What is the reason for this change? Is it more efficient to assign requests to workers based on ID rather than randomly?
Sep 22 2021
Sep 21 2021
some more :-)
douardda added inline comments to D6310: opam: Move the state initialization into the get_pages method.
Sep 20 2021
LGTM, but how is the new opam_root option expected to be set (in production I mean)?
I'm not done yet but here is a first review on my side.
douardda closed T1510: Have a look at openAPI and decide whether we want to follow these specs, a subtask of T1805: Public API v2, as Resolved.
not useful as a dedicated task, see T1805 for the main discussion on this subject
douardda requested changes to D6300: Capture missing revision <-> hgnode-id scenario in a xfail test.
I don't understand what exactly is (not) tested here. What does "anomad-d" stand for BTW?
Sep 16 2021
douardda added inline comments to D6281: converters: Recompute hashes and check they match the originals.
douardda committed rDENV57ad032071ff: docker: document some useful kafka management commands in the README file (authored by douardda).
docker: document some useful kafka management commands in the README file
douardda committed rDENVe24535cc0064: docker: wrap long cli command lines in the README file (authored by douardda).
docker: wrap long cli command lines in the README file
fix indentation (tab->ws) and a few typos
Sep 14 2021
douardda committed rDENVb0f07795ddff: docker: Document how to consume kafka topics from the host (authored by douardda).
docker: Document how to consume kafka topics from the host
douardda committed rDENVf612427f663d: docker: allow kafka to be consumed from the host (authored by douardda).
docker: allow kafka to be consumed from the host
closed by 94be817f869409c64415b181824071d2998e33d5
closed by a3c1f39013bae1a6982140d51d8bb443dc1b5c9c
Keep port 5092 exposed on host
douardda committed rDDATASET94be817f8694: Commit kafka messages which offset has reach the high limit (authored by douardda).
Commit kafka messages which offset has reach the high limit
douardda committed rDDATASETa3c1f39013ba: Add a JournalClientOffsetRanges.unsubscribe() method (authored by douardda).
Add a JournalClientOffsetRanges.unsubscribe() method
Sep 13 2021
Fix a missing f-string prefix
Add a bit of documentation in the README file on how to consume kafka from the host
It's not worth the trouble, and there is a better solution (server-side)
In D6234#161606, @vlorentz wrote: You could also add a command in swh-dataset's entrypoint.sh that calls whatever Kafka's script does
In D6234#161506, @vlorentz wrote: In D6234#161491, @douardda wrote: So either I kill this diff or it stays "intricate" with the setup of the consumer (so the whole journalprocessor.py)
Note: this feature is mainly useful for testing purposes IMHO, so I suppose it's not that critical to keep it, I just find it handy when "playing" with swh dataset export
Meh. How much easier does it make testing, compared to using Kafka's CLI (from the linked comment)?
rebase
in favor of D6247 because phab/arcanist won't let me update this later any more (sorry)
douardda committed rDDATASET358d84938d01: Reduce the size of the progress bar (authored by douardda).
Reduce the size of the progress bar
douardda committed rDDATASET47713ee38c94: Make sure the progress bar for the export reaches 100% (authored by douardda).
Make sure the progress bar for the export reaches 100%
douardda committed rDDATASET2760e322af7c: Simplify the lo/high partition offset computation (authored by douardda).
Simplify the lo/high partition offset computation
douardda committed rDDATASETd07b2a632256: Explicitly close the temporary kafka consumer in `get_offsets` (authored by douardda).
Explicitly close the temporary kafka consumer in `get_offsets`
douardda committed rDDATASETe47a3db1287b: Use proper signature for JournalClientOffsetRanges.process() (authored by douardda).
Use proper signature for JournalClientOffsetRanges.process()
attempt to trick phab/arcanist
rebase
Rebase (remove D6234 from dependencies)
In D6234#161331, @douardda wrote: In D6234#161233, @vlorentz wrote: Can we keep the reset stuff outside the journalprocessor.py logic? It's already complex enough
I'll give it a try
Sep 10 2021
rebase, fix typos, squash revisions
rebase and fix --reset help message
rebase
Add an explicit "skipped" message if nothing is to be consumed for a topic
douardda added inline comments to D6235: Commit kafka messages wich offset has reach the high limit.
In D6234#161233, @vlorentz wrote: Can we keep the reset stuff outside the journalprocessor.py logic? It's already complex enough
In D6235#161311, @vlorentz wrote: lags reported by cmak was completely inconsistent
only because you have a small dataset, right?
With a larger one, the last batch of each partition should have a negligible size.
In D6235#161236, @vlorentz wrote: There's a bunch of typos in your commit/diff msg: "wich", "oef", "ony", "ALL offsets that needs to be", "stash" -> "squash"
this is necessary to ensure these messages are committed in kafka,
otherwise, since the (considered) empty partition is unsubscribed from,
it never gets committed in JournalClient.handle_messages() (since the
latter only commits assigned partitions).
Why is this a problem?
Sep 9 2021
add forgotten revision: Reduce the size of the progress bar
Please use imperative style in the git commit message
https://chris.beams.io/posts/git-commit/
Sep 3 2021
Sep 1 2021
do we need the "list of forks" if we keep the "fork of what"? I mean these are the 2 ends of the fork relation, right?
Aug 30 2021
yes the idea is to have a beefy enough machine to perform full-size experiments on, that can then be (part of) the production infrastructure dedicated to the provenance index.
Aug 13 2021
douardda added inline comments to D5818: send-to-celery: Add more options to allow scheduling of edge case origins.
In D6084#157322, @aeviso wrote: For the fix of revision_get, there should be a test.
The test is coming later from @jayeshv's mongodb branch.
Please don't mix fixes with codestyling/renaming revisions in a single diff, it makes the review much harder.
And we could also use zfs-backed thin provisioning for the / of workers to save storage space (and possibly help to ensure consistency of deployed workers... not entirely convinced of this latter point)
In T3444#68653, @vlorentz wrote: but that requires some more storage on hypervisors we currently don't have
Don't the hypervisors also serve as OSDs? We could just get a disk per hypervisor (partially?) out of the ceph cluster and use it for the workers' /tmp, or even their whole disk.
douardda accepted D6073: bytes_to_str: Format strings directly, instead of constructing ExtendedSWHID.
but anyway, it looks fine to me
douardda added inline comments to D6073: bytes_to_str: Format strings directly, instead of constructing ExtendedSWHID.
one other improvement may be to modify the profile of the workers a bit (to reduce the load on the ceph cluster):
- lower the replication factor for workers' volumes (or even use local storage, but that requires some more storage on hypervisors we currently don't have),
- (probably not very relevant but) stop having swap on workers (since this swap ends up being on the ceph volume, so replicated etc.) (oh this has been done already, good)
Aug 12 2021
douardda added inline comments to D6073: bytes_to_str: Format strings directly, instead of constructing ExtendedSWHID.
In D6071#157080, @aeviso wrote:
- the use of newly introduced as_dict() methods seems unrelated here; unless I'm mistaken, the purpose of this change is better assertion reports by pytest on failure; if so, it should be presented as such in a dedicated revision
This method is only used for test purposes but it doesn't make sense without the refactoring (the complete HistoryGraph class was not even present prior to the refactoring),
Aug 11 2021
A few remarks:
Aug 10 2021
douardda updated the task description for T3085: Complete and updated copy of the archive on S3 (objects+graph).
douardda added a parent task for T1954: Up-to-date objstorage mirror on S3: T3477: Add alerting when the copy to S3 starts lagging.
douardda updated the task description for T3085: Complete and updated copy of the archive on S3 (objects+graph).
well this task should be closed, and a new subtask could be added for the alerting
unless I'm mistaken, this task can be closed now, it seems to have reached a steady state where the lag is near 0
Aug 9 2021
Aug 6 2021
I've been thinking a bit about the refactoring of the ProvenanceStorageServer as described in the doc, with a series of queues between the public API and the backend database.