See also T2706
Oct 26 2020
In D4082#107701, @vlorentz wrote: In D4082#107582, @douardda wrote: Wouldn't it make things a bit easier to name the generic version of the journal writer something like GenericKafkaJournalWriter and have KafkaJournalWriter = GenericKafkaJournalWriter[BaseModel]? (for bw compat)
Why? This change won't break any code using KafkaJournalWriter
Oct 23 2020
closed by 2b869aa7d30d099ed6146d9f8dc667cd7a8eefc3
Also the commit message should give a bit more information on what this new script is needed for, maybe with a usage example.
This defines a bunch of commands. When and how should "I" use them?
please do not put the "depends on Dxxx" line in the git commit message.
ok on the diff itself, but what is this new example repo needed for? This should be explained in the commit message (the "why"! Always insist on the "why" rather than, or in addition to, the "what" in your commit messages, please).
ok but please properly document arguments in docstrings.
Oct 22 2020
globally ok, but please add a README file as suggested in the previous comment
Would be nice to have a README file in tests/data explaining what these json files are and how to produce them.
Wouldn't it make things a bit easier to name the generic version of the journal writer something like GenericKafkaJournalWriter and have KafkaJournalWriter = GenericKafkaJournalWriter[BaseModel]? (for bw compat)
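The naming suggested here could look roughly like the sketch below. It is only an illustration: the class body, the write_addition signature and the BaseModel stand-in are assumptions, not the actual swh-journal code.

```python
from typing import Generic, List, Tuple, TypeVar


class BaseModel:
    """Stand-in for the real model base class (assumption)."""


TValue = TypeVar("TValue")


class GenericKafkaJournalWriter(Generic[TValue]):
    """Hypothetical generic writer, parametrized by the value type."""

    def __init__(self) -> None:
        self.written: List[Tuple[str, TValue]] = []

    def write_addition(self, object_type: str, object_: TValue) -> None:
        # A real writer would serialize and send to Kafka; here we only record.
        self.written.append((object_type, object_))


# Backward-compatible name: existing imports of KafkaJournalWriter keep working.
KafkaJournalWriter = GenericKafkaJournalWriter[BaseModel]

writer = KafkaJournalWriter()
writer.write_addition("origin", BaseModel())
```

One caveat of this approach: a subscripted alias can be instantiated and subclassed, but it cannot be used as the second argument of isinstance(), so code doing such checks would have to target GenericKafkaJournalWriter instead.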
This globally LGTM but there is this path encoding issue. The 2 new functions in from_disk.py should take a bytes argument instead of a str one.
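A minimal sketch of what "take a bytes argument" can mean in practice. The helper name list_entries is hypothetical and is not one of the two functions in from_disk.py; it only shows the pattern of keeping paths as bytes and converting at the boundary.

```python
import os
import tempfile
from typing import List


def list_entries(path: bytes) -> List[bytes]:
    """Hypothetical helper: takes a bytes path and returns bytes entry names."""
    if not isinstance(path, bytes):
        raise TypeError("path must be bytes, got %s" % type(path).__name__)
    # os.listdir() returns bytes names when given a bytes path, so file names
    # that cannot be decoded in the current locale are preserved as-is.
    return sorted(os.listdir(path))


# Callers holding a str can convert explicitly with os.fsencode().
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "a.txt"), "w").close()
    entries = list_entries(os.fsencode(tmp))
```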
Oct 21 2020
Oct 19 2020
Oct 16 2020
Same as before but with 1M (fresh) sha1s:
Since the results on uffizi above did suffer from a few caveats, I've made a few more tests:
- a first result has been obtained with a dataset that had only objects stored on the XFS part of the objstorage
- a second dataset has been created (with the order by sha256 part to spread the sha1s)
- but the results are a mix of hot and cold cache tests
Oct 15 2020
Some results:
Current benchmark scenario:
Oct 14 2020
Oct 13 2020
I'm mostly OK with this now, so I'll make it "accepted", but please refactor a bit the cli_run_[n]ok() helper functions before landing it.
In D4193#104860, @ardumont wrote:
Oct 9 2020
same
oh so much yes!
Oct 8 2020
Overall ok, but I would have preferred the renaming be in a dedicated revision, separated from type annotation fixes/additions.
otherwise fine with me
Does this require the plugin's entrypoint in swh.core to be removed (e.g. because of swh.core.pytest_plugin being loaded twice or something like that)? Or is it safe to apply and use with a swh.core that still declares its pytest_plugin as an entrypoint?
In D4078#103969, @douardda wrote: In D4078#103968, @vlorentz wrote: yes, it would make sense for values. Do you want to open a task for that?
you read my mind :-)
Since this "migration problem" also concerns cassandra, maybe a simple approach would be to add a Final version attribute to all model entities (a simple monotonic integer).
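A rough sketch of that idea, under stated assumptions: the Origin class, its to_dict() output and the needs_migration helper are all hypothetical, and treating unversioned objects as version 0 is a choice made only for this example.

```python
from typing import Any, Dict, Final


class Origin:
    """Hypothetical model entity; only the version attribute matters here."""

    # Bump this integer whenever the serialized form of the class changes.
    version: Final[int] = 2

    def __init__(self, url: str) -> None:
        self.url = url

    def to_dict(self) -> Dict[str, Any]:
        return {"url": self.url, "version": self.version}


def needs_migration(stored: Dict[str, Any], cls: type) -> bool:
    # Objects written before versioning existed count as version 0 (assumption).
    return stored.get("version", 0) < cls.version
```

A storage backend could then compare the version found in a stored row or message with the current class version to decide whether it has to be rewritten.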
In D4078#103941, @vlorentz wrote: In D4078#103938, @douardda wrote: maybe stupid question, but why using dict as unique key (in many model classes)? Why not use a tuple? I mean it seems to me that such a UID should be usable as dict keys or in a set directly.
I don't know, I just copied what we were already doing in swh-journal. Dicts have the nice property of being somewhat "self-documenting" though.
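To make the trade-off discussed here concrete, the sketch below uses a named tuple as one possible middle ground: hashable like a tuple, yet with field names like a dict. The OriginVisitKey class and its fields are hypothetical and not taken from swh-model.

```python
from typing import NamedTuple


class OriginVisitKey(NamedTuple):
    """Hypothetical unique key; field names are illustrative only."""

    origin: str
    visit: int


key = OriginVisitKey(origin="https://example.org/repo", visit=3)

# A named tuple is hashable, so it works directly in sets and as a dict key...
seen = {key}
index = {key: "some object"}

# ...while keeping the field names that make a dict "self-documenting".
as_dict = key._asdict()

# A plain dict, by contrast, cannot be hashed.
try:
    hash({"origin": "https://example.org/repo", "visit": 3})
    dict_is_hashable = True
except TypeError:
    dict_is_hashable = False
```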
In D4078#103968, @vlorentz wrote: yes, it would make sense for values. Do you want to open a task for that?
In D4078#103940, @douardda wrote: Also (most probably a dumb idea, writing as it pops into my mind), wouldn't it make sense to add some kind of 'per-object class model version' in the key?
This would prevent compacting away old versions of objects. Is this something we want?
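The objection can be illustrated with a toy model, assuming "compacting" refers to key-based log compaction as done by Kafka (keep only the most recent message per key). The compact function below is a simplification written for this example, not Kafka's actual algorithm.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Message = Tuple[Hashable, str]


def compact(log: Iterable[Message]) -> List[Message]:
    """Toy key-based log compaction: keep only the last value per key."""
    latest: Dict[Hashable, str] = {}
    for key, value in log:
        latest.pop(key, None)  # re-insert so the entry moves to the end
        latest[key] = value
    return list(latest.items())


# Key without a version: the old serialization is compacted away.
unversioned = compact([("obj1", "old format"), ("obj1", "new format")])

# Key including a (hypothetical) model version: both entries survive.
versioned = compact([(("obj1", 1), "old format"), (("obj1", 2), "new format")])
```

With the version in the key, the old and new serializations of the same object are distinct keys, so neither is ever superseded.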
In D4194#103939, @vlorentz wrote: microsecond in postgres, millisecond in cassandra.
The split in 2 revisions is not mandatory, just sayin' for good measure.
looks good (did not even notice toDict() is not even a recursive method! so this dict_nodes really makes no sense at all).
In D4078#103938, @douardda wrote: maybe stupid question, but why using dict as unique key (in many model classes)? Why not use a tuple? I mean it seems to me that such a UID should be usable as dict keys or in a set directly.
dates are not unique (i.e. multiple visits can share a date, and they do in practice); and visit statuses already use visit ids in their unique key.
In D4193#103804, @zack wrote: Thanks, even though this is a little bit disturbing discrepancy wrt swh-scanner exclusion mechanism,
can you please remove the "noise" added by arc in the commit message? And update it (still the previous option name in there).
Oct 7 2020
Oct 2 2020
Maybe starting a pad/hackmd document would be easier at this point?
In D4131#102303, @douardda wrote: this is debatable, but it does "normalize" the given url, so it does something. I agree the auto-added https:// prefix is strange, but the trailing / still brings value. For example there are listers that do not implement this, so if you create a listing task with url=https://somehere.org/api/v1 it will fail because it will forge invalid urls (missing the trailing /).
[edit] and I find this very annoying
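A small sketch of the normalization being discussed, and of one way a missing trailing / can lead to a wrong URL. The normalize_url function is hypothetical and the use of urljoin is an assumption; the listers in question may build their URLs differently.

```python
from urllib.parse import urljoin


def normalize_url(url: str) -> str:
    """Hypothetical normalization: default to https:// and ensure a trailing slash."""
    if "://" not in url:
        url = "https://" + url
    if not url.endswith("/"):
        url += "/"
    return url


base = "https://somehere.org/api/v1"

# Without the trailing slash, urljoin() replaces the last path segment.
without_slash = urljoin(base, "repos")

# With the normalized base, the relative path is appended as intended.
with_slash = urljoin(normalize_url(base), "repos")
```

Here without_slash ends up pointing at /api/repos while with_slash points at /api/v1/repos, which is the kind of silent breakage the trailing / prevents.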
In D3334#95217, @vlorentz wrote: @douardda ping?
Oct 1 2020
Sep 30 2020
Listed (oneshot full + recurring incremental) and loaded (as far as I can tell).
Sep 29 2020
(just updated my swh-env, now I see where this diff comes from :-) )
sure
LGTM (except the "rm -f ../$module.log " for which I am not convinced it's a good idea)
I've sent an email to the fsfe.
Sep 28 2020
Can this be closed now? What's missing? Adding a listing task?
Sep 25 2020
and with the pytest.ini hunk we don't need a (non working) dependency on swh.core[testing]
add a clarification in the commit message about the pytest.ini hunk
In D4045#100032, @ardumont wrote: ah yeah, it'd be best to align indeed.
Sep 24 2020
crumbs everywhere
sure
In D4012#99525, @olasd wrote: I don't think the origin url and visit type should be sent in the task result; they're arguments of the task already.
If we want them logged by the worker when the task ends (which I agree would be useful), then we should improve logging on the worker/celery side to show some of the task arguments (for instance, if there's a "url" argument) instead of, or in addition to, the task id.
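As a sketch of what such worker-side logging could look like, the helper below builds a log message from the task id and a few selected keyword arguments. The function names and the list of "interesting" arguments are assumptions; in a real worker this would be called from a hook such as a Celery task_postrun signal handler, which is not wired up here.

```python
import logging
from typing import Any, Iterable, Mapping

logger = logging.getLogger(__name__)


def describe_task(
    task_id: str,
    kwargs: Mapping[str, Any],
    interesting: Iterable[str] = ("url", "visit_type"),
) -> str:
    """Build a log-friendly description from the task id and selected arguments."""
    shown = ", ".join(
        f"{name}={kwargs[name]!r}" for name in interesting if name in kwargs
    )
    return f"task {task_id} ({shown})" if shown else f"task {task_id}"


def log_task_end(task_id: str, kwargs: Mapping[str, Any], state: str) -> None:
    # Hypothetical entry point, meant to be called when a task finishes.
    logger.info("%s finished with state %s", describe_task(task_id, kwargs), state)
```

This keeps the task result free of data that is already present in the task arguments, while still making the log line readable on its own.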