globally ok, but please add a README file as suggested in the previous comment
Oct 23 2020
Oct 22 2020
Would be nice to have a README file in tests/data explaining what these json files are and how to produce them.
Wouldn't it make things a bit easier to name the generic version of the journal writer something like GenericKafkaJournalWriter and have KafkaJournalWriter = GenericKafkaJournalWriter[BaseModel]? (for backward compat)
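A minimal sketch of that naming scheme, assuming nothing about the real writer beyond the two names in the comment. The BaseModel stand-in, the serializer argument, the write_addition method and the in-memory `sent` list are all hypothetical and only there to make the example runnable:

```python
from typing import Any, Callable, Generic, List, Tuple, TypeVar


class BaseModel:
    """Stand-in for the real model base class (hypothetical here)."""

    def __init__(self, object_type: str, payload: Any) -> None:
        self.object_type = object_type
        self.payload = payload


TValue = TypeVar("TValue")


class GenericKafkaJournalWriter(Generic[TValue]):
    """Journal writer parameterized over the type of value it accepts."""

    def __init__(self, serializer: Callable[[TValue], bytes]) -> None:
        self._serializer = serializer
        # Kept in memory for the sketch; a real writer would hand
        # the serialized messages to a Kafka producer instead.
        self.sent: List[Tuple[str, bytes]] = []

    def write_addition(self, object_type: str, value: TValue) -> None:
        self.sent.append((object_type, self._serializer(value)))


# Backward-compatible name: existing imports of KafkaJournalWriter keep working.
KafkaJournalWriter = GenericKafkaJournalWriter[BaseModel]

writer = KafkaJournalWriter(lambda obj: repr(obj.payload).encode())
writer.write_addition("content", BaseModel("content", {"length": 3}))
```

The alias is a subscripted generic, so calling it still builds a plain GenericKafkaJournalWriter instance; only the type checker sees the BaseModel parameter.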
This globally LGTM but there is this path encoding issue. The 2 new functions in from_disk.py should take a bytes argument instead of a str one.
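To illustrate the bytes-vs-str point, here is a sketch of a path-taking helper that only accepts bytes. The name list_entries is hypothetical and is not one of the two functions under review; the sketch only relies on the standard library behaviour that os.listdir returns bytes names when given a bytes path:

```python
import os
import tempfile
from typing import List


def list_entries(path: bytes) -> List[bytes]:
    """Hypothetical helper: sorted entry names of a directory, as bytes."""
    if not isinstance(path, bytes):
        raise TypeError(f"path must be bytes, got {type(path).__name__}")
    # With a bytes argument, os.listdir returns the raw names as bytes,
    # so names that are not valid in the locale encoding are not mangled.
    return sorted(os.listdir(path))


with tempfile.TemporaryDirectory() as tmp:
    root = os.fsencode(tmp)  # str -> bytes using the filesystem encoding
    with open(os.path.join(root, b"readme.txt"), "wb") as f:
        f.write(b"data")
    entries = list_entries(root)
```

Callers that start from a str path can convert once at the boundary with os.fsencode and stay in bytes from there on.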
Oct 21 2020
Oct 19 2020
Oct 16 2020
Same as before but with 1M (fresh) sha1s:
Since the results on uffizi above did suffer from a few caveats, I've made a few more tests:
- a first result has been obtained with a dataset that had only objects stored on the XFS part of the objstorage
- a second dataset has been created (with the order by sha256 part to spread the sha1s)
- but the results are a mix of hot and cold cache tests
Oct 15 2020
Some results:
Current benchmark scenario:
Oct 14 2020
Oct 13 2020
I'm mostly OK with this now, so I'll make it "accepted", but please refactor a bit the cli_run_[n]ok() helper functions before landing it.
In D4193#104860, @ardumont wrote:
Oct 9 2020
same
oh so much yes!
Oct 8 2020
Overall ok, but I would have preferred the renaming be in a dedicated revision, separated from type annotation fixes/additions.
otherwise fine with me
does this require the plugin's entrypoint in swh.core to be removed? (e.g. because of swh.core.pytest_plugin being loaded twice or something like that) or is it safe to apply and use with a swh.core that still declares its pytest_plugin as an entrypoint?
In D4078#103969, @douardda wrote: In D4078#103968, @vlorentz wrote: yes, it would make sense for values. Do you want to open a task for that?
you read my mind :-)
Since this "migration problem" also concerns cassandra, maybe a simple approach would be to add a Final version attribute to all model entities (a simple monotonic integer).
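One way this could look, as a rough sketch only: the Content class, its fields, the to_dict layout and the needs_migration helper are all assumptions made for the example, not the actual model code.

```python
from typing import Any, Dict, Final


class Content:
    # Monotonic integer, bumped by hand whenever the serialized layout
    # of this class changes (assumed convention for this sketch).
    version: Final = 2

    def __init__(self, sha1: str, length: int) -> None:
        self.sha1 = sha1
        self.length = length

    def to_dict(self) -> Dict[str, Any]:
        return {"version": self.version, "sha1": self.sha1, "length": self.length}


def needs_migration(stored: Dict[str, Any]) -> bool:
    """True if a stored row was written by an older version of the class."""
    # Rows written before the attribute existed are treated as version 0.
    return stored.get("version", 0) < Content.version


old_row = {"sha1": "ab", "length": 3}
new_row = Content("ab", 3).to_dict()
```

A consumer reading from either backend could then decide per object whether a row needs rewriting, instead of relying on a global schema version.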
In D4078#103941, @vlorentz wrote: In D4078#103938, @douardda wrote: maybe stupid question, but why using dict as unique key (in many model classes)? Why not use a tuple? I mean it seems to me that such a UID should be usable as dict keys or in a set directly.
I don't know, I just copied what we were already doing in swh-journal. Dicts have the nice property of being somewhat "self-documenting" though.
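The trade-off discussed here can be shown in a few lines. This is only an illustration: OriginVisitKey and its fields are hypothetical names, and a NamedTuple is just one possible way to get a hashable key that stays "self-documenting".

```python
from typing import NamedTuple


class OriginVisitKey(NamedTuple):
    """Hypothetical unique key with named fields, hashable like any tuple."""

    origin: str
    visit: int


dict_key = {"origin": "https://example.org/repo", "visit": 1}
tuple_key = OriginVisitKey(**dict_key)

# A tuple-based key can go straight into a set or be used as a dict key.
seen = {tuple_key}

# A dict cannot: building a set from it raises TypeError (unhashable type).
try:
    {dict_key}
    dict_is_hashable = True
except TypeError:
    dict_is_hashable = False
```

The named fields keep the readability of the dict form, and `tuple_key._asdict()` gives the dict back when a serializer needs it.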
In D4078#103940, @douardda wrote: Also (most probably dumb idea, writing as it pops in my mind), wouldn't it make sense to add some kind of 'per-object class model version' in the key?
This would prevent compacting away old versions of objects. Is this something we want?
In D4194#103939, @vlorentz wrote: microsecond in postgres, millisecond in cassandra.
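A small sketch of what that precision gap means when comparing a value across the two backends. The helper name and the sample timestamp are made up for the example:

```python
from datetime import datetime, timezone


def truncate_to_milliseconds(dt: datetime) -> datetime:
    """Drop sub-millisecond digits, as a millisecond-precision store would."""
    return dt.replace(microsecond=dt.microsecond // 1000 * 1000)


# Microsecond precision survives a round-trip through postgres...
original = datetime(2020, 10, 8, 12, 0, 0, 123456, tzinfo=timezone.utc)
# ...but only milliseconds survive a round-trip through cassandra.
stored_ms = truncate_to_milliseconds(original)

# Comparing the raw values fails; comparing after truncation succeeds.
raw_equal = original == stored_ms
truncated_equal = truncate_to_milliseconds(original) == stored_ms
```

Tests that must pass against both backends can either truncate before comparing, as above, or only generate dates that already have millisecond precision.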
The split in 2 revisions is not mandatory, just sayin' for good measure.
looks good (did not even notice toDict() is not even a recursive method! so this dict_nodes really makes no sense at all).
In D4078#103938, @douardda wrote: maybe stupid question, but why using dict as unique key (in many model classes)? Why not use a tuple? I mean it seems to me that such a UID should be usable as dict keys or in a set directly.
dates are not unique (ie. multiple visits can share a date, and they
do in practice); and visit statuses already use visit ids in their
unique key.
In D4193#103804, @zack wrote: Thanks, even though this is a little bit disturbing discrepancy wrt swh-scanner exclusion mechanism,
can you please remove the "noise" added by arc in the commit message? And update it (still the previous option name in there).
Oct 7 2020
Oct 2 2020
Maybe starting a pad/hackmd document would be easier at this point?
In D4131#102303, @douardda wrote: this is debatable, but it does "normalize" the given url, so it does something. I agree the https:// auto-add prefix is strange, but the trailing / still brings value. For example there are listers that do not implement this, so if you create a listing task with url=https://somehere.org/api/v1 it will fail because it will forge invalid urls (missing the trailing /).
[edit] and I find this very annoying
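For illustration, a sketch of the normalization being discussed and of one way a missing trailing / can produce wrong urls. The normalize_url function is a hypothetical reconstruction, not the code under review, and the use of urljoin is an assumption about how a lister might build its endpoints:

```python
from urllib.parse import urljoin


def normalize_url(url: str) -> str:
    """Hypothetical normalization: add a scheme if missing, ensure a trailing /."""
    if "://" not in url:
        url = "https://" + url
    if not url.endswith("/"):
        url += "/"
    return url


raw = "https://somehere.org/api/v1"

# Without the trailing /, urljoin replaces the last path segment ("v1").
broken = urljoin(raw, "projects")
# With it, the endpoint is appended below /api/v1/ as intended.
fixed = urljoin(normalize_url(raw), "projects")
```

Here `broken` ends up pointing at /api/projects while `fixed` points at /api/v1/projects, which is the kind of failure the trailing / guards against.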
In D3334#95217, @vlorentz wrote: @douardda ping?
Oct 1 2020
Sep 30 2020
Listed (oneshot full + recurring incremental) and loaded (as far as I can tell).
Sep 29 2020
(just updated my swh-env, now I see where this diff comes from :-) )
sure
LGTM (except the "rm -f ../$module.log " for which I am not convinced it's a good idea)
I've sent an email to the fsfe.
Sep 28 2020
Can this be closed now? What's missing? Adding a listing task?
Sep 25 2020
and with the pytest.ini hunk we don't need a (non working) dependency on swh.core[testing]
add a clarification in the ci message for the pytest.ini hunk
In D4045#100032, @ardumont wrote: ah yeah, it'd be best to align indeed.
Sep 24 2020
crumbs everywhere
sure
In D4012#99525, @olasd wrote: I don't think the origin url and visit type should be sent in the task result; they're arguments of the task already.
If we want them logged by the worker when the task ends (which I agree would be useful), then we should improve logging on the worker/celery side to show some of the task arguments (for instance, if there's a "url" argument) instead of / in addition to the task id.
This is probably mostly deprecated now that we have mypy et al. Also the reporting via warnings-ng-plugin may not be such a priority now.
I'd like to close this task, but unfortunately:
I guess we can say that, let's close this.
I guess this can be closed now
thx a lot
sure go