rebase + review comments
Nov 6 2020
split code blocks in 2 as requested by vlorentz
Nov 5 2020
It remains unclear to me how this diff addresses the original timeout problem. I see the benefit of having a lazy loading mechanism and a cache to reduce the load, but not how they prevent the timeouts from occurring in the first place (a timeout means the cache never gets filled).
Nov 4 2020
I also think this external_identifier should go away; the spec is rich (aka complicated) enough without us adding more layers :-)
also note that making the slug a MUST (server-side) is not valid w.r.t. the specs ("The client MAY supply a Slug header")
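To illustrate the point about the Slug header being optional, here is a minimal sketch of a server-side helper that accepts a request with or without it. The function name `resolve_slug` and the fallback to a generated identifier are assumptions made for this example, not what the diff or the spec prescribes.

```python
import uuid
from typing import Callable, Mapping


def resolve_slug(
    headers: Mapping[str, str],
    make_default: Callable[[], str] = lambda: uuid.uuid4().hex,
) -> str:
    """Return the client-supplied Slug if present, else a generated value.

    Hypothetical helper: the fallback policy is an assumption for this sketch.
    """
    # HTTP header names are case-insensitive, so compare in lower case.
    for name, value in headers.items():
        if name.lower() == "slug" and value.strip():
            return value.strip()
    # No usable Slug header: do not reject the request, pick a default.
    return make_default()


with_slug = resolve_slug({"Slug": "my-deposit"})
without_slug = resolve_slug({"Content-Type": "application/zip"})
```

The key design choice is that a missing header is never an error, which keeps the server in line with the "MAY" wording.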
same as the previous one, ok once green
I'm fine with this, can be landed as soon as tests are ok.
Any reason for not landing this? (@marmoute ? do you keep your request for changes?)
Nov 2 2020
Oct 30 2020
Oct 29 2020
Oct 28 2020
see comments
So the test fails on Jenkins because it uses the hg command from the system (since mercurial is, oddly enough, not a dependency of swh-loader-mercurial), and on stretch, mercurial is 4.8.
Using mercurial 5.5 is ok.
In D4354#108710, @moranegg wrote: Is this chapter saved anywhere? Or is it so deprecated that it shouldn't be saved?
rebase
Oct 27 2020
Define and use the SWH acronym
fixes and improvements suggested by moranegg (big thx)
typos (thx ardumont)
I would expect the commit message to be a bit more explanatory: either this new test case covers aspects that were not tested before, and it should say so, or it does not, and it should say that too, explaining that this new test is the base for future extended ones in a more manageable way (which is what "updatable" stands for, if I get this right).
Oct 26 2020
should be ok now (even if via ImmutableDict :-) )
See also T2706
In D4082#107701, @vlorentz wrote: In D4082#107582, @douardda wrote: Wouldn't it make it a bit easier to name the generic version of the journal writer something like GenericKafkaJournalWriter and have KafkaJournalWriter = GenericKafkaJournalWriter[BaseModel]? (for bw compat)
Why? This change won't break any code using KafkaJournalWriter
Oct 23 2020
closed by 2b869aa7d30d099ed6146d9f8dc667cd7a8eefc3
Also the commit message should give a bit more information on what this new script is needed for, maybe with a usage example.
This defines a bunch of commands. When and how should "I" use them?
please do not put the "depends on Dxxx" line in the git commit message.
ok on the diff itself, but what is this new example repo needed for? This should be explained in the commit message (the "why"! Always insist on the "why" rather than, or in addition to, the "what" in your commit messages, please).
ok but please properly document arguments in docstrings.
Oct 22 2020
globally ok, but please add a README file as suggested in the previous comment
Would be nice to have a README file in tests/data explaining what these json files are and how to produce them.
Wouldn't it make it a bit easier to name the generic version of the journal writer something like GenericKafkaJournalWriter and have KafkaJournalWriter = GenericKafkaJournalWriter[BaseModel]? (for bw compat)
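A minimal sketch of the naming suggestion, to make it concrete. The class bodies below are simplified stand-ins written for this example (the real writer and BaseModel have different constructors and methods); only the two class names and the alias line come from the comment.

```python
from typing import Generic, List, Tuple, TypeVar


class BaseModel:
    """Stand-in for the model base class (hypothetical, illustration only)."""


TValue = TypeVar("TValue")


class GenericKafkaJournalWriter(Generic[TValue]):
    """Simplified stand-in: records writes in memory instead of using Kafka."""

    def __init__(self) -> None:
        self.written: List[Tuple[str, TValue]] = []

    def write_addition(self, object_type: str, object_: TValue) -> None:
        # The real writer would serialize and send the object; we just keep it.
        self.written.append((object_type, object_))


# Backward-compatible name: code importing KafkaJournalWriter keeps working.
KafkaJournalWriter = GenericKafkaJournalWriter[BaseModel]

writer = KafkaJournalWriter()
writer.write_addition("revision", BaseModel())
```

One caveat with this approach: a subscripted alias can be called to create instances, but it cannot be used as the second argument of isinstance(), so such checks would have to target GenericKafkaJournalWriter instead.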
This globally LGTM but there is this path encoding issue. The 2 new functions in from_disk.py should take a bytes argument instead of a str one.
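As a sketch of what taking a bytes argument looks like in practice: the helper below is hypothetical (it is not one of the two functions in from_disk.py) and only shows that passing a bytes path to the os module keeps entry names as bytes, so no decoding step can fail on an oddly encoded file name.

```python
import os
import tempfile
from typing import List


def list_entries(path: bytes) -> List[bytes]:
    """Hypothetical helper: list a directory, keeping names as raw bytes."""
    if not isinstance(path, bytes):
        raise TypeError("path must be bytes, got %s" % type(path).__name__)
    # os.listdir returns bytes names when it is given a bytes path.
    return sorted(os.listdir(path))


with tempfile.TemporaryDirectory() as tmp:
    root = os.fsencode(tmp)  # convert the str path to bytes once, at the edge
    with open(os.path.join(root, b"a.txt"), "wb") as f:
        f.write(b"hello")
    entries = list_entries(root)
```

Converting with os.fsencode once at the boundary and working with bytes everywhere after that avoids mixing the two types further down.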
Oct 21 2020
Oct 19 2020
Oct 16 2020
Same as before but with 1M (fresh) sha1s:
Since the results on uffizi above did suffer from a few caveats, I've made a few more tests:
- a first result has been obtained with a dataset that had only objects stored on the XFS part of the objstorage
- a second dataset has been created (with the order by sha256 part to spread the sha1s)
- but the results are a mix of hot and cold cache tests
Oct 15 2020
Some results:
Current benchmark scenario:
Oct 14 2020
Oct 13 2020
I'm mostly OK with this now, so I'll mark it "accepted", but please refactor the cli_run_[n]ok() helper functions a bit before landing it.