Some flame graphs of the storage were captured during the ingestion with 50 workers in parallel
All Stories
Oct 14 2021
Add the redirection setup as well
Oct 13 2021
+1 on dropping the / -> /devel/ redirect and having a landing page at / that lets users choose between the 3 bodies of documentation.
My script finished running on releases. Result: all 644k releases are recoverable (mostly just missing gpg signatures), except 75k whose origin does not exist anymore.
Some suggestions to attend to prior to merge but otherwise, great ;)
Yes, after a migration, the postgres version got upgraded.
The user role was not correctly configured in the db:
In D6458#167771, @olasd wrote: Yeah, sure, I don't have a problem with that.
In D6458#167737, @vlorentz wrote: @olasd Could you open a task, so anlambert can land this stack of diffs now before we discuss the next step?
Build is green
Build is green
Small discussion about a possible implementation of this:
16:49 <+olasd> ardumont: morane's point is not about having cross links to individual pages; it's about having an entry point and cross referencing the docs *instances* overall
16:49 <+olasd> intersphinx doesn't solve that ...
16:51 <+ardumont> for the main point, i recall we discussed a while back having an index page which would display the main doc instances (user, sysadm, devel)
16:51 <+moranegg> This is a good solution, if each page has a link to this parent page
16:51 <+ardumont> today we have a redirect from docs.s.o to docs.s.o/devel
16:52 <+ardumont> that may probably need to go away and have that main page instead
Rebase
Add missing tests_data parameter to snapshot_swhid fixture
In D6458#167702, @olasd wrote: Thanks for working on reducing the number of hypothesis fixtures!
I'm a bit concerned about the reproducibility of test results, given fixtures that pull random list elements, with no control on the sequence of test executions and on the seed of the python random module when the fixture is called. (Now that I've looked at swh.web.tests.data, I'm even more concerned :-))
I don't have an answer about "what to use?", unfortunately, except just going for exhaustive tests (i.e. running the test functions for *all* values of the origins in the test data set), which doesn't sound very compelling unless the size of the sample dataset is small, which doesn't look to be the case.
https://github.com/pytest-dev/pytest/issues/5463 has some background about concerns with respect to random seeding in tests.
Apart from that, I see that some of the function-level fixtures are doing "heavy" querying on the test data for information that is, in effect, static (e.g. the list of origins with more than two visits, etc.). I wonder if it would be possible to extract this logic to only run it once on initialization of the test data?
I initially wrote: we may want to initialize a single, module scoped seed_storage fixture with all data inserted, and make the storage fixture used by tests a function-scoped fixture which would clone this seed storage instance - I assume some tests have to *write* to the storage, so you can't just have one global read only storage fixture - but I now see that's what swh.web.tests.data does. Maybe _init_tests_data could be turned into that seed_storage module-scoped pytest fixture, instead of the current ad-hoc logic? This would also help us control the random seed used for generating the test data (allowing us to override it to reproduce test results)?
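For illustration only, the seed_storage idea above could look roughly like the sketch below. It is not the actual swh.web code: the TEST_DATA_SEED environment variable, the build_seed_data and clone_data helpers, and the dict standing in for a storage instance are all hypothetical. The data set is generated once per module from an explicitly seeded random.Random, the static query (origins with more than two visits) is computed at that point, and each test gets its own writable copy.

```python
import copy
import os
import random

import pytest

# Hypothetical environment variable: set it to replay a failing run with the same data.
SEED_ENV_VAR = "TEST_DATA_SEED"


def build_seed_data(seed):
    """Generate the whole data set once, from a private, explicitly seeded RNG."""
    rng = random.Random(seed)  # does not touch the global random module state
    origins = [
        {"url": f"https://example.org/repo{i}", "visits": rng.randint(1, 5)}
        for i in range(20)
    ]
    return {
        "seed": seed,
        "origins": origins,
        # Static query computed once here rather than in every function-level fixture.
        "origins_with_multiple_visits": [o for o in origins if o["visits"] > 2],
    }


def clone_data(seed_data):
    """Return an independent copy that a test may freely write to."""
    return copy.deepcopy(seed_data)


@pytest.fixture(scope="module")
def seed_storage():
    # Built once per test module; the seed defaults to 0 unless overridden.
    seed = int(os.environ.get(SEED_ENV_VAR, "0"))
    return build_seed_data(seed)


@pytest.fixture
def storage(seed_storage):
    # Function-scoped: every test starts from a fresh clone of the seed data.
    return clone_data(seed_storage)


def test_write_does_not_leak(storage, seed_storage):
    storage["origins"].append({"url": "https://example.org/new", "visits": 1})
    assert len(seed_storage["origins"]) == 20
```

With a layout like this, rerunning with the same seed value regenerates the same data, so a failure tied to a particular random draw can be reproduced.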
@olasd Could you open a task, so anlambert can land this stack of diffs now before we discuss the next step?
Oct 12 2021
Thanks for working on reducing the number of hypothesis fixtures!
Some runs with the fix:
It globally improves the stability of the benchmark by reducing the timeouts.
Build is green
forgotten print statement...
Remove an unnecessary linefeed
Build is green
create and lookup a Read Shard with a perfect hash
Build has FAILED
create and lookup a Read Shard with a perfect hash