Sent a summary of this discussion to the swh-devel list for input:
I think the main challenge here will be doing this in such a way that we don't have to do a fresh clone of swh-environment (and all associated repos) every time we build.
I would like us to conclude this discussion soon.
Fri, Oct 15
The permissions were missing for consumer groups, so no consumer could get started at all.
Looks sensible to me, thanks.
21:57 guest@softwareheritage =>
  select count(distinct id)
  from revision_history
  where not exists (select 1 from revision where id = parent_id);
 count
───────
  2218
(1 row)
Thu, Oct 14
I've run ALTER SYSTEM commands to bump these configuration variables in $DATADIR/postgresql.auto.conf, then ran pg_reload_config():
The log is flooded with
2021-10-14 15:24:54.422 UTC  LOG:  checkpoints are occurring too frequently (28 seconds apart)
2021-10-14 15:24:54.422 UTC  HINT:  Consider increasing the configuration parameter "max_wal_size".
17:19:13 +olasd ╡ the postgresql tuning hasn't happened yet, afaict? effective_cache_size isn't set, and shared_buffers is tiny
17:19:46 ⤷ ╡ I'd bump shared_buffers to 128 GB and effective_cache_size to 256 GB, see where that gets you
17:20:19 ⤷ ╡ and probably maintenance_work_mem to something like 16 or 32 GB
17:20:54 ⤷ ╡ as well as random_page_cost to something lower like 1.5
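For reference, the ALTER SYSTEM invocations alluded to above might look something like this (a sketch only: the values come from the IRC suggestions, and the max_wal_size figure is a guess to be tuned against the checkpoint warnings):

```sql
-- Sketch: values from the IRC discussion above; adjust to the host's RAM.
ALTER SYSTEM SET shared_buffers = '128GB';        -- note: needs a restart, not just a reload
ALTER SYSTEM SET effective_cache_size = '256GB';
ALTER SYSTEM SET maintenance_work_mem = '32GB';
ALTER SYSTEM SET random_page_cost = 1.5;
ALTER SYSTEM SET max_wal_size = '16GB';           -- guessed value, to quiet the checkpoint warnings
SELECT pg_reload_config();
```

Note that ALTER SYSTEM writes these settings to $DATADIR/postgresql.auto.conf, and that shared_buffers only takes effect after a full server restart.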
Ah, now that I read through this again; would it make sense for the zookeeper server to be called using the CNAME instead of the host FQDN ?
Looks good, except for a missing new TLS certificate, I think.
I was thinking of something ad-hoc such as:
In SWHIDv2, instead of having a hardcoded "pointer to another revision" directory entry type, we could enable pointers to more generic "unresolved external entities". When possible, we should make these pointers compatible with the current ExtID table, so that users of the data can lazily look up the contents of the pointed-to objects.
(I've removed T3653 as a parent, as this is a somewhat longer-term endeavour: not the topological sorting itself, but making sure that (most) existing revisions aren't dangling before we can rely on this topological guarantee.)
Wed, Oct 13
Tue, Oct 12
Thanks for working on reducing the number of hypothesis fixtures!
Fri, Oct 8
Hmm, do we really want this to be open to the world with no authentication whatsoever? (which is what D6448 seems to be doing)
Fix revision -> release typo in release_add flush call
I'll split off the new buffer thresholds into a new diff. This diff now only contains the (small) improvements to the buffer/filter proxies.
Thu, Oct 7
Ah, another question I've been thinking about: should we go back to existing visits of git repositories and give them a new, pruned snapshot? Our data model now allows it: we can just append a new final OriginVisitStatus pointing at a pruned snapshot.
rSPSITE6a233452cd48 fixed the prometheus node exporter.
Awesome, thanks for confirming this!
I'm asking this because predictable origin-centric URLs are generally much more user-friendly than having to use multiple APIs to look up the SWHID of a given object before being able to construct the URL; one would always have to make dynamic API calls to generate the URL for browsing the "latest archival" of a given origin.
Just to be clear, you're looking to keep these URLs working, but turn them into redirects over to SWHID-centric URLs with context parameters (and drop the original view code from these URLs), correct?
While we're at it, we should probably be adding some thresholds in the buffer proxy for:
- cumulated length of messages for revisions and releases
- cumulated number of parents for revisions
(this also matches the fact that we've seen, on our main ingestion database, directory_add operations that would take multiple hours, and have knock-on effects on backups and replications because of the long-running insertion transactions)
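A minimal Python sketch of what such cumulative thresholds could look like in a buffer proxy (all class names, field names, and limit values here are hypothetical illustrations, not the actual swh.storage interface):

```python
# Hypothetical sketch of cumulative flush thresholds for a buffer proxy.
# Names and default limits are illustrative, not the real swh.storage API.

class BufferedRevisions:
    def __init__(self, max_total_message_len=10_000_000, max_total_parents=100_000):
        self.max_total_message_len = max_total_message_len
        self.max_total_parents = max_total_parents
        self.buffer = []
        self.total_message_len = 0
        self.total_parents = 0

    def add(self, revision):
        """Buffer one revision dict; return True when the cumulated
        message length or parent count says it's time to flush."""
        self.buffer.append(revision)
        self.total_message_len += len(revision.get("message") or b"")
        self.total_parents += len(revision.get("parents", ()))
        return (
            self.total_message_len >= self.max_total_message_len
            or self.total_parents >= self.max_total_parents
        )

    def flush(self):
        """Hand back the buffered batch and reset the counters."""
        batch, self.buffer = self.buffer, []
        self.total_message_len = 0
        self.total_parents = 0
        return batch
```

The point of tracking cumulated sizes (rather than only object counts) is that a handful of revisions with huge messages or parent lists can be as expensive to insert as thousands of ordinary ones.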
So, after doing some more analysis of memory usage patterns on these edge-case repositories, my suspicion is that the high memory usage is generally caused by the loader processing batches of large directories, closely packed together, at the same time.
This should stay pending until we resolve the archiving policy discussion in T3627, so I'm marking it as such.
Wed, Oct 6
Looks fine (i.e. the identifiers DeprecationWarnings are gone in tox, except for one that gets triggered by some pytest internal assertion rewrite).
Rather than doing this, we should probably disable worker task events altogether (that is, run celery worker without the --events/--task-events flag)
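Besides omitting the command-line flag, this can also be expressed in the Celery app configuration; a sketch, where "myapp" is a placeholder module name:

```python
# Sketch: disable worker task events via Celery configuration.
# "myapp" is a hypothetical placeholder for the actual app module.
from celery import Celery

app = Celery("myapp")
# Don't emit task-related events from workers
# (equivalent to running the worker without -E/--task-events):
app.conf.worker_send_task_events = False
# Also skip the task-sent event emitted on the producer side:
app.conf.task_send_sent_event = False
```

This keeps the event machinery off by default, rather than relying on every deployment remembering not to pass the flag.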
This looks like an okay thing to do, but instead of only ignoring results (which would only cut down a third of the messages), we should probably be deactivating events completely on these workers.