User Details
- User Since
- Sep 7 2015, 3:25 PM
- Roles
- Administrator
Today
rebase
add inline comment suggested by @vlorentz
Yesterday
After the full backfill:
The backfill is now complete, and the scheduler journal client has processed its backlog of messages.
Tue, Feb 23
Rather than the "it was completely broken" comment (it wasn't really broken, afaict), could you say what actually happened in this commit? (afaict, you've added the unescaping of path qualifiers and fixed the serialization of lines qualifiers).
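For context, roughly what those two fixes mean in practice (an illustrative sketch following the SWHID qualifier syntax; helper names are made up, this is not the actual swh.model code):

```python
from urllib.parse import quote, unquote_to_bytes

def escape_path(path: bytes) -> str:
    # ';' separates qualifiers and '%' introduces escapes, so both must
    # be percent-encoded inside a path qualifier value
    return quote(path, safe="/")

def unescape_path(escaped: str) -> bytes:
    # reverse the percent-encoding applied when rendering the SWHID
    return unquote_to_bytes(escaped)

def serialize_lines(lines: tuple) -> str:
    # lines is a (start, end) pair, with end possibly None
    start, end = lines
    return str(start) if end is None else f"{start}-{end}"
```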
Very nice.
via D5127
I guess mypy properly checks that the type of the <SWHIDClass>.object_type attribute matches the type parameter of the _BaseSWHID base class, and we don't need to add a test for that?
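For reference, a minimal sketch of the pattern being discussed, assuming _BaseSWHID is generic over the object-type enum (the names follow the diff, but the details here are illustrative, not the actual swh.model code):

```python
from enum import Enum
from typing import Generic, TypeVar

class ObjectType(Enum):
    CONTENT = "cnt"
    SNAPSHOT = "snp"

_TObjectType = TypeVar("_TObjectType", bound=Enum)

class _BaseSWHID(Generic[_TObjectType]):
    object_type: _TObjectType

class CoreSWHID(_BaseSWHID[ObjectType]):
    # mypy checks this assignment against the ObjectType parameter;
    # e.g. object_type = "cnt" would be flagged as a type error
    object_type = ObjectType.CONTENT
```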
This looks fine.
Ah, one more change I thought of while looking at D5118: please add a test checking that every value in the SWHID_QUALIFIERS list corresponds to an attribute of QualifiedSWHID.
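Something along these lines (a hypothetical sketch; the import path and the assumption that QualifiedSWHID is an attrs class are mine):

```python
import attr
from swh.model.identifiers import SWHID_QUALIFIERS, QualifiedSWHID

def test_qualifiers_match_qualified_swhid_attributes():
    # every documented qualifier should be an attribute of QualifiedSWHID
    attribute_names = {field.name for field in attr.fields(QualifiedSWHID)}
    for qualifier in SWHID_QUALIFIERS:
        assert qualifier in attribute_names
```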
Thanks!
After checking with @ardumont, I turned cron back on on worker[0-2].staging and ran puppet on them.
Looks good overall, but I found a small issue.
Mon, Feb 22
I had an immediate resurgence of the issue when restarting the backfill of the missing chunks, so I hot-patched a retry on BufferErrors in swh.journal.writer.kafka. Glancing at sentry, we never had this issue in prod. The hot patch seems to work; I'll write tests for it and submit it as a diff tomorrow.
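The hot patch is roughly this shape (an illustrative sketch using confluent-kafka's Producer, whose produce() raises BufferError when the local queue is full; this is not the actual swh.journal.writer.kafka code):

```python
from confluent_kafka import Producer

def produce_with_retry(producer: Producer, topic: str, key: bytes, value: bytes) -> None:
    while True:
        try:
            producer.produce(topic=topic, key=key, value=value)
            return
        except BufferError:
            # the local producer queue is full: serve delivery callbacks
            # to drain it, then retry the produce call
            producer.poll(1)
```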
After landing rDSCHecab745a5f20, we got partway there, but we still had only 98m origins in the visit stats table.
Looks like all servers (in icinga) have recovered their time sync, except for worker{0,1,2} in staging (on which cron seems to be off?).
(yes, I cheated and Just Pushed the diff to get the changes to show up here...)
I've added an NTP dashboard to grafana from the data already collected by the prometheus node exporter.
In general I think we should migrate machines to the default ntp pool of our distribution ([0-3].debian.pool.ntp.org).
Wed, Feb 17
We need to make sure that other loaders really don't use *args/**kwargs, but yeah, that looks fine.
Tue, Feb 16
Maybe it would be worth putting the very small objects (e.g. those <= the min alloc size) into a 3- or 4-way mirrored pool instead of an erasure-coded pool.
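Back-of-the-envelope arithmetic for why (the min alloc size and EC profile below are assumptions for illustration, not our actual ceph settings):

```python
MIN_ALLOC = 16 * 1024  # assumed bluestore min_alloc_size: 16 KiB
K, M = 4, 2            # assumed erasure coding profile: 4 data + 2 parity shards

def replicated_bytes(size: int, copies: int = 3) -> int:
    # each replica occupies at least one allocation unit
    return copies * max(size, MIN_ALLOC)

def erasure_coded_bytes(size: int) -> int:
    # each of the k+m shards stores size/k bytes, padded to min alloc
    shard = -(-size // K)  # ceiling division
    return (K + M) * max(shard, MIN_ALLOC)

for size in (4 * 1024, 16 * 1024, 1024 * 1024):
    print(f"{size:>8} B: 3x replicated={replicated_bytes(size):>8} B, "
          f"EC {K}+{M}={erasure_coded_bytes(size):>8} B")
# for objects <= MIN_ALLOC, 3-way replication (48 KiB) beats EC 4+2 (96 KiB);
# only for larger objects does erasure coding pay off (1 MiB: 3 MiB vs 1.5 MiB)
```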
@zack, very good point about having a target for the "time to first byte when reading an object".
Here's the output of the following query, which computes exact aggregates for objects smaller than the size boundaries of the original quartiles:
@ardumont points out that the base PackageLoader doesn't inherit from BaseLoader, which explains the new (common) base class. I think the new class could just as well be next to BaseLoader, and doesn't warrant the introduction of a pattern module.
This is great, thanks!
Mon, Feb 15
Fri, Feb 12
So, there are still a few separate issues in this task, which I'll try to spell out (at least for my own sake):
Thu, Feb 11
Tue, Feb 9
Perfect, thank you!
Mon, Feb 8
I'm completely stumped.
Wouldn't it make sense to add an evolved(**kwargs) method to our BaseModel objects (or an evolve function in swh.model.model) to wrap this operation?
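A minimal sketch of what I mean, assuming BaseModel is an attrs class (as in swh.model.model), so attr.evolve can build the updated copy; the name evolved and the Origin subclass are just for illustration:

```python
import attr

@attr.s(frozen=True, slots=True)
class BaseModel:
    def evolved(self, **kwargs):
        """Return a copy of this object with the given fields replaced."""
        return attr.evolve(self, **kwargs)

# usage on a hypothetical subclass:
@attr.s(frozen=True, slots=True)
class Origin(BaseModel):
    url = attr.ib(type=str)

origin = Origin(url="https://example.org/repo.git")
updated = origin.evolved(url="https://example.org/other.git")
```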
I still find it slightly unfortunate that this logic is separate from the one used for revisions/releases, but I guess it makes sense, as the "input types" are so dissimilar.
Here's my understanding of the status of the migration to the next-generation scheduler as of today:
Thu, Feb 4
Flip with D5018; Flush in the right order
Rebase and factor out the summary update function
This is a duplicate of T75, the history of which would probably be useful to take into account (I suspect it can be closed).
Wed, Feb 3
So I've hotpatched this in production, and it seems to have cleared the current blockage, which is nice.
After mulling this over with @zack and looking at the starved worker logs for a while, I suspect we're also being bitten by our (early, early) choice of using celery acks_late, which only acknowledges tasks once they're done: when a worker is OOM-killed, it never sends task acknowledgements to rabbitmq, which keeps re-sending it the tasks.
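For reference, a minimal sketch of the setting in question (the broker URL and task below are hypothetical, not our actual configuration):

```python
from celery import Celery

app = Celery("swh", broker="amqp://rabbitmq//")
app.conf.task_acks_late = True  # ack after execution, not on receipt

@app.task
def load_origin(url: str) -> None:
    ...  # if the worker is OOM-killed here, the message is never acked
         # and rabbitmq redelivers it to another worker
```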