As a test case, we've been asked to archive this small (for now) GitLab instance: https://gitlab.etsit.urjc.es/
We can easily find tons of other small public instances for further testing.
Wed, Feb 20
- runner: fine (no crash, no restart)
- listener: fine (same)
- workers: fine (same)
Tue, Feb 19
WIP as the new version has been deployed.
Let's see whether the errors still occur.
I've done the upgrade on saatchi and restarted both listener and runner. I've removed the runner restart from the saatchi crontab.
I've pushed an updated kombu to our repository.
Sun, Feb 17
A bunch of Celery workers (loader*, lister*) do indeed show a ConnectionResetError stack trace (not necessarily the same one):
Sat, Feb 16
As per our pair-programming session yesterday, I think we can now reproduce this in production (with the runner at least).
Thu, Feb 14
Thu, Feb 7
Also take a look at Dramatiq: https://dramatiq.io/
Wed, Feb 6
Thu, Jan 31
As a minor improvement, I suggest switching from nouns to verbs, so: load-git, load-hg-archive, list-gitlab-full, etc. Rationale: "do this" is the semantics associated with a message.
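To illustrate the verb-first naming, here is a minimal sketch of a name-to-handler registry. The `task` decorator, the `TASK_HANDLERS` dict, the `dispatch` function and the message layout are all hypothetical and only serve the example; they are not the scheduler's actual API.

```python
# Hypothetical registry mapping verb-style task names to handlers.
TASK_HANDLERS = {}


def task(name):
    """Register a handler under a verb-style task name (illustrative only)."""
    def register(func):
        if name in TASK_HANDLERS:
            raise ValueError("task name already registered: %s" % name)
        TASK_HANDLERS[name] = func
        return func
    return register


@task("load-git")
def load_git(url):
    # Placeholder body: a real handler would run the git loader.
    return "loading git repository %s" % url


@task("list-gitlab-full")
def list_gitlab_full(url):
    # Placeholder body: a real handler would run a full GitLab listing.
    return "listing all projects of %s" % url


def dispatch(message):
    """Look up the handler named by the message and call it with its arguments."""
    handler = TASK_HANDLERS[message["type"]]
    return handler(**message["arguments"])
```

With verb-style names, a message such as `{"type": "load-git", "arguments": {"url": ...}}` reads directly as an instruction to the worker.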
+1 for this need, and +1 also to the initial draft by @ardumont.
That'd be great.
Tue, Jan 29
We should clean them up.
I agree we must be careful not to bloat swh.core, but for the current subject, it really makes sense to me to put this basic db access wrapper in as core functionality.
Fri, Jan 25
Thu, Jan 24
I confirm that, so far, I see neither "ConnectionResetError: [Errno 104] Connection reset by peer" nor "BrokenPipeError: [Errno 32] Broken pipe" in the runner logs with kombu from git master.
Jan 23 2019
Jan 18 2019
Jan 17 2019
Jan 15 2019
Jan 14 2019
Jan 8 2019
Dec 18 2018
Tip: we can do something based on the collect_extra_debug_data defined here: https://github.com/ProgVal/Limnoria/blob/master/src/utils/python.py
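As a rough sketch of that idea (not a copy of the Limnoria function), the snippet below walks the frames of an exception's traceback and dumps each frame's local variables next to the standard traceback text. The function names `collect_frame_locals` and `format_exception_with_locals`, and the `max_value_len` parameter, are made up for this example.

```python
import traceback


def collect_frame_locals(exc_traceback, max_value_len=200):
    """Return a text dump of the local variables of each traceback frame."""
    lines = []
    tb = exc_traceback
    while tb is not None:
        frame = tb.tb_frame
        code = frame.f_code
        lines.append(
            "Frame %s in %s at line %d"
            % (code.co_name, code.co_filename, tb.tb_lineno)
        )
        for name, value in sorted(frame.f_locals.items()):
            try:
                rendered = repr(value)
            except Exception:
                # A broken __repr__ must not hide the original error.
                rendered = "<unrepresentable>"
            if len(rendered) > max_value_len:
                rendered = rendered[:max_value_len] + "..."
            lines.append("    %s = %s" % (name, rendered))
        tb = tb.tb_next
    return "\n".join(lines)


def format_exception_with_locals(exc):
    """Return the standard traceback text followed by per-frame locals."""
    tb_text = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    return tb_text + "\nLocals by frame:\n" + collect_frame_locals(exc.__traceback__)


if __name__ == "__main__":
    def send(payload):
        attempts = 3
        raise ConnectionResetError(104, "Connection reset by peer")

    try:
        send({"task": "example"})
    except ConnectionResetError as exc:
        print(format_exception_with_locals(exc))
```

Local variables can hold credentials or large payloads, so the output would likely need filtering before being written to shared logs.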
Dec 14 2018