Prior to this commit, we "info" logged no meaningful information [1] about the
origin being loaded.
[1] The only information we have for now is what configuration file we loaded which
i don't find satisfying... D4011
Differential D4012
core.loader: Log information about origin currently being ingested ardumont on Sep 22 2020, 5:33 PM. Authored by
Details
Prior to this commit, we "info" logged no meaningful information [1] about the [1] The only information we have for now is what configuration file we loaded which tox
Diff Detail
Event TimelineComment Actions Build is green Patch application report for D4012 (id=14154)Rebasing onto 452fa224f9... Current branch diff-target is up to date. Changes applied before testcommit 45189cc35882d2238be8db5a7e965fe1f414e98c Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Tue Sep 22 17:31:31 2020 +0200 core.loader: Log information about origin currently being ingested Prior to this commit, we "info" logged no meaningful information about the origin being loaded. With the following, we currently will have more context about the status, type and origin being loaded. This concerns the dvcs loaders for now (git, svn, mercurial) See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/301/ for more details.
Comment Actions Build is green Patch application report for D4012 (id=14157)Rebasing onto b5d607989b... First, rewinding head to replay your work on top of it... Fast-forwarded diff-target to base-revision-82-D4012. Changes applied before testSee https://jenkins.softwareheritage.org/job/DCORE/job/tests-on-diff/82/ for more details. Comment Actions Build is green Patch application report for D4012 (id=14158)Rebasing onto 452fa224f9... Current branch diff-target is up to date. Changes applied before testcommit b563bd610e92564eb74d1b8106fe09a07e8acc44 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Tue Sep 22 17:31:31 2020 +0200 core.loader: Log information about origin currently being ingested Prior to this commit, we "info" logged no meaningful information about the origin being loaded. With the following, we currently will have more context about the status, type and origin being loaded. This concerns the dvcs loaders for now (git, svn, mercurial) See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/302/ for more details. Comment Actions I don't think the origin url and visit type should be sent in the task result; they're arguments of the task already. If we want them logged by the worker when the task ends (which I agree would be useful), then we should improve logging on the worker/celery side to show some of the task arguments (for instance, if there's a "url" argument) instead / in addition of the task id.
Comment Actions
Yes, ok. Except that information is not immediately seen in logs output, or is it?
Yes, that sounds like the right place. Although, I don't really know what code To be clear, my main issue today, when I try to look through our logs to In my mind, the kibana information is not enough by itself, so i think i need
Comment Actions ok, i'll adapt
Comment Actions Simplify to just one log statement (the other improvment should go deeper in the stack) Comment Actions Build is green Patch application report for D4012 (id=14223)Rebasing onto 7ed1927b99... First, rewinding head to replay your work on top of it... Applying: core.loader: Log information about origin currently being ingested Changes applied before testcommit c5ce6ef3d34d234ea1cb5937a7345c6da2e5fda1 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Thu Sep 24 16:02:46 2020 +0200 core.loader: Log information about origin currently being ingested Prior to this commit, we "info" logged no meaningful information about the origin being loaded. With the following commit, we will be able to tell what origin/visit-type is happening. See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/303/ for more details. Comment Actions Yeah, I totally get it. Our logging handler swh.core.logger.JournalHandler already knows how to pull some metadata from the celery tasks: https://forge.softwareheritage.org/source/swh-core/browse/master/swh/core/logger.py;v0.3.0$44-49 This data appears as extra fields in the logs handled by systemd-journald (you can use, for instance, journalctl -f SWH_TASK_NAME=swh.loader.git.tasks.UpdateGitRepository), and therefore end up as named fields in the messages stored in kibana (I guess the field name ends up lowercased there). rDCORE806f583 has dropped task arguments from the default systemd logging because:
I think it would work to hardcode a few task argument fields, now that we've normalized a lot of them:
I've started doing this and documenting it a little better, I'll submit a diff ASAP. I would prefer to have a hook in the celery logging infrastructure to add *explicit* extra fields to the logging calls (because doing it at the handler level makes this only happen with the systemd journald handler, that we have in prod, but it will not happen in other contexts), but I'm not even sure that's doable. Using the "ambient" current_task is a bit ugly but it works...
Comment Actions Adapt according to suggestion
Comment Actions Build is green Patch application report for D4012 (id=14257)Rebasing onto 7ed1927b99... Current branch diff-target is up to date. Changes applied before testcommit bdd29d41e0aed612da349efd10d80c6243e43c85 Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Date: Thu Sep 24 16:02:46 2020 +0200 core.loader: Log information about origin currently being ingested Prior to this commit, we "info" logged no meaningful information about the origin being loaded. With the following commit, we will be able to tell what origin/visit-type is happening. This also adds some extra assertion failure message to explicit why the self.origin and self.visit should be set and where to look if not. See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/304/ for more details. Comment Actions
Thanks a lot for clarifying that part! |