Jan 5 2021
In T1414#55819, @vsellier wrote: Can this task be closed since the subject was addressed in T2620 ?
Can this task be closed since the subject was addressed in T2620 ?
Nov 17 2020
anlambert closed T1768: Add end to end tests for the frontend part of swh-web, a subtask of T1411: reach a minimum of 80% SLOC coverage across all components, as Resolved.
Sep 22 2020
olasd closed T1435: Improve swh-scheduler prometheus metrics, a subtask of T1408: More/better Metrics, as Resolved.
We've definitely improved on this (notably by using proper hostnames for the instance label on Prometheus metrics). I think we should make this task more actionable if we want to keep it open.
olasd closed T1438: Add labels to prometheus metrics to help queries, a subtask of T1408: More/better Metrics, as Resolved.
Sep 16 2020
ardumont closed T1386: Refactor indexers' initialization step, a subtask of T1410: Kill implicit configuration: new configuration scheme, as Wontfix.
Sep 8 2020
Sep 4 2020
Wikimedia is using Netbox as the source of truth for their infrastructure, and Puppet configures the facts from it. It's not exactly the same use case as ours, since we would like to have Netbox automatically provisioned.
and their documentation: https://wikitech.wikimedia.org/wiki/Netbox
A docker-compose setup is available to easily test Netbox: https://github.com/netbox-community/netbox-docker
This is the Puppet configuration used at Wikimedia: https://gerrit.wikimedia.org/r/c/operations/puppet/+/387880/
Feb 11 2020
Netbox looks pretty nice as a full hardware/device inventory tool: https://netbox.readthedocs.io/en/stable/
Nov 27 2019
Puppet changes added in 17b2b3041212aca9e0a9a35c510885de7bb78230.
Ideally the Debian package should now be added to the Software Heritage private repository.
Nov 26 2019
ardumont closed T1425: refactor the loader stack for package managers, a subtask of T1418: Loaders, as Resolved.
ardumont closed T1389: Implement a base "package" loader for package managers, a subtask of T1425: refactor the loader stack for package managers, as Resolved.
Nov 25 2019
Instructions to create Debian packages have been added in D2352.
Nov 24 2019
AFAIU from last week's work, Munin is now gone.
Nov 19 2019
ftigeot changed the status of T1556: Document hardware architecture, a subtask of T1407: Internal documentation (meta task), from Open to Work in Progress.
Nov 8 2019
ftigeot closed T1653: Prometheus rate functions considered unreliable, a subtask of T1356: Kill munin, as Wontfix.
No relevant problem has been reported with our dataset/usage of Prometheus. Closing.
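For context on why rate functions can look unreliable in the first place, here is an illustrative Python sketch (not Prometheus source code, and not SWH tooling) of the basic idea behind a per-second rate over counter samples, including the counter-reset handling that naive deltas get wrong. The sample values are made up.

```python
def counter_rate(samples):
    """samples: list of (timestamp_seconds, counter_value), oldest first.
    Returns the average per-second rate. A decrease between samples is
    treated as a counter reset, i.e. the counter restarted from zero."""
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # on a reset, the whole new value counts as increase since zero
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed

# 20 units, then a reset followed by 20 more units, over 30 seconds
print(counter_rate([(0, 100), (15, 120), (30, 20)]))
```

A naive `last - first` delta over the same samples would be negative; the reset handling is what keeps the rate meaningful.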
Nov 6 2019
ftigeot closed T1442: Replace Munin graphs with Grafana/Prometheus dashboards, a subtask of T1356: Kill munin, as Resolved.
I do not see any missing piece in the Grafana dashboards; the Munin graph service/VM can be shut down.
Any chance we can close this now?
Oct 1 2019
ardumont changed the status of T1389: Implement a base "package" loader for package managers, a subtask of T1425: refactor the loader stack for package managers, from Open to Work in Progress.
Sep 6 2019
ardumont updated the task description for T1411: reach a minimum of 80% SLOC coverage across all components.
ardumont updated the task description for T1411: reach a minimum of 80% SLOC coverage across all components.
Sep 5 2019
ardumont updated the task description for T1411: reach a minimum of 80% SLOC coverage across all components.
ardumont updated the task description for T1411: reach a minimum of 80% SLOC coverage across all components.
Aug 29 2019
anlambert updated the task description for T1411: reach a minimum of 80% SLOC coverage across all components.
ardumont updated the task description for T1411: reach a minimum of 80% SLOC coverage across all components.
ardumont updated the task description for T1411: reach a minimum of 80% SLOC coverage across all components.
ardumont updated the task description for T1411: reach a minimum of 80% SLOC coverage across all components.
Aug 6 2019
anlambert updated the task description for T1411: reach a minimum of 80% SLOC coverage across all components.
Aug 1 2019
ardumont renamed T1413: swh-docker-dev: Refactor/improve provisionning step from Refactor/improve provisionning step to swh-docker-dev: Refactor/improve provisionning step.
Jul 16 2019
ftigeot closed T1355: Move the object counter from munin to prometheus, a subtask of T1356: Kill munin, as Resolved.
Jul 11 2019
zack updated the task description for T1411: reach a minimum of 80% SLOC coverage across all components.
Jun 12 2019
The most recent update of the state of this task has shown a regression in the journal test coverage, which, per se, is not a big deal (just a few points). But it does raise the question of how, once we have attained whatever "minimum" coverage we are OK with, we monitor over time that there is no regression. For instance, I think that code reviews should show the reviewers how the submitted diff affects code coverage. Ideally, reviewers should be able to see if it has a net positive or negative effect on coverage, and take that into account in their review decisions. (Which is not to say we should never accept diffs that decrease code coverage: there might be reasons to do so. But it is a data point that would be useful for reviewers to see.)
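To make the idea concrete, here is a hypothetical Python sketch (no such SWH review tool is implied by this task) of the kind of coverage-delta signal a code review could surface; the function name, numbers and tolerance are made up for illustration.

```python
def coverage_delta(base_pct, diff_pct, tolerance=0.0):
    """Compare coverage before and after a diff; return a verdict string.
    A drop larger than the tolerance is flagged as a decrease."""
    delta = diff_pct - base_pct
    if delta < -tolerance:
        return f"coverage decreased by {-delta:.1f} points"
    return f"coverage change: {delta:+.1f} points"

print(coverage_delta(80.0, 78.5))  # a regression of a few points
print(coverage_delta(80.0, 81.0))  # a net positive change
```

The verdict string is the data point discussed above: it does not block the diff, it just makes the coverage effect visible to reviewers.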
zack updated the task description for T1411: reach a minimum of 80% SLOC coverage across all components.
Jun 6 2019
May 25 2019
zack renamed T1411: reach a minimum of 80% SLOC coverage across all components from at least 80% SLOC coverage in all components to reach a minimum of 80% SLOC coverage across all components.
only 3% to go in -lister and -core \o/
zack updated the task description for T1411: reach a minimum of 80% SLOC coverage across all components.
these catch-all meta-tasks that will grow forever are not terribly useful; the individual tasks + their subtasks should be enough
May 13 2019
The Grafana dashboards are stored in the PostgreSQL database on pergamon, which is backed up through the full system backups.
olasd closed T1698: Make sure Grafana dashboards are backed up, a subtask of T1442: Replace Munin graphs with Grafana/Prometheus dashboards, as Resolved.
May 3 2019
There was a config/deployment bug on both the hg and svn loaders. Both bugs have been fixed and the tasks are running fine now.
Reopening this, as the first submitted "save code now" tasks for hg and svn did not get executed so far (see [1]).
Nevertheless, they have been scheduled, so it looks like some extra worker configuration is needed in production.
May 2 2019
Thanks... looks like the tasks have been properly scheduled, but they have not been executed... some more polishing may be needed.
@rdicosmo, the possibility to submit hg and svn origin types through the "Save code now" form has been deployed to production [1].
I have submitted one origin of each type to save. Let's see if the underlying scheduler tasks get correctly executed before spreading the news to the wild.
anlambert closed T1419: hg/svn support in save code now as Resolved by committing rDWAPPS04b06d85c494: templates/origin-save.html: hg and svn origin types can now be saved.
we (@anlambert and I) will try to have this task closed ASAP (like today, if no big bad stopper arises in front of us)
Apr 30 2019
ftigeot changed the status of T1697: Deploy Grafanalib-based dashboards with Puppet, a subtask of T1442: Replace Munin graphs with Grafana/Prometheus dashboards, from Open to Work in Progress.
ftigeot changed the status of T1697: Deploy Grafanalib-based dashboards with Puppet from Open to Work in Progress.
Grafanalib dashboards added to https://grafana.softwareheritage.org/ via the new provisioning mechanism of Grafana 5.x.
Fully automated provisioning is still a work-in-progress.
Prometheus does not provide storage device statistics for Proxmox container-based hosts.
The data can be read from their parent machine dashboards though.
Apr 19 2019
Are there any blockers left? It would be really nice to roll this out in the very near future.
Apr 18 2019
If we remove Munin before implementing the missing graph replacements, we will lack a comparison base and possibly fail to discover bogus data.
Right now, Prometheus disk throughput and IOPS values are suspiciously low, for example.
Apr 16 2019
Even though most/all of the Munin metrics are provided by Prometheus, Munin also provides graphs.
It is these graphs we are still missing.
Indeed, it is the object of T1428. That's why I am a bit puzzled that the work you have in progress does not simply target T1356. I was expecting some response to this very task in your grafanalib-based code, which I did not find. So I was wondering if I missed something, i.e. that some data were still in Munin only.
Wasn't that what T1428 was about ?
Apart from the list of pending packages, all commonly used Munin metrics should already have Prometheus equivalents.
In T1442#30575, @ftigeot wrote: When I asked where to put such work-in-progress, you suggested the snippets repository.
When I asked where to put such work-in-progress, you suggested the snippets repository.
In T1442#30564, @ftigeot wrote: Work-in-progress Grafanalib dashboards have been added to the https://forge.softwareheritage.org/source/snippets/ repository.
Apr 15 2019
Work-in-progress Grafanalib dashboards have been added to the https://forge.softwareheritage.org/source/snippets/ repository.
Apr 13 2019
(typo) nothing interesting, moving along.
ardumont closed T1459: docker container for swh-deposit, a subtask of T1443: Make swh services run within docker and docker-compose, as Resolved.
Apr 12 2019
ardumont added a parent task for T1459: docker container for swh-deposit: T1581: Deposit: improvements.
Related D1411
Related rCDFD77f4b2e0617be57282e5ab4a972f7a643768e668
Related rCDFDf68bb33b1493d7ef46471e3162abd03e6d6b0021
Related rCDFD2f87f477046fc82b7ce4b6fede712c051e943c14
Apr 2 2019
anlambert closed T1379: npm loader, a subtask of T1425: refactor the loader stack for package managers, as Resolved.
Mar 26 2019
ardumont updated the task description for T1411: reach a minimum of 80% SLOC coverage across all components.
Mar 25 2019
douardda closed T1405: Make it easy to run a complete swh instance, a subtask of T1413: swh-docker-dev: Refactor/improve provisionning step, as Resolved.
Let's call it done, even if the small dataset part has not been addressed.
douardda closed T1443: Make swh services run within docker and docker-compose, a subtask of T1405: Make it easy to run a complete swh instance, as Resolved.
Let's call it done; some minor parts may still need a bit of attention though.
douardda updated the task description for T1443: Make swh services run within docker and docker-compose.
Consider this done, even if it remains a background task.
Mar 20 2019
ftigeot closed T1428: Create an inventory of useful Munin metrics, a subtask of T1408: More/better Metrics, as Resolved.
Already marked as done on 2018-12-19.
ftigeot closed T1428: Create an inventory of useful Munin metrics, a subtask of T1356: Kill munin, as Resolved.
Mar 11 2019
This gargantuan query is now used on a Grafana dashboard: https://grafana.softwareheritage.org/d/-lJ73Ujiz/scheduler-task-status
Mar 8 2019
with task_count_by_bucket as (
  -- get the count of tasks by delay bucket. Tasks are grouped by their
  -- characteristics (type, status, policy, priority, current interval),
  -- then by delay buckets that are 1 hour wide between -24 and +24 hours,
  -- and 1 day wide outside of this range.
  -- A positive delay means the task execution is late wrt scheduling.
  select
    "type", status, "policy", priority, current_interval,
    (
      -- select the bucket widths
      case
        when delay between - 24 * 3600 and 24 * 3600
          then (ceil(delay / 3600)::bigint) * 3600
        else (ceil(delay / (24 * 3600))::bigint) * 24 * 3600
      end
    ) as delay_bucket,
    count(*)
  from task
  join lateral (
    -- this is where the "positive = late" convention is set
    select extract(epoch from (now() - next_run)) as delay
  ) as d on true
  group by "type", status, "policy", priority, current_interval, delay_bucket
  order by "type", status, "policy", priority, current_interval, delay_bucket
),
delay_bounds as (
  -- get the minimum and maximum delay bucket for each task group. This will
  -- let us generate all the buckets, even the empty ones in the next CTE.
  select
    "type", status, "policy", priority, current_interval,
    min(delay_bucket) as min,
    max(delay_bucket) as max
  from task_count_by_bucket
  group by "type", status, "policy", priority, current_interval
),
task_buckets as (
  -- Generate all time buckets for all categories.
  select
    "type", status, "policy", priority, current_interval, delay_bucket
  from delay_bounds
  join lateral (
    -- 1 hour buckets
    select generate_series(- 23, 23) * 3600 as delay_bucket
    union
    -- 1 day buckets. The "- 1" is used to make sure we generate an empty
    -- bucket as lowest delay bucket, so prometheus quantile calculations
    -- stay accurate
    select generate_series(min / (24 * 3600) - 1, max / (24 * 3600)) * 24 * 3600
      as delay_bucket
  ) as buckets on true
),
task_count_for_all_buckets as (
  -- This join merges the non-empty buckets (task_count_by_bucket) with
  -- the full list of buckets (task_buckets).
  -- The join clause can't use the "using (x, y, z)" syntax, as it uses
  -- equality and priority and current_interval can be null. This also
  -- forces us to label all the fields in the select. Ugh.
  select
    task_buckets."type", task_buckets.status, task_buckets."policy",
    task_buckets.priority, task_buckets.current_interval,
    task_buckets.delay_bucket,
    coalesce(count, 0) as count  -- make sure empty buckets have a 0 count instead of null
  from task_buckets
  left join task_count_by_bucket
    on task_count_by_bucket."type" = task_buckets."type"
    and task_count_by_bucket.status = task_buckets.status
    and task_count_by_bucket."policy" = task_buckets."policy"
    and task_count_by_bucket.priority is not distinct from task_buckets.priority
    and task_count_by_bucket.current_interval is not distinct from task_buckets.current_interval
    and task_count_by_bucket.delay_bucket = task_buckets.delay_bucket
),
cumulative_buckets as (
  -- Prometheus wants cumulative histograms: for each bucket, the value
  -- needs to be the total of all measurements below the given value (this
  -- allows downsampling by just throwing away some buckets). We use the
  -- "sum over partition" window function to compute this.
  -- Prometheus also expects a "+Inf" bucket for the total count. We
  -- generate it with a null lt value so we can sort it after the rest of
  -- the buckets.
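As background for the cumulative_buckets step, here is a minimal Python sketch (not the query itself) of the cumulative histogram convention Prometheus expects: each bucket holds the total count of all measurements at or below its bound, plus a final "+Inf" bucket carrying the grand total. Bucket bounds and counts are made up for illustration.

```python
def cumulative_histogram(buckets):
    """buckets: list of (upper_bound_seconds, count), sorted by bound.
    Returns (le, cumulative_count) pairs ending with the '+Inf' bucket."""
    out = []
    total = 0
    for le, count in buckets:
        total += count  # each bucket accumulates everything below its bound
        out.append((str(le), total))
    out.append(("+Inf", total))  # the +Inf bucket carries the total count
    return out

# three delay buckets: 1 hour, 2 hours, 1 day
print(cumulative_histogram([(3600, 5), (7200, 3), (86400, 2)]))
```

This cumulative shape is what allows downsampling by simply throwing away buckets, as the SQL comments note.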
Mar 6 2019
Full Prometheus histogram-compatible query:
Mar 5 2019
Current status of the SQL query:
Feb 28 2019
@douardda I wish to work on this patch. Could you please explain what I have to do here?
vlorentz renamed T1407: Internal documentation (meta task) from Internal documentation to Internal documentation (meta task).
Feb 27 2019
\o/
Feb 26 2019
All storage mocks are now removed in every swh module, so closing this.
anlambert closed T1307: Remove mock storages used in tests., a subtask of T1421: drop swh-storage mocking everywhere, as Resolved.