
ftigeot (François Tigeot)
User

Projects (6)

User Details

User Since
Sep 6 2017, 1:06 PM (101 w, 3 d)

Recent Activity

Fri, Aug 9

ftigeot committed rSPSITEac753a023734: Archive counters exporter: optimize Prometheus queries (authored by ftigeot).
Archive counters exporter: optimize Prometheus queries
Fri, Aug 9, 6:11 PM
ftigeot closed T1949: Archive graphs: vertical axis scaling not optimal as Resolved.

The yaxis scale was explicitly forced to begin at zero.
Removing that constraint allows the graphs to scale and fill their allocated vertical space.

Fri, Aug 9, 2:35 PM
ftigeot added a comment to T1949: Archive graphs: vertical axis scaling not optimal.

When resizing the browser window, or when loading the page for the first time after pasting its address in the URL bar, it is obvious that the "Source files" data is used for all graphs.

Fri, Aug 9, 11:45 AM
ftigeot changed the status of T1949: Archive graphs: vertical axis scaling not optimal from Open to Work in Progress.

This could be caused by the JavaScript framework used to create graphs, Flot -- https://www.flotcharts.org/

Fri, Aug 9, 10:40 AM
ftigeot closed T1544: archive graphs stopped being updated a while ago as Resolved.

Dedicated task created as T1949.

Fri, Aug 9, 10:39 AM · Website, Web app
ftigeot triaged T1949: Archive graphs: vertical axis scaling not optimal as High priority.
Fri, Aug 9, 10:38 AM

Thu, Aug 8

ftigeot closed T1437: Rewrite the munin stats export for the website to use prometheus as Resolved.
Thu, Aug 8, 5:26 PM · System administration
ftigeot closed T1437: Rewrite the munin stats export for the website to use prometheus, a subtask of T1355: Move the object counter from munin to prometheus, as Resolved.
Thu, Aug 8, 5:26 PM · System administration

Wed, Aug 7

ftigeot added a comment to T1872: staging infra: New vlan.

Most of the relevant commits use the 192.168.128.0/24 address space.

Wed, Aug 7, 2:02 PM · Staff, System administration

Tue, Aug 6

ftigeot accepted D1820: defaults: Move gpg/certificate blocks to a dedicated config file.
Tue, Aug 6, 10:49 AM · System administration

Mon, Aug 5

ftigeot accepted D1815: pergamon: Add route to staging network to be able to check nodes.
Mon, Aug 5, 12:27 PM
ftigeot accepted D1808: staging: Rework output to display a summary on nodes.
Mon, Aug 5, 10:28 AM
ftigeot accepted D1807: staging: Add db0 node.
Mon, Aug 5, 10:27 AM · System administration

Fri, Aug 2

ftigeot accepted D1806: staging: Modularize node creation.

A bit too big to understand quickly but no real choice here.
Looks good.

Fri, Aug 2, 10:09 PM · System administration
ftigeot accepted D1805: stats_web: Remove proxy request to munin.
Fri, Aug 2, 4:42 PM
ftigeot accepted D1803: site.pp: Remove most desktops from puppet.
Fri, Aug 2, 3:23 PM

Thu, Aug 1

ftigeot accepted D1799: Docs: Update documentation to improve/clarify steps.
Thu, Aug 1, 12:18 PM
ftigeot accepted D1797: staging: Bootstrap infrastructure with the gateway node.

Looks good.

Thu, Aug 1, 11:45 AM
ftigeot accepted D1798: staging: Provision storage0 vm.
Thu, Aug 1, 10:40 AM
ftigeot accepted D1796: init-template: Explain how to bootstrap a debian template image.
Thu, Aug 1, 10:27 AM
ftigeot accepted D1795: README: Explain how to initialize and apply changes to infra.
Thu, Aug 1, 10:25 AM
ftigeot accepted D1794: terraform: Prepare the workstation tools.

Typo line 17:

- # Install so that terrafor actually sees the plugin
+ # Install so that terraform actually sees the plugin

Thu, Aug 1, 9:49 AM

Wed, Jul 31

ftigeot accepted D1792: network: Allow to override the ups/downs route for the network.

Some route definitions look unnecessary and could be cleaned up in a second pass.

Wed, Jul 31, 4:24 PM · System administration

Tue, Jul 30

ftigeot committed rSPSITE9825a780069b: Archive counters: update export path (authored by ftigeot).
Archive counters: update export path
Tue, Jul 30, 2:40 PM
ftigeot committed rSPSITEb079718d907d: Archive counters: activate generation cron [3/3] (authored by ftigeot).
Archive counters: activate generation cron [3/3]
Tue, Jul 30, 2:14 PM
ftigeot committed rSPSITEe3f36a47b2cd: Archive counters: activate generation cron [2/2] (authored by ftigeot).
Archive counters: activate generation cron [2/2]
Tue, Jul 30, 2:06 PM
ftigeot committed rSPSITE291c9b16866c: Archive counters: activate generation cron (authored by ftigeot).
Archive counters: activate generation cron
Tue, Jul 30, 1:29 PM

Mon, Jul 29

ftigeot committed rSPSITE0680a4ff58dc: Archive counter exporter: deployment fix (authored by ftigeot).
Archive counter exporter: deployment fix
Mon, Jul 29, 4:18 PM
ftigeot committed rSPSITE3cd214878f68: Website: Use Prometheus data to export archive counters (authored by ftigeot).
Website: Use Prometheus data to export archive counters
Mon, Jul 29, 2:42 PM

Fri, Jul 26

ftigeot accepted D1776: puppet/master: Add clean up certificate script.

lgtm

Fri, Jul 26, 2:32 PM

Wed, Jul 24

ftigeot accepted D1767: Reference provenance page in annex behind basic auth.

This technically looks good, but from a security point of view, why put the secret "private" and "provenance-index" directories in a publicly accessible location?

Wed, Jul 24, 5:48 PM · System administration, Staff
ftigeot changed the status of T1931: scheduler's cron cleanup error when filtering tasks to archive from Open to Work in Progress.

Fwiw, a manual connection to esnode1:9200 doesn't show this error

Wed, Jul 24, 4:36 PM · Scheduling utilities
ftigeot added a comment to T1437: Rewrite the munin stats export for the website to use prometheus.

Depending on Prometheus for all data is not a hard requirement.

Wed, Jul 24, 10:42 AM · System administration
ftigeot closed T792: Make the elasticsearch logging cluster actually a cluster as Resolved.

Removed the T1017 Kafka subtask; it has no relation to whether the Elasticsearch cluster is a true cluster or not.

Wed, Jul 24, 10:37 AM · System administration (Elasticsearch consolidation (W24/2018))
ftigeot removed a parent task for T1017: Estimate for Kafka cluster specifications: T792: Make the elasticsearch logging cluster actually a cluster.
Wed, Jul 24, 10:35 AM · System administration
ftigeot removed a subtask for T792: Make the elasticsearch logging cluster actually a cluster: T1017: Estimate for Kafka cluster specifications.
Wed, Jul 24, 10:35 AM · System administration (Elasticsearch consolidation (W24/2018))
ftigeot closed T1338: Change BBUs on orsay as Wontfix.

Hardware is too old / starting to fall apart for other reasons.
It would be more cost-effective to replace it.

Wed, Jul 24, 10:33 AM · System administration
ftigeot closed T1282: Revisit backups as Resolved.

Wrote backup tools documentation in T1372.
No backup system changes wanted at this time.

Wed, Jul 24, 10:31 AM · System administration

Tue, Jul 23

ftigeot claimed T1437: Rewrite the munin stats export for the website to use prometheus.
Tue, Jul 23, 5:38 PM · System administration

Mon, Jul 22

ftigeot added a comment to T1338: Change BBUs on orsay.

Fwiw, I never got an answer from Dell on that topic.

Mon, Jul 22, 4:48 PM · System administration

Jul 18 2019

ftigeot added a comment to T1437: Rewrite the munin stats export for the website to use prometheus.

For the "March 2019 problem", the JSON output generated from the Prometheus API itself is missing the more recent data points.

Jul 18 2019, 4:26 PM · System administration
ftigeot changed the status of T1437: Rewrite the munin stats export for the website to use prometheus from Open to Work in Progress.

Prometheus data has been exported to a JSON file similar to the format produced by the Munin/RRD-based toolchain.
Results are visible on https://www-dev.softwareheritage.org/archive/
(vs https://www.softwareheritage.org/archive/ for original graphs)
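
As an illustration of this kind of export, here is a minimal Python sketch that converts a Prometheus range-query response (the `matrix` result type returned by `/api/v1/query_range`) into `[timestamp_ms, value]` pairs, the layout Flot accepts for time series. The actual layout of the Munin/RRD-style file is not described here, so the output shape is an assumption, and the metric name and sample values are hypothetical.

```python
import json


def matrix_to_flot_series(response: dict) -> list:
    """Flatten a Prometheus range-query response into [[ms, value], ...]."""
    if response.get("status") != "success":
        raise ValueError("Prometheus query failed")
    data = response["data"]
    if data.get("resultType") != "matrix":
        raise ValueError("expected a matrix result")
    series = []
    for result in data["result"]:
        for timestamp, value in result["values"]:
            # Prometheus gives the timestamp in seconds and the sample as a string.
            series.append([int(float(timestamp) * 1000), float(value)])
    # Keep points in chronological order for plotting.
    series.sort(key=lambda point: point[0])
    return series


# Hypothetical response, shaped like the Prometheus HTTP API output.
sample = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {
                "metric": {"__name__": "hypothetical_object_count"},
                "values": [[1563400800, "100"], [1563404400, "150"]],
            }
        ],
    },
}

print(json.dumps(matrix_to_flot_series(sample)))
```

Keeping the conversion a pure function makes it easy to check against a saved API response before pointing it at a live server.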

Jul 18 2019, 4:10 PM · System administration
ftigeot changed the status of T1437: Rewrite the munin stats export for the website to use prometheus, a subtask of T1355: Move the object counter from munin to prometheus, from Open to Work in Progress.
Jul 18 2019, 4:10 PM · System administration

Jul 16 2019

ftigeot closed T1355: Move the object counter from munin to prometheus, a subtask of T1356: Kill munin, as Resolved.
Jul 16 2019, 3:20 PM · Sprint 2018 12, System administration
ftigeot closed T1355: Move the object counter from munin to prometheus as Resolved.

Even though it is not necessarily obvious, the object counter has been stored in Prometheus since December 2018.

Jul 16 2019, 3:20 PM · System administration
ftigeot closed T1882: Merge swh-sysadmin into existing swh-grafana-dashboards as Resolved.

Grafanalib dashboards imported into swh-grafana-dashboards in rTGRAee5d3074bf58.

Jul 16 2019, 11:48 AM · System administrators
ftigeot committed rTGRAd8c4f1ead4cf: Grafanalib dashboards: Add cpu temperature (wip) (authored by ftigeot).
Grafanalib dashboards: Add cpu temperature (wip)
Jul 16 2019, 11:42 AM
ftigeot committed rTGRAee5d3074bf58: Import existing Grafanalib dashboards (authored by ftigeot).
Import existing Grafanalib dashboards
Jul 16 2019, 11:42 AM

Jul 8 2019

ftigeot closed T1854: Backup Postgres "secondary" cluster as Resolved.

A backup script has been added to the Puppet environment in e93781ef32836396008e28599bf02d412c2184d3 and 26a74ad2178568398de8cf448cd79ba8c5320232.

Jul 8 2019, 4:44 PM · System administration
ftigeot committed R194:132245bfb826: Ansible VM deployment: Add disk and network information (authored by ftigeot).
Ansible VM deployment: Add disk and network information
Jul 8 2019, 3:44 PM
ftigeot committed R194:492d92fb0bde: First VM deployed (authored by ftigeot).
First VM deployed
Jul 8 2019, 3:44 PM
ftigeot committed R194:e8d6c770f84f: Ansible automation: Add a bootstrap playbook (authored by ftigeot).
Ansible automation: Add a bootstrap playbook
Jul 8 2019, 3:44 PM
ftigeot committed rSPSITE26a74ad21785: Puppet environment: Add a Postgres cluster backup script [2/2] (authored by ftigeot).
Puppet environment: Add a Postgres cluster backup script [2/2]
Jul 8 2019, 3:13 PM
ftigeot closed D1697: Puppet environment: Add a Postgres cluster backup script [2/2].
Jul 8 2019, 3:13 PM
Herald added a reviewer for D1697: Puppet environment: Add a Postgres cluster backup script [2/2]: Reviewers.
Jul 8 2019, 2:49 PM
ftigeot closed T1857: Backup MongoDB databases as Resolved.

Backup done: a full copy of the main MongoDB databases is now present on banco.

Jul 8 2019, 12:26 PM · System administration

Jul 5 2019

ftigeot committed rSPSITEe93781ef3283: Puppet environment: Add a Postgres cluster backup script (authored by ftigeot).
Puppet environment: Add a Postgres cluster backup script
Jul 5 2019, 10:27 AM

Jul 3 2019

ftigeot added a comment to T1857: Backup MongoDB databases.

Approximately 85% of the dump data has been copied so far.

Jul 3 2019, 2:57 PM · System administration

Jul 1 2019

ftigeot added a comment to T1857: Backup MongoDB databases.

Dumps of the six MongoDB databases have been created at the Paris office.
They are being copied to banco:/srv/storage/space/mongo_dumps at Rocquencourt.

Jul 1 2019, 4:40 PM · System administration
ftigeot committed rSPSITEeb7719a0e616: dar: Rename /srv/postgres-backups to db-backups (authored by ftigeot).
dar: Rename /srv/postgres-backups to db-backups
Jul 1 2019, 2:05 PM

Jun 28 2019

ftigeot changed the status of T1857: Backup MongoDB databases from Open to Work in Progress.
Jun 28 2019, 10:54 AM · System administration

Jun 27 2019

ftigeot closed T1372: Compare Rsnapshot / BorgBackup / Backuppc as Resolved.
Jun 27 2019, 10:15 AM · System administration
ftigeot closed T1372: Compare Rsnapshot / BorgBackup / Backuppc, a subtask of T1282: Revisit backups, as Resolved.
Jun 27 2019, 10:15 AM · System administration
ftigeot changed the status of T1854: Backup Postgres "secondary" cluster from Open to Work in Progress.
Jun 27 2019, 10:14 AM · System administration
ftigeot triaged T1854: Backup Postgres "secondary" cluster as Normal priority.
Jun 27 2019, 10:14 AM · System administration

Jun 25 2019

ftigeot committed rSPSITE6d8175abfeae: Puppet: make louvre a postgresql client [2/2] (authored by ftigeot).
Puppet: make louvre a postgresql client [2/2]
Jun 25 2019, 5:41 PM
ftigeot committed rSPSITEd3ae5ffded46: Puppet: make louvre a postgresql client (authored by ftigeot).
Puppet: make louvre a postgresql client
Jun 25 2019, 5:33 PM
ftigeot committed rSPSITE4a663e64952d: dar: exclude srv/postgres-backups from backups (authored by ftigeot).
dar: exclude srv/postgres-backups from backups
Jun 25 2019, 2:53 PM

Jun 24 2019

ftigeot closed T1501: Phantom device mapper volume usage in Proxmox: logical volume is used by another device as Resolved.

The solution to this problem is to first identify the partition devices and then remove them:
dmsetup remove ssd-vm--100--disk--2p1
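
The identification step can be sketched in Python as follows. This is only an illustration: it assumes the output of `dmsetup ls` has been captured as text, that the mapping name is the first whitespace-separated field on each line, and that partition mappings are named after their base volume with a `pN` suffix, as in the command above. The sample listing and its device numbers are hypothetical.

```python
import re


def partition_mappings(dmsetup_ls_output: str, base: str) -> list:
    """Return device-mapper names that look like partitions of `base`."""
    # Assumption: partitions are named "<base>p<number>".
    pattern = re.compile(re.escape(base) + r"p\d+")
    names = []
    for line in dmsetup_ls_output.splitlines():
        fields = line.split()
        if fields and pattern.fullmatch(fields[0]):
            names.append(fields[0])
    return names


# Hypothetical listing, shaped like `dmsetup ls` output.
listing = """ssd-vm--100--disk--2\t(253:4)
ssd-vm--100--disk--2p1\t(253:9)
ssd-vm--101--disk--1\t(253:5)
"""

# Print the removal commands instead of running them, so they can be reviewed.
for name in partition_mappings(listing, "ssd-vm--100--disk--2"):
    print("dmsetup remove", name)
```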

Jun 24 2019, 4:53 PM · System administration
ftigeot added a comment to T1501: Phantom device mapper volume usage in Proxmox: logical volume is used by another device.

This behavior appears to be caused by partitions present on top of device-mapper devices.
These partitions are in turn used to create other dm devices, and those devices keep an open reference to the base one.

Jun 24 2019, 4:50 PM · System administration

Jun 13 2019

ftigeot committed rSPSITEe1c567a05b72: pgbouncer: sowftwareheritage is now hosted on belvedere (authored by ftigeot).
pgbouncer: sowftwareheritage is now hosted on belvedere
Jun 13 2019, 9:04 PM

Jun 11 2019

ftigeot closed T1: Investigate NFS UID and GID mapping as Resolved.
Jun 11 2019, 2:40 PM · System administration
ftigeot added a parent task for T1: Investigate NFS UID and GID mapping: Unknown Object (Maniphest Task).
Jun 11 2019, 2:39 PM · System administration

Jun 6 2019

ftigeot added a comment to T1: Investigate NFS UID and GID mapping.

The reason for this behavior is that Debian uses dynamic UIDs for most of its system users.
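
One way to see the effect is to compare the passwd databases of two hosts. The Python sketch below assumes the standard `name:password:UID:GID:...` passwd layout and reports accounts that exist on both hosts with different UIDs. The host contents and UID values are hypothetical, and the task does not say this is how the investigation was done.

```python
def parse_passwd(text: str) -> dict:
    """Map user name to numeric UID from passwd-formatted text."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) >= 3:
            users[fields[0]] = int(fields[2])
    return users


def uid_mismatches(passwd_a: str, passwd_b: str) -> dict:
    """Return {name: (uid_a, uid_b)} for users whose UIDs differ."""
    a, b = parse_passwd(passwd_a), parse_passwd(passwd_b)
    return {
        name: (a[name], b[name])
        for name in sorted(a.keys() & b.keys())
        if a[name] != b[name]
    }


# Hypothetical passwd excerpts from two hosts sharing an NFS export.
host_a = "root:x:0:0:root:/root:/bin/bash\npostgres:x:108:115::/var/lib/postgresql:/bin/bash\n"
host_b = "root:x:0:0:root:/root:/bin/bash\npostgres:x:112:118::/var/lib/postgresql:/bin/bash\n"

print(uid_mismatches(host_a, host_b))
```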

Jun 6 2019, 4:47 PM · System administration
ftigeot accepted D1555: pgbouncer: Deal with somerset/prado's instance.
Jun 6 2019, 4:19 PM
ftigeot accepted D1554: Puppetfile: pgbouncer: Use commit with puppet module fix.
Jun 6 2019, 2:55 PM
ftigeot accepted D1550: Add pgbouncer configuration.

Looks good for a first draft.

Jun 6 2019, 8:57 AM · System administration, Puppet recipes

May 29 2019

ftigeot committed rSPSITEdf42a3690522: data/defaults: Change db.internal CNAME target (authored by ftigeot).
data/defaults: Change db.internal CNAME target
May 29 2019, 4:28 PM

May 28 2019

ftigeot accepted D1514: Migrate services to use the new db machine belvedere.

Looks good to me.
Always using the fqdn belvedere.internal.softwareheritage.org would be more consistent though ;-)

May 28 2019, 8:59 AM

May 22 2019

ftigeot committed rSPSITE03cd3d9fe5c6: manifests: Add megacli profile to all database servers (authored by ftigeot).
manifests: Add megacli profile to all database servers
May 22 2019, 1:37 PM
ftigeot committed rSPSITEd39cf18925d9: manifests/site: add a new database server, belvedere (authored by ftigeot).
manifests/site: add a new database server, belvedere
May 22 2019, 11:47 AM

May 16 2019

ftigeot committed R188:a8c494488310: Import existing Grafanalib dashboards (authored by ftigeot).
Import existing Grafanalib dashboards
May 16 2019, 2:48 PM

May 14 2019

ftigeot added a comment to T1711: Create a testing environment.

We will use VMs running on the orsay.softwareinternal.org hypervisor for now.

May 14 2019, 3:26 PM · System administration
ftigeot triaged T1712: Create a separate testing network as Normal priority.
May 14 2019, 3:24 PM · System administration
ftigeot triaged T1711: Create a testing environment as Normal priority.
May 14 2019, 3:21 PM · System administration

May 13 2019

ftigeot removed a parent task for T792: Make the elasticsearch logging cluster actually a cluster: T986: Scheduler: Automate completed oneshot or disabled recurring tasks archival.
May 13 2019, 4:29 PM · System administration (Elasticsearch consolidation (W24/2018))
ftigeot removed a subtask for T986: Scheduler: Automate completed oneshot or disabled recurring tasks archival: T792: Make the elasticsearch logging cluster actually a cluster.
May 13 2019, 4:29 PM · Scheduling utilities
ftigeot removed a subtask for T1005: webapp: Push logs to elasticsearch cluster: T792: Make the elasticsearch logging cluster actually a cluster.
May 13 2019, 4:28 PM · System administration, Web app
ftigeot removed a parent task for T792: Make the elasticsearch logging cluster actually a cluster: T1005: webapp: Push logs to elasticsearch cluster.
May 13 2019, 4:28 PM · System administration (Elasticsearch consolidation (W24/2018))
ftigeot removed a subtask for T1028: deposit: Push logs to elasticsearch: T792: Make the elasticsearch logging cluster actually a cluster.
May 13 2019, 4:26 PM · SWORD deposit
ftigeot removed a parent task for T792: Make the elasticsearch logging cluster actually a cluster: T1028: deposit: Push logs to elasticsearch.
May 13 2019, 4:26 PM · System administration (Elasticsearch consolidation (W24/2018))

Apr 30 2019

ftigeot triaged T1698: Make sure Grafana dashboards are backed up as High priority.
Apr 30 2019, 3:38 PM · Sprint 2018 12, System administration
ftigeot changed the status of T1697: Deploy Grafanalib-based dashboards with Puppet, a subtask of T1442: Replace Munin graphs with Grafana/Prometheus dashboards, from Open to Work in Progress.
Apr 30 2019, 3:37 PM · Sprint 2018 12, System administration
ftigeot changed the status of T1697: Deploy Grafanalib-based dashboards with Puppet from Open to Work in Progress.
Apr 30 2019, 3:37 PM · Sprint 2018 12, System administration
ftigeot triaged T1697: Deploy Grafanalib-based dashboards with Puppet as High priority.
Apr 30 2019, 3:37 PM · Sprint 2018 12, System administration
ftigeot added a comment to T1442: Replace Munin graphs with Grafana/Prometheus dashboards.

Grafanalib dashboards added to https://grafana.softwareheritage.org/ via the new provisioning mechanism of Grafana 5.x.
Fully automated provisioning is still a work-in-progress.

Apr 30 2019, 3:36 PM · Sprint 2018 12, System administration
ftigeot added a comment to T1442: Replace Munin graphs with Grafana/Prometheus dashboards.

Prometheus does not provide storage device statistics for Proxmox container-based hosts.
The data can be read from their parent machine dashboards though.

Apr 30 2019, 12:28 PM · Sprint 2018 12, System administration
ftigeot added a comment to T1372: Compare Rsnapshot / BorgBackup / Backuppc.

Some disk space usage statistics with approximately one month of snapshots

Apr 30 2019, 10:57 AM · System administration

Apr 25 2019

ftigeot closed T1007: Monitor nfs mount points on orangerie.internal.softwareheritage.org as Resolved.

Grafanalib-based dashboards do not require special handling; for example, the NFS filesystem on orangerie:/srv/softwareheritage is shown by default.

Apr 25 2019, 2:36 PM · System administration
ftigeot closed T791: Ship more logs to logstash/elasticsearch as Resolved.
Apr 25 2019, 1:33 PM · System administration