The varnish logs should also be ingested into elasticsearch to have fine-grained statistics (a possible ingestion sketch follows the list below).
- Queries
- All Stories
- Search
- Advanced Search
- Transactions
- Transaction Logs
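A possible way to do this, sketched here as an assumption rather than the retained solution: have varnishncsa write access logs to a plain file that the existing log shipper (filebeat or similar) already forwards to elasticsearch. The output path and format string are illustrative only.

```
# write varnish access logs to a file the existing log shipper can pick up
# (-a append, -D daemonize, -w output file); the format string is an example only
varnishncsa -a -D -w /var/log/varnish/access.log \
  -F '%h %l %u %t "%r" %s %b "%{Referer}i" "%{User-agent}i"'
```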
Nov 17 2020
- adapt the configuration to be able to test locally without interference with the other environments:
The /etc/hosts files of the vagrant VMs are configured to declare local IPs for the services they use [1]. It's not strong security, but it works for the moment.
Stronger security will be put in place when the admin servers are moved to the admin network; that network can then be filtered to ensure such local VMs can't interact with real production servers.
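As an illustration of the kind of entries involved (the hostnames and addresses below are placeholders, not the real staging values), the override simply pins the service names to local IPs inside each VM:

```
# append local overrides to /etc/hosts inside the vagrant VM
# (hostnames and IPs are placeholders for illustration only)
cat >> /etc/hosts <<'EOF'
192.168.128.2   db.internal.staging.swh.network
192.168.128.3   storage.internal.staging.swh.network
EOF
```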
The network configuration is done and the staging archive and deposit are now exposed publicly. The main goal of the task is achieved.
The staging VMs could be moved to their dedicated hypervisor when it becomes available; in the end this is not a mandatory step for this task, as we were able to use the existing hypervisors.
The metrics are correctly ingested by prometheus and the hosts' availability is checked by icinga.
A basic dashboard was created in grafana [1] with the following information for both firewalls (possible underlying queries are sketched after the list):
- uptime
- load
- memory stats
- partition stats
- network traffic for each interface
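A minimal sketch of the kind of node-exporter expressions such panels typically rely on, queried here through the Prometheus HTTP API; the server URL is a placeholder and the exact metric names/expressions used on the dashboard may differ:

```
# placeholder Prometheus URL; the real host/port may differ
PROM=http://prometheus.example.org:9090

# uptime: seconds since boot
curl -sG "$PROM/api/v1/query" --data-urlencode 'query=time() - node_boot_time_seconds'
# load average (5 minutes)
curl -sG "$PROM/api/v1/query" --data-urlencode 'query=node_load5'
# free space per partition
curl -sG "$PROM/api/v1/query" --data-urlencode 'query=node_filesystem_avail_bytes'
# network traffic per interface (received bytes/s)
curl -sG "$PROM/api/v1/query" --data-urlencode 'query=rate(node_network_receive_bytes_total[5m])'
```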
Nov 16 2020
LGTM, it works in vagrant with the firewalls' configuration:
```
==> pergamon: Notice: /Stage[main]/Profile::Icinga2::Master/File[/etc/icinga2/zones.d/master/pushkin.internal.softwareheritage.org.conf]/ensure: removed
==> pergamon: Info: /etc/icinga2/zones.d/master: Scheduling refresh of Class[Icinga2::Service]
```
I have created the diff for information but will land it quickly to fix the prometheus configuration ASAP.
fix formatting
replace the lost lookup with an alias
rebase and remove unnecessary spaces
Nov 10 2020
use https
It fixes problems reaching the public IP from the internal network.
Feel free to land it if it looks good to you
- Add webui checks on icinga
- Rename the puppet class to something more generic as it's not only dedicated to prometheus configuration
rebase
This is a schema complementing the previous ones. It represents a more network-oriented view of the interaction between the server and the firewall:
After double (at least) checking, the routing on louvre is working well (the packets are not intercepted by the IP masquerade).
The problem was that the DNAT rule on the firewall was not applied because the packets were not entering from the vtnet0 interface (they were simply lost). The DNAT rule was updated to apply on both the vtnet1 (VLAN440) and vtnet0 (VLAN1300) interfaces [1]. Pergamon can now reach the reverse proxy on ports 80/443.
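To double-check from the firewall shell that the translation rule is now attached to both interfaces and that traffic actually hits them, the stock pf tooling can be used; a minimal sketch, assuming shell access to the OPNsense box:

```
# list the active NAT/rdr rules and the interfaces they are bound to
pfctl -s nat
# per-interface counters, to confirm packets are seen on vtnet0 and vtnet1
pfctl -vv -s Interfaces -i vtnet0
pfctl -vv -s Interfaces -i vtnet1
```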
To solve the monitoring alerts [1], we tried to bypass the restriction between VLAN210 and VLAN1300 by adding a route between pergamon and VLAN1300 via the firewall (D4454).
The route is correctly created on pergamon but it seems to be ignored:
```
root@pergamon:~# traceroute 128.93.166.2
traceroute to 128.93.166.2 (128.93.166.2), 30 hops max, 60 byte packets
 1  louvre.internal.softwareheritage.org (192.168.100.1)  0.185 ms * *
```
It's the same for other routes:
```
root@pergamon:~# traceroute 192.168.130.10
traceroute to 192.168.130.10 (192.168.130.10), 30 hops max, 60 byte packets
 1  louvre.internal.softwareheritage.org (192.168.100.1)  0.168 ms * *
 2  pushkin.internal.softwareheritage.org (192.168.100.129)  0.331 ms  0.316 ms  0.307 ms
 3  pushkin.internal.softwareheritage.org (192.168.100.129)  0.426 ms  0.414 ms  0.400 ms
```
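To see which entry the kernel actually selects for these destinations (and whether the new static route is consulted at all), `ip route get` is a useful complement to traceroute; a minimal diagnostic sketch run on pergamon:

```
# dump the routing table, then ask which route is chosen for each target
root@pergamon:~# ip route show
root@pergamon:~# ip route get 128.93.166.2
root@pergamon:~# ip route get 192.168.130.10
```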
Fix indentation
A step was achieved in the configuration. The staging services are now accessible from the internet at these addresses (a quick reachability check is sketched after the list):
- webapp : https://webapp.staging.swh.network
- deposit: https://deposit.staging.swh.network
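A quick external reachability check for both endpoints (plain curl, nothing SWH-specific; the deposit endpoint may answer with an authentication error, which still proves it is reachable):

```
# print only the HTTP status line of each staging endpoint
curl -sSI https://webapp.staging.swh.network | head -n 1
curl -sSI https://deposit.staging.swh.network | head -n 1
```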
Nov 9 2020
We don't need it because pergamon is not managing the first level of swh.network, and declaring such entries avoids puppet testing and updating the DNS configuration, as your paste P862 shows.
Fix previous too enthusiastic commit
Use an alias for sentry entry to clarify the internal ip usage
remove wrong plural
Nov 6 2020
LGTM as a coauthor 😃
LGTM
LGTM
Nov 4 2020
The only remaining task is the monitoring / metrics gathering; it will be detailed in another dedicated task.
In T2721#52000, @vsellier wrote: after digging into why the git configuration is not pushed, I found in the git backup configuration [1] that the plugin needs a `configuration-changed` event to detect the updates.
Now an upgrade can be performed without interruption (a quick CARP status check is sketched after the list):
- On glyptotek (SLAVE), upgrade to version 20.7.4, launched via the web UI
- Switch the master from pushkin to glyptotek via the web UI (Interfaces / Virtual IPs / Status => Enter Persistent CARP Maintenance Mode) on pushkin
- Everything seems to work well on glyptotek in 20.7.4, so the operation can be repeated on pushkin
- Don't forget to disable the Maintenance Mode on both firewalls
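A minimal sketch of how the CARP role can be verified from each firewall's shell before and after the switch (standard FreeBSD tooling shipped with OPNsense; the demotion counter is only meaningful while maintenance mode is active):

```
# show the CARP state (MASTER/BACKUP) of every carp-enabled interface
ifconfig | grep 'carp:'
# the demotion counter is raised while the node is in persistent maintenance mode
sysctl net.inet.carp.demotion
```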
Nov 3 2020
- glyptotek hostname reserved on the host naming page [1]
- pushkin VM cloned on proxmox and deployed on beaubourg for the HA (pushkin is running on branly)
- to be able to start the new instance without IP conflicts, the network devices have to be disconnected in the proxmox configuration
- the IPs were reconfigured in the text console via the menu available when the root user connects. This is the assignment:
| Interface | IP |
|---|---|
| VLAN440 | 192.168.100.128 |
| VLAN442 | 192.168.50.3 |
| VLAN443 | 192.168.130.3 |
| VLAN1300 | 128.93.166.4 |
- the HA settings were configured on both firewalls to activate the synchronization of the states and of the configuration (menu System / High Availability / Settings); the peer IP was configured so that fw1 can reach fw2 and vice versa
- the master/slave switch via the interface (Interfaces > Virtual IPs / Status -> Enter/Leave Persistent CARP Maintenance Mode) is OK; no packets are lost between the 2 servers (one in VLAN440 and the other in VLAN443), see the check sketched below
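A minimal sketch of the packet-loss check mentioned in the last item: from the host in VLAN440, ping the host in VLAN443 while the CARP master/slave switch is performed (the target address is a placeholder taken from the VLAN443 range):

```
# run during the CARP switch; the final summary line reports the packet loss
ping -i 0.2 -c 100 192.168.130.10
```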
After digging into why the git configuration is not pushed, I found in the git backup configuration [1] that the plugin needs a `configuration-changed` event to detect the updates.
This event [2] was added in version v20.7.4. The firewall is on v20.7.3, which can explain why the full process is not working.
Netbox has been up and in use for several weeks now.
The backup is correctly configured:
```
root@bojimans:/etc/borgmatic# borgmatic info --archive latest
borg@banco.internal.softwareheritage.org:/srv/borg/repositories/bojimans.internal.softwareheritage.org: Displaying summary info for archives
Archive name: bojimans.internal.softwareheritage.org-2020-11-03T12:41:02.069548
Archive fingerprint: f8d0932e85043e61f59b21856a2cd871336d2b7e7a3e7d6e681cd4333f091581
Comment:
Hostname: bojimans
Username: root
Time (start): Tue, 2020-11-03 12:41:03
Time (end): Tue, 2020-11-03 12:41:10
Duration: 7.19 seconds
Number of files: 62391
Command line: /usr/bin/borg create --exclude-from /tmp/tmpo2f1n9xq --exclude-caches --exclude-if-present .nobackup 'borg@banco.internal.softwareheritage.org:/srv/borg/repositories/bojimans.internal.softwareheritage.org::bojimans.internal.softwareheritage.org-{now:%Y-%m-%dT%H:%M:%S.%f}' /
Utilization of maximum supported archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:                1.84 GB            938.96 MB              2.12 MB
All archives:               64.97 GB             32.95 GB              1.06 GB

                       Unique chunks         Total chunks
Chunk index:                   61324              2163683
```
```
root@bojimans:~# borgmatic mount --archive latest --mount-point /tmp/bck
root@bojimans:/tmp/bck/opt# du --apparent-size -schP {/tmp/bck,}/opt/netbox* {/tmp/bck,}/var/lib/netbox {/tmp/bck,}/var/lib/postgresql/
17      /tmp/bck/opt/netbox
141M    /tmp/bck/opt/netbox-2.9.3
17      /opt/netbox
156M    /opt/netbox-2.9.3
0       /tmp/bck/var/lib/netbox
16K     /var/lib/netbox
75M     /tmp/bck/var/lib/postgresql/
75M     /var/lib/postgresql/
446M    total
```
The difference in size returned by `du` on the netbox directory seems to be due to how the size is computed on the fuse fs:
```
root@bojimans:~# mount | grep /tmp/bck
borgfs on /tmp/bck type fuse (ro,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions)
```
There are no visible differences between the 2 directories:
```
root@bojimans:~# diff -r {/tmp/bck,}/opt/netbox-2.9.3/
root@bojimans:~#
```
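Once the comparison is done, the verification mount can be released; a short cleanup sketch using the same mount point as above:

```
root@bojimans:~# borgmatic umount --mount-point /tmp/bck
```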