
Install and configure a firewall for the staging environment
Closed, Migrated

Description

The new firewall will be deployed and configured for the staging environment.
This will make it possible to validate its behavior in a non-critical environment; it will then be extended progressively to the other environments (admin, production, public IPs).

  • Initially a single firewall will be used, but to prepare for a more robust deployment with active/passive instances once production comes into play, the firewall will expose virtual IPs as the gateway addresses.
  • The internet gateway will be configured to use the new SWH public VLAN (1300) to reach the internet
  • Some legacy services will still be used (DNS server, ...(?))
  • A new route must be declared on the current gateway so that the new staging network (192.168.130.0/24) can be reached from the VPN
  • The network configuration of the current staging servers must be updated to move them from 192.168.128.XXX to 192.168.130.XXX
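The renumbering only swaps the /24 prefix and keeps the host part. A quick sketch with Python's `ipaddress` module (the host address used here is just an example):

```python
import ipaddress

OLD_NET = ipaddress.ip_network("192.168.128.0/24")  # current staging network
NEW_NET = ipaddress.ip_network("192.168.130.0/24")  # new staging network

def renumber(addr: str) -> str:
    """Move a host from the old staging /24 to the new one, keeping the host part."""
    host = ipaddress.ip_address(addr)
    if host not in OLD_NET:
        raise ValueError(f"{addr} is not in {OLD_NET}")
    offset = int(host) - int(OLD_NET.network_address)
    return str(ipaddress.ip_address(int(NEW_NET.network_address) + offset))

# A host at .198 keeps its host part across the move:
print(renumber("192.168.128.198"))  # -> 192.168.130.198
```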

This is the configuration identified for the firewall before confronting it with the real world:

  • Interfaces:

    Interface  Network    IP               Description
    WAN        VLAN 1300  128.93.166.3     Public network's interface
    LAN1       VLAN 440   192.168.100.129  Production network's interface
    LAN2       VLAN 443   192.168.130.2    New staging network's interface
    LAN3       VLAN 442   192.168.50.2     Future admin network's interface (not used in this PoC)
  • Virtual IPs:

    Network  IP               Comment
    WAN      128.93.166.2
    WAN      128.93.166.9     NAT to staging RP?
    LAN1     192.168.100.130
    LAN2     192.168.130.1
    LAN3     192.168.50.1
  • Pseudo firewall rules identified:

By default, OPNsense allows outbound connections and blocks inbound connections.
The explicit rules blocking internal network communication should be configured to reject packets rather than silently dropping them, so that clients fail fast instead of waiting for a network timeout.

Legend:
A <- B : inbound connection from B to A
A -> B : outbound connection from A to B

Rule                                         Action
staging/ICMP <- production                   Allowed
production/ICMP <- staging                   Allowed (as long as the production network is used by the admin tools)
production/DNS <- staging                    Allowed
production/puppet master <- staging network  Allowed
production/icinga <- staging                 Allowed
staging/prometheus <- production             Allowed
production/logstash <- staging               Allowed
staging/RP(80,443) <- public/web (NAT?)      Allowed
production/sentry(9000) <- staging           Allowed
staging/*:ssh <- production                  Allowed
wan/RP IP(web) <- production                 Allowed (monitoring/access from the VPN)
production/keycloak(web) <- staging/webapp   Allowed (SSO)
production/ES <- staging/deposit             Allowed (really needed?)
production/borg(ssh) <- staging              Allowed (backups)
  • Connections to the internet are allowed by the default outbound rule
  • SMTP connections to smtp.inria.fr are covered by the default outbound rules
  • The rules exposing the staging environment's kafka cluster are not listed here, as this cluster doesn't exist yet
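The interface and virtual-IP tables above can be sanity-checked mechanically: each VIP must fall inside the network of the interface it is attached to. A minimal sketch with Python's `ipaddress` module (the network prefixes are assumed from the addresses and masks given in this task):

```python
import ipaddress

# Interface networks and their gateway VIPs, taken from the tables above
# (the WAN /26 comes from the CARP configuration later in this task).
vips = {
    "WAN (VLAN 1300)":  ("128.93.166.0/26",  "128.93.166.2"),
    "LAN1 (VLAN 440)":  ("192.168.100.0/24", "192.168.100.130"),
    "LAN2 (VLAN 443)":  ("192.168.130.0/24", "192.168.130.1"),
    "LAN3 (VLAN 442)":  ("192.168.50.0/24",  "192.168.50.1"),
}

for name, (net, vip) in vips.items():
    assert ipaddress.ip_address(vip) in ipaddress.ip_network(net), name
print("all VIPs are inside their interface networks")
```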

Event Timeline

vsellier changed the task status from Open to Work in Progress.Oct 19 2020, 7:23 PM
vsellier triaged this task as Normal priority.
vsellier created this task.

The firewall was installed from the iso image OPNsense-20.7-OpenSSL-dvd-amd64.iso uploaded to the ceph-proxmox storage

  • After the boot, on the console, log in with the user installer to start the installation on disk
  • Just before the reboot, remove the iso from the proxmox configuration
  • Wait for the prompt on the console
  • Select 1) Assign interfaces (the following answers will depend on the interface assignment in proxmox)
    • WAN: vtnet0
    • LAN: <empty>
    • OPT1: vtnet1
    • OPT2: vtnet2
    • OPT#: vtnet3
  • Select 2) Set interface IP address
    • For each interface, disable dhcp and enter the ip address according to the table in the task description; declare the gateway for vtnet0 (VLAN1300) and vtnet1 (VLAN440)
  • Select 8) Shell to open a terminal. The firewall must be deactivated until the basic rules are declared. Enter the following command:
pfctl -d

At this step the web console is reachable at the VLAN440 address (https://192.168.100.129)

  • Connect to the interface with the default root credentials (root / opnsense)
  • Go to Lobby / Password and change the root password
  • In the Interfaces menu, select each interface and update its description:
    • vtnet0 -> VLAN1300
    • vtnet1 -> VLAN440
    • vtnet2 -> VLAN443
    • vtnet3 -> VLAN442
  • Go to System / Settings / Administration
    • Change the WebGUI listen interfaces to VLAN440 and VLAN442
    • Activate SSH, check Permit root login and Permit password login, and change the listen interfaces to VLAN440 and VLAN442
  • Go to System / Settings / General
    • Change the hostname to pushkin
    • Change the domain to swh.network
  • Go to System / Routes / Configuration
    • Add a new static route for the VPN users: Network: 192.168.101.0/24 / Gateway: 192.168.100.1 / Description: VPN gateway
  • Go to System / Gateways / Single
    • Edit the GW_WAN interface:
      • Check Upstream gateway
      • Change the priority to 254 to select this gateway by default

Default firewall rules:

  • In Firewall / Aliases:
    • Create a new alias firewall_ips / type Host(s) / content 192.168.100.130, 192.168.100.129
    • Create a new alias webserver_ports / type Port(s) / content 80, 443
  • In Firewall / Rules, select VLAN440:
    • Add a new Allow rule with: Destination: firewall_ips / Port: ssh / Description: Allow ssh to the firewall
    • Add a new Allow rule with: Destination: firewall_ips / Port: webserver_ports / Description: Allow access to the firewall admin UI
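Aliases simply factor out address and port lists, so one rule referencing two aliases stands for every (address, port) combination. An illustrative expansion of the admin-UI rule above:

```python
from itertools import product

# Contents of the two aliases defined above.
firewall_ips = ["192.168.100.130", "192.168.100.129"]
webserver_ports = [80, 443]

# The single admin-UI rule covers one pass entry per (ip, port) pair:
pairs = list(product(firewall_ips, webserver_ports))
for ip, port in pairs:
    print(f"pass in to {ip} port {port}")
# 2 ips x 2 ports -> 4 effective entries
print(len(pairs))  # -> 4
```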

Go to Power / Reboot and reboot the firewall to make sure everything is ok with the firewall activated

VIPs configuration

On the FW UI, go to Interfaces / Virtual IPs / Settings
Add the following Virtual IPs :

  1. Mode CARP / interface VLAN440 / Address: 192.168.100.130/24 / Virtual IP Password: not significant / VHID Group : 1 / Description: VLAN440 gw wip
  2. Mode CARP / interface VLAN442 / Address: 192.168.50.1/24 / Virtual IP Password: not significant / VHID Group: 2 / Description: VLAN442 fw wip
  3. Mode CARP / interface: VLAN443 / Address: 192.168.130.1/24 / Virtual IP Password: not significant / VHID Group: 3/ Description: VLAN443 fw wip
  4. Mode CARP / interface: VLAN1300 / Address: 128.93.166.2/26 / Virtual IP Password: not significant / VHID Group: 4 / Description: VLAN1300 fw wip

Check the status of the VIPs to ensure each one is MASTER:

  • NAT rules configured:

Some routes need to be declared to be able to reach the new networks through the firewall.

I didn't find anything in puppet related to the current routes on louvre, so I suppose they are managed manually. I think these actions should be enough, but I need your validation to be sure I'm on the right path:

  • Manually create the new routes:
# New staging network
route add -net 192.168.130.0 netmask 255.255.255.0 gw 192.168.100.130
# New admin network
route add -net 192.168.50.0 netmask 255.255.255.0 gw 192.168.100.130
  • Make them persistent:
--- /etc/network/interfaces	2020-09-22 16:03:23.188012147 +0000
+++ /etc/network/interfaces	2020-10-20 21:48:56.128948032 +0000
@@ -19,6 +19,8 @@
 	address  192.168.100.1
 	netmask  255.255.255.0
 	post-up ip route add 192.168.128.0/24 via 192.168.100.125
+	post-up ip route add 192.168.130.0/24 via 192.168.100.130
+	post-up ip route add 192.168.50.0/24 via 192.168.100.130
 
 auto ens19
 iface ens19 inet static
  • Configure the new routes for the vpn:
--- /etc/openvpn/louvre.conf	2019-12-04 16:50:54.405960228 +0000
+++ /etc/openvpn/louvre.conf	2020-10-20 21:45:50.733070550 +0000
@@ -117,6 +117,8 @@
 push "route 192.168.100.0 255.255.255.0"
 push "route 192.168.200.0 255.255.248.0"
 push "route 192.168.128.0 255.255.255.0"
+push "route 192.168.130.0 255.255.255.0"
+push "route 192.168.50.0 255.255.255.0"
 
 # To assign specific IP addresses to specific
 # clients or if a connecting client has a private
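OpenVPN's `push "route …"` lines use dotted netmasks rather than CIDR; a quick way to double-check that they match the CIDR networks used everywhere else in this task, using Python's `ipaddress` module:

```python
import ipaddress

# The "push route" lines above, as (network, netmask) pairs:
pushes = [
    ("192.168.100.0", "255.255.255.0"),
    ("192.168.200.0", "255.255.248.0"),
    ("192.168.128.0", "255.255.255.0"),
    ("192.168.130.0", "255.255.255.0"),  # new staging network
    ("192.168.50.0",  "255.255.255.0"),  # new admin network
]

# ipaddress accepts the "network/netmask" form directly:
for net, mask in pushes:
    print(ipaddress.ip_network(f"{net}/{mask}"))
# e.g. 255.255.248.0 is a /21, so the second line prints 192.168.200.0/21
```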

@olasd Please let me know if it's ok to proceed.

(Some diffs will follow for the puppet configuration)

Those route changes look fine.

  • Routes manually declared on louvre:
root@louvre:~# ip route add 192.168.130.0/24 via 192.168.100.130 dev ens18
root@louvre:~# ip route add 192.168.50.0/24 via 192.168.100.130 dev ens18
root@louvre:~# ip route
default via 128.93.193.254 dev ens19 onlink 
128.93.193.0/24 dev ens19 proto kernel scope link src 128.93.193.5 
192.168.50.0/24 via 192.168.100.130 dev ens18 
192.168.100.0/24 dev ens18 proto kernel scope link src 192.168.100.1 
192.168.101.0/24 via 192.168.101.2 dev tun0 
192.168.101.2 dev tun0 proto kernel scope link src 192.168.101.1 
192.168.128.0/24 via 192.168.100.125 dev ens18 
192.168.130.0/24 via 192.168.100.130 dev ens18

The route command is not installed on louvre, as it has been replaced by ip.

A test vm on staging is now reachable by ping:

vsellier@louvre ~ % ping 192.168.130.198
PING 192.168.130.198 (192.168.130.198) 56(84) bytes of data.
64 bytes from 192.168.130.198: icmp_seq=1 ttl=63 time=0.535 ms
^C
--- 192.168.130.198 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.535/0.535/0.535/0.000 ms
  • Interfaces configuration updated:
root@louvre:/etc/network# diff -U3 /tmp/interfaces.orig interfaces
--- /tmp/interfaces.orig	2020-10-21 13:05:14.415774611 +0000
+++ interfaces	2020-10-21 13:05:49.155761781 +0000
@@ -19,6 +19,8 @@
 	address  192.168.100.1
 	netmask  255.255.255.0
 	post-up ip route add 192.168.128.0/24 via 192.168.100.125
+	post-up ip route add 192.168.130.0/24 via 192.168.100.130
+	post-up ip route add 192.168.50.0/24 via 192.168.100.130
 
 auto ens19
 iface ens19 inet static
  • VPN configuration updated and the vpn restarted:
root@louvre:/etc/openvpn# diff -U3 ~/louvre.conf.orig louvre.conf
--- /root/louvre.conf.orig	2020-10-21 13:10:46.431644210 +0000
+++ louvre.conf	2020-10-21 13:11:16.643632065 +0000
@@ -117,6 +117,8 @@
 push "route 192.168.100.0 255.255.255.0"
 push "route 192.168.200.0 255.255.248.0"
 push "route 192.168.128.0 255.255.255.0"
+push "route 192.168.130.0 255.255.255.0"
+push "route 192.168.50.0 255.255.255.0"
 
 # To assign specific IP addresses to specific
 # clients or if a connecting client has a private
root@louvre:~# systemctl restart openvpn@louvre

After reconnecting to the vpn, the test vm in staging is reachable from my laptop:

$ ping 192.168.130.198
PING 192.168.130.198 (192.168.130.198) 56(84) bytes of data.
64 bytes from 192.168.130.198: icmp_seq=1 ttl=62 time=24.3 ms
64 bytes from 192.168.130.198: icmp_seq=2 ttl=62 time=21.7 ms
^C
--- 192.168.130.198 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 3ms
rtt min/avg/max/mdev = 21.707/23.018/24.329/1.311 ms

After some hard time getting the initial firewall rules configured correctly, due to inter-vlan traffic being seen as coming from the gateway address and therefore not filtered, the firewall rules now enforce the following behavior:

  • everything entering a network is rejected by default
  • everything is allowed from the internal networks to the internet (via VLAN1300)
  • pass-through gateway traffic is allowed ((VPN ->) VLAN440 -> VLAN440 gw -> VLAN443, for example)
  • specific inbound traffic to a network must be explicitly allowed
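The four facts above can be sketched as a tiny decision function. This is an illustrative model only, not the actual pf rule set; the VLAN names and the single sample allow rule are placeholders:

```python
# Illustrative model of the observed default behavior: traffic towards the
# internet (VLAN1300) is allowed, traffic entering an internal VLAN is
# rejected unless a rule explicitly allows that (src, dst) pair.
INTERNAL = {"VLAN440", "VLAN442", "VLAN443"}
ALLOW_RULES = {("VLAN440", "VLAN443")}  # e.g. (VPN ->) VLAN440 -> VLAN443

def decide(src_vlan: str, dst_vlan: str) -> str:
    if dst_vlan == "VLAN1300":              # outbound to the internet
        return "allow"
    if (src_vlan, dst_vlan) in ALLOW_RULES:  # explicitly allowed inbound
        return "allow"
    if dst_vlan in INTERNAL:                 # default: reject, don't drop
        return "reject"
    return "reject"

print(decide("VLAN443", "VLAN1300"))  # -> allow  (staging to internet)
print(decide("VLAN1300", "VLAN443"))  # -> reject (unsolicited inbound)
print(decide("VLAN440", "VLAN443"))   # -> allow  (pass-through gateway)
```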

test vm in staging is reachable from my laptop

i can ping it as well ;)

$ ping 192.168.130.198
PING 192.168.130.198 (192.168.130.198) 56(84) bytes of data.
64 bytes from 192.168.130.198: icmp_seq=1 ttl=62 time=19.4 ms
64 bytes from 192.168.130.198: icmp_seq=2 ttl=62 time=19.8 ms
64 bytes from 192.168.130.198: icmp_seq=3 ttl=62 time=19.9 ms
64 bytes from 192.168.130.198: icmp_seq=4 ttl=62 time=19.5 ms
64 bytes from 192.168.130.198: icmp_seq=5 ttl=62 time=23.2 ms

good news! thanks for the confirmation

The staging nodes will be migrated one by one, to avoid too much noise in the monitoring and to make it easier to detect missing rules in the firewall. Puppet is disabled on all the staging nodes to avoid a massive migration:

root@pergamon:~# clush -b -w @staging 'puppet agent --disable "Network update"'

The new routes also have to be manually declared on pergamon to reach the new networks.
Puppet declared them in the configuration but didn't reload the network:

root@pergamon:~# puppet agent --test
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Retrieving locales
Info: Loading facts
Info: Caching catalog for pergamon.softwareheritage.org
Info: Applying configuration version '1603355074'
Notice: /Stage[main]/Profile::Network/Debnet::Iface[eth0]/Concat[/etc/network/interfaces]/File[/etc/network/interfaces]/content: 
--- /etc/network/interfaces	2020-09-15 16:10:15.235917411 +0000
+++ /tmp/puppet-file20201022-2531741-3gl773	2020-10-22 08:25:16.977289874 +0000
@@ -18,6 +18,8 @@
   up ip route add 192.168.101.0/24 via 192.168.100.1
   up ip route add 192.168.200.0/21 via 192.168.100.1
   up ip route add 192.168.128.0/24 via 192.168.100.125
+  up ip route add 192.168.130.0/24 via 192.168.100.130
+  up ip route add 192.168.50.0/24 via 192.168.100.130
   up ip rule add from 192.168.100.29 table private
   up ip route add 192.168.100.0/24 src 192.168.100.29 dev eth1 table private
   up ip route add default via 192.168.100.1 dev eth1 table private
@@ -25,6 +27,8 @@
   down ip route del default via 192.168.100.1 dev eth1 table private
   down ip route del 192.168.100.0/24 src 192.168.100.29 dev eth1 table private
   down ip rule del from 192.168.100.29 table private
+  down ip route del 192.168.50.0/24 via 192.168.100.130
+  down ip route del 192.168.130.0/24 via 192.168.100.130
   down ip route del 192.168.128.0/24 via 192.168.100.125
   down ip route del 192.168.200.0/21 via 192.168.100.1
   down ip route del 192.168.101.0/24 via 192.168.100.1

Info: Computing checksum on file /etc/network/interfaces
Info: /Stage[main]/Profile::Network/Debnet::Iface[eth0]/Concat[/etc/network/interfaces]/File[/etc/network/interfaces]: Filebucketed /etc/network/interfaces to puppet with sum 886bd2183250e3294dfe886517ba8b57
Notice: /Stage[main]/Profile::Network/Debnet::Iface[eth0]/Concat[/etc/network/interfaces]/File[/etc/network/interfaces]/content: content changed '{md5}886bd2183250e3294dfe886517ba8b57' to '{md5}473489702269c84a89100865222ee819'
Notice: /Stage[main]/Profile::Bind_server::Primary/Resource_record[pushkin/A]/ensure: created
Notice: /Stage[main]/Profile::Bind_server::Primary/Resource_record[pushkin/A+PTR]/ensure: created
Notice: /Stage[main]/Profile::Bind_server::Primary/Resource_record[internalgw/A]/ensure: created
Notice: /Stage[main]/Profile::Bind_server::Primary/Resource_record[internalgw/A+PTR]/ensure: created
Notice: Applied catalog in 37.90 seconds
root@pergamon:/etc/network# ip route
default via 128.93.193.254 dev eth0 onlink 
128.93.193.0/24 dev eth0 proto kernel scope link src 128.93.193.29 
192.168.100.0/24 dev eth1 proto kernel scope link src 192.168.100.29 
192.168.101.0/24 via 192.168.100.1 dev eth1 
192.168.128.0/24 via 192.168.100.125 dev eth1 
192.168.200.0/21 via 192.168.100.1 dev eth1

Adding the new routes :

root@pergamon:/etc/network# ip route add 192.168.130.0/24 via 192.168.100.130 dev eth1
root@pergamon:/etc/network# ip route add 192.168.50.0/24 via 192.168.100.130 dev eth1

root@pergamon:/etc/network# ip route
default via 128.93.193.254 dev eth0 onlink 
128.93.193.0/24 dev eth0 proto kernel scope link src 128.93.193.29 
192.168.50.0/24 via 192.168.100.130 dev eth1 
192.168.100.0/24 dev eth1 proto kernel scope link src 192.168.100.29 
192.168.101.0/24 via 192.168.100.1 dev eth1 
192.168.128.0/24 via 192.168.100.125 dev eth1 
192.168.130.0/24 via 192.168.100.130 dev eth1 
192.168.200.0/21 via 192.168.100.1 dev eth1 

root@pergamon:/etc/network# ping 192.168.130.198
PING 192.168.130.198 (192.168.130.198) 56(84) bytes of data.
64 bytes from 192.168.130.198: icmp_seq=1 ttl=63 time=0.490 ms
64 bytes from 192.168.130.198: icmp_seq=2 ttl=63 time=0.484 ms
^C
--- 192.168.130.198 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.484/0.487/0.490/0.003 ms
  • I didn't feel brave enough to restart pergamon or to reload its full network configuration

worker0 is migrated and reachable. The dns and icinga rules were correctly updated after puppet ran on worker0 and pergamon.
To update the server, I had to manually change the ip configuration and reboot it, because puppet was failing: it was not able to determine the right ip in the 192.168.130.0 network while the server was still associated with an ip in 192.168.128.0:

root@worker0:~# puppet agent --test
Info: Using configured environment 'staging'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Retrieving locales
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Function Call, pick(): must receive at least one non empty value (file: /etc/puppet/code/environments/staging/site-modules/profile/manifests/prometheus/node.pp, line: 31, column: 28) on node worker0.internal.staging.swh.network
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
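The error comes from stdlib `pick()` semantics: Puppet's `pick()` returns its first argument that is neither undef nor empty, and fails when none qualifies. A rough Python analogue (illustrative, not the Puppet implementation) shows why the catalog failed while the node had no address in the new network yet:

```python
# Rough analogue of Puppet's pick(): return the first value that is neither
# None (undef) nor an empty string/collection, else raise -- as the catalog
# compilation did while worker0 had no address in 192.168.130.0/24.
def pick(*values):
    for v in values:
        if v is not None and v != "" and v != []:
            return v
    raise ValueError("pick(): must receive at least one non empty value")

# Once the node has an ip in the new network, pick() succeeds:
print(pick(None, "", "192.168.130.100"))  # -> 192.168.130.100
```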

Several checks are now failing due to missing firewall rules; let's analyse the firewall logs and adapt the rules.

List of the rules created :

  • icinga: Floating rule: icinga server -> *:icinga port (5665)
  • prometheus: Floating rule: prometheus server -> *:prometheus ports (9100/9102/9237/7071/9419)
  • logstash/journal: VLAN440 rule: * -> logstash server:logstash_port (5044)
  • All the servers are migrated to the new network 192.168.130.0/24.
  • Netbox is up to date.
  • The provisioning code was changed accordingly and applied.

I tried to generate the network flow matrix via a script, but it seems the opnsense api doesn't allow retrieving the firewall rules, even after installing the os-firewall[1] plugin that extends the basic firewall rules management:

[1]: https://docs.opnsense.org/development/api/plugins/firewall.html

A new update of opnsense is already available. We will have to quickly install the second instance so that updates can be performed without downtime.

The configuration backup in git is configured[3].
The configuration is committed to the iFWCFG[1] repository by the swhfirewall user (the credentials are in the credentials repository).

As explained in the documentation[2], the updates are committed and pushed asynchronously. For the moment I haven't seen a complete cycle (configuration updated / change committed / commit pushed) run through without forcing it manually.

[1]: https://forge.softwareheritage.org/source/iFWCFG.git
[2]: https://wiki.opnsense.org/manual/git-backup.html
[3]: https://pushkin.internal.softwareheritage.org/diag_backup.php

After digging into why the git configuration is not pushed, I found in the git backup plugin's issue tracker[1] that the plugin needs a `configuration-changed` event to detect updates.
This event[2] was added in version 20.7.4. The firewall is on 20.7.3, which explains why the full process is not working.

I need to deploy the second instance to be able to perform the hot upgrade without interruption.

[1] https://github.com/opnsense/plugins/issues/2049
[2] https://github.com/opnsense/core/issues/4388

  • glyptotek hostname reserved on the host naming page [1]
  • pushkin vm cloned on proxmox and deployed on beaubourg for the HA (pushkin is running on branly)
  • To be able to start the new instance without ip conflicts, the network devices had to be disconnected in the proxmox configuration
  • The IPs were reconfigured in the text console, via the menu available when the root user connects. This is the assignment:

    Interface  IP
    VLAN440    192.168.100.128
    VLAN442    192.168.50.3
    VLAN443    192.168.130.3
    VLAN1300   128.93.166.4

  • The HA settings were configured on both firewalls to activate the synchronization of the states (menu System / High Availability / Settings) and of the configuration; the peer ip was configured so that fw2 is reachable from fw1 and vice versa
  • The master/slave switches via the interface (Interfaces / Virtual IPs / Status -> Enter/Leave Persistent CARP Maintenance Mode) work fine; no packets were lost between 2 servers (one in VLAN440, the other in VLAN443)

[1]: https://intranet.softwareheritage.org/wiki/Hostname_naming_scheme

Now an upgrade can be performed without interruption:

  • On glyptotek (SLAVE), the upgrade to version 20.7.4 was launched via the web ui
  • Switch the master from pushkin to glyptotek via the web ui (Interfaces / Virtual IPs / Status => Enter Persistent CARP Maintenance Mode) on pushkin
  • Everything seems to work well on glyptotek in 20.7.4, so the operation can be repeated on pushkin
  • Don't forget to disable the Maintenance Mode on both firewalls


After the upgrade, the configuration changes are correctly committed to the local git repository, but never pushed to the forge.
A cron rule is needed on both firewalls to activate the remote backup (System / Settings / Cron). It is not synchronized by the standard sync, so it must be created manually on each firewall.

Each firewall pushes its commits to a dedicated branch (pushkin and glyptotek) to avoid conflicts (the configuration life cycle is not the same on the master and the slave firewall).

The only remaining task is the monitoring / metrics gathering; it will be detailed in another dedicated task.