Staging environment
Active, Public

Details

Description

Production-like environment for testing changes in SWH's infrastructure before applying them to production

Recent Activity

Mon, Jan 25

vsellier changed the status of T2920: Document staging infrastructure from Open to Work in Progress.
Mon, Jan 25, 3:32 PM · Documentation, System administration, Staging environment

Wed, Jan 20

moranegg added a project to T2920: Document staging infrastructure: Documentation.
Wed, Jan 20, 10:33 AM · Documentation, System administration, Staging environment

Mon, Jan 18

vsellier moved T2920: Document staging infrastructure from Backlog to Weekly backlog on the System administration board.
Mon, Jan 18, 7:13 PM · Documentation, System administration, Staging environment
vsellier added a project to T2920: Document staging infrastructure: System administration.
Mon, Jan 18, 7:13 PM · Documentation, System administration, Staging environment

Wed, Jan 6

ardumont added a comment to T2770: Fix all icinga checks on staging webapp.

The last check no longer appears in icinga.

Wed, Jan 6, 4:36 PM · Monitoring, System administration, Staging environment
ardumont closed T2770: Fix all icinga checks on staging webapp as Resolved.
Wed, Jan 6, 4:36 PM · Monitoring, System administration, Staging environment
ardumont changed the status of T2770: Fix all icinga checks on staging webapp from Open to Work in Progress.
Wed, Jan 6, 4:36 PM · Monitoring, System administration, Staging environment
ardumont moved T2877: Investigate spurious deposit logs from Backlog to deployed on the System administration board.
Wed, Jan 6, 3:45 PM · System administration, Staging environment, SWORD deposit

Mon, Jan 4

vsellier closed T2682: Deploy a small publicly available kafka server (with some content) on a staging (+ the related objstorage) as Resolved.

Closing this task as all the direct work is done.
The documentation will be addressed in T2920.

Mon, Jan 4, 12:33 PM · Staging environment, System administration
vsellier triaged T2920: Document staging infrastructure as Normal priority.
Mon, Jan 4, 12:32 PM · Documentation, System administration, Staging environment

Dec 22 2020

vsellier added a comment to T2682: Deploy a small publicly available kafka server (with some content) on a staging (+ the related objstorage).

Everything looks good; let's add some documentation before closing the issue.

Dec 22 2020, 9:56 AM · Staging environment, System administration
vsellier updated the task description for T2682: Deploy a small publicly available kafka server (with some content) on a staging (+ the related objstorage).
Dec 22 2020, 9:54 AM · Staging environment, System administration

Dec 21 2020

vsellier added a comment to T2682: Deploy a small publicly available kafka server (with some content) on a staging (+ the related objstorage).
  • A new VM, objstorage0.internal.staging.swh.network, is configured with a read-only object storage service
  • It's exposed to the internet via the reverse proxy at https://objstorage.staging.swh.network (this differs from the usual objstorage:5003 URL, but it allows exposing the service without new network configuration)
  • DNS entry added on gandi
  • Inventory updated
Dec 21 2020, 7:32 PM · Staging environment, System administration
vsellier added a revision to T2682: Deploy a small publicly available kafka server (with some content) on a staging (+ the related objstorage): D4776: [staging] Configure and expose to internet a read-only objstorage.
Dec 21 2020, 6:01 PM · Staging environment, System administration
vsellier added a revision to T2682: Deploy a small publicly available kafka server (with some content) on a staging (+ the related objstorage): D4775: Add objstorage0.staging.swh.network node to expose a r/o objstorage node.
Dec 21 2020, 4:48 PM · Staging environment, System administration
vsellier updated the task description for T2682: Deploy a small publicly available kafka server (with some content) on a staging (+ the related objstorage).
Dec 21 2020, 12:58 PM · Staging environment, System administration
vsellier added a comment to T2682: Deploy a small publicly available kafka server (with some content) on a staging (+ the related objstorage).

A user was correctly configured and a read test was performed:

Dec 21 2020, 12:57 PM · Staging environment, System administration
vsellier updated the task description for T2682: Deploy a small publicly available kafka server (with some content) on a staging (+ the related objstorage).
Dec 21 2020, 12:38 PM · Staging environment, System administration
vsellier added a comment to T2682: Deploy a small publicly available kafka server (with some content) on a staging (+ the related objstorage).

The network configuration is done. The server is now accessible from the internet at broker0.journal.staging.swh.network:9093
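
A minimal sketch of the settings a client might use to reach this public endpoint. Only the broker address comes from the comment above. The security protocol, SASL mechanism, credentials and group id are assumptions for illustration (the task only says that authentication is activated on the public network), and `build_consumer_config` is a hypothetical helper.

```python
BROKER = "broker0.journal.staging.swh.network:9093"  # from the comment above


def build_consumer_config(username, password, group_id):
    """Return librdkafka-style settings for an authenticated consumer.

    Assumption (not confirmed by the task): SASL over TLS with SCRAM-SHA-512.
    Adjust both keys to whatever the broker actually enforces.
    """
    return {
        "bootstrap.servers": BROKER,
        "group.id": group_id,
        "security.protocol": "SASL_SSL",     # assumption
        "sasl.mechanisms": "SCRAM-SHA-512",  # assumption
        "sasl.username": username,
        "sasl.password": password,
        "auto.offset.reset": "earliest",     # read the topic from the beginning
    }


# Placeholder credentials, not real ones.
config = build_consumer_config("test-user", "secret", "test-user-read-test")
print(config["bootstrap.servers"])
```

The resulting dict could then be handed to a librdkafka-based client, for example `confluent_kafka.Consumer(config)`.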

Dec 21 2020, 12:25 PM · Staging environment, System administration
vsellier updated the task description for T2682: Deploy a small publicly available kafka server (with some content) on a staging (+ the related objstorage).
Dec 21 2020, 12:24 PM · Staging environment, System administration

Dec 18 2020

vsellier updated the task description for T2682: Deploy a small publicly available kafka server (with some content) on a staging (+ the related objstorage).
Dec 18 2020, 4:59 PM · Staging environment, System administration
vsellier added a comment to T2682: Deploy a small publicly available kafka server (with some content) on a staging (+ the related objstorage).

The request to expose the journal to the internet was sent to the DSI this afternoon.

Dec 18 2020, 4:57 PM · Staging environment, System administration

Dec 17 2020

vsellier closed T2897: [staging] kafka data dir over 80%, a subtask of T2790: [staging] deploy the journal infrastructure, as Resolved.
Dec 17 2020, 10:00 AM · System administration, Staging environment
vsellier closed T2897: [staging] kafka data dir over 80% as Resolved.
Dec 17 2020, 10:00 AM · System administration, Staging environment
vsellier added a comment to T2897: [staging] kafka data dir over 80%.

After one week, the disk used by kafka was around 85% full:

root@journal0:/tmp# df -h /srv/kafka/logdir
Filesystem      Size  Used Avail Use% Mounted on
kafka-volume    481G  409G   73G  85% /srv/kafka/logdir

Unlike in production, compression was not enabled on the zfs pool:

root@kafka1:~#  zfs get all data/kafka  | grep compress
data/kafka  compressratio         1.55x                  -
data/kafka  compression           lz4                    inherited from data
data/kafka  refcompressratio      1.55x                  -
root@journal0:/tmp# zfs get all  | grep compress
kafka-volume  compressratio         1.00x                  -
kafka-volume  compression           off                    default
kafka-volume  refcompressratio      1.00x                  -

So compression was enabled:

root@journal0:/tmp# zfs set compression=lz4 kafka-volume
root@journal0:/tmp# zfs get all  | grep compress
kafka-volume  compressratio         1.00x                  -
kafka-volume  compression           lz4                    local
kafka-volume  refcompressratio      1.00x                  -

As this setting only applies to newly written data, we forced a compaction on the biggest topics: `directory`, `revision` and `content`

 % ./kafka-topics.sh --zookeeper $ZK  --alter --topic swh.journal.objects.revision --config min.cleanable.dirty.ratio=0.01
WARNING: Altering topic configuration from this script has been deprecated and may be removed in future releases.
         Going forward, please use kafka-configs.sh for this functionality
Updated config for topic swh.journal.objects.revision.
vsellier@journal0 /opt/kafka/bin
 % ./kafka-topics.sh --zookeeper $ZK  --alter --topic swh.journal.objects_privileged.revision --config min.cleanable.dirty.ratio=0.01
WARNING: Altering topic configuration from this script has been deprecated and may be removed in future releases.
         Going forward, please use kafka-configs.sh for this functionality
Updated config for topic swh.journal.objects_privileged.revision.
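
The deprecation warning above points to kafka-configs.sh. Below is a sketch of building the equivalent invocation for the same topics, assuming a Kafka release whose kafka-configs.sh still accepts `--zookeeper`. The helper name `kafka_configs_command` and the ZooKeeper address are placeholders, and the script only prints the commands.

```python
import shlex


def kafka_configs_command(zookeeper, topic, ratio="0.01"):
    """Build the kafka-configs.sh argv setting min.cleanable.dirty.ratio on a topic."""
    return [
        "./kafka-configs.sh",
        "--zookeeper", zookeeper,
        "--entity-type", "topics",
        "--entity-name", topic,
        "--alter",
        "--add-config", f"min.cleanable.dirty.ratio={ratio}",
    ]


# "zk.example:2181" stands in for the $ZK value used in the session above.
for topic in (
    "swh.journal.objects.revision",
    "swh.journal.objects_privileged.revision",
):
    print(shlex.join(kafka_configs_command("zk.example:2181", topic)))
```

To run a command instead of printing it, the list can be passed to `subprocess.run(cmd, check=True)` from /opt/kafka/bin.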
Dec 17 2020, 10:00 AM · System administration, Staging environment
vsellier changed the status of T2897: [staging] kafka data dir over 80% from Open to Work in Progress.
Dec 17 2020, 9:58 AM · System administration, Staging environment

Dec 14 2020

vsellier added a comment to T2817: Enable the swh-search environment in staging.

With the "optimized" configuration, the import is noticeably faster:

root@search-esnode0:~# curl -XPOST -H "Content-Type: application/json" http://${ES_SERVER}/_reindex\?pretty\&refresh=true\&requests_per_second=-1\&wait_for_completion=true -d @/tmp/reindex-production.json
{
  "took" : 10215280,
  "timed_out" : false,
  "total" : 91517657,
  "updated" : 0,
  "created" : 91517657,
  "deleted" : 0,
  "batches" : 91518,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}

"took" : 10215280 ms => about 2h50

Dec 14 2020, 9:47 AM · System administrators, Staging environment, Journal, Archive search
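
For reference, the `took` value in the reindex response above is expressed in milliseconds. A small sketch of the conversion to hours, minutes and seconds follows (the helper name `took_to_hms` is made up for this note):

```python
def took_to_hms(took_ms):
    """Convert an Elasticsearch `took` value (milliseconds) to (hours, minutes, seconds)."""
    total_seconds = took_ms // 1000
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return hours, minutes, seconds


hours, minutes, seconds = took_to_hms(10215280)
print(f"{hours}h{minutes:02d}m{seconds:02d}s")  # prints 2h50m15s
```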

Dec 11 2020

vsellier added a comment to T2682: Deploy a small publicly available kafka server (with some content) on a staging (+ the related objstorage).
  • diff landed and applied on the server
  • VIP 128.93.166.40 configured on the firewall
  • NAT Port forward of port 9093 from public ip to internal journal0 declared on the firewall
  • DNS declaration of broker0.journal.staging.swh.network in gandi
  • Ask the DSI to apply the kafka firewall profile to 128.93.166.40
  • Configure a user to test the pipeline
Dec 11 2020, 6:11 PM · Staging environment, System administration
ardumont moved T2877: Investigate spurious deposit logs from Backlog to Deployed on the SWORD deposit board.
Dec 11 2020, 3:22 PM · System administration, Staging environment, SWORD deposit
vsellier added a revision to T2682: Deploy a small publicly available kafka server (with some content) on a staging (+ the related objstorage): D4726: kafka: activate the authentication on the public network.
Dec 11 2020, 3:17 PM · Staging environment, System administration
ardumont placed T2877: Investigate spurious deposit logs up for grabs.
Dec 11 2020, 3:03 PM · System administration, Staging environment, SWORD deposit
ardumont closed T2877: Investigate spurious deposit logs as Resolved.
Dec 11 2020, 3:03 PM · System administration, Staging environment, SWORD deposit
ardumont claimed T2877: Investigate spurious deposit logs.

And now spurious logs are gone for the deposit.

Dec 11 2020, 3:03 PM · System administration, Staging environment, SWORD deposit
ardumont added a comment to T2877: Investigate spurious deposit logs.

Deployed (rp0.staging, webapp0.azure, moma).

Dec 11 2020, 3:02 PM · System administration, Staging environment, SWORD deposit
vsellier added a comment to T2877: Investigate spurious deposit logs.

I agree about the default site, but several legitimate requests from the monitoring are not correctly routed, so the configuration needs to be adapted.

Dec 11 2020, 11:46 AM · System administration, Staging environment, SWORD deposit
ardumont updated the task description for T2877: Investigate spurious deposit logs.
Dec 11 2020, 11:45 AM · System administration, Staging environment, SWORD deposit
vsellier added a revision to T2877: Investigate spurious deposit logs: D4719: varnish: Correctly handle the vhost when the port number is included.
Dec 11 2020, 11:42 AM · System administration, Staging environment, SWORD deposit
vlorentz added a comment to T2877: Investigate spurious deposit logs.

You could just add a 00-default vhost that shows a generic error message. (that's not even a hack to rely on alphabetical order for vhost configs)

Dec 11 2020, 11:35 AM · System administration, Staging environment, SWORD deposit
ardumont triaged T2877: Investigate spurious deposit logs as Normal priority.
Dec 11 2020, 11:17 AM · System administration, Staging environment, SWORD deposit
vsellier added a comment to T2817: Enable the swh-search environment in staging.

The production origin index was correctly copied from the production cluster, but apparently without the configuration meant to optimize the copy.
We will keep this copy and try a new optimized one to check whether the server still crashes with an OOM under the new cpu and memory settings.

Dec 11 2020, 10:15 AM · System administrators, Staging environment, Journal, Archive search

Dec 10 2020

vsellier changed the status of T2682: Deploy a small publicly available kafka server (with some content) on a staging (+ the related objstorage) from Open to Work in Progress.
Dec 10 2020, 5:41 PM · Staging environment, System administration
vsellier added a comment to T2817: Enable the swh-search environment in staging.

FYI: The origin index was recreated with the "official" mapping and a backfill was performed (necessary after the test of the flattened mapping).

Dec 10 2020, 3:42 PM · System administrators, Staging environment, Journal, Archive search
vsellier closed T2817: Enable the swh-search environment in staging as Resolved.

The deployment manifests are OK and deployed in staging, so this task can be resolved.
We will work on reactivating search-journal-client for the metadata in another task when T2876 is resolved.

Dec 10 2020, 3:29 PM · System administrators, Staging environment, Journal, Archive search
vsellier updated the task description for T2817: Enable the swh-search environment in staging.
Dec 10 2020, 3:19 PM · System administrators, Staging environment, Journal, Archive search
ardumont updated the task description for T2817: Enable the swh-search environment in staging.
Dec 10 2020, 1:21 PM · System administrators, Staging environment, Journal, Archive search
ardumont added a revision to T2817: Enable the swh-search environment in staging: D4712: staging: Increase elasticsearch jvm heap size to half its memory.
Dec 10 2020, 11:47 AM · System administrators, Staging environment, Journal, Archive search
vsellier added a comment to T2817: Enable the swh-search environment in staging.

The copy of the production index is restarted.
To speed up the copy, the index was tuned to reduce the disk pressure (this is a temporary configuration and should not be used in normal operation, as it's not safe):

cat >/tmp/config.json <<EOF
{
  "index" : {
    "translog.sync_interval" : "60s",
    "translog.durability" : "async",
    "refresh_interval" : "60s"
  }
}
EOF
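
The command that applies this file is not shown in the comment. One possible way to push such settings is the Elasticsearch update index settings endpoint (`PUT /<index>/_settings`), provided the settings are dynamically updatable on the version in use. The sketch below only builds the request. The index name `origin`, the server address and the helper name `build_settings_request` are assumptions.

```python
import json
import urllib.request

# Same temporary settings as in /tmp/config.json above.
SETTINGS = {
    "index": {
        "translog.sync_interval": "60s",
        "translog.durability": "async",
        "refresh_interval": "60s",
    }
}


def build_settings_request(es_server, index, settings=SETTINGS):
    """Build (but do not send) a PUT <index>/_settings request."""
    return urllib.request.Request(
        url=f"http://{es_server}/{index}/_settings",
        data=json.dumps(settings).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# "localhost:9200" and "origin" are placeholders, not values from the task.
request = build_settings_request("localhost:9200", "origin")
print(request.get_method(), request.full_url)
# To actually apply the settings: urllib.request.urlopen(request)
```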
Dec 10 2020, 11:14 AM · System administrators, Staging environment, Journal, Archive search
vsellier added a comment to T2817: Enable the swh-search environment in staging.
  • Partition and memory extended with terraform.
  • The disk resize needed some console actions to be applied:
Dec 10 2020, 10:39 AM · System administrators, Staging environment, Journal, Archive search
vsellier added a comment to T2817: Enable the swh-search environment in staging.

The production index import failed because the 90% disk usage limit was reached at some point; usage then fell back to around 60G after a compaction.
The import had reached 80M of 91M documents.

Dec 10 2020, 9:59 AM · System administrators, Staging environment, Journal, Archive search

Dec 9 2020

ardumont added a revision to T2817: Enable the swh-search environment in staging: D4710: search.journal_client: Fix key error.
Dec 9 2020, 10:26 PM · System administrators, Staging environment, Journal, Archive search