Deploy a small publicly available kafka server (with some content) on staging (+ the related objstorage)
Closed, Migrated

Description

The idea is that I'd like to be able to document the mirror stack so that it "just works" out of the box so people interested in setting up a mirror can test it.

For this, we need a small kafka server with some data in it (say one or two small ingested gitlab/gitea instances) that is publicly available. Only anonymized topics should be accessible there.

For the content replayer to work, it will also need an objstorage, since the content replayer pulls blobs from one objstorage to another (reading sha1s from the content topic).
This might be a bit trickier if we want to make it "read-only" to prevent kiddies from playing with it...
Edit: the ReadObjStorageFilter is our friend here.

It might also be useful to have the kafka accessible with and without authentication (to test that authentication layer), but it's not that important.

Actions:

  • diff landed and applied on the server
  • VIP 128.93.166.40 configured on the firewall
  • NAT Port forward of port 9093 from public ip to internal journal0 declared on the firewall
  • DNS declaration of broker0.journal.staging.swh.network in gandi
  • Ask the DSI to apply the kafka firewall profile to 128.93.166.40
  • Configure a user to test the pipeline
  • Configure a read-only object storage on webapp.staging

Event Timeline

douardda triaged this task as High priority. Oct 9 2020, 3:37 PM
douardda created this task.
vsellier changed the task status from Open to Work in Progress. Dec 10 2020, 5:41 PM
  • diff landed and applied on the server
  • VIP 128.93.166.40 configured on the firewall
  • NAT Port forward of port 9093 from public ip to internal journal0 declared on the firewall
  • DNS declaration of broker0.journal.staging.swh.network in gandi
  • Ask the DSI to apply the kafka firewall profile to 128.93.166.40
  • Configure a user to test the pipeline

    Note copied to the task description, please don't edit here

The request to expose the journal to the internet was sent to the DSI this afternoon.

The network configuration is done. The server is now accessible from the internet at broker0.journal.staging.swh.network:9093

A user was configured and a read test was performed successfully:

  • User creation:
% /opt/kafka/bin/kafka-configs.sh \
    --zookeeper journal0.internal.staging.swh.network:2181/kafka/softwareheritage \
    --alter \
    --add-config 'SCRAM-SHA-256=[iterations=8192,password=xxxx],SCRAM-SHA-512=[password=xxxx]' \
    --entity-type users \
    --entity-name swh-test
Warning: --zookeeper is deprecated and will be removed in a future version of Kafka.
Use --bootstrap-server instead to specify a broker to connect to.
Completed updating config for entity: user-principal 'swh-test'.
  • ACL configuration:
bootstrap_servers=journal0.internal.staging.swh.network:9092
username=swh-test

# Allow READ and DESCRIBE on unprivileged topics
% /opt/kafka/bin/kafka-acls.sh --bootstrap-server $bootstrap_servers --add --resource-pattern-type PREFIXED --topic swh.journal.objects. --allow-principal User:$username --operation READ

Adding ACLs for resource `ResourcePattern(resourceType=TOPIC, name=swh.journal.objects., patternType=PREFIXED)`: 
 	(principal=User:swh-test, host=*, operation=READ, permissionType=ALLOW) 

Current ACLs for resource `ResourcePattern(resourceType=TOPIC, name=swh.journal.objects., patternType=PREFIXED)`: 
 	(principal=User:swh-test, host=*, operation=READ, permissionType=ALLOW) 

% /opt/kafka/bin/kafka-acls.sh --bootstrap-server $bootstrap_servers --add --resource-pattern-type PREFIXED --topic swh.journal.objects. --allow-principal User:$username --operation DESCRIBE
Adding ACLs for resource `ResourcePattern(resourceType=TOPIC, name=swh.journal.objects., patternType=PREFIXED)`: 
 	(principal=User:swh-test, host=*, operation=DESCRIBE, permissionType=ALLOW) 

Current ACLs for resource `ResourcePattern(resourceType=TOPIC, name=swh.journal.objects., patternType=PREFIXED)`: 
 	(principal=User:swh-test, host=*, operation=READ, permissionType=ALLOW)
	(principal=User:swh-test, host=*, operation=DESCRIBE, permissionType=ALLOW) 

# Allow READ on consumer groups prefixed with `$username-`
%  /opt/kafka/bin/kafka-acls.sh --bootstrap-server $bootstrap_servers --add --resource-pattern-type PREFIXED --group ${username}- --allow-principal User:$username --operation READ
Adding ACLs for resource `ResourcePattern(resourceType=GROUP, name=swh-test-, patternType=PREFIXED)`: 
 	(principal=User:swh-test, host=*, operation=READ, permissionType=ALLOW) 

Current ACLs for resource `ResourcePattern(resourceType=GROUP, name=swh-test-, patternType=PREFIXED)`: 
 	(principal=User:swh-test, host=*, operation=READ, permissionType=ALLOW)
  • Read test from the internet:
~/Downloads/kafka_2.13-2.6.0/bin ❯ ./kafka-console-consumer.sh --consumer.config=kafka.properties --bootstrap-server  broker0.journal.staging.swh.network:9093  --topic swh.journal.objects.release --group swh-test-test --from-beginning
...

Processed a total of 57000 messages
% ./kafka-consumer-groups.sh --bootstrap-server ${SERVER} --describe --all-topics --group swh-test-test 

Consumer group 'swh-test-test' has no active members.

GROUP           TOPIC                       PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID     HOST            CLIENT-ID
swh-test-test   swh.journal.objects.release 10         1575            2339            764             -               -               -
swh-test-test   swh.journal.objects.release 43         1452            2150            698             -               -               -
swh-test-test   swh.journal.objects.release 6          1543            2290            747             -               -               -
...
  • A new VM, objstorage0.internal.staging.swh.network, is configured with a read-only object storage service
  • It's exposed to the internet via the reverse proxy at https://objstorage.staging.swh.network (this differs from the usual objstorage:5003 URL, but it allows exposing the service without any new network configuration)
  • DNS entry added on gandi
  • Inventory updated
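
The read test above passes `--consumer.config=kafka.properties`, but the content of that file is not shown in this task. A minimal sketch of what it might contain, assuming the public listener on port 9093 uses SASL over TLS and that the client authenticates with SCRAM-SHA-512 (neither is stated here; the password stays a placeholder, as in the user creation step):

```shell
#!/bin/sh
# Sketch: generate the client config passed via --consumer.config.
# ASSUMPTIONS (not stated in this task): the listener on port 9093 uses
# SASL over TLS, and the client authenticates with SCRAM-SHA-512.
KAFKA_USER="${KAFKA_USER:-swh-test}"
KAFKA_PASSWORD="${KAFKA_PASSWORD:-xxxx}"   # placeholder, as in the task

umask 077  # the file holds a password; keep it private to the current user
cat > kafka.properties <<EOF
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="${KAFKA_USER}" password="${KAFKA_PASSWORD}";
EOF
```

If the listener turns out to use a different protocol or mechanism, only the first two properties should need to change.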

Everything looks good; let's try to add some documentation before closing the issue.
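
As a possible starting point for that documentation, the three ACL grants from the test above could be wrapped in a small helper so they can be repeated for another user. This is only a sketch: `grant_mirror_acls`, `run` and the `DRY_RUN` switch are hypothetical names introduced here, and by default the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: print (or run) the ACL grants used in this task for one user.
# grant_mirror_acls, run and DRY_RUN are hypothetical names, not existing tooling.
KAFKA_ACLS="${KAFKA_ACLS:-/opt/kafka/bin/kafka-acls.sh}"
BOOTSTRAP="${BOOTSTRAP:-journal0.internal.staging.swh.network:9092}"
DRY_RUN="${DRY_RUN:-1}"   # 1: only print the commands; 0: execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

grant_mirror_acls() {
    username="$1"
    # READ and DESCRIBE on the unprivileged object topics
    for op in READ DESCRIBE; do
        run "$KAFKA_ACLS" --bootstrap-server "$BOOTSTRAP" --add \
            --resource-pattern-type PREFIXED --topic swh.journal.objects. \
            --allow-principal "User:$username" --operation "$op"
    done
    # READ on consumer groups prefixed with "<username>-"
    run "$KAFKA_ACLS" --bootstrap-server "$BOOTSTRAP" --add \
        --resource-pattern-type PREFIXED --group "${username}-" \
        --allow-principal "User:$username" --operation READ
}

grant_mirror_acls swh-test
```

Setting `DRY_RUN=0` on a host that has the Kafka tools installed would execute the same three commands that were run by hand above.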

vsellier closed this task as Resolved. Jan 4 2021, 12:33 PM

Closing this task as all the direct work is done.
The documentation will be addressed in T2920