
[staging] deploy the journal infrastructure
Closed, Migrated. Edits Locked.

Description

The journal0.staging.internal.swh.network server is installed but the kafka deployment is not configured.

This is a prerequisite for T2682

Event Timeline

vsellier changed the task status from Open to Work in Progress. Nov 17 2020, 3:09 PM

Correction: kafka is installed on the node, but it seems the configuration is not complete:

[2020-11-17 16:29:43,971] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
[2020-11-17 16:29:44,426] INFO Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation (org.apache.zookeeper.common.X509Util)
[2020-11-17 16:29:44,446] ERROR Exiting Kafka due to fatal exception (kafka.Kafka$)
java.lang.IllegalArgumentException: Error creating broker listeners from 'PLAINTEXT://journal0.internal.staging.swh.network:': Unable to parse PLAINTEXT://journal0.internal.staging.swh.network: to a broker endpoint
        at kafka.utils.CoreUtils$.listenerListToEndPoints(CoreUtils.scala:268)
        at kafka.server.KafkaConfig.$anonfun$listeners$1(KafkaConfig.scala:1633)
        at kafka.server.KafkaConfig.listeners(KafkaConfig.scala:1632)
        at kafka.server.KafkaConfig.advertisedListeners(KafkaConfig.scala:1660)
        at kafka.server.KafkaConfig.validateValues(KafkaConfig.scala:1731)
        at kafka.server.KafkaConfig.<init>(KafkaConfig.scala:1709)
        at kafka.server.KafkaConfig.<init>(KafkaConfig.scala:1273)
        at kafka.server.KafkaServerStartable$.fromProps(KafkaServerStartable.scala:34)
        at kafka.Kafka$.main(Kafka.scala:68)
        at kafka.Kafka.main(Kafka.scala)
Caused by: org.apache.kafka.common.KafkaException: Unable to parse PLAINTEXT://journal0.internal.staging.swh.network: to a broker endpoint
        at kafka.cluster.EndPoint$.createEndPoint(EndPoint.scala:57)
        at kafka.utils.CoreUtils$.$anonfun$listenerListToEndPoints$6(CoreUtils.scala:265)
        at scala.collection.StrictOptimizedIterableOps.map(StrictOptimizedIterableOps.scala:99)
        at scala.collection.StrictOptimizedIterableOps.map$(StrictOptimizedIterableOps.scala:86)
        at scala.collection.mutable.ArraySeq.map(ArraySeq.scala:38)
        at kafka.utils.CoreUtils$.listenerListToEndPoints(CoreUtils.scala:265)
        ... 9 more

I get exactly the same log locally in Vagrant.
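The trailing colon in the listener URL suggests the port is simply missing from the configured listener. Assuming the default plaintext port 9092 (the one the clients use below), the `server.properties` entry would presumably need to look like:

```properties
# server.properties sketch: each listener URL needs an explicit port
# (9092 is an assumption based on the bootstrap-server used later)
listeners=PLAINTEXT://journal0.internal.staging.swh.network:9092
```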

Kafka is up and running on journal0.
The next steps are:

  • tune the server, as there is not a lot of disk space (and memory, but only if needed):
root@journal0:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            5.9G     0  5.9G   0% /dev
tmpfs           1.2G  560K  1.2G   1% /run
/dev/vda1        32G  7.2G   24G  24% /
tmpfs           5.9G  8.0K  5.9G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           4.0M     0  4.0M   0% /sys/fs/cgroup
tmpfs           244M     0  244M   0% /run/user/1025
root@journal0:~# free -h
              total        used        free      shared  buff/cache   available
Mem:           11Gi       6.5Gi       354Mi        11Mi       4.8Gi       4.9Gi
Swap:            0B          0B          0B
  • Create the topics as explained in T2520#48682 (with a smaller number of partitions and a replication factor of 1, as we only have one staging server)
  • Launch the backfill to populate kafka with the current content of the staging archive
  • the VM memory was increased from 12G to 20G (a rough guesstimate)
  • a new 500 GB data disk is attached to the VM (there are currently 300 GB of objects on storage1.staging)
  • kafka's log dir was configured to be stored on a ZFS volume composed of only the new data disk:
root@journal0:~# apt install zfs-dkms
## reboot

root@journal0:~# systemctl stop kafka
root@journal0:~# rm -rf /srv/kafka/logdir/*

root@journal0:~# zpool create -f kafka-volume -m /srv/kafka/logdir  /dev/vdb
root@journal0:~# zfs set relatime=on kafka-volume

root@journal0:~# zpool list
NAME           SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
kafka-volume   496G   218K   496G        -         -     0%     0%  1.00x    ONLINE  -

# reboot to be sure
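With the pool mounted at /srv/kafka/logdir, pointing Kafka's data directory at it is a one-line change. A sketch of the corresponding `server.properties` entry, assuming the stock `log.dirs` property is what the deployment uses:

```properties
# server.properties sketch: store the Kafka data logs on the ZFS-backed mount
log.dirs=/srv/kafka/logdir
```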

The topics were created with 64 partitions and a replication factor of 1:

for object_type in content skipped_content directory revision release snapshot origin origin_visit origin_visit_status raw_extrinsic_metadata metadata_fetcher metadata_authority; do
    ./kafka-topics.sh --bootstrap-server journal0.internal.staging.swh.network:9092 --create --config cleanup.policy=compact --partitions 64 --replication-factor 1 --topic swh.journal.objects.$object_type
done

for object_type in revision release; do
    ./kafka-topics.sh --bootstrap-server journal0.internal.staging.swh.network:9092 --create --config cleanup.policy=compact --partitions 64 --replication-factor 1 --topic swh.journal.objects_privileged.$object_type
done

Logs:

root@journal0:/opt/kafka/bin# for object_type in content skipped_content directory revision release snapshot origin origin_visit origin_visit_status raw_extrinsic_metadata metadata_fetcher metadata_authority; do
>     ./kafka-topics.sh --bootstrap-server journal0.internal.staging.swh.network:9092 --create --config cleanup.policy=compact --partitions 64 --replication-factor 1 --topic swh.journal.objects.$object_type
> done
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic swh.journal.objects.content.
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic swh.journal.objects.skipped_content.
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic swh.journal.objects.directory.
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic swh.journal.objects.revision.
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic swh.journal.objects.release.
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic swh.journal.objects.snapshot.
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic swh.journal.objects.origin.
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic swh.journal.objects.origin_visit.
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic swh.journal.objects.origin_visit_status.
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic swh.journal.objects.raw_extrinsic_metadata.
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic swh.journal.objects.metadata_fetcher.
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic swh.journal.objects.metadata_authority.
root@journal0:/opt/kafka/bin# for object_type in revision release; do
>     ./kafka-topics.sh --bootstrap-server journal0.internal.staging.swh.network:9092 --create --config cleanup.policy=compact --partitions 64 --replication-factor 1 --topic swh.journal.objects_privileged.$object_type
> done
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic swh.journal.objects_privileged.revision.
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic swh.journal.objects_privileged.release.

The backfilling is done for several object types and still in progress for revision, content, and directory:

root@journal0:/opt/kafka/bin# for topic in $(./kafka-topics.sh --bootstrap-server $SERVER --list); do   echo -n "$topic : ";   ./kafka-run-class.sh kafka.tools.GetOffsetShell  --broker-list $SERVER --topic $topic | awk -F: '{s+=$3}END{print s}'; done
__consumer_offsets : 0
swh.journal.objects.content : 927440
swh.journal.objects.directory : 213279
swh.journal.objects.metadata_authority : 0
swh.journal.objects.metadata_fetcher : 0
swh.journal.objects.origin : 62892
swh.journal.objects.origin_visit : 68368
swh.journal.objects.origin_visit_status : 136721
swh.journal.objects.raw_extrinsic_metadata : 0
swh.journal.objects.release : 3101
swh.journal.objects.revision : 155746
swh.journal.objects.skipped_content : 189
swh.journal.objects.snapshot : 36046
swh.journal.objects_privileged.release : 0
swh.journal.objects_privileged.revision : 0
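The per-topic totals above come from summing the third colon-separated field of GetOffsetShell's `topic:partition:offset` output. A minimal sketch of that aggregation, on made-up sample data rather than the real cluster:

```shell
# GetOffsetShell prints one "topic:partition:offset" line per partition
# (the values below are illustrative, not from journal0).
offsets='swh.journal.objects.release:0:100
swh.journal.objects.release:1:250
swh.journal.objects.release:2:75'

# Sum the third colon-separated field to get the topic's total message count.
printf '%s\n' "$offsets" | awk -F: '{s += $3} END {print s}'   # prints 425
```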

I have some doubts about how to import the following object types, and whether they need to be imported at all:

  • swh.journal.objects.metadata_authority
  • swh.journal.objects.metadata_fetcher
  • swh.journal.objects.raw_extrinsic_metadata
  • swh.journal.objects_privileged.release
  • swh.journal.objects_privileged.revision

origin

swhstorage@storage1:~$ time SWH_CONFIG_FILENAME=storage.yml swh storage backfill origin --start-object=0 --end-object=34500
INFO:swh.storage.backfill:Processing origin range None to 1000                                         
INFO:swh.storage.backfill:Processing origin range 1000 to 2000                                         
INFO:swh.storage.backfill:Processing origin range 2000 to 3000                                         
INFO:swh.storage.backfill:Processing origin range 3000 to 4000                                         
...
INFO:swh.storage.backfill:Processing origin range 33000 to 34000
INFO:swh.storage.backfill:Processing origin range 34000 to 34500

real    0m11.536s
user    0m3.260s
sys     0m0.982s

origin_visit

swhstorage@storage1:~$ time SWH_CONFIG_FILENAME=storage.yml swh storage backfill  origin_visit --end-object=35000
INFO:swh.storage.backfill:Processing origin_visit range 34000 to 35000

real    0m16.783s
user    0m8.253s
sys     0m1.191s

origin_visit_status

# after patching line 507 of backfill.py
time SWH_CONFIG_FILENAME=storage.yml swh storage backfill  origin_visit_status --end-object=35000
...
INFO:swh.storage.backfill:Processing origin_visit_status range 32000 to 33000                                      
INFO:swh.storage.backfill:Processing origin_visit_status range 33000 to 34000                                      
INFO:swh.storage.backfill:Processing origin_visit_status range 34000 to 35000                                      

real    0m17.936s
user    0m12.551s
sys     0m1.120s

skipped_content

swhstorage@storage1:~$ time SWH_CONFIG_FILENAME=storage.yml swh storage backfill  skipped_content
INFO:swh.storage.backfill:Processing skipped_content range None to None

real    0m0.590s
user    0m0.487s
sys     0m0.064s

release

swhstorage@storage1:~$ time SWH_CONFIG_FILENAME=storage.yml swh storage backfill release --start-object=0 --end-object ffff
...
INFO:swh.storage.backfill:Processing release range fffe to ffff
INFO:swh.storage.backfill:Processing release range ffff to None

real    1m16.421s
user    0m19.655s
sys     0m3.622s
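The `--end-object` values in these runs are hexadecimal identifier prefixes (the origin-related objects above use decimal ranges instead). Assuming the backfill walks one range per prefix value, a quick shell check of how many ranges each bound covers:

```shell
# 4-hex-digit prefixes covered by --end-object ffff (plus the final
# "ffff to None" range for everything above the bound):
echo $((0xffff + 1))    # 65536

# 6-hex-digit prefixes covered by --end-object ffffff:
echo $((0xffffff + 1))  # 16777216
```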

snapshot

swhstorage@storage1:~$ time SWH_CONFIG_FILENAME=storage.yml swh storage backfill snapshot  --end-object ffff
...
INFO:swh.storage.backfill:Processing snapshot range fffe to ffff
INFO:swh.storage.backfill:Processing snapshot range ffff to None

real    2m30.118s
user    0m31.171s
sys     0m9.037s

revision

swhstorage@storage1:~$ time SWH_CONFIG_FILENAME=storage.yml swh storage backfill revision  --end-object ffffff

...
INFO:swh.storage.backfill:Processing revision range fffffd to fffffe
INFO:swh.storage.backfill:Processing revision range fffffe to ffffff
INFO:swh.storage.backfill:Processing revision range ffffff to None

real    435m7.953s
user    104m50.847s
sys     19m12.365s

content

swhstorage@storage1:~$ time SWH_CONFIG_FILENAME=storage.yml swh storage backfill content  --end-object ffffff

INFO:swh.storage.backfill:Processing content range fffffb to fffffc
INFO:swh.storage.backfill:Processing content range fffffc to fffffd
INFO:swh.storage.backfill:Processing content range fffffd to fffffe
INFO:swh.storage.backfill:Processing content range fffffe to ffffff
INFO:swh.storage.backfill:Processing content range ffffff to None

real    845m39.746s                                                                                                                                                                         
user    213m26.057s                                                                                                                                                                         
sys     53m32.150s

directory

swhstorage@storage1:~$ time SWH_CONFIG_FILENAME=storage.yml swh storage backfill directory  --end-object ffffff
...
INFO:swh.storage.backfill:Processing directory range fffffd to fffffe
INFO:swh.storage.backfill:Processing directory range fffffe to ffffff
INFO:swh.storage.backfill:Processing directory range ffffff to None
real    1326m38.221s
user    560m44.543s
sys     58m21.216s

The backfilling is complete (except for the metadata topics). We will now focus on some clients to ensure all the local configuration is correct (T2814, for example), and then we will focus on exposing kafka to the outside.