The swh-indexer stack is deployed on staging and the initial loading is done.
The volumes are quite low.
Nov 30 2020
Nov 27 2020
Nov 26 2020
T2814 needs to be released before the backfilling is complete (except for the metadata). We will now focus on some clients to ensure all the local configuration is correct (T2814, for example), and then on exposing kafka to the outside.
Nov 24 2020
The backfilling is done for several object types and is still in progress for revision, content, and directory:
root@journal0:/opt/kafka/bin# for topic in $(./kafka-topics.sh --bootstrap-server $SERVER --list); do
    echo -n "$topic : "
    ./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list $SERVER --topic $topic \
        | awk -F: '{s+=$3}END{print s}'
done
__consumer_offsets : 0
swh.journal.objects.content : 927440
swh.journal.objects.directory : 213279
swh.journal.objects.metadata_authority : 0
swh.journal.objects.metadata_fetcher : 0
swh.journal.objects.origin : 62892
swh.journal.objects.origin_visit : 68368
swh.journal.objects.origin_visit_status : 136721
swh.journal.objects.raw_extrinsic_metadata : 0
swh.journal.objects.release : 3101
swh.journal.objects.revision : 155746
swh.journal.objects.skipped_content : 189
swh.journal.objects.snapshot : 36046
swh.journal.objects_privileged.release : 0
swh.journal.objects_privileged.revision : 0
I have some doubts about how to import the following object types, and whether they need to be imported at all:
- swh.journal.objects.metadata_authority
- swh.journal.objects.metadata_fetcher
- swh.journal.objects.raw_extrinsic_metadata
- swh.journal.objects_privileged.release
- swh.journal.objects_privileged.revision
The topics were created with 64 partitions and a replication factor of 1:
- the VM memory was increased from 12G to 20G (a completely "pifométrique" value, i.e. a rule-of-thumb guess)
- a new 500GB data disk was attached to the VM (there are currently 300G of objects on storage1.staging)
- Kafka's log dir was configured to be stored on a ZFS volume backed only by the new data disk:
root@journal0:~# apt install zfs-dkms
## reboot
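The rest of the ZFS and Kafka configuration was not captured above; as a minimal sketch, assuming the new disk shows up as /dev/vdb and using hypothetical pool, dataset, and path names, it could look like:

root@journal0:~# zpool create data /dev/vdb                       # assumed device for the new 500GB disk
root@journal0:~# zfs create -o mountpoint=/srv/kafka data/kafka   # hypothetical dataset and mountpoint
## point the broker at the new volume (config path and service name are assumptions)
root@journal0:~# sed -i 's|^log.dirs=.*|log.dirs=/srv/kafka/logdir|' /opt/kafka/config/server.properties
root@journal0:~# systemctl restart kafka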
Kafka is up and running on journal0.
The next steps are:
- tune the server, as there is not a lot of disk space (and memory, but only if needed):
root@journal0:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            5.9G     0  5.9G   0% /dev
tmpfs           1.2G  560K  1.2G   1% /run
/dev/vda1        32G  7.2G   24G  24% /
tmpfs           5.9G  8.0K  5.9G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           4.0M     0  4.0M   0% /sys/fs/cgroup
tmpfs           244M     0  244M   0% /run/user/1025
root@journal0:~# free -h
              total        used        free      shared  buff/cache   available
Mem:           11Gi       6.5Gi       354Mi        11Mi       4.8Gi       4.9Gi
Swap:            0B          0B          0B
- Create the topics as explained in T2520#48682 (with a smaller number of partitions and a replication factor of 1, as we only have one staging server); see the sketch after this list
- Launch the backfill to populate kafka with the current content of the staging archive
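For reference, creating the topics with the stock Kafka tooling would look roughly like the loop below; the topic names match the listing above and the 64 partitions / replication factor 1 match what was noted above, but the exact invocation used is an assumption:

root@journal0:/opt/kafka/bin# for topic in content directory origin origin_visit origin_visit_status release revision skipped_content snapshot; do
    ./kafka-topics.sh --bootstrap-server $SERVER --create \
        --topic swh.journal.objects.$topic \
        --partitions 64 --replication-factor 1
done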
Nov 23 2020
Nov 19 2020
Nov 17 2020
Correction: kafka is installed on the node, but it seems the configuration is not complete.
Nov 16 2020
Nov 13 2020
For the last check, here is its origin:
Nov 12 2020
One more check done.
One last check to go.
$ ipython3
In [1]: from swh.loader.git.from_disk import GitLoaderFromDisk
I recall (now) that one of those repositories is the parmap one; we got it on uffizi:
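For context, the re-load session could continue roughly as below; the origin URL and on-disk clone path are assumptions (the actual path on uffizi was elided above), and GitLoaderFromDisk's constructor arguments have changed across swh-loader-git releases, so this is a sketch rather than the exact invocation:

In [2]: loader = GitLoaderFromDisk(
   ...:     url="https://github.com/rdicosmo/parmap",  # assumed origin URL
   ...:     directory="/srv/tmp/parmap.git",           # hypothetical local clone path
   ...: )
In [3]: loader.load()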
Nov 10 2020
Nov 9 2020
Nov 6 2020
Oct 14 2020
Oct 9 2020
Sep 28 2020
This was now just a matter of doing the clickity click on all hosts. They're now using the dedicated vlan.
I've added a bridge vmbr443 to all hypervisors.
I wanted to rename the bridges on the proxmox hosts to something clearer (like vmbr-staging) but it turns out that proxmox only supports bridges named /vmbr\d+/. Ugh.
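For reference, such a bridge is declared on each Proxmox host in /etc/network/interfaces; a minimal sketch, assuming the uplink NIC is eno1 and the staging traffic is tagged with VLAN 443 (see the Jun 9 note below):

auto vmbr443
iface vmbr443 inet manual
        bridge-ports eno1.443   # assumed physical port; .443 tags the staging VLAN
        bridge-stp off
        bridge-fd 0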
Sep 23 2020
Jul 21 2020
Jun 9 2020
(the VLAN id for the staging vlan is 443).
In T1872#44251, @rdicosmo wrote:
> Is this now done? If that's the case this ticket should be closed.
May 29 2020
Deployed.
May 11 2020
This one we can close, not all of them though.
May 9 2020
Is this now done? If that's the case this ticket should be closed.
Apr 24 2020
It's fairly complete already.