The backfill was done in a couple of days.
Jan 4 2021
Dec 23 2020
search1.internal.softwareheritage.org VM deployed.
The configuration of the index was automatically performed by Puppet during the initial provisioning.
Index template created in Elasticsearch with 1 replica and 90 shards, so that each node holds the same number of shards:
export ES_SERVER=192.168.100.81:9200
curl -XPUT -H "Content-Type: application/json" http://$ES_SERVER/_index_template/origin\?pretty -d '{
  "index_patterns": "origin",
  "template": {
    "settings": {
      "index": {
        "number_of_replicas": 1,
        "number_of_shards": 90
      }
    }
  }
}'
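A quick sanity check, as a hedged sketch reusing the same ES_SERVER variable, is to read the template back:
curl http://$ES_SERVER/_index_template/origin\?pretty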
search-esnode[1-3] installed with ZFS configured:
apt update && apt install linux-image-amd64 linux-headers-amd64
# reboot to upgrade the kernel
apt install libnvpair1linux libuutil1linux libzfs2linux libzpool2linux zfs-dkms zfsutils-linux zfs-zed
systemctl stop elasticsearch
rm -rf /srv/elasticsearch/nodes/0
zpool create -O atime=off -m /srv/elasticsearch/nodes elasticsearch-data /dev/vdb
chown elasticsearch: /srv/elasticsearch/nodes
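A minimal sketch of how the result could be verified before restarting the service, assuming the pool and dataset names created above:
# check that the pool is healthy and mounted where elasticsearch expects its data
zpool status elasticsearch-data
zfs get atime,mountpoint elasticsearch-data
systemctl start elasticsearch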
Inventory was updated to reserve the Elasticsearch VMs:
- search-esnode[1-3].internal.softwareheritage.org
- IPs: 192.168.100.8[1-3]/24
The webapp is available at https://webapp1.internal.softwareheritage.org
In preparation for the deployment, the production index present on the staging Elasticsearch was renamed from origin-production2 to production_origin (a clone operation will be used [1]; the original index will be left in place).
[1] https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-clone-index.html
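A hedged sketch of what the rename via clone could look like, per [1]; the staging Elasticsearch address below is a placeholder, and the clone API requires the source index to be made read-only first:
export ES_SERVER=<staging-esnode>:9200   # placeholder, not the real address
curl -XPUT -H "Content-Type: application/json" http://$ES_SERVER/origin-production2/_settings -d '{"index.blocks.write": true}'
curl -XPOST http://$ES_SERVER/origin-production2/_clone/production_origin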
Dec 22 2020
Dec 21 2020
Dec 14 2020
With the "optimized" configuration, the import is quite faster :
root@search-esnode0:~# curl -XPOST -H "Content-Type: application/json" http://${ES_SERVER}/_reindex\?pretty\&refresh=true\&requests_per_second=-1\&\&wait_for_completion=true -d @/tmp/reindex-production.json
{
  "took" : 10215280,
  "timed_out" : false,
  "total" : 91517657,
  "updated" : 0,
  "created" : 91517657,
  "deleted" : 0,
  "batches" : 91518,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}
"took" : 10215280, => 2h45
Dec 11 2020
The production index origin was correctly copied from the production cluster, but apparently without the configuration that optimizes the copy.
We keep this one and will try a new optimized copy, to check whether the server still crashes with an OOM given the new CPU and memory settings.
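A hedged way to double-check the new memory settings from the cluster itself, using standard _cat columns and the ES_SERVER variable from the commands elsewhere in this log:
curl http://$ES_SERVER/_cat/nodes\?v\&h=name,heap.max,ram.max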
Dec 10 2020
FYI: the origin index was recreated with the "official" mapping and a backfill was performed (necessary after the test of the flattened mapping).
The deployment manifests are OK and deployed in staging, so this task can be resolved.
We will work on reactivating search-journal-client for the metadata in another task, once T2876 is resolved.
The copy of the production index is restarted.
To improve the speed of the copy, the index was tuned to reduce the disk pressure (this is a temporary configuration and should not be used in normal operation, as it is not safe):
cat >/tmp/config.json <<EOF
{
  "index" : {
    "translog.sync_interval" : "60s",
    "translog.durability": "async",
    "refresh_interval": "60s"
  }
}
EOF
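A minimal sketch of applying these transient settings to the index and restoring safer values once the copy is done (index name and reset values are assumptions):
curl -XPUT -H "Content-Type: application/json" http://$ES_SERVER/origin/_settings -d @/tmp/config.json
# once the copy is finished, go back to safe values (null resets refresh_interval to its default)
curl -XPUT -H "Content-Type: application/json" http://$ES_SERVER/origin/_settings -d '{"index": {"translog.durability": "request", "refresh_interval": null}}'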
- Partition and memory extended with Terraform.
- The disk resize needed some console actions for the partition to be extended:
The production index import failed because the 90% used-disk-space limit was reached at some point; usage fell back to around 60G after a compaction.
The progress was at 80M documents out of 91M.
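A hedged way to watch the disk usage and the configured watermarks while the import runs, using only standard cat/cluster APIs (no swh-specific settings assumed):
curl http://$ES_SERVER/_cat/allocation\?v
curl http://$ES_SERVER/_cluster/settings\?pretty\&include_defaults=true\&flat_settings=true | grep watermark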
Dec 9 2020
The search RPC backend and the journal client listening on the origin and origin_visit topics are deployed.
The inventory is up to date for both hosts [1][2]
Dec 8 2020
A dashboard to monitor the ES cluster behavior has been created in Grafana [1].
It will be improved during the swh-search tests.
Dec 7 2020
Interesting note about how to size the shards of an index: https://www.elastic.co/guide/en/elasticsearch/reference/7.x//size-your-shards.html
Dec 4 2020
Done for the journal. We might want to do the same in the low-level RPC stack (swh.core.api) some day (or replace it later with gRPC or similar :-)).
We added a 100GiB volume to search-esnode0 through Terraform (D4663), so that /srv/elasticsearch can be mounted as a ZFS volume.
Dec 3 2020
Dec 2 2020
Dec 1 2020
Nov 30 2020
Deployed the following:
It would be nice if the tests for this journal client used an actual storage with a journal writer, rather than fully mocked topics. Doing this would have caught the original breakage.
Nov 27 2020
The swh-indexer stack is deployed on staging and the initial loading is done.
The volumes are quite low:
I propose meeting in the middle and having the following policies:
...and indexer 0.6.1 is now packaged. We have everything we need to unblock it now.
In T2780#53415, @olasd wrote: Is this supposed to be persistent (and keep the full history of all messages), or transient (and used for "real-time" clients)? IOW, what are the storage requirements for this?
Where is the list of topics that need to be created?
It's unclear what the prefix should be. swh.storage uses swh.journal.objects; we can either use that one too, or a new one, e.g. swh.journal.indexed.
I think we should definitely use a different prefix from swh.storage's, as the ACLs for third parties should be separate.
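As a purely illustrative sketch (topic name, partition/replication counts, and broker address are hypothetical), creating a topic under such a separate prefix could look like:
kafka-topics.sh --create \
  --bootstrap-server <kafka-broker>:9092 \
  --topic swh.journal.indexed.origin_intrinsic_metadata \
  --partitions 64 --replication-factor 2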
This is a description of the pipeline, to clarify the interaction between the components (source: P883):
Nov 26 2020
Is this supposed to be persistent (and keep the full history of all messages), or transient (and used for "real-time" clients)? IOW, what are the storage requirements for this?
T2814 needs to be released beforehand.
Nov 16 2020
Oct 12 2020
Closing in favor of T2686
Oct 8 2020
I like the Final version attribute idea. But to be clear, this means we will need to add successive versions as extra classes in swh.model.model when we change the schema (which is not necessarily a bad thing), and remove them when we are sure they are not around anymore.