Jan 4 2021

vsellier added a comment to T2905: Deploy swh-search for production.

The backfill was done in a couple of days.

Jan 4 2021, 9:41 AM · System administration, Journal, Archive search

Dec 23 2020

vsellier added a comment to T2905: Deploy swh-search for production.

search1.internal.softwareheritage.org vm deployed.
The index was configured automatically by Puppet during the initial provisioning.

Dec 23 2020, 11:34 PM · System administration, Journal, Archive search
vsellier added a comment to T2905: Deploy swh-search for production.

Index template created in elasticsearch with 1 replica and 90 shards to have the same number of shards on each node:

export ES_SERVER=192.168.100.81:9200
curl -XPUT -H "Content-Type: application/json" http://$ES_SERVER/_index_template/origin\?pretty -d '{"index_patterns": "origin", "template": {"settings": { "index": { "number_of_replicas":1, "number_of_shards": 90 } } } } '
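
The shard count above can be sanity-checked with a little arithmetic. This is only a sketch: the node count of 3 is taken from the search-esnode[1-3] hosts mentioned in this feed, and the variable names are illustrative.

```shell
# Sketch: how the shard copies spread over the nodes.
# nodes=3 is an assumption based on search-esnode[1-3].
nodes=3
primaries=90
replicas=1

# Every primary shard has `replicas` extra copies.
total=$(( primaries * (1 + replicas) ))
per_node=$(( total / nodes ))
remainder=$(( total % nodes ))

echo "total shard copies: $total"
echo "copies per node: $per_node (remainder: $remainder)"
```

A remainder of 0 means the copies can be spread evenly over the nodes.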
Dec 23 2020, 8:55 PM · System administration, Journal, Archive search
vsellier added a comment to T2905: Deploy swh-search for production.

search-esnode[1-3] installed with ZFS configured:

apt update && apt install linux-image-amd64 linux-headers-amd64 
# reboot to upgrade the kernel
apt install libnvpair1linux libuutil1linux libzfs2linux libzpool2linux zfs-dkms zfsutils-linux zfs-zed
systemctl stop elasticsearch
rm -rf /srv/elasticsearch/nodes/0
zpool create -O atime=off -m /srv/elasticsearch/nodes elasticsearch-data /dev/vdb
chown elasticsearch: /srv/elasticsearch/nodes
Dec 23 2020, 8:48 PM · System administration, Journal, Archive search
vsellier added a comment to T2905: Deploy swh-search for production.

Inventory was updated to reserve the Elasticsearch VMs:

  • search-esnode[1-3].internal.softwareheritage.org
  • ips : 192.168.100.8[1-3]/24
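
For reference, the bracket notation above expands to three host/address pairs. A minimal sketch of the expansion, where `expand_inventory` is a hypothetical helper and not part of any inventory tooling:

```shell
# Hypothetical helper: expand search-esnode[1-3] and 192.168.100.8[1-3]/24
# into one "hostname address" pair per line.
expand_inventory() {
  domain=internal.softwareheritage.org
  for i in 1 2 3; do
    echo "search-esnode$i.$domain 192.168.100.8$i/24"
  done
}

expand_inventory
```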
Dec 23 2020, 6:20 PM · System administration, Journal, Archive search
vsellier changed the status of T2905: Deploy swh-search for production, a subtask of T2904: Create a new production webapp using the frozen index on the staging ES, from Open to Work in Progress.
Dec 23 2020, 5:53 PM · System administrators, Journal, Archive search
vsellier changed the status of T2905: Deploy swh-search for production from Open to Work in Progress.
Dec 23 2020, 5:53 PM · System administration, Journal, Archive search
vsellier closed T2904: Create a new production webapp using the frozen index on the staging ES, a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.
Dec 23 2020, 5:42 PM · Journal, Archive search
vsellier closed T2904: Create a new production webapp using the frozen index on the staging ES as Resolved.
Dec 23 2020, 5:42 PM · System administrators, Journal, Archive search
vsellier added a comment to T2904: Create a new production webapp using the frozen index on the staging ES.

The webapp is available at https://webapp1.internal.softwareheritage.org

Dec 23 2020, 5:42 PM · System administrators, Journal, Archive search
vsellier added a comment to T2904: Create a new production webapp using the frozen index on the staging ES.

In preparation for the deployment, the production index on the staging Elasticsearch was renamed from origin-production2 to production_origin (a clone operation will be used [1]; the original index will be left in place)
[1] https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-clone-index.html
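
A clone-based rename along these lines could look like the sketch below. It is not the exact command that was run: `clone_index` is a hypothetical helper, and the default value of ES_SERVER is an assumption. The clone API requires writes to be blocked on the source index first.

```shell
# Assumption: ES_SERVER points at the cluster; defaults to a local node here.
ES_SERVER=${ES_SERVER:-localhost:9200}

# Hypothetical helper: clone index $1 into a new index $2.
clone_index() {
  src=$1
  dst=$2
  # The clone API only accepts a source index whose writes are blocked.
  curl -XPUT -H "Content-Type: application/json" \
    "http://$ES_SERVER/$src/_settings" \
    -d '{"settings": {"index.blocks.write": true}}' || return 1
  # Create the copy under the new name; the source index stays in place.
  curl -XPOST "http://$ES_SERVER/$src/_clone/$dst?pretty"
}

# Usage (not executed here):
# clone_index origin-production2 production_origin
```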

Dec 23 2020, 4:18 PM · System administrators, Journal, Archive search
vsellier added a revision to T2904: Create a new production webapp using the frozen index on the staging ES: D4785: Allow to configure the index to use on elasticsearch.
Dec 23 2020, 1:03 PM · System administrators, Journal, Archive search
vsellier changed the status of T2904: Create a new production webapp using the frozen index on the staging ES, a subtask of T2590: Finish the indexer -> swh-search pipeline, from Open to Work in Progress.
Dec 23 2020, 10:04 AM · Journal, Archive search
vsellier changed the status of T2904: Create a new production webapp using the frozen index on the staging ES from Open to Work in Progress.
Dec 23 2020, 10:04 AM · System administrators, Journal, Archive search

Dec 22 2020

vlorentz closed T2876: metadata indexation : ES' dynamic mapping creation fails for field values that are of varying types, a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.
Dec 22 2020, 10:52 AM · Journal, Archive search

Dec 21 2020

vsellier triaged T2905: Deploy swh-search for production as Normal priority.
Dec 21 2020, 10:03 AM · System administration, Journal, Archive search
vsellier triaged T2904: Create a new production webapp using the frozen index on the staging ES as Normal priority.
Dec 21 2020, 9:59 AM · System administrators, Journal, Archive search

Dec 14 2020

vlorentz reopened T2876: metadata indexation : ES' dynamic mapping creation fails for field values that are of varying types, a subtask of T2590: Finish the indexer -> swh-search pipeline, as Open.
Dec 14 2020, 10:56 AM · Journal, Archive search
vlorentz closed T2876: metadata indexation : ES' dynamic mapping creation fails for field values that are of varying types, a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.
Dec 14 2020, 10:54 AM · Journal, Archive search
vsellier added a comment to T2817: Enable the swh-search environment in staging.

With the "optimized" configuration, the import is much faster:

root@search-esnode0:~# curl -XPOST -H "Content-Type: application/json" http://${ES_SERVER}/_reindex\?pretty\&refresh=true\&requests_per_second=-1\&\&wait_for_completion=true -d @/tmp/reindex-production.json    
{
  "took" : 10215280,
  "timed_out" : false,
  "total" : 91517657,
  "updated" : 0,
  "created" : 91517657,
  "deleted" : 0,
  "batches" : 91518,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}

"took" : 10215280, => about 2h50

Dec 14 2020, 9:47 AM · System administrators, Staging environment, Journal, Archive search
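
The `took` field of the reindex response reported on Dec 14 2020 is expressed in milliseconds. A small sketch of the conversion to a wall-clock duration and an average throughput, using the values from that response (variable names are illustrative):

```shell
# Values copied from the reindex response.
took_ms=10215280
created=91517657

total_s=$(( took_ms / 1000 ))
hours=$(( total_s / 3600 ))
minutes=$(( (total_s % 3600) / 60 ))
seconds=$(( total_s % 60 ))
printf 'duration: %dh%02dm%02ds\n' "$hours" "$minutes" "$seconds"

# Average indexing rate over the whole run (integer division).
docs_per_s=$(( created * 1000 / took_ms ))
echo "throughput: $docs_per_s docs/s"
```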

Dec 11 2020

vsellier added a comment to T2817: Enable the swh-search environment in staging.

The production index origin was correctly copied from the production cluster, but apparently without the configuration meant to optimize the copy.
We will keep this copy and try a new optimized one to check whether the server still crashes with an OOM under the new CPU and memory settings.

Dec 11 2020, 10:15 AM · System administrators, Staging environment, Journal, Archive search

Dec 10 2020

vsellier added a comment to T2817: Enable the swh-search environment in staging.

FYI: The origin index was recreated with the "official" mapping and a backfill was performed (necessary after the test of the flattened mapping)

Dec 10 2020, 3:42 PM · System administrators, Staging environment, Journal, Archive search
vsellier closed T2817: Enable the swh-search environment in staging, a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.
Dec 10 2020, 3:29 PM · Journal, Archive search
vsellier closed T2817: Enable the swh-search environment in staging as Resolved.

The deployment manifests are OK and deployed in staging, so this task can be resolved.
We will work on reactivating search-journal-client for the metadata in another task when T2876 is resolved.

Dec 10 2020, 3:29 PM · System administrators, Staging environment, Journal, Archive search
vsellier updated the task description for T2817: Enable the swh-search environment in staging.
Dec 10 2020, 3:19 PM · System administrators, Staging environment, Journal, Archive search
ardumont updated the task description for T2817: Enable the swh-search environment in staging.
Dec 10 2020, 1:21 PM · System administrators, Staging environment, Journal, Archive search
vsellier added a subtask for T2590: Finish the indexer -> swh-search pipeline: T2876: metadata indexation : ES' dynamic mapping creation fails for field values that are of varying types.
Dec 10 2020, 12:31 PM · Journal, Archive search
ardumont added a revision to T2817: Enable the swh-search environment in staging: D4712: staging: Increase elasticsearch jvm heap size to half its memory.
Dec 10 2020, 11:47 AM · System administrators, Staging environment, Journal, Archive search
vsellier added a comment to T2817: Enable the swh-search environment in staging.

The copy of the production index was restarted.
To speed up the copy, the index was tuned to reduce disk pressure (this is a temporary configuration and should not be used under normal conditions, as it is not safe):

cat >/tmp/config.json <<EOF
{
  "index" : {
    "translog.sync_interval" : "60s",
    "translog.durability": "async",
    "refresh_interval": "60s"
  }
}
EOF
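
Since this configuration is temporary, it has to be reverted once the copy is done. A hedged sketch of one way to do it: setting an index setting to null resets it to its default. The restore file name and the target index name (origin) are assumptions, and the curl call is left commented out.

```shell
# Assumed file name for the restore settings.
restore_file=/tmp/config-restore.json

# null resets each setting to its default value.
cat >"$restore_file" <<EOF
{
  "index" : {
    "translog.sync_interval" : null,
    "translog.durability": null,
    "refresh_interval": null
  }
}
EOF

# Apply through the update index settings API (index name is an assumption):
# curl -XPUT -H "Content-Type: application/json" \
#   "http://${ES_SERVER}/origin/_settings" -d @"$restore_file"
```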
Dec 10 2020, 11:14 AM · System administrators, Staging environment, Journal, Archive search
vsellier added a comment to T2817: Enable the swh-search environment in staging.
  • Partition and memory extended with terraform.
  • Extending the disk required some manual actions from the console:
Dec 10 2020, 10:39 AM · System administrators, Staging environment, Journal, Archive search
vsellier added a comment to T2817: Enable the swh-search environment in staging.

The production index import failed because the 90% disk usage limit was reached at some point; usage then fell back to around 60G after a compaction.
The import had reached 80M of the 91M documents.

Dec 10 2020, 9:59 AM · System administrators, Staging environment, Journal, Archive search
ardumont added a revision to T2590: Finish the indexer -> swh-search pipeline: D4709: indexer_storage: Publish indexer computation to journal topics.
Dec 10 2020, 9:43 AM · Journal, Archive search

Dec 9 2020

ardumont added a revision to T2817: Enable the swh-search environment in staging: D4710: search.journal_client: Fix key error.
Dec 9 2020, 10:26 PM · System administrators, Staging environment, Journal, Archive search
ardumont created P898 swh-search data out of the swh.journal.indexed.origin_intrinsic_metadata topic.
Dec 9 2020, 10:25 PM · Indexer, Archive search, Journal
ardumont added a revision to T2817: Enable the swh-search environment in staging: D4709: indexer_storage: Publish indexer computation to journal topics.
Dec 9 2020, 10:09 PM · System administrators, Staging environment, Journal, Archive search
ardumont added a revision to T2817: Enable the swh-search environment in staging: D4704: docker-compose.search.yml: Add journal client for indexed values.
Dec 9 2020, 6:19 PM · System administrators, Staging environment, Journal, Archive search
vsellier added a revision to T2817: Enable the swh-search environment in staging: D4701: Allow configuration through cli or config file.
Dec 9 2020, 5:57 PM · System administrators, Staging environment, Journal, Archive search
ardumont added a revision to T2817: Enable the swh-search environment in staging: D4699: search: Deploy multiple search journal client instances.
Dec 9 2020, 5:20 PM · System administrators, Staging environment, Journal, Archive search
ardumont updated the task description for T2817: Enable the swh-search environment in staging.
Dec 9 2020, 11:39 AM · System administrators, Staging environment, Journal, Archive search
vsellier added a comment to T2817: Enable the swh-search environment in staging.

The search rpc backend and the journal client listening on origin and origin_visit topics are deployed.
The inventory is up to date for both hosts [1][2]

Dec 9 2020, 9:51 AM · System administrators, Staging environment, Journal, Archive search
vsellier updated the task description for T2817: Enable the swh-search environment in staging.
Dec 9 2020, 9:35 AM · System administrators, Staging environment, Journal, Archive search

Dec 8 2020

ardumont added a revision to T2817: Enable the swh-search environment in staging: D4687: search: Add initialization step on install or upgrade.
Dec 8 2020, 4:06 PM · System administrators, Staging environment, Journal, Archive search
vsellier added a comment to T2817: Enable the swh-search environment in staging.

A dashboard to monitor the ES cluster behavior has been created on grafana [1]
It will be improved during the swh-search tests

Dec 8 2020, 10:49 AM · System administrators, Staging environment, Journal, Archive search

Dec 7 2020

ardumont closed T2821: indexer: Improve tests as Resolved.
Dec 7 2020, 8:54 PM · Journal, Indexer
vsellier added a comment to T2817: Enable the swh-search environment in staging.

Interesting note about how to size the shards of an index : https://www.elastic.co/guide/en/elasticsearch/reference/7.x//size-your-shards.html

Dec 7 2020, 6:15 PM · System administrators, Staging environment, Journal, Archive search
ardumont added a revision to T2590: Finish the indexer -> swh-search pipeline: D4675: base-buster: Pin elasticsearch to 7.9.3.
Dec 7 2020, 1:29 PM · Journal, Archive search
ardumont added a revision to T2590: Finish the indexer -> swh-search pipeline: D4671: cli: Subscribe journal client to origin_visit_status.
Dec 7 2020, 8:54 AM · Journal, Archive search
ardumont added a revision to T2590: Finish the indexer -> swh-search pipeline: D4670: cli: Allow topic prefix declaration through cli or configuration.
Dec 7 2020, 8:53 AM · Journal, Archive search
ardumont added a revision to T2590: Finish the indexer -> swh-search pipeline: D4669: cli: Allow object-type declaration through cli or configuration.
Dec 7 2020, 8:52 AM · Journal, Archive search

Dec 4 2020

ardumont added a revision to T2817: Enable the swh-search environment in staging: D4668: Add swh-search-journal-client to swh_search_with_journal_client role.
Dec 4 2020, 7:27 PM · System administrators, Staging environment, Journal, Archive search
ardumont added a revision to T2817: Enable the swh-search environment in staging: D4666: staging: Deploy swh-search rpc backend on search0.
Dec 4 2020, 4:54 PM · System administrators, Staging environment, Journal, Archive search
douardda closed T2834: Use msgpack extension types instead of custom swh encoders/decoders as Resolved.

done for the journal. We might want to do the same in the RPC low level stack (swh.core.api), some day (or replace this later by gRPC or so :-) )

Dec 4 2020, 3:12 PM · Journal
vlorentz added a revision to T2590: Finish the indexer -> swh-search pipeline: D4661: search.cli: Subscribe journal client to origin_intrinsic_metadata topic.
Dec 4 2020, 1:41 PM · Journal, Archive search
ardumont added a comment to T2817: Enable the swh-search environment in staging.

We added a 100GiB volume to search-esnode0 through terraform (D4663),
so that we can mount /srv/elasticsearch as a ZFS volume.

Dec 4 2020, 12:44 PM · System administrators, Staging environment, Journal, Archive search
ardumont added a revision to T2817: Enable the swh-search environment in staging: D4664: search0: Add swh-search rpc backend node.
Dec 4 2020, 12:11 PM · System administrators, Staging environment, Journal, Archive search
ardumont added a revision to T2817: Enable the swh-search environment in staging: D4663: search-esnode0: Add a 100Gib storage disk.
Dec 4 2020, 12:04 PM · System administrators, Staging environment, Journal, Archive search
vsellier added a comment to T2817: Enable the swh-search environment in staging.

dedicated ES node for staging deployed (search-esnode0.internal.staging.swh.network) with D4658 and D4651

Dec 4 2020, 11:46 AM · System administrators, Staging environment, Journal, Archive search
vsellier updated the task description for T2817: Enable the swh-search environment in staging.
Dec 4 2020, 11:44 AM · System administrators, Staging environment, Journal, Archive search

Dec 3 2020

ardumont added a revision to T2817: Enable the swh-search environment in staging: D4658: staging: Add search-esnode0.
Dec 3 2020, 5:59 PM · System administrators, Staging environment, Journal, Archive search
zack added a project to T2834: Use msgpack extension types instead of custom swh encoders/decoders: Journal.
Dec 3 2020, 1:30 PM · Journal
vsellier added a revision to T2817: Enable the swh-search environment in staging: D4654: -wip- Switch to the official elasticsearch plugin.
Dec 3 2020, 12:21 PM · System administrators, Staging environment, Journal, Archive search

Dec 2 2020

ardumont added a revision to T2817: Enable the swh-search environment in staging: D4651: Puppetize elasticsearch nodes.
Dec 2 2020, 4:53 PM · System administrators, Staging environment, Journal, Archive search
ardumont claimed T2821: indexer: Improve tests.
Dec 2 2020, 11:22 AM · Journal, Indexer
ardumont added a revision to T2821: indexer: Improve tests: D4641: test_journal_client: Send production objects to journal for testing.
Dec 2 2020, 9:24 AM · Journal, Indexer

Dec 1 2020

ardumont added a revision to T2821: indexer: Improve tests: D4640: test_journal_client: Migrate away from mocks.
Dec 1 2020, 6:01 PM · Journal, Indexer
ardumont added a revision to T2821: indexer: Improve tests: D4638: tests: Use production backends within the indexer tests.
Dec 1 2020, 3:45 PM · Journal, Indexer
ardumont renamed T2821: indexer: Improve tests from indexer.journal.client: Improve tests to indexer: Improve tests.
Dec 1 2020, 3:44 PM · Journal, Indexer

Nov 30 2020

ardumont closed T2814: Fix swh indexer journal client service as Resolved.

Deployed the following:

Nov 30 2020, 3:08 PM · Journal, Indexer
ardumont updated the task description for T2814: Fix swh indexer journal client service.
Nov 30 2020, 3:07 PM · Journal, Indexer
ardumont updated the task description for T2814: Fix swh indexer journal client service.
Nov 30 2020, 3:07 PM · Journal, Indexer
ardumont added a comment to T2814: Fix swh indexer journal client service.

It would be nice if the tests for this journal client used an actual storage with a journal writer, rather than fully mocked topics. Doing this would have caught the original breakage.

Nov 30 2020, 10:47 AM · Journal, Indexer

Nov 27 2020

vsellier closed T2816: Enable the journal-writer for the swh-idx-storage in staging, a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.
Nov 27 2020, 6:20 PM · Journal, Archive search
vsellier closed T2816: Enable the journal-writer for the swh-idx-storage in staging as Resolved.

The swh-indexer stack is deployed on staging and the initial loading is done.
The volumes are quite low :

Nov 27 2020, 6:20 PM · System administrators, Staging environment, Journal, Archive search
vsellier added a revision to T2816: Enable the journal-writer for the swh-idx-storage in staging: D4625: staging: Fix object storage configuration for indexers.
Nov 27 2020, 3:20 PM · System administrators, Staging environment, Journal, Archive search
vlorentz triaged T2823: Write tests for swh/journal/writer/inmemory.py as Low priority.
Nov 27 2020, 1:50 PM · Easy hack, Journal
ardumont updated the task description for T2821: indexer: Improve tests.
Nov 27 2020, 1:21 PM · Journal, Indexer
ardumont triaged T2821: indexer: Improve tests as Normal priority.
Nov 27 2020, 1:19 PM · Journal, Indexer
olasd added a comment to T2814: Fix swh indexer journal client service.

It would be nice if the tests for this journal client used an actual storage with a journal writer, rather than fully mocked topics. Doing this would have caught the original breakage.

Nov 27 2020, 11:54 AM · Journal, Indexer
olasd added a comment to T2780: Enable the journal-writer for the swh-idx-storage in production.

I propose meeting in the middle and having the following policies:

Nov 27 2020, 11:49 AM · System administration, Journal
ardumont added a comment to T2814: Fix swh indexer journal client service.

and indexer 0.6.1 is now packaged. We have everything we need to unblock it now.

Nov 27 2020, 11:34 AM · Journal, Indexer
vlorentz added a comment to T2780: Enable the journal-writer for the swh-idx-storage in production.
In T2780#53415, @olasd wrote:

Is this supposed to be persistent (and keep the full history of all messages), or transient (and used for "real-time" clients)? IOW, what are the storage requirements for this?

Nov 27 2020, 11:01 AM · System administration, Journal
ardumont added a comment to T2780: Enable the journal-writer for the swh-idx-storage in production.

Where is the list of topics that need to be created?

Nov 27 2020, 10:50 AM · System administration, Journal
ardumont added a comment to T2780: Enable the journal-writer for the swh-idx-storage in production.

It's unclear what the prefix should be. swh.storage uses swh.journal.objects, we can either use that one too, or a new one, eg. swh.journal.indexed

I think we should definitely use a different prefix than swh.storage, as the ACLs for third parties should be separate.

Nov 27 2020, 10:48 AM · System administration, Journal
vsellier added a revision to T2816: Enable the journal-writer for the swh-idx-storage in staging: D4620: staging: configure idx-storage to write to kafka.
Nov 27 2020, 10:43 AM · System administrators, Staging environment, Journal, Archive search
vsellier added a comment to T2590: Finish the indexer -> swh-search pipeline.

This is a description of the pipeline to clarify the interaction between the components (source: P883):

Nov 27 2020, 10:14 AM · Journal, Archive search

Nov 26 2020

vsellier changed the status of T2817: Enable the swh-search environment in staging, a subtask of T2590: Finish the indexer -> swh-search pipeline, from Open to Work in Progress.
Nov 26 2020, 5:59 PM · Journal, Archive search
vsellier renamed T2817: Enable the swh-search environment in staging from Enable the swh-search in staging to Enable the swh-search environment in staging.
Nov 26 2020, 5:59 PM · System administrators, Staging environment, Journal, Archive search
vsellier triaged T2817: Enable the swh-search environment in staging as Normal priority.
Nov 26 2020, 5:58 PM · System administrators, Staging environment, Journal, Archive search
olasd added a comment to T2780: Enable the journal-writer for the swh-idx-storage in production.

Is this supposed to be persistent (and keep the full history of all messages), or transient (and used for "real-time" clients)? IOW, what are the storage requirements for this?

Nov 26 2020, 5:53 PM · System administration, Journal
vsellier added a comment to T2816: Enable the journal-writer for the swh-idx-storage in staging.

T2814 needs to be released first.

Nov 26 2020, 5:46 PM · System administrators, Staging environment, Journal, Archive search
vsellier triaged T2816: Enable the journal-writer for the swh-idx-storage in staging as Normal priority.
Nov 26 2020, 5:40 PM · System administrators, Staging environment, Journal, Archive search
ardumont updated the task description for T2814: Fix swh indexer journal client service.
Nov 26 2020, 3:20 PM · Journal, Indexer
ardumont added a revision to T2814: Fix swh indexer journal client service: D4605: indexer.journal_client: Subscribe to OriginVisitStatus topic.
Nov 26 2020, 3:18 PM · Journal, Indexer
vsellier added a revision to T2814: Fix swh indexer journal client service: D4599: swh.indexer.cli.journal_client: fix config use.
Nov 26 2020, 12:22 PM · Journal, Indexer
ardumont triaged T2814: Fix swh indexer journal client service as Normal priority.
Nov 26 2020, 12:21 PM · Journal, Indexer

Nov 16 2020

vlorentz triaged T2780: Enable the journal-writer for the swh-idx-storage in production as Normal priority.
Nov 16 2020, 1:31 PM · System administration, Journal
vlorentz closed T2651: Make the indexer-storage publish its rows to Kafka, a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.
Nov 16 2020, 1:27 PM · Journal, Archive search

Oct 12 2020

vlorentz closed T2672: Inconsistent keys between visits and origin visits in the journal as Wontfix.

Closing in favor of T2686

Oct 12 2020, 1:05 PM · Journal

Oct 8 2020

vlorentz added a comment to T1279: swh-journal: The schema migration problem.

I like the Final version attribute idea. But to be clear, this means we will need to add successive versions as extra classes in swh.model.model when we change the schema (which is not necessarily a bad thing), and remove them when we are sure they are not around anymore

Oct 8 2020, 12:46 PM · Journal
vlorentz renamed T1279: swh-journal: The schema migration problem from swh-journal: The migration problem to swh-journal: The schema migration problem.
Oct 8 2020, 12:43 PM · Journal