The problem was solved by increasing the default timeout in the Cassandra configuration and by reducing the journal_client batch size.
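For reference, a minimal sketch of the Cassandra side of the change, assuming the server-side read timeout in cassandra.yaml is the knob that was raised; the value below is purely illustrative, not the one actually deployed:

```yaml
# cassandra.yaml (sketch; the value is illustrative)
# Raise the server-side read timeout so that bulk replays under heavy
# concurrent load stop hitting ReadTimeout errors.
read_request_timeout_in_ms: 10000
```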
Jul 6 2021
A replay was run for 13 hours the previous night with the current default consistency=ONE. It can be used as the reference for the next test with the LOCAL_QUORUM consistency.
After trying several options to render the results, the simplest was to export the content of a spreadsheet into a hedgedoc document [1].
The data are stored in a Prometheus instance on a Proxmox VM, so it will always be possible to improve the statistics later [2].
Jul 5 2021
Good catch. A variable renaming was missing.
one result:
{ "_index" : "origin-production", "_type" : "_doc", "_id" : "4a7ff7d5b3827d34f81c7112835928bfd2e701a1", "_score" : 1.0, "_source" : { "intrinsic_metadata" : [ { "http://schema.org/dateModified" : [ { "@value" : "2018-01-29" } ], "http://schema.org/datePublished" : [ { "@value" : "2018-01-29" } ], "http://schema.org/dateCreated" : [ { "@value" : "2018-01-29" } ] } ], "url" : "https://github.com/XinlongSBU/Pynucastro_weakrates" } },
Sure, there is actually only read-write access, but I will reopen it.
Jul 2 2021
content copied from
Jul 1 2021
A new run was launched with 4x10 replayers per object type.
A limit is still reached for the revision replayers, which end with read timeouts.
A test with only the revision replayers shows the problems start to appear when the number of parallel replayers is greater than ~20.
One example of the wrong behavior of the consistency level ONE for read requests:
One of the probes of the monitoring query is based on the objectcount table content.
The servers were hard stopped after the end of the grid5000 reservation and it seems some replication messages were lost. After the restart, the content of the table is not in sync on all the servers.
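Purely as an illustration (not the actual probe code): with the Python cassandra-driver the consistency level can be set per statement, which makes the behavior above visible. The contact points and keyspace name are assumptions; the table name is the one mentioned above.

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["cassandra1", "cassandra2", "cassandra3"])  # illustrative contact points
session = cluster.connect("swh")  # keyspace name is an assumption

# With ONE, a single replica answers: after the lost replication messages,
# the result depends on which replica happens to be queried.
one = session.execute(SimpleStatement(
    "SELECT * FROM objectcount",
    consistency_level=ConsistencyLevel.ONE,
))

# With QUORUM, a majority of replicas must agree, so the out-of-sync
# replica is masked and read repair can bring it back in line.
quorum = session.execute(SimpleStatement(
    "SELECT * FROM objectcount",
    consistency_level=ConsistencyLevel.QUORUM,
))
```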
Jun 30 2021
A first run was launched with 4x20 replayers per object type.
It seems to be too much for 3 Cassandra nodes: the default 1s timeout for Cassandra reads is often reached.
It seems the batch size of 1000 is also too much for some object types, like snapshots.
A new test will be launched tonight with some changes:
- reduce the number of replayer processes to 4x10 per object type
- reduce the journal client batch size to 500
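For illustration, the journal client side of the change would look roughly like this, assuming a YAML configuration exposing the batch_size parameter of swh.journal.client; the broker and group id are illustrative:

```yaml
# journal client configuration (sketch; only batch_size is the point here)
journal_client:
  brokers:
    - kafka1.internal.softwareheritage.org:9092   # illustrative broker
  group_id: cassandra-replayer                    # illustrative group id
  batch_size: 500   # reduced from 1000 to keep large objects (e.g. snapshots)
                    # from tripping the Cassandra read timeout
```

The number of replayer processes (4x10) is controlled by how many processes are started, not by this file.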
Jun 29 2021
Regarding the recurrent disconnections of the Azure VPN, it seems the only difference is the reauth=no option, activated on louvre but not on opnsense.
The option was activated on opnsense too; the connection should no longer be killed on a key renegotiation (if I have understood correctly ;))
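For context, a minimal sketch of the corresponding strongSwan setting, assuming an ipsec.conf-style configuration; the connection name and the extra option shown are illustrative:

```
# /etc/ipsec.conf (sketch; connection name is illustrative)
conn azure-vpn
    keyexchange=ikev2
    reauth=no    # rekey the IKE SA in place instead of tearing the
                 # connection down for a full re-authentication
```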
rebase
{ "_index": "origin-v0.9.0", "_type": "_doc", "_id": "6925508eeef1c21e32f41040849c5851ac02920e", "_version": 6, "_seq_no": 16879311, "_primary_term": 1, "found": true, "_source": { "intrinsic_metadata": [ { "http://schema.org/author": [ { "@list": [ { "http://schema.org/name": [ { "@value": "Evgeniy Malyarov" } ], "@type": [ "http://schema.org/Person" ], "http://schema.org/url": [ { "@id": "http://www.oknosoft.ru" } ], "http://schema.org/email": [ { "@value": "info@oknosoft.ru" } ] } ] } ], "http://schema.org/description": [ { "@value": "Library for building offline-first browser-based business applications" } ], "http://schema.org/name": [ { "@value": "metadata-js" } ], "https://codemeta.github.io/terms/issueTracker": [ { "@id": "https://github.com/oknosoft/metadata.js/issues" } ], "http://schema.org/license": [ { "@id": "https://spdx.org/licenses/MIT" } ], "http://schema.org/codeRepository": [ { "@id": "git+https://github.com/oknosoft/metadata.js.git" } ], "@type": [ "http://schema.org/SoftwareSourceCode" ], "http://schema.org/version": [ { "@value": "0.11.223" } ], "http://schema.org/keywords": [ { "@value": "metadata" }, { "@value": "browser data engine" }, { "@value": "spa offline" }, { "@value": "rest" }, { "@value": "odata" }, { "@value": "1c" }, { "@value": "1с" }, { "@value": "web сервис" }, { "@value": "клиент 1с" }, { "@value": "ui framework" }, { "@value": "offline framework" }, { "@value": "offline data engine" }, { "@value": "rest client" }, { "@value": "CRDT" }, { "@value": "offline-first" }, { "@value": "replication" } ], "http://schema.org/url": [ { "@id": "http://www.oknosoft.ru/metadata/" } ] } ], "sha1": "6925508eeef1c21e32f41040849c5851ac02920e", "url": "https://github.com/SMAlik93/metadata.js", "visit_types": [ "git" ], "has_visits": true, "last_visit_date": "2021-01-13T13:12:40.822142Z", "nb_visits": 2 } }
{ "_index": "origin-v0.9.0", "_type": "_doc", "_id": "014cea90c907f8c4af2d8e88d9c0b328388f766f", "_version": 17, "_seq_no": 19142389, "_primary_term": 1, "found": true, "_source": { "sha1": "014cea90c907f8c4af2d8e88d9c0b328388f766f", "url": "https://github.com/mvolz/html-metadata", "intrinsic_metadata": [ { "http://schema.org/author": [ { "@list": [ { "http://schema.org/name": [ { "@value": "Marielle Volz" } ], "@type": [ "http://schema.org/Person" ], "http://schema.org/email": [ { "@value": "marielle.volz@gmail.com" } ] } ] } ], "http://schema.org/description": [ { "@value": "Scrapes metadata of several different standards" } ], "http://schema.org/name": [ { "@value": "html-metadata" } ], "https://codemeta.github.io/terms/issueTracker": [ { "@id": "https://github.com/wikimedia/html-metadata/issues" } ], "http://schema.org/license": [ { "@id": "https://spdx.org/licenses/MIT" } ], "http://schema.org/codeRepository": [ { "@id": "git+https://github.com/wikimedia/html-metadata.git" } ], "@type": [ "http://schema.org/SoftwareSourceCode" ], "http://schema.org/version": [ { "@value": "1.7.0" } ], "http://schema.org/keywords": [ { "@value": "bepress" }, { "@value": "coins" }, { "@value": "dublin core" }, { "@value": "eprints" }, { "@value": "highwire press" }, { "@value": "json-ld" }, { "@value": "open graph" }, { "@value": "metadata" }, { "@value": "microdata" }, { "@value": "prism" }, { "@value": "twitter cards" }, { "@value": "web scraper" } ], "http://schema.org/url": [ { "@id": "https://github.com/wikimedia/html-metadata" } ] } ], "visit_types": [ "git" ], "nb_visits": 8, "has_visits": true, "last_visit_date": "2020-04-07T05:42:09.217233+00:00" } }
{ "_index": "origin-v0.9.0", "_type": "_doc", "_id": "cca7bef1de938d94d11d0a8b67b4b2e3e9791a70", "_version": 1331, "_seq_no": 25110599, "_primary_term": 1, "found": true, "_source": { "intrinsic_metadata": [ { "http://schema.org/identifier": [ { "@id": "com.blazegraph" } ], "http://schema.org/description": [ { "@value": "Blazegraph™ DB is our ultra high-performance graph database supporting Blueprints and RDF/SPARQL APIs. It supports up to 50 Billion edges on a single machine and has a High Availability and Scale-out architecture. It is in production use for customers such as EMC, Syapse, Wikidata Query Service, the British Museum, and many others. GPU acceleration and High Availability (HA) are available in the Enterprise edition. It contains war, jar, deb, rpm, and tar.gz deployment artifacts." } ], "http://schema.org/name": [ { "@value": "Blazegraph Database Platform" } ], "http://schema.org/license": [ { "@id": "http://www.gnu.org/licenses/gpl-2.0.html" } ], "http://schema.org/codeRepository": [ { "@id": "https://repo.maven.apache.org/maven2/com/blazegraph/blazegraph-parent" } ], "@type": [ "http://schema.org/SoftwareSourceCode" ], "http://schema.org/version": [ { "@value": "2.1.6-wmf.2-SNAPSHOT" } ] } ], "sha1": "cca7bef1de938d94d11d0a8b67b4b2e3e9791a70", "url": "https://phabricator.wikimedia.org/diffusion/WDQB/wikidata-query-blazegraph.git", "visit_types": [ "git" ], "has_visits": true, "last_visit_date": "2020-10-02T11:36:24.415385Z", "nb_visits": 665 } }
{ "_index": "origin-v0.9.0", "_type": "_doc", "_id": "94247b453f4290fb234e8f05be470673c044a7d1", "_version": 1940, "_seq_no": 25440556, "_primary_term": 1, "found": true, "_source": { "intrinsic_metadata": [ { "http://schema.org/author": [ { "@list": [ { "http://schema.org/name": [ { "@value": "FaBo" } ], "@type": [ "http://schema.org/Person" ], "http://schema.org/email": [ { "@value": "info@fabo.io" } ] } ] } ], "http://schema.org/description": [ { "@value": "FaBoTemperature-ADT7410-Python\n==============================\n\nHow to install.\n---------------\n\n::\n\n pip install FaBoTemperature_ADT7410\n\nFaBo Temperature I2C Brick\n--------------------------\n\n `#207 Temperature I2C Brick <http://fabo.io/207.html>`__\n\nADT7410\n-------\n\n ADT7410 is 16-Bit Digital I2C Temperature Sensor.\n\nADT7410 Datasheet\n~~~~~~~~~~~~~~~~~\n\n `ADT7410\nDatasheet <http://www.analog.com/media/en/technical-documentation/data-sheets/ADT7410.pdf>`__\n\nReleases\n--------\n\n- 1.0.0 Initial release.\n" }, { "@value": "This is a library for the FaBo Temperature I2C Brick." } ], "http://schema.org/name": [ { "@value": "FaBoTemperature_ADT7410" } ], "http://schema.org/license": [ { "@id": "Apache License 2.0" } ], "@type": [ "http://schema.org/SoftwareSourceCode" ], "http://schema.org/version": [ { "@value": "1.0.0" } ], "http://schema.org/url": [ { "@id": "https://github.com/FaBoPlatform/FaBoTemperature-ADT7410-Python" } ] } ], "sha1": "94247b453f4290fb234e8f05be470673c044a7d1", "url": "https://pypi.org/project/FaBoTemperature_ADT7410/", "visit_types": [ "pypi" ], "has_visits": true, "last_visit_date": "2021-06-29T11:23:44.141572+00:00", "nb_visits": 969 } }
These are the first tests that will be executed:
- baseline configuration: 3 Cassandra nodes (parasilo [1]), commitlog on a dedicated SSD, data on 4 HDDs (see the sketch after this list). Goal: test the minimal configuration (a night, or a complete weekend if possible)
- baseline configuration + 1 Cassandra node. Goal: test the performance impact of having 1 more server (duration long enough to see tendencies)
- baseline configuration + 3 Cassandra nodes. Goal: test the performance impact of doubling the cluster size (duration long enough to see tendencies)
- baseline configuration but with the commitlog on the data partition. Goal: check the impact of sharing the disks between data and commitlog (duration long enough to see tendencies)
- baseline configuration but with 2 HDDs. Goal: check the impact of the number of disks + have a reference for the next run (a night)
- baseline configuration but with 2 HDDs + commitlog on a dedicated HDD. Goal: check the impact of having the commitlog on a slower disk (duration long enough to see tendencies)
- baseline configuration but with 2x the default heap allocated to Cassandra. Goal: check the impact of the memory configuration ((!) check the GC profile)
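As a rough sketch of what the baseline disk layout means in cassandra.yaml (the mount points are illustrative, not the actual Grid'5000 paths):

```yaml
# cassandra.yaml (baseline sketch; paths are illustrative)
commitlog_directory: /srv/ssd/cassandra/commitlog   # dedicated SSD
data_file_directories:                               # 4 HDDs
  - /srv/hdd1/cassandra/data
  - /srv/hdd2/cassandra/data
  - /srv/hdd3/cassandra/data
  - /srv/hdd4/cassandra/data
```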
Thanks, it was tested last night on grid5000; all the origins were correctly replayed without issues.
Jun 28 2021
In T3396#66905, @vlorentz wrote: "only one confirmation of the write is needed"
It's not perfect though. If the server that confirmed the write breaks before it replicates it, then the write is lost.
IMO, we should first try to have a global configuration for all the read/write queries, and improve it later if needed for performance or if it creates problems. At worst, it will be possible to fall back to the default ONE value by configuration.
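To make the "global configuration" idea concrete, a minimal sketch with the Python cassandra-driver: a default execution profile applies the same consistency level to every read and write issued through the session. How this would actually be wired into swh-storage is not shown, and the contact points and keyspace name are assumptions.

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT

# One default profile: every statement executed through this session uses
# LOCAL_QUORUM unless it explicitly overrides the consistency level.
profile = ExecutionProfile(consistency_level=ConsistencyLevel.LOCAL_QUORUM)
cluster = Cluster(
    ["cassandra1", "cassandra2", "cassandra3"],          # illustrative contact points
    execution_profiles={EXEC_PROFILE_DEFAULT: profile},
)
session = cluster.connect("swh")                          # keyspace name is an assumption
```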
released in swh-storage:v0.32.0
LGTM, thank you for fixing that.
The lag on the topics has recovered.
The configuration update of moma will be followed up in T3373.
The lag has recovered; the search on webapp1 [1] is now fully up-to-date and can be tested before changing the configuration on the main webapp.
Jun 25 2021
improve commit message
Jun 23 2021
The metadata indexation is finished; https://webapp1.internal.softwareheritage.org can now search on the metadata via elasticsearch without any issue.
Let's now wait for the lag on origin* topics to recover: https://grafana.softwareheritage.org/goto/iOvBK6gnk?orgId=1
Also update the index name. It's not necessary, as only the aliases are used for the search, but it's cleaner.
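To illustrate why only the aliases matter for the search, a sketch with the elasticsearch-py client; the host and the alias name are assumptions, and the action shown is purely illustrative:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://search-esnode1.internal.softwareheritage.org:9200")  # illustrative host

# Searches always go through the alias, so the backing index can be swapped
# atomically without changing the webapp configuration.
es.indices.update_aliases(body={
    "actions": [
        {"remove": {"index": "origin-v0.9.0", "alias": "origin-read"}},   # alias name is an assumption
        {"add": {"index": "origin-production", "alias": "origin-read"}},
    ]
})
```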