The search RPC backend and the journal client listening on the origin and origin_visit topics are deployed.
The inventory is up to date for both hosts [1][2].
Dec 9 2020
Dec 8 2020
A dashboard to monitor the ES cluster behavior has been created on Grafana [1].
It will be improved during the swh-search tests.
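For reference, a minimal sketch of the kind of cluster-health signal such a dashboard builds on, assuming the cluster answers on search-esnode0:9200 (hostname and port are assumptions, not the actual staging configuration):

```python
import requests

# Poll the Elasticsearch cluster health endpoint; hostname/port are
# assumptions, not the actual staging configuration.
resp = requests.get("http://search-esnode0:9200/_cluster/health", timeout=10)
resp.raise_for_status()
health = resp.json()

# 'status' is green/yellow/red; unassigned shards are an early warning sign.
print(health["status"], health["number_of_nodes"], health["unassigned_shards"])
```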
Dec 7 2020
Interesting note about how to size the shards of an index: https://www.elastic.co/guide/en/elasticsearch/reference/7.x//size-your-shards.html
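Since shard count is fixed at index creation time, here is an illustrative sketch of where that sizing guidance applies, using the elasticsearch Python client; the index name and the numbers are placeholders, not the values chosen for swh-search:

```python
from elasticsearch import Elasticsearch

# Illustrative only: create an index with explicit shard/replica settings,
# which is where the shard-sizing guidance above comes into play.
es = Elasticsearch("http://search-esnode0:9200")

es.indices.create(
    index="origin",
    body={
        "settings": {
            "index": {
                "number_of_shards": 3,      # aim for a few tens of GB per shard
                "number_of_replicas": 1,
            }
        }
    },
)
```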
Dec 4 2020
Done for the journal. We might want to do the same in the low-level RPC stack (swh.core.api) some day (or replace it later with gRPC or similar :-)).
We added a 100 GiB volume to search-esnode0 through Terraform (D4663), so that /srv/elasticsearch can be mounted as a ZFS volume.
Dec 3 2020
Dec 2 2020
Dec 1 2020
Nov 30 2020
Deployed the following:
It would be nice if the tests for this journal client used an actual storage with a journal writer, rather than fully mocked topics. Doing this would have caught the original breakage.
Nov 27 2020
The swh-indexer stack is deployed on staging and the initial loading is done.
The volumes are quite low:
I propose meeting in the middle and having the following policies:
Indexer 0.6.1 is now packaged. We have everything we need to get it unstuck now.
In T2780#53415, @olasd wrote: Is this supposed to be persistent (and keep the full history of all messages), or transient (and used for "real-time" clients)? IOW, what are the storage requirements for this?
Where is the list of topics that need to be created?
It's unclear what the prefix should be. swh.storage uses swh.journal.objects; we can either use that one too, or a new one, e.g. swh.journal.indexed.
I think we should definitely use a different prefix from swh.storage, as the ACLs for third parties should be separate.
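To make the prefix question concrete, a hypothetical illustration of the topic names each option would produce; the object type name is just an example, not the definitive topic list:

```python
# Hypothetical illustration of the two prefix options discussed above.
OBJECT_TYPES = ["origin_intrinsic_metadata"]  # example object type only

# Option 1: reuse the swh.storage prefix.
storage_prefix_topics = [f"swh.journal.objects.{t}" for t in OBJECT_TYPES]

# Option 2: a dedicated prefix, keeping indexer topics (and their ACLs)
# separate from the swh.storage ones.
indexed_prefix_topics = [f"swh.journal.indexed.{t}" for t in OBJECT_TYPES]

print(storage_prefix_topics)  # ['swh.journal.objects.origin_intrinsic_metadata']
print(indexed_prefix_topics)  # ['swh.journal.indexed.origin_intrinsic_metadata']
```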
This is a description of the pipeline to clarify the interaction between the components (source: P883):
Nov 26 2020
Is this supposed to be persistent (and keep the full history of all messages), or transient (and used for "real-time" clients)? IOW, what are the storage requirements for this?
T2814 needs to be released beforehand.
Nov 16 2020
Oct 12 2020
Closing in favor of T2686
Oct 8 2020
I like the Final version attribute idea. But to be clear, this means we will need to add successive versions as extra classes in swh.model.model when we change the schema (which is not necessarily a bad thing), and remove them when we are sure they are not around anymore.
Since this "migration problem" also concerns cassandra, maybe an simple approach would be to add a Final version attribute to all model entities (a simple monotonic integer).
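A minimal sketch of that idea, assuming attrs-based entities along the lines of swh.model.model; all class and field names are illustrative, not the actual model classes:

```python
from typing import Final

import attr


# Sketch: each model entity carries a class-level, monotonically increasing
# schema version; when the schema changes, the new version is added as a
# separate class and the old one is removed once no object with that version
# is around anymore. Names are illustrative, not the real swh.model.model ones.
@attr.s(frozen=True)
class OriginVisitV1:
    version: Final[int] = 1

    origin = attr.ib(type=str)
    date = attr.ib(type=str)


@attr.s(frozen=True)
class OriginVisitV2:
    version: Final[int] = 2

    origin = attr.ib(type=str)
    date = attr.ib(type=str)
    type = attr.ib(type=str)  # hypothetical field added in the newer schema


def load(d: dict):
    """Dispatch a deserialized dict to the class matching its schema version."""
    cls = {1: OriginVisitV1, 2: OriginVisitV2}[d.pop("version", 1)]
    return cls(**d)
```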
Sep 22 2020
Thanks to @seirl using the full journal to do a graph export (and therefore having the time to check whether all objects were there), we've found a bunch of bugs in the journal backfiller / configuration preventing large objects from being added.
After resetting a local consumer to these offsets, I was completely unable to reproduce this issue.
(the backfill had, in fact, completed within a month)
At this point, I don't think we'll make it much better with Postgres as the source.
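For reference, a hedged sketch of resetting a local consumer to specific offsets to replay a suspicious range, using confluent-kafka; the broker address, topic, partition and offset are placeholders, not the values from this investigation:

```python
from confluent_kafka import Consumer, TopicPartition

# Sketch: replay a given offset range with a throwaway consumer group instead
# of relying on committed offsets. All connection values are placeholders.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "local-debug-replay",
    "enable.auto.commit": False,
})

# Pin the consumer to an explicit starting offset on one partition.
consumer.assign([TopicPartition("swh.journal.objects.revision", 0, 1234567)])

for _ in range(1000):
    msg = consumer.poll(1.0)
    if msg is None:
        continue
    if msg.error():
        raise RuntimeError(msg.error())
    print(msg.topic(), msg.partition(), msg.offset(), len(msg.value()))

consumer.close()
```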
Sep 14 2020
Aug 27 2020
Aug 26 2020
Jul 31 2020
Jul 30 2020
Jul 29 2020
Jul 20 2020
Jul 1 2020
Jun 29 2020
Jun 17 2020
I suspect this hasn't happened recently
Jun 9 2020
Build is green
- Rework commit message
- Reuse same string format to reduce diff stat change
Build is green
Use the pprint_key function as suggested on IRC ;)
Build is green