Dec 9 2020

ardumont added a revision to T2817: Enable the swh-search environment in staging: D4709: indexer_storage: Publish indexer computation to journal topics.

Dec 9 2020, 10:09 PM · System administrators, Staging environment, Journal, Archive search

ardumont added a revision to T2817: Enable the swh-search environment in staging: D4704: docker-compose.search.yml: Add journal client for indexed values.

Dec 9 2020, 6:19 PM · System administrators, Staging environment, Journal, Archive search

vsellier added a revision to T2817: Enable the swh-search environment in staging: D4701: Allow configuration through cli or config file.

Dec 9 2020, 5:57 PM · System administrators, Staging environment, Journal, Archive search

ardumont added a revision to T2817: Enable the swh-search environment in staging: D4699: search: Deploy multiple search journal client instances.

Dec 9 2020, 5:20 PM · System administrators, Staging environment, Journal, Archive search

ardumont updated the task description for T2817: Enable the swh-search environment in staging.

Dec 9 2020, 11:39 AM · System administrators, Staging environment, Journal, Archive search

vsellier added a comment to T2817: Enable the swh-search environment in staging.

The search rpc backend and the journal client listening on origin and origin_visit topics are deployed.
The inventory is up to date for both hosts [1][2]

Dec 9 2020, 9:51 AM · System administrators, Staging environment, Journal, Archive search

vsellier updated the task description for T2817: Enable the swh-search environment in staging.

Dec 9 2020, 9:35 AM · System administrators, Staging environment, Journal, Archive search

Dec 8 2020

ardumont added a revision to T2817: Enable the swh-search environment in staging: D4687: search: Add initialization step on install or upgrade.

Dec 8 2020, 4:06 PM · System administrators, Staging environment, Journal, Archive search

vsellier added a comment to T2817: Enable the swh-search environment in staging.

A dashboard to monitor the ES cluster behavior has been created on Grafana [1].
It will be improved during the swh-search tests.

Dec 8 2020, 10:49 AM · System administrators, Staging environment, Journal, Archive search

Dec 7 2020

ardumont closed T2821: indexer: Improve tests as Resolved.

Dec 7 2020, 8:54 PM · Journal, Indexer

vsellier added a comment to T2817: Enable the swh-search environment in staging.

Interesting note about how to size the shards of an index: https://www.elastic.co/guide/en/elasticsearch/reference/7.x//size-your-shards.html

Dec 7 2020, 6:15 PM · System administrators, Staging environment, Journal, Archive search

ardumont added a revision to T2590: Finish the indexer -> swh-search pipeline: D4675: base-buster: Pin elasticsearch to 7.9.3.

Dec 7 2020, 1:29 PM · Journal, Archive search

ardumont added a revision to T2590: Finish the indexer -> swh-search pipeline: D4671: cli: Subscribe journal client to origin_visit_status.

Dec 7 2020, 8:54 AM · Journal, Archive search

ardumont added a revision to T2590: Finish the indexer -> swh-search pipeline: D4670: cli: Allow topic prefix declaration through cli or configuration.

Dec 7 2020, 8:53 AM · Journal, Archive search

ardumont added a revision to T2590: Finish the indexer -> swh-search pipeline: D4669: cli: Allow object-type declaration through cli or configuration.

Dec 7 2020, 8:52 AM · Journal, Archive search

Dec 4 2020

ardumont added a revision to T2817: Enable the swh-search environment in staging: D4668: Add swh-search-journal-client to swh_search_with_journal_client role.

Dec 4 2020, 7:27 PM · System administrators, Staging environment, Journal, Archive search

ardumont added a revision to T2817: Enable the swh-search environment in staging: D4666: staging: Deploy swh-search rpc backend on search0.

Dec 4 2020, 4:54 PM · System administrators, Staging environment, Journal, Archive search

douardda closed T2834: Use msgpack extension types instead of custom swh encoders/decoders as Resolved.

Done for the journal. We might want to do the same in the RPC low-level stack (swh.core.api) some day (or replace this later with gRPC or so :-) )

Dec 4 2020, 3:12 PM · Journal

vlorentz added a revision to T2590: Finish the indexer -> swh-search pipeline: D4661: search.cli: Subscribe journal client to origin_intrinsic_metadata topic.

Dec 4 2020, 1:41 PM · Journal, Archive search

ardumont added a comment to T2817: Enable the swh-search environment in staging.

We added a 100 GiB volume to search-esnode0 through terraform (D4663),
so that /srv/elasticsearch can be mounted as a ZFS volume.

Dec 4 2020, 12:44 PM · System administrators, Staging environment, Journal, Archive search

ardumont added a revision to T2817: Enable the swh-search environment in staging: D4664: search0: Add swh-search rpc backend node.

Dec 4 2020, 12:11 PM · System administrators, Staging environment, Journal, Archive search

ardumont added a revision to T2817: Enable the swh-search environment in staging: D4663: search-esnode0: Add a 100Gib storage disk.

Dec 4 2020, 12:04 PM · System administrators, Staging environment, Journal, Archive search

vsellier added a comment to T2817: Enable the swh-search environment in staging.

A dedicated ES node for staging (search-esnode0.internal.staging.swh.network) has been deployed with D4658 and D4651.

Dec 4 2020, 11:46 AM · System administrators, Staging environment, Journal, Archive search

vsellier updated the task description for T2817: Enable the swh-search environment in staging.

Dec 4 2020, 11:44 AM · System administrators, Staging environment, Journal, Archive search

Dec 3 2020

ardumont added a revision to T2817: Enable the swh-search environment in staging: D4658: staging: Add search-esnode0.

Dec 3 2020, 5:59 PM · System administrators, Staging environment, Journal, Archive search

zack added a project to T2834: Use msgpack extension types instead of custom swh encoders/decoders: Journal.

Dec 3 2020, 1:30 PM · Journal

vsellier added a revision to T2817: Enable the swh-search environment in staging: D4654: -wip- Switch to the official elasticsearch plugin.

Dec 3 2020, 12:21 PM · System administrators, Staging environment, Journal, Archive search

Dec 2 2020

ardumont added a revision to T2817: Enable the swh-search environment in staging: D4651: Puppetize elasticsearch nodes.

Dec 2 2020, 4:53 PM · System administrators, Staging environment, Journal, Archive search

ardumont claimed T2821: indexer: Improve tests.

Dec 2 2020, 11:22 AM · Journal, Indexer

ardumont added a revision to T2821: indexer: Improve tests: D4641: test_journal_client: Send production objects to journal for testing.

Dec 2 2020, 9:24 AM · Journal, Indexer

Dec 1 2020

ardumont added a revision to T2821: indexer: Improve tests: D4640: test_journal_client: Migrate away from mocks.

Dec 1 2020, 6:01 PM · Journal, Indexer

ardumont added a revision to T2821: indexer: Improve tests: D4638: tests: Use production backends within the indexer tests.

Dec 1 2020, 3:45 PM · Journal, Indexer

ardumont renamed T2821: indexer: Improve tests from indexer.journal.client: Improve tests to indexer: Improve tests.

Dec 1 2020, 3:44 PM · Journal, Indexer

Nov 30 2020

ardumont closed T2814: Fix swh indexer journal client service as Resolved.

Deployed the following:

Nov 30 2020, 3:08 PM · Journal, Indexer

ardumont updated the task description for T2814: Fix swh indexer journal client service.

Nov 30 2020, 3:07 PM · Journal, Indexer

ardumont added a comment to T2814: Fix swh indexer journal client service.

It would be nice if the tests for this journal client used an actual storage with a journal writer, rather than fully mocked topics. Doing this would have caught the original breakage.

Nov 30 2020, 10:47 AM · Journal, Indexer

Nov 27 2020

vsellier closed T2816: Enable the journal-writer for the swh-idx-storage in staging, a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.

Nov 27 2020, 6:20 PM · Journal, Archive search

vsellier closed T2816: Enable the journal-writer for the swh-idx-storage in staging as Resolved.

The swh-indexer stack is deployed on staging and the initial loading is done.
The volumes are quite low:

Nov 27 2020, 6:20 PM · System administrators, Staging environment, Journal, Archive search

vsellier added a revision to T2816: Enable the journal-writer for the swh-idx-storage in staging: D4625: staging: Fix object storage configuration for indexers.

Nov 27 2020, 3:20 PM · System administrators, Staging environment, Journal, Archive search

vlorentz triaged T2823: Write tests for swh/journal/writer/inmemory.py as Low priority.

Nov 27 2020, 1:50 PM · Easy hack, Journal

ardumont updated the task description for T2821: indexer: Improve tests.

Nov 27 2020, 1:21 PM · Journal, Indexer

ardumont triaged T2821: indexer: Improve tests as Normal priority.

Nov 27 2020, 1:19 PM · Journal, Indexer

olasd added a comment to T2814: Fix swh indexer journal client service.

It would be nice if the tests for this journal client used an actual storage with a journal writer, rather than fully mocked topics. Doing this would have caught the original breakage.

Nov 27 2020, 11:54 AM · Journal, Indexer

olasd added a comment to T2780: Enable the journal-writer for the swh-idx-storage in production.

I propose meeting in the middle and having the following policies:

Nov 27 2020, 11:49 AM · System administration, Journal

ardumont added a comment to T2814: Fix swh indexer journal client service.

and indexer 0.6.1 is now packaged. We have everything we need to unstick it now.

Nov 27 2020, 11:34 AM · Journal, Indexer

vlorentz added a comment to T2780: Enable the journal-writer for the swh-idx-storage in production.

In T2780#53415, @olasd wrote:

Is this supposed to be persistent (and keep the full history of all messages), or transient (and used for "real-time" clients)? IOW, what are the storage requirements for this?

Nov 27 2020, 11:01 AM · System administration, Journal

ardumont added a comment to T2780: Enable the journal-writer for the swh-idx-storage in production.

Where is the list of topics that need to be created?

Nov 27 2020, 10:50 AM · System administration, Journal

ardumont added a comment to T2780: Enable the journal-writer for the swh-idx-storage in production.

It's unclear what the prefix should be. swh.storage uses swh.journal.objects; we can either use that one too, or a new one, e.g. swh.journal.indexed

I think we should definitely use a different prefix from swh.storage, as the ACLs for third parties should be separate.

Nov 27 2020, 10:48 AM · System administration, Journal

vsellier added a revision to T2816: Enable the journal-writer for the swh-idx-storage in staging: D4620: staging: configure idx-storage to write to kafka.

Nov 27 2020, 10:43 AM · System administrators, Staging environment, Journal, Archive search

vsellier added a comment to T2590: Finish the indexer -> swh-search pipeline.

This is a description of the pipeline to clarify the interaction between the components (source: P883):

Nov 27 2020, 10:14 AM · Journal, Archive search

Nov 26 2020

vsellier changed the status of T2817: Enable the swh-search environment in staging, a subtask of T2590: Finish the indexer -> swh-search pipeline, from Open to Work in Progress.

Nov 26 2020, 5:59 PM · Journal, Archive search

vsellier renamed T2817: Enable the swh-search environment in staging from Enable the swh-search in staging to Enable the swh-search environment in staging.

Nov 26 2020, 5:59 PM · System administrators, Staging environment, Journal, Archive search

vsellier triaged T2817: Enable the swh-search environment in staging as Normal priority.

Nov 26 2020, 5:58 PM · System administrators, Staging environment, Journal, Archive search

olasd added a comment to T2780: Enable the journal-writer for the swh-idx-storage in production.

Is this supposed to be persistent (and keep the full history of all messages), or transient (and used for "real-time" clients)? IOW, what are the storage requirements for this?

Nov 26 2020, 5:53 PM · System administration, Journal

vsellier added a comment to T2816: Enable the journal-writer for the swh-idx-storage in staging.

T2814 needs to be released before

Nov 26 2020, 5:46 PM · System administrators, Staging environment, Journal, Archive search

vsellier triaged T2816: Enable the journal-writer for the swh-idx-storage in staging as Normal priority.

Nov 26 2020, 5:40 PM · System administrators, Staging environment, Journal, Archive search

ardumont updated the task description for T2814: Fix swh indexer journal client service.

Nov 26 2020, 3:20 PM · Journal, Indexer

ardumont added a revision to T2814: Fix swh indexer journal client service: D4605: indexer.journal_client: Subscribe to OriginVisitStatus topic.

Nov 26 2020, 3:18 PM · Journal, Indexer

vsellier added a revision to T2814: Fix swh indexer journal client service: D4599: swh.indexer.cli.journal_client: fix config use.

Nov 26 2020, 12:22 PM · Journal, Indexer

ardumont triaged T2814: Fix swh indexer journal client service as Normal priority.

Nov 26 2020, 12:21 PM · Journal, Indexer

Nov 16 2020

vlorentz triaged T2780: Enable the journal-writer for the swh-idx-storage in production as Normal priority.

Nov 16 2020, 1:31 PM · System administration, Journal

vlorentz closed T2651: Make the indexer-storage publish its rows to Kafka, a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.

Nov 16 2020, 1:27 PM · Journal, Archive search

Oct 12 2020

vlorentz closed T2672: Inconsistent keys between visits and origin visits in the journal as Wontfix.

Closing in favor of T2686

Oct 12 2020, 1:05 PM · Journal

Oct 8 2020

vlorentz added a comment to T1279: swh-journal: The schema migration problem.

I like the Final version attribute idea. But to be clear, this means we will need to add successive versions as extra classes in swh.model.model when we change the schema (which is not necessarily a bad thing), and remove them when we are sure they are not around anymore

Oct 8 2020, 12:46 PM · Journal

vlorentz renamed T1279: swh-journal: The schema migration problem from swh-journal: The migration problem to swh-journal: The schema migration problem.

Oct 8 2020, 12:43 PM · Journal

douardda added a comment to T1279: swh-journal: The schema migration problem.

Since this "migration problem" also concerns cassandra, maybe a simple approach would be to add a Final version attribute to all model entities (a simple monotonic integer).

Oct 8 2020, 12:41 PM · Journal

vlorentz added a revision to T2672: Inconsistent keys between visits and origin visits in the journal: D4194: model: use visit ids in the unique key, instead of their date.

Oct 8 2020, 11:26 AM · Journal

vlorentz triaged T2672: Inconsistent keys between visits and origin visits in the journal as High priority.

Oct 8 2020, 11:26 AM · Journal

Sep 22 2020

olasd closed T529: swh-journal: Create a journal checker comparing object lists between journal and database, a subtask of T424: swh-journal: persistent journal infrastructure to record additions to the swh-storage, as Wontfix.

Sep 22 2020, 6:30 PM · Journal

olasd closed T529: swh-journal: Create a journal checker comparing object lists between journal and database as Wontfix.

Thanks to @seirl using the full journal to do a graph export (and therefore having the time to check whether all objects were there), we've found a bunch of bugs in the journal backfiller / configuration that prevented large objects from being added.

Sep 22 2020, 6:30 PM · Journal

olasd added a comment to T2543: Some messages in the (azure) kafka cluster are too large for rdkafka clients to be able to decompress them.

After resetting a local consumer to these offsets, I was completely unable to reproduce this issue.

Sep 22 2020, 6:27 PM · System administration, Journal

olasd added a comment to T1828: Improve directory journal backfill performance.

(the backfill had, in fact, completed within a month)

Sep 22 2020, 6:14 PM · Mirror, Journal

olasd closed T1828: Improve directory journal backfill performance as Resolved.

At this point, I don't think we'll make it much better with postgres as source.

Sep 22 2020, 6:14 PM · Mirror, Journal

Sep 14 2020

vlorentz edited projects for T2590: Finish the indexer -> swh-search pipeline, added: Journal; removed Storage manager.

Sep 14 2020, 5:39 PM · Journal, Archive search

Aug 27 2020

olasd triaged T2543: Some messages in the (azure) kafka cluster are too large for rdkafka clients to be able to decompress them as High priority.

Aug 27 2020, 12:27 PM · System administration, Journal

Aug 26 2020

vlorentz moved T2514: Add raw_extrinsic_metadata to the journal backfiller from Backlog to Done on the Roadmap 2020 board.

Aug 26 2020, 5:00 PM · Journal, Storage manager, Roadmap 2020

Jul 31 2020

vlorentz closed T2514: Add raw_extrinsic_metadata to the journal backfiller as Resolved.

Jul 31 2020, 2:25 PM · Journal, Storage manager, Roadmap 2020

vlorentz added a revision to T2514: Add raw_extrinsic_metadata to the journal backfiller: D3659: Add support for metadata-related object types to the backfiller and replayer.

Jul 31 2020, 2:25 PM · Journal, Storage manager, Roadmap 2020

Jul 30 2020

zack renamed T2003: Content replayer may try to copy objects before they are available from an objstorage from Content replayer may try to copy objects before they are available in an objstorage to Content replayer may try to copy objects before they are available from an objstorage.

Jul 30 2020, 8:18 AM · Journal