Dec 9 2020

ardumont added a revision to T2817: Enable the swh-search environment in staging: D4709: indexer_storage: Publish indexer computation to journal topics.
Dec 9 2020, 10:09 PM · System administrators, Staging environment, Journal, Archive search
ardumont added a revision to T2817: Enable the swh-search environment in staging: D4704: docker-compose.search.yml: Add journal client for indexed values.
Dec 9 2020, 6:19 PM · System administrators, Staging environment, Journal, Archive search
vsellier added a revision to T2817: Enable the swh-search environment in staging: D4701: Allow configuration through cli or config file.
Dec 9 2020, 5:57 PM · System administrators, Staging environment, Journal, Archive search
ardumont added a revision to T2817: Enable the swh-search environment in staging: D4699: search: Deploy multiple search journal client instances.
Dec 9 2020, 5:20 PM · System administrators, Staging environment, Journal, Archive search
ardumont updated the task description for T2817: Enable the swh-search environment in staging.
Dec 9 2020, 11:39 AM · System administrators, Staging environment, Journal, Archive search
vsellier added a comment to T2817: Enable the swh-search environment in staging.

The search rpc backend and the journal client listening on origin and origin_visit topics are deployed.
The inventory is up to date for both hosts [1][2]

Dec 9 2020, 9:51 AM · System administrators, Staging environment, Journal, Archive search
vsellier updated the task description for T2817: Enable the swh-search environment in staging.
Dec 9 2020, 9:35 AM · System administrators, Staging environment, Journal, Archive search

Dec 8 2020

ardumont added a revision to T2817: Enable the swh-search environment in staging: D4687: search: Add initialization step on install or upgrade.
Dec 8 2020, 4:06 PM · System administrators, Staging environment, Journal, Archive search
vsellier added a comment to T2817: Enable the swh-search environment in staging.

A dashboard to monitor the ES cluster behavior has been created on grafana [1]
It will be improved during the swh-search tests

Dec 8 2020, 10:49 AM · System administrators, Staging environment, Journal, Archive search

Dec 7 2020

ardumont closed T2821: indexer: Improve tests as Resolved.
Dec 7 2020, 8:54 PM · Journal, Indexer
vsellier added a comment to T2817: Enable the swh-search environment in staging.

Interesting note about how to size the shards of an index: https://www.elastic.co/guide/en/elasticsearch/reference/7.x//size-your-shards.html

Dec 7 2020, 6:15 PM · System administrators, Staging environment, Journal, Archive search
ardumont added a revision to T2590: Finish the indexer -> swh-search pipeline: D4675: base-buster: Pin elasticsearch to 7.9.3.
Dec 7 2020, 1:29 PM · Journal, Archive search
ardumont added a revision to T2590: Finish the indexer -> swh-search pipeline: D4671: cli: Subscribe journal client to origin_visit_status.
Dec 7 2020, 8:54 AM · Journal, Archive search
ardumont added a revision to T2590: Finish the indexer -> swh-search pipeline: D4670: cli: Allow topic prefix declaration through cli or configuration.
Dec 7 2020, 8:53 AM · Journal, Archive search
ardumont added a revision to T2590: Finish the indexer -> swh-search pipeline: D4669: cli: Allow object-type declaration through cli or configuration.
Dec 7 2020, 8:52 AM · Journal, Archive search

Dec 4 2020

ardumont added a revision to T2817: Enable the swh-search environment in staging: D4668: Add swh-search-journal-client to swh_search_with_journal_client role.
Dec 4 2020, 7:27 PM · System administrators, Staging environment, Journal, Archive search
ardumont added a revision to T2817: Enable the swh-search environment in staging: D4666: staging: Deploy swh-search rpc backend on search0.
Dec 4 2020, 4:54 PM · System administrators, Staging environment, Journal, Archive search
douardda closed T2834: Use msgpack extension types instead of custom swh encoders/decoders as Resolved.

Done for the journal. We might want to do the same in the low-level RPC stack (swh.core.api) some day (or replace it later with gRPC or similar :-) )

Dec 4 2020, 3:12 PM · Journal
vlorentz added a revision to T2590: Finish the indexer -> swh-search pipeline: D4661: search.cli: Subscribe journal client to origin_intrinsic_metadata topic.
Dec 4 2020, 1:41 PM · Journal, Archive search
ardumont added a comment to T2817: Enable the swh-search environment in staging.

We added a 100 GiB volume to search-esnode0 through terraform (D4663),
so that we could mount /srv/elasticsearch as a ZFS volume.

Dec 4 2020, 12:44 PM · System administrators, Staging environment, Journal, Archive search
ardumont added a revision to T2817: Enable the swh-search environment in staging: D4664: search0: Add swh-search rpc backend node.
Dec 4 2020, 12:11 PM · System administrators, Staging environment, Journal, Archive search
ardumont added a revision to T2817: Enable the swh-search environment in staging: D4663: search-esnode0: Add a 100Gib storage disk.
Dec 4 2020, 12:04 PM · System administrators, Staging environment, Journal, Archive search
vsellier added a comment to T2817: Enable the swh-search environment in staging.

A dedicated ES node for staging (search-esnode0.internal.staging.swh.network) has been deployed with D4658 and D4651.

Dec 4 2020, 11:46 AM · System administrators, Staging environment, Journal, Archive search
vsellier updated the task description for T2817: Enable the swh-search environment in staging.
Dec 4 2020, 11:44 AM · System administrators, Staging environment, Journal, Archive search

Dec 3 2020

ardumont added a revision to T2817: Enable the swh-search environment in staging: D4658: staging: Add search-esnode0.
Dec 3 2020, 5:59 PM · System administrators, Staging environment, Journal, Archive search
zack added a project to T2834: Use msgpack extension types instead of custom swh encoders/decoders: Journal.
Dec 3 2020, 1:30 PM · Journal
vsellier added a revision to T2817: Enable the swh-search environment in staging: D4654: -wip- Switch to the official elasticsearch plugin.
Dec 3 2020, 12:21 PM · System administrators, Staging environment, Journal, Archive search

Dec 2 2020

ardumont added a revision to T2817: Enable the swh-search environment in staging: D4651: Puppetize elasticsearch nodes.
Dec 2 2020, 4:53 PM · System administrators, Staging environment, Journal, Archive search
ardumont claimed T2821: indexer: Improve tests.
Dec 2 2020, 11:22 AM · Journal, Indexer
ardumont added a revision to T2821: indexer: Improve tests: D4641: test_journal_client: Send production objects to journal for testing.
Dec 2 2020, 9:24 AM · Journal, Indexer

Dec 1 2020

ardumont added a revision to T2821: indexer: Improve tests: D4640: test_journal_client: Migrate away from mocks.
Dec 1 2020, 6:01 PM · Journal, Indexer
ardumont added a revision to T2821: indexer: Improve tests: D4638: tests: Use production backends within the indexer tests.
Dec 1 2020, 3:45 PM · Journal, Indexer
ardumont renamed T2821: indexer: Improve tests from indexer.journal.client: Improve tests to indexer: Improve tests.
Dec 1 2020, 3:44 PM · Journal, Indexer

Nov 30 2020

ardumont closed T2814: Fix swh indexer journal client service as Resolved.

Deployed the following:

Nov 30 2020, 3:08 PM · Journal, Indexer
ardumont updated the task description for T2814: Fix swh indexer journal client service.
Nov 30 2020, 3:07 PM · Journal, Indexer
ardumont updated the task description for T2814: Fix swh indexer journal client service.
Nov 30 2020, 3:07 PM · Journal, Indexer
ardumont added a comment to T2814: Fix swh indexer journal client service.

It would be nice if the tests for this journal client used an actual storage with a journal writer, rather than fully mocked topics. Doing this would have caught the original breakage.

Nov 30 2020, 10:47 AM · Journal, Indexer

Nov 27 2020

vsellier closed T2816: Enable the journal-writer for the swh-idx-storage in staging, a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.
Nov 27 2020, 6:20 PM · Journal, Archive search
vsellier closed T2816: Enable the journal-writer for the swh-idx-storage in staging as Resolved.

The swh-indexer stack is deployed on staging and the initial loading is done.
The volumes are quite low:

Nov 27 2020, 6:20 PM · System administrators, Staging environment, Journal, Archive search
vsellier added a revision to T2816: Enable the journal-writer for the swh-idx-storage in staging: D4625: staging: Fix object storage configuration for indexers.
Nov 27 2020, 3:20 PM · System administrators, Staging environment, Journal, Archive search
vlorentz triaged T2823: Write tests for swh/journal/writer/inmemory.py as Low priority.
Nov 27 2020, 1:50 PM · Easy hack, Journal
ardumont updated the task description for T2821: indexer: Improve tests.
Nov 27 2020, 1:21 PM · Journal, Indexer
ardumont triaged T2821: indexer: Improve tests as Normal priority.
Nov 27 2020, 1:19 PM · Journal, Indexer
olasd added a comment to T2814: Fix swh indexer journal client service.

It would be nice if the tests for this journal client used an actual storage with a journal writer, rather than fully mocked topics. Doing this would have caught the original breakage.

Nov 27 2020, 11:54 AM · Journal, Indexer
olasd added a comment to T2780: Enable the journal-writer for the swh-idx-storage in production.

I propose meeting in the middle and having the following policies:

Nov 27 2020, 11:49 AM · System administration, Journal
ardumont added a comment to T2814: Fix swh indexer journal client service.

and indexer 0.6.1 is now packaged. We have everything we need to unstick it now.

Nov 27 2020, 11:34 AM · Journal, Indexer
vlorentz added a comment to T2780: Enable the journal-writer for the swh-idx-storage in production.
In T2780#53415, @olasd wrote:

Is this supposed to be persistent (and keep the full history of all messages), or transient (and used for "real-time" clients)? IOW, what are the storage requirements for this?

Nov 27 2020, 11:01 AM · System administration, Journal
ardumont added a comment to T2780: Enable the journal-writer for the swh-idx-storage in production.

Where is the list of topics that need to be created?

Nov 27 2020, 10:50 AM · System administration, Journal
ardumont added a comment to T2780: Enable the journal-writer for the swh-idx-storage in production.

It's unclear what the prefix should be. swh.storage uses swh.journal.objects; we can either use that one too, or a new one, e.g. swh.journal.indexed

I think we should definitely use a different prefix from swh.storage, as the ACLs for third parties should be separate.

Nov 27 2020, 10:48 AM · System administration, Journal
vsellier added a revision to T2816: Enable the journal-writer for the swh-idx-storage in staging: D4620: staging: configure idx-storage to write to kafka.
Nov 27 2020, 10:43 AM · System administrators, Staging environment, Journal, Archive search
vsellier added a comment to T2590: Finish the indexer -> swh-search pipeline.

This is a description of the pipeline to clarify the interaction between the components (source: P883):

Nov 27 2020, 10:14 AM · Journal, Archive search

Nov 26 2020

vsellier changed the status of T2817: Enable the swh-search environment in staging, a subtask of T2590: Finish the indexer -> swh-search pipeline, from Open to Work in Progress.
Nov 26 2020, 5:59 PM · Journal, Archive search
vsellier renamed T2817: Enable the swh-search environment in staging from Enable the swh-search in staging to Enable the swh-search environment in staging.
Nov 26 2020, 5:59 PM · System administrators, Staging environment, Journal, Archive search
vsellier triaged T2817: Enable the swh-search environment in staging as Normal priority.
Nov 26 2020, 5:58 PM · System administrators, Staging environment, Journal, Archive search
olasd added a comment to T2780: Enable the journal-writer for the swh-idx-storage in production.

Is this supposed to be persistent (and keep the full history of all messages), or transient (and used for "real-time" clients)? IOW, what are the storage requirements for this?

Nov 26 2020, 5:53 PM · System administration, Journal
vsellier added a comment to T2816: Enable the journal-writer for the swh-idx-storage in staging.

T2814 needs to be released before

Nov 26 2020, 5:46 PM · System administrators, Staging environment, Journal, Archive search
vsellier triaged T2816: Enable the journal-writer for the swh-idx-storage in staging as Normal priority.
Nov 26 2020, 5:40 PM · System administrators, Staging environment, Journal, Archive search
ardumont updated the task description for T2814: Fix swh indexer journal client service.
Nov 26 2020, 3:20 PM · Journal, Indexer
ardumont added a revision to T2814: Fix swh indexer journal client service: D4605: indexer.journal_client: Subscribe to OriginVisitStatus topic.
Nov 26 2020, 3:18 PM · Journal, Indexer
vsellier added a revision to T2814: Fix swh indexer journal client service: D4599: swh.indexer.cli.journal_client: fix config use.
Nov 26 2020, 12:22 PM · Journal, Indexer
ardumont triaged T2814: Fix swh indexer journal client service as Normal priority.
Nov 26 2020, 12:21 PM · Journal, Indexer

Nov 16 2020

vlorentz triaged T2780: Enable the journal-writer for the swh-idx-storage in production as Normal priority.
Nov 16 2020, 1:31 PM · System administration, Journal
vlorentz closed T2651: Make the indexer-storage publish its rows to Kafka, a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.
Nov 16 2020, 1:27 PM · Journal, Archive search

Oct 12 2020

vlorentz closed T2672: Inconsistent keys between visits and origin visits in the journal as Wontfix.

Closing in favor of T2686

Oct 12 2020, 1:05 PM · Journal

Oct 8 2020

vlorentz added a comment to T1279: swh-journal: The schema migration problem.

I like the Final version attribute idea. But to be clear, this means we will need to add successive versions as extra classes in swh.model.model when we change the schema (which is not necessarily a bad thing), and remove them when we are sure they are not around anymore
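
The versioning idea discussed in this thread could look roughly like the sketch below. It is only an illustration under stated assumptions, not the actual swh.model.model API: the class names `OriginV1` and `OriginV2`, the `REGISTRY` mapping, and the `decode` helper are all hypothetical, and treating a message without a `version` key as version 1 is an assumption as well.

```python
from dataclasses import dataclass
from typing import Any, ClassVar, Dict


@dataclass(frozen=True)
class OriginV1:
    # Hypothetical first schema version of a model entity.
    version: ClassVar[int] = 1  # monotonic schema version, fixed per class
    url: str


@dataclass(frozen=True)
class OriginV2:
    # Hypothetical successor class, added when the schema changes.
    version: ClassVar[int] = 2
    url: str
    visit_type: str = "git"


# Old classes stay registered until no message of that version is left
# in the journal; after that they can be removed.
REGISTRY: Dict[int, type] = {cls.version: cls for cls in (OriginV1, OriginV2)}


def decode(message: Dict[str, Any]) -> Any:
    """Build the model object matching the version carried by the message.

    Assumption: messages without a "version" key predate versioning
    and are read as version 1.
    """
    fields = dict(message)
    version = fields.pop("version", 1)
    return REGISTRY[version](**fields)
```

A consumer would call `decode()` on each deserialized message and could then branch on `obj.version` when it needs to handle old and new schemas differently.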

Oct 8 2020, 12:46 PM · Journal
vlorentz renamed T1279: swh-journal: The schema migration problem from swh-journal: The migration problem to swh-journal: The schema migration problem.
Oct 8 2020, 12:43 PM · Journal
douardda added a comment to T1279: swh-journal: The schema migration problem.

Since this "migration problem" also concerns cassandra, maybe a simple approach would be to add a Final version attribute to all model entities (a simple monotonic integer).

Oct 8 2020, 12:41 PM · Journal
vlorentz added a revision to T2672: Inconsistent keys between visits and origin visits in the journal: D4194: model: use visit ids in the unique key, instead of their date..
Oct 8 2020, 11:26 AM · Journal
vlorentz triaged T2672: Inconsistent keys between visits and origin visits in the journal as High priority.
Oct 8 2020, 11:26 AM · Journal

Sep 22 2020

olasd closed T529: swh-journal: Create a journal checker comparing object lists between journal and database, a subtask of T424: swh-journal: persistent journal infrastructure to record additions to the swh-storage, as Wontfix.
Sep 22 2020, 6:30 PM · Journal
olasd closed T529: swh-journal: Create a journal checker comparing object lists between journal and database as Wontfix.

Thanks to @seirl using the full journal to do a graph export (and therefore having the time to check whether all objects were there), we've found a bunch of bugs in the journal backfiller / configuration preventing large objects from being added.

Sep 22 2020, 6:30 PM · Journal
olasd added a comment to T2543: Some messages in the (azure) kafka cluster are too large for rdkafka clients to be able to decompress them.

After resetting a local consumer to these offsets, I was completely unable to reproduce this issue.

Sep 22 2020, 6:27 PM · System administration, Journal
olasd added a comment to T1828: Improve directory journal backfill performance.

(the backfill had, in fact, completed within a month)

Sep 22 2020, 6:14 PM · Mirror, Journal
olasd closed T1828: Improve directory journal backfill performance as Resolved.

At this point, I don't think we'll make it much better with postgres as source.

Sep 22 2020, 6:14 PM · Mirror, Journal

Sep 14 2020

vlorentz edited projects for T2590: Finish the indexer -> swh-search pipeline, added: Journal; removed Storage manager.
Sep 14 2020, 5:39 PM · Journal, Archive search

Aug 27 2020

olasd triaged T2543: Some messages in the (azure) kafka cluster are too large for rdkafka clients to be able to decompress them as High priority.
Aug 27 2020, 12:27 PM · System administration, Journal

Aug 26 2020

vlorentz moved T2514: Add raw_extrinsic_metadata to the journal backfiller from Backlog to Done on the Roadmap 2020 board.
Aug 26 2020, 5:00 PM · Journal, Storage manager, Roadmap 2020

Jul 31 2020

vlorentz closed T2514: Add raw_extrinsic_metadata to the journal backfiller as Resolved.
Jul 31 2020, 2:25 PM · Journal, Storage manager, Roadmap 2020
vlorentz added a revision to T2514: Add raw_extrinsic_metadata to the journal backfiller: D3659: Add support for metadata-related object types to the backfiller and replayer..
Jul 31 2020, 2:25 PM · Journal, Storage manager, Roadmap 2020

Jul 30 2020

zack renamed T2003: Content replayer may try to copy objects before they are available from an objstorage from Content replayer may try to copy objects before they are available in an objstorage to Content replayer may try to copy objects before they are available from an objstorage.
Jul 30 2020, 8:18 AM · Journal

Jul 29 2020

vlorentz closed T2074: Publish extrinsic metadata to swh-journal/Kafka, a subtask of T2514: Add raw_extrinsic_metadata to the journal backfiller, as Resolved.
Jul 29 2020, 7:36 PM · Journal, Storage manager, Roadmap 2020
vlorentz closed T2074: Publish extrinsic metadata to swh-journal/Kafka as Resolved.
Jul 29 2020, 7:36 PM · Storage manager, Journal, Metadata workflow
vlorentz added a revision to T2074: Publish extrinsic metadata to swh-journal/Kafka: D3633: Write metadata + metadata authorities/fetchers to the journal..
Jul 29 2020, 7:36 PM · Storage manager, Journal, Metadata workflow
vlorentz added a parent task for T2074: Publish extrinsic metadata to swh-journal/Kafka: T2514: Add raw_extrinsic_metadata to the journal backfiller.
Jul 29 2020, 7:35 PM · Storage manager, Journal, Metadata workflow
vlorentz added a subtask for T2514: Add raw_extrinsic_metadata to the journal backfiller: T2074: Publish extrinsic metadata to swh-journal/Kafka.
Jul 29 2020, 7:35 PM · Journal, Storage manager, Roadmap 2020
vlorentz triaged T2514: Add raw_extrinsic_metadata to the journal backfiller as Normal priority.
Jul 29 2020, 7:35 PM · Journal, Storage manager, Roadmap 2020
vlorentz closed T2306: Generic storage for extrinsic, qualified metadata related to any node of the swh archive, a subtask of T2074: Publish extrinsic metadata to swh-journal/Kafka, as Resolved.
Jul 29 2020, 7:33 PM · Storage manager, Journal, Metadata workflow

Jul 20 2020

vlorentz claimed T2074: Publish extrinsic metadata to swh-journal/Kafka.
Jul 20 2020, 11:20 AM · Storage manager, Journal, Metadata workflow

Jul 1 2020

ardumont changed the status of T2306: Generic storage for extrinsic, qualified metadata related to any node of the swh archive, a subtask of T2074: Publish extrinsic metadata to swh-journal/Kafka, from Open to Work in Progress.
Jul 1 2020, 3:50 PM · Storage manager, Journal, Metadata workflow

Jun 29 2020

vlorentz added a subtask for T2074: Publish extrinsic metadata to swh-journal/Kafka: T2306: Generic storage for extrinsic, qualified metadata related to any node of the swh archive.
Jun 29 2020, 2:58 PM · Storage manager, Journal, Metadata workflow
vlorentz closed T2075: Implement metadata authority specification, a subtask of T2074: Publish extrinsic metadata to swh-journal/Kafka, as Resolved.
Jun 29 2020, 2:58 PM · Storage manager, Journal, Metadata workflow

Jun 17 2020

olasd closed T2143: Journal error in production as Resolved.

I suspect this hasn't happened recently

Jun 17 2020, 2:25 PM · Journal

Jun 9 2020

ardumont closed D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.
Jun 9 2020, 4:23 PM · Journal
olasd accepted D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.
Jun 9 2020, 4:21 PM · Journal
swh-public-ci added a comment to D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.

Build is green

Jun 9 2020, 4:15 PM · Journal
ardumont updated the diff for D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.
  • Rework commit message
  • Reuse same string format to reduce diff stat change
Jun 9 2020, 4:14 PM · Journal
swh-public-ci added a comment to D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.

Build is green

Jun 9 2020, 4:10 PM · Journal
ardumont updated the diff for D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.

Use pprint_key function as suggested in irc ;)

Jun 9 2020, 4:08 PM · Journal
swh-public-ci added a comment to D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.

Build is green

Jun 9 2020, 2:18 PM · Journal
ardumont added inline comments to D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.
Jun 9 2020, 2:18 PM · Journal