Dec 3 2020

vsellier added a revision to T2817: Enable the swh-search environment in staging: D4654: -wip- Switch to the official elasticsearch plugin.
Dec 3 2020, 12:21 PM · System administrators, Staging environment, Journal, Archive search

Dec 2 2020

ardumont added a revision to T2817: Enable the swh-search environment in staging: D4651: Puppetize elasticsearch nodes.
Dec 2 2020, 4:53 PM · System administrators, Staging environment, Journal, Archive search
ardumont claimed T2821: indexer: Improve tests.
Dec 2 2020, 11:22 AM · Journal, Indexer
ardumont added a revision to T2821: indexer: Improve tests: D4641: test_journal_client: Send production objects to journal for testing.
Dec 2 2020, 9:24 AM · Journal, Indexer

Dec 1 2020

ardumont added a revision to T2821: indexer: Improve tests: D4640: test_journal_client: Migrate away from mocks.
Dec 1 2020, 6:01 PM · Journal, Indexer
ardumont added a revision to T2821: indexer: Improve tests: D4638: tests: Use production backends within the indexer tests.
Dec 1 2020, 3:45 PM · Journal, Indexer
ardumont renamed T2821: indexer: Improve tests from indexer.journal.client: Improve tests to indexer: Improve tests.
Dec 1 2020, 3:44 PM · Journal, Indexer

Nov 30 2020

ardumont closed T2814: Fix swh indexer journal client service as Resolved.

Deployed the following:

Nov 30 2020, 3:08 PM · Journal, Indexer
ardumont updated the task description for T2814: Fix swh indexer journal client service.
Nov 30 2020, 3:07 PM · Journal, Indexer
ardumont updated the task description for T2814: Fix swh indexer journal client service.
Nov 30 2020, 3:07 PM · Journal, Indexer
ardumont added a comment to T2814: Fix swh indexer journal client service.

It would be nice if the tests for this journal client used an actual storage with a journal writer, rather than fully mocked topics. Doing this would have caught the original breakage.

Nov 30 2020, 10:47 AM · Journal, Indexer

Nov 27 2020

vsellier closed T2816: Enable the journal-writer for the swh-idx-storage in staging, a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.
Nov 27 2020, 6:20 PM · Journal, Archive search
vsellier closed T2816: Enable the journal-writer for the swh-idx-storage in staging as Resolved.

The swh-indexer stack is deployed on staging and the initial loading is done.
The volumes are quite low:

Nov 27 2020, 6:20 PM · System administrators, Staging environment, Journal, Archive search
vsellier added a revision to T2816: Enable the journal-writer for the swh-idx-storage in staging: D4625: staging: Fix object storage configuration for indexers.
Nov 27 2020, 3:20 PM · System administrators, Staging environment, Journal, Archive search
vlorentz triaged T2823: Write tests for swh/journal/writer/inmemory.py as Low priority.
Nov 27 2020, 1:50 PM · Easy hack, Journal
ardumont updated the task description for T2821: indexer: Improve tests.
Nov 27 2020, 1:21 PM · Journal, Indexer
ardumont triaged T2821: indexer: Improve tests as Normal priority.
Nov 27 2020, 1:19 PM · Journal, Indexer
olasd added a comment to T2814: Fix swh indexer journal client service.

It would be nice if the tests for this journal client used an actual storage with a journal writer, rather than fully mocked topics. Doing this would have caught the original breakage.

Nov 27 2020, 11:54 AM · Journal, Indexer
olasd added a comment to T2780: Enable the journal-writer for the swh-idx-storage in production.

I propose meeting in the middle and having the following policies:

Nov 27 2020, 11:49 AM · System administration, Journal
ardumont added a comment to T2814: Fix swh indexer journal client service.

and indexer 0.6.1 is now packaged. We have everything we need to unstick it now.

Nov 27 2020, 11:34 AM · Journal, Indexer
vlorentz added a comment to T2780: Enable the journal-writer for the swh-idx-storage in production.
In T2780#53415, @olasd wrote:

Is this supposed to be persistent (and keep the full history of all messages), or transient (and used for "real-time" clients)? IOW, what are the storage requirements for this?

Nov 27 2020, 11:01 AM · System administration, Journal
ardumont added a comment to T2780: Enable the journal-writer for the swh-idx-storage in production.

Where is the list of topics that need to be created?

Nov 27 2020, 10:50 AM · System administration, Journal
ardumont added a comment to T2780: Enable the journal-writer for the swh-idx-storage in production.

It's unclear what the prefix should be. swh.storage uses swh.journal.objects; we can either use that one too, or a new one, e.g. swh.journal.indexed.

I think we should definitely use a different prefix from swh.storage, as the ACLs for third parties should be separate.

Nov 27 2020, 10:48 AM · System administration, Journal
vsellier added a revision to T2816: Enable the journal-writer for the swh-idx-storage in staging: D4620: staging: configure idx-storage to write to kafka.
Nov 27 2020, 10:43 AM · System administrators, Staging environment, Journal, Archive search
vsellier added a comment to T2590: Finish the indexer -> swh-search pipeline.

This is a description of the pipeline to clarify the interaction between the components (source: P883):

Nov 27 2020, 10:14 AM · Journal, Archive search

Nov 26 2020

vsellier changed the status of T2817: Enable the swh-search environment in staging, a subtask of T2590: Finish the indexer -> swh-search pipeline, from Open to Work in Progress.
Nov 26 2020, 5:59 PM · Journal, Archive search
vsellier renamed T2817: Enable the swh-search environment in staging from Enable the swh-search in staging to Enable the swh-search environment in staging.
Nov 26 2020, 5:59 PM · System administrators, Staging environment, Journal, Archive search
vsellier triaged T2817: Enable the swh-search environment in staging as Normal priority.
Nov 26 2020, 5:58 PM · System administrators, Staging environment, Journal, Archive search
olasd added a comment to T2780: Enable the journal-writer for the swh-idx-storage in production.

Is this supposed to be persistent (and keep the full history of all messages), or transient (and used for "real-time" clients)? IOW, what are the storage requirements for this?

Nov 26 2020, 5:53 PM · System administration, Journal
vsellier added a comment to T2816: Enable the journal-writer for the swh-idx-storage in staging.

T2814 needs to be released before

Nov 26 2020, 5:46 PM · System administrators, Staging environment, Journal, Archive search
vsellier triaged T2816: Enable the journal-writer for the swh-idx-storage in staging as Normal priority.
Nov 26 2020, 5:40 PM · System administrators, Staging environment, Journal, Archive search
ardumont updated the task description for T2814: Fix swh indexer journal client service.
Nov 26 2020, 3:20 PM · Journal, Indexer
ardumont added a revision to T2814: Fix swh indexer journal client service: D4605: indexer.journal_client: Subscribe to OriginVisitStatus topic.
Nov 26 2020, 3:18 PM · Journal, Indexer
vsellier added a revision to T2814: Fix swh indexer journal client service: D4599: swh.indexer.cli.journal_client: fix config use.
Nov 26 2020, 12:22 PM · Journal, Indexer
ardumont triaged T2814: Fix swh indexer journal client service as Normal priority.
Nov 26 2020, 12:21 PM · Journal, Indexer

Nov 16 2020

vlorentz triaged T2780: Enable the journal-writer for the swh-idx-storage in production as Normal priority.
Nov 16 2020, 1:31 PM · System administration, Journal
vlorentz closed T2651: Make the indexer-storage publish its rows to Kafka, a subtask of T2590: Finish the indexer -> swh-search pipeline, as Resolved.
Nov 16 2020, 1:27 PM · Journal, Archive search

Oct 12 2020

vlorentz closed T2672: Inconsistent keys between visits and origin visits in the journal as Wontfix.

Closing in favor of T2686

Oct 12 2020, 1:05 PM · Journal

Oct 8 2020

vlorentz added a comment to T1279: swh-journal: The schema migration problem.

I like the Final version attribute idea. But to be clear, this means we will need to add successive versions as extra classes in swh.model.model when we change the schema (which is not necessarily a bad thing), and remove them when we are sure they are not around anymore.

Oct 8 2020, 12:46 PM · Journal
vlorentz renamed T1279: swh-journal: The schema migration problem from swh-journal: The migration problem to swh-journal: The schema migration problem.
Oct 8 2020, 12:43 PM · Journal
douardda added a comment to T1279: swh-journal: The schema migration problem.

Since this "migration problem" also concerns cassandra, maybe a simple approach would be to add a Final version attribute to all model entities (a simple monotonic integer).

Oct 8 2020, 12:41 PM · Journal
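
The version-attribute idea discussed in the T1279 comments above can be illustrated with a minimal sketch. Everything in it is hypothetical: the class names, the fields, the `VERSIONS` registry and the `from_message` helper are assumptions made for illustration and are not taken from swh.model.model. It only shows how a monotonic integer per class would let a consumer pick the right class for an old or new message.

```python
from dataclasses import dataclass
from typing import ClassVar, Dict


@dataclass(frozen=True)
class OriginVisitV1:
    """Hypothetical first schema: a visit identified by its date."""
    version: ClassVar[int] = 1  # class-level constant, not an instance field
    origin: str
    date: str


@dataclass(frozen=True)
class OriginVisitV2:
    """Hypothetical later schema: a visit id was added."""
    version: ClassVar[int] = 2
    origin: str
    visit: int
    date: str


# Registry of every schema version that may still be found in the journal.
# Old entries would be dropped once no such message is around anymore.
VERSIONS: Dict[int, type] = {
    cls.version: cls for cls in (OriginVisitV1, OriginVisitV2)
}


def from_message(message: dict):
    """Build the model object matching the version carried by the message."""
    fields = dict(message)            # do not mutate the caller's dict
    version = fields.pop("version")   # assumed to be serialized with the object
    return VERSIONS[version](**fields)
```

With this layout, changing the schema means adding a class and a registry entry, and retiring a version means deleting both.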
vlorentz added a revision to T2672: Inconsistent keys between visits and origin visits in the journal: D4194: model: use visit ids in the unique key, instead of their date..
Oct 8 2020, 11:26 AM · Journal
vlorentz triaged T2672: Inconsistent keys between visits and origin visits in the journal as High priority.
Oct 8 2020, 11:26 AM · Journal

Sep 22 2020

olasd closed T529: swh-journal: Create a journal checker comparing object lists between journal and database, a subtask of T424: swh-journal: persistent journal infrastructure to record additions to the swh-storage, as Wontfix.
Sep 22 2020, 6:30 PM · Journal
olasd closed T529: swh-journal: Create a journal checker comparing object lists between journal and database as Wontfix.

Thanks to @seirl using the full journal to do a graph export (and therefore having the time to check whether all objects were there), we've found a bunch of bugs in the journal backfiller / configuration preventing large objects from being added.

Sep 22 2020, 6:30 PM · Journal
olasd added a comment to T2543: Some messages in the (azure) kafka cluster are too large for rdkafka clients to be able to decompress them.

After resetting a local consumer to these offsets, I was completely unable to reproduce this issue.

Sep 22 2020, 6:27 PM · System administration, Journal
olasd added a comment to T1828: Improve directory journal backfill performance.

(the backfill had, in fact, completed within a month)

Sep 22 2020, 6:14 PM · Mirror, Journal
olasd closed T1828: Improve directory journal backfill performance as Resolved.

At this point, I don't think we'll make it much better with postgres as source.

Sep 22 2020, 6:14 PM · Mirror, Journal

Sep 14 2020

vlorentz edited projects for T2590: Finish the indexer -> swh-search pipeline, added: Journal; removed Storage manager.
Sep 14 2020, 5:39 PM · Journal, Archive search

Aug 27 2020

olasd triaged T2543: Some messages in the (azure) kafka cluster are too large for rdkafka clients to be able to decompress them as High priority.
Aug 27 2020, 12:27 PM · System administration, Journal

Aug 26 2020

vlorentz moved T2514: Add raw_extrinsic_metadata to the journal backfiller from Backlog to Done on the Roadmap 2020 board.
Aug 26 2020, 5:00 PM · Journal, Storage manager, Roadmap 2020

Jul 31 2020

vlorentz closed T2514: Add raw_extrinsic_metadata to the journal backfiller as Resolved.
Jul 31 2020, 2:25 PM · Journal, Storage manager, Roadmap 2020
vlorentz added a revision to T2514: Add raw_extrinsic_metadata to the journal backfiller: D3659: Add support for metadata-related object types to the backfiller and replayer..
Jul 31 2020, 2:25 PM · Journal, Storage manager, Roadmap 2020

Jul 30 2020

zack renamed T2003: Content replayer may try to copy objects before they are available from an objstorage from Content replayer may try to copy objects before they are available in an objstorage to Content replayer may try to copy objects before they are available from an objstorage.
Jul 30 2020, 8:18 AM · Journal

Jul 29 2020

vlorentz closed T2074: Publish extrinsic metadata to swh-journal/Kafka, a subtask of T2514: Add raw_extrinsic_metadata to the journal backfiller, as Resolved.
Jul 29 2020, 7:36 PM · Journal, Storage manager, Roadmap 2020
vlorentz closed T2074: Publish extrinsic metadata to swh-journal/Kafka as Resolved.
Jul 29 2020, 7:36 PM · Storage manager, Journal, Metadata workflow
vlorentz added a revision to T2074: Publish extrinsic metadata to swh-journal/Kafka: D3633: Write metadata + metadata authorities/fetchers to the journal..
Jul 29 2020, 7:36 PM · Storage manager, Journal, Metadata workflow
vlorentz added a parent task for T2074: Publish extrinsic metadata to swh-journal/Kafka: T2514: Add raw_extrinsic_metadata to the journal backfiller.
Jul 29 2020, 7:35 PM · Storage manager, Journal, Metadata workflow
vlorentz added a subtask for T2514: Add raw_extrinsic_metadata to the journal backfiller: T2074: Publish extrinsic metadata to swh-journal/Kafka.
Jul 29 2020, 7:35 PM · Journal, Storage manager, Roadmap 2020
vlorentz triaged T2514: Add raw_extrinsic_metadata to the journal backfiller as Normal priority.
Jul 29 2020, 7:35 PM · Journal, Storage manager, Roadmap 2020
vlorentz closed T2306: Generic storage for extrinsic, qualified metadata related to any node of the swh archive, a subtask of T2074: Publish extrinsic metadata to swh-journal/Kafka, as Resolved.
Jul 29 2020, 7:33 PM · Storage manager, Journal, Metadata workflow

Jul 20 2020

vlorentz claimed T2074: Publish extrinsic metadata to swh-journal/Kafka.
Jul 20 2020, 11:20 AM · Storage manager, Journal, Metadata workflow

Jul 1 2020

ardumont changed the status of T2306: Generic storage for extrinsic, qualified metadata related to any node of the swh archive, a subtask of T2074: Publish extrinsic metadata to swh-journal/Kafka, from Open to Work in Progress.
Jul 1 2020, 3:50 PM · Storage manager, Journal, Metadata workflow

Jun 29 2020

vlorentz added a subtask for T2074: Publish extrinsic metadata to swh-journal/Kafka: T2306: Generic storage for extrinsic, qualified metadata related to any node of the swh archive.
Jun 29 2020, 2:58 PM · Storage manager, Journal, Metadata workflow
vlorentz closed T2075: Implement metadata authority specification, a subtask of T2074: Publish extrinsic metadata to swh-journal/Kafka, as Resolved.
Jun 29 2020, 2:58 PM · Storage manager, Journal, Metadata workflow

Jun 17 2020

olasd closed T2143: Journal error in production as Resolved.

I suspect this hasn't happened recently

Jun 17 2020, 2:25 PM · Journal

Jun 9 2020

ardumont closed D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.
Jun 9 2020, 4:23 PM · Journal
olasd accepted D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.
Jun 9 2020, 4:21 PM · Journal
swh-public-ci added a comment to D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.

Build is green

Jun 9 2020, 4:15 PM · Journal
ardumont updated the diff for D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.
  • Rework commit message
  • Reuse same string format to reduce diff stat change
Jun 9 2020, 4:14 PM · Journal
swh-public-ci added a comment to D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.

Build is green

Jun 9 2020, 4:10 PM · Journal
ardumont updated the diff for D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.

Use pprint_key function as suggested in irc ;)

Jun 9 2020, 4:08 PM · Journal
swh-public-ci added a comment to D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.

Build is green

Jun 9 2020, 2:18 PM · Journal
ardumont added inline comments to D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.
Jun 9 2020, 2:18 PM · Journal
ardumont updated the diff for D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.

Fix according to review \m/

Jun 9 2020, 2:16 PM · Journal
ardumont added a comment to D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.

checking for dict instance seems completely arbitrary, and doesn't catch other non-hashable types.

I suggest this instead:

try:
    key_str = hash_to_hex(key)
except TypeError:
    key_str = repr(key)
Jun 9 2020, 2:14 PM · Journal
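
The fallback suggested in the review comment above can be tried in isolation with the sketch below. It is not the code that landed in D3249: `hash_to_hex` here is a local stand-in whose behaviour (strings returned unchanged, bytes hex-encoded) is an assumption about swh.model.hashutil, and the body of `pprint_key` is a guess at the helper named in the later diff update.

```python
import binascii


def hash_to_hex(value):
    """Local stand-in (assumed behaviour): hex-encode bytes, pass str through."""
    if isinstance(value, str):
        return value
    # binascii.hexlify raises TypeError for anything that is not bytes-like
    return binascii.hexlify(value).decode("ascii")


def pprint_key(key):
    """Return a printable form of a journal key, whatever its type."""
    try:
        return hash_to_hex(key)
    except TypeError:
        # dicts, tuples, ints, ... cannot be hex-encoded: fall back to repr()
        return repr(key)
```

Catching `TypeError` instead of testing `isinstance(key, dict)` covers every key type that cannot be hex-encoded, which is the point raised in the review.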
vlorentz added a comment to D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.

checking for dict instance seems completely arbitrary, and doesn't catch other non-hashable types.

Jun 9 2020, 1:47 PM · Journal
ardumont updated the summary of D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.
Jun 9 2020, 1:38 PM · Journal

May 13 2020

ardumont closed D3151: (fix ci) Fix search journal client tests.
May 13 2020, 4:08 PM · Journal, Archive search
swh-public-ci added a comment to D3151: (fix ci) Fix search journal client tests.

Build is green

May 13 2020, 4:07 PM · Journal, Archive search
douardda added a comment to D3151: (fix ci) Fix search journal client tests.

Remove global var and put that definition local to its use

May 13 2020, 4:06 PM · Journal, Archive search
ardumont updated the diff for D3151: (fix ci) Fix search journal client tests.

Remove global var and put that definition local to its use

May 13 2020, 4:06 PM · Journal, Archive search
ardumont added inline comments to D3151: (fix ci) Fix search journal client tests.
May 13 2020, 4:05 PM · Journal, Archive search
douardda accepted D3151: (fix ci) Fix search journal client tests.
May 13 2020, 4:02 PM · Journal, Archive search
vlorentz added inline comments to D3151: (fix ci) Fix search journal client tests.
May 13 2020, 3:55 PM · Journal, Archive search
vlorentz added inline comments to D3151: (fix ci) Fix search journal client tests.
May 13 2020, 3:54 PM · Journal, Archive search
ardumont added inline comments to D3151: (fix ci) Fix search journal client tests.
May 13 2020, 3:54 PM · Journal, Archive search
douardda added inline comments to D3151: (fix ci) Fix search journal client tests.
May 13 2020, 3:50 PM · Journal, Archive search
swh-public-ci added a comment to D3151: (fix ci) Fix search journal client tests.

Build is green

May 13 2020, 3:30 PM · Journal, Archive search
ardumont updated the diff for D3151: (fix ci) Fix search journal client tests.
  • Drop no longer required setUp method
  • Specify the deinitialize/initialize in comments
May 13 2020, 3:29 PM · Journal, Archive search
ardumont updated the summary of D3151: (fix ci) Fix search journal client tests.
May 13 2020, 3:08 PM · Journal, Archive search
ardumont updated the summary of D3151: (fix ci) Fix search journal client tests.
May 13 2020, 3:04 PM · Journal, Archive search
swh-public-ci added a comment to D3151: (fix ci) Fix search journal client tests.

Build is green

May 13 2020, 3:01 PM · Journal, Archive search
ardumont updated the test plan for D3151: (fix ci) Fix search journal client tests.
May 13 2020, 2:59 PM · Journal, Archive search
ardumont updated the diff for D3151: (fix ci) Fix search journal client tests.

Fix tests (in the end, reset issue)

May 13 2020, 2:59 PM · Journal, Archive search
Harbormaster failed remote builds in B12364: Diff 11182 for D3151: (fix ci) Fix search journal client tests!
May 13 2020, 2:50 PM · Journal, Archive search
swh-public-ci added a comment to D3151: (fix ci) Fix search journal client tests.

Build has FAILED

May 13 2020, 2:50 PM · Journal, Archive search
ardumont updated the diff for D3151: (fix ci) Fix search journal client tests.

Keep initial assertions order

May 13 2020, 2:49 PM · Journal, Archive search
Harbormaster failed remote builds in B12363: Diff 11181 for D3151: (fix ci) Fix search journal client tests!
May 13 2020, 2:44 PM · Journal, Archive search
swh-public-ci added a comment to D3151: (fix ci) Fix search journal client tests.

Build has FAILED

May 13 2020, 2:44 PM · Journal, Archive search