Software Heritage

Oct 8 2020

douardda added a comment to T1279: swh-journal: The schema migration problem.

Since this "migration problem" also concerns Cassandra, maybe a simple approach would be to add a Final version attribute to all model entities (a simple monotonic integer).

Oct 8 2020, 12:41 PM · Journal
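
A minimal sketch of the versioning idea above, assuming a hypothetical simplified Origin entity (not swh.model's actual class). Each entity carries a monotonic integer VERSION that is written into its serialized form, and readers upgrade older messages step by step before building the object. The sketch uses ClassVar so the version stays out of the dataclass fields; the "origin_url" to "url" change and the UPGRADES table are invented purely for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Any, Callable, ClassVar, Dict


@dataclass(frozen=True)
class Origin:
    url: str
    # Hypothetical schema version: a monotonic integer bumped on every
    # change to this entity's serialized form.
    VERSION: ClassVar[int] = 2

    def to_dict(self) -> Dict[str, Any]:
        d = asdict(self)  # ClassVar attributes are not dataclass fields
        d["version"] = self.VERSION
        return d


def _v1_to_v2(d: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical change: version 1 stored the URL under "origin_url".
    d = dict(d)
    d["url"] = d.pop("origin_url")
    return d


# Maps version N to the function upgrading an N-shaped dict to N+1.
UPGRADES: Dict[int, Callable[[Dict[str, Any]], Dict[str, Any]]] = {1: _v1_to_v2}


def origin_from_dict(d: Dict[str, Any]) -> Origin:
    d = dict(d)
    version = d.pop("version", 1)  # assumption: unversioned messages are v1
    while version < Origin.VERSION:
        d = UPGRADES[version](d)
        version += 1
    return Origin(**d)
```

Because upgrades are chained one version at a time, each schema change only needs a single N to N+1 function, whichever backend (Kafka or Cassandra) the old rows come from.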
vlorentz added a revision to T2672: Inconsistent keys between visits and origin visits in the journal: D4194: model: use visit ids in the unique key, instead of their date.
Oct 8 2020, 11:26 AM · Journal
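
A minimal sketch of what keying visits on their id (as D4194's title describes) could look like, using hypothetical simplified classes rather than swh.model's actual definitions. The visit key and each visit status key then share the (origin, visit) prefix, so a consumer can join a visit with its statuses without relying on dates matching.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Union

Key = Dict[str, Union[str, int]]


@dataclass(frozen=True)
class OriginVisit:
    origin: str
    visit: int
    date: datetime

    def unique_key(self) -> Key:
        # Keyed on the visit id instead of the visit date.
        return {"origin": self.origin, "visit": self.visit}


@dataclass(frozen=True)
class OriginVisitStatus:
    origin: str
    visit: int
    date: datetime
    status: str

    def unique_key(self) -> Key:
        # Same (origin, visit) prefix as the visit, plus the status date.
        return {"origin": self.origin, "visit": self.visit, "date": self.date.isoformat()}
```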
vlorentz triaged T2672: Inconsistent keys between visits and origin visits in the journal as High priority.
Oct 8 2020, 11:26 AM · Journal

Sep 22 2020

olasd closed T529: swh-journal: Create a journal checker comparing object lists between journal and database, a subtask of T424: swh-journal: persistent journal infrastructure to record additions to the swh-storage, as Wontfix.
Sep 22 2020, 6:30 PM · Journal
olasd closed T529: swh-journal: Create a journal checker comparing object lists between journal and database as Wontfix.

Thanks to @seirl's use of the full journal for a graph export (which gave us the opportunity to check whether all objects were there), we've found a bunch of bugs in the journal backfiller and configuration that prevented large objects from being added.

Sep 22 2020, 6:30 PM · Journal
olasd added a comment to T2543: Some messages in the (azure) kafka cluster are too large for rdkafka clients to be able to decompress them.

After resetting a local consumer to these offsets, I was completely unable to reproduce this issue.

Sep 22 2020, 6:27 PM · System administration, Journal
olasd added a comment to T1828: Improve directory journal backfill performance.

(the backfill had, in fact, completed within a month)

Sep 22 2020, 6:14 PM · Mirror, Journal
olasd closed T1828: Improve directory journal backfill performance as Resolved.

At this point, I don't think we'll make it much better with PostgreSQL as the source.

Sep 22 2020, 6:14 PM · Mirror, Journal

Sep 14 2020

vlorentz edited projects for T2590: Finish the indexer -> swh-search pipeline, added: Journal; removed Storage manager.
Sep 14 2020, 5:39 PM · Journal, Archive search

Aug 27 2020

olasd triaged T2543: Some messages in the (azure) kafka cluster are too large for rdkafka clients to be able to decompress them as High priority.
Aug 27 2020, 12:27 PM · System administration, Journal

Aug 26 2020

vlorentz moved T2514: Add raw_extrinsic_metadata to the journal backfiller from Backlog to Done on the Roadmap 2020 board.
Aug 26 2020, 5:00 PM · Journal, Storage manager, Roadmap 2020

Jul 31 2020

vlorentz closed T2514: Add raw_extrinsic_metadata to the journal backfiller as Resolved.
Jul 31 2020, 2:25 PM · Journal, Storage manager, Roadmap 2020
vlorentz added a revision to T2514: Add raw_extrinsic_metadata to the journal backfiller: D3659: Add support for metadata-related object types to the backfiller and replayer.
Jul 31 2020, 2:25 PM · Journal, Storage manager, Roadmap 2020

Jul 30 2020

zack renamed T2003: Content replayer may try to copy objects before they are available from an objstorage from Content replayer may try to copy objects before they are available in an objstorage to Content replayer may try to copy objects before they are available from an objstorage.
Jul 30 2020, 8:18 AM · Journal
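
One common mitigation for the race described in T2003 is to retry the copy with exponential backoff when the source objstorage does not have the object yet. The sketch below is hypothetical: ObjNotFound stands in for the objstorage's "not found" error, and src_get / dst_add are placeholder callables, not the replayer's actual API.

```python
import time
from typing import Callable


class ObjNotFound(Exception):
    """Stand-in for the objstorage's 'object not found' error (assumption)."""


def copy_with_retry(
    obj_id: str,
    src_get: Callable[[str], bytes],
    dst_add: Callable[[bytes, str], None],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Copy one object, waiting for it to appear in the source objstorage."""
    for attempt in range(max_attempts):
        try:
            content = src_get(obj_id)
        except ObjNotFound:
            if attempt == max_attempts - 1:
                raise  # give up: the object never showed up
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... by default
            continue
        dst_add(content, obj_id)
        return
```

Injecting sleep keeps the function testable; a real replayer would also want to log or record objects that exhausted their retries.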

Jul 29 2020

vlorentz closed T2074: Publish extrinsic metadata to swh-journal/Kafka, a subtask of T2514: Add raw_extrinsic_metadata to the journal backfiller, as Resolved.
Jul 29 2020, 7:36 PM · Journal, Storage manager, Roadmap 2020
vlorentz closed T2074: Publish extrinsic metadata to swh-journal/Kafka as Resolved.
Jul 29 2020, 7:36 PM · Storage manager, Journal, Metadata workflow
vlorentz added a revision to T2074: Publish extrinsic metadata to swh-journal/Kafka: D3633: Write metadata + metadata authorities/fetchers to the journal.
Jul 29 2020, 7:36 PM · Storage manager, Journal, Metadata workflow
vlorentz added a parent task for T2074: Publish extrinsic metadata to swh-journal/Kafka: T2514: Add raw_extrinsic_metadata to the journal backfiller.
Jul 29 2020, 7:35 PM · Storage manager, Journal, Metadata workflow
vlorentz added a subtask for T2514: Add raw_extrinsic_metadata to the journal backfiller: T2074: Publish extrinsic metadata to swh-journal/Kafka.
Jul 29 2020, 7:35 PM · Journal, Storage manager, Roadmap 2020
vlorentz triaged T2514: Add raw_extrinsic_metadata to the journal backfiller as Normal priority.
Jul 29 2020, 7:35 PM · Journal, Storage manager, Roadmap 2020
vlorentz closed T2306: Generic storage for extrinsic, qualified metadata related to any node of the swh archive, a subtask of T2074: Publish extrinsic metadata to swh-journal/Kafka, as Resolved.
Jul 29 2020, 7:33 PM · Storage manager, Journal, Metadata workflow

Jul 20 2020

vlorentz claimed T2074: Publish extrinsic metadata to swh-journal/Kafka.
Jul 20 2020, 11:20 AM · Storage manager, Journal, Metadata workflow

Jul 1 2020

ardumont changed the status of T2306: Generic storage for extrinsic, qualified metadata related to any node of the swh archive, a subtask of T2074: Publish extrinsic metadata to swh-journal/Kafka, from Open to Work in Progress.
Jul 1 2020, 3:50 PM · Storage manager, Journal, Metadata workflow

Jun 29 2020

vlorentz added a subtask for T2074: Publish extrinsic metadata to swh-journal/Kafka: T2306: Generic storage for extrinsic, qualified metadata related to any node of the swh archive.
Jun 29 2020, 2:58 PM · Storage manager, Journal, Metadata workflow
vlorentz closed T2075: Implement metadata authority specification, a subtask of T2074: Publish extrinsic metadata to swh-journal/Kafka, as Resolved.
Jun 29 2020, 2:58 PM · Storage manager, Journal, Metadata workflow

Jun 17 2020

olasd closed T2143: Journal error in production as Resolved.

I suspect this hasn't happened recently

Jun 17 2020, 2:25 PM · Journal

Jun 9 2020

ardumont closed D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.
Jun 9 2020, 4:23 PM · Journal
olasd accepted D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.
Jun 9 2020, 4:21 PM · Journal
swh-public-ci added a comment to D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.

Build is green

Jun 9 2020, 4:15 PM · Journal
ardumont updated the diff for D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.
  • Rework commit message
  • Reuse same string format to reduce diff stat change
Jun 9 2020, 4:14 PM · Journal
swh-public-ci added a comment to D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.

Build is green

Jun 9 2020, 4:10 PM · Journal
ardumont updated the diff for D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.

Use the pprint_key function, as suggested on IRC ;)

Jun 9 2020, 4:08 PM · Journal
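
The feed does not show pprint_key's body. A hypothetical sketch of such a helper, in the spirit of the review suggestion below: render bytes keys as hex, dict keys (e.g. composite visit keys) field by field, and fall back to repr for anything else, so an assertion message never fails on an unhashable or non-bytes key. The bytes.hex() call stands in for swh.model's hash_to_hex.

```python
from typing import Any


def pprint_key(key: Any) -> str:
    """Hypothetical: render a journal message key for assertion messages."""
    if isinstance(key, dict):
        # Composite keys: render each field, sorted for stable output.
        fields = ", ".join(f"{k}: {pprint_key(v)}" for k, v in sorted(key.items()))
        return "{" + fields + "}"
    if isinstance(key, bytes):
        return key.hex()  # stand-in for hash_to_hex on binary ids
    return repr(key)
```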
swh-public-ci added a comment to D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.

Build is green

Jun 9 2020, 2:18 PM · Journal
ardumont added inline comments to D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.
Jun 9 2020, 2:18 PM · Journal
ardumont updated the diff for D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.

Fix according to review \m/

Jun 9 2020, 2:16 PM · Journal
ardumont added a comment to D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.

checking for dict instance seems completely arbitrary, and doesn't catch other non-hashable types.

I suggest this instead:

try:
    key_str = hash_to_hex(key)
except TypeError:
    key_str = repr(key)
Jun 9 2020, 2:14 PM · Journal
vlorentz added a comment to D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.

checking for dict instance seems completely arbitrary, and doesn't catch other non-hashable types.

Jun 9 2020, 1:47 PM · Journal
ardumont updated the summary of D3249: pytest_plugin: Fix cascading assertion error when key is not hashable.
Jun 9 2020, 1:38 PM · Journal

May 13 2020

ardumont closed D3151: (fix ci) Fix search journal client tests.
May 13 2020, 4:08 PM · Journal, Archive search
swh-public-ci added a comment to D3151: (fix ci) Fix search journal client tests.

Build is green

May 13 2020, 4:07 PM · Journal, Archive search
douardda added a comment to D3151: (fix ci) Fix search journal client tests.

Remove global var and put that definition local to its use

May 13 2020, 4:06 PM · Journal, Archive search
ardumont updated the diff for D3151: (fix ci) Fix search journal client tests.

Remove global var and put that definition local to its use

May 13 2020, 4:06 PM · Journal, Archive search
ardumont added inline comments to D3151: (fix ci) Fix search journal client tests.
May 13 2020, 4:05 PM · Journal, Archive search
douardda accepted D3151: (fix ci) Fix search journal client tests.
May 13 2020, 4:02 PM · Journal, Archive search
vlorentz added inline comments to D3151: (fix ci) Fix search journal client tests.
May 13 2020, 3:55 PM · Journal, Archive search
vlorentz added inline comments to D3151: (fix ci) Fix search journal client tests.
May 13 2020, 3:54 PM · Journal, Archive search
ardumont added inline comments to D3151: (fix ci) Fix search journal client tests.
May 13 2020, 3:54 PM · Journal, Archive search
douardda added inline comments to D3151: (fix ci) Fix search journal client tests.
May 13 2020, 3:50 PM · Journal, Archive search
swh-public-ci added a comment to D3151: (fix ci) Fix search journal client tests.

Build is green

May 13 2020, 3:30 PM · Journal, Archive search
ardumont updated the diff for D3151: (fix ci) Fix search journal client tests.
  • Drop no longer required setUp method
  • Specify the deinitialize/initialize in comments
May 13 2020, 3:29 PM · Journal, Archive search
ardumont updated the summary of D3151: (fix ci) Fix search journal client tests.
May 13 2020, 3:08 PM · Journal, Archive search
ardumont updated the summary of D3151: (fix ci) Fix search journal client tests.
May 13 2020, 3:04 PM · Journal, Archive search
swh-public-ci added a comment to D3151: (fix ci) Fix search journal client tests.

Build is green

May 13 2020, 3:01 PM · Journal, Archive search
ardumont updated the test plan for D3151: (fix ci) Fix search journal client tests.
May 13 2020, 2:59 PM · Journal, Archive search
ardumont updated the diff for D3151: (fix ci) Fix search journal client tests.

Fix tests (in the end, reset issue)

May 13 2020, 2:59 PM · Journal, Archive search
Harbormaster failed remote builds in B12364: Diff 11182 for D3151: (fix ci) Fix search journal client tests!
May 13 2020, 2:50 PM · Journal, Archive search
swh-public-ci added a comment to D3151: (fix ci) Fix search journal client tests.

Build has FAILED

May 13 2020, 2:50 PM · Journal, Archive search
ardumont updated the diff for D3151: (fix ci) Fix search journal client tests.

Keep initial assertions order

May 13 2020, 2:49 PM · Journal, Archive search
Harbormaster failed remote builds in B12363: Diff 11181 for D3151: (fix ci) Fix search journal client tests!
May 13 2020, 2:44 PM · Journal, Archive search
swh-public-ci added a comment to D3151: (fix ci) Fix search journal client tests.

Build has FAILED

May 13 2020, 2:44 PM · Journal, Archive search
ardumont added projects to D3151: (fix ci) Fix search journal client tests: Archive search, Journal.
May 13 2020, 2:43 PM · Journal, Archive search

May 5 2020

ardumont closed D3122: (fix ci) search.cli: Fix journal client instantiation and add config checks.
May 5 2020, 2:33 PM · Journal, Archive search
vlorentz accepted D3122: (fix ci) search.cli: Fix journal client instantiation and add config checks.
May 5 2020, 2:31 PM · Journal, Archive search
swh-public-ci added a comment to D3122: (fix ci) search.cli: Fix journal client instantiation and add config checks.

Build is green

May 5 2020, 2:27 PM · Journal, Archive search
ardumont updated the diff for D3122: (fix ci) search.cli: Fix journal client instantiation and add config checks.

Adapt according to review

May 5 2020, 2:26 PM · Journal, Archive search
vlorentz added inline comments to D3122: (fix ci) search.cli: Fix journal client instantiation and add config checks.
May 5 2020, 2:13 PM · Journal, Archive search
douardda added a comment to D3122: (fix ci) search.cli: Fix journal client instantiation and add config checks.

Why was get_journal_client removed from swh.journal.cli?

All the code you're adding in swh/search/cli.py should be in swh-journal, so that it can be used by other CLIs that use a journal client.

May 5 2020, 2:00 PM · Journal, Archive search
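
A minimal sketch of the kind of shared config check suggested above, which would live in swh-journal and be reused by any CLI before instantiating a journal client. The required keys, the accepted object types and the error class are hypothetical illustrations, not swh.journal's actual configuration schema.

```python
from typing import Any, Dict, Iterable

# Hypothetical: keys every journal client config must provide.
REQUIRED_KEYS = ("brokers", "group_id")
# Hypothetical: object types a client may subscribe to.
ALLOWED_OBJECT_TYPES = {
    "content", "skipped_content", "directory", "revision",
    "release", "snapshot", "origin", "origin_visit",
}


class JournalConfigError(ValueError):
    """Raised when a journal client configuration is incomplete or invalid."""


def check_journal_config(
    config: Dict[str, Any], required: Iterable[str] = REQUIRED_KEYS
) -> Dict[str, Any]:
    """Validate and normalize a journal client config (sketch)."""
    missing = [k for k in required if not config.get(k)]
    if missing:
        raise JournalConfigError(f"Missing journal config keys: {', '.join(missing)}")
    brokers = config["brokers"]
    if isinstance(brokers, str):
        brokers = [brokers]  # accept a single broker as a plain string
    unknown = set(config.get("object_types", [])) - ALLOWED_OBJECT_TYPES
    if unknown:
        raise JournalConfigError(f"Unknown object types: {sorted(unknown)}")
    return {**config, "brokers": list(brokers)}
```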
vlorentz requested changes to D3122: (fix ci) search.cli: Fix journal client instantiation and add config checks.

Why was get_journal_client removed from swh.journal.cli?

May 5 2020, 1:14 PM · Journal, Archive search
ardumont updated the summary of D3122: (fix ci) search.cli: Fix journal client instantiation and add config checks.
May 5 2020, 10:34 AM · Journal, Archive search
ardumont retitled D3122: (fix ci) search.cli: Fix journal client instantiation and add config checks from search.cli: Fix journal client instantiation and add config checks to (fix ci) search.cli: Fix journal client instantiation and add config checks.
May 5 2020, 10:28 AM · Journal, Archive search
ardumont updated the summary of D3122: (fix ci) search.cli: Fix journal client instantiation and add config checks.
May 5 2020, 10:27 AM · Journal, Archive search

May 4 2020

douardda added a revision to T2355: Make swh-journal independent from swh-storage or swh-objstorage: D3119: Add skipped_content to the list of accepted objects.
May 4 2020, 5:37 PM · Object storage, Storage manager, Journal

Apr 30 2020

douardda closed T2355: Make swh-journal independent from swh-storage or swh-objstorage as Resolved.

Let's consider this done now.

Apr 30 2020, 4:09 PM · Object storage, Storage manager, Journal

Apr 29 2020

douardda added a revision to T2355: Make swh-journal independent from swh-storage or swh-objstorage: D3087: Remove the content replayer code.
Apr 29 2020, 1:47 PM · Object storage, Storage manager, Journal

Apr 28 2020

olasd closed T2350: Support large messages in swh.journal / kafka, a subtask of T2348: swh.journal silently loses large objects instead of rejecting them, as Resolved.
Apr 28 2020, 11:28 AM · Mirror, Journal
olasd closed T2350: Support large messages in swh.journal / kafka as Resolved.

We've bumped the max message size to 100 MB in all producers.

Apr 28 2020, 11:28 AM · Journal
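
A minimal sketch of a producer configuration with a raised size limit, assuming "100 MB" means 100 MiB and a librdkafka-based client such as confluent_kafka. message.max.bytes, bootstrap.servers and client.id are real librdkafka properties; the helper names are hypothetical. Note that the broker-side message.max.bytes and the topic-level max.message.bytes must also allow messages this large, otherwise the broker still rejects them. The explicit size check reflects the goal of T2348: fail loudly instead of silently losing large objects.

```python
from typing import Any, Dict, List

MAX_MESSAGE_SIZE = 100 * 1024 * 1024  # assumption: "100 MB" read as 100 MiB


def producer_config(brokers: List[str], client_id: str) -> Dict[str, Any]:
    """Hypothetical helper building a librdkafka producer config."""
    return {
        "bootstrap.servers": ",".join(brokers),
        "client.id": client_id,
        # librdkafka: maximum Kafka protocol request message size.
        "message.max.bytes": MAX_MESSAGE_SIZE,
    }


def check_message_size(value: bytes, limit: int = MAX_MESSAGE_SIZE) -> None:
    """Reject oversized payloads explicitly instead of dropping them."""
    if len(value) > limit:
        raise ValueError(f"message of {len(value)} bytes exceeds limit of {limit} bytes")
```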
olasd closed T2350: Support large messages in swh.journal / kafka, a subtask of T2351: Consider backfilling mistakenly rejected large objects from PostgreSQL, as Resolved.
Apr 28 2020, 11:28 AM · Journal
olasd closed T2348: swh.journal silently loses large objects instead of rejecting them as Resolved.

The Kafka producer in swh.journal now reads message receipts and fails if they're negative, or if they don't arrive within two minutes.

Apr 28 2020, 11:27 AM · Mirror, Journal
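
A minimal sketch of the delivery-receipt logic described above, independent of any Kafka client so it can be tested on its own. DeliveryTracker and KafkaDeliveryError are hypothetical names, not swh.journal's implementation. For simplicity it assumes message keys are unique among in-flight messages.

```python
import time
from typing import Dict, List, Optional, Tuple


class KafkaDeliveryError(Exception):
    """Raised when messages were rejected or not acknowledged in time."""


class DeliveryTracker:
    """Hypothetical sketch: track produced messages until their receipt arrives."""

    def __init__(self, timeout: float = 120.0) -> None:
        self.timeout = timeout  # two minutes, as in the comment above
        self.pending: Dict[bytes, float] = {}  # message key -> time it was sent
        self.errors: List[Tuple[bytes, str]] = []

    def sent(self, key: bytes) -> None:
        self.pending[key] = time.monotonic()

    def on_delivery(self, err: Optional[object], key: bytes) -> None:
        # One receipt per message: err is None on success, an error otherwise.
        self.pending.pop(key, None)
        if err is not None:
            self.errors.append((key, str(err)))

    def check(self, now: Optional[float] = None) -> None:
        """Fail on negative receipts or on receipts overdue by more than timeout."""
        now = time.monotonic() if now is None else now
        if self.errors:
            raise KafkaDeliveryError(f"{len(self.errors)} message(s) rejected: {self.errors}")
        overdue = [k for k, t in self.pending.items() if now - t > self.timeout]
        if overdue:
            raise KafkaDeliveryError(
                f"{len(overdue)} message(s) not acknowledged within {self.timeout}s"
            )
```

With confluent_kafka, one would pass on_delivery=lambda err, msg: tracker.on_delivery(err, msg.key()) to Producer.produce(), then call producer.poll() or producer.flush() so the callbacks are served, and finally tracker.check().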
olasd closed T2351: Consider backfilling mistakenly rejected large objects from PostgreSQL as Resolved.

Snapshots, releases, revisions and directories have now been completely backfilled, and no objects of these types are (known to be) missing from the Kafka cluster on Azure.

Apr 28 2020, 11:24 AM · Journal
olasd closed T2351: Consider backfilling mistakenly rejected large objects from PostgreSQL, a subtask of T2348: swh.journal silently loses large objects instead of rejecting them, as Resolved.
Apr 28 2020, 11:24 AM · Mirror, Journal

Apr 24 2020

douardda added a revision to T2355: Make swh-journal independent from swh-storage or swh-objstorage: D3062: Move the content of swh/objstorage/__init__.py in swh/objstorage/factory.py.
Apr 24 2020, 3:54 PM · Object storage, Storage manager, Journal
douardda added a revision to T2355: Make swh-journal independent from swh-storage or swh-objstorage: D3056: Deprecate the `config-path` argument of the `swh storage rpc-serve` command.
Apr 24 2020, 11:29 AM · Object storage, Storage manager, Journal

Apr 23 2020

douardda added a revision to T2355: Make swh-journal independent from swh-storage or swh-objstorage: D3058: Adapt journal client loading to swh.journal 0.0.31.
Apr 23 2020, 4:58 PM · Object storage, Storage manager, Journal

Apr 22 2020

douardda added a revision to T2355: Make swh-journal independent from swh-storage or swh-objstorage: D3044: Move get_journal_client function to swh.journal.client.
Apr 22 2020, 4:50 PM · Object storage, Storage manager, Journal
ardumont renamed T2355: Make swh-journal independent from swh-storage or swh-objstorage from Make swh-journal independant from swh-storage or swh-objstorage to Make swh-journal independent from swh-storage or swh-objstorage.
Apr 22 2020, 3:50 PM · Object storage, Storage manager, Journal
douardda renamed T2355: Make swh-journal independent from swh-storage or swh-objstorage from Merge parts of swh-journal in swh-storage to Make swh-journal independant from swh-storage or swh-objstorage.
Apr 22 2020, 3:41 PM · Object storage, Storage manager, Journal
douardda added a revision to T2355: Make swh-journal independent from swh-storage or swh-objstorage: D3043: Extract kafka-related pytest fixtures in a pytest plugin module.
Apr 22 2020, 3:38 PM · Object storage, Storage manager, Journal

Apr 17 2020

olasd added a comment to T2351: Consider backfilling mistakenly rejected large objects from PostgreSQL.

Backfilled objects:

  • snapshot
  • release
Apr 17 2020, 3:59 PM · Journal

Apr 15 2020

olasd changed the status of T2351: Consider backfilling mistakenly rejected large objects from PostgreSQL, a subtask of T2348: swh.journal silently loses large objects instead of rejecting them, from Open to Work in Progress.
Apr 15 2020, 10:27 AM · Mirror, Journal
olasd changed the status of T2351: Consider backfilling mistakenly rejected large objects from PostgreSQL from Open to Work in Progress.

I've pulled the list of objects from Kafka using @seirl's graph export. I'm now looking into computing the diff between PostgreSQL and that list of objects.

Apr 15 2020, 10:27 AM · Journal
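
The feed does not say how the diff was computed. One hypothetical approach for lists too large to hold in a set: sort both id lists the same way (e.g. hex ids exported from PostgreSQL and from the graph export), then walk them in a single merge pass with constant memory, yielding ids that are in PostgreSQL but missing from Kafka.

```python
from typing import Iterable, Iterator


def missing_ids(expected: Iterable[str], present: Iterable[str]) -> Iterator[str]:
    """Yield ids from `expected` that are absent from `present`.

    Both inputs must be sorted in the same order (e.g. lowercase hex ids).
    """
    present_it = iter(present)
    current = next(present_it, None)
    for obj_id in expected:
        # Advance the "present" stream until it catches up with obj_id.
        while current is not None and current < obj_id:
            current = next(present_it, None)
        if current != obj_id:
            yield obj_id
```

The missing ids could then be fed back to the backfiller for the affected object types.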
olasd closed T2349: Make the journal writer reliable, a subtask of T2348: swh.journal silently loses large objects instead of rejecting them, as Resolved.
Apr 15 2020, 10:15 AM · Mirror, Journal
olasd closed T2349: Make the journal writer reliable as Resolved.

rDJNL7ff372a02de4 has now been deployed to production

Apr 15 2020, 10:15 AM · Journal

Apr 14 2020

douardda added a revision to T2355: Make swh-journal independent from swh-storage or swh-objstorage: D3010: Copy the graph replayer component from swh-journal.
Apr 14 2020, 11:14 AM · Object storage, Storage manager, Journal
douardda added a revision to T2355: Make swh-journal independent from swh-storage or swh-objstorage: D3008: Copy the backfiller component from swh-journal.
Apr 14 2020, 11:14 AM · Object storage, Storage manager, Journal
olasd added a revision to T2349: Make the journal writer reliable: D2994: Add delivery notification handling to swh.journal.writer.kafka.
Apr 14 2020, 11:06 AM · Journal
olasd removed a revision from T2349: Make the journal writer reliable: D2994: Add delivery notification handling to swh.journal.writer.kafka.
Apr 14 2020, 11:00 AM · Journal
olasd added a revision to T2349: Make the journal writer reliable: D2994: Add delivery notification handling to swh.journal.writer.kafka.
Apr 14 2020, 10:45 AM · Journal

Apr 9 2020

ardumont updated the task description for T2355: Make swh-journal independent from swh-storage or swh-objstorage.
Apr 9 2020, 4:35 PM · Object storage, Storage manager, Journal

Apr 6 2020

olasd added a subtask for T2351: Consider backfilling mistakenly rejected large objects from PostgreSQL: T2350: Support large messages in swh.journal / kafka.
Apr 6 2020, 10:35 PM · Journal
olasd added a parent task for T2350: Support large messages in swh.journal / kafka: T2351: Consider backfilling mistakenly rejected large objects from PostgreSQL.
Apr 6 2020, 10:35 PM · Journal