Nov 30 2020

ardumont updated the task description for T2814: Fix swh indexer journal client service.
Nov 30 2020, 3:07 PM · Journal, Indexer
ardumont added a comment to T2814: Fix swh indexer journal client service.

It would be nice if the tests for this journal client used an actual storage with a journal writer, rather than fully mocked topics. Doing this would have caught the original breakage.

Nov 30 2020, 10:47 AM · Journal, Indexer

Nov 27 2020

ardumont updated the task description for T2821: indexer: Improve tests.
Nov 27 2020, 1:21 PM · Journal, Indexer
ardumont triaged T2821: indexer: Improve tests as Normal priority.
Nov 27 2020, 1:19 PM · Journal, Indexer
olasd added a comment to T2814: Fix swh indexer journal client service.

It would be nice if the tests for this journal client used an actual storage with a journal writer, rather than fully mocked topics. Doing this would have caught the original breakage.

Nov 27 2020, 11:54 AM · Journal, Indexer
ardumont added a comment to T2814: Fix swh indexer journal client service.

and indexer 0.6.1 is now packaged. We have everything we need to unstick it now.

Nov 27 2020, 11:34 AM · Journal, Indexer

Nov 26 2020

ardumont updated the task description for T2814: Fix swh indexer journal client service.
Nov 26 2020, 3:20 PM · Journal, Indexer
ardumont added a revision to T2814: Fix swh indexer journal client service: D4605: indexer.journal_client: Subscribe to OriginVisitStatus topic.
Nov 26 2020, 3:18 PM · Journal, Indexer
vsellier added a revision to T2814: Fix swh indexer journal client service: D4599: swh.indexer.cli.journal_client: fix config use.
Nov 26 2020, 12:22 PM · Journal, Indexer
ardumont triaged T2814: Fix swh indexer journal client service as Normal priority.
Nov 26 2020, 12:21 PM · Journal, Indexer

Nov 16 2020

vlorentz closed T2651: Make the indexer-storage publish its rows to Kafka as Resolved.
Nov 16 2020, 1:27 PM · Indexer

Nov 6 2020

vlorentz added revisions to T2651: Make the indexer-storage publish its rows to Kafka: D4420: Set the value_sanitizer argument of get_journal_writer., D4083: Make the indexer storage write to the journal., D1959: Publish origin_intrinsic_metadata to Kafka..
Nov 6 2020, 12:42 PM · Indexer

Nov 3 2020

ardumont moved T871: Migrate swh-storage api functions relative to indexers to swh-indexer from Backlog to Archived on the SWORD deposit board.
Nov 3 2020, 4:07 PM · SWORD deposit, Core Loader, Web app, Development environment, Storage manager, Indexer
ardumont moved T872: Deploy and restart indexers from Backlog to Archived on the SWORD deposit board.
Nov 3 2020, 4:07 PM · SWORD deposit, Core Loader, Storage manager, Web app, Puppet recipes, Indexer

Nov 2 2020

vlorentz added a revision to T2651: Make the indexer-storage publish its rows to Kafka: D4392: Remove metadata deletion endpoints and algorithms.
Nov 2 2020, 1:50 PM · Indexer
vlorentz renamed T2651: Make the indexer-storage publish its rows to Kafka from Make the index-storage publish its rows to Kafka to Make the indexer-storage publish its rows to Kafka.
Nov 2 2020, 1:35 PM · Indexer

Oct 26 2020

vlorentz added revisions to T2651: Make the indexer-storage publish its rows to Kafka: D4080: Make value_sanitizer an argument of JournalWriter., D4082: Make the type of values of JournalWriter generic, so it works with types not from swh-model..
Oct 26 2020, 9:31 AM · Indexer

Oct 10 2020

ardumont added a comment to T2652: Make the indexer-storage interface use attr classes instead of dicts.

\o/

Oct 10 2020, 8:13 AM · Indexer

Oct 9 2020

vlorentz closed T2652: Make the indexer-storage interface use attr classes instead of dicts as Resolved.
Oct 9 2020, 8:41 PM · Indexer
vlorentz closed T2652: Make the indexer-storage interface use attr classes instead of dicts, a subtask of T2651: Make the indexer-storage publish its rows to Kafka, as Resolved.
Oct 9 2020, 8:41 PM · Indexer
vlorentz closed T2652: Make the indexer-storage interface use attr classes instead of dicts, a subtask of T1433: Refactor output of indexer storage's `get` methods., as Resolved.
Oct 9 2020, 8:41 PM · Easy hack, Indexer
vlorentz closed T1433: Refactor output of indexer storage's `get` methods. as Resolved.
Oct 9 2020, 8:41 PM · Easy hack, Indexer

Oct 7 2020

vlorentz added revisions to T2652: Make the indexer-storage interface use attr classes instead of dicts:
  • D4190: Update to work with swh.storage.indexer 0.3.
  • D4189: Properly define the type of index() methods of indexers.
  • D4188: tests: Clean up compatibility code with endpoints that return dicts.
  • D4187: use OriginIntrinsicMetadataRow in the storage interface instead of dicts.
  • D4184: use RevisionIntrinsicMetadataRow in the storage interface instead of dicts.
  • D4180: use ContentMetadataRow in the storage interface instead of dicts.
  • D4177: use ContentCtagsRow in the storage interface instead of dicts.
  • D4176: use ContentLanguageRow in the storage interface instead of dicts.
  • D4174: indexer.storage: Change return types from Iterable to List
  • D4173: license: use ContentLicenseRow in the storage interface instead of dicts.
  • D4172: base indexers: add type annotation for self.{storage,idx_storage}.
  • D4171: indexer.storage: Update docstrings of mimetype-related endpoints.
  • D4170: indexer.storage: Change return type annotation from Iterator to Iterable.
  • D4169: tests: Enable type-checking on storage test functions.
  • D4159: Make base indexers generic, with the result of index() as their type parameter.
  • D4158: mimetype: use ContentMimetypeRow in the storage interface instead of dicts.
Oct 7 2020, 5:58 PM · Indexer

Oct 6 2020

vlorentz added a parent task for T2652: Make the indexer-storage interface use attr classes instead of dicts: T1433: Refactor output of indexer storage's `get` methods..
Oct 6 2020, 10:36 AM · Indexer
vlorentz added a subtask for T1433: Refactor output of indexer storage's `get` methods.: T2652: Make the indexer-storage interface use attr classes instead of dicts.
Oct 6 2020, 10:36 AM · Easy hack, Indexer
vlorentz changed the status of T2652: Make the indexer-storage interface use attr classes instead of dicts from Open to Work in Progress.
Oct 6 2020, 10:33 AM · Indexer
vlorentz changed the status of T2652: Make the indexer-storage interface use attr classes instead of dicts, a subtask of T2651: Make the indexer-storage publish its rows to Kafka, from Open to Work in Progress.
Oct 6 2020, 10:33 AM · Indexer

Oct 1 2020

vlorentz changed the status of T1433: Refactor output of indexer storage's `get` methods. from Open to Work in Progress.
Oct 1 2020, 12:40 PM · Easy hack, Indexer
vlorentz triaged T2652: Make the indexer-storage interface use attr classes instead of dicts as Normal priority.
Oct 1 2020, 12:09 PM · Indexer
vlorentz triaged T2651: Make the indexer-storage publish its rows to Kafka as Normal priority.
Oct 1 2020, 12:08 PM · Indexer

Sep 22 2020

moranegg placed T831: review all json schemas in storage for metadata objects (content_metadata, revision_metadata and origin_metadata) up for grabs.
Sep 22 2020, 4:54 PM · Metadata workflow, Indexer

Sep 18 2020

moranegg moved T1485: Show stats on extracted metadata from Backlog to Implementation on the Metadata workflow board.
Sep 18 2020, 2:46 PM · Web app, Metadata workflow, Indexer
moranegg moved T1681: Use project metadata as a "lister" from Backlog to Implementation on the Metadata workflow board.
Sep 18 2020, 2:19 PM · Archive coverage, Indexer, Metadata workflow
moranegg moved T1585: Add support for extracting metadata from Python classifiers from Backlog to Implementation on the Metadata workflow board.
Sep 18 2020, 1:50 PM · Indexer, Metadata workflow
moranegg moved T2270: Add to intrinsic metadata files to be indexed: AUTHORS and CONTRIBUTORS from Backlog to Implementation on the Metadata workflow board.
Sep 18 2020, 1:48 PM · Intrinsic metadata, Indexer
moranegg added a project to T2472: Indexing intrinsic metadata in a deposit using a sub-folder for the content: Indexer.
Sep 18 2020, 1:45 PM · Intrinsic metadata, Indexer, SWORD deposit

Sep 16 2020

ardumont closed T1386: Refactor indexers' initialization step as Wontfix.
Sep 16 2020, 5:56 PM · Indexer, Scheduling utilities
olasd added a parent task for T2603: Configuration mismatch between swh.indexer.journal_client and the configuration declared in puppet: T2597: Switch indexer journal client to new kafka cluster.
Sep 16 2020, 2:07 PM · Puppet recipes, Indexer
olasd triaged T2603: Configuration mismatch between swh.indexer.journal_client and the configuration declared in puppet as Normal priority.
Sep 16 2020, 2:06 PM · Puppet recipes, Indexer

Sep 10 2020

pythoncourse added a comment to T1462: mimetype indexer: fails with TypeError: 'NoneType' object is not subscriptable.
Sep 10 2020, 11:28 AM · Indexer

Feb 25 2020

ardumont closed T1788: indexer-license: Investigate timeouts as Resolved by committing rDCIDXfc7a19e80874: storage.db: Improve content range queries to actually finish.
Feb 25 2020, 11:28 AM · Indexer
ardumont closed D2709: idx.storage.db: Improve content range queries to actually finish.
Feb 25 2020, 11:28 AM · Indexer
vlorentz accepted D2709: idx.storage.db: Improve content range queries to actually finish.
Feb 25 2020, 11:07 AM · Indexer

Feb 24 2020

ardumont updated the test plan for D2709: idx.storage.db: Improve content range queries to actually finish.
Feb 24 2020, 2:23 PM · Indexer
ardumont added a revision to T1788: indexer-license: Investigate timeouts: D2709: idx.storage.db: Improve content range queries to actually finish.
Feb 24 2020, 2:14 PM · Indexer

Feb 13 2020

moranegg added a comment to T2270: Add to intrinsic metadata files to be indexed: AUTHORS and CONTRIBUTORS.

example: https://archive.softwareheritage.org/swh:1:cnt:a6463d2ce390990c31c2f2fa8019606721f0ca13;origin=http://git.savannah.gnu.org/git/gnugo.git/

Feb 13 2020, 2:46 PM · Intrinsic metadata, Indexer

Feb 11 2020

krithikvaidya closed T2258: Add type annotations to indexer classes, a subtask of T2257: Fully annotate swh-indexer with types, as Resolved.
Feb 11 2020, 4:35 AM · Indexer
krithikvaidya closed T2258: Add type annotations to indexer classes as Resolved by committing rDCIDX5f49b59e6aa3: Add type annotations to indexer classes.
Feb 11 2020, 4:35 AM · Easy hack, Indexer

Feb 7 2020

vlorentz added a comment to T2270: Add to intrinsic metadata files to be indexed: AUTHORS and CONTRIBUTORS.

I think it qualifies, yes

Feb 7 2020, 11:09 AM · Intrinsic metadata, Indexer
moranegg added a comment to T2270: Add to intrinsic metadata files to be indexed: AUTHORS and CONTRIBUTORS.

@vlorentz I'm not sure it is an easy hack, could you review the task and decide?

Feb 7 2020, 11:03 AM · Intrinsic metadata, Indexer
moranegg triaged T2270: Add to intrinsic metadata files to be indexed: AUTHORS and CONTRIBUTORS as Normal priority.
Feb 7 2020, 11:03 AM · Intrinsic metadata, Indexer

Feb 4 2020

krithikvaidya added a revision to T2258: Add type annotations to indexer classes: D2622: Add type annotations to indexer classes.
Feb 4 2020, 3:31 PM · Easy hack, Indexer
vlorentz added a comment to T2258: Add type annotations to indexer classes.

Yes, but you should just open a diff instead.

Feb 4 2020, 12:15 PM · Easy hack, Indexer
krithikvaidya added a comment to T2258: Add type annotations to indexer classes.

And does Jenkins also run code in non-master branches through the pipeline?

Feb 4 2020, 3:10 AM · Easy hack, Indexer

Feb 3 2020

ardumont added a comment to T2258: Add type annotations to indexer classes.

and apologies for the delay.

Feb 3 2020, 10:41 AM · Easy hack, Indexer

Feb 2 2020

krithikvaidya added a comment to T2258: Add type annotations to indexer classes.

Thanks for the detailed reply 🙂, and apologies for the delay. Things are clearer now 👍

Feb 2 2020, 4:09 AM · Easy hack, Indexer

Feb 1 2020

twentyse7en added a comment to T2259: Add type annotations to metadata mappings.

hey, I would like to contribute.

Feb 1 2020, 6:49 PM · Easy hack, Indexer
ardumont added a comment to T2258: Add type annotations to indexer classes.

The pytest tests are succeeding in the swh-indexer module, but failing in some other modules. Since this issue pertains to only the swh-indexer module, it shouldn't cause problems, right?

Feb 1 2020, 10:42 AM · Easy hack, Indexer
krithikvaidya added a comment to T2258: Add type annotations to indexer classes.

Hi, I'd like to take up this issue as my first issue here :). But before I take it up, I just had a few queries:

Feb 1 2020, 9:58 AM · Easy hack, Indexer

Jan 29 2020

vlorentz removed a project from T2257: Fully annotate swh-indexer with types: Roadmap 2020.
Jan 29 2020, 5:06 PM · Indexer
vlorentz removed a project from T2258: Add type annotations to indexer classes: Roadmap 2020.
Jan 29 2020, 5:06 PM · Easy hack, Indexer
vlorentz removed a project from T2259: Add type annotations to metadata mappings: Roadmap 2020.
Jan 29 2020, 5:06 PM · Easy hack, Indexer
vlorentz added a project to T2259: Add type annotations to metadata mappings: Easy hack.
Jan 29 2020, 3:35 PM · Easy hack, Indexer
vlorentz triaged T2259: Add type annotations to metadata mappings as Low priority.
Jan 29 2020, 3:35 PM · Easy hack, Indexer
vlorentz triaged T2258: Add type annotations to indexer classes as Low priority.
Jan 29 2020, 3:34 PM · Easy hack, Indexer
vlorentz renamed T2257: Fully annotate swh-indexer with types from Fully annotate swh-index with types to Fully annotate swh-indexer with types.
Jan 29 2020, 3:32 PM · Indexer
vlorentz triaged T2257: Fully annotate swh-indexer with types as Low priority.
Jan 29 2020, 3:31 PM · Indexer

Jan 27 2020

vlorentz updated the task description for T1475: Test more edge cases of metadata indexer mappings.
Jan 27 2020, 4:41 PM · Easy hack, Indexer
vlorentz renamed T1475: Test more edge cases of metadata indexer mappings from Add more tests for edge cases of indexer mappings to Test more edge cases of metadata indexer mappings.
Jan 27 2020, 4:39 PM · Easy hack, Indexer
vlorentz added a project to T1475: Test more edge cases of metadata indexer mappings: Easy hack.
Jan 27 2020, 4:35 PM · Easy hack, Indexer

Jan 23 2020

ardumont added a comment to T1788: indexer-license: Investigate timeouts.

sample with our shiny sentry: https://sentry.softwareheritage.org/share/issue/f4a40625783b4a5588980005ddc5a5e6/

Jan 23 2020, 9:05 AM · Indexer

Jan 22 2020

vlorentz placed T1475: Test more edge cases of metadata indexer mappings up for grabs.
Jan 22 2020, 3:36 PM · Easy hack, Indexer

Jan 13 2020

vlorentz closed T2144: Define an architecture for end-to-end monitoring/testing, a subtask of T2127: Standalone Indexer Testing, as Resolved.
Jan 13 2020, 3:23 PM · Indexer, Sprint 2019/12 (Monitor and Conquer)

Dec 20 2019

vlorentz added a subtask for T2127: Standalone Indexer Testing: T2144: Define an architecture for end-to-end monitoring/testing.
Dec 20 2019, 3:09 PM · Indexer, Sprint 2019/12 (Monitor and Conquer)

Dec 3 2019

vlorentz lowered the priority of T2127: Standalone Indexer Testing from High to Normal.
Dec 3 2019, 5:50 PM · Indexer, Sprint 2019/12 (Monitor and Conquer)
vlorentz triaged T2127: Standalone Indexer Testing as High priority.
Dec 3 2019, 3:17 PM · Indexer, Sprint 2019/12 (Monitor and Conquer)

Dec 2 2019

olasd created T2127: Standalone Indexer Testing.
Dec 2 2019, 2:26 PM · Indexer, Sprint 2019/12 (Monitor and Conquer)

Nov 22 2019

vlorentz placed T1464: Auto-detect indexer tool versions instead of reading them from the config up for grabs.
Nov 22 2019, 10:54 AM · Indexer
vlorentz closed T861: mimetype indexer: edge case makes the indexer fail miserably as Resolved.

Fixed by D896.

Nov 22 2019, 10:54 AM · Indexer
vlorentz closed T861: mimetype indexer: edge case makes the indexer fail miserably, a subtask of T713: Index existing contents (mimetype, language, license), as Resolved.
Nov 22 2019, 10:54 AM · Indexer

Nov 14 2019

vlorentz added a comment to T1513: The indexer journal client is unstable.

Does it still happen? The journal client has changed a lot since this task was opened, including switching its backend library.

Nov 14 2019, 12:27 PM · Indexer

Nov 8 2019

vlorentz closed T2060: Many rows in origin_intrinsic_metadata still do not have an origin_url as Resolved.

Fixed by @olasd

Nov 8 2019, 5:32 PM · Indexer

Nov 5 2019

vlorentz added a project to T2060: Many rows in origin_intrinsic_metadata still do not have an origin_url: Indexer.
Nov 5 2019, 1:59 PM · Indexer

Sep 30 2019

ardumont added a comment to T1788: indexer-license: Investigate timeouts.

That's the postgresql statement_timeout variable that we set for some methods on the storage backends.

Sep 30 2019, 5:47 PM · Indexer
olasd added a comment to T1788: indexer-license: Investigate timeouts.

We can investigate 2 things:

  • check postgresql options to kill queries that take too long (solely indexer-db right now) -> and find some way to report those
Sep 30 2019, 1:53 PM · Indexer

Sep 27 2019

ardumont added a comment to T1788: indexer-license: Investigate timeouts.

We can investigate 2 things:

  • check postgresql options to kill queries that take too long (solely indexer-db right now) -> and find some way to report those
  • push the proxy client storage idea (started within T1389 for the storage, currently WIP) up to the indexer-storage.
Sep 27 2019, 9:53 AM · Indexer

Sep 24 2019

ardumont updated subscribers of T1788: indexer-license: Investigate timeouts.

Note: ... comments stack pop ... (-> been there a while apparently)

Sep 24 2019, 4:25 PM · Indexer

Sep 6 2019

ardumont added a comment to P520 [fixed] broken ci: index failure status.

In the end, the cause was missing data initialization steps.

Sep 6 2019, 9:43 AM · Indexer
ardumont updated the title for P520 [fixed] broken ci: index failure status from wip: broken ci: index failure status to [fixed] broken ci: index failure status.
Sep 6 2019, 9:42 AM · Indexer

Sep 5 2019

ardumont updated the title for P520 [fixed] broken ci: index failure status from failure in indexer to wip: broken ci: index failure status.
Sep 5 2019, 3:01 PM · Indexer
ardumont added a comment to P520 [fixed] broken ci: index failure status.

current status:

Sep 5 2019, 3:01 PM · Indexer
ardumont created P520 [fixed] broken ci: index failure status.
Sep 5 2019, 2:55 PM · Indexer

Jul 10 2019

zack added a project to T1485: Show stats on extracted metadata: Web app.
Jul 10 2019, 8:13 PM · Web app, Metadata workflow, Indexer
vlorentz updated the task description for T1485: Show stats on extracted metadata.
Jul 10 2019, 7:12 PM · Web app, Metadata workflow, Indexer
vlorentz placed T1485: Show stats on extracted metadata up for grabs.
Jul 10 2019, 5:00 PM · Web app, Metadata workflow, Indexer

Jul 9 2019

twitu added a comment to T1433: Refactor output of indexer storage's `get` methods..

I went through all the tests in test_storage.py. It appears that only content_fossology_license_get needs to be refactored. All other storage methods return a dictionary or a list of dictionaries, where each dictionary has multiple keys.

Jul 9 2019, 5:28 AM · Easy hack, Indexer

Jul 8 2019

vlorentz updated the task description for T1433: Refactor output of indexer storage's `get` methods..
Jul 8 2019, 1:25 PM · Easy hack, Indexer

Jul 7 2019

twitu added a comment to T1433: Refactor output of indexer storage's `get` methods..

I am familiar with the web APIs and I went through the discussion in T782. When you say output a single dictionary, I believe you mean something like this

{
  sha1: [
    {tool: TOOL, licenses: [licences]},
    {tool: TOOL, licenses: [licences]}
  ],
Jul 7 2019, 4:23 PM · Easy hack, Indexer
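
The single-dictionary shape quoted in the comment above could be built roughly as follows. This is only a minimal sketch: the flat input rows and their keys (`sha1`, `tool`, `licenses`) and the helper name `group_licenses_by_sha1` are assumptions made for illustration, not the actual indexer storage API.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def group_licenses_by_sha1(
    rows: Iterable[Dict[str, Any]],
) -> Dict[str, List[Dict[str, Any]]]:
    """Group flat license rows into {sha1: [{tool, licenses}, ...]}.

    Hypothetical helper: the row layout is assumed, not taken from swh-indexer.
    """
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        # One entry per (content, tool) pair, kept in input order.
        grouped[row["sha1"]].append(
            {"tool": row["tool"], "licenses": list(row["licenses"])}
        )
    return dict(grouped)


# Illustrative data only; the sha1 values and tool names are placeholders.
rows = [
    {"sha1": "a1", "tool": "tool-A", "licenses": ["MIT"]},
    {"sha1": "a1", "tool": "tool-B", "licenses": ["MIT", "GPL-3.0"]},
    {"sha1": "b2", "tool": "tool-A", "licenses": []},
]
result = group_licenses_by_sha1(rows)
```

Keying by content hash means a caller can look up every tool's result for one content in a single access, instead of scanning a list of single-key dictionaries.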

Jul 3 2019

ardumont closed D1683: swh-indexer-journal-client: Adapt systemd and configuration according to latest version.
Jul 3 2019, 6:09 PM · Journal, Indexer
ardumont updated the diff for D1683: swh-indexer-journal-client: Adapt systemd and configuration according to latest version.
  • swh-indexer-journal-client: Update cli to latest version
  • swh-indexer-journal-client: Fix configuration change
Jul 3 2019, 6:09 PM · Journal, Indexer