
Sep 16 2020

ardumont closed T1386: Refactor indexers' initialization step as Wontfix.
Sep 16 2020, 5:56 PM · Indexer, Scheduling utilities
olasd added a parent task for T2603: Configuration mismatch between swh.indexer.journal_client and the configuration declared in puppet: T2597: Switch indexer journal client to new kafka cluster.
Sep 16 2020, 2:07 PM · Puppet recipes, Indexer
olasd triaged T2603: Configuration mismatch between swh.indexer.journal_client and the configuration declared in puppet as Normal priority.
Sep 16 2020, 2:06 PM · Puppet recipes, Indexer

Sep 10 2020

pythoncourse added a comment to T1462: mimetype indexer: fails with TypeError: 'NoneType' object is not subscriptable.
Sep 10 2020, 11:28 AM · Indexer

Feb 25 2020

ardumont closed T1788: indexer-license: Investigate timeouts as Resolved by committing rDCIDXfc7a19e80874: storage.db: Improve content range queries to actually finish.
Feb 25 2020, 11:28 AM · Indexer
ardumont closed D2709: idx.storage.db: Improve content range queries to actually finish.
Feb 25 2020, 11:28 AM · Indexer
vlorentz accepted D2709: idx.storage.db: Improve content range queries to actually finish.
Feb 25 2020, 11:07 AM · Indexer

Feb 24 2020

ardumont updated the test plan for D2709: idx.storage.db: Improve content range queries to actually finish.
Feb 24 2020, 2:23 PM · Indexer
ardumont added a revision to T1788: indexer-license: Investigate timeouts: D2709: idx.storage.db: Improve content range queries to actually finish.
Feb 24 2020, 2:14 PM · Indexer

Feb 13 2020

moranegg added a comment to T2270: Add to intrinsic metadata files to be indexed: AUTHORS and CONTRIBUTORS.

example: https://archive.softwareheritage.org/swh:1:cnt:a6463d2ce390990c31c2f2fa8019606721f0ca13;origin=http://git.savannah.gnu.org/git/gnugo.git/

Feb 13 2020, 2:46 PM · Intrinsic metadata, Indexer

Feb 11 2020

krithikvaidya closed T2258: Add type annotations to indexer classes, a subtask of T2257: Fully annotate swh-indexer with types, as Resolved.
Feb 11 2020, 4:35 AM · Indexer
krithikvaidya closed T2258: Add type annotations to indexer classes as Resolved by committing rDCIDX5f49b59e6aa3: Add type annotations to indexer classes.
Feb 11 2020, 4:35 AM · Easy hack, Indexer

Feb 7 2020

vlorentz added a comment to T2270: Add to intrinsic metadata files to be indexed: AUTHORS and CONTRIBUTORS.

I think it qualifies, yes

Feb 7 2020, 11:09 AM · Intrinsic metadata, Indexer
moranegg added a comment to T2270: Add to intrinsic metadata files to be indexed: AUTHORS and CONTRIBUTORS.

@vlorentz I'm not sure it is an easy hack, could you review the task and decide?

Feb 7 2020, 11:03 AM · Intrinsic metadata, Indexer
moranegg triaged T2270: Add to intrinsic metadata files to be indexed: AUTHORS and CONTRIBUTORS as Normal priority.
Feb 7 2020, 11:03 AM · Intrinsic metadata, Indexer

Feb 4 2020

krithikvaidya added a revision to T2258: Add type annotations to indexer classes: D2622: Add type annotations to indexer classes.
Feb 4 2020, 3:31 PM · Easy hack, Indexer
vlorentz added a comment to T2258: Add type annotations to indexer classes.

Yes, but you should just open a diff instead.

Feb 4 2020, 12:15 PM · Easy hack, Indexer
krithikvaidya added a comment to T2258: Add type annotations to indexer classes.

And does Jenkins also run code in non-master branches through the pipeline?

Feb 4 2020, 3:10 AM · Easy hack, Indexer

Feb 3 2020

ardumont added a comment to T2258: Add type annotations to indexer classes.

and apologies for the delay.

Feb 3 2020, 10:41 AM · Easy hack, Indexer

Feb 2 2020

krithikvaidya added a comment to T2258: Add type annotations to indexer classes.

Thanks for the detailed reply 🙂, and apologies for the delay. Things are clearer now 👍

Feb 2 2020, 4:09 AM · Easy hack, Indexer

Feb 1 2020

twentyse7en added a comment to T2259: Add type annotations to metadata mappings.

hey, I would like to contribute.

Feb 1 2020, 6:49 PM · Easy hack, Indexer
ardumont added a comment to T2258: Add type annotations to indexer classes.

The pytest tests are succeeding in the swh-indexer module, but failing in some other modules. Since this issue pertains to only the swh-indexer module, it shouldn't cause problems, right?

Feb 1 2020, 10:42 AM · Easy hack, Indexer
krithikvaidya added a comment to T2258: Add type annotations to indexer classes.

Hi, I'd like to take up this issue as my first issue here :) . But before I take it up, I just had a few queries:

Feb 1 2020, 9:58 AM · Easy hack, Indexer

Jan 29 2020

vlorentz removed a project from T2257: Fully annotate swh-indexer with types: Roadmap 2020.
Jan 29 2020, 5:06 PM · Indexer
vlorentz removed a project from T2258: Add type annotations to indexer classes: Roadmap 2020.
Jan 29 2020, 5:06 PM · Easy hack, Indexer
vlorentz removed a project from T2259: Add type annotations to metadata mappings: Roadmap 2020.
Jan 29 2020, 5:06 PM · Easy hack, Indexer
vlorentz added a project to T2259: Add type annotations to metadata mappings: Easy hack.
Jan 29 2020, 3:35 PM · Easy hack, Indexer
vlorentz triaged T2259: Add type annotations to metadata mappings as Low priority.
Jan 29 2020, 3:35 PM · Easy hack, Indexer
vlorentz triaged T2258: Add type annotations to indexer classes as Low priority.
Jan 29 2020, 3:34 PM · Easy hack, Indexer
vlorentz renamed T2257: Fully annotate swh-indexer with types from Fully annotate swh-index with types to Fully annotate swh-indexer with types.
Jan 29 2020, 3:32 PM · Indexer
vlorentz triaged T2257: Fully annotate swh-indexer with types as Low priority.
Jan 29 2020, 3:31 PM · Indexer

Jan 27 2020

vlorentz updated the task description for T1475: Test more edge cases of metadata indexer mappings.
Jan 27 2020, 4:41 PM · Easy hack, Indexer
vlorentz renamed T1475: Test more edge cases of metadata indexer mappings from Add more tests for edge cases of indexer mappings to Test more edge cases of metadata indexer mappings.
Jan 27 2020, 4:39 PM · Easy hack, Indexer
vlorentz added a project to T1475: Test more edge cases of metadata indexer mappings: Easy hack.
Jan 27 2020, 4:35 PM · Easy hack, Indexer

Jan 23 2020

ardumont added a comment to T1788: indexer-license: Investigate timeouts.

sample with our shiny sentry: https://sentry.softwareheritage.org/share/issue/f4a40625783b4a5588980005ddc5a5e6/

Jan 23 2020, 9:05 AM · Indexer

Jan 22 2020

vlorentz placed T1475: Test more edge cases of metadata indexer mappings up for grabs.
Jan 22 2020, 3:36 PM · Easy hack, Indexer

Jan 13 2020

vlorentz closed T2144: Define an architecture for end-to-end monitoring/testing, a subtask of T2127: Standalone Indexer Testing, as Resolved.
Jan 13 2020, 3:23 PM · Indexer, Sprint 2019/12 (Monitor and Conquer)

Dec 20 2019

vlorentz added a subtask for T2127: Standalone Indexer Testing: T2144: Define an architecture for end-to-end monitoring/testing.
Dec 20 2019, 3:09 PM · Indexer, Sprint 2019/12 (Monitor and Conquer)

Dec 3 2019

vlorentz lowered the priority of T2127: Standalone Indexer Testing from High to Normal.
Dec 3 2019, 5:50 PM · Indexer, Sprint 2019/12 (Monitor and Conquer)
vlorentz triaged T2127: Standalone Indexer Testing as High priority.
Dec 3 2019, 3:17 PM · Indexer, Sprint 2019/12 (Monitor and Conquer)

Dec 2 2019

olasd created T2127: Standalone Indexer Testing.
Dec 2 2019, 2:26 PM · Indexer, Sprint 2019/12 (Monitor and Conquer)

Nov 22 2019

vlorentz placed T1464: Auto-detect indexer tool versions instead of reading them from the config up for grabs.
Nov 22 2019, 10:54 AM · Indexer
vlorentz closed T861: mimetype indexer: edge case makes the indexer fail miserably as Resolved.

Fixed by D896.

Nov 22 2019, 10:54 AM · Indexer
vlorentz closed T861: mimetype indexer: edge case makes the indexer fail miserably, a subtask of T713: Index existing contents (mimetype, language, license), as Resolved.
Nov 22 2019, 10:54 AM · Indexer

Nov 14 2019

vlorentz added a comment to T1513: The indexer journal client is unstable.

Does it still happen? The journal client has changed a lot since this task was opened, including a switch of backend library.

Nov 14 2019, 12:27 PM · Indexer

Nov 8 2019

vlorentz closed T2060: Many rows in origin_intrinsic_metadata still do not have an origin_url as Resolved.

Fixed by @olasd

Nov 8 2019, 5:32 PM · Indexer

Nov 5 2019

vlorentz added a project to T2060: Many rows in origin_intrinsic_metadata still do not have an origin_url: Indexer.
Nov 5 2019, 1:59 PM · Indexer

Sep 30 2019

ardumont added a comment to T1788: indexer-license: Investigate timeouts.

That's the PostgreSQL statement_timeout variable that we set for some methods on the storage backends.
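
For context, `statement_timeout` is a PostgreSQL setting that aborts any statement running longer than the given number of milliseconds. The following is only a minimal sketch of scoping it to a single transaction, assuming a DB-API cursor (for example one obtained from psycopg2). The helper name `execute_with_timeout` and the 500 ms default are hypothetical and are not taken from the swh code base.

```python
from typing import Any, Optional, Sequence


def execute_with_timeout(
    cur: Any,
    query: str,
    params: Optional[Sequence[Any]] = None,
    timeout_ms: int = 500,
) -> None:
    """Run `query` with a per-transaction PostgreSQL statement_timeout.

    `cur` is assumed to be a DB-API cursor inside an open transaction.
    SET LOCAL only lasts until the end of the current transaction, so
    other transactions on the same connection are not affected.
    """
    # int() ensures only a number is ever interpolated into the SET statement.
    cur.execute(f"SET LOCAL statement_timeout = {int(timeout_ms)}")
    cur.execute(query, params)
```

If the query exceeds the limit, PostgreSQL cancels it and the driver raises an error, which the caller can catch and report instead of letting transactions pile up.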

Sep 30 2019, 5:47 PM · Indexer
olasd added a comment to T1788: indexer-license: Investigate timeouts.

We can investigate 2 things:

  • check PostgreSQL options to kill queries that take too long (solely indexer-db right now) -> and find some way to report those
Sep 30 2019, 1:53 PM · Indexer

Sep 27 2019

ardumont added a comment to T1788: indexer-license: Investigate timeouts.

We can investigate 2 things:

  • check PostgreSQL options to kill queries that take too long (solely indexer-db right now) -> and find some way to report those
  • push the proxy client storage idea (started in T1389 for the storage, currently WIP) up to the indexer-storage.
Sep 27 2019, 9:53 AM · Indexer

Sep 24 2019

ardumont updated subscribers of T1788: indexer-license: Investigate timeouts.

Note: ... comments stack pop ... (-> been there a while apparently)

Sep 24 2019, 4:25 PM · Indexer

Sep 6 2019

ardumont added a comment to P520 [fixed] broken ci: index failure status.

In the end, the cause was missing data initialization steps.

Sep 6 2019, 9:43 AM · Indexer
ardumont updated the title for P520 [fixed] broken ci: index failure status from wip: broken ci: index failure status to [fixed] broken ci: index failure status.
Sep 6 2019, 9:42 AM · Indexer

Sep 5 2019

ardumont updated the title for P520 [fixed] broken ci: index failure status from failure in indexer to wip: broken ci: index failure status.
Sep 5 2019, 3:01 PM · Indexer
ardumont added a comment to P520 [fixed] broken ci: index failure status.

current status:

Sep 5 2019, 3:01 PM · Indexer
ardumont created P520 [fixed] broken ci: index failure status.
Sep 5 2019, 2:55 PM · Indexer

Jul 10 2019

zack added a project to T1485: Show stats on extracted metadata: Web app.
Jul 10 2019, 8:13 PM · Web app, Metadata workflow, Indexer
vlorentz updated the task description for T1485: Show stats on extracted metadata.
Jul 10 2019, 7:12 PM · Web app, Metadata workflow, Indexer
vlorentz placed T1485: Show stats on extracted metadata up for grabs.
Jul 10 2019, 5:00 PM · Web app, Metadata workflow, Indexer

Jul 9 2019

twitu added a comment to T1433: Refactor output of indexer storage's `get` methods..

I went through all the tests in test_storage.py. It appears that only content_fossology_license_get needs to be refactored. All other storage methods return a dictionary or a list of dictionaries, where each dictionary has multiple keys.

Jul 9 2019, 5:28 AM · Easy hack, Indexer

Jul 8 2019

vlorentz updated the task description for T1433: Refactor output of indexer storage's `get` methods..
Jul 8 2019, 1:25 PM · Easy hack, Indexer

Jul 7 2019

twitu added a comment to T1433: Refactor output of indexer storage's `get` methods..

I am familiar with the web APIs and I went through the discussion in T782. When you say output a single dictionary, I believe you mean something like this

{
  sha1: [
    {tool: TOOL, licenses: [licences]},
    {tool: TOOL, licenses: [licences]}
  ],
Jul 7 2019, 4:23 PM · Easy hack, Indexer
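
The single-dictionary shape discussed in the comment above could be built along these lines. This is a rough sketch only: the flat input rows, their key names (`id`, `tool`, `licenses`) and the function name `group_licenses_by_sha1` are assumptions made for illustration, not the actual indexer storage API.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def group_licenses_by_sha1(
    rows: Iterable[Dict[str, Any]],
) -> Dict[bytes, List[Dict[str, Any]]]:
    """Group flat license rows into one dictionary keyed by sha1.

    Each input row is assumed to look like
    {"id": <sha1>, "tool": <tool>, "licenses": [<license>, ...]}.
    """
    grouped: Dict[bytes, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        # One entry per (sha1, tool) pair, in input order.
        grouped[row["id"]].append(
            {"tool": row["tool"], "licenses": list(row["licenses"])}
        )
    return dict(grouped)


# Hypothetical sample data: two tools reporting on the same content.
rows = [
    {"id": b"\x01", "tool": "tool-a", "licenses": ["GPL-2.0"]},
    {"id": b"\x01", "tool": "tool-b", "licenses": ["GPL-2.0", "MIT"]},
    {"id": b"\x02", "tool": "tool-a", "licenses": []},
]
result = group_licenses_by_sha1(rows)
```

Keying by sha1 lets a caller look up all tool results for one content in a single access, at the cost of losing the flat, row-per-result layout.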

Jul 3 2019

ardumont closed D1683: swh-indexer-journal-client: Adapt systemd and configuration according to latest version.
Jul 3 2019, 6:09 PM · Journal, Indexer
ardumont updated the diff for D1683: swh-indexer-journal-client: Adapt systemd and configuration according to latest version.
  • swh-indexer-journal-client: Update cli to latest version
  • swh-indexer-journal-client: Fix configuration change
Jul 3 2019, 6:09 PM · Journal, Indexer
vlorentz accepted D1683: swh-indexer-journal-client: Adapt systemd and configuration according to latest version.
Jul 3 2019, 4:44 PM · Journal, Indexer
ardumont retitled D1683: swh-indexer-journal-client: Adapt systemd and configuration according to latest version from swh-indexer-journal-client: Adapt configuration to swh-indexer-journal-client: Adapt systemd and configuration according to latest version.
Jul 3 2019, 4:38 PM · Journal, Indexer
ardumont added projects to D1683: swh-indexer-journal-client: Adapt systemd and configuration according to latest version: Indexer, Journal.
Jul 3 2019, 4:34 PM · Journal, Indexer
ardumont placed T1386: Refactor indexers' initialization step up for grabs.
Jul 3 2019, 3:26 PM · Indexer, Scheduling utilities

Jun 25 2019

twitu closed T1527: Have comments on all columns of all databases as Resolved.
Jun 25 2019, 6:25 PM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer

Jun 24 2019

twitu updated the task description for T1527: Have comments on all columns of all databases.
Jun 24 2019, 6:28 PM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer

Jun 20 2019

twitu added a comment to T1527: Have comments on all columns of all databases.

D1582 has been pushed; the task can be closed.

Jun 20 2019, 10:26 AM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer

Jun 18 2019

vlorentz placed T1528: Efficient reindex when adding a metadata mapping up for grabs.
Jun 18 2019, 1:27 PM · Indexer

Jun 15 2019

twitu added a comment to T1527: Have comments on all columns of all databases.

All columns are now commented in swh-scheduler and awaiting review.
Some columns in swh-storage required a short discussion to frame appropriate comments.

Jun 15 2019, 5:22 PM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer

Jun 14 2019

ardumont updated the task description for T1527: Have comments on all columns of all databases.
Jun 14 2019, 5:40 PM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer
twitu added a comment to T1527: Have comments on all columns of all databases.

All columns are already commented in swh-indexer

Jun 14 2019, 5:18 PM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer
twitu added a comment to T1527: Have comments on all columns of all databases.

I have added a few comments in D1582.

Jun 14 2019, 8:30 AM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer

Jun 13 2019

ardumont added a comment to T1527: Have comments on all columns of all databases.

The latest upgrade is 136.sql while the version in 30-swh-schema.sql is 133. Should I name the next upgrade 137?

Jun 13 2019, 6:59 PM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer
twitu added a comment to T1527: Have comments on all columns of all databases.

There seems to be an inconsistency between sql/upgrades and the latest SQL version in swh-storage. The latest upgrade is 136.sql while the version in 30-swh-schema.sql is 133. Should I name the next upgrade 137?

Jun 13 2019, 6:54 PM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer
ardumont added a project to T1527: Have comments on all columns of all databases: Easy hack.
Jun 13 2019, 12:33 PM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer
ardumont updated the task description for T1527: Have comments on all columns of all databases.
Jun 13 2019, 12:31 PM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer
twitu added a comment to T1527: Have comments on all columns of all databases.

Is there anything left to be done to close the task?

Jun 13 2019, 12:09 PM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer

Jun 12 2019

twitu added a comment to T1527: Have comments on all columns of all databases.

The modules swh-scheduler, swh-indexer, and swh-storage all seem to have column comments written in 30-swh-schema.sql.

Jun 12 2019, 7:44 PM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer
twitu added a comment to T1527: Have comments on all columns of all databases.

Can you provide a few more details so I can work on this? Maybe which packages will be affected and what is expected in the comments.

Jun 12 2019, 6:21 PM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer

Jun 7 2019

ardumont added a comment to T1788: indexer-license: Investigate timeouts.

In the meantime, I've stopped those indexers as this impacts others (I see transactions piling up).

Jun 7 2019, 10:34 AM · Indexer
ardumont triaged T1788: indexer-license: Investigate timeouts as Normal priority.
Jun 7 2019, 10:27 AM · Indexer

May 25 2019

zack renamed T1475: Test more edge cases of metadata indexer mappings from Add more tests for edge cases of indexer mappings. to Add more tests for edge cases of indexer mappings.
May 25 2019, 5:31 PM · Easy hack, Indexer
zack added a project to T1527: Have comments on all columns of all databases: Documentation.
May 25 2019, 5:30 PM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer

May 22 2019

vlorentz added a revision to T1513: The indexer journal client is unstable: D1501: Factorize StorageReplayer and JournalClient..
May 22 2019, 2:26 PM · Indexer

Apr 24 2019

vlorentz closed T1691: metadata indexer: investigate metadata entries with empty mappings as Resolved.

We should investigate why they are there.

Apr 24 2019, 5:22 PM · Archive content, Indexer
vlorentz closed T1691: metadata indexer: investigate metadata entries with empty mappings, a subtask of T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata, as Resolved.
Apr 24 2019, 5:22 PM · Archive content, Indexer
zack renamed T1691: metadata indexer: investigate metadata entries with empty mappings from metadata indexer: investigate empty mappings to metadata indexer: investigate metadata entries with empty mappings.
Apr 24 2019, 5:21 PM · Archive content, Indexer
zack triaged T1691: metadata indexer: investigate metadata entries with empty mappings as Normal priority.
Apr 24 2019, 5:20 PM · Archive content, Indexer
zack closed T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata as Resolved.

This is now done, aside from a minor issue noted below:

softwareheritage-indexer=# select count(*) from revision_intrinsic_metadata where metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb;
 count 
-------
     0
(1 row)
Apr 24 2019, 5:18 PM · Archive content, Indexer

Apr 19 2019

vlorentz triaged T1681: Use project metadata as a "lister" as Low priority.
Apr 19 2019, 11:03 PM · Archive coverage, Indexer, Metadata workflow

Apr 3 2019

vlorentz updated the task description for T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Apr 3 2019, 11:14 AM · Archive content, Indexer
vlorentz updated the task description for T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Apr 3 2019, 10:41 AM · Archive content, Indexer
vlorentz added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Apr 3 2019, 9:52 AM · Archive content, Indexer

Apr 2 2019

zack updated the task description for T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Apr 2 2019, 4:41 PM · Archive content, Indexer
zack updated the task description for T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Apr 2 2019, 4:40 PM · Archive content, Indexer
zack updated the task description for T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Apr 2 2019, 4:37 PM · Archive content, Indexer