Feb 5 2019

vlorentz added a comment to T1517: Metadata search is too slow.

Okay, so the issue was that the fulltext search wasn't actually using indexes. So linear time instead of logarithmic time :/

Feb 5 2019, 3:53 PM · Metadata workflow, Indexer
vlorentz added a comment to T1517: Metadata search is too slow.
In T1517#27866, @zack wrote:

how is that "understandable"? :-)

Feb 5 2019, 3:49 PM · Metadata workflow, Indexer
zack added a project to T1517: Metadata search is too slow: Metadata workflow.

how is that "understandable"? :-)

Feb 5 2019, 3:25 PM · Metadata workflow, Indexer
vlorentz triaged T1517: Metadata search is too slow as High priority.
Feb 5 2019, 3:10 PM · Metadata workflow, Indexer
ardumont added a comment to T1513: The indexer journal client is unstable.

What's the python3-kafka version?

Feb 5 2019, 2:08 PM · Indexer
douardda triaged T1513: The indexer journal client is unstable as High priority.
Feb 5 2019, 9:43 AM · Indexer

Feb 4 2019

vlorentz added revisions to T1475: Test more edge cases of metadata indexer mappings: D1071: Catch encoding errors when parsing pom.xml., D1070: Prevent name clash when a metadata file has a key named 'dict'..
Feb 4 2019, 2:09 PM · Easy hack, Indexer

Feb 2 2019

zack added a comment to T1511: index project licenses using GitHub's license detector.

P353 is a quick and dirty wrapper to play with licensee locally after git clone

Feb 2 2019, 1:36 PM · Indexer
zack triaged T1511: index project licenses using GitHub's license detector as Normal priority.
Feb 2 2019, 1:34 PM · Indexer

Jan 31 2019

vlorentz closed T1385: Monitor output of metadata indexers as Resolved.
Jan 31 2019, 1:16 PM · Indexer
vlorentz added a comment to T1385: Monitor output of metadata indexers.

The only remaining crashes are caused by T1507.

Jan 31 2019, 1:16 PM · Indexer
vlorentz closed T1488: Tune metadata indexer workers parallelism as Resolved.

Bottleneck: T1507

Jan 31 2019, 1:15 PM · System administration, Indexer
vlorentz lowered the priority of T1485: Show stats on extracted metadata from Normal to Low.
Jan 31 2019, 1:14 PM · Web app, Metadata workflow, Indexer

Jan 30 2019

vlorentz claimed T1485: Show stats on extracted metadata.
Jan 30 2019, 4:14 PM · Web app, Metadata workflow, Indexer
vlorentz added a revision to T1484: Provide stats on extracted metadata in the indexer storage api: D1040: Provide stats on indexed metadata per origin..
Jan 30 2019, 4:14 PM · Metadata workflow, Metrics/monitoring, Indexer

Jan 29 2019

vlorentz closed T1394: Make swh/indexer/tests/test_origin_metadata.py run faster. as Resolved.

Resolved by D987.

Jan 29 2019, 12:21 PM · Indexer
vlorentz added a revision to T1394: Make swh/indexer/tests/test_origin_metadata.py run faster.: D987: Drop scheduling logic from the metadata indexers..
Jan 29 2019, 10:59 AM · Indexer
vlorentz closed T1483: Store the type of intrinsic metadata that were extracted, a subtask of T1484: Provide stats on extracted metadata in the indexer storage api, as Resolved.
Jan 29 2019, 10:58 AM · Metadata workflow, Metrics/monitoring, Indexer
vlorentz closed T1483: Store the type of intrinsic metadata that were extracted as Resolved.

Resolved by D1010.

Jan 29 2019, 10:58 AM · Metadata workflow, Indexer
vlorentz closed T1474: Mapping warnings should log the content/revision id, not the content itself as Resolved.

Resolved by D998.

Jan 29 2019, 10:57 AM · Indexer
vlorentz closed T1328: Add Ruby/Gem metadata indexer as Resolved.

Resolved by D956.

Jan 29 2019, 10:57 AM · Restricted Project, Indexer

Jan 25 2019

vlorentz added a revision to T1483: Store the type of intrinsic metadata that were extracted: D1010: Make metadata indexers store the mappings used to translate metadata..
Jan 25 2019, 3:38 PM · Metadata workflow, Indexer
vlorentz added revisions to T1386: Refactor indexers' initialization step: D990: First step towards a simplification of the indexer's code, D993: Simplify a bit unit tests.
Jan 25 2019, 10:22 AM · Indexer, Scheduling utilities

Jan 24 2019

vlorentz added a revision to T1474: Mapping warnings should log the content/revision id, not the content itself: D998: Make metadata translation warnings reference revisions/content ids instead of dumping the content itself..
Jan 24 2019, 4:35 PM · Indexer
vlorentz merged T1403: Document architecture of metadata mappings. into T1482: Document supported intrinsic metadata.
Jan 24 2019, 2:49 PM · Metadata workflow, Documentation, Indexer
vlorentz merged task T1403: Document architecture of metadata mappings. into T1482: Document supported intrinsic metadata.
Jan 24 2019, 2:49 PM · Documentation, Indexer
vlorentz closed T1482: Document supported intrinsic metadata as Resolved.

Resolved by D994.

Jan 24 2019, 1:33 PM · Metadata workflow, Documentation, Indexer
vlorentz added a revision to T1482: Document supported intrinsic metadata: D994: Document the metadata translation process and list supported metadata sources..
Jan 24 2019, 11:39 AM · Metadata workflow, Documentation, Indexer
vlorentz closed T1327: Add Python metadata indexer as Resolved.

Resolved by D879.

Jan 24 2019, 11:23 AM · Restricted Project, Indexer

Jan 21 2019

vlorentz triaged T1488: Tune metadata indexer workers parallelism as Normal priority.
Jan 21 2019, 5:05 PM · System administration, Indexer
vlorentz added a revision to T1328: Add Ruby/Gem metadata indexer: D957: Factorize list merges in indexer mappings.
Jan 21 2019, 3:54 PM · Restricted Project, Indexer
vlorentz added a revision to T1328: Add Ruby/Gem metadata indexer: D956: Add gemspec mapping..
Jan 21 2019, 3:53 PM · Restricted Project, Indexer
zack added a comment to T1483: Store the type of intrinsic metadata that were extracted.

You are correct, except I will store mapping names, not file names (e.g. because gemspec files are usually named project_name.gemspec, which is harder to query).

Jan 21 2019, 3:43 PM · Metadata workflow, Indexer
vlorentz added a comment to T1483: Store the type of intrinsic metadata that were extracted.

You are correct, except I will store mapping names, not file names (e.g. because gemspec files are usually named project_name.gemspec, which is harder to query).

Jan 21 2019, 3:40 PM · Metadata workflow, Indexer
zack added a comment to T1483: Store the type of intrinsic metadata that were extracted.

(2) seems the best option to me.

Jan 21 2019, 3:16 PM · Metadata workflow, Indexer
vlorentz closed T1384: Document indexer architecture / metadata pipeline as Resolved.

Resolved by D747.

Jan 21 2019, 2:49 PM · Indexer, Documentation
vlorentz updated subscribers of T1483: Store the type of intrinsic metadata that were extracted.

As some revisions/origins may have more than one metadata file (in which case we merge them), there is a many-to-many (m2m) relation between revision/origin metadata rows and mappings. I see three ways to do it:

Jan 21 2019, 2:23 PM · Metadata workflow, Indexer
vlorentz added revisions to T1475: Test more edge cases of metadata indexer mappings: D965: Add more type checks to sanitize Mappings' input., D971: Fix parsing of the Description field in PKG-INFO..
Jan 21 2019, 12:38 PM · Easy hack, Indexer
vlorentz renamed T1475: Test more edge cases of metadata indexer mappings from Add more test for edge cases of indexer mappings. to Add more tests for edge cases of indexer mappings..
Jan 21 2019, 12:37 PM · Easy hack, Indexer
vlorentz claimed T1484: Provide stats on extracted metadata in the indexer storage api.
Jan 21 2019, 12:32 PM · Metadata workflow, Metrics/monitoring, Indexer
zack added a project to T1485: Show stats on extracted metadata: Metadata workflow.
Jan 21 2019, 11:45 AM · Web app, Metadata workflow, Indexer
zack added a project to T1484: Provide stats on extracted metadata in the indexer storage api: Metadata workflow.
Jan 21 2019, 11:45 AM · Metadata workflow, Metrics/monitoring, Indexer
zack added a project to T1483: Store the type of intrinsic metadata that were extracted: Metadata workflow.
Jan 21 2019, 11:45 AM · Metadata workflow, Indexer
zack added a project to T1482: Document supported intrinsic metadata: Metadata workflow.
Jan 21 2019, 11:45 AM · Metadata workflow, Documentation, Indexer
vlorentz added a subtask for T1485: Show stats on extracted metadata: T1484: Provide stats on extracted metadata in the indexer storage api.
Jan 21 2019, 11:42 AM · Web app, Metadata workflow, Indexer
vlorentz added a parent task for T1484: Provide stats on extracted metadata in the indexer storage api: T1485: Show stats on extracted metadata.
Jan 21 2019, 11:42 AM · Metadata workflow, Metrics/monitoring, Indexer
vlorentz triaged T1485: Show stats on extracted metadata as Normal priority.
Jan 21 2019, 11:42 AM · Web app, Metadata workflow, Indexer
vlorentz renamed T1484: Provide stats on extracted metadata in the indexer storage api from Provide stats on extracted metadata to Provide stats on extracted metadata in the indexer storage api.
Jan 21 2019, 11:41 AM · Metadata workflow, Metrics/monitoring, Indexer
vlorentz renamed T1484: Provide stats on extracted metadata in the indexer storage api from Show stats on extracted metadata to Provide stats on extracted metadata.
Jan 21 2019, 11:40 AM · Metadata workflow, Metrics/monitoring, Indexer
vlorentz added a parent task for T1483: Store the type of intrinsic metadata that were extracted: T1484: Provide stats on extracted metadata in the indexer storage api.
Jan 21 2019, 11:39 AM · Metadata workflow, Indexer
vlorentz added a subtask for T1484: Provide stats on extracted metadata in the indexer storage api: T1483: Store the type of intrinsic metadata that were extracted.
Jan 21 2019, 11:39 AM · Metadata workflow, Metrics/monitoring, Indexer
vlorentz added a comment to T1484: Provide stats on extracted metadata in the indexer storage api.

Useful queries:

Jan 21 2019, 11:39 AM · Metadata workflow, Metrics/monitoring, Indexer
vlorentz triaged T1484: Provide stats on extracted metadata in the indexer storage api as Normal priority.
Jan 21 2019, 11:38 AM · Metadata workflow, Metrics/monitoring, Indexer
vlorentz triaged T1483: Store the type of intrinsic metadata that were extracted as Normal priority.
Jan 21 2019, 11:37 AM · Metadata workflow, Indexer
zack added a project to T1482: Document supported intrinsic metadata: Documentation.
Jan 21 2019, 11:24 AM · Metadata workflow, Documentation, Indexer
vlorentz added a project to T1482: Document supported intrinsic metadata: Indexer.
Jan 21 2019, 11:20 AM · Metadata workflow, Documentation, Indexer

Jan 16 2019

vlorentz added a comment to T1475: Test more edge cases of metadata indexer mappings.

"AttributeError on @id with a colon but less than two slashes" https://github.com/digitalbazaar/pyld/issues/91 -> T4436

Jan 16 2019, 6:06 PM · Easy hack, Indexer
vlorentz added revisions to T1475: Test more edge cases of metadata indexer mappings: D961: Maven mapping: fix crash on empty or invalid pom.xml., D960: Maven mapping: fix crash on minimal pom.xml..
Jan 16 2019, 12:08 PM · Easy hack, Indexer
vlorentz triaged T1475: Test more edge cases of metadata indexer mappings as Normal priority.
Jan 16 2019, 11:42 AM · Easy hack, Indexer
vlorentz claimed T1474: Mapping warnings should log the content/revision id, not the content itself.
Jan 16 2019, 11:09 AM · Indexer
vlorentz triaged T1474: Mapping warnings should log the content/revision id, not the content itself as Normal priority.
Jan 16 2019, 11:09 AM · Indexer

Jan 15 2019

vlorentz added revisions to T1385: Monitor output of metadata indexers: D953: Prevent repository normalization from crashing on malformed input., D950: Prevent license normalization from crashing on malformed input., D949: Better error message on content retrieval error + don't write translated_metadata=None to the idx storage., D951: Fix normalization of npm's 'repository' and 'bugs' fields..
Jan 15 2019, 3:57 PM · Indexer
ardumont closed T439: Indexers: compute (and maintain up-to-date) the filetype of all blobs as Resolved.
Jan 15 2019, 2:47 PM · Indexer
ardumont added a comment to T1385: Monitor output of metadata indexers.

I'd say using Grafana, but first we need to determine what the sensible metrics are ;)

Jan 15 2019, 2:45 PM · Indexer
ardumont closed T359: Indexers: batch content analyzer infrastructure as Resolved.

We need to rework the current indexer implementation to use range instead (T991).
After that, we can schedule 256 ranges of contents to index using the scheduler stack instead.
And see where that goes.
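
A rough sketch of the "256 ranges" idea above, assuming content ids are 20-byte SHA-1 digests and that the ranges simply split that id space evenly. Neither assumption is stated in this task, and `content_id_ranges` is a hypothetical helper, not part of swh-indexer:

```python
from typing import List, Tuple

ID_LENGTH = 20  # assumption: content ids are SHA-1 digests (20 bytes)


def content_id_ranges(n_ranges: int = 256) -> List[Tuple[bytes, bytes]]:
    """Split the id space into n_ranges contiguous, inclusive (start, end) pairs."""
    if n_ranges < 1:
        raise ValueError("n_ranges must be positive")
    space = 1 << (8 * ID_LENGTH)  # number of possible ids
    ranges = []
    for i in range(n_ranges):
        start = i * space // n_ranges
        end = (i + 1) * space // n_ranges - 1
        ranges.append(
            (start.to_bytes(ID_LENGTH, "big"), end.to_bytes(ID_LENGTH, "big"))
        )
    return ranges


ranges = content_id_ranges()
# Show the bounds of the first range in hexadecimal.
print(len(ranges), ranges[0][0].hex(), ranges[0][1].hex())
```

Each (start, end) pair could then be handed to one scheduled task, so that the 256 tasks together cover every content exactly once.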

Jan 15 2019, 2:44 PM · Indexer, General
ardumont closed T359: Indexers: batch content analyzer infrastructure, a subtask of T1385: Monitor output of metadata indexers, as Resolved.
Jan 15 2019, 2:44 PM · Indexer
ardumont closed T359: Indexers: batch content analyzer infrastructure, a subtask of T439: Indexers: compute (and maintain up-to-date) the filetype of all blobs, as Resolved.
Jan 15 2019, 2:44 PM · Indexer
ardumont closed T713: Index existing contents (mimetype, language, license) as Resolved.
Jan 15 2019, 2:09 PM · Indexer
douardda added a comment to T1394: Make swh/indexer/tests/test_origin_metadata.py run faster..

Tests should be refactored to use celery's pytest fixtures, see D948

Jan 15 2019, 11:42 AM · Indexer

Jan 14 2019

ardumont closed T1466: Deploy swh-indexer-journal-client as Resolved.
Jan 14 2019, 11:33 AM · Journal, Indexer
ardumont closed T1466: Deploy swh-indexer-journal-client, a subtask of T1385: Monitor output of metadata indexers, as Resolved.
Jan 14 2019, 11:33 AM · Indexer
ardumont closed T1468: journal: Deploy the publisher, a subtask of T1466: Deploy swh-indexer-journal-client, as Resolved.
Jan 14 2019, 11:33 AM · Journal, Indexer

Jan 13 2019

ardumont added a comment to T1466: Deploy swh-indexer-journal-client.

heads up btw: It's deployed.

Jan 13 2019, 12:37 PM · Journal, Indexer
ardumont added a subtask for T1466: Deploy swh-indexer-journal-client: T1468: journal: Deploy the publisher.
Jan 13 2019, 12:36 PM · Journal, Indexer
ardumont closed T1277: swh-journal: Create a journal client for listing origin visits, a subtask of T1466: Deploy swh-indexer-journal-client, as Resolved.
Jan 13 2019, 12:31 PM · Journal, Indexer
ardumont closed T1277: swh-journal: Create a journal client for listing origin visits as Resolved.

D918 closes this.

Jan 13 2019, 12:31 PM · Indexer, Journal
ardumont added a parent task for T1277: swh-journal: Create a journal client for listing origin visits: T1466: Deploy swh-indexer-journal-client.
Jan 13 2019, 12:30 PM · Indexer, Journal
ardumont added a subtask for T1466: Deploy swh-indexer-journal-client: T1277: swh-journal: Create a journal client for listing origin visits.
Jan 13 2019, 12:30 PM · Journal, Indexer

Jan 11 2019

vlorentz lowered the priority of T861: mimetype indexer: edge case makes the indexer fail miserably from Normal to Low.
Jan 11 2019, 11:06 AM · Indexer

Jan 10 2019

vlorentz added a parent task for T1455: Add a journal client that schedules oneshot tasks for metadata indexers: T1466: Deploy swh-indexer-journal-client.
Jan 10 2019, 5:13 PM · Indexer
vlorentz added a subtask for T1466: Deploy swh-indexer-journal-client: T1455: Add a journal client that schedules oneshot tasks for metadata indexers.
Jan 10 2019, 5:13 PM · Journal, Indexer
vlorentz triaged T1466: Deploy swh-indexer-journal-client as High priority.
Jan 10 2019, 5:13 PM · Journal, Indexer
vlorentz closed T1465: Journal notification of full origin visits, a subtask of T1455: Add a journal client that schedules oneshot tasks for metadata indexers, as Resolved.
Jan 10 2019, 5:12 PM · Indexer
vlorentz closed T1455: Add a journal client that schedules oneshot tasks for metadata indexers as Resolved.
Jan 10 2019, 5:12 PM · Indexer
vlorentz closed T1455: Add a journal client that schedules oneshot tasks for metadata indexers, a subtask of T1385: Monitor output of metadata indexers, as Resolved.
Jan 10 2019, 5:12 PM · Indexer
vlorentz closed T1458: docker containers for metadata indexers, a subtask of T1455: Add a journal client that schedules oneshot tasks for metadata indexers, as Resolved.
Jan 10 2019, 3:28 PM · Indexer

Jan 9 2019

vlorentz removed a subtask for T861: mimetype indexer: edge case makes the indexer fail miserably: T1464: Auto-detect indexer tool versions instead of reading them from the config.
Jan 9 2019, 5:45 PM · Indexer
vlorentz removed a parent task for T1464: Auto-detect indexer tool versions instead of reading them from the config: T861: mimetype indexer: edge case makes the indexer fail miserably.
Jan 9 2019, 5:45 PM · Indexer
vlorentz lowered the priority of T1464: Auto-detect indexer tool versions instead of reading them from the config from Normal to Low.
Jan 9 2019, 5:45 PM · Indexer
vlorentz added a revision to T861: mimetype indexer: edge case makes the indexer fail miserably: D896: Use python-magic instead of file_magic..
Jan 9 2019, 3:18 PM · Indexer
ardumont updated the task description for T1455: Add a journal client that schedules oneshot tasks for metadata indexers.
Jan 9 2019, 1:16 PM · Indexer
ardumont updated the task description for T1455: Add a journal client that schedules oneshot tasks for metadata indexers.
Jan 9 2019, 1:16 PM · Indexer
vlorentz renamed T1455: Add a journal client that schedules oneshot tasks for metadata indexers from Add a journal client that schedules metadata indexers to Add a journal client that schedules oneshot tasks for metadata indexers.
Jan 9 2019, 12:15 PM · Indexer
vlorentz added a subtask for T1455: Add a journal client that schedules oneshot tasks for metadata indexers: T1458: docker containers for metadata indexers.
Jan 9 2019, 12:14 PM · Indexer
vlorentz renamed T1455: Add a journal client that schedules oneshot tasks for metadata indexers from Schedule recurrent jobs for metadata indexers to Add a journal client that schedules metadata indexers.
Jan 9 2019, 12:14 PM · Indexer
vlorentz added a comment to T1464: Auto-detect indexer tool versions instead of reading them from the config.

Summary of IRL chat with @zack and @ardumont: for now, we'll use pkg_resources for Python modules, and keep the configuration for other tools.
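
A minimal sketch of what version detection through pkg_resources could look like for a Python module. The helper name `detect_python_tool_version` is hypothetical and is not taken from swh-indexer. The fallback to `importlib.metadata` is an addition for environments where pkg_resources is not installed, not something decided in the chat:

```python
from typing import Optional


def detect_python_tool_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a Python distribution, or None if absent."""
    try:
        import pkg_resources
    except ImportError:
        pkg_resources = None  # fall back to the standard library below

    if pkg_resources is not None:
        try:
            return pkg_resources.get_distribution(dist_name).version
        except pkg_resources.DistributionNotFound:
            return None

    # Fallback (an assumption, not from the discussion): importlib.metadata
    from importlib import metadata

    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints None when the distribution is not installed.
print(detect_python_tool_version("surely-not-installed-dist-xyz"))
```

Tools that are not Python distributions would keep reading their version from the configuration, as agreed above.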

Jan 9 2019, 11:02 AM · Indexer
vlorentz closed T1456: Make metadata indexers support ranges as Invalid.

Invalid, no need for ranges when dealing with origins.

Jan 9 2019, 9:58 AM · Indexer
vlorentz closed T1456: Make metadata indexers support ranges, a subtask of T1455: Add a journal client that schedules oneshot tasks for metadata indexers, as Invalid.
Jan 9 2019, 9:57 AM · Indexer

Jan 8 2019

vlorentz added a comment to T1464: Auto-detect indexer tool versions instead of reading them from the config.

Proposal for Python packages:

Jan 8 2019, 5:03 PM · Indexer
vlorentz added a comment to T1464: Auto-detect indexer tool versions instead of reading them from the config.

any version self-declared by the tool in use

Jan 8 2019, 3:07 PM · Indexer