Software Heritage

Jul 3 2019

vlorentz accepted D1683: swh-indexer-journal-client: Adapt systemd and configuration according to latest version.
Jul 3 2019, 4:44 PM · Journal, Indexer
ardumont retitled D1683: swh-indexer-journal-client: Adapt systemd and configuration according to latest version from swh-indexer-journal-client: Adapt configuration to swh-indexer-journal-client: Adapt systemd and configuration according to latest version.
Jul 3 2019, 4:38 PM · Journal, Indexer
ardumont added projects to D1683: swh-indexer-journal-client: Adapt systemd and configuration according to latest version: Indexer, Journal.
Jul 3 2019, 4:34 PM · Journal, Indexer
ardumont placed T1386: Refactor indexers' initialization step up for grabs.
Jul 3 2019, 3:26 PM · Indexer, Scheduling utilities

Jun 25 2019

twitu closed T1527: Have comments on all columns of all databases as Resolved.
Jun 25 2019, 6:25 PM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer

Jun 24 2019

twitu updated the task description for T1527: Have comments on all columns of all databases.
Jun 24 2019, 6:28 PM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer

Jun 20 2019

twitu added a comment to T1527: Have comments on all columns of all databases.

D1582 has been pushed; the task can be closed.

Jun 20 2019, 10:26 AM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer

Jun 18 2019

vlorentz placed T1528: Efficient reindex when adding a metadata mapping up for grabs.
Jun 18 2019, 1:27 PM · Indexer

Jun 15 2019

twitu added a comment to T1527: Have comments on all columns of all databases.

All columns commented in swh-scheduler, waiting review.
Some columns for swh-storage required a small discussion to frame appropriate comments.

Jun 15 2019, 5:22 PM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer

Jun 14 2019

ardumont updated the task description for T1527: Have comments on all columns of all databases.
Jun 14 2019, 5:40 PM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer
twitu added a comment to T1527: Have comments on all columns of all databases.

All columns are already commented in swh-indexer

Jun 14 2019, 5:18 PM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer
twitu added a comment to T1527: Have comments on all columns of all databases.

I have added a few comments in D1582.

Jun 14 2019, 8:30 AM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer

Jun 13 2019

ardumont added a comment to T1527: Have comments on all columns of all databases.

The latest upgrade is 136.sql while the version in 30-swh-schema.sql is 133. Should I name the next upgrade 137?

Jun 13 2019, 6:59 PM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer
twitu added a comment to T1527: Have comments on all columns of all databases.

There seems to be an inconsistency between sql/upgrades and the latest SQL version in swh-storage. The latest upgrade is 136.sql while the version in 30-swh-schema.sql is 133. Should I name the next upgrade 137?

Jun 13 2019, 6:54 PM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer
ardumont added a project to T1527: Have comments on all columns of all databases: Easy hack.
Jun 13 2019, 12:33 PM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer
ardumont updated the task description for T1527: Have comments on all columns of all databases.
Jun 13 2019, 12:31 PM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer
twitu added a comment to T1527: Have comments on all columns of all databases.

Is there anything left to be done to close the task?

Jun 13 2019, 12:09 PM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer

Jun 12 2019

twitu added a comment to T1527: Have comments on all columns of all databases.

The modules swh-scheduler, swh-indexer and swh-storage all seem to have column comments written in 30-swh-schema.sql.

Jun 12 2019, 7:44 PM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer
twitu added a comment to T1527: Have comments on all columns of all databases.

Can you provide a few more details so I can work on this? Maybe which packages will be affected and what is expected in the comments.

Jun 12 2019, 6:21 PM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer

Jun 7 2019

ardumont added a comment to T1788: indexer-license: Investigate timeouts.

In the meantime, I've stopped those indexers as this impacts others (I see transactions piling up).

Jun 7 2019, 10:34 AM · Indexer
ardumont triaged T1788: indexer-license: Investigate timeouts as Normal priority.
Jun 7 2019, 10:27 AM · Indexer

May 25 2019

zack renamed T1475: Test more edge cases of metadata indexer mappings from Add more tests for edge cases of indexer mappings. to Add more tests for edge cases of indexer mappings.
May 25 2019, 5:31 PM · Easy hack, Indexer
zack added a project to T1527: Have comments on all columns of all databases: Documentation.
May 25 2019, 5:30 PM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer

May 22 2019

vlorentz added a revision to T1513: The indexer journal client is unstable: D1501: Factorize StorageReplayer and JournalClient..
May 22 2019, 2:26 PM · Indexer

Apr 24 2019

vlorentz closed T1691: metadata indexer: investigate metadata entries with empty mappings as Resolved.

We should investigate why they are there.

Apr 24 2019, 5:22 PM · Archive content, Indexer
vlorentz closed T1691: metadata indexer: investigate metadata entries with empty mappings, a subtask of T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata, as Resolved.
Apr 24 2019, 5:22 PM · Archive content, Indexer
zack renamed T1691: metadata indexer: investigate metadata entries with empty mappings from metadata indexer: investigate empty mappings to metadata indexer: investigate metadata entries with empty mappings.
Apr 24 2019, 5:21 PM · Archive content, Indexer
zack triaged T1691: metadata indexer: investigate metadata entries with empty mappings as Normal priority.
Apr 24 2019, 5:20 PM · Archive content, Indexer
zack closed T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata as Resolved.

This is now done, aside from a minor issue noted below:

softwareheritage-indexer=# select count(*) from revision_intrinsic_metadata where metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb;
 count 
-------
     0
(1 row)
Apr 24 2019, 5:18 PM · Archive content, Indexer

Apr 19 2019

vlorentz triaged T1681: Use project metadata as a "lister" as Low priority.
Apr 19 2019, 11:03 PM · Archive coverage, Indexer, Metadata workflow

Apr 3 2019

vlorentz updated the task description for T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Apr 3 2019, 11:14 AM · Archive content, Indexer
vlorentz updated the task description for T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Apr 3 2019, 10:41 AM · Archive content, Indexer
vlorentz added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Apr 3 2019, 9:52 AM · Archive content, Indexer

Apr 2 2019

zack updated the task description for T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Apr 2 2019, 4:41 PM · Archive content, Indexer
zack updated the task description for T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Apr 2 2019, 4:40 PM · Archive content, Indexer
zack updated the task description for T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Apr 2 2019, 4:37 PM · Archive content, Indexer
zack updated the task description for T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Apr 2 2019, 4:37 PM · Archive content, Indexer

Mar 28 2019

vlorentz triaged T1614: Add pagination to full-text metadata search as Low priority.
Mar 28 2019, 2:43 PM · Web app, Indexer

Mar 20 2019

zack reassigned T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata from zack to vlorentz.
Mar 20 2019, 12:10 PM · Archive content, Indexer

Mar 15 2019

zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

@vlorentz: lather, rinse, repeat.

softwareheritage-indexer=# DELETE FROM revision_intrinsic_metadata
WHERE metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb ;
ERROR:  deadlock detected
DETAIL:  Process 23900 waits for ShareLock on transaction 212164862; blocked by process 20175.
Process 20175 waits for ShareLock on transaction 212164381; blocked by process 23900.
HINT:  See server log for query details.
CONTEXT:  while deleting tuple (772424,55) in relation "revision_intrinsic_metadata"
Time: 33048,828 ms (00:33,049)

(this just happened, after the indexers were restarted with D1218 included)

Mar 15 2019, 9:25 PM · Archive content, Indexer
vlorentz triaged T1585: Add support for extracting metadata from Python classifiers as Normal priority.
Mar 15 2019, 11:24 AM · Indexer, Metadata workflow

Mar 14 2019

ardumont closed T1540: metadata-indexer: Configuration tool creating multiple different tools even though the same as Resolved.

That's better:

Mar 14 2019, 1:47 PM · Indexer
ardumont updated the task description for T1540: metadata-indexer: Configuration tool creating multiple different tools even though the same.
Mar 14 2019, 1:45 PM · Indexer
ardumont added a comment to T1540: metadata-indexer: Configuration tool creating multiple different tools even though the same.

For the cleanup to actually happen fast, I deactivated the constraints (done), executed the delete (done) and reinstalled the constraints (in progress).

Mar 14 2019, 10:53 AM · Indexer

Mar 13 2019

haltode closed T1561: Fix heterogeneity of names in metadata tables as Resolved.

Fixed in 4f6ab3c9ab17.

Mar 13 2019, 1:28 PM · Indexer, Easy hack
ardumont added a comment to T1540: metadata-indexer: Configuration tool creating multiple different tools even though the same.

Current state: the last delete query is still running (on the indexer configuration).

Mar 13 2019, 10:46 AM · Indexer

Mar 12 2019

vlorentz closed T1448: Use swh.model.hashutil.MultiHash in swh.indexer.tests.test_utils.fill_storage as Resolved by committing rDCIDX339033b63732: Use hashutil.MultiHash in swh.indexer.tests.test_utils.fill_storage.
Mar 12 2019, 10:21 AM · Easy hack, Indexer
haltode added a revision to T1448: Use swh.model.hashutil.MultiHash in swh.indexer.tests.test_utils.fill_storage: D1235: Use hashutil.MultiHash in swh.indexer.tests.test_utils.fill_storage.
Mar 12 2019, 7:39 AM · Easy hack, Indexer
haltode added a revision to T1561: Fix heterogeneity of names in metadata tables: D1226: Fix heterogeneity of names in metadata tables.
Mar 12 2019, 5:08 AM · Indexer, Easy hack

Mar 11 2019

vlorentz closed T1545: Add a CLI tool to list all fields that may be outputted by metadata_dictionary. as Resolved.
Mar 11 2019, 11:55 AM · Metadata workflow, Indexer

Mar 4 2019

zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

D1218

Once this is landed and deployed, ordering your DELETEs by revision_metadata.id will acquire locks in the same order as the idx_storage, solving the deadlock issue.

Mar 4 2019, 3:11 PM · Archive content, Indexer
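The fix relies on a standard deadlock-avoidance rule: if every writer acquires its row locks in the same global order (here, ascending id), no cycle of waiters can form. A minimal Python sketch of that rule, using `threading.Lock` objects as stand-ins for PostgreSQL row locks; `delete_rows`, `row_locks` and the ids are hypothetical and only illustrate the ordering, not the actual idx_storage code:

```python
import threading

# One lock per row id: hypothetical stand-ins for PostgreSQL row-level locks.
row_locks = {row_id: threading.Lock() for row_id in range(10)}
processed = []
processed_guard = threading.Lock()


def delete_rows(row_ids):
    """Lock every targeted row in ascending id order, then record them."""
    ordered = sorted(row_ids)  # the same global order for every writer
    for row_id in ordered:
        row_locks[row_id].acquire()
    try:
        with processed_guard:
            processed.extend(ordered)
    finally:
        for row_id in reversed(ordered):
            row_locks[row_id].release()


# Two writers touching overlapping rows, given in opposite orders.
# Without the sort, each could hold one lock while waiting on the other's.
writer_a = threading.Thread(target=delete_rows, args=([1, 5, 7],))
writer_b = threading.Thread(target=delete_rows, args=([7, 5, 2],))
writer_a.start()
writer_b.start()
writer_a.join(timeout=5)
writer_b.join(timeout=5)
```

On the PostgreSQL side, one common way to get the same effect is to lock the target rows with `SELECT ... ORDER BY id FOR UPDATE` before deleting them, so that concurrent transactions request locks in the same order.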
vlorentz added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

Once this is landed and deployed, ordering your DELETEs by revision_metadata.id will acquire locks in the same order as the idx_storage, solving the deadlock issue.

Mar 4 2019, 12:51 PM · Archive content, Indexer

Mar 2 2019

zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

The update completed, but a first attempt at the second DELETE failed with a deadlock (?!):

softwareheritage-indexer=# DELETE FROM revision_metadata
WHERE translated_metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb ;
ERROR:  deadlock detected
DETAIL:  Process 10966 waits for ShareLock on transaction 197265813; blocked by process 11754.
Process 11754 waits for ShareLock on transaction 197264487; blocked by process 10966.
HINT:  See server log for query details.
CONTEXT:  while deleting tuple (1380733,15) in relation "revision_metadata"
Time: 170864,091 ms (02:50,864)
Mar 2 2019, 1:18 PM · Archive content, Indexer
zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

The following fix for the above (suggested by @vlorentz ) is now running:

update revision_metadata
set translated_metadata = origin_intrinsic_metadata.metadata
from origin_intrinsic_metadata
where revision_metadata.id = origin_intrinsic_metadata.from_revision
  and revision_metadata.translated_metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'
  and origin_intrinsic_metadata.metadata != '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}';
Mar 2 2019, 9:53 AM · Archive content, Indexer
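The UPDATE above copies an origin's non-empty metadata back onto the matching revision row whenever that revision row holds only the bare CodeMeta context. The same logic on in-memory rows, as a minimal Python sketch; the dict layouts and the `backfill_revision_metadata` name are simplified assumptions, not the actual table schemas:

```python
# A CodeMeta document with nothing but its context, i.e. "no metadata".
EMPTY_CODEMETA = {"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}


def backfill_revision_metadata(revision_metadata, origin_intrinsic_metadata):
    """Copy origin metadata onto revisions whose translated metadata is empty.

    revision_metadata: dict mapping revision id -> translated_metadata
    origin_intrinsic_metadata: list of {"from_revision": ..., "metadata": ...}
    Returns the number of revision rows that were updated.
    """
    updated = 0
    for origin_row in origin_intrinsic_metadata:
        rev_id = origin_row["from_revision"]
        if (
            rev_id in revision_metadata
            and revision_metadata[rev_id] == EMPTY_CODEMETA
            and origin_row["metadata"] != EMPTY_CODEMETA
        ):
            revision_metadata[rev_id] = origin_row["metadata"]
            updated += 1
    return updated
```

As in the SQL, a revision whose metadata is already non-empty is left untouched, and an origin row that is itself empty never overwrites anything.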

Mar 1 2019

zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

As discussed on IRC, even after cleaning up origin_intrinsic_metadata, the DELETE on revision_metadata fails with:

softwareheritage-indexer=# DELETE FROM revision_metadata
softwareheritage-indexer-# WHERE translated_metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb ;
Mar 1 2019, 4:49 PM · Archive content, Indexer
zack claimed T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Mar 1 2019, 3:12 PM · Archive content, Indexer
zack changed the status of T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata from Open to Work in Progress.

I've started the first of the following queries on somerset (in a screen session under my user):

DELETE FROM origin_intrinsic_metadata
WHERE metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb ;
Mar 1 2019, 2:51 PM · Archive content, Indexer
vlorentz triaged T1561: Fix heterogeneity of names in metadata tables as Low priority.
Mar 1 2019, 2:43 PM · Indexer, Easy hack

Feb 27 2019

vlorentz added a revision to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata: D1206: Prevent origin metadata indexer from writing empty records.
Feb 27 2019, 3:44 PM · Archive content, Indexer
vlorentz claimed T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Feb 27 2019, 3:40 PM · Archive content, Indexer
vlorentz added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

It lets us know that we went over the origin and did not find anything, without needing additional tables.

Feb 27 2019, 11:44 AM · Archive content, Indexer

Feb 26 2019

zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

I don't understand your comment. What are the remaining arguments for using NULL instead of just deleting rows?

Feb 26 2019, 6:42 PM · Archive content, Indexer
vlorentz added a revision to T1545: Add a CLI tool to list all fields that may be outputted by metadata_dictionary.: D1200: Add a CLI tool to list all supported CodeMeta terms and document them..
Feb 26 2019, 5:57 PM · Metadata workflow, Indexer
vlorentz added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

and the NULL option

Feb 26 2019, 5:42 PM · Archive content, Indexer
zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

Yes :-)
So, do we agree that the right fix for this task is just to get rid of empty-ish rows? Or are there other arguments that we haven't considered yet?

Feb 26 2019, 5:41 PM · Archive content, Indexer
vlorentz added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

Sounds like a solution for T1528 :)

Feb 26 2019, 4:12 PM · Archive content, Indexer
zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

What is the provenance map?

Feb 26 2019, 3:12 PM · Archive content, Indexer
vlorentz added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

What is the provenance map?

Feb 26 2019, 2:17 PM · Archive content, Indexer
zack added a project to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata: Archive content.

My tentative proposal is to delete all table entries for which no metadata has been found.
The invariant will be: if an origin/revision has metadata, there will be an entry in the table(s); if not, the origin/revision will not appear.

Feb 26 2019, 12:53 PM · Archive content, Indexer
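The proposed invariant can be stated as a filter: a row survives only if its CodeMeta document differs from the bare `{"@context": ...}` document, the same exact-equality test used by the DELETE queries elsewhere in this thread. A minimal Python sketch, where the table is modelled as a dict from origin/revision id to document (a simplifying assumption) and `enforce_invariant` is a hypothetical helper:

```python
CODEMETA_CONTEXT = "https://doi.org/10.5063/schema/codemeta-2.0"
EMPTY_CODEMETA = {"@context": CODEMETA_CONTEXT}


def enforce_invariant(table):
    """Drop rows with no metadata, so a row exists only if metadata was found.

    table: dict mapping origin/revision id -> CodeMeta document.
    """
    return {key: doc for key, doc in table.items() if doc != EMPTY_CODEMETA}
```

After this filter, "is there an entry for this origin?" and "did we find metadata for this origin?" become the same question.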
vlorentz triaged T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata as Low priority.
Feb 26 2019, 11:49 AM · Archive content, Indexer
zack added a comment to T1545: Add a CLI tool to list all fields that may be outputted by metadata_dictionary..

(but of course if you want to have a CLI tool to generate the info, sure; I just wanted to highlight here that the end goal is the doc)

Feb 26 2019, 11:02 AM · Metadata workflow, Indexer
zack added a comment to T1545: Add a CLI tool to list all fields that may be outputted by metadata_dictionary..

More than a CLI tool, I'd like to have documentation about how to use the CodeMeta metadata that we extract, sort of "typing information" for the content of the various intrinsic metadata tables.
It might be something as simple as:

Feb 26 2019, 11:01 AM · Metadata workflow, Indexer
ardumont added a comment to T1540: metadata-indexer: Configuration tool creating multiple different tools even though the same.

Current state: the last delete query is still running (on the indexer configuration).

Feb 26 2019, 10:04 AM · Indexer
vlorentz triaged T1545: Add a CLI tool to list all fields that may be outputted by metadata_dictionary. as Normal priority.
Feb 26 2019, 10:00 AM · Metadata workflow, Indexer
vlorentz added a comment to T1540: metadata-indexer: Configuration tool creating multiple different tools even though the same.

s/clean up/deduplicate/

Feb 26 2019, 9:47 AM · Indexer
ardumont added a comment to T1540: metadata-indexer: Configuration tool creating multiple different tools even though the same.

More like:

Feb 26 2019, 8:20 AM · Indexer

Feb 25 2019

ardumont changed the status of T1540: metadata-indexer: Configuration tool creating multiple different tools even though the same from Open to Work in Progress.
Feb 25 2019, 4:54 PM · Indexer
ardumont updated the task description for T1540: metadata-indexer: Configuration tool creating multiple different tools even though the same.
Feb 25 2019, 4:53 PM · Indexer
ardumont added a comment to T1540: metadata-indexer: Configuration tool creating multiple different tools even though the same.

SQL script to migrate (in progress):

Feb 25 2019, 4:53 PM · Indexer
vlorentz closed T1529: Efficient reindex when updating a metadata mapping as Resolved.
Feb 25 2019, 1:48 PM · Indexer
ardumont added a comment to T1540: metadata-indexer: Configuration tool creating multiple different tools even though the same.

No, only the context key, which doesn't make sense anymore (there's a mappings column in metadata tables).

Feb 25 2019, 11:14 AM · Indexer
vlorentz added a revision to T1540: metadata-indexer: Configuration tool creating multiple different tools even though the same: D1185: Drop the 'context' and 'type' keys in the config of metadata indexers..
Feb 25 2019, 10:41 AM · Indexer
vlorentz added a revision to T1540: metadata-indexer: Configuration tool creating multiple different tools even though the same: D1186: Drop the 'context' and 'type' config of metadata indexers from the puppet manifest..
Feb 25 2019, 10:41 AM · Indexer
vlorentz raised the priority of T1540: metadata-indexer: Configuration tool creating multiple different tools even though the same from Normal to High.
Feb 25 2019, 10:15 AM · Indexer
vlorentz added a comment to T1540: metadata-indexer: Configuration tool creating multiple different tools even though the same.

A priori, the solution would be to remove the context from the tool_configuration column (seen with @vlorentz).

Feb 25 2019, 10:15 AM · Indexer
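Removing the `context` key makes otherwise-identical tool configurations compare equal, so they collapse to a single tool. A minimal Python sketch of that normalization; the shape of the tool dicts, the sample values and the `canonical_key`/`deduplicate_tools` helpers are assumptions for illustration, not the actual swh-indexer schema:

```python
import json


def canonical_key(tool_name, tool_version, tool_configuration):
    """Build a hashable key for a tool, ignoring the obsolete 'context' key."""
    config = {k: v for k, v in tool_configuration.items() if k != "context"}
    # sort_keys makes the JSON text independent of dict insertion order
    return (tool_name, tool_version, json.dumps(config, sort_keys=True))


def deduplicate_tools(tools):
    """Keep the first tool seen for each canonical key."""
    unique = {}
    for tool in tools:
        key = canonical_key(tool["name"], tool["version"], tool["configuration"])
        unique.setdefault(key, tool)
    return list(unique.values())
```

Serializing with `sort_keys=True` mirrors how a jsonb column compares documents regardless of key order.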
ardumont triaged T1540: metadata-indexer: Configuration tool creating multiple different tools even though the same as Normal priority.
Feb 25 2019, 9:18 AM · Indexer

Feb 20 2019

vlorentz added a revision to T1529: Efficient reindex when updating a metadata mapping: D1165: Add a CLI tool to reindex origins based on mapping used..
Feb 20 2019, 5:17 PM · Indexer

Feb 18 2019

vlorentz added a revision to T1529: Efficient reindex when updating a metadata mapping: D1150: Add idx storage endpoint to search metadata by mapping..
Feb 18 2019, 2:27 PM · Indexer
vlorentz claimed T1528: Efficient reindex when adding a metadata mapping.
Feb 18 2019, 2:27 PM · Indexer
vlorentz claimed T1529: Efficient reindex when updating a metadata mapping.
Feb 18 2019, 2:27 PM · Indexer

Feb 14 2019

vlorentz added a comment to T1528: Efficient reindex when adding a metadata mapping.

Or, as the OriginMetadataIndexer already fetches this data anyway, it could write it to the indexer db when it's done.

Feb 14 2019, 12:01 PM · Indexer
vlorentz added a comment to T1529: Efficient reindex when updating a metadata mapping.

Should be easy to do: we already have this info in the indexer db (in origin_intrinsic_metadata.mappings). Then it's just a matter of creating oneshot tasks.

Feb 14 2019, 11:16 AM · Indexer
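The two steps described (select the origins whose `mappings` include the updated mapping, then schedule oneshot tasks for them) can be sketched as follows. Only the `mappings` field comes from the comment above; the `origin_id` field, the mapping names, the task type `index-origin-metadata` and the task dict layout are hypothetical placeholders, not the actual swh-scheduler API:

```python
def origins_using_mapping(rows, mapping):
    """Return ids of origins whose metadata was produced by `mapping`.

    rows: iterable of dicts shaped like origin_intrinsic_metadata rows
    (only 'origin_id' and 'mappings' are assumed here).
    """
    return sorted(row["origin_id"] for row in rows if mapping in row["mappings"])


def make_oneshot_tasks(origin_ids, batch_size=100):
    """Split origin ids into batches, one hypothetical oneshot task per batch."""
    return [
        {
            "type": "index-origin-metadata",  # hypothetical task type
            "policy": "oneshot",
            "arguments": {"args": [origin_ids[i:i + batch_size]], "kwargs": {}},
        }
        for i in range(0, len(origin_ids), batch_size)
    ]
```

Batching keeps the number of scheduled tasks proportional to the number of affected origins divided by `batch_size`, rather than one task per origin.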
vlorentz added a comment to T1528: Efficient reindex when adding a metadata mapping.

The naive solution is to add a new indexer that pre-fetches the snapshot, revision and root directory of an origin and writes its list of root files to the indexer db. Then we can query that to find which origins have a given file name pattern.

Feb 14 2019, 11:15 AM · Indexer
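Once such a root-file listing exists, finding the origins to reindex for a new mapping reduces to a pattern match over file names. A minimal Python sketch, assuming a hypothetical dict from origin id to root file names standing in for the proposed table; `fnmatchcase` is used so matching is case-sensitive on every platform:

```python
from fnmatch import fnmatchcase


def origins_with_root_file(root_files_by_origin, pattern):
    """Return origins whose root directory has a file matching `pattern`.

    root_files_by_origin: dict mapping origin id -> list of file names at the
    root of the origin's head revision (the hypothetical table that the
    proposed pre-fetching indexer would populate).
    """
    return sorted(
        origin
        for origin, names in root_files_by_origin.items()
        if any(fnmatchcase(name, pattern) for name in names)
    )
```

Adding a mapping for, say, `pom.xml` files would then only need to reindex the origins returned for that pattern instead of the whole archive.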
vlorentz triaged T1529: Efficient reindex when updating a metadata mapping as Normal priority.
Feb 14 2019, 11:12 AM · Indexer
vlorentz triaged T1528: Efficient reindex when adding a metadata mapping as Low priority.
Feb 14 2019, 11:11 AM · Indexer
vlorentz triaged T1527: Have comments on all columns of all databases as Normal priority.
Feb 14 2019, 11:08 AM · Easy hack, Documentation, Storage manager, Scheduling utilities, Indexer

Feb 7 2019

vlorentz closed T1484: Provide stats on extracted metadata in the indexer storage api as Resolved.
Feb 7 2019, 3:50 PM · Metadata workflow, Metrics/monitoring, Indexer
vlorentz closed T1484: Provide stats on extracted metadata in the indexer storage api, a subtask of T1485: Show stats on extracted metadata, as Resolved.
Feb 7 2019, 3:50 PM · Web app, Metadata workflow, Indexer
vlorentz closed T1517: Metadata search is too slow as Resolved.
Feb 7 2019, 3:50 PM · Metadata workflow, Indexer

Feb 5 2019

vlorentz added a revision to T1517: Metadata search is too slow: D1082: Use the index when doing metadata search..
Feb 5 2019, 3:55 PM · Metadata workflow, Indexer